Extracting Information from Business Documents using Linguistic and Rule-Based System

ORCiD

Leili Javadpour: 0000-0003-4004-1950

Document Type

Conference Presentation

Conference Title

Industrial and Systems Engineering Research Conference (ISERC)

Location

Orlando, FL

Conference Dates

May 19-23, 2012

Date of Presentation

5-1-2012

First Page

2626

Last Page

2631

Abstract

Large amounts of business and engineering knowledge reside in financial and project management reports, business case analysis reports, standard operating procedures, employee handbooks, and other technical documents. Because these documents sit outside traditional databases, they are not easily accessible to database query and data mining techniques. There is therefore a growing need for information technologies that can extract knowledge from such unstructured data sources. In this study, a corpus of technical documents has been compiled, and new algorithms have been developed to automatically extract domain knowledge from it. New methods have also been developed for text processing and for extracting business rules and taxonomic data from the corpus. Rhetorical structure analysis has been used for process extraction, and WordNet-based word sense disambiguation has been used for concept validation.
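The abstract names WordNet-based word sense disambiguation for concept validation but does not say which algorithm was used. One common gloss-overlap approach is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most content words with the surrounding text. The sketch below illustrates that technique only. It is not presented as the authors' method. To keep it self-contained, the tiny sense inventory (`SENSES`) and the stop-word list are hypothetical stand-ins for real WordNet glosses. With NLTK and its WordNet data installed, `nltk.wsd.lesk` offers a ready-made variant over actual WordNet synsets.

```python
import re

# Hypothetical sense inventory standing in for WordNet glosses (illustration only).
SENSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "a financial institution that accepts deposits and lends money",
    },
}

# Small hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "and", "such", "as", "that",
             "to", "in", "for", "is", "beside"}


def tokens(text):
    """Return the set of lowercase word tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, context):
    """Pick the sense whose gloss shares the most tokens with the context.

    Ties keep the earlier sense in the inventory, since only a strictly
    larger overlap replaces the current best.
    """
    context_tokens = tokens(context)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in SENSES[word].items():
        overlap = len(context_tokens & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("bank", "The loan was approved by the bank after reviewing deposits"))
    print(simplified_lesk("bank", "Erosion damaged the river bank near the water treatment site"))
```

In the first sentence, the word "deposits" overlaps only the financial gloss. In the second, "river" and "water" overlap only the landform gloss. In a concept-validation setting, a term extracted from a report could be accepted only if its disambiguated sense fits the business domain. That use is an assumption about how such a step might work, not a detail given in the abstract.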
