"Extracting Information from Business Documents using Linguistic and Rule-Based System" by Avimanyu Halder, Leili Javadpour et al.

Extracting Information from Business Documents using Linguistic and Rule-Based System


Leili Javadpour: 0000-0003-4004-1950

Document Type

Conference Presentation

Conference Title

Industrial and Systems Engineering Research Conference (ISERC)


Orlando, FL

Conference Dates

May 19-23, 2012



A large amount of business and engineering knowledge resides in financial and project management reports, business case analysis reports, standard operating procedures, employee handbooks, and other technical documents. Because these documents sit outside traditional databases, their contents are not easily accessible to database query and mining techniques. There is a growing need for information technologies that extract knowledge from such unstructured data sources. In this study, a corpus of technical documents was compiled, and new algorithms were developed to extract domain knowledge from it automatically. New methods were developed for text processing and for extracting business rules and taxonomic data from the corpus. Rhetorical structure analysis was used for process extraction, and WordNet-based word sense disambiguation was used for concept validation.
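
To make the idea of rule-based taxonomic extraction concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not describe. It assumes a single lexical pattern of the "X such as A, B, and C" kind (a commonly used hypernym pattern, swapped in here purely for illustration) and treats the one or two words before "such as" as the broader concept. The names SUCH_AS, ITEM_SPLIT, and extract_is_a are hypothetical.

```python
import re

# Assumption: one naive lexical rule, "<broader term> such as <item>, <item>, and <item>".
# The broader term is approximated by the one or two words preceding "such as".
SUCH_AS = re.compile(
    r"\b(?P<hypernym>\w+(?: \w+)?) such as (?P<items>[^.;]+)",
    re.IGNORECASE,
)

# Split an enumeration on commas and on the conjunctions "and" / "or".
ITEM_SPLIT = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def extract_is_a(text):
    """Return (narrower term, broader term) pairs found by the pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hypernym").lower()
        for item in ITEM_SPLIT.split(match.group("items")):
            item = item.strip().lower()
            if item:  # skip empty fragments left by the split
                pairs.append((item, hypernym))
    return pairs


if __name__ == "__main__":
    sentence = (
        "The corpus includes technical documents such as financial reports, "
        "standard operating procedures, and employee handbooks."
    )
    for narrower, broader in extract_is_a(sentence):
        print(f"{narrower} IS-A {broader}")
```

A single surface pattern like this misses most taxonomic relations and cannot tell word senses apart, which is one reason a system of the kind described would pair such rules with deeper linguistic analysis and a sense-validation step.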
