Discourse Structure Identification for Knowledge Extraction
ORCiD
Leili Javadpour: 0000-0003-4004-1950
Document Type
Conference Presentation
Conference Title
Industrial and Systems Engineering Research Conference (ISERC)
Location
San Juan, Puerto Rico
Conference Dates
May 18-22, 2013
Date of Presentation
5-1-2013
First Page
214
Last Page
223
Abstract
Identification of a document's discourse structure (what each part contributes to the ideas presented, such as hypothesis, support, comparison, and results) is a key precursor to improving knowledge extraction from technical documents. To date, few efforts have been made to automate discourse structure identification, and those have had limited success. The current state-of-the-art discourse parser, SPADE, is limited to parsing discourse within a single sentence. HILDA extends SPADE's parsing abilities to document-level structure, but with a significant decrease in performance. Both are based on Rhetorical Structure Theory (RST), a widely accepted framework for analyzing discourse coherence, which holds that coherent text can be organized into a hierarchy of interrelated clauses. This paper documents the first part of a study that aims to achieve RST-based document-level discourse parsing without sacrificing performance. It addresses the first two steps of discourse parsing: structuring and nuclearity labeling. An algorithm was developed for classifying relation existence and nuclearity, and it improved upon previous methods.
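The two steps named above, structuring and nuclearity labeling, can be pictured with a small sketch. It is not the classifier developed in this paper. It assumes the text has already been segmented into elementary discourse units (EDUs, the leaves of an RST tree), builds the tree by greedily merging the highest-scoring pair of adjacent spans bottom-up (one generic way to build a document-level RST tree), and labels each merge as NS, SN, or NN (nucleus-satellite, satellite-nucleus, or multinuclear). The names `Node`, `parse`, `bracket`, `toy_attach`, and `toy_nuclearity` are hypothetical, and the two scoring functions are toy cue-word rules standing in for trained classifiers of relation existence and nuclearity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """A span of the discourse tree: a leaf EDU or an internal relation node."""
    text: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    nuclearity: Optional[str] = None  # "NS", "SN" or "NN" for internal nodes

    def is_leaf(self) -> bool:
        return self.left is None


def parse(
    edus: List[str],
    attach_score: Callable[[Node, Node], float],
    label_nuclearity: Callable[[Node, Node], str],
) -> Node:
    """Greedy bottom-up structuring, labeling nuclearity at each merge."""
    if not edus:
        raise ValueError("need at least one EDU")
    spans = [Node(text=e) for e in edus]
    while len(spans) > 1:
        # Structuring: pick the adjacent pair judged most likely to be related
        # (ties go to the leftmost pair, since max() returns the first maximum).
        i = max(range(len(spans) - 1),
                key=lambda k: attach_score(spans[k], spans[k + 1]))
        left, right = spans[i], spans[i + 1]
        # Nuclearity labeling: decide which side carries the central content.
        merged = Node(
            text=left.text + " " + right.text,
            left=left,
            right=right,
            nuclearity=label_nuclearity(left, right),
        )
        spans[i:i + 2] = [merged]
    return spans[0]


def bracket(node: Node) -> str:
    """Render the tree as nested brackets, e.g. (NS [a] [b])."""
    if node.is_leaf():
        return f"[{node.text}]"
    return f"({node.nuclearity} {bracket(node.left)} {bracket(node.right)})"


def first_word(node: Node) -> str:
    words = node.text.split()
    return words[0].lower().strip(",") if words else ""


def toy_attach(left: Node, right: Node) -> float:
    """Hypothetical structure score: favour pairs joined by an explicit connective."""
    return 1.0 if first_word(right) in {"because", "but", "so"} else 0.0


def toy_nuclearity(left: Node, right: Node) -> str:
    """Hypothetical rule: a 'because' clause is a satellite of what precedes it."""
    return "NS" if first_word(right) == "because" else "NN"


tree = parse(
    ["The parser failed", "because the input was long.", "We retried later."],
    toy_attach,
    toy_nuclearity,
)
print(bracket(tree))
# (NN (NS [The parser failed] [because the input was long.]) [We retried later.])
```

Swapping the toy functions for learned classifiers leaves the control flow unchanged; only the scores that decide which spans merge, and how each merge is labeled, would differ.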
Recommended Citation
Guidry, J., Javadpour, L., Knapp, G. M., & Guidry, J. (2013). Discourse Structure Identification for Knowledge Extraction. Paper presented at Industrial and Systems Engineering Research Conference (ISERC) in San Juan, Puerto Rico.
https://scholarlycommons.pacific.edu/esob-facpres/388