Discourse Structure Identification for Knowledge Extraction

ORCiD

Leili Javadpour: 0000-0003-4004-1950

Document Type

Conference Presentation

Conference Title

Industrial and Systems Engineering Research Conference (ISERC)

Location

San Juan, Puerto Rico

Conference Dates

May 18-22, 2013

Date of Presentation

5-1-2013

First Page

214

Last Page

223

Abstract

Identification of a document's discourse structure - what each part contributes to the ideas presented, such as hypothesis, support, comparison, and results - is a key precursor to improving knowledge extraction from technical documents. So far, only a few efforts have been made to automate discourse structure identification, and these have had limited success. The current state-of-the-art discourse parser, SPADE, is limited to parsing discourse within a single sentence. HILDA extends the parsing abilities of SPADE to document-level structure, but with a significant decrease in performance. Both are based on Rhetorical Structure Theory (RST), a widely accepted approach for analyzing discourse coherence, which holds that coherent text can be placed into a hierarchical organization of interrelated clauses. This paper documents the first part of a study that aims to achieve RST-based document-level discourse parsing without sacrificing performance. It addresses the first two steps of discourse parsing: structuring and nuclearity labeling. An algorithm developed for classifying relation existence and nuclearity improved upon previous methods.
