Natural Language Discourse Processing

Natural Language Discourse Processing in NLP

Understanding how sentences connect to form coherent, meaningful text is one of the harder problems in natural language processing (NLP), a subfield of artificial intelligence (AI). Discourse processing addresses this problem by building theories and models of how utterances relate to one another within a coherent group, as opposed to an unrelated collection such as a set of random movie quotes. Such a structured, coherent group of sentences is called a discourse.

Understanding Coherence in Discourse

Coherence is essential in discourse, and it directly affects the quality of text generated by NLP systems. To understand coherence, consider this: if we randomly collected sentences from different pages of a newspaper, would they form a coherent discourse? The answer is no, because these sentences lack meaningful connections. A coherent discourse must have:

1. Coherence Relation Between Utterances

A discourse is coherent when its utterances are meaningfully connected. Such a connection is called a coherence relation: one utterance might, for example, explain or justify another.

2. Relationship Between Entities

Coherence can also arise from the entities a discourse talks about: a text that keeps returning to the same entities, or to closely related ones, reads as connected. This type of coherence is known as entity-based coherence.

3. Discourse Structure

The structure of a discourse depends on how it is segmented. Discourse segmentation divides a large discourse into structural units, such as topical segments, and is important for applications like information retrieval, text summarization, and information extraction. It is a difficult task, but a good segmentation makes the organization of a long text explicit.

Algorithms for Discourse Segmentation

Unsupervised Discourse Segmentation

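Unsupervised segmentation typically means linear segmentation, which divides a text into consecutive multi-paragraph units, each covering one passage or subtopic of the original text. These algorithms rely on cohesion, the use of linguistic devices such as repeated words and synonyms to tie textual units together: where cohesion between neighboring stretches of text drops, a segment boundary is likely.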
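The cohesion idea can be sketched in a few lines of Python. This is a simplified illustration in the spirit of lexical-cohesion segmenters, not a reference implementation: it measures cohesion only through repeated words (no synonyms), and the stopword list, the `window` size, and the `threshold` value are all assumptions chosen for the toy example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "at", "on", "is", "was"}

def bag_of_words(sentences):
    """Count the content words in a block of sentences."""
    words = []
    for sentence in sentences:
        words += [w for w in re.findall(r"[a-z]+", sentence.lower())
                  if w not in STOPWORDS]
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def gap_scores(sentences, window=2):
    """Cohesion score for every gap between two adjacent sentences."""
    scores = []
    for gap in range(1, len(sentences)):
        left = bag_of_words(sentences[max(0, gap - window):gap])
        right = bag_of_words(sentences[gap:gap + window])
        scores.append(cosine(left, right))
    return scores

def segment(sentences, window=2, threshold=0.1):
    """Start a new segment wherever cohesion falls below the threshold."""
    if not sentences:
        return []
    segments, current = [], [sentences[0]]
    for sentence, score in zip(sentences[1:], gap_scores(sentences, window)):
        if score < threshold:
            segments.append(current)
            current = []
        current.append(sentence)
    segments.append(current)
    return segments

sentences = [
    "The bank approved the loan.",
    "The loan had a low interest rate.",
    "The bank charged interest monthly.",
    "The train left the station late.",
    "Passengers waited at the station.",
    "The train reached the city at night.",
]
for block in segment(sentences):
    print(block)
```

On this toy input the only gap with no shared vocabulary is the one between the banking sentences and the train sentences, so the text is split into two segments of three sentences each. A fixed threshold is the crudest possible boundary criterion; practical systems usually compare each gap with its neighbors instead.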

Supervised Discourse Segmentation

Unlike unsupervised methods, supervised discourse segmentation learns from training data in which the segment boundaries have been labeled. It relies heavily on discourse markers, or cue words: words and phrases that signal the structure of the discourse and that are often specific to a domain.
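As a rough sketch of how labeled boundaries and cue words fit together, the following Python code estimates, for each sentence-initial word, how often it opens a new segment in the training data, and then uses those estimates to predict boundaries. The function names, the first-word feature, and the toy training data are assumptions made for illustration; a real system would use a proper classifier and many more features.

```python
from collections import Counter

def first_word(sentence):
    """Lowercased first token of a non-empty sentence, without punctuation."""
    return sentence.split()[0].lower().strip(",.;:")

def train_cue_model(labeled):
    """labeled: list of (sentence, starts_new_segment) pairs.

    Returns, for each sentence-initial word, the fraction of training
    sentences beginning with that word that opened a new segment.
    """
    starts, totals = Counter(), Counter()
    for sentence, is_boundary in labeled:
        cue = first_word(sentence)
        totals[cue] += 1
        if is_boundary:
            starts[cue] += 1
    return {cue: starts[cue] / totals[cue] for cue in totals}

def predict_boundaries(model, sentences, threshold=0.5):
    """Indices of sentences predicted to start a new segment."""
    return [i for i, sentence in enumerate(sentences)
            if model.get(first_word(sentence), 0.0) > threshold]

training = [
    ("Meanwhile, the markets fell.", True),
    ("The index dropped two percent.", False),
    ("Finally, the weather report.", True),
    ("The rain will continue.", False),
    ("Meanwhile, talks resumed.", True),
]
model = train_cue_model(training)
print(predict_boundaries(model, [
    "The match ended.",
    "Meanwhile, fans left.",
    "The streets emptied.",
]))  # [1]
```

Here "meanwhile" always opened a segment in the training data, so the second test sentence is flagged as a boundary, while sentences starting with "the" are not.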

Enhancing Text Coherence

To judge whether a discourse is coherent, we need an inventory of the coherence relations that can hold between utterances. For example, here are some coherence relations proposed by Hobbs, where S0 and S1 denote the meanings of two related sentences:

  • Result: S0 can cause the state in S1. E.g., "Ram was caught in the fire. His skin burned."
  • Explanation: S1 can cause the state in S0. E.g., "Ram fought with Shyam’s friend. He was drunk."
  • Parallel: Both S0 and S1 make similar assertions. E.g., "Ram wanted a car. Shyam wanted money."
  • Elaboration: S0 and S1 assert the same proposition, with S1 adding detail. E.g., "Ram was from Chandigarh. He had lived in that city all his life."
  • Occasion: A change of state is inferred between S0 and S1. E.g., "Ram picked up the book. He gave it to Shyam."

Building Hierarchical Discourse Structures

The coherence of a discourse can also be analyzed using a hierarchical structure of coherence relations. Consider the following passage as an example of hierarchical structure:

  1. Ram went to the bank to deposit money.
  2. He then took a train to Shyam’s cloth shop.
  3. He wanted to buy some clothes.
  4. He did not have new clothes for the party.
  5. He also wanted to talk to Shyam regarding his health.
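One plausible hierarchical analysis of this passage, written with the relations listed earlier, is: sentences 1 and 2 are linked by Occasion; sentence 2 is explained by sentences 3 to 5; within that explanation, sentence 3 (itself explained by sentence 4) stands in a Parallel relation to sentence 5. The Python sketch below encodes that reading as a tree. The class names and the particular tree are assumptions for illustration; other analyses of the same passage are possible.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Segment:
    """A leaf of the discourse tree: one sentence of the passage."""
    text: str

@dataclass
class Relation:
    """An internal node: a coherence relation over its children."""
    name: str
    children: List[Union["Relation", Segment]] = field(default_factory=list)

def render(node, depth=0):
    """Return the tree as a list of indented lines."""
    pad = "  " * depth
    if isinstance(node, Segment):
        return [pad + node.text]
    lines = [pad + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines

def leaves(node):
    """Return the sentences of the tree in left-to-right order."""
    if isinstance(node, Segment):
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found

s1 = Segment("Ram went to the bank to deposit money.")
s2 = Segment("He then took a train to Shyam's cloth shop.")
s3 = Segment("He wanted to buy some clothes.")
s4 = Segment("He did not have new clothes for the party.")
s5 = Segment("He also wanted to talk to Shyam regarding his health.")

passage_tree = Relation("Occasion", [
    s1,
    Relation("Explanation", [
        s2,
        Relation("Parallel", [
            Relation("Explanation", [s3, s4]),
            s5,
        ]),
    ]),
])

print("\n".join(render(passage_tree)))
```

Reading the leaves from left to right recovers the original passage, while the internal nodes record which relation holds over which span of text.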

Reference Resolution in Discourse

Understanding a discourse also involves reference resolution: determining which entities the expressions in a text refer to. Reference is the use of a linguistic expression to denote an entity. For example, in the passage "Ram, the manager of ABC bank, saw his friend Shyam at a shop. He went to meet him," the expressions "Ram," "his," and "He" are all references to the same person.

Key Terminologies in Reference Resolution

  • Referring Expression: The phrase used to make a reference, such as "Ram" in the above passage.
  • Referent: The entity being referred to, such as Ram in the example.
  • Corefer: Two expressions corefer when they refer to the same entity, like "Ram" and "he."
  • Antecedent: The earlier expression that licenses the use of a later one, like "Ram" being the antecedent of "he."
  • Anaphora & Anaphoric: Anaphora is reference to an entity that was previously introduced into the discourse. The referring expression used in this way is called anaphoric.
  • Discourse Model: A model containing representations of entities and their relationships in the discourse.
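The terms above can be tied together with a small Python sketch of a discourse model. It stores one record per entity (the referent) together with every referring expression used for it; the first expression acts as the antecedent of the later, anaphoric ones. The class and method names are assumptions for illustration, and the assignment of "He" to Ram and "him" to Shyam reflects the intended reading of the example passage, not the output of any algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A referent, with the referring expressions used for it so far."""
    entity_id: int
    mentions: list = field(default_factory=list)

class DiscourseModel:
    """Keeps track of the entities introduced in a discourse."""

    def __init__(self):
        self.entities = []

    def evoke(self, expression):
        """First mention: add a new entity to the model."""
        entity = Entity(len(self.entities), [expression])
        self.entities.append(entity)
        return entity

    def access(self, entity, expression):
        """Later mention: record another expression for a known entity."""
        entity.mentions.append(expression)
        return entity

    def corefer(self, expr_a, expr_b):
        """True if both expressions were recorded for the same entity."""
        return any(expr_a in e.mentions and expr_b in e.mentions
                   for e in self.entities)

model = DiscourseModel()
ram = model.evoke("Ram")
model.access(ram, "the manager of ABC bank")
model.access(ram, "his")
shyam = model.evoke("Shyam")
model.access(ram, "He")
model.access(shyam, "him")

print(ram.mentions)    # ['Ram', 'the manager of ABC bank', 'his', 'He']
print(shyam.mentions)  # ['Shyam', 'him']
```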

Types of Referring Expressions

Different types of referring expressions include:

  • Indefinite Noun Phrases: Introduce entities that are new to the listener, like "some food" in "Ram had gone around one day to bring him some food."
  • Definite Noun Phrases: Refer to entities that are known or identifiable, such as "The Times of India" in "I used to read The Times of India."
  • Pronouns: A form of definite reference, e.g., "he" in "Ram laughed as loud as he could."
  • Demonstratives: Words like "this" and "that," which behave somewhat differently from simple pronouns; for example, they can stand alone or act as determiners, as in "this book."
  • Names: The simplest type, like "Ram" in the examples.
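The categories above can be approximated with a few surface rules. The Python sketch below is only a toy heuristic: the word lists are assumptions, it looks at nothing but the first token, and expressions it cannot place (such as the possessive phrase "his friend Shyam") are returned as "unknown".

```python
PRONOUNS = {"i", "me", "you", "he", "him", "his", "she", "her", "hers",
            "it", "its", "we", "us", "they", "them", "their"}
DEMONSTRATIVES = {"this", "that", "these", "those"}
INDEFINITE_DETERMINERS = {"a", "an", "some"}

def classify_referring_expression(expression):
    """Guess the type of a non-empty referring expression from its first token."""
    tokens = expression.split()
    first = tokens[0].lower()
    if len(tokens) == 1 and first in PRONOUNS:
        return "pronoun"
    if first in DEMONSTRATIVES:
        return "demonstrative"
    if first in INDEFINITE_DETERMINERS:
        return "indefinite noun phrase"
    if first == "the":
        return "definite noun phrase"
    if tokens[0][0].isupper():
        return "name"
    return "unknown"

for expression in ["some food", "The Times of India", "he", "that", "Ram"]:
    print(expression, "->", classify_referring_expression(expression))
```

The examples from the list above come out as expected: "some food" is classified as an indefinite noun phrase, "The Times of India" as a definite noun phrase, "he" as a pronoun, "that" as a demonstrative, and "Ram" as a name.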

Reference Resolution Tasks

Coreference Resolution

Coreference resolution involves finding the referring expressions that point to the same entity; together, these expressions form a coreference chain. For example, in the passage about Ram above, "Ram," "the manager of ABC bank," "his," and "He" corefer and form one chain.
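Once pairwise coreference decisions are available, chains can be assembled by grouping every expression that is connected to another through some sequence of links. The Python sketch below does this with a small union-find structure. The pairwise links are supplied by hand as an assumption; deciding them automatically is the hard part of the task and is not shown.

```python
def build_chains(mentions, links):
    """Group mentions into coreference chains.

    mentions: referring expressions in order of appearance.
    links: pairs of mention indices judged to corefer.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the earliest mention as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    chains = {}
    for index, mention in enumerate(mentions):
        chains.setdefault(find(index), []).append(mention)
    return list(chains.values())

mentions = ["Ram", "the manager of ABC bank", "his", "Shyam", "He", "him"]
links = [(0, 1), (0, 2), (2, 4), (3, 5)]  # hand-labeled for illustration
print(build_chains(mentions, links))
# [['Ram', 'the manager of ABC bank', 'his', 'He'], ['Shyam', 'him']]
```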

Constraint on Coreference Resolution

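In English, the pronoun "it" poses a particular challenge for coreference resolution because it does not always refer to a specific entity. In "It’s raining," for example, "it" is pleonastic and refers to nothing at all, while in "It is really good," the referent cannot be identified without more context.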
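A common first step is to filter out uses of "it" that refer to nothing before attempting resolution. The Python sketch below does this with two regular expressions. The patterns are assumptions chosen for illustration and are far from complete; the second one in particular also matches some referential uses, such as "It is close to the shop."

```python
import re

# Hypothetical patterns for non-referential "it" (not an exhaustive list).
PLEONASTIC_PATTERNS = [
    # Weather verbs: "It's raining", "It was snowing"
    re.compile(r"\bit(?:'s|’s| is| was)\s+(?:raining|snowing|hailing)\b",
               re.IGNORECASE),
    # "It is <word> that/to ...": "It is likely that Ram will come"
    re.compile(r"\bit(?:'s|’s| is| was)\s+\w+\s+(?:that|to)\b",
               re.IGNORECASE),
]

def is_pleonastic(sentence):
    """True if the sentence matches a pattern for non-referential 'it'."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)

print(is_pleonastic("It’s raining"))                      # True
print(is_pleonastic("It is likely that Ram will come."))  # True
print(is_pleonastic("Ram bought a car. It is red."))      # False
```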

Pronominal Anaphora Resolution

Pronominal anaphora resolution focuses on finding the antecedent of a single pronoun. For example, given the pronoun "his" in "Ram, the manager of ABC bank, saw his friend Shyam at a shop," the task is to identify "Ram" as its antecedent.
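A very simple baseline for this task picks the most recently mentioned candidate that agrees with the pronoun in gender and number. The Python sketch below implements that baseline. The feature table and the hand-annotated candidates are assumptions for illustration, and the heuristic ignores syntax and meaning, so it should be read as a starting point and not as a complete resolution algorithm.

```python
# Gender and number for a few English pronouns (illustrative, not complete).
# A gender of None means the pronoun is compatible with any gender.
PRONOUN_FEATURES = {
    "he": ("masc", "sg"), "him": ("masc", "sg"), "his": ("masc", "sg"),
    "she": ("fem", "sg"), "her": ("fem", "sg"),
    "it": ("neut", "sg"), "its": ("neut", "sg"),
    "they": (None, "pl"), "them": (None, "pl"), "their": (None, "pl"),
}

def resolve_pronoun(pronoun, candidates):
    """Return the most recent candidate that agrees with the pronoun.

    candidates: (text, gender, number) tuples for the expressions that
    precede the pronoun, in order of mention. Returns None if none agrees.
    """
    gender, number = PRONOUN_FEATURES[pronoun.lower()]
    for text, cand_gender, cand_number in reversed(candidates):
        if cand_number == number and (gender is None or cand_gender == gender):
            return text
    return None

# "Ram picked up the book. He gave it to Shyam."
candidates = [("Ram", "masc", "sg"), ("the book", "neut", "sg")]
print(resolve_pronoun("He", candidates))    # Ram
print(resolve_pronoun("it", candidates))    # the book
print(resolve_pronoun("they", candidates))  # None
```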
