Natural Language Processing - Introduction

Introduction to Natural Language Processing (NLP), its history, phases, and key concepts explained in a simple and detailed manner.




Understanding Natural Language Processing (NLP)

Language is a powerful method of communication that allows us to speak, read, and write. We also think, make decisions, and plan in natural language, that is, in words and sentences. In the current era of Artificial Intelligence (AI), a big question arises: can human beings communicate with computers in natural language, the same way we communicate with each other?

This is a significant challenge because computers require structured data to process information, while human language is often unstructured and can be ambiguous. To bridge this gap, Natural Language Processing (NLP) has emerged as a specialized field within computer science, particularly in AI, that focuses on enabling computers to understand and process human language. In essence, NLP involves programming computers to analyze and process large volumes of natural language data.

History of Natural Language Processing (NLP)

The history of NLP can be divided into four distinct phases, each with its own focus and developments:

First Phase (Machine Translation Phase) – Late 1940s to Late 1960s

This phase primarily focused on machine translation (MT). It was a period filled with enthusiasm and optimism. Here are some key events:

  • In the early 1950s, research in NLP began following Booth & Richens’ investigation and Weaver’s memorandum on machine translation in 1949.
  • In 1954, a limited experiment on automatic translation from Russian to English was demonstrated in the Georgetown-IBM experiment.
  • The journal MT (Mechanical Translation) was first published in 1954.
  • The first international conference on Machine Translation was held in 1952, followed by the second in 1956.
  • In 1961, the work presented at the Teddington International Conference on Machine Translation of Languages and Applied Language Analysis marked the high point of this phase.

Second Phase (AI Influenced Phase) – Late 1960s to Late 1970s

This phase, also known as the AI-flavored phase, focused on understanding world knowledge and its role in constructing and manipulating meaning representations:

  • As early as 1961, before the phase proper, AI-influenced research began on addressing and constructing data or knowledge bases.
  • In the same year, the BASEBALL question-answering system was developed. It accepted only restricted input and performed simple language processing.
  • A more advanced system described by Minsky in 1968 recognized the need for inference on the knowledge base to interpret and respond to language input.

Third Phase (Grammatico-logical Phase) – Late 1970s to Late 1980s

Known as the grammatico-logical phase, this period saw a shift towards using logic for knowledge representation and reasoning in AI due to the challenges faced in building practical systems in the previous phase:

  • The grammatico-logical approach led to the development of powerful general-purpose sentence processors, such as SRI’s Core Language Engine, and to Discourse Representation Theory, which offered a means of tackling more extended discourse.
  • Practical resources and tools, like parsers (e.g., Alvey Natural Language Tools), and more operational and commercial systems (e.g., for database queries) emerged during this phase.
  • Research on the lexicon in the 1980s also pointed towards the grammatico-logical approach.

Fourth Phase (Lexical & Corpus Phase) – The 1990s

This phase is characterized by a lexicalized approach to grammar that began in the late 1980s and became increasingly influential:

  • This period saw a revolution in NLP with the introduction of machine learning algorithms for language processing, which greatly enhanced the capability of NLP systems.

The Study of Human Languages

Language is a crucial component of human life and one of the most fundamental aspects of our behavior. It can be experienced in two main forms:

  • Written Form: This form of language allows us to pass knowledge from one generation to the next.
  • Spoken Form: This is the primary medium through which human beings communicate and coordinate with each other in their daily lives.

Language is studied across various academic disciplines, each with its own set of problems and solutions. The summary below lists these disciplines, the problems they address, and the tools they use:

Linguists
  Problems:
  • How are phrases and sentences formed with words?
  • What limits the possible meaning of a sentence?
  Tools:
  • Intuitions about well-formedness and meaning
  • Mathematical models of structure (e.g., model-theoretic semantics, formal language theory)

Psycholinguists
  Problems:
  • How do human beings identify the structure of sentences?
  • How is the meaning of words identified?
  • When does understanding take place?
  Tools:
  • Experimental techniques for measuring human performance
  • Statistical analysis of observations

Philosophers
  Problems:
  • How do words and sentences acquire meaning?
  • How are objects identified by words?
  • What is the nature of meaning?
  Tools:
  • Natural language argumentation using intuition
  • Mathematical models like logic and model theory

Computational Linguists
  Problems:
  • How can the structure of a sentence be identified?
  • How can knowledge and reasoning be modeled?
  • How can we use language to accomplish specific tasks?
  Tools:
  • Algorithms
  • Data structures
  • Formal models of representation and reasoning
  • AI techniques like search and representation methods


Understanding Ambiguity and Uncertainty in Language

In natural language processing, ambiguity is the capacity of a word, phrase, or sentence to be understood in more than one way. Natural language is highly ambiguous, and NLP deals with several types of ambiguity:

Lexical Ambiguity

This occurs when a single word can have different meanings. For example, the word "bat" can refer to a flying mammal or a piece of equipment used in sports.
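One simple way to choose between senses is to compare the words surrounding the ambiguous word with a short description of each sense and pick the sense with the largest overlap. The sketch below is a simplified, Lesk-style overlap count. The two glosses for "bat", the stopword list, and the function names are made up for illustration and are not taken from a real dictionary or library.

```python
# Hypothetical mini sense inventory for "bat"; the glosses are illustrative only.
SENSES = {
    "animal": "small flying mammal that is active at night and lives in caves",
    "sports": "wooden club used to hit the ball in cricket or baseball",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "in", "at", "of", "to", "and", "or",
             "is", "that", "with", "out"}


def content_words(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def guess_sense(sentence, senses=SENSES):
    """Return the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    overlap = {name: len(context & content_words(gloss))
               for name, gloss in senses.items()}
    # On a tie, max() returns the first sense listed in the inventory.
    return max(overlap, key=overlap.get)


print(guess_sense("The bat flew out of the caves at night"))  # animal
print(guess_sense("He hit the ball with his bat"))            # sports
```

With glosses this short the method is brittle: a sentence that shares no words with either gloss simply falls back to the first sense listed.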

Syntactic Ambiguity

This type of ambiguity happens when a sentence can be parsed in different ways. For instance, "The man saw the girl with the telescope" can mean either the man saw the girl who had a telescope or he saw her through his telescope.
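The two readings correspond to two different parse trees: the phrase "with the telescope" can attach either to "the girl" or to the act of seeing. A minimal sketch of this, assuming a toy grammar written for this one sentence (the rules, the lexicon, and the function name are illustrative), counts the distinct trees with a CKY-style chart:

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form; an assumption made for this sketch.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # "the girl [with the telescope]"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "[saw the girl] with the telescope"
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": "Det", "man": "N", "girl": "N", "telescope": "N",
    "saw": "V", "with": "P",
}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees of the sentence under the toy grammar."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][symbol] = number of distinct trees covering words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        if word in LEXICON:
            chart[(i, i + 1)][LEXICON[word]] = 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point inside the span
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += (
                        chart[(i, k)][left] * chart[(k, j)][right]
                    )
    return chart[(0, n)][start]


print(count_parses("The man saw the girl with the telescope"))  # 2
print(count_parses("The man saw the girl"))                     # 1
```

A count above one means the grammar alone cannot decide between the readings; some further source of information, such as context, has to choose.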

Semantic Ambiguity

Semantic ambiguity arises when the meaning of the words themselves can be misinterpreted. For example, the sentence "The car hit the pole while it was moving" can mean either the car was moving or the pole was moving.

Anaphoric Ambiguity

This occurs due to the use of anaphoric references (e.g., pronouns) in discourse. For example, in "The horse ran up the hill. It was very steep. It soon got tired," the pronoun "it" could refer to either the hill or the horse, leading to ambiguity.

Pragmatic Ambiguity

Pragmatic ambiguity occurs when the context of a phrase gives it multiple interpretations. For example, the sentence "I like you too" could mean "I like you, just as you like me" or "I like you, just as I like someone else."

NLP Phases

The logical steps, or phases, in natural language processing are described below, in the order in which they are applied:

Morphological Processing

This is the first phase of NLP. Its goal is to break the language input into tokens corresponding to paragraphs, sentences, and words, and where useful to split words into smaller meaningful units. For instance, the word "uneasy" can be split into the two sub-word tokens "un" and "easy".
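A rough sketch of this phase, using only regular expressions from the Python standard library, is shown here. The splitting rules are deliberately naive, and the prefix list and vocabulary are small illustrative assumptions; a real system would rely on a full lexicon and more careful sentence-boundary rules.

```python
import re

# Illustrative prefix list and vocabulary; both are assumptions for this sketch.
PREFIXES = ("un", "re", "dis")
VOCABULARY = {"easy", "happy", "play", "like"}


def split_paragraphs(text):
    """Paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def split_words(sentence):
    """Keep alphabetic words, allowing an internal apostrophe (e.g. "don't")."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)


def split_prefix(word):
    """Split off a known prefix only if the remainder is a known word."""
    lower = word.lower()
    for prefix in PREFIXES:
        stem = lower[len(prefix):]
        if lower.startswith(prefix) and stem in VOCABULARY:
            return [prefix, stem]
    return [lower]


text = "I felt uneasy. Was it late?\n\nYes!"
for paragraph in split_paragraphs(text):
    for sentence in split_sentences(paragraph):
        print([split_prefix(w) for w in split_words(sentence)])
```

Checking the remainder against a vocabulary keeps a word like "under" from being split into "un" and "der".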

Syntax Analysis

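The second phase involves checking whether a sentence is well-formed and breaking it into a structure that shows the syntactic relationships between its words. For example, a scrambled sequence such as "School the goes boy to the" would be rejected by a syntax analyzer or parser. By contrast, "The school goes to the boy" is grammatically well-formed; its oddity lies in its meaning, which is the concern of the next phase.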

Semantic Analysis

In the third phase, the exact meaning (or dictionary meaning) is extracted from the text, and the text is checked for meaningfulness. For example, a semantic analyzer would reject a phrase like "hot ice cream" because the meanings of its parts contradict each other.
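One way to picture such a check is as a compatibility test between the properties a noun carries and the properties an adjective rules out. The following is only a toy sketch: the feature lexicon and the function name are hypothetical, and real semantic analyzers use far richer meaning representations.

```python
# Hypothetical feature lexicon (illustrative only).
# Each noun lists inherent properties; each adjective lists the
# properties it cannot sensibly combine with.
NOUN_PROPERTIES = {
    "ice cream": {"cold"},
    "coffee": set(),
    "lava": {"hot"},
}
ADJECTIVE_CONFLICTS = {
    "hot": {"cold"},
    "cold": {"hot"},
}


def is_meaningful(adjective, noun):
    """Accept the phrase unless the adjective conflicts with a noun property."""
    conflicts = ADJECTIVE_CONFLICTS.get(adjective.lower(), set())
    properties = NOUN_PROPERTIES.get(noun.lower(), set())
    return not (conflicts & properties)


print(is_meaningful("hot", "ice cream"))   # False
print(is_meaningful("hot", "coffee"))      # True
print(is_meaningful("cold", "ice cream"))  # True
```

Words missing from the lexicon are accepted by default, so this check can only reject combinations it explicitly knows to be contradictory.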

Pragmatic Analysis

The fourth phase involves fitting the actual objects or events in a given context with the object references obtained during semantic analysis. For example, the sentence "Put the banana in the basket on the shelf" can have two interpretations, and the pragmatic analyzer will choose the most contextually appropriate one.
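The two interpretations are "take the banana that is in the basket and put it on the shelf" and "put the banana into the basket that is on the shelf". A minimal sketch of how context could decide between them, assuming a hypothetical world model that simply maps each object to its current location, might look like this:

```python
def choose_reading(world):
    """Pick a reading of "Put the banana in the basket on the shelf".

    `world` is a hypothetical context model: a dict mapping each object
    to the place where it currently is.
    """
    banana_in_basket = world.get("banana") == "basket"
    basket_on_shelf = world.get("basket") == "shelf"
    if banana_in_basket and not basket_on_shelf:
        # Only "the banana in the basket" matches the scene.
        return "move banana onto shelf"
    if basket_on_shelf and not banana_in_basket:
        # Only "the basket on the shelf" matches the scene.
        return "move banana into basket"
    # Both or neither description fits, so context does not settle it.
    return "unclear"


print(choose_reading({"banana": "basket", "basket": "floor"}))  # move banana onto shelf
print(choose_reading({"banana": "table", "basket": "shelf"}))   # move banana into basket
print(choose_reading({"banana": "basket", "basket": "shelf"}))  # unclear
```

The returned strings are placeholders for whatever action representation a real system would use.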