Natural Language Processing (NLP) - Inception

Learn about the inception of natural language processing, natural language grammar, components of language, grammatical categories, and spoken language syntax.



In this chapter, we'll explore the inception of Natural Language Processing (NLP). First, let's understand the basics of Natural Language Grammar.

Understanding Natural Language Grammar

In linguistics, language is considered a collection of arbitrary vocal signs. It is creative, rule-governed, innate, universal, and distinctly human. However, people differ in what they take language to be, and misconceptions about it are common, so it is important to be precise about the term ‘grammar’. In linguistics, grammar refers to the rules or principles by which a language operates. Broadly, grammar can be categorized into two types:

1. Descriptive Grammar

Descriptive grammar is the set of rules that linguists and grammarians formulate to describe how speakers actually use a language.

2. Prescriptive Grammar

Prescriptive grammar sets the standards of correctness in a language, often diverging from how the language is actually used in everyday communication.

Key Components of Language

Language study is divided into several interrelated components, each focusing on a specific aspect of linguistic investigation:

Phonology

Phonology is the study of how speech sounds are organized in a language. The term derives from the Greek word ‘phone’ (meaning sound or voice). Phonology is closely related to phonetics, which examines speech sounds in terms of their production, perception, and physical properties. The International Phonetic Alphabet (IPA) provides a consistent notation for speech sounds, with each symbol corresponding to a distinct sound.

Phonemes

Phonemes are the basic sound units that distinguish one word from another in a language. For example, the phoneme /k/ appears in words like "cat" and "kit," and replacing it with /b/ turns "cat" into a different word, "bat."
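The idea that a single phoneme can distinguish two words can be sketched in a few lines of Python. This is only an illustration: the function name is_minimal_pair and the simplified transcriptions below are assumptions made for this example, not part of any standard library or phonetic resource.

```python
def is_minimal_pair(a, b):
    """Return True if two phoneme sequences differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


# Hypothetical, simplified transcriptions: one string per phoneme.
CAT = ["k", "æ", "t"]
BAT = ["b", "æ", "t"]
KIT = ["k", "ɪ", "t"]

print(is_minimal_pair(CAT, BAT))  # True: only the first phoneme differs
print(is_minimal_pair(CAT, KIT))  # True: only the vowel differs
print(is_minimal_pair(BAT, KIT))  # False: two phonemes differ
```

Word pairs that differ in exactly one phoneme are commonly called minimal pairs, and they are a standard way of showing that two sounds are separate phonemes in a language.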

Morphology

Morphology is the study of the structure and classification of words. The term derives from the Greek word ‘morphe’ (meaning form). Morphology examines how words are built from prefixes, suffixes, and roots, and how words are grouped into parts of speech.
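As a rough illustration of breaking a word into prefix, root, and suffix, the sketch below strips affixes from short hand-written lists. The lists PREFIXES and SUFFIXES and the function split_affixes are assumptions for this example; the approach is deliberately naive and will mis-split many real words (for instance, it treats the "re" in "reading" as a prefix).

```python
# Hypothetical, tiny affix inventories chosen only for this example.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ness", "ing", "ed", "s")


def split_affixes(word):
    """Naively split a word into (prefix, root, suffix)."""
    # Accept a prefix only if at least three letters remain after it.
    prefix = next(
        (p for p in PREFIXES if word.startswith(p) and len(word) > len(p) + 2), ""
    )
    rest = word[len(prefix):]
    # Accept a suffix only if at least three letters remain before it.
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s) + 2), ""
    )
    root = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, root, suffix


print(split_affixes("unkindness"))  # ('un', 'kind', 'ness')
print(split_affixes("replayed"))    # ('re', 'play', 'ed')
print(split_affixes("cats"))        # ('', 'cat', 's')
```

Practical morphological analyzers rely on lexicons and spelling rules rather than plain string matching, so this should be read only as a picture of the prefix, root, and suffix decomposition.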

Lexeme

In linguistics, a lexeme is an abstract unit representing a set of forms of a single word. Lexemes can be single words or multiword expressions, such as "run" (with forms like runs, ran, running) or "give up" (a multiword lexeme).
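One way to picture a lexeme is as a key that groups all of its inflected forms. The sketch below uses a small hand-built table; the names LEXEMES, FORM_TO_LEXEME, and lexeme_of are hypothetical, and the table covers only the two lexemes mentioned above.

```python
# Hypothetical hand-built table: each lexeme maps to its inflected forms.
LEXEMES = {
    "run": {"run", "runs", "ran", "running"},
    "give up": {"give up", "gives up", "gave up", "given up", "giving up"},
}

# Invert the table so that any surface form can be mapped back to its lexeme.
FORM_TO_LEXEME = {
    form: lexeme for lexeme, forms in LEXEMES.items() for form in forms
}


def lexeme_of(form):
    """Return the lexeme for a surface form, or None if it is not in the table."""
    return FORM_TO_LEXEME.get(form.lower())


print(lexeme_of("Ran"))      # run
print(lexeme_of("gave up"))  # give up
print(lexeme_of("walk"))     # None
```

Mapping surface forms back to a shared lexeme in this way is the basic idea behind lemmatization in NLP pipelines, although real systems do not rely on exhaustive tables like this one.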

Syntax

Syntax studies the order and arrangement of words into larger structures like sentences, clauses, and phrases. The term originates from the Greek word ‘suntassein,’ meaning ‘to arrange.’

Semantics

Semantics is the study of meaning in language, examining how meanings relate to both the external world and the grammar of sentences. It derives from the Greek word ‘semainein,’ which means ‘to signify.’

Pragmatics

Pragmatics focuses on the use of language in context, studying the functions of language in different situations. The term comes from the Greek word ‘pragma,’ meaning ‘deed’ or ‘affair.’

Grammatical Categories in Language

Grammatical categories are classes of linguistic units or features that share common characteristics. They form the foundational elements of language:

Number

This category includes singular (one) and plural (more than one) forms, such as "cat/cats" and "this/these."

Gender

In English, gender is expressed mainly through third-person singular pronouns such as "he," "she," and "it." First- and second-person forms ("I," "we," "you") are not marked for gender.

Person

Person identifies the speaker (1st person), the listener (2nd person), and others being spoken about (3rd person).

Case

Case indicates the function of noun phrases in sentences, such as subject (nominative), possession (genitive), or object (objective).

Degree

Degree relates to adjectives and adverbs, describing qualities (positive), comparisons (comparative), and extremes (superlative).

Definiteness and Indefiniteness

Definiteness marks known or familiar referents with "the," while indefiniteness marks unknown or unfamiliar referents with "a/an."

Tense

Tense indicates the time of an action relative to the moment of speaking. It includes present (e.g., "She writes"), past (e.g., "She wrote"), and future (e.g., "She will write") tenses.

Aspect

Aspect describes whether an event is viewed as complete (perfective) or ongoing (imperfective). In English, this contrast is usually illustrated with the perfect "She has written" versus the progressive "She is writing."

Mood

Mood conveys the speaker's attitude or intent, such as indicative (statements), imperative (commands), or subjunctive (wishes or hypotheticals).

Agreement

Also known as concord, agreement adjusts words based on their relationship to others, aligning in person, number, gender, or case.
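Subject-verb agreement in the English present tense gives a compact example: a regular verb takes -s only when the subject is third person singular. The sketch below checks this rule for a handful of pronouns. The table PRONOUN_FEATURES and the function agrees are assumptions for this illustration, and the check ignores irregular verbs such as "be" and base forms that already end in "s."

```python
# Hypothetical feature table: (person, number) for a few subject pronouns.
# "you" can also be plural; that does not affect the -s rule checked here.
PRONOUN_FEATURES = {
    "i": (1, "sg"),
    "we": (1, "pl"),
    "you": (2, "sg"),
    "he": (3, "sg"),
    "she": (3, "sg"),
    "it": (3, "sg"),
    "they": (3, "pl"),
}


def agrees(subject, verb):
    """Check the regular present-tense rule: -s only with 3rd person singular."""
    person, number = PRONOUN_FEATURES[subject.lower()]
    needs_s = person == 3 and number == "sg"
    return verb.endswith("s") == needs_s


print(agrees("She", "writes"))   # True
print(agrees("They", "writes"))  # False
print(agrees("I", "write"))      # True
```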

Spoken Language Syntax

Spoken and written English share many grammatical features, but they also differ in notable ways:

Disfluencies and Repair

Spoken language often includes disfluencies like filler words ("uh," "um") and repairs where speakers correct themselves mid-sentence.
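When transcribed speech is processed, filler words are often filtered out before further analysis. The following is a minimal sketch of that step; the filler set FILLERS and the function remove_fillers are assumptions for this example, and real transcripts usually need a larger, corpus-specific list.

```python
import string

# Assumed set of filler words; extend it for a particular corpus.
FILLERS = {"uh", "um", "er"}


def remove_fillers(utterance):
    """Drop whitespace-separated tokens that are filler words."""
    kept = []
    for token in utterance.split():
        # Compare without surrounding punctuation and ignoring case.
        if token.strip(string.punctuation).lower() not in FILLERS:
            kept.append(token)
    return " ".join(kept)


print(remove_fillers("I want uh a flight to um Boston"))
# I want a flight to Boston
```

Punctuation attached to a filler (as in "uh,") is removed together with it, while punctuation attached to other words is left untouched.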

Restarts

Restarts occur when a speaker breaks off, often with a pause or a filler word, and begins the phrase again. For example, a speaker may say "uh, one-way flights" and then restart with "one-way fares."

Word Fragments

Spoken language may include word fragments, such as "w-what time is it?" where the speaker starts to say "what" but stutters.
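Fragments written in this "w-what" style can be spotted with a simple pattern: a short piece, a hyphen, and then a word that begins with the same letters. The sketch below assumes that transcription convention; the names FRAGMENT, find_fragments, and drop_fragments are hypothetical. It is a heuristic and would wrongly flag ordinary hyphenated words such as "re-record."

```python
import re

# A word piece followed by a hyphen and a word starting with the same letters.
FRAGMENT = re.compile(r"\b(\w+)-(?=\1)")


def find_fragments(text):
    """Return the fragment pieces found in the text."""
    return [m.group(1) for m in FRAGMENT.finditer(text)]


def drop_fragments(text):
    """Remove each fragment and its hyphen, keeping the completed word."""
    return FRAGMENT.sub("", text)


print(find_fragments("w-what time is it?"))  # ['w']
print(drop_fragments("w-what time is it?"))  # what time is it?
print(find_fragments("one-way fares"))       # []
```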
