Natural Language Processing (NLP) - Inception
Learn about the inception of natural language processing, natural language grammar, components of language, grammatical categories, and spoken language syntax.
In this chapter, we'll explore the inception of Natural Language Processing (NLP). First, let's understand the basics of Natural Language Grammar.
Understanding Natural Language Grammar
In linguistics, language is defined as a set of arbitrary vocal signs. It is creative, rule-governed, innate, universal, and distinctly human. However, people understand the nature of language in different ways, and this has led to many misconceptions about it. That is why it is important to pin down what the often ambiguous term ‘grammar’ means. In linguistics, grammar refers to the rules or principles by which a language operates. Broadly, grammar can be categorized into two types:
1. Descriptive Grammar
Descriptive grammar is the set of rules that linguists and grammarians formulate to describe how speakers actually use their language.
2. Prescriptive Grammar
Prescriptive grammar sets the standards of correctness in a language, often diverging from how the language is actually used in everyday communication.
Key Components of Language
Language study is divided into several interrelated components, each focusing on a specific aspect of linguistic investigation:
Phonology
Phonology is the study of how speech sounds are organized and patterned in a language. The term comes from the Greek word ‘phone’ (meaning sound or voice). Phonology is closely tied to phonetics, which examines speech sounds in terms of their production, perception, and physical properties. The International Phonetic Alphabet (IPA) represents human speech sounds consistently, with each symbol corresponding to a single distinct sound.
Phonemes
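Phonemes are the basic sound units that distinguish one word from another in a language. For example, the phoneme /k/ appears in "cat" and "kit"; replacing it with /b/ produces the different words "bat" and "bit," which shows that /k/ and /b/ are separate phonemes in English.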
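A common way to test whether two sounds are separate phonemes is to look for minimal pairs: words whose transcriptions have the same length and differ in exactly one sound. The sketch below assumes a tiny, hand-written lexicon of simplified phonemic transcriptions; `LEXICON` and `find_minimal_pairs` are hypothetical names used only for illustration.

```python
from itertools import combinations

# Hypothetical mini-lexicon: each word maps to a simplified phonemic
# transcription stored as a tuple of phoneme symbols.
LEXICON = {
    "cat": ("k", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "kit": ("k", "ɪ", "t"),
    "bit": ("b", "ɪ", "t"),
    "cast": ("k", "æ", "s", "t"),
}


def find_minimal_pairs(lexicon):
    """Return (word1, word2, phoneme1, phoneme2) for every pair of words
    whose transcriptions have equal length and differ in exactly one slot."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue  # a minimal pair needs the same number of phonemes
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, *diffs[0]))
    return pairs


for w1, w2, a, b in find_minimal_pairs(LEXICON):
    print(f"{w1} / {w2}: /{a}/ vs /{b}/")
```

With this lexicon the sketch reports pairs such as "bat / cat" (/b/ vs /k/) and "cat / kit" (/æ/ vs /ɪ/), while "cast" never pairs with anything because it has four phonemes.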
Morphology
Morphology is the study of the structure and classification of words. From the Greek word ‘morphe’ (meaning form), it examines how words are formed, including prefixes, suffixes, and roots, and how words are grouped into parts of speech.
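To make the idea of prefixes, suffixes, and roots concrete, here is a deliberately naive segmenter that strips at most one known prefix and one known suffix from a word. It is a sketch under stated assumptions: the affix lists, the `segment` function, and its `min_root` parameter are hypothetical, and real morphological analyzers rely on a lexicon of valid roots rather than string matching alone.

```python
# Hypothetical, deliberately tiny affix inventories for illustration.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ing", "ed", "ly", "s"]


def segment(word, min_root=3):
    """Strip at most one known prefix and one known suffix, keeping at
    least `min_root` letters for the root. Returns (prefix, root, suffix)."""
    prefix = suffix = ""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_root:
            prefix, word = p, word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):  # longest suffix first
        if word.endswith(s) and len(word) - len(s) >= min_root:
            suffix, word = s, word[: -len(s)]
            break
    return prefix, word, suffix


print(segment("unkindness"))  # ('un', 'kind', 'ness')
print(segment("replaying"))   # ('re', 'play', 'ing')
print(segment("cats"))        # ('', 'cat', 's')
```

The weakness of pure string matching shows up quickly: `segment("reading")` returns `('re', 'ading', '')`, because nothing tells the code that "read" is a root rather than "re" plus something.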
Lexeme
In linguistics, a lexeme is an abstract unit representing a set of forms of a single word. Lexemes can be single words or multiword expressions, such as "run" (with forms like runs, ran, running) or "give up" (a multiword lexeme).
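In NLP, grouping word forms under their lexeme is closely related to lemmatization. The sketch below assumes a small hypothetical table (`LEXEMES`) that lists the forms of "run" and "give up", and maps each token of a sentence back to its lexeme, preferring the longest match so that a multiword lexeme stays together. The names `LEXEMES` and `to_lexemes` are illustrative, not from any library.

```python
# Hypothetical lexicon: each lexeme lists the forms that belong to it.
LEXEMES = {
    "run": ["run", "runs", "ran", "running"],
    "give up": ["give up", "gives up", "gave up", "given up", "giving up"],
}

# Invert the table: every form (as a tuple of tokens) points to its lexeme.
FORM_TO_LEXEME = {
    tuple(form.split()): lexeme
    for lexeme, forms in LEXEMES.items()
    for form in forms
}
MAX_LEN = max(len(form) for form in FORM_TO_LEXEME)


def to_lexemes(sentence):
    """Map tokens to lexemes, trying the longest known form first so that
    multiword lexemes such as "gave up" are matched as one unit."""
    tokens = sentence.lower().split()
    result, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i : i + n])
            if key in FORM_TO_LEXEME:
                result.append(FORM_TO_LEXEME[key])
                i += n
                break
        else:
            result.append(tokens[i])  # unknown token: keep it as-is
            i += 1
    return result


print(to_lexemes("She ran but then gave up"))
```

This prints `['she', 'run', 'but', 'then', 'give up']`. Contiguous matching like this misses separated uses such as "gave it up", which need syntactic analysis to recognize.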
Syntax
Syntax studies the order and arrangement of words into larger structures like sentences, clauses, and phrases. The term originates from the Greek word ‘suntassein,’ meaning ‘to arrange.’
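Because syntax is about arrangement, the same words can form a sentence in one order and not in another. A minimal sketch of this uses the CYK recognition algorithm with a hypothetical toy grammar in Chomsky normal form (S -> NP VP, NP -> Det N, VP -> V NP) and a six-word lexicon; the grammar, `WORD_CATEGORIES`, `RULES`, and `is_sentence` are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to its possible categories.
WORD_CATEGORIES = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "chased": {"V"}, "saw": {"V"},
}
# Binary rules: (left child, right child) -> parent categories.
RULES = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}


def is_sentence(words):
    """CYK recognizer: table[i][j] holds the categories spanning words[i..j]."""
    n = len(words)
    if n == 0:
        return False
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(WORD_CATEGORIES.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split between words[k] and words[k + 1]
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= RULES.get((left, right), set())
    return "S" in table[0][n - 1]


print(is_sentence("the dog chased a cat".split()))  # True
print(is_sentence("dog the chased cat a".split()))  # False
```

Both inputs contain exactly the same words; only the order differs, and only the first order can be built from the grammar's rules.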
Semantics
Semantics is the study of meaning in language, examining how meanings relate to both the external world and the grammar of sentences. It derives from the Greek word ‘semainein,’ which means ‘to signify.’
Pragmatics
Pragmatics focuses on the use of language in context, studying the functions of language in different situations. The term comes from the Greek word ‘pragma,’ meaning ‘deed’ or ‘affair.’
Grammatical Categories in Language
Grammatical categories are classes of linguistic units or features whose members share common properties. The most important ones are:
Number
This category includes singular (one) and plural (more than one) forms, such as "cat/cats" and "this/these."
Gender
In English, gender is expressed mainly through third-person singular pronouns: "he" (masculine), "she" (feminine), and "it" (neuter). First- and second-person forms such as "I," "we," and "you" do not mark gender.
Person
Person identifies the speaker (1st person), the listener (2nd person), and others being spoken about (3rd person).
Case
Case indicates the function of noun phrases in sentences, such as subject (nominative), possession (genitive), or object (objective).
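English personal pronouns are a convenient place to see number, gender, person, and case working together, since a single form often carries all four. The sketch below encodes the pronoun paradigm as a feature table; the `Pronoun` record and `analyze` function are hypothetical names, and only the determiner-style possessives (my, your, ...) are included, not forms like "mine" or "yours".

```python
from typing import NamedTuple, Optional


class Pronoun(NamedTuple):
    form: str
    person: int            # 1 = speaker, 2 = listener, 3 = others
    number: str            # "sg", "pl", or "sg/pl" for "you"
    gender: Optional[str]  # marked only in the 3rd-person singular
    case: str              # "nominative", "objective", or "genitive"


PRONOUNS = [
    Pronoun("I", 1, "sg", None, "nominative"),
    Pronoun("me", 1, "sg", None, "objective"),
    Pronoun("my", 1, "sg", None, "genitive"),
    Pronoun("we", 1, "pl", None, "nominative"),
    Pronoun("us", 1, "pl", None, "objective"),
    Pronoun("our", 1, "pl", None, "genitive"),
    Pronoun("you", 2, "sg/pl", None, "nominative"),
    Pronoun("you", 2, "sg/pl", None, "objective"),
    Pronoun("your", 2, "sg/pl", None, "genitive"),
    Pronoun("he", 3, "sg", "masculine", "nominative"),
    Pronoun("him", 3, "sg", "masculine", "objective"),
    Pronoun("his", 3, "sg", "masculine", "genitive"),
    Pronoun("she", 3, "sg", "feminine", "nominative"),
    Pronoun("her", 3, "sg", "feminine", "objective"),
    Pronoun("her", 3, "sg", "feminine", "genitive"),
    Pronoun("it", 3, "sg", "neuter", "nominative"),
    Pronoun("it", 3, "sg", "neuter", "objective"),
    Pronoun("its", 3, "sg", "neuter", "genitive"),
    Pronoun("they", 3, "pl", None, "nominative"),
    Pronoun("them", 3, "pl", None, "objective"),
    Pronoun("their", 3, "pl", None, "genitive"),
]


def analyze(form):
    """Return every analysis of a pronoun form; some forms are ambiguous."""
    return [p for p in PRONOUNS if p.form.lower() == form.lower()]


for p in analyze("her"):
    print(p.person, p.number, p.gender, p.case)
```

Returning a list rather than a single record is a deliberate choice: forms like "her" (objective or genitive) and "you" (singular or plural) are genuinely ambiguous out of context.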
Degree
Degree relates to adjectives and adverbs, describing qualities (positive), comparisons (comparative), and extremes (superlative).
Definiteness and Indefiniteness
Definiteness marks known or familiar referents with "the," while indefiniteness marks unknown or unfamiliar referents with "a/an."
Tense
Tense indicates the time of an action relative to the moment of speaking. It includes present (e.g., "She writes"), past (e.g., "She wrote"), and future (e.g., "She will write") tenses.
Aspect
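Aspect describes how an event unfolds in time, independently of tense: whether it is viewed as complete (perfective) or as ongoing (imperfective). English expresses this mainly through the perfect, as in "She has written," and the progressive, as in "She is writing."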
Mood
Mood conveys the speaker's attitude or intent, such as indicative (statements), imperative (commands), or subjunctive (wishes or hypotheticals).
Agreement
Also known as concord, agreement is the process by which a word changes its form to match another word it is related to in person, number, gender, or case. For example, the verb is "writes" after "she" but "write" after "they."
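Subject-verb agreement in person and number is one of the most visible forms of agreement in English. A minimal sketch, assuming regular verbs only: in the simple present, only a third-person singular subject triggers the -s/-es ending. The `SUBJECTS` table, `present_form`, and `agrees` are hypothetical helpers; irregular verbs such as "be" and "have" and spellings like "try" to "tries" are not handled.

```python
# Hypothetical (person, number) features for subject pronouns.
# "you" takes the bare verb form whether singular or plural.
SUBJECTS = {
    "i": (1, "sg"), "you": (2, "sg"), "he": (3, "sg"), "she": (3, "sg"),
    "it": (3, "sg"), "we": (1, "pl"), "they": (3, "pl"),
}


def present_form(base, person, number):
    """Simple-present form of a regular verb for the given subject features."""
    if person == 3 and number == "sg":
        if base.endswith(("s", "sh", "ch", "x", "z", "o")):
            return base + "es"
        return base + "s"
    return base


def agrees(subject, verb, base):
    """True if `verb` is the correct present form of `base` for `subject`."""
    person, number = SUBJECTS[subject.lower()]
    return verb == present_form(base, person, number)


print(agrees("She", "writes", "write"))   # True
print(agrees("They", "writes", "write"))  # False
print(agrees("he", "watches", "watch"))   # True
```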
Spoken Language Syntax
Spoken and written English share many grammatical features, but they also differ in notable ways:
Disfluencies and Repair
Spoken language often includes disfluencies like filler words ("uh," "um") and repairs where speakers correct themselves mid-sentence.
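Before parsing transcribed speech, systems often remove filler words first. The sketch below does this with a simple token filter; the `FILLERS` set and `remove_fillers` function are hypothetical, and a real system would tune the filler inventory to its data.

```python
import string

# Hypothetical filler inventory; a real system would tune this list.
FILLERS = {"uh", "um", "er", "erm"}


def remove_fillers(utterance):
    """Drop filler tokens, ignoring case and attached punctuation."""
    kept = [
        tok for tok in utterance.split()
        if tok.strip(string.punctuation).lower() not in FILLERS
    ]
    return " ".join(kept)


print(remove_fillers("I uh want a um one-way flight"))  # I want a one-way flight
```

Deleting fillers is the easy part. A repair also leaves behind the words the speaker abandoned, and finding those requires looking at the surrounding context, not just at individual tokens.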
Restarts
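Restarts occur when a speaker abandons an utterance partway through, often pauses or inserts a filler word, and then starts again. For example, in "one-way flights, uh, one-way fares," the speaker drops "one-way flights" and restarts with "one-way fares."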
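Detecting restarts automatically is hard, but a rough heuristic captures the "one-way flights" to "one-way fares" pattern: the word right after the filler often repeats the start of the abandoned phrase. The sketch below implements that heuristic; `remove_restarts`, its `window` parameter, and the filler list are hypothetical and not a standard algorithm. Practical systems generally treat disfluency detection as a learned tagging task instead.

```python
import string

# Hypothetical filler list used to spot the interruption point.
FILLERS = {"uh", "um", "er"}


def _norm(token):
    """Lower-case a token and strip punctuation from its ends."""
    return token.strip(string.punctuation).lower()


def remove_restarts(utterance, window=4):
    """Heuristic restart removal: at each filler, if the next word also
    appears among the last `window` kept words, treat everything from that
    earlier occurrence up to the filler as abandoned and delete it."""
    tokens = utterance.split()
    out = []
    for i, tok in enumerate(tokens):
        if _norm(tok) not in FILLERS:
            out.append(tok)
            continue
        # The filler marks a possible interruption point; it is always dropped.
        if i + 1 < len(tokens):
            nxt = _norm(tokens[i + 1])
            start = max(0, len(out) - window)
            for p in range(len(out) - 1, start - 1, -1):
                if _norm(out[p]) == nxt:
                    del out[p:]  # remove the abandoned start
                    break
    return " ".join(out)


print(remove_restarts("any one-way flights, uh, one-way fares"))  # any one-way fares
```

The `window` limit keeps the heuristic from deleting large stretches of an utterance just because a common word happens to repeat.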
Word Fragments
Spoken language may also contain word fragments, that is, incomplete pieces of words. In "w-what time is it?", for example, the speaker produces the fragment "w-" before saying "what" in full.
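Fragments in transcripts are often marked with a hyphen, which makes a simple cleanup possible. The sketch below assumes that convention: it collapses stutters like "w-what" when the piece before the hyphen is a prefix of the word after it, and drops cut-off tokens ending in a hyphen, such as "wha-". The `STUTTER` pattern and `clean_fragments` function are hypothetical.

```python
import re

# A stutter such as "w-what": letters, a hyphen, then a word that begins
# with those same letters, optionally followed by trailing punctuation.
STUTTER = re.compile(r"^([A-Za-z]+)-([A-Za-z]+)(\W*)$")


def clean_fragments(utterance):
    """Collapse stutters like "w-what" to "what" and drop cut-off tokens
    that end in a hyphen, such as "wha-"."""
    cleaned = []
    for tok in utterance.split():
        if tok.endswith("-"):
            continue  # abandoned fragment
        m = STUTTER.match(tok)
        if m and m.group(2).lower().startswith(m.group(1).lower()):
            tok = m.group(2) + m.group(3)  # keep the full word and punctuation
        cleaned.append(tok)
    return " ".join(cleaned)


print(clean_fragments("w-what time is it?"))  # what time is it?
```

The prefix check keeps ordinary hyphenated words such as "one-way" intact, but the heuristic can still misfire on legitimate words like "re-read", which it would collapse to "read".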