Part of Speech (PoS) Tagging: Methods and Techniques
Part of Speech (PoS) Tagging is a classification task where tags or labels are assigned to words (tokens) in a text. These tags represent the part of speech, semantic information, or other characteristics of the tokens.
What is Part of Speech (PoS) Tagging?
PoS tagging refers to the process of assigning a part of speech to each word in a sentence, like noun, verb, adjective, etc. It’s essentially labeling each word based on its role in the sentence. Common parts of speech include nouns, verbs, adjectives, adverbs, pronouns, and conjunctions, along with their subcategories.
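As a minimal illustration, tagging maps each token to a label. The toy lexicon below is hypothetical; real taggers use trained models (for example, NLTK's `pos_tag`):

```python
# Minimal illustration of PoS tagging: map each token to a tag.
# LEXICON is a hypothetical toy dictionary, not a real tagger's model.
LEXICON = {
    "the": "DET", "cat": "NOUN", "sat": "VERB",
    "on": "ADP", "mat": "NOUN",
}

def tag(tokens):
    """Return (word, tag) pairs; unknown words fall back to 'NOUN'."""
    return [(w, LEXICON.get(w.lower(), "NOUN")) for w in tokens]

print(tag("The cat sat on the mat".split()))
# [('The', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'),
#  ('on', 'ADP'), ('the', 'DET'), ('mat', 'NOUN')]
```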
Types of PoS Tagging Methods
PoS tagging methods can be grouped into three main categories (the Hidden Markov Model tagger discussed later is a prominent stochastic method):
- Rule-Based PoS Tagging
- Stochastic PoS Tagging
- Transformation-Based Tagging
Rule-Based PoS Tagging
Rule-based PoS tagging is one of the earliest methods. It uses a dictionary or lexicon to suggest possible tags for each word. If a word has multiple possible tags, handwritten rules are used to select the most appropriate one. Disambiguation is performed by analyzing linguistic features of a word and its context (preceding and following words). For example, if a word follows an article, it’s likely a noun.
Architecture of Rule-Based PoS Tagging
- First Stage: Uses a dictionary to assign potential parts of speech to each word.
- Second Stage: Applies rules to narrow down the potential tags to a single, correct part of speech.
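The two stages above can be sketched as follows. The lexicon and the single disambiguation rule ("a word following an article is a noun") are hypothetical toy examples, not a complete rule set:

```python
# Two-stage rule-based tagging sketch (toy data, not a full tagger).
# Stage 1: a lexicon proposes candidate tags for each word.
# Stage 2: hand-written rules pick one tag when a word is ambiguous.
LEXICON = {
    "the": {"DET"},
    "book": {"NOUN", "VERB"},   # ambiguous: "the book" vs "to book"
    "flies": {"NOUN", "VERB"},
}

def rule_based_tag(tokens):
    tags = []
    for i, word in enumerate(tokens):
        candidates = LEXICON.get(word.lower(), {"NOUN"})
        if len(candidates) == 1:
            tags.append(next(iter(candidates)))
        # Rule: after an article/determiner, prefer NOUN.
        elif i > 0 and tags[i - 1] == "DET" and "NOUN" in candidates:
            tags.append("NOUN")
        else:
            tags.append(sorted(candidates)[0])  # arbitrary fallback
    return list(zip(tokens, tags))

print(rule_based_tag(["the", "book"]))
# [('the', 'DET'), ('book', 'NOUN')] -- "book" disambiguated to NOUN
```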
Properties of Rule-Based PoS Tagging
- Knowledge-driven and rule-based.
- Relies on manually created rules.
- Often requires a large rule set (on the order of 1,000 rules) for accurate tagging.
- Smoothing and language modeling are defined explicitly in the rules.
Stochastic PoS Tagging
Stochastic PoS tagging uses probability and frequency data to tag words. It’s called stochastic because it applies statistical models. Two common approaches are:
1. Word Frequency Approach
This method assigns tags based on the probability of a word appearing with a particular tag, using frequency data from a training set. However, this approach can sometimes produce invalid tag sequences.
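The frequency approach can be sketched on a hypothetical miniature tagged corpus:

```python
from collections import Counter, defaultdict

# Hypothetical miniature training corpus of (word, tag) pairs.
corpus = [("time", "NOUN"), ("flies", "VERB"), ("time", "NOUN"),
          ("flies", "NOUN"), ("flies", "VERB"), ("like", "ADP")]

# Count how often each word occurs with each tag.
freq = defaultdict(Counter)
for word, tag in corpus:
    freq[word][tag] += 1

def most_frequent_tag(word):
    """Assign the tag the word most often carries in training data."""
    return freq[word].most_common(1)[0][0]

print(most_frequent_tag("flies"))  # VERB (2 of its 3 occurrences)
```

Because each word is tagged in isolation, this can yield tag sequences that are individually probable but jointly invalid, which motivates the n-gram approach below.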
2. Tag Sequence Probabilities (N-gram Approach)
Here, the tagger calculates the probability of sequences of tags occurring. For example, in a bigram model, the tag of a word depends on the tag of the previous word.
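Tag-transition probabilities for a bigram model can be estimated by counting adjacent tag pairs in a tagged corpus; the tag sequence below is a hypothetical toy example:

```python
from collections import Counter

# Hypothetical tag sequence from a tiny training corpus.
tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "NOUN", "VERB"]

bigrams = Counter(zip(tags, tags[1:]))   # counts of (prev_tag, tag)
unigrams = Counter(tags[:-1])            # counts of prev_tag

def p_transition(prev, cur):
    """P(cur | prev) = count(prev, cur) / count(prev)."""
    return bigrams[(prev, cur)] / unigrams[prev]

print(p_transition("DET", "NOUN"))   # 1.0: NOUN always follows DET here
print(p_transition("NOUN", "VERB"))  # 2/3
```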
Properties of Stochastic PoS Tagging
- Based on the probability of tags.
- Requires a training corpus.
- Cannot assign probabilities to words absent from the training corpus.
- Uses a test corpus separate from the training corpus.
- Assigns the most frequent tag from the training data.
Transformation-Based Tagging (Brill Tagging)
Transformation-based tagging, also known as Brill tagging, combines elements of the rule-based and stochastic approaches: it learns transformation rules automatically from tagged data, and can also incorporate human-written rules.
How Transformation-Based Learning (TBL) Works
- Start with an Initial Solution: Begins with an initial tagging solution.
- Apply Beneficial Transformations: Iteratively applies the most beneficial rule-based transformations to improve tagging.
- Stop When No Further Improvements: Continues until no further improvements are possible.
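The loop above can be sketched as a simplified Brill-style learner with a single hypothetical rule template, "change tag A to B when the previous tag is Z"; the words, gold tags, and initial tagging below are toy data:

```python
from itertools import product

# Toy gold-standard tagging and an initial guess (e.g. most-frequent tag).
words = ["the", "can", "rusted"]
gold = ["DET", "NOUN", "VERB"]
initial = ["DET", "VERB", "VERB"]   # "can" mis-tagged as VERB

TAGS = ["DET", "NOUN", "VERB"]

def apply_rule(tags, rule):
    """Rule (a, b, z): retag a -> b when the previous tag is z."""
    a, b, z = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == a and out[i - 1] == z:
            out[i] = b
    return out

def errors(tags):
    return sum(t != g for t, g in zip(tags, gold))

# Greedy TBL loop: repeatedly apply the most beneficial rule
# until no rule reduces the number of tagging errors.
current = list(initial)
learned = []
while True:
    best = min(product(TAGS, TAGS, TAGS),
               key=lambda r: errors(apply_rule(current, r)))
    if errors(apply_rule(current, best)) >= errors(current):
        break
    current = apply_rule(current, best)
    learned.append(best)

print(learned)   # a rule changing VERB -> NOUN after DET
print(current)   # now matches the gold tags
```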
Advantages of Transformation-Based Learning
- Simple rules are sufficient for accurate tagging.
- Easy development and debugging due to understandable rules.
- Combines machine-learned and human-generated rules to reduce complexity.
- Faster than Markov-model taggers.
Disadvantages of Transformation-Based Learning
- Does not provide probabilities for tags.
- Long training times, especially with large datasets.
Hidden Markov Model (HMM) PoS Tagging
Understanding Hidden Markov Model (HMM)
An HMM is a doubly-embedded stochastic model where the underlying process is hidden and can only be observed through another stochastic process. For instance, in a series of hidden coin tosses, you observe only the sequence of heads and tails, not the underlying details such as which coin was used.
An HMM for PoS tagging models the process where tags are hidden states generating the observable words. In this model, tags (hidden states) produce the words (observations), and we aim to find the most probable sequence of tags for a given sequence of words.
Using HMM for PoS Tagging
To use HMM for PoS tagging, we seek the sequence of tags that maximizes the probability of generating the given sequence of words. This involves using Bayes’ rule and assumptions about the independence of tags and words to simplify calculations.
Using a large tagged corpus, the probabilities of tags can be calculated as:
PROB(Ci = VERB | Ci-1 = NOUN) = (# of times VERB follows NOUN) / (# of times NOUN appears)

PROB(Wi | Ci) = (# of times word Wi appears with tag Ci) / (# of times tag Ci appears)
These probabilities help determine the most likely tags for the words in a sentence.
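Putting the two estimates together, a tiny HMM tagger picks the tag sequence that maximizes the product of transition and emission probabilities. The parameters below are hypothetical hand-set values standing in for corpus counts, and the search is brute force (the Viterbi algorithm computes the same argmax efficiently):

```python
from itertools import product

# Hypothetical hand-set HMM parameters (normally estimated from a
# tagged corpus via the counting formulas above).
tags = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}                       # P(C1)
trans = {("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
         ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4}   # P(Ci | Ci-1)
emit = {("NOUN", "fish"): 0.8, ("VERB", "fish"): 0.2,
        ("NOUN", "swim"): 0.1, ("VERB", "swim"): 0.9}    # P(Wi | Ci)

def best_tag_sequence(words):
    """Brute-force argmax over all tag sequences."""
    def score(seq):
        p = start[seq[0]] * emit[(seq[0], words[0])]
        for i in range(1, len(words)):
            p *= trans[(seq[i - 1], seq[i])] * emit[(seq[i], words[i])]
        return p
    return max(product(tags, repeat=len(words)), key=score)

print(best_tag_sequence(["fish", "swim"]))   # ('NOUN', 'VERB')
```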