Python Regular Expressions (RegEx) Tutorial

Learn how to use Python's built-in re module for working with Regular Expressions (RegEx). Discover how to define and search for patterns in strings with this comprehensive guide.



Python RegEx

A Regular Expression, or RegEx, is a sequence of characters that defines a search pattern. It is commonly used to check if a string contains a specific pattern.

RegEx Module

Python's standard library includes a module called re, which you can use to work with Regular Expressions.

Importing the re Module

First, you need to import the re module:

Syntax
import re

Using RegEx in Python

Once you've imported the re module, you can start using regular expressions. Here’s an example:

Example

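import re

txt = "The rain in Spain"
# Check whether the string starts with "The" and ends with "Spain"
x = re.search(r"^The.*Spain$", txt)
print("Match found" if x else "No match")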

RegEx Functions

The re module provides a set of functions to search a string for a match:

  • findall: Returns a list of all matches
  • search: Returns a Match object if there's a match
  • split: Splits the string at each match and returns a list
  • sub: Replaces one or many matches with a string

Metacharacters

Metacharacters are characters with special meanings in RegEx:

Character  Description                                                 Example
[]         A set of characters                                         "[a-m]"
\          Signals a special sequence (or escapes special characters)  r"\d"
.          Any character (except newline)                              "he..o"
^          Starts with                                                 "^hello"
$          Ends with                                                   "planet$"
*          Zero or more occurrences                                    "he.*o"
+          One or more occurrences                                     "he.+o"
?          Zero or one occurrence                                      "he.?o"
{}         Exactly the specified number of occurrences                 "he.{2}o"
|          Either or                                                   "falls|stays"
()         Capture and group
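
The metacharacters above can be tried out with a short sketch like the one below. The sample strings and variable names are made up for illustration; the patterns are the ones from the table, written as raw strings.

```python
import re

txt = "hello planet"

# "." stands for any single character, so "he..o" fits "hello"
dot_match = re.findall(r"he..o", txt)

# "*" and "+" let the "." repeat; "?" allows at most one character
star_match = re.findall(r"he.*o", txt)
plus_match = re.findall(r"he.+o", txt)
# "hello" has two characters between "he" and "o", so this finds nothing
optional_match = re.findall(r"he.?o", txt)

# "^" anchors the pattern to the start, "$" to the end
starts_with_hello = bool(re.search(r"^hello", txt))
ends_with_planet = bool(re.search(r"planet$", txt))

# "|" accepts either alternative
either = re.findall(r"falls|stays", "The rain falls and stays")

# "()" captures only the part of the match inside the parentheses
captured = re.findall(r"(\w+) planet", txt)

print(dot_match, star_match, plus_match, optional_match)
print(starts_with_hello, ends_with_planet)
print(either, captured)
```

Note that findall() returns the captured text rather than the whole match when the pattern contains a group.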

Special Sequences

Special sequences are denoted by a backslash (\) followed by a character and have a special meaning:

Character  Description                                                   Example
\A         Matches only at the start of the string                       r"\AThe"
\b         Matches at a word boundary (start or end of a word)           r"\bain", r"ain\b"
\B         Matches where there is no word boundary                       r"\Bain", r"ain\B"
\d         Matches any digit (0-9)                                       r"\d"
\D         Matches any non-digit                                         r"\D"
\s         Matches any whitespace character                              r"\s"
\S         Matches any non-whitespace character                          r"\S"
\w         Matches any word character (letters, digits, and underscore)  r"\w"
\W         Matches any non-word character                                r"\W"
\Z         Matches only at the end of the string                         r"Spain\Z"
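
As a rough illustration of the special sequences, the sketch below runs a few of them against a made-up sentence. The patterns are raw strings (the r prefix) so that Python passes the backslash through to the re module unchanged; the variable names are only for this example.

```python
import re

txt = "The rain in Spain costs 25 euros"

# \A anchors at the start of the string, \Z at the end
starts_with_the = bool(re.search(r"\AThe", txt))
ends_with_spain = bool(re.search(r"Spain\Z", txt))  # the string ends with "euros"

# \b is a word boundary: "ain" at the end of a word
ain_at_word_end = re.findall(r"ain\b", txt)

# \B is the opposite: "ain" that does not start a word
ain_inside_word = re.findall(r"\Bain", txt)

# \d matches digits, \w matches word characters
digits = re.findall(r"\d", txt)
words = re.findall(r"\w+", txt)

print(starts_with_the, ends_with_spain)
print(ain_at_word_end, ain_inside_word)
print(digits)
print(words)
```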

Sets

A set is a group of characters inside square brackets [] with a special meaning:

Set         Description
[arn]       Matches any one of the specified characters (a, r, or n)
[a-n]       Matches any lower case character alphabetically between a and n
[^arn]      Matches any character except a, r, and n
[0123]      Matches any of the specified digits (0, 1, 2, or 3)
[0-9]       Matches any digit between 0 and 9
[0-5][0-9]  Matches any two-digit number from 00 to 59
[a-zA-Z]    Matches any letter from a to z, lower case or upper case
[+]         Matches any + character (in sets, +, *, ., |, (), $, and {} have no special meaning)
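
A small sketch of how a few of these sets behave, using an invented sample string and illustrative variable names:

```python
import re

txt = "The rain in Spain at 08:45"

# [arn] matches a single a, r, or n wherever it appears
arn = re.findall(r"[arn]", txt)

# [0-5][0-9] matches two digits that form a number from 00 to 59
two_digit = re.findall(r"[0-5][0-9]", txt)

# [a-zA-Z]+ matches runs of letters in either case
letters = re.findall(r"[a-zA-Z]+", txt)

# Inside a set, + is a literal plus sign
plus = re.findall(r"[+]", "1+1=2")

print(arn)
print(two_digit)
print(letters)
print(plus)
```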

The findall() Function

The findall() function returns a list of all matches:

Example

import re

txt = "The rain in Spain"
matches = re.findall("ai", txt)
print(matches)
        
Output
['ai', 'ai']
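
When nothing matches, findall() returns an empty list rather than None. A minimal sketch, using a search word chosen only because it does not occur in the text:

```python
import re

txt = "The rain in Spain"

# "Portugal" does not occur in txt, so the result is an empty list
no_matches = re.findall(r"Portugal", txt)
print(no_matches)

# An empty list is falsy, which makes the no-match case easy to test
if not no_matches:
    print("No match")
```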

The search() Function

The search() function searches the string for a match and returns a Match object if there's a match. Only the first occurrence is returned:

Example

import re

txt = "The rain in Spain"
match = re.search(r"\s", txt)

print("The first white-space character is located in position:", match.start())
        
Output
The first white-space character is located in position: 3
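
If the pattern is not found, search() returns None, and calling .start() on None raises an AttributeError. One way to guard against that is sketched below; the variable name position is only for this example.

```python
import re

txt = "The rain in Spain"

# txt contains no digits, so search() returns None here
match = re.search(r"\d", txt)

if match is None:
    position = None
    print("No digit found")
else:
    position = match.start()
    print("The first digit is located in position:", position)
```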

The split() Function

The split() function returns a list where the string has been split at each match:

Example

import re

txt = "The rain in Spain"
split_txt = re.split(r"\s", txt)
print(split_txt)
        
Output
['The', 'rain', 'in', 'Spain']
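
re.split() also accepts a maxsplit argument that limits how many splits are made. A short sketch that splits only at the first whitespace character (the variable name is illustrative):

```python
import re

txt = "The rain in Spain"

# Split at the first whitespace character only; the rest stays in one piece
first_split_only = re.split(r"\s", txt, maxsplit=1)
print(first_split_only)
```

Output
['The', 'rain in Spain']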

The sub() Function

The sub() function replaces the matches with the text of your choice:

Example

import re

txt = "The rain in Spain"
result = re.sub(r"\s", "9", txt)
print(result)
        
Output
The9rain9in9Spain
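
The number of replacements can be limited with the count argument of re.sub(). A short sketch that replaces only the first two whitespace characters (the variable name is illustrative):

```python
import re

txt = "The rain in Spain"

# Replace only the first two whitespace characters
limited_result = re.sub(r"\s", "9", txt, count=2)
print(limited_result)
```

Output
The9rain9in Spain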

Match Object

A Match object contains information about the search and its result. If there is no match, the search returns None instead of a Match object.

Example

import re

txt = "The rain in Spain"
match = re.search("ai", txt)
print(match)
        
Output
<re.Match object; span=(5, 7), match='ai'>

The Match object has methods and properties to extract details about the search result:

  • .span(): Returns a tuple with the start and end positions of the match
  • .string: Returns the string passed into the function
  • .group(): Returns the part of the string where there was a match

Example

import re

txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.span())
        
Output
(12, 17)

Example

import re

txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.string)
        
Output
The rain in Spain

Example

import re

txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.group())
        
Output
Spain
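
When the pattern contains parentheses, .group() and .span() also accept a group number, which gives access to each captured part. The sketch below is one possible illustration; the pattern and variable names are invented for this example.

```python
import re

txt = "The rain in Spain"

# Two capture groups: the word before " in " and the word after it
match = re.search(r"(\w+) in (\w+)", txt)

if match:
    whole = match.group()          # the full match
    before = match.group(1)        # first pair of parentheses
    after = match.group(2)         # second pair of parentheses
    after_span = match.span(2)     # start and end positions of group 2
    print(whole)
    print(before, after)
    print(after_span)
```

Output
rain in Spain
rain Spain
(12, 17)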