Python Regular Expressions (RegEx) Tutorial
Learn how to use Python's built-in re module to work with regular expressions (RegEx). This guide shows how to define patterns and search for them in strings.
Python RegEx
A Regular Expression, or RegEx, is a sequence of characters that defines a search pattern. It is commonly used to check if a string contains a specific pattern.
RegEx Module
Python includes a built-in module called re, which you can use to work with regular expressions.
Importing the re Module
First, you need to import the re module:
Syntax
import re
Using RegEx in Python
Once you've imported the re module, you can start using regular expressions. Here’s an example:
Example
import re
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
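The example above stores the result in x without showing it. A minimal sketch of how the result might be checked, relying on the fact that re.search returns None when nothing matches (the printed messages are illustrative, not part of the original example):

```python
import re

txt = "The rain in Spain"

# ^The anchors the match at the start, .* spans anything in between,
# and Spain$ anchors the match at the end of the string
x = re.search(r"^The.*Spain$", txt)

# A Match object is truthy; None (no match) is falsy
if x:
    print("YES! We have a match!")
else:
    print("No match")
```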
RegEx Functions
The re module provides a set of functions to search a string for a match:
findall: Returns a list of all matches
search: Returns a Match object if there's a match
split: Splits the string at each match and returns a list
sub: Replaces one or more matches with a string
Metacharacters
Metacharacters are characters with special meanings in RegEx:
Character | Description | Example |
---|---|---|
[] | A set of characters | "[a-m]" |
\ | Signals a special sequence (or escapes special characters) | r"\d" |
. | Any character (except newline) | "he..o" |
^ | Starts with | "^hello" |
$ | Ends with | "planet$" |
* | Zero or more occurrences | "he.*o" |
+ | One or more occurrences | "he.+o" |
? | Zero or one occurrence | "he.?o" |
{} | Exactly the specified number of occurrences | "he.{2}o" |
\| | Either or | "falls\|stays" |
() | Capture and group | |
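To make the table concrete, here is a small sketch that tries several of the metacharacters listed above. The sample strings are assumptions chosen for illustration, and the patterns are written as raw strings so that backslashes reach the regex engine unchanged:

```python
import re

txt = "hello planet"

# "." matches any single character except a newline
dots = re.findall(r"he..o", txt)
print(dots)  # ['hello']

# "^" and "$" anchor the pattern to the start and end of the string
starts = re.search(r"^hello", txt) is not None
ends = re.search(r"planet$", txt) is not None
print(starts, ends)  # True True

# "{2}" repeats the preceding item exactly twice
exact = re.findall(r"he.{2}o", txt)
print(exact)  # ['hello']

# "|" matches either alternative
either = re.findall(r"falls|stays", "The rain in Spain stays mainly in the plain")
print(either)  # ['stays']

# "()" captures the part of the match inside the parentheses
pair = re.search(r"(\w+) (\w+)", txt)
print(pair.group(1), pair.group(2))  # hello planet
```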
Special Sequences
Special sequences are denoted by a backslash (\) followed by a character and have a special meaning:
Character | Description | Example |
---|---|---|
\A | Matches if specified characters are at the beginning of the string | r"\AThe" |
\b | Matches if specified characters are at the beginning or end of a word | r"\bain", r"ain\b" |
\B | Matches if specified characters are not at the beginning or end of a word | r"\Bain", r"ain\B" |
\d | Matches any digit (0-9) | r"\d" |
\D | Matches any non-digit | r"\D" |
\s | Matches any whitespace character | r"\s" |
\S | Matches any non-whitespace character | r"\S" |
\w | Matches any word character (letters, digits, and underscore) | r"\w" |
\W | Matches any non-word character | r"\W" |
\Z | Matches if specified characters are at the end of the string | r"Spain\Z" |
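The following sketch runs a few of these sequences against the tutorial's sample sentence. The second string ("Flight 42 leaves at 9") is a made-up example, used only because the original sentence contains no digits:

```python
import re

txt = "The rain in Spain"

# \A and \Z anchor to the very start and very end of the string
at_start = re.search(r"\AThe", txt) is not None
at_end = re.search(r"Spain\Z", txt) is not None
print(at_start, at_end)  # True True

# \b is a word boundary: "ain" ends two words here but starts none
ends_word = re.findall(r"ain\b", txt)
starts_word = re.findall(r"\bain", txt)
print(ends_word)    # ['ain', 'ain']
print(starts_word)  # []

# \B is the opposite: "ain" preceded by another word character
inside_word = re.findall(r"\Bain", txt)
print(inside_word)  # ['ain', 'ain']

# \d matches digits and \s matches whitespace
digits = re.findall(r"\d", "Flight 42 leaves at 9")
spaces = re.findall(r"\s", txt)
print(digits)  # ['4', '2', '9']
print(spaces)  # [' ', ' ', ' ']
```

Note that "\b" inside an ordinary string literal means a backspace character, which is why the patterns use the r"..." raw-string prefix.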
Sets
A set is a group of characters inside square brackets [] with a special meaning:
Set | Description |
---|---|
[arn] | Matches any one of the specified characters (a, r, or n) |
[a-n] | Matches any lowercase character alphabetically between a and n |
[^arn] | Matches any character except a, r, and n |
[0123] | Matches any of the specified digits (0, 1, 2, or 3) |
[0-9] | Matches any digit between 0 and 9 |
[0-5][0-9] | Matches any two-digit number from 00 to 59 |
[a-zA-Z] | Matches any letter from a to z, lowercase or uppercase |
[+] | Matches any + character (in sets, +, *, ., \|, (), $, and {} have no special meaning) |
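As a rough illustration of the sets above, the sketch below applies a few of them with re.findall. The strings other than "The rain in Spain" are assumptions made up for the example:

```python
import re

txt = "The rain in Spain"

# [arn] matches each a, r, or n individually
any_arn = re.findall(r"[arn]", txt)
print(any_arn)  # ['r', 'a', 'n', 'n', 'a', 'n']

# [a-n] is a lowercase range, so "T", "S", "r", and "p" are skipped
a_to_n = re.findall(r"[a-n]", txt)
print(a_to_n)  # ['h', 'e', 'a', 'i', 'n', 'i', 'n', 'a', 'i', 'n']

# [^arn] matches anything except a, r, and n
not_arn = re.findall(r"[^arn]", "rain")
print(not_arn)  # ['i']

# [0-5][0-9] matches two-digit numbers from 00 to 59
two_digit = re.findall(r"[0-5][0-9]", "8 times before 11:45 AM")
print(two_digit)  # ['11', '45']

# Inside a set, + is a literal plus sign
plus_signs = re.findall(r"[+]", "1 + 2 + 3")
print(plus_signs)  # ['+', '+']
```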
The findall() Function
The findall() function returns a list of all matches, in the order they are found:
Example
import re
txt = "The rain in Spain"
matches = re.findall("ai", txt)
print(matches)
Output
['ai', 'ai']
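When nothing matches, findall() returns an empty list rather than None. A short sketch, using "Portugal" as an arbitrary pattern that does not occur in the text:

```python
import re

txt = "The rain in Spain"

# No occurrence of "Portugal", so the result is an empty list
matches = re.findall(r"Portugal", txt)
print(matches)  # []

# An empty list is falsy, which makes the no-match case easy to test
if not matches:
    print("No match")
```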
The search() Function
The search() function searches the string for a match and returns a Match object if there is one. If the pattern matches more than once, only the first occurrence is returned:
Example
import re
txt = "The rain in Spain"
match = re.search(r"\s", txt)
print("The first white-space character is located in position:", match.start())
Output
The first white-space character is located in position: 3
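If the pattern is not found, search() returns None, and calling a method such as start() on it would raise an AttributeError. One way to guard against that is sketched below; the pattern "Portugal" and the message text are illustrative assumptions:

```python
import re

txt = "The rain in Spain"
match = re.search(r"Portugal", txt)

# Test for None before using any Match methods
if match is None:
    message = "No match found"
else:
    message = f"Match starts at position {match.start()}"

print(message)  # No match found
```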
The split() Function
The split() function returns a list where the string has been split at each match:
Example
import re
txt = "The rain in Spain"
split_txt = re.split(r"\s", txt)
print(split_txt)
Output
['The', 'rain', 'in', 'Spain']
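re.split also accepts a maxsplit argument that limits how many splits are made. A brief sketch, passing it by keyword:

```python
import re

txt = "The rain in Spain"

# Split only at the first whitespace character; the rest stays intact
first_split = re.split(r"\s", txt, maxsplit=1)
print(first_split)  # ['The', 'rain in Spain']
```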
The sub() Function
The sub() function replaces the matches with the text of your choice:
Example
import re
txt = "The rain in Spain"
result = re.sub(r"\s", "9", txt)
print(result)
Output
The9rain9in9Spain
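To replace only some of the matches, re.sub takes a count argument. The sketch below replaces just the first two whitespace characters:

```python
import re

txt = "The rain in Spain"

# count=2 stops after the first two replacements
first_two = re.sub(r"\s", "9", txt, count=2)
print(first_two)  # The9rain9in Spain
```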
Match Object
A Match object contains information about the search and its result. If there is no match, None is returned instead.
Example
import re
txt = "The rain in Spain"
match = re.search("ai", txt)
print(match)
Output
<re.Match object; span=(5, 7), match='ai'>
The Match object has methods and properties to extract details about the search result:
.span(): Returns a tuple with the start and end positions of the match
.string: Returns the string passed into the function
.group(): Returns the part of the string where there was a match
Example
import re
txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.span())
Output
(12, 17)
Example
import re
txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.string)
Output
The rain in Spain
Example
import re
txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.group())
Output
Spain
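When the pattern contains capture groups, the same Match object also exposes each group separately. The sketch below is one possible illustration; the pattern with two groups is an assumption added here, not part of the examples above:

```python
import re

txt = "The rain in Spain"

# Two capture groups: the word before " in " and the word after it
match = re.search(r"(\w+) in (\w+)", txt)

if match:
    print(match.group(0))              # rain in Spain (the whole match)
    print(match.group(1))              # rain
    print(match.group(2))              # Spain
    print(match.groups())              # ('rain', 'Spain')
    print(match.start(), match.end())  # 4 17
```

group() with no argument is the same as group(0), which is why the earlier example printed the entire matched text.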