How Regex Works in Python in Detail
Regular expressions (regex):
Regular expressions (regex) are a powerful tool for matching patterns in text. In Python, the re
module provides support for working with regular expressions. Below is a detailed explanation of how regex works in Python:
1. Regex Patterns
Regex patterns are usually written as raw strings (r'...') so that backslashes reach the regex engine unchanged instead of being interpreted as Python string escapes.
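A quick sketch of why the raw-string prefix matters. The sample text is made up for illustration, and \b (a word boundary in regex, a backspace character in a normal Python string) is used only because it makes the difference easy to see:

```python
import re

# In a normal string, "\b" becomes a single backspace character.
normal = "\bcat\b"
# In a raw string, r"\b" stays as backslash + b, which regex reads as a word boundary.
raw = r"\bcat\b"

print(len(normal), len(raw))  # 5 7

normal_hit = re.search(normal, "a cat sat")  # looks for literal backspaces
raw_hit = re.search(raw, "a cat sat")        # looks for the word "cat"

print(normal_hit)       # None
print(raw_hit.group())  # cat
```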
a) Basic Patterns
. : Matches any character except a newline.
\d : Matches any digit (0-9).
\D : Matches any non-digit.
\w : Matches any word character (a-z, A-Z, 0-9, _).
\W : Matches any non-word character.
\s : Matches any whitespace character (space, tab, newline).
\S : Matches any non-whitespace character.
For str patterns, \d, \w, and \s also match their Unicode equivalents unless the re.ASCII flag is used.
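Here is a small sketch of a few of these classes in action. The sample string is invented, and re.findall() is used simply because it shows every match at once:

```python
import re

text = "Room 42, floor_3"

digits = re.findall(r"\d", text)           # every single digit
word_runs = re.findall(r"\w+", text)       # runs of word characters
non_space_runs = re.findall(r"\S+", text)  # runs of non-whitespace
dot_match = re.findall(r"R..m", text)      # "R", any two characters, "m"

print(digits)          # ['4', '2', '3']
print(word_runs)       # ['Room', '42', 'floor_3']
print(non_space_runs)  # ['Room', '42,', 'floor_3']
print(dot_match)       # ['Room']
```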
b) Quantifiers
* : Matches 0 or more occurrences.
+ : Matches 1 or more occurrences.
? : Matches 0 or 1 occurrence.
{n} : Matches exactly n occurrences.
{n,} : Matches n or more occurrences.
{n,m} : Matches between n and m occurrences.
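The quantifiers can be compared side by side with a short sketch. The test strings are made up, and the \b word-boundary token is an addition here, used only to keep each run of letters separate:

```python
import re

text = "a aa aaa aaaa"

exactly_two = re.findall(r"\ba{2}\b", text)     # runs of exactly 2
two_or_more = re.findall(r"\ba{2,}\b", text)    # runs of 2 or more
two_to_three = re.findall(r"\ba{2,3}\b", text)  # runs of 2 to 3

optional_u = re.findall(r"colou?r", "color colour")  # "u" is optional
star = re.findall(r"ab*", "a ab abbb")  # "a" followed by 0 or more "b"
plus = re.findall(r"ab+", "a ab abbb")  # "a" followed by 1 or more "b"

print(exactly_two)   # ['aa']
print(two_or_more)   # ['aa', 'aaa', 'aaaa']
print(two_to_three)  # ['aa', 'aaa']
print(optional_u)    # ['color', 'colour']
print(star)          # ['a', 'ab', 'abbb']
print(plus)          # ['ab', 'abbb']
```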
c) Anchors
^ : Matches the start of the string.
$ : Matches the end of the string (or just before a trailing newline at the very end).
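A minimal sketch of the two anchors, using an invented sample string:

```python
import re

text = "hello world"

at_start = re.search(r"^hello", text)      # "hello" is at the start
not_at_start = re.search(r"^world", text)  # "world" is not at the start
at_end = re.search(r"world$", text)        # "world" is at the end
whole = re.search(r"^hello world$", text)  # pattern must span the whole string

print(at_start.group())  # hello
print(not_at_start)      # None
print(at_end.group())    # world
print(whole.group())     # hello world
```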
d) Character Classes
[abc] : Matches any one of the characters a, b, or c.
[^abc] : Matches any character except a, b, or c.
[a-z] : Matches any character in the range a to z.
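As a rough illustration, here are the three forms applied to one made-up string:

```python
import re

text = "cab-42 Zed"

any_of = re.findall(r"[abc]", text)         # only a, b, or c
none_of = re.findall(r"[^abc]", text)       # everything except a, b, c
lower_runs = re.findall(r"[a-z]+", text)    # runs of lowercase letters
alnum_runs = re.findall(r"[A-Za-z0-9]+", text)  # several ranges in one class

print(any_of)      # ['c', 'a', 'b']
print(none_of)     # ['-', '4', '2', ' ', 'Z', 'e', 'd']
print(lower_runs)  # ['cab', 'ed']
print(alnum_runs)  # ['cab', '42', 'Zed']
```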
e) Groups and Capturing
(...) : Groups part of the pattern and captures it.
(?:...) : Groups without capturing.
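The difference between the two kinds of group shows up in what the match object reports. A small sketch with invented input:

```python
import re

# Two capturing groups: year and month.
date = re.search(r"(\d{4})-(\d{2})", "Released 2024-05")
print(date.group(0))  # 2024-05  (the whole match)
print(date.group(1))  # 2024
print(date.group(2))  # 05
print(date.groups())  # ('2024', '05')

# (?:ab)+ repeats "ab" as a unit but is not captured; only (\d) is.
mixed = re.search(r"(?:ab)+(\d)", "ababab7")
print(mixed.group(0))  # ababab7
print(mixed.groups())  # ('7',)
```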
f) Alternation
| : Matches either the pattern before or the pattern after the |.
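A short sketch of alternation, with made-up strings. Because | applies to everything on either side of it, a group is a common way to limit its scope:

```python
import re

pets = re.findall(r"cat|dog", "cat, dog, bird")

# Without the group, "gra|ey" would mean "gra" OR "ey".
greys = re.findall(r"gr(?:a|e)y", "gray grey griy")

print(pets)   # ['cat', 'dog']
print(greys)  # ['gray', 'grey']
```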
4. Match Objects
When a match is found, a match object is returned. It has several useful methods:
group() : Returns the matched string.
start() : Returns the starting position of the match.
end() : Returns the ending position of the match (the index just past the last matched character).
span() : Returns a tuple (start, end).
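To make these methods concrete, here is a minimal sketch on an invented string:

```python
import re

m = re.search(r"\d+", "Order 12345 shipped")

print(m.group())  # 12345
print(m.start())  # 6
print(m.end())    # 11
print(m.span())   # (6, 11)

# start() and end() can be used directly as slice bounds.
print("Order 12345 shipped"[m.start():m.end()])  # 12345
```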
2. Importing the re Module
To use regular expressions in Python, you need to import the re module:
import re
3. Basic Regex Functions
The re module provides several functions to work with regex:
a) re.match()
Checks for a match only at the beginning of the string.
Returns a match object if successful, otherwise None.
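A small sketch of this behavior, using a made-up greeting string:

```python
import re

text = "Hello, world"

hit = re.match(r"Hello", text)   # pattern is at the beginning
miss = re.match(r"world", text)  # pattern exists, but not at the beginning

print(hit.group())  # Hello
print(miss)         # None
```

Since the result can be None, it is safer to check it before calling group().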
b) re.search()
Searches for a match anywhere in the string.
Returns a match object if successful, otherwise None.
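For comparison with re.match(), here is a sketch on the same kind of made-up string:

```python
import re

text = "Hello, world"

found = re.search(r"world", text)  # found even though it is not at the start
missing = re.search(r"\d+", text)  # no digits anywhere

print(found.group())  # world
print(found.start())  # 7
print(missing)        # None
```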
c) re.findall()
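Finds all non-overlapping occurrences of the pattern in the string.
Returns a list of all matches. If the pattern contains capturing groups, the list holds the captured groups instead of the whole matches.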
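A minimal sketch with invented data. Note how the result changes shape once the pattern has capturing groups: each item becomes a tuple of the groups rather than the full matched text:

```python
import re

text = "alice=30, bob=25"

numbers = re.findall(r"\d+", text)        # no groups: list of strings
pairs = re.findall(r"(\w+)=(\d+)", text)  # two groups: list of tuples

print(numbers)  # ['30', '25']
print(pairs)    # [('alice', '30'), ('bob', '25')]
```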
d) re.finditer()
Similar to re.findall(), but returns an iterator of match objects.
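Because each item is a match object, positions are available as well as the text. A short sketch on a made-up string:

```python
import re

text = "one 22 three 4444"

# Collect the matched text together with its (start, end) span.
found = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(found)  # [('22', (4, 6)), ('4444', (13, 17))]
```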
e) re.sub()
Replaces occurrences of the pattern with a specified string and returns the new string.
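A few typical uses, sketched with invented inputs. The replacement string can refer back to captured groups with \1, \2, and so on, and the count argument limits how many replacements are made:

```python
import re

masked = re.sub(r"\d", "#", "Card 1234-5678")

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "2024-05")

# count=1 replaces only the first occurrence.
first_only = re.sub(r"\s+", " ", "a   b    c", count=1)

print(masked)      # Card ####-####
print(swapped)     # 05/2024
print(first_only)  # a b    c
```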
f) re.split()
Splits the string by the occurrences of the pattern.
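A rough sketch with made-up strings. It also shows two details that are easy to miss: maxsplit limits the number of splits, and a capturing group in the pattern keeps the separators in the result:

```python
import re

parts = re.split(r"[,;]\s*", "a, b;c,  d")

# maxsplit=1 stops after the first split.
limited = re.split(r"\s+", "one two  three", maxsplit=1)

# A capturing group keeps the separator in the output list.
with_seps = re.split(r"(-)", "a-b")

print(parts)      # ['a', 'b', 'c', 'd']
print(limited)    # ['one', 'two  three']
print(with_seps)  # ['a', '-', 'b']
```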
5. Flags
Flags modify the behavior of regex functions. Common flags include:
re.IGNORECASE (re.I) : Case-insensitive matching.
re.MULTILINE (re.M) : Allows ^ and $ to match the start/end of each line.
re.DOTALL (re.S) : Allows . to match newline characters.
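The effect of each flag is easiest to see by running the same pattern with and without it. A minimal sketch on an invented two-line string; flags can be combined with |:

```python
import re

text = "First line\nsecond LINE"

ignore_case = re.findall(r"line", text, re.IGNORECASE)

first_words_default = re.findall(r"^\w+", text)             # ^ = start of string
first_words_multi = re.findall(r"^\w+", text, re.MULTILINE)  # ^ = start of each line

dot_default = re.search(r"First.*LINE", text)            # . stops at the newline
dot_all = re.search(r"First.*LINE", text, re.DOTALL)     # . crosses the newline

combined = re.findall(r"^s.*$", text, re.I | re.M)

print(ignore_case)          # ['line', 'LINE']
print(first_words_default)  # ['First']
print(first_words_multi)    # ['First', 'second']
print(dot_default)          # None
print(repr(dot_all.group()))  # 'First line\nsecond LINE'
print(combined)             # ['second LINE']
```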