How Regex Works in Python in Detail
Regular expressions (regex):
Regular expressions (regex) are a powerful tool for matching patterns in text. In Python, the re module provides support for working with regular expressions. Below is a detailed explanation of how regex works in Python:
1. Regex Patterns
Regex patterns are usually written as raw strings (r'...') so that backslashes reach the regex engine unchanged instead of being interpreted as Python string escapes.
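As a small illustration of why raw strings help, the sketch below compares a normal string literal with a raw one. The sample text and variable names are made up for this example.

```python
import re

# A normal string literal processes escapes itself: "\t" is one tab character.
# A raw string leaves the backslash alone: r"\t" is two characters.
normal_len = len("\t")
raw_len = len(r"\t")

# With a raw string, \d reaches the regex engine exactly as typed.
digits = re.findall(r"\d+", "room 101, floor 3")

print(normal_len, raw_len)  # 1 2
print(digits)               # ['101', '3']
```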
a) Basic Patterns
. : Matches any character except a newline.
\d : Matches any digit. For str patterns this means any Unicode decimal digit; with the re.ASCII flag it is limited to 0-9.
\D : Matches any non-digit.
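\w : Matches any word character: letters, digits, and the underscore. For str patterns this includes Unicode letters; with the re.ASCII flag it is limited to a-z, A-Z, 0-9, and _.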
\W : Matches any non-word character.
\s : Matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab).
\S : Matches any non-whitespace character.
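The basic patterns above can be tried out with a short sketch like the following. The sample text and variable names are illustrative only.

```python
import re

text = "Order 42 shipped_on 2024-01-15\n"

digits = re.findall(r"\d", text)            # individual digit characters
words = re.findall(r"\w+", text)            # runs of word characters
non_space_runs = re.findall(r"\S+", text)   # chunks separated by whitespace
dot_match = re.search(r"4.", text).group()  # "4" followed by any one character

print(len(digits))     # 10
print(words)           # ['Order', '42', 'shipped_on', '2024', '01', '15']
print(non_space_runs)  # ['Order', '42', 'shipped_on', '2024-01-15']
print(dot_match)       # 42
```

Note that the hyphen is not a word character, so \w+ breaks the date into three pieces, while \S+ keeps it whole.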
b) Quantifiers
* : Matches 0 or more occurrences.
+ : Matches 1 or more occurrences.
? : Matches 0 or 1 occurrence.
{n} : Matches exactly n occurrences.
{n,} : Matches n or more occurrences.
{n,m} : Matches between n and m occurrences.
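To see how the quantifiers differ, here is a minimal sketch that runs each one against the same made-up sample string.

```python
import re

text = "a ab abb abbb"

star = re.findall(r"ab*", text)         # "a" followed by zero or more "b"
plus = re.findall(r"ab+", text)         # "a" followed by one or more "b"
optional = re.findall(r"ab?", text)     # "a" followed by at most one "b"
exact = re.findall(r"ab{2}", text)      # "a" followed by exactly two "b"
at_least = re.findall(r"ab{2,}", text)  # "a" followed by two or more "b"
between = re.findall(r"ab{1,2}", text)  # "a" followed by one or two "b"

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(exact)     # ['abb', 'abb']
print(at_least)  # ['abb', 'abbb']
print(between)   # ['ab', 'abb', 'abb']
```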
c) Anchors
^ : Matches the start of the string.
$ : Matches the end of the string, or just before a newline at the very end of the string.
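A quick sketch of both anchors, using an illustrative sample string:

```python
import re

text = "hello world"

starts = re.search(r"^hello", text)        # "hello" sits at the start
not_start = re.search(r"^world", text)     # "world" is not at the start
ends = re.search(r"world$", text)          # "world" sits at the end
whole = re.search(r"^hello world$", text)  # the pattern must span the whole string

print(starts is not None)  # True
print(not_start)           # None
print(ends is not None)    # True
print(whole.group())       # hello world
```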
d) Character Classes
[abc] : Matches any one of the characters a, b, or c.
[^abc] : Matches any character except a, b, or c.
[a-z] : Matches any one lowercase letter in the range a to z.
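The three kinds of character class can be compared with a short sketch. The sample text below is made up for the example.

```python
import re

text = "cab 123 xyz"

abc_chars = re.findall(r"[abc]", text)   # each single a, b, or c
not_abc = re.findall(r"[^abc ]+", text)  # runs of anything except a, b, c, or space
lowercase = re.findall(r"[a-z]+", text)  # runs of lowercase letters

print(abc_chars)  # ['c', 'a', 'b']
print(not_abc)    # ['123', 'xyz']
print(lowercase)  # ['cab', 'xyz']
```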
e) Groups and Capturing
() : Groups part of the pattern and captures it.
(?:...) : Groups without capturing.
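Here is a minimal sketch of the difference between capturing and non-capturing groups, assuming a date in YYYY-MM-DD form as sample input.

```python
import re

text = "2024-01-15"

# Three capturing groups: each one is available afterwards.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year, month, day = m.groups()

# The year is grouped but not captured, so only the month is returned.
m2 = re.search(r"(?:\d{4})-(\d{2})", text)

print(year, month, day)  # 2024 01 15
print(m.group(1))        # 2024
print(m2.groups())       # ('01',)
```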
f) Alternation
| : Matches either the pattern before or after the |.
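Alternation can be sketched as follows. The sample sentences are illustrative. The second pattern wraps the alternatives in a non-capturing group so that only one letter varies.

```python
import re

pets = re.findall(r"cat|dog", "I have a cat, a dog, and a bird")

# Without the group, "gra|ey" would mean "gra" or "ey".
greys = re.findall(r"gr(?:a|e)y", "gray grey griy")

print(pets)   # ['cat', 'dog']
print(greys)  # ['gray', 'grey']
```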
4. Match Objects
When a match is found, a match object is returned. It has several useful methods:
group() : Returns the matched string.
start() : Returns the starting position of the match.
end() : Returns the position just after the last matched character.
span() : Returns a tuple (start, end).
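The match object methods can be explored with a short sketch like this one, using a made-up sample string:

```python
import re

m = re.search(r"\d+", "Order 42 shipped")

print(m.group())  # 42
print(m.start())  # 6
print(m.end())    # 8
print(m.span())   # (6, 8)
```

Slicing the original string with the span gives back the matched text, since end() points one past the last matched character.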
2. Importing the re Module
To use regular expressions in Python, you need to import the re module:
import re
3. Basic Regex Functions
The re module provides several functions to work with regex:
a) re.match()
Checks for a match only at the beginning of the string.
Returns a match object if successful, otherwise None.
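For example, a minimal sketch with an illustrative greeting string:

```python
import re

at_start = re.match(r"Hello", "Hello, world")
not_at_start = re.match(r"world", "Hello, world")

print(at_start.group())  # Hello
print(not_at_start)      # None
```

Because the result can be None, it is usually safest to check it before calling group().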
b) re.search()
Searches for a match anywhere in the string.
Returns a match object if successful, otherwise None.
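A small sketch, again with a made-up sample string, showing that re.search() finds a pattern that re.match() would miss:

```python
import re

found = re.search(r"world", "Hello, world")
missing = re.search(r"planet", "Hello, world")

print(found.group())  # world
print(found.start())  # 7
print(missing)        # None
```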
c) re.findall()
Finds all occurrences of the pattern in the string.
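Returns a list of all non-overlapping matches as strings. If the pattern contains capturing groups, the list holds the group contents instead (tuples when there is more than one group).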
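The sketch below shows re.findall() with and without capturing groups. The shopping-list text is illustrative only.

```python
import re

text = "3 apples, 12 pears, 7 plums"

numbers = re.findall(r"\d+", text)
pairs = re.findall(r"(\d+) (\w+)", text)

print(numbers)  # ['3', '12', '7']
print(pairs)    # [('3', 'apples'), ('12', 'pears'), ('7', 'plums')]
```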
d) re.finditer()
Similar to re.findall(), but returns an iterator of match objects.
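Since each item is a match object, positions are available as well as the text. A minimal sketch with a made-up sample string:

```python
import re

text = "3 apples, 12 pears"

# Collect each match together with its starting index.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(positions)  # [('3', 0), ('12', 10)]
```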
e) re.sub()
Replaces occurrences of the pattern with a replacement string and returns the result; the original string is not modified.
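A few typical uses of re.sub() are sketched below. The sample strings are illustrative. The second call uses backreferences (\1, \2) to reuse captured groups, and the third uses the count argument to limit the number of replacements.

```python
import re

masked = re.sub(r"\d", "#", "Card 1234")
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
once = re.sub(r"\s+", " ", "a   b    c", count=1)

print(masked)   # Card ####
print(swapped)  # world hello
print(once)     # a b    c
```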
f) re.split()
Splits the string by the occurrences of the pattern.
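For instance, a minimal sketch that splits on mixed delimiters, with made-up input strings:

```python
import re

# Split on a comma or semicolon, plus any whitespace that follows it.
parts = re.split(r"[,;]\s*", "a, b;c,  d")

# maxsplit limits how many splits are made.
limited = re.split(r"\s+", "one two three", maxsplit=1)

print(parts)    # ['a', 'b', 'c', 'd']
print(limited)  # ['one', 'two three']
```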
5. Flags
Flags modify the behavior of regex functions. Common flags include:
re.IGNORECASE (re.I) : Case-insensitive matching.
re.MULTILINE (re.M) : Allows ^ and $ to match the start/end of each line.
re.DOTALL (re.S) : Allows . to match newline characters.
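The effect of each flag can be seen in a short sketch like the one below, which uses an illustrative two-line string. Flags can be combined with the | operator.

```python
import re

text = "Hello\nworld"

case_insensitive = re.findall(r"hello", text, re.IGNORECASE)

no_flag_starts = re.findall(r"^\w+", text)            # ^ matches only at the string start
line_starts = re.findall(r"^\w+", text, re.MULTILINE)  # ^ matches at each line start

no_dotall = re.search(r"Hello.world", text)            # . does not match "\n"
dotall = re.search(r"Hello.world", text, re.DOTALL)    # . matches "\n" too

combined = re.findall(r"^WORLD$", text, re.I | re.M)

print(case_insensitive)    # ['Hello']
print(no_flag_starts)      # ['Hello']
print(line_starts)         # ['Hello', 'world']
print(no_dotall)           # None
print(dotall is not None)  # True
print(combined)            # ['world']
```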