How Regex Works in Python in Detail

Regular expressions (regex):

Regular expressions (regex) are a powerful tool for matching patterns in text. In Python, the re module provides support for working with regular expressions. Below is a detailed explanation of how regex works in Python:

1. Regex Patterns

Regex patterns are usually written as raw strings (r'...') so that backslashes reach the regex engine unchanged instead of being interpreted as Python string escapes.
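
To make the raw-string point concrete, here is a small sketch comparing a regular string with a raw string. The variable names are only illustrative.

```python
import re

escaped = "\\d+"   # regular string: the backslash has to be doubled
raw = r"\d+"       # raw string: the backslash is kept exactly as typed

print(escaped == raw)              # True
print(len("\n"), len(r"\n"))       # 1 2
print(re.findall(raw, "room 42"))  # ['42']
```

Both spellings produce the same pattern; the raw form is simply easier to read and harder to get wrong.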

a) Basic Patterns

`.` : Matches any character except a newline.

`\d` : Matches any decimal digit (0-9; for `str` patterns this also includes other Unicode digits unless `re.ASCII` is set).

`\D` : Matches any non-digit.

`\w` : Matches any word character (a-z, A-Z, 0-9, _; for `str` patterns this also includes Unicode letters and digits unless `re.ASCII` is set).

`\W` : Matches any non-word character.

`\s` : Matches any whitespace character (space, tab, newline, carriage return, and similar).

`\S` : Matches any non-whitespace character.
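
The basic patterns above can be tried out with a short sketch like the following. The sample text is made up for illustration.

```python
import re

sample = "Room 42, floor_3!"

digits = re.findall(r"\d", sample)       # single digits
words = re.findall(r"\w+", sample)       # runs of word characters
non_word = re.findall(r"\W", sample)     # everything that is not a word character
spaces = re.findall(r"\s", sample)       # whitespace characters
non_space = re.findall(r"\S+", sample)   # runs of non-whitespace
any_char = re.findall(r"4.", sample)     # "4" followed by any one character

print(digits)     # ['4', '2', '3']
print(words)      # ['Room', '42', 'floor_3']
print(non_word)   # [' ', ',', ' ', '!']
print(spaces)     # [' ', ' ']
print(non_space)  # ['Room', '42,', 'floor_3!']
print(any_char)   # ['42']
```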

b) Quantifiers

`*` : Matches 0 or more occurrences.

`+` : Matches 1 or more occurrences.

`?` : Matches 0 or 1 occurrence.

`{n}` : Matches exactly `n` occurrences.

`{n,}` : Matches `n` or more occurrences.

`{n,m}` : Matches between `n` and `m` occurrences (inclusive).
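
As a rough illustration of the quantifiers, the sketch below checks which of a few candidate strings each pattern matches in full. The candidate list and the use of `re.fullmatch()` are choices made for this example.

```python
import re

candidates = ["a", "ab", "abb", "abbb"]
patterns = [r"ab*", r"ab+", r"ab?", r"ab{2}", r"ab{2,}", r"ab{1,2}"]

# For each pattern, keep the candidates that it matches from start to end.
results = {p: [s for s in candidates if re.fullmatch(p, s)] for p in patterns}

for pattern, matched in results.items():
    print(f"{pattern:8} {matched}")

# ab*      ['a', 'ab', 'abb', 'abbb']
# ab+      ['ab', 'abb', 'abbb']
# ab?      ['a', 'ab']
# ab{2}    ['abb']
# ab{2,}   ['abb', 'abbb']
# ab{1,2}  ['ab', 'abb']
```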

c) Anchors

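`^` : Matches the start of the string (and, with `re.MULTILINE`, the start of each line).

`$` : Matches the end of the string, or just before a newline at the very end of the string (and, with `re.MULTILINE`, the end of each line).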
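
A quick sketch of how the anchors restrict where a pattern may match. `re.search()` is used so that only the anchors decide the position.

```python
import re

text = "hello world"

starts_with_hello = bool(re.search(r"^hello", text))
starts_with_world = bool(re.search(r"^world", text))
ends_with_world = bool(re.search(r"world$", text))
whole_string = bool(re.search(r"^hello world$", text))

print(starts_with_hello)  # True
print(starts_with_world)  # False
print(ends_with_world)    # True
print(whole_string)       # True
```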

d) Character Classes

`[abc]` : Matches any one of the characters `a`, `b`, or `c`.

`[^abc]` : Matches any character except `a`, `b`, or `c`.

`[a-z]` : Matches any character in the range `a` to `z`.
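
The three character-class forms can be compared on one sample string. This is a minimal sketch with made-up input.

```python
import re

text = "cab-xyz-123"

in_set = re.findall(r"[abc]", text)       # only a, b, or c
not_in_set = re.findall(r"[^abc]", text)  # anything except a, b, or c
lower_runs = re.findall(r"[a-z]+", text)  # runs of lowercase letters

print(in_set)      # ['c', 'a', 'b']
print(not_in_set)  # ['-', 'x', 'y', 'z', '-', '1', '2', '3']
print(lower_runs)  # ['cab', 'xyz']
```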

e) Groups and Capturing

`()` : Groups part of the pattern and captures it.

`(?:...)` : Groups without capturing.
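
The difference between capturing and non-capturing groups shows up in what the match object remembers. The sketch below uses an illustrative year-month string.

```python
import re

text = "Released 2024-05"

# Two capturing groups: year and month are both stored.
captured = re.search(r"(\d{4})-(\d{2})", text)
print(captured.group(0))  # 2024-05 (the whole match)
print(captured.group(1))  # 2024
print(captured.group(2))  # 05
print(captured.groups())  # ('2024', '05')

# The year is grouped but not captured, so only the month is stored.
partly = re.search(r"(?:\d{4})-(\d{2})", text)
print(partly.groups())    # ('05',)
```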

f) Alternation

`|` : Matches either the pattern before or after the `|`.
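
Alternation applies to everything on either side of the `|`, so a group is often used to limit its reach. A small sketch:

```python
import re

# Whole-pattern alternation: either "cat" or "dog".
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)   # ['cat', 'dog']

# Alternation limited to one letter by a non-capturing group.
greys = re.findall(r"gr(?:a|e)y", "gray and grey")
print(greys)  # ['gray', 'grey']
```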

4. Match Objects

When a match is found, a match object is returned. It has several useful methods:

`group()` : Returns the matched string.

`start()` : Returns the index where the match starts.

`end()` : Returns the index just past the end of the match.

`span()` : Returns the tuple `(start, end)`.

2. Importing the `re` Module

To use regular expressions in Python, you need to import the re module:

import re

3. Basic Regex Functions

The re module provides several functions to work with regex:

a) `re.match()`

Checks for a match only at the beginning of the string.

Returns a match object if successful, otherwise `None`.

result = re.match(r'hello', 'hello world')
if result:
    print("Match found!")
else:
    print("No match.")

Output:

Match found!

b) `re.search()`

Searches for a match anywhere in the string.

Returns a match object if successful, otherwise `None`.

result = re.search(r'world', 'hello world')
if result:
    print("Match found!")
else:
    print("No match.")

Output:

Match found!
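
The difference between `re.match()` and `re.search()` is easiest to see when both are given the same pattern and text. A minimal sketch:

```python
import re

text = "hello world"

at_start = re.match(r"world", text)   # "world" is not at the beginning
anywhere = re.search(r"world", text)  # found later in the string

print(at_start)         # None
print(anywhere.span())  # (6, 11)
```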

c) `re.findall()`

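Finds all non-overlapping occurrences of the pattern in the string.

Returns a list of all matches. If the pattern contains capturing groups, the list holds the captured groups instead of the whole matches.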

result = re.findall(r'\d+', '3 apples, 5 bananas, 10 cherries')
print(result)

Output:

['3', '5', '10']
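
When the pattern contains capturing groups, `re.findall()` returns the groups rather than the full matches. The sketch below reuses the same sample text to show both cases.

```python
import re

text = "3 apples, 5 bananas, 10 cherries"

# Two groups: each result is a tuple of (count, item).
pairs = re.findall(r"(\d+) (\w+)", text)
print(pairs)   # [('3', 'apples'), ('5', 'bananas'), ('10', 'cherries')]

# One group: each result is just that group's text.
counts = re.findall(r"(\d+) \w+", text)
print(counts)  # ['3', '5', '10']
```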

d) `re.finditer()`

Similar to `re.findall()`, but returns an iterator of match objects.

matches = re.finditer(r'\d+', '3 apples, 5 bananas, 10 cherries')
for match in matches:
    print(match.group())

Output:

3
5
10
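
Because `re.finditer()` yields match objects, positions are available as well as the matched text. A short sketch on the same input:

```python
import re

text = "3 apples, 5 bananas, 10 cherries"

# Collect the matched text together with its (start, end) span.
found = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]
print(found)  # [('3', (0, 1)), ('5', (10, 11)), ('10', (21, 23))]
```

The spans can be used to slice the original string, for example `text[21:23]` gives `'10'`.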

e) `re.sub()`

Replaces every non-overlapping occurrence of the pattern with a specified string and returns the new string.

result = re.sub(r'\d+', 'X', '3 apples, 5 bananas, 10 cherries')
print(result)

Output:

X apples, X bananas, X cherries
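
`re.sub()` also accepts a `count` argument to limit the number of replacements, and a function in place of the replacement string. The sketch below shows both; doubling the numbers is just an illustrative choice.

```python
import re

text = "3 apples, 5 bananas, 10 cherries"

# Replace only the first occurrence.
first_only = re.sub(r"\d+", "X", text, count=1)
print(first_only)  # X apples, 5 bananas, 10 cherries

# Compute each replacement from the match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)     # 6 apples, 10 bananas, 20 cherries
```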

f) `re.split()`

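Splits the string at each occurrence of the pattern and returns a list of the pieces. If the string starts or ends with a match, the list contains an empty string at that position.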

result = re.split(r'\d+', '3 apples, 5 bananas, 10 cherries')
print(result)

Output:

['', ' apples, ', ' bananas, ', ' cherries']
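
A more typical use of `re.split()` is splitting on a flexible separator. The sketch below splits on a comma followed by optional whitespace and also shows the `maxsplit` argument.

```python
import re

text = "3 apples, 5 bananas, 10 cherries"

items = re.split(r",\s*", text)
print(items)      # ['3 apples', '5 bananas', '10 cherries']

# Stop after the first split; the rest of the string is kept whole.
head_tail = re.split(r",\s*", text, maxsplit=1)
print(head_tail)  # ['3 apples', '5 bananas, 10 cherries']
```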

The match object methods (see 4. Match Objects) can be used on the result of `re.search()`:

result = re.search(r'\d+', '3 apples, 5 bananas, 10 cherries')
if result:
    print(f"Matched: {result.group()}, Start: {result.start()}, End: {result.end()}")

Output:

Matched: 3, Start: 0, End: 1

5. Flags

Flags modify the behavior of regex functions. Common flags include:

`re.IGNORECASE` (`re.I`) : Case-insensitive matching.

`re.MULTILINE` (`re.M`) : Allows `^` and `$` to match the start/end of each line.

`re.DOTALL` (`re.S`) : Allows `.` to match newline characters.

result = re.findall(r'hello', 'Hello world', re.IGNORECASE)
print(result)

Output:

['Hello']
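
The other two flags can be sketched the same way, and several flags can be combined with `|`. The two-line sample text is made up for this example.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the string.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'second']

# Without DOTALL, "." does not match the newline between the lines.
no_dotall = re.search(r"line.second", text)
with_dotall = re.search(r"line.second", text, re.DOTALL)
print(no_dotall)                  # None
print(repr(with_dotall.group()))  # 'line\nsecond'

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^SECOND", text, re.IGNORECASE | re.MULTILINE)
print(combined)          # ['second']
```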

Example: Extracting Email Addresses

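text = "Contact us at support@example.com or sales@example.org."
# Each dot in the domain must be followed by word characters,
# so the period that ends the sentence is not captured.
emails = re.findall(r'[\w.-]+@[\w-]+(?:\.[\w-]+)+', text)
print(emails)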

Output:

['support@example.com', 'sales@example.org']

Example: Validating a Phone Number

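def validate_phone(number):
    """Return True if number has the form NNN-NNN-NNNN."""
    pattern = r'\d{3}-\d{3}-\d{4}'
    # fullmatch() requires the entire string to match the pattern.
    return re.fullmatch(pattern, number) is not None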

print(validate_phone('123-456-7890'))  # True
print(validate_phone('123-4567'))      # False
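
One subtlety worth knowing when validating input: `$` also matches just before a newline at the end of the string, so a `^...$` pattern used with `re.match()` accepts a value with a trailing newline. The sketch below contrasts this with `re.fullmatch()`; the variable names are illustrative.

```python
import re

with_newline = '123-456-7890\n'

# $ matches before the final newline, so this still succeeds.
anchored = re.match(r'^\d{3}-\d{3}-\d{4}$', with_newline)
print(bool(anchored))  # True

# fullmatch() must consume the whole string, including the newline.
strict = re.fullmatch(r'\d{3}-\d{3}-\d{4}', with_newline)
print(bool(strict))    # False
```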

Example: Extracting Phone Numbers and Emails from a String

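import re

# Triple quotes allow the sample text to span several lines.
my_string = """Hello, my name is John, and you can reach me at 123-456-7890.
My office number is (987) 654-3210. If I'm unavailable, call my assistant at +1-800-555-0199.
We also have a support line: 800.123.4567.
support1@example1.com Our international number is +44 20 7946 0958.
You can also contact us at 999-888-7777 or (555) 123-4567.
If you prefer, send a text to 123 456 7890 or email us at support@example.com."""

# Optional country code, optional parentheses around the area code, and
# "-", "." or whitespace as separators. This targets 10-digit numbers, so
# other layouts (such as the +44 number above) may not be matched.
phone_pattern = r"\+?\d{0,3}[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"
# The optional leading separator can capture a space, so strip each match.
phones = [m.strip() for m in re.findall(phone_pattern, my_string)]
print(phones)

# Require word characters after each dot so a trailing period is not captured.
emails = re.findall(r'[\w.-]+@[\w-]+(?:\.[\w-]+)+', my_string)
print(emails)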
