What is Regex?
Regular expressions (regex) are sequences of characters that define search patterns. They are used to find, match, extract, and replace text.
Basic Characters
| Pattern | Description | Example |
|---|---|---|
| . | Any character except newline | a.c matches "abc", "a1c" |
| \d | Any digit (0-9) | \d\d matches "42" |
| \D | Any non-digit | \D+ matches "abc" |
| \w | Word character (a-z, A-Z, 0-9, _) | \w+ matches "hello_123" |
| \W | Non-word character | \W matches "@", "#" |
| \s | Whitespace (space, tab, newline) | \s+ matches " " |
| \S | Non-whitespace | \S+ matches "hello" |
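
As a quick illustration, the shorthand classes above can be tried in Python's `re` module. This is only a sketch: Python is an assumption here (the extension's own engine may differ in details such as Unicode handling), and the sample string is made up.

```python
import re

# Hypothetical sample text, not taken from any real page.
text = "Order 42: hello_123 @ home"

digit_runs = re.findall(r"\d+", text)   # runs of digits
word_runs = re.findall(r"\w+", text)    # runs of word characters
tokens = re.findall(r"\S+", text)       # runs of non-whitespace

# "." matches any single character except a newline (by default).
dot_matches = re.fullmatch(r"a.c", "a1c") is not None
dot_rejects_newline = re.fullmatch(r"a.c", "a\nc") is None

print(digit_runs, word_runs, tokens)
```

Note that `\w+` keeps `hello_123` together because the underscore counts as a word character, while `\S+` also keeps the colon attached to `42:`.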
Quantifiers
| Pattern | Description | Example |
|---|---|---|
| * | 0 or more | ab*c matches "ac", "abc", "abbc" |
| + | 1 or more | ab+c matches "abc", "abbc" |
| ? | 0 or 1 (optional) | colou?r matches "color", "colour" |
| {n} | Exactly n times | \d{4} matches "2025" |
| {n,} | n or more times | \d{2,} matches "42", "123" |
| {n,m} | Between n and m times | \d{2,4} matches "42", "123", "2025" |
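
The quantifier examples in the table can be checked with a small sketch. It assumes Python's `re` module and uses a hypothetical helper, `full`, that tests whether an entire string matches a pattern.

```python
import re

def full(pattern: str, s: str) -> bool:
    """Return True when the whole string matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

candidates = ["ac", "abc", "abbc"]
star = [s for s in candidates if full(r"ab*c", s)]   # zero or more "b"
plus = [s for s in candidates if full(r"ab+c", s)]   # at least one "b"

optional = [s for s in ["color", "colour", "colouur"] if full(r"colou?r", s)]
bounded = [s for s in ["4", "42", "2025", "20251"] if full(r"\d{2,4}", s)]

print(star, plus, optional, bounded)
```

`fullmatch` is used so that `\d{2,4}` rejects `20251`; a plain search would still find four digits inside the longer string.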
Anchors
| Pattern | Description | Example |
|---|---|---|
| ^ | Start of string (or of each line with the m flag) | ^Hello matches "Hello World" |
| $ | End of string (or of each line with the m flag) | World$ matches "Hello World" |
| \b | Word boundary | \bcat\b matches "cat" not "category" |
| \B | Non-word boundary | \Bcat matches the "cat" in "bobcat" but not in "category" |
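
Anchors match positions rather than characters, which is easiest to see by trying them. The sketch below assumes Python's `re` module; the sample sentence is invented for illustration.

```python
import re

greeting = "Hello World"
starts_with_hello = re.search(r"^Hello", greeting) is not None
ends_with_world = re.search(r"World$", greeting) is not None

# Hypothetical sample containing "cat" as a word, a prefix, and a suffix.
sample = "cat category bobcat"

# \b requires a word boundary on both sides, so only the standalone word matches.
whole_words = re.findall(r"\bcat\b", sample)

# \B requires that there is NO boundary before "cat", i.e. a word character
# must come immediately before it. Only "bobcat" qualifies.
inner_positions = [m.start() for m in re.finditer(r"\Bcat", sample)]

print(whole_words, inner_positions)
```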
Character Classes
| Pattern | Description | Example |
|---|---|---|
| [abc] | Match any of a, b, or c | [aeiou] matches vowels |
| [^abc] | Match any except a, b, or c | [^0-9] matches non-digits |
| [a-z] | Range: any lowercase letter | [a-zA-Z] matches any ASCII letter |
| [0-9] | Range: any digit | [0-9]+ matches numbers |
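
A short sketch of the character classes above, assuming Python's `re` module and a made-up input string:

```python
import re

# Hypothetical input for illustration.
word = "Regex 2025!"

vowels = re.findall(r"[aeiou]", word)       # lowercase vowels only
non_digits = re.findall(r"[^0-9]+", word)   # runs of anything that is not a digit
letters = re.findall(r"[a-zA-Z]+", word)    # runs of ASCII letters
numbers = re.findall(r"[0-9]+", word)       # runs of digits

print(vowels, non_digits, letters, numbers)
```

The negated class `[^0-9]` also matches spaces and punctuation, which is why the space after `Regex` and the trailing `!` show up in `non_digits`.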
Groups and Alternation
| Pattern | Description | Example |
|---|---|---|
| (abc) | Capturing group | (\d+)-(\d+) captures both numbers |
| (?:abc) | Non-capturing group | (?:https?://) groups without capturing |
| a\|b | Alternation (or) | cat\|dog matches either "cat" or "dog" |
| \1 | Backreference to group 1 | (\w)\1 matches "aa", "bb" |
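
To show how groups behave in practice, here is a minimal sketch assuming Python's `re` module. The sample strings and the `example.com` host are placeholders.

```python
import re

# Capturing groups: each pair of parentheses is numbered from 1.
m = re.search(r"(\d+)-(\d+)", "pages 10-25")
first, second = m.group(1), m.group(2)

# Non-capturing group: the scheme is grouped but not numbered,
# so the host is group 1.
m2 = re.search(r"(?:https?://)([\w.-]+)", "see https://example.com/docs")
host = m2.group(1)

# Alternation matches either side of the "|".
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Backreference: \1 must repeat whatever group 1 captured.
doubles = [m.group(0) for m in re.finditer(r"(\w)\1", "aa bc bb")]

print(first, second, host, pets, doubles)
```

`finditer` with `group(0)` is used for the backreference because `findall` returns only the captured group when a pattern contains one.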
Common Patterns
Email Address
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
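
As a sketch of the email pattern in use, assuming Python's `re` module and invented addresses:

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical text; the addresses are placeholders.
text = "Contact alice@example.com or bob.smith+news@mail.example.org."

emails = EMAIL.findall(text)
print(emails)
```

The sentence-ending period is not included in the second address, because the pattern must end with a dot followed by at least two letters. This pattern is a practical approximation and does not cover every address that mail standards allow.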
URL
https?://[\w.-]+(?:/[\w./-]*)?
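
The URL pattern can be exercised the same way. This sketch assumes Python's `re` module; the URLs are placeholders.

```python
import re

URL = re.compile(r"https?://[\w.-]+(?:/[\w./-]*)?")

# Hypothetical text for illustration.
text = "Docs: https://example.com/docs/index.html, mirror: http://test.org"
urls = URL.findall(text)

# The path class has no "?", "=", or "&", so query strings are cut off.
truncated = URL.findall("https://example.com/search?q=1")

print(urls, truncated)
```

If query strings or fragments matter for your data, the path part of the pattern would need to be extended.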
Phone Number (US)
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
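
A small sketch of the phone pattern, assuming Python's `re` module. The numbers are fictional, and stripping non-digits afterwards is just one possible way to normalize results.

```python
import re

PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

# Fictional sample numbers in several common formats.
samples = ["(555) 123-4567", "555.123.4567", "5551234567", "555-1234"]

matched = [s for s in samples if PHONE.fullmatch(s)]

# Normalize each match by removing everything that is not a digit.
digits_only = [re.sub(r"\D", "", s) for s in matched]

print(matched, digits_only)
```

The seven-digit `555-1234` is rejected because the pattern requires a three-digit area code.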
Date (YYYY-MM-DD)
\d{4}-\d{2}-\d{2}
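
The date pattern checks only the shape of the text, so it also accepts impossible dates such as `2025-13-45`. One way to filter those out, sketched here with Python's `re` and `datetime` modules (the helper name `valid_dates` is hypothetical):

```python
import re
from datetime import datetime

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def valid_dates(text: str) -> list[str]:
    """Return date-shaped matches that are also real calendar dates."""
    result = []
    for candidate in DATE.findall(text):
        try:
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue  # right shape, but not a real date
        result.append(candidate)
    return result

# Hypothetical sample text.
text = "Released 2025-01-15, not 2025-13-45."
shaped = DATE.findall(text)
real = valid_dates(text)
print(shaped, real)
```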
IP Address (IPv4)
\b(?:\d{1,3}\.){3}\d{1,3}\b
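
Like the date pattern, this one matches on shape alone: `999.1.1.1` fits even though each part of an IPv4 address must be between 0 and 255. A minimal sketch of a follow-up range check, assuming Python's `re` module and a hypothetical helper named `valid_ipv4`:

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def valid_ipv4(text: str) -> list[str]:
    """Return matches whose four parts are all in the range 0-255."""
    return [
        ip for ip in IPV4.findall(text)
        if all(int(part) <= 255 for part in ip.split("."))
    ]

# Hypothetical sample text.
text = "Hosts: 192.168.1.10, 999.1.1.1, 10.0.0.256"
shaped = IPV4.findall(text)
real = valid_ipv4(text)
print(shaped, real)
```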
Credit Card Number
\b(?:\d{4}[- ]?){3}\d{4}\b
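
A brief sketch of the card pattern, assuming Python's `re` module. The numbers below are well-known test values rather than real cards, and removing separators afterwards is only one possible cleanup step.

```python
import re

CARD = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")

# Test-style numbers for illustration only.
text = "Test cards: 4111 1111 1111 1111 and 5500-0000-0000-0004."

matches = CARD.findall(text)

# Strip spaces and hyphens so every result has the same 16-digit form.
normalized = [re.sub(r"[- ]", "", m) for m in matches]

print(matches, normalized)
```

The pattern only recognizes four groups of four digits; it does not check whether a number is a valid card.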
Flags
| Flag | Description |
|---|---|
| i | Case-insensitive matching |
| g | Global - find all matches |
| m | Multiline - ^ and $ match line starts/ends |
| s | Dotall - . matches newlines too |
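
The single-letter flags above follow the JavaScript style. For comparison, here is a sketch of the same ideas in Python's `re` module, which is an assumption and not necessarily what the extension uses. Python has no `g` flag; `re.findall` and `re.finditer` return all matches instead. The sample text is made up.

```python
import re

# Hypothetical three-line sample.
text = "Hello\nhello world\nHELLO"

# i: case-insensitive matching (re.I); findall plays the role of "g".
all_ci = re.findall(r"hello", text, flags=re.I)

# m: with re.M, ^ matches at the start of every line.
line_starts = re.findall(r"^hello", text, flags=re.I | re.M)
string_start_only = re.findall(r"^hello", text, flags=re.I)

# s: with re.S, "." also matches the newline between the first two lines.
with_dotall = re.search(r"Hello.hello", text, flags=re.S) is not None
without_dotall = re.search(r"Hello.hello", text) is not None

print(all_ci, line_starts, string_start_only, with_dotall, without_dotall)
```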
Tips for Using Regex Data Extractor
- Start simple and build complexity gradually
- Use the preview feature to test your patterns
- Escape special characters with backslash when matching literally
- Use non-greedy quantifiers (*?, +?) when needed
- Test with edge cases to ensure your pattern works correctly
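
Two of the tips above, non-greedy quantifiers and escaping special characters, are easier to see with an example. The sketch below assumes Python's `re` module; the HTML snippet and price string are invented.

```python
import re

# Hypothetical markup with two bold sections.
html = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as possible, spanning both tags.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first closing tag.
lazy = re.findall(r"<b>.*?</b>", html)

# Escaping: "$" and "." are special, so a backslash makes them literal.
price = re.search(r"\$5\.00", "Total: $5.00").group(0)

# Unescaped, "." matches any character, so "5.00" also matches "5x00".
loose = re.search(r"5.00", "5x00") is not None

# re.escape builds a literal pattern from arbitrary text.
literal = re.escape("a.c")
literal_only = (re.fullmatch(literal, "a.c") is not None
                and re.fullmatch(literal, "abc") is None)

print(greedy, lazy, price, loose, literal_only)
```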
With Regex Data Extractor, you can apply these patterns directly to any webpage and extract exactly the data you need. Happy extracting!