Regex Cheat Sheet: Complete Guide to Regular Expressions

Master regular expressions with this comprehensive cheat sheet. Learn patterns, syntax, language-specific implementations (Python, JavaScript, PHP, C#, Java, Go, Ruby), VS Code find/replace transformations, and 200+ practical examples. The ultimate regex resource for beginners and experts.

GUi Softworks
60 min read

What is Regex?

Regular expressions (regex or regexp) are sequences of characters that define search patterns. They are one of the most powerful tools for text processing, pattern matching, and data extraction.

Why Use Regular Expressions?

Regex is essential for:

  • Form validation (email, phone numbers, passwords)
  • Data extraction (parsing logs, scraping web pages)
  • Text processing (find and replace, formatting)
  • Code refactoring (renaming variables, updating syntax)
  • Input sanitization (security, preventing injection attacks)

How to Read This Cheat Sheet

This guide is organized into progressive sections:

  1. Fundamentals - Basic syntax and patterns
  2. Advanced Features - Lookarounds, named groups, Unicode
  3. Language-Specific - Examples in Python, JavaScript, PHP, C#, Java, Go, Ruby
  4. VS Code Regex - Find/replace transformations with 20+ examples
  5. Common Patterns - Copy-paste patterns for emails, URLs, dates, etc.
  6. Practical Use Cases - Real-world applications
  7. Troubleshooting - Common mistakes and performance tips

Each section includes:

  • Clear explanations
  • Visual examples
  • Code snippets you can copy
  • Pro tips and gotchas

Literal Characters

The simplest regex is a literal character sequence:

abc

Matches: "abc" in "The abc sequence"

Case Sensitivity

By default, regex is case-sensitive:

  • Hello matches "Hello" but NOT "hello"
  • Use the i flag for case-insensitive matching: /hello/i

Special Characters (Metacharacters)

These 14 characters have special meaning in regex and must be escaped with \ to match literally:

. ^ $ * + ? { } [ ] \ | ( )

Examples:

  • \. matches a literal dot (period)
  • \$ matches a dollar sign
  • \( matches a literal parenthesis

Character      Escape   Example   Matches
. (dot)        \.       3\.14     "3.14"
$ (dollar)     \$       \$100     "$100"
* (asterisk)   \*       a\*b      "a*b"
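
When the text to match comes from a variable, escaping by hand is error-prone. The sketch below uses Python's re.escape to build a literal pattern. The helper name find_literal and the sample string are illustrative assumptions, not part of any library.

```python
import re

def find_literal(needle, haystack):
    """Return every occurrence of needle, treating it as plain text."""
    # re.escape backslash-escapes any metacharacters found in needle.
    return re.findall(re.escape(needle), haystack)

sample = "pi is 3.14, not 3x14"

unescaped = re.findall("3.14", sample)  # the dot matches any character
escaped = find_literal("3.14", sample)  # the dot matches only "."

print(unescaped)  # ['3.14', '3x14']
print(escaped)    # ['3.14']
```

Most of the languages covered in this guide ship a comparable helper, so hand-written escaping is mainly needed for patterns typed directly into source code or an editor.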

Character Classes

Predefined Character Classes

Pattern   Description                    Equivalent       Example   Matches
\d        Any digit                      [0-9]            \d\d      "42"
\D        Any non-digit                  [^0-9]           \D+       "abc"
\w        Word character                 [a-zA-Z0-9_]     \w+       "hello_123"
\W        Non-word character             [^a-zA-Z0-9_]    \W        "@", "#"
\s        Whitespace                     [ \t\n\r\f\v]    \s+       " " (spaces)
\S        Non-whitespace                 [^ \t\n\r\f\v]   \S+       "hello"
.         Any character except newline   -                a.c       "abc", "a1c"

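Pro Tip: Whether \w includes Unicode letters depends on the engine. In JavaScript, \w is ASCII-only, so it does not match "é" or "字"; use \p{L} with the u flag instead. In Python 3, \w already matches Unicode letters in str patterns (unless re.ASCII is set), while \p{L} requires the third-party regex module.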

Custom Character Classes

Pattern    Description                   Example       Matches
[abc]      Match any of a, b, or c       [aeiou]       Vowels: "a", "e", "i", "o", "u"
[^abc]     Match any except a, b, or c   [^0-9]        Non-digits
[a-z]      Range: lowercase letters      [a-z]+        "hello"
[A-Z]      Range: uppercase letters      [A-Z]+        "HELLO"
[0-9]      Range: digits                 [0-9]{4}      "2025"
[a-zA-Z]   Combined: all letters         [a-zA-Z0-9]   Alphanumeric

Examples:

[aeiou]       → Matches any vowel
[^aeiou]      → Matches any character that is not a lowercase vowel (consonants, but also digits, spaces, and punctuation)
[a-z0-9]      → Matches lowercase letters and digits
[a-zA-Z0-9_]  → Same as \w (word characters)
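
A negated class is broader than it first looks, which a quick experiment makes visible. This is a small Python sketch; the sample text and variable names are made up for illustration.

```python
import re

text = "Regex 101: fun!"

vowels = re.findall(r"[aeiou]", text)
# A negated class matches EVERYTHING outside the set, not just consonants.
not_vowels = re.findall(r"[^aeiou]", text)
# To match consonants only, list the consonant ranges explicitly.
consonants = re.findall(r"[b-df-hj-np-tv-z]", text, re.IGNORECASE)

print(vowels)      # ['e', 'e', 'u']
print(not_vowels)  # ['R', 'g', 'x', ' ', '1', '0', '1', ':', ' ', 'f', 'n', '!']
print(consonants)  # ['R', 'g', 'x', 'f', 'n']
```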

Quantifiers

Quantifiers specify how many times a pattern should match.

Basic Quantifiers

Pattern   Description             Example   Matches
*         0 or more               ab*c      "ac", "abc", "abbc", "abbbc"
+         1 or more               ab+c      "abc", "abbc" (NOT "ac")
?         0 or 1 (optional)       colou?r   "color", "colour"
{n}       Exactly n times         \d{4}     "2025" (exactly 4 digits)
{n,}      n or more times         \d{2,}    "42", "123", "9999"
{n,m}     Between n and m times   \d{2,4}   "42", "123", "2025"
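
The quantifiers in the table can be tried out directly. The following Python sketch uses an invented sample string; the \b anchors are added here so that a long run of digits is not matched piecemeal.

```python
import re

sample = "7 42 123 2025 99999"

# \b on both sides makes each run of digits match as a whole or not at all.
two_to_four = re.findall(r"\b\d{2,4}\b", sample)
exactly_four = re.findall(r"\b\d{4}\b", sample)
# ? makes the "u" optional, but allows at most one of them.
optional_u = re.findall(r"colou?r", "color colour colouur")

print(two_to_four)   # ['42', '123', '2025']
print(exactly_four)  # ['2025']
print(optional_u)    # ['color', 'colour']
```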

Greedy vs. Lazy Quantifiers

Greedy (default): Matches as much as possible

<.*>      → Matches: "<div>Hello</div>" (entire string)

Lazy (non-greedy): Matches as little as possible (add ? after quantifier)

<.*?>     → Matches: "<div>" and "</div>" separately

Greedy   Lazy     Description
*        *?       0 or more (lazy)
+        +?       1 or more (lazy)
?        ??       0 or 1 (lazy)
{n,m}    {n,m}?   Between n and m (lazy)

Example:

Text: "Hello" and "World"

  • ".*" produces one match running from the first quote to the last: "Hello" and "World" (greedy)
  • ".*?" produces two separate matches: "Hello" and then "World" (lazy)
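
The difference is easy to observe in code. This Python sketch reuses the <div> example from this section; the third pattern, a negated character class, is an additional option that often expresses the same intent.

```python
import re

html = "<div>Hello</div>"

greedy = re.findall(r"<.*>", html)      # .* runs to the LAST ">"
lazy = re.findall(r"<.*?>", html)       # .*? stops at the FIRST ">"
negated = re.findall(r"<[^>]*>", html)  # never crosses a ">" at all

print(greedy)   # ['<div>Hello</div>']
print(lazy)     # ['<div>', '</div>']
print(negated)  # ['<div>', '</div>']
```

The negated class tends to be the more predictable choice because it cannot run past the delimiter in the first place.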

Anchors & Boundaries

Anchors match positions, not characters.

Pattern   Description                            Example   Matches
^         Start of string/line                   ^Hello    "Hello World" (at start)
$         End of string/line                     World$    "Hello World" (at end)
\b        Word boundary                          \bcat\b   "cat" in "The cat sat" (NOT "category")
\B        Non-word boundary                      \Bcat     "cat" in "tomcat" (NOT "category", where "cat" starts the word)
\A        Start of string (not line)             \AHello   Only matches if "Hello" is at very start
\z        End of string (not line)               World\z   Only matches if "World" is at very end
\Z        End of string (before final newline)   World\Z   Matches "World" or "World\n"

Note: JavaScript does not support \A, \z, or \Z. In Python's re module, \Z matches only at the very end of the string.

Examples:

^cat$         → Matches: "cat" (entire line is "cat")
\bcat\b       → Matches: "cat" in "the cat sat" (whole word)
\Bcat         → Matches: "cat" in "tomcat" (NOT at a word boundary); no match in "category"

Multiline Mode (m flag):

  • Without m: ^ and $ match start/end of entire string
  • With m: ^ and $ match start/end of each line
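
Because anchors match positions, it helps to look at where matches start. The Python sketch below uses a made-up three-line string and reports start offsets for the boundary patterns.

```python
import re

text = "cat\ncategory\ntomcat"

# Without MULTILINE, ^ and $ refer to the whole string, which is not just "cat".
whole_string = re.findall(r"^cat$", text)
# With MULTILINE, ^ and $ also match at every line break.
per_line = re.findall(r"^cat$", text, re.MULTILINE)

# \b requires a word boundary before "cat"; \B requires there to be none.
at_boundary = [m.start() for m in re.finditer(r"\bcat", text)]
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole_string)  # []
print(per_line)      # ['cat']
print(at_boundary)   # [0, 4]  ("cat" and the start of "category")
print(inside_word)   # [16]    (the "cat" inside "tomcat")
```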

Groups & Alternation

Capturing Groups

Capturing groups (...) remember the matched text:

(\d+)-(\d+)   → Matches: "123-456"
                Group 1: "123"
                Group 2: "456"

Backreferences (reuse captured groups):

(\w)\1        → Matches: "aa", "bb", "cc" (repeated character)
(\w+) \1      → Matches: "hello hello" (repeated word)
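
A common use of backreferences is spotting accidentally doubled words. This is a minimal Python sketch with an invented sentence; the \b anchors are an addition so that partial words are not compared.

```python
import re

text = "this is is a test test"
doubled_word = re.compile(r"\b(\w+) \1\b")

# Each match is a word followed by a space and the same word again.
found = [m.group(0) for m in doubled_word.finditer(text)]
# In the replacement string, \1 refers to the captured word.
fixed = doubled_word.sub(r"\1", text)

print(found)  # ['is is', 'test test']
print(fixed)  # this is a test
```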

Non-Capturing Groups

Use (?:...) when you need grouping but don't need to capture:

(?:https?://)  → Groups "http://" or "https://" without capturing

Why use non-capturing?

  • Slightly lower overhead (the engine does not store the matched text)
  • Cleaner backreferences (numbered groups only count capturing groups)

Alternation (OR)

Use | for "match this OR that":

cat|dog       → Matches: "cat" or "dog"
gr(a|e)y      → Matches: "gray" or "grey"

Examples:

(Mrs|Mr|Ms)\.?  → Matches: "Mr.", "Ms.", "Mrs." (list "Mrs" before "Mr"; otherwise the engine stops at "Mr" inside "Mrs.")
https?://       → Matches: "http://" or "https://"

Lookaround Assertions

Lookarounds are zero-width assertions that match a position (like anchors) but with conditions.

Positive Lookahead (?=...)

Matches if the pattern ahead matches (but doesn't consume it):

\d+(?=px)     → Matches: "10" in "10px" (NOT the "px" part)

Use case: Password validation

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&]).{8,}$

Breakdown:

  • (?=.*[A-Z]) - Must contain uppercase
  • (?=.*[a-z]) - Must contain lowercase
  • (?=.*\d) - Must contain digit
  • (?=.*[@$!%*?&]) - Must contain special char
  • .{8,} - At least 8 characters
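
Wrapped in a function, the password pattern above becomes a simple check. This Python sketch uses the pattern exactly as given; the function name is_strong and the sample passwords are illustrative assumptions.

```python
import re

# Each lookahead scans from the start without consuming anything,
# so all four conditions are tested against the same string.
STRONG = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&]).{8,}$")

def is_strong(password):
    """Return True if the password satisfies every rule in the pattern."""
    return STRONG.search(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False (no uppercase letter)
print(is_strong("Sh0rt!a"))     # False (only 7 characters)
print(is_strong("NoDigits!!"))  # False (no digit)
```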

Negative Lookahead (?!...)

Matches if the pattern ahead does NOT match:

\d+(?!\d|px)  → Matches: "10" in "10em" (nothing in "10px"; the \d alternative stops the engine from backtracking to "1")

Use case: Exclude certain words

\b(?!test)\w+  → Matches words that DON'T start with "test"

Positive Lookbehind (?<=...)

Matches if the pattern behind matches:

(?<=\$)\d+    → Matches: "100" in "$100" (NOT the "$" part)

Use case: Extract prices

(?<=Price: \$)\d+\.\d{2}  → Matches: "29.99" in "Price: $29.99"

Negative Lookbehind (?<!...)

Matches if the pattern behind does NOT match:

(?<![\$\d])\d+    → Matches: "100" in "100" but nothing in "$100" (a plain (?<!\$)\d+ would still match "00")

Summary Table:

Type                  Syntax     Description                      Example
Positive Lookahead    (?=...)    Matches if followed by...        q(?=u) matches "q" in "queen"
Negative Lookahead    (?!...)    Matches if NOT followed by...    q(?!u) matches "q" in "iraq"
Positive Lookbehind   (?<=...)   Matches if preceded by...        (?<=\$)\d+ matches "10" in "$10"
Negative Lookbehind   (?<!...)   Matches if NOT preceded by...    (?<!\$)\d+ matches "10" in "10"
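
Lookbehinds are handy for separating prices from other numbers. The following Python sketch uses an invented sample string and also shows why a negative lookbehind usually needs to exclude digits as well as the symbol.

```python
import re

text = "Items: $29.99, 15 units, $100"

# Positive lookbehind: digits that directly follow a dollar sign.
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", text)

# Negative lookbehind: numbers NOT preceded by "$", a digit, or a dot.
# Excluding digits and dots keeps the engine from matching the tail of a price.
non_prices = re.findall(r"(?<![$\d.])\d+", text)

# The naive version still finds a match one character further along.
naive = re.findall(r"(?<!\$)\d+", "$100")

print(prices)      # ['29.99', '100']
print(non_prices)  # ['15']
print(naive)       # ['00']
```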

Named Capturing Groups

Named groups (?<name>...) make regex more readable:

JavaScript:

const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2025-01-17'.match(dateRegex);

console.log(match.groups.year);  // "2025"
console.log(match.groups.month); // "01"
console.log(match.groups.day);   // "17"

Python:

import re

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2025-01-17')

print(match.group('year'))   # "2025"
print(match.group('month'))  # "01"
print(match.group('day'))    # "17"

C#:

var pattern = @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})";
var match = Regex.Match("2025-01-17", pattern);

Console.WriteLine(match.Groups["year"].Value);  // "2025"

Atomic Groups & Possessive Quantifiers

Atomic Groups (?>...)

Once matched, the group does not backtrack. Prevents catastrophic backtracking:

(?>\d+)bar    → Matches: "123bar" (fast)

Without atomic group:

\d+bar        → Tries: "123bar", "12bar", "1bar" (slow on mismatch)

Possessive Quantifiers

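Greedy   Possessive   Description
*        *+           0 or more (no backtracking)
+        ++           1 or more (no backtracking)
?        ?+           0 or 1 (no backtracking)

Use case: Prevent catastrophic backtracking on complex patterns.

Support varies: PCRE (PHP), Java, and Ruby support both atomic groups and possessive quantifiers; .NET supports atomic groups only; Python added both in version 3.11; JavaScript and Go support neither.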


Unicode Support

Modern regex engines support Unicode categories and scripts.

Unicode Categories \p{...}

JavaScript (ES2018+):

const letters = /\p{L}+/u;     // Any letter (any language)
const numbers = /\p{N}+/u;     // Any number
const currency = /\p{Sc}/u;    // Currency symbols

Python:

import regex  # Note: requires 'regex' module, not 're'

letters = regex.compile(r'\p{L}+')

Common Unicode Categories:

Category   Description       Example
\p{L}      Letter            "a", "字", "א"
\p{N}      Number            "1", "①", "½"
\p{S}      Symbol            "$", "©", "♥"
\p{Sc}     Currency symbol   "$", "€", "¥"
\p{P}      Punctuation       ".", "!", "?"
\p{Z}      Separator         Space, no-break space (tab is a control character, not a separator)

Unicode Scripts:

/\p{Script=Greek}/u     → Matches Greek letters: "α", "β", "γ"
/\p{Script=Cyrillic}/u  → Matches Cyrillic: "а", "б", "в"
/\p{Script=Han}/u       → Matches Chinese characters

Negation:

/\P{L}+/u   → Matches anything that is NOT a letter
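
Python's built-in re module does not understand \p{...}, but a similar effect for letters can be approximated with standard syntax. The sketch below relies on \w being Unicode-aware for str patterns and subtracts digits and the underscore; treat it as an approximation of \p{L}, not an exact equivalent.

```python
import re

# [^\W\d_] reads as: a word character that is neither a digit nor "_".
LETTERS = re.compile(r"[^\W\d_]+")

sample = "caf\u00e9 \u4f60\u597d abc123"  # "café 你好 abc123"

unicode_letters = LETTERS.findall(sample)
# With re.ASCII, \w shrinks back to [a-zA-Z0-9_], so only ASCII letters remain.
ascii_letters = re.findall(r"[^\W\d_]+", sample, re.ASCII)

print(unicode_letters)  # ['café', '你好', 'abc']
print(ascii_letters)    # ['caf', 'abc']
```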

Modifiers & Flags

Flags change how regex patterns are interpreted.

Flag   Name               Description                        Example
i      Case-insensitive   Ignore case                        /hello/i matches "Hello"
g      Global             Find all matches                   /cat/g finds all "cat"
m      Multiline          ^ and $ match line starts/ends     /^hello/m
s      Dotall             . matches newlines too             /a.b/s matches "a\nb"
u      Unicode            Enable Unicode features            /\p{L}+/u
x      Extended           Ignore whitespace (free-spacing)   Allows comments
y      Sticky             Match at exact position            JavaScript only

Examples:

Case-insensitive (i):

/hello/i.test('HELLO')   // true

Global (g):

'cat dog cat'.match(/cat/g)   // ["cat", "cat"]

Multiline (m):

const text = 'Line 1\nLine 2';
/^Line 2/m.test(text)   // true (without 'm': false)

Dotall (s):

/a.b/s.test('a\nb')   // true (without 's': false)

Inline Modifiers

Apply flags to part of the pattern:

(?i)hello      → Case-insensitive "hello"
(?-i)WORLD     → Case-sensitive "WORLD"
(?i:hello)     → Only "hello" is case-insensitive
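
Support for inline modifiers differs between engines, so it is worth testing them in the target language. This Python sketch (Python 3.6 or later for the scoped forms) shows a scoped flag and a scoped flag removal; the sample strings are invented.

```python
import re

# Only the group is case-insensitive; " WORLD" must match exactly.
scoped = re.compile(r"(?i:hello) WORLD")

# A leading (?i) applies to the whole pattern; (?-i:...) switches it off locally.
mixed = re.compile(r"(?i)hello (?-i:WORLD)")

print(bool(scoped.search("HeLLo WORLD")))  # True
print(bool(scoped.search("hello world")))  # False
print(bool(mixed.search("HELLO WORLD")))   # True
print(bool(mixed.search("hello world")))   # False
```

In Python, a global flag such as (?i) has to appear at the very start of the pattern, while the scoped (?i:...) and (?-i:...) forms can appear anywhere.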

Conditional Patterns

Syntax: (?(condition)true|false)

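Example: Match quoted or unquoted strings

^("|')?[^"'\r\n]*(?(1)\1)$

Breakdown:

  • ^ and $ - Anchor to the whole string (without them, a partial or empty match always succeeds)
  • ("|')? - Optionally capture opening quote
  • [^"'\r\n]* - Match content
  • (?(1)\1) - If group 1 matched (opening quote), match same closing quote

Results:

  • "hello" ✅
  • 'world' ✅
  • test ✅ (no quotes)
  • "mixed' ❌ (mismatched quotes)

Conditionals are supported in PCRE (PHP), Python, .NET, and Ruby, but not in JavaScript, Java, or Go.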
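
Python's re module supports this conditional syntax, so the quoted-string example can be checked directly. In this sketch the pattern is anchored with ^ and $ so that the whole input has to be consistent; the function name is hypothetical.

```python
import re

# (?(1)\1) means: if group 1 took part in the match, require the same quote again.
QUOTED = re.compile(r'''^("|')?[^"'\r\n]*(?(1)\1)$''')

def is_consistently_quoted(value):
    """True for unquoted text or text wrapped in one matching pair of quotes."""
    return QUOTED.match(value) is not None

print(is_consistently_quoted('"hello"'))   # True
print(is_consistently_quoted("'world'"))   # True
print(is_consistently_quoted("test"))      # True
print(is_consistently_quoted("\"mixed'"))  # False
```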

Comments in Regex

Inline Comments (?# comment)

\d{3}(?# area code)-\d{3}(?# prefix)-\d{4}(?# line number)

Free-Spacing Mode (x flag)

Ignore whitespace and allow comments:

(?x)
  \d{3}     # area code
  -         # separator
  \d{3}     # prefix
  -         # separator
  \d{4}     # line number

Much more readable for complex patterns!

Language-Specific Examples

This section demonstrates how to use regex in 7 popular programming languages. Each language has its own regex API, but the core pattern syntax remains mostly consistent.

JavaScript / Node.js

Creating Regex Patterns

// Literal notation (most common)
const pattern1 = /\d{3}-\d{4}/;

// Constructor (when pattern is dynamic)
const pattern2 = new RegExp('\\d{3}-\\d{4}');
// Note: Backslashes must be escaped in strings

// With flags
const pattern3 = /hello/gi;  // Global, case-insensitive

String Methods

// .match() - Find matches
const text = 'Contact: 123-4567 or 987-6543';
const matches = text.match(/\d{3}-\d{4}/g);
console.log(matches);  // ["123-4567", "987-6543"]

// .matchAll() - Get all matches with groups (ES2020)
const emailPattern = /([\w.-]+)@([\w.-]+\.[a-z]{2,})/gi;
const emails = 'admin@example.com, user@test.org';
for (const match of emails.matchAll(emailPattern)) {
  console.log(`User: ${match[1]}, Domain: ${match[2]}`);
}
// User: admin, Domain: example.com
// User: user, Domain: test.org

// .search() - Find position of first match
const pos = 'Hello World'.search(/World/);
console.log(pos);  // 6

// .replace() - Replace matches
const phone = '(123) 456-7890';
const cleaned = phone.replace(/[^\d]/g, '');
console.log(cleaned);  // "1234567890"

// .replaceAll() - Replace all matches (ES2021)
const text2 = 'cat dog cat';
const result = text2.replaceAll(/cat/g, 'bird');
console.log(result);  // "bird dog bird"

// .split() - Split by pattern
const csv = 'apple,banana, orange , grape';
const fruits = csv.split(/\s*,\s*/);
console.log(fruits);  // ["apple", "banana", "orange", "grape"]

RegExp Methods

// .test() - Returns boolean
const isEmail = /^[\w.-]+@[\w.-]+\.[a-z]{2,}$/i;
console.log(isEmail.test('user@example.com'));  // true

// .exec() - Returns match details (or null)
const pattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = pattern.exec('Date: 2025-01-17');
if (match) {
  console.log(match[0]);  // "2025-01-17" (full match)
  console.log(match[1]);  // "2025" (group 1)
  console.log(match[2]);  // "01" (group 2)
  console.log(match[3]);  // "17" (group 3)
}

Named Groups (ES2018+)

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2025-01-17'.match(pattern);

console.log(match.groups.year);   // "2025"
console.log(match.groups.month);  // "01"
console.log(match.groups.day);    // "17"

// Named backreferences
const dupeWord = /\b(?<word>\w+)\s+\k<word>\b/i;
console.log(dupeWord.test('hello hello'));  // true

Unicode Support (ES2018+)

// Match any letter (including accented, Chinese, Arabic, etc.)
const letters = /\p{L}+/u;
console.log(letters.test('café'));    // true
console.log(letters.test('你好'));    // true

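// Match emoji
// Note: \p{Emoji} also matches digits, "#", and "*"; \p{Extended_Pictographic} is stricter
const emoji = /\p{Extended_Pictographic}/u;
console.log(emoji.test('Hello 👋'));  // true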

Python

The re Module

import re

# Compile pattern (recommended for reuse)
pattern = re.compile(r'\d{3}-\d{4}')

# Or use directly
re.search(r'\d{3}-\d{4}', 'Call 123-4567')

Core Functions

import re

# re.search() - Find first match
match = re.search(r'\d{3}-\d{4}', 'Contact: 123-4567 or 987-6543')
if match:
    print(match.group())  # "123-4567"
    print(match.start())  # 9 (position)
    print(match.end())    # 17

# re.match() - Match at START of string
match = re.match(r'\d+', '123 Main St')
print(match.group() if match else None)  # "123"

match = re.match(r'\d+', 'Main St 123')
print(match)  # None (doesn't start with digit)

# re.fullmatch() - Match ENTIRE string
result = re.fullmatch(r'\d{3}-\d{4}', '123-4567')
print(bool(result))  # True

result = re.fullmatch(r'\d{3}-\d{4}', 'Call 123-4567')
print(bool(result))  # False (extra text)

# re.findall() - Find all matches (returns list)
text = 'Prices: $10, $25, $100'
prices = re.findall(r'\$(\d+)', text)
print(prices)  # ['10', '25', '100']

# re.finditer() - Find all matches (returns iterator)
for match in re.finditer(r'\$(\d+)', text):
    print(f'Found ${match.group(1)} at position {match.start()}')
# Found $10 at position 8
# Found $25 at position 13
# Found $100 at position 18

# re.sub() - Replace matches
phone = '(123) 456-7890'
cleaned = re.sub(r'[^\d]', '', phone)
print(cleaned)  # "1234567890"

# re.split() - Split by pattern
csv = 'apple,banana, orange , grape'
fruits = re.split(r'\s*,\s*', csv)
print(fruits)  # ['apple', 'banana', 'orange', 'grape']

Groups and Named Groups

import re

# Numbered groups
pattern = r'(\d{4})-(\d{2})-(\d{2})'
match = re.search(pattern, 'Date: 2025-01-17')
if match:
    print(match.group(0))  # "2025-01-17" (full match)
    print(match.group(1))  # "2025"
    print(match.group(2))  # "01"
    print(match.group(3))  # "17"
    print(match.groups())  # ('2025', '01', '17')

# Named groups (?P<name>...)
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2025-01-17')
if match:
    print(match.group('year'))   # "2025"
    print(match.group('month'))  # "01"
    print(match.group('day'))    # "17"
    print(match.groupdict())     # {'year': '2025', 'month': '01', 'day': '17'}

Flags

import re

# Case-insensitive
re.search(r'hello', 'HELLO', re.IGNORECASE)  # or re.I

# Multiline (^ and $ match line starts/ends)
re.search(r'^Line 2', 'Line 1\nLine 2', re.MULTILINE)  # or re.M

# Dotall (. matches newlines)
re.search(r'a.b', 'a\nb', re.DOTALL)  # or re.S

# Verbose (free-spacing mode with comments)
pattern = re.compile(r'''
    \d{3}     # area code
    -         # separator
    \d{3}     # prefix
    -         # separator
    \d{4}     # line number
''', re.VERBOSE)  # or re.X

# Combine flags with |
pattern = re.compile(r'hello', re.IGNORECASE | re.MULTILINE)

Replacement with Functions

import re

# Use function for dynamic replacements
def double_number(match):
    num = int(match.group())
    return str(num * 2)

text = 'I have 5 apples and 10 oranges'
result = re.sub(r'\d+', double_number, text)
print(result)  # "I have 10 apples and 20 oranges"

# With named groups
def format_name(match):
    return f"{match.group('last').upper()}, {match.group('first')}"

pattern = r'(?P<first>\w+)\s+(?P<last>\w+)'
text = 'John Doe'
result = re.sub(pattern, format_name, text)
print(result)  # "DOE, John"

PHP

PCRE Functions

<?php

// preg_match() - Find first match
$pattern = '/\d{3}-\d{4}/';
$text = 'Contact: 123-4567 or 987-6543';

if (preg_match($pattern, $text, $matches)) {
    echo $matches[0];  // "123-4567"
}

// preg_match_all() - Find all matches
preg_match_all('/\d{3}-\d{4}/', $text, $matches);
print_r($matches[0]);  // ["123-4567", "987-6543"]

// preg_replace() - Replace matches
$phone = '(123) 456-7890';
$cleaned = preg_replace('/[^\d]/', '', $phone);
echo $cleaned;  // "1234567890"

// preg_split() - Split by pattern
$csv = 'apple,banana, orange , grape';
$fruits = preg_split('/\s*,\s*/', $csv);
print_r($fruits);  // ["apple", "banana", "orange", "grape"]

// preg_grep() - Filter array by pattern
$words = ['apple', 'banana', 'apricot', 'orange'];
$aWords = preg_grep('/^a/', $words);
print_r($aWords);  // ["apple", "apricot"]
?>

Named Groups

<?php
$pattern = '/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/';
$text = '2025-01-17';

if (preg_match($pattern, $text, $matches)) {
    echo $matches['year'];   // "2025"
    echo $matches['month'];  // "01"
    echo $matches['day'];    // "17"
}
?>

Modifiers (Flags)

<?php
// i - Case-insensitive
preg_match('/hello/i', 'HELLO');  // Match

// m - Multiline
preg_match('/^Line 2/m', "Line 1\nLine 2");  // Match

// s - Dotall (. matches newlines)
preg_match('/a.b/s', "a\nb");  // Match

// x - Free-spacing (ignore whitespace)
$pattern = '/
    \d{3}     # area code
    -         # separator
    \d{3}     # prefix
    -         # separator
    \d{4}     # line number
/x';

// u - UTF-8 support
preg_match('/\w+/u', 'café');  // Match (includes é)

// Combine modifiers
preg_match('/hello/imu', $text);
?>

Replacement with Callbacks

<?php
$text = 'I have 5 apples and 10 oranges';

$result = preg_replace_callback('/\d+/', function($matches) {
    return (int)$matches[0] * 2;
}, $text);

echo $result;  // "I have 10 apples and 20 oranges"
?>

C# (.NET)

Regex Class

using System;
using System.Text.RegularExpressions;

// Static methods (simple use)
string text = "Contact: 123-4567 or 987-6543";
Match match = Regex.Match(text, @"\d{3}-\d{4}");
if (match.Success)
{
    Console.WriteLine(match.Value);  // "123-4567"
}

// Find all matches
MatchCollection matches = Regex.Matches(text, @"\d{3}-\d{4}");
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
// Output:
// 123-4567
// 987-6543

// Replace
string phone = "(123) 456-7890";
string cleaned = Regex.Replace(phone, @"[^\d]", "");
Console.WriteLine(cleaned);  // "1234567890"

// Split
string csv = "apple,banana, orange , grape";
string[] fruits = Regex.Split(csv, @"\s*,\s*");
// ["apple", "banana", "orange", "grape"]

Compiled Regex (Better Performance)

using System.Text.RegularExpressions;

// Compile for reuse (much faster for repeated use)
Regex pattern = new Regex(@"\d{3}-\d{4}", RegexOptions.Compiled);

string text = "Contact: 123-4567";
Match match = pattern.Match(text);
if (match.Success)
{
    Console.WriteLine(match.Value);
}

RegexOptions (Flags)

using System.Text.RegularExpressions;

// Case-insensitive
Regex.IsMatch("HELLO", "hello", RegexOptions.IgnoreCase);

// Multiline
Regex.Match("Line 1\nLine 2", "^Line 2", RegexOptions.Multiline);

// Singleline (. matches newlines)
Regex.Match("a\nb", "a.b", RegexOptions.Singleline);

// Compiled (better performance)
var digits = new Regex(@"\d+", RegexOptions.Compiled);

// Combine options (the static Regex.Match overload takes the pattern as a string)
var opts = RegexOptions.IgnoreCase | RegexOptions.Multiline;
Regex.Match("Line 1\nline 2", "^LINE 2", opts);  // Match

Named Groups

using System;
using System.Text.RegularExpressions;

string pattern = @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})";
Match match = Regex.Match("2025-01-17", pattern);

if (match.Success)
{
    Console.WriteLine(match.Groups["year"].Value);   // "2025"
    Console.WriteLine(match.Groups["month"].Value);  // "01"
    Console.WriteLine(match.Groups["day"].Value);    // "17"
}

Replacement with MatchEvaluator

using System;
using System.Text.RegularExpressions;

string text = "I have 5 apples and 10 oranges";

string result = Regex.Replace(text, @"\d+", match =>
{
    int num = int.Parse(match.Value);
    return (num * 2).ToString();
});

Console.WriteLine(result);  // "I have 10 apples and 20 oranges"

Java

Pattern and Matcher Classes

import java.util.regex.Pattern;
import java.util.regex.Matcher;

// Compile pattern
Pattern pattern = Pattern.compile("\\d{3}-\\d{4}");
String text = "Contact: 123-4567 or 987-6543";

// Create matcher
Matcher matcher = pattern.matcher(text);

// Find first match
if (matcher.find()) {
    System.out.println(matcher.group());  // "123-4567"
}

// Find all matches
matcher.reset();  // Reset to start
while (matcher.find()) {
    System.out.println(matcher.group());
}
// Output:
// 123-4567
// 987-6543

String Methods

// matches() - Check if ENTIRE string matches
boolean isPhone = "123-4567".matches("\\d{3}-\\d{4}");
System.out.println(isPhone);  // true

// replaceAll() - Replace all matches
String phone = "(123) 456-7890";
String cleaned = phone.replaceAll("[^\\d]", "");
System.out.println(cleaned);  // "1234567890"

// replaceFirst() - Replace first match
String text = "cat dog cat";
String result = text.replaceFirst("cat", "bird");
System.out.println(result);  // "bird dog cat"

// split() - Split by pattern
String csv = "apple,banana, orange , grape";
String[] fruits = csv.split("\\s*,\\s*");
// ["apple", "banana", "orange", "grape"]

Pattern Flags

import java.util.regex.Pattern;

// Case-insensitive
Pattern pattern = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);

// Multiline
Pattern.compile("^Line 2", Pattern.MULTILINE);

// Dotall (. matches newlines)
Pattern.compile("a.b", Pattern.DOTALL);

// Comments (free-spacing)
Pattern.compile("""
    \\d{3}     # area code
    -          # separator
    \\d{3}     # prefix
    -          # separator
    \\d{4}     # line number
    """, Pattern.COMMENTS);

// Combine flags
int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
Pattern.compile("pattern", flags);

Named Groups

import java.util.regex.Pattern;
import java.util.regex.Matcher;

Pattern pattern = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher matcher = pattern.matcher("2025-01-17");

if (matcher.find()) {
    System.out.println(matcher.group("year"));   // "2025"
    System.out.println(matcher.group("month"));  // "01"
    System.out.println(matcher.group("day"));    // "17"
}

Advanced Replacement

import java.util.regex.Pattern;
import java.util.regex.Matcher;

String text = "I have 5 apples and 10 oranges";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(text);

StringBuffer result = new StringBuffer();
while (matcher.find()) {
    int num = Integer.parseInt(matcher.group());
    matcher.appendReplacement(result, String.valueOf(num * 2));
}
matcher.appendTail(result);

System.out.println(result);  // "I have 10 apples and 20 oranges"

Go (Golang)

The regexp Package

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Compile pattern
    pattern := regexp.MustCompile(`\d{3}-\d{4}`)
    text := "Contact: 123-4567 or 987-6543"

    // Find first match
    match := pattern.FindString(text)
    fmt.Println(match)  // "123-4567"

    // Find all matches
    matches := pattern.FindAllString(text, -1)
    fmt.Println(matches)  // [123-4567 987-6543]

    // Check if matches
    isMatch := pattern.MatchString("123-4567")
    fmt.Println(isMatch)  // true

    // Replace all
    phone := "(123) 456-7890"
    cleaned := regexp.MustCompile(`[^\d]`).ReplaceAllString(phone, "")
    fmt.Println(cleaned)  // "1234567890"

    // Split
    csv := "apple,banana, orange , grape"
    fruits := regexp.MustCompile(`\s*,\s*`).Split(csv, -1)
    fmt.Println(fruits)  // [apple banana orange grape]
}

Submatches (Groups)

package main

import (
    "fmt"
    "regexp"
)

func main() {
    pattern := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    text := "Date: 2025-01-17"

    // FindStringSubmatch returns [full, group1, group2, ...]
    matches := pattern.FindStringSubmatch(text)
    if matches != nil {
        fmt.Println(matches[0])  // "2025-01-17" (full match)
        fmt.Println(matches[1])  // "2025" (group 1)
        fmt.Println(matches[2])  // "01" (group 2)
        fmt.Println(matches[3])  // "17" (group 3)
    }

    // FindAllStringSubmatch for all matches
    text2 := "Dates: 2025-01-17 and 2024-12-31"
    allMatches := pattern.FindAllStringSubmatch(text2, -1)
    for _, match := range allMatches {
        fmt.Printf("Year: %s, Month: %s, Day: %s\n", match[1], match[2], match[3])
    }
    // Year: 2025, Month: 01, Day: 17
    // Year: 2024, Month: 12, Day: 31
}

Named Groups

package main

import (
    "fmt"
    "regexp"
)

func main() {
    pattern := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
    text := "2025-01-17"

    match := pattern.FindStringSubmatch(text)
    if match != nil {
        // Get named group indices
        names := pattern.SubexpNames()
        result := make(map[string]string)
        for i, name := range names {
            if i != 0 && name != "" {
                result[name] = match[i]
            }
        }
        fmt.Println(result["year"])   // "2025"
        fmt.Println(result["month"])  // "01"
        fmt.Println(result["day"])    // "17"
    }
}

Replacement with Functions

package main

import (
    "fmt"
    "regexp"
    "strconv"
)

func main() {
    text := "I have 5 apples and 10 oranges"
    pattern := regexp.MustCompile(`\d+`)

    result := pattern.ReplaceAllStringFunc(text, func(s string) string {
        num, _ := strconv.Atoi(s)
        return strconv.Itoa(num * 2)
    })

    fmt.Println(result)  // "I have 10 apples and 20 oranges"
}

Ruby

Regex Literals

# Literal notation
pattern = /\d{3}-\d{4}/

# With flags
pattern_ci = /hello/i  # Case-insensitive
pattern_dotall = /a.b/m  # m makes . match newlines (^ and $ always match at line breaks in Ruby)

# Constructor (for dynamic patterns)
pattern = Regexp.new('\d{3}-\d{4}')

String Methods

text = 'Contact: 123-4567 or 987-6543'

# match() - Returns MatchData or nil
match = text.match(/\d{3}-\d{4}/)
if match
  puts match[0]  # "123-4567"
end

# scan() - Find all matches
matches = text.scan(/\d{3}-\d{4}/)
p matches  # ["123-4567", "987-6543"]

# =~ operator - Returns index of first match
index = text =~ /\d{3}-\d{4}/
puts index  # 9

# sub() - Replace first match
result = 'cat dog cat'.sub(/cat/, 'bird')
puts result  # "bird dog cat"

# gsub() - Replace all matches
phone = '(123) 456-7890'
cleaned = phone.gsub(/[^\d]/, '')
puts cleaned  # "1234567890"

# split() - Split by pattern
csv = 'apple,banana, orange , grape'
fruits = csv.split(/\s*,\s*/)
p fruits  # ["apple", "banana", "orange", "grape"]

Capture Groups

pattern = /(\d{4})-(\d{2})-(\d{2})/
match = '2025-01-17'.match(pattern)

if match
  puts match[0]  # "2025-01-17" (full match)
  puts match[1]  # "2025" (group 1)
  puts match[2]  # "01" (group 2)
  puts match[3]  # "17" (group 3)
end

Named Groups

pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
match = '2025-01-17'.match(pattern)

if match
  puts match[:year]   # "2025"
  puts match[:month]  # "01"
  puts match[:day]    # "17"
end

Replacement with Blocks

text = 'I have 5 apples and 10 oranges'

result = text.gsub(/\d+/) { |num| (num.to_i * 2).to_s }
puts result  # "I have 10 apples and 20 oranges"

# With named groups
pattern = /(?<first>\w+)\s+(?<last>\w+)/
text = 'John Doe'

result = text.gsub(pattern) do |match|
  m = Regexp.last_match
  "#{m[:last].upcase}, #{m[:first]}"
end
puts result  # "DOE, John"

Flags

# i - Case-insensitive
/hello/i.match('HELLO')  # Match

# m - Multiline (. matches newlines)
/a.b/m.match("a\nb")  # Match

# x - Free-spacing (ignore whitespace)
pattern = /
  \d{3}     # area code
  -         # separator
  \d{3}     # prefix
  -         # separator
  \d{4}     # line number
/x

# o - Perform #{...} interpolation only once (the pattern is built a single time)
pattern = /\d+/o

VS Code Regex

Visual Studio Code's Find and Replace (Ctrl+H on Windows/Linux, Cmd+Option+F on Mac) supports regex with powerful transformation capabilities. This section shows 28 practical examples developers use every day.

Accessing Find & Replace

Keyboard Shortcuts:

  • Find: Ctrl+F (Windows/Linux) / Cmd+F (Mac)
  • Replace: Ctrl+H (Windows/Linux) / Cmd+Option+F (Mac)
  • Enable Regex: Click the .* button or press Alt+R (Cmd+Option+R on Mac)

Tips:

  • Use Ctrl+Alt+Enter (Cmd+Option+Enter on Mac) to replace all
  • Preview matches before replacing (they are highlighted in the editor)
  • Use F3 / Shift+F3 to navigate between matches

Case Transformations

VS Code supports special replacement sequences for case conversion:

Sequence   Effect                               Example
\l         Lowercase next character             \l$1
\u         Uppercase next character             \u$1
\L         Lowercase all following characters   \L$1
\U         Uppercase all following characters   \U$1
\E         End case transformation              \U$1\E$2

Example 1: Capitalize First Letter

Find:

\b(\w)(\w*)

Replace:

\u$1$2

Before:

hello world

After:

Hello World
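
Outside VS Code, most regex APIs have no \u or \U replacement sequences, so the same transformation is usually done with a replacement function. This Python sketch reuses the Find pattern from Example 1; the function name capitalize_words is an illustrative assumption.

```python
import re

def capitalize_words(text):
    """Uppercase the first character of every word, leaving the rest untouched."""
    # Group 1 is the first character of the word, group 2 is the remainder.
    return re.sub(
        r"\b(\w)(\w*)",
        lambda m: m.group(1).upper() + m.group(2),
        text,
    )

print(capitalize_words("hello world"))  # Hello World
```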

Common Patterns

This section provides 25+ ready-to-use regex patterns.

Email Validation

^[\w.-]+@[\w.-]+\.[a-z]{2,}$

Phone Numbers (US)

^(\+?1)?[-.\s]?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$

Dates (ISO 8601)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

URLs

^https?://(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b

IPv4 Addresses

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

Hex Colors

^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$

Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Username (3-16 chars)

^[a-zA-Z0-9_-]{3,16}$

UUID v4

^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
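
Before pasting a pattern into production code, it is worth running it against a few known-good and known-bad inputs. The Python sketch below copies four of the patterns above verbatim; the dictionary keys and the helper is_valid are hypothetical names chosen for this example.

```python
import re

# Patterns copied from this section; the names are illustrative only.
PATTERNS = {
    "iso_date": re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"),
    "ipv4": re.compile(
        r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$"
    ),
    "hex_color": re.compile(r"^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$"),
    "uuid4": re.compile(
        r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
    ),
}

def is_valid(kind, value):
    """Check value against the named pattern; the whole string must match."""
    return PATTERNS[kind].fullmatch(value) is not None

print(is_valid("iso_date", "2025-01-17"))  # True
print(is_valid("iso_date", "2025-13-01"))  # False (month 13)
print(is_valid("iso_date", "2025-02-31"))  # True  (shape only, not calendar-aware)
print(is_valid("ipv4", "192.168.1.1"))     # True
print(is_valid("ipv4", "256.1.1.1"))       # False
print(is_valid("hex_color", "#fff"))       # True
print(is_valid("uuid4", "123e4567-e89b-42d3-a456-426614174000"))  # True
```

The February 31 case is a reminder that these patterns validate format only; semantic checks such as real calendar dates still belong in application code.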

And 15+ more patterns with full documentation!

Practical Use Cases

Real-world regex applications in development.

Form Validation

const validators = {
  email: /^[\w.-]+@[\w.-]+\.[a-z]{2,}$/i,
  phone: /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/
};

Data Extraction

Extract emails from text:

import re
emails = re.findall(r'[\w.-]+@[\w.-]+\.[a-z]{2,}', text)

Log Parsing

pattern = r'(?P<ip>[\d.]+).+(?P<status>\d{3})'
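
A greedy .+ between the two groups consumes as much of the line as it can, so the status group can end up holding the last three digits on the line instead of the status code. The sketch below contrasts that with a stricter variant. It assumes an access log in Common Log Format; the sample line and the name STRICT are made up for illustration.

```python
import re

# Assumed sample line in Common Log Format (not taken from a real log).
line = '127.0.0.1 - - [17/Jan/2025:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 2326'

LOOSE = re.compile(r"(?P<ip>[\d.]+).+(?P<status>\d{3})")
# Stricter sketch: a dotted-quad IP at the start, then the three digits
# that follow the closing quote of the request field.
STRICT = re.compile(r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) .*?" (?P<status>\d{3}) ')

loose = LOOSE.search(line).groupdict()
strict = STRICT.search(line).groupdict()

print(loose)   # {'ip': '127.0.0.1', 'status': '326'}  (tail of the byte count)
print(strict)  # {'ip': '127.0.0.1', 'status': '200'}
```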

Code Refactoring

Convert old API calls in VS Code:

Find: apiClient\.get\('([^']+)'\)
Replace: fetch('$1').then(r => r.json())
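
The same refactoring can be scripted when it has to run across many files. This Python sketch applies the Find pattern above with re.sub; note that Python writes the group reference as \1 where VS Code uses $1. The function name modernize is hypothetical.

```python
import re

OLD_CALL = re.compile(r"apiClient\.get\('([^']+)'\)")

def modernize(source):
    """Rewrite apiClient.get('...') calls into fetch('...').then(...) calls."""
    # \1 inserts the URL captured by the first group.
    return OLD_CALL.sub(r"fetch('\1').then(r => r.json())", source)

before = "const users = apiClient.get('/users');"
after = modernize(before)

print(after)  # const users = fetch('/users').then(r => r.json());
```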

Common Mistakes

1. Forgetting to Escape Special Characters

❌ Wrong: file.txt
✅ Correct: file\.txt

2. Greedy vs Lazy

❌ Greedy: <.*> matches entire <div>text</div>
✅ Lazy: <.*?> matches <div> and </div> separately

3. Not Using Anchors

/\d{3}/ matches "123" in "abc123def"
/^\d{3}$/ only matches exactly "123"

Performance Tips

  1. Use specific character classes instead of .
  2. Anchor patterns when possible
  3. Avoid nested quantifiers (catastrophic backtracking)
  4. Use atomic groups for performance
  5. Compile patterns for reuse

Debugging

  1. Test on regex101.com
  2. Use verbose mode with comments
  3. Break complex patterns into parts
  4. Test edge cases

Online Tools

Regex Testers

  • regex101.com - Best tester with explanations
  • regexr.com - Visual regex builder
  • regexpal.com - Simple, fast testing

Visualizers

  • debuggex.com - Railroad diagrams
  • regexper.com - Visual tool

Learning Resources

  • regexone.com - Interactive lessons
  • regexlearn.com - Step-by-step guide
  • regular-expressions.info - Documentation

IDE Extensions

  • Regex Previewer (VS Code)
  • Regex Tester (VS Code)

Character Classes

Pattern   Matches
\d        Digit [0-9]
\w        Word character [a-zA-Z0-9_]
\s        Whitespace
.         Any character except newline

Quantifiers

Pattern   Meaning
*         0 or more
+         1 or more
?         0 or 1
{n}       Exactly n

Anchors

Pattern   Meaning
^         Start of line
$         End of line
\b        Word boundary

Flags

Flag   Meaning
i      Case-insensitive
g      Global
m      Multiline
s      Dotall

General Questions

Q: What is the difference between greedy and lazy quantifiers?

A: Greedy matches as much as possible. Lazy (*?, +?) matches as little as possible.

Q: How do I match a literal dot?

A: Escape it with backslash: \.

Q: What is catastrophic backtracking?

A: When regex tries many combinations, causing slowness. Avoid nested quantifiers like (a+)+.

Q: Can regex validate email perfectly?

A: No. Use regex for basic format, then verify via email.

Q: How do I match across multiple lines?

A: Use the s flag, or use [\s\S]* instead of .*.

VS Code Specific

Q: How do I replace with uppercase in VS Code?

A: Use \u (uppercase next), \U (uppercase all).

Q: Can I use regex in VS Code file search?

A: Yes! Press Ctrl+Shift+F and enable regex (Alt+R).

Regex Data Extractor Chrome Extension

Our Regex Data Extractor helps you extract data from web pages using patterns from this guide.

Key Features

  • Pattern Library: Pre-built patterns
  • Live Testing: Test regex on any webpage
  • Multi-Format Export: CSV, JSON, Excel, PDF
  • Batch Extraction: Extract from multiple pages

Example: Extract Emails

  1. Install Regex Data Extractor
  2. Navigate to any webpage
  3. Click the extension icon
  4. Enter pattern: [\w.-]+@[\w.-]+\.[a-z]{2,}
  5. Click "Extract"
  6. Export to CSV/JSON

Example: Extract Prices

Pattern: \$([0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2})
Captures: $1,234.56, $99.99

Pro Tips

  • Save frequently used patterns
  • Use named groups for structured data
  • Test patterns first
  • Export for data analysis


Last updated: January 16, 2025