Regex cheat sheet
Character classes, quantifiers, groups, lookaround, and the recipes worth stealing — plus the one bug (catastrophic backtracking) that has caused real production outages.
Regular expressions are dense by design: almost every character matters. This is the reference we keep open when writing or reading one. Examples use the JavaScript / PCRE flavor; most patterns work unchanged in Python, Ruby, and Java. Go's RE2-based engine is the main exception: it rejects lookaround and backreferences.
Character classes
| Pattern | Matches |
|---|---|
| `.` | Any character except newline (add the `s` flag to include newlines). |
| `\d` | Digit (0–9). |
| `\D` | Non-digit. |
| `\w` | Word character: letter, digit, or underscore. |
| `\W` | Non-word character. |
| `\s` | Whitespace: space, tab, newline, form feed. |
| `\S` | Non-whitespace. |
| `[abc]` | Any of a, b, or c. |
| `[^abc]` | Anything except a, b, or c. |
| `[a-z]` | Range: lowercase a through z. |
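A quick way to check the table is to run each class against a sample string. The sketch below assumes Node.js or a modern browser; the `sample` string and the variable names are made up for illustration.

```javascript
// Illustrative input; any string works.
const sample = "Order #42: 3 apples,\t7 pears_x";

// \d matches a single digit; the g flag makes match() return every hit.
const digits = sample.match(/\d/g);

// \w+ grabs runs of letters, digits, and underscores.
const words = sample.match(/\w+/g);

// \s counts spaces and the tab alike.
const whitespaceCount = sample.match(/\s/g).length;

// A negated class matches what is NOT listed: here, delete every non-vowel.
const vowelsOnly = "banana".replace(/[^aeiou]/g, "");

// The dot skips newlines unless the s flag is set.
const dotPlain = /a.b/.test("a\nb");
const dotAll = /a.b/s.test("a\nb");

console.log({ digits, words, whitespaceCount, vowelsOnly, dotPlain, dotAll });
```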
Quantifiers
| Pattern | Meaning |
|---|---|
| `*` | 0 or more (greedy). |
| `+` | 1 or more (greedy). |
| `?` | 0 or 1. |
| `{n}` | Exactly n. |
| `{n,}` | n or more. |
| `{n,m}` | Between n and m. |
| `*?` `+?` `??` | Lazy versions: match as few characters as possible. |
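The greedy versus lazy distinction is easiest to see side by side. A minimal sketch, with a made-up `quoted` string:

```javascript
// Illustrative input with two quoted words.
const quoted = 'say "one" and "two"';

// Greedy .* runs to the LAST closing quote.
const greedy = quoted.match(/".*"/)[0];

// Lazy .*? stops at the FIRST closing quote.
const lazy = quoted.match(/".*?"/)[0];

// {n} demands an exact count when the pattern is anchored.
const exactYear = /^\d{4}$/.test("2024");
const tooLong = /^\d{4}$/.test("20245");

// {n,m} takes as many as it can; the lazy form {n,m}? takes as few as it can.
const greedyRuns = "aaaaa".match(/a{2,3}/g);
const lazyRuns = "aaaaa".match(/a{2,3}?/g);

// ? makes the preceding token optional.
const bothSpellings = ["color", "colour"].every((w) => /^colou?r$/.test(w));

console.log({ greedy, lazy, exactYear, tooLong, greedyRuns, lazyRuns, bothSpellings });
```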
Anchors and boundaries
- `^`: start of string (or start of line with the `m` flag).
- `$`: end of string (or end of line with the `m` flag).
- `\b`: word boundary (between `\w` and non-`\w`).
- `\B`: not a word boundary.
- `\A` / `\z`: absolute string start/end (some flavors; JS doesn't support them).
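To see how anchors and boundaries change what matches, the sketch below runs the same word through four patterns. The three-line `text` value is invented for the example.

```javascript
// Three lines: "cat", "concat", "cat food".
const text = "cat\nconcat\ncat food";

// Without the m flag, ^ only matches at the very start of the string.
const startOnly = text.match(/^cat/g);

// With the m flag, ^ also matches after each line break.
const perLine = text.match(/^cat/gm);

// \b requires a word boundary on both sides, so "concat" is skipped.
const wholeWord = text.match(/\bcat\b/g);

// \B requires the opposite: "cat" must be glued to a preceding word character.
const insideWord = text.match(/\Bcat/g);

console.log({ startOnly, perLine, wholeWord, insideWord });
```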
Groups
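- `(abc)`: capturing group, referenced as `$1` in a replacement string (`\1` inside the pattern).
- `(?:abc)`: non-capturing group. Doesn't consume a numbered slot and avoids the cost of storing a capture.
- `(?<name>abc)`: named group, referenced as `$<name>` in a replacement string (`\k<name>` inside the pattern).
- `|`: alternation. `cat|dog` matches either.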
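A short sketch of each group type in use. The date pattern and sample strings are illustrative, not a recommended validator.

```javascript
// Named groups make the match self-describing.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const found = "Released 2024-03-15".match(datePattern);
const year = found.groups.year; // same value as found[1]

// $<name> refers to a named group inside a replacement string.
const european = "2024-03-15".replace(datePattern, "$<day>/$<month>/$<year>");

// $1 and $2 refer to numbered groups inside a replacement string.
const lastFirst = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");

// \1 is a backreference inside the pattern: it finds a doubled word.
const hasDoubledWord = /\b(\w+) \1\b/.test("it is is broken");

// A non-capturing group adds no numbered slot: the match array has one entry.
const slots = "catdog".match(/(?:cat|dog)+/).length;

console.log({ year, european, lastFirst, hasDoubledWord, slots });
```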
Lookaround
Zero-width assertions — they check a condition without consuming characters.
| Pattern | Meaning |
|---|---|
| `(?=abc)` | Positive lookahead: followed by abc. |
| `(?!abc)` | Negative lookahead: not followed by abc. |
| `(?<=abc)` | Positive lookbehind: preceded by abc. |
| `(?<!abc)` | Negative lookbehind: not preceded by abc. |
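The four assertions can be sketched with a few small tasks. This assumes an engine with lookbehind support (Node.js and current browsers have it; older Safari versions did not). All sample strings and file names are made up.

```javascript
// Positive + negative lookahead: put a comma before each group of three
// digits that runs to the end of the number.
const withCommas = "1234567".replace(/\B(?=(\d{3})+(?!\d))/g, ",");

const line = "cost $40, tax $3, 12 items";

// Positive lookbehind: numbers preceded by a dollar sign.
const prices = line.match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers NOT preceded by a dollar sign.
const counts = line.match(/(?<!\$)\b\d+/g);

// Negative lookahead at the start: .js files that are not minified.
const files = ["app.js", "app.min.js", "readme.md"];
const sources = files.filter((f) => /^(?!.*\.min\.js$).*\.js$/.test(f));

console.log({ withCommas, prices, counts, sources });
```

None of the assertions consume input, which is why the comma example can insert text between digits without deleting any of them.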
Flags (JavaScript)
- `g`: global; find every match, not just the first.
- `i`: case-insensitive.
- `m`: multiline; `^` and `$` also match at line breaks.
- `s`: dotall; `.` matches newline.
- `u`: full Unicode mode. Use it.
- `y`: sticky; match at `lastIndex` only.
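A minimal sketch of how the flags change results. The inputs are arbitrary.

```javascript
const text = "Ab ab AB";

// No flags: first case-sensitive match only.
const first = text.match(/ab/)[0];

// g + i: every match, ignoring case.
const all = text.match(/ab/gi);

// matchAll requires the g flag and exposes the index of each match.
const positions = [...text.matchAll(/ab/gi)].map((m) => m.index);

// u: without it, the dot sees an emoji as two UTF-16 code units.
const emojiWithoutU = /^.$/.test("😀");
const emojiWithU = /^.$/u.test("😀");

// y: the match must start exactly at lastIndex, with no scanning ahead.
const sticky = /\d+/y;
sticky.lastIndex = 3;
const stickyAtThree = sticky.test("abc123");
sticky.lastIndex = 0;
const stickyAtZero = sticky.test("abc123");

console.log({ first, all, positions, emojiWithoutU, emojiWithU, stickyAtThree, stickyAtZero });
```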
Recipes worth stealing
| What | Pattern |
|---|---|
| Trim whitespace | ^\s+|\s+$ |
| Collapse whitespace runs | \s+ → " " (replace globally with a single space) |
| Split CSV line (simple) | ,(?=(?:[^"]*"[^"]*")*[^"]*$) — but use a CSV parser for real data. |
| ISO 8601 date-time | ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$ |
| UUID (any version) | ^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$ (add the i flag to accept uppercase hex) |
| Semver core | ^\d+\.\d+\.\d+ |
| US-style phone | ^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$ |
| Hex color | ^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$ |
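The recipes can be dropped into JavaScript as regex literals. The sketch below collects a few of them in a hypothetical `recipes` object (the name and the sample inputs are assumptions for the example) and runs them once.

```javascript
// Patterns copied from the table above; the object name is illustrative.
const recipes = {
  isoDateTime: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/,
  uuid: /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i,
  semverCore: /^\d+\.\d+\.\d+/,
  usPhone: /^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$/,
  hexColor: /^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/,
};

// Trim and collapse both need the g flag to touch every occurrence.
const trimmed = "  hello  ".replace(/^\s+|\s+$/g, "");
const collapsed = "a   b \t c".replace(/\s+/g, " ");

// The CSV split only breaks on commas followed by an even number of quotes.
const fields = 'a,"b,c",d'.split(/,(?=(?:[^"]*"[^"]*")*[^"]*$)/);

console.log(trimmed, collapsed, fields);
console.log(recipes.isoDateTime.test("2024-03-15T12:30:00Z"));
console.log(recipes.usPhone.test("(555) 123-4567"));
```

Note that the split keeps the surrounding quotes on quoted fields, one more reason to reach for a CSV parser on real data.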
Things you should NOT use regex for
- Parsing HTML. Use a real parser. Every time.
- Validating email addresses precisely. The RFC 5321 grammar is 200+ lines. Accept `^[^@\s]+@[^@\s]+\.[^@\s]+$` and send a confirmation email.
- Parsing JSON / YAML / XML. Use a parser.
- Extracting fields from source code. Use the language's AST.
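Two of these points can be sketched in a few lines: the loose email check from the list, and a case where a naive regex misreads JSON that the built-in parser handles. The `raw` document and variable names are invented for the example.

```javascript
// Loose email shape check; real validation is the confirmation email.
const looseEmail = /^[^@\s]+@[^@\s]+\.[^@\s]+$/;
const emailChecks = ["ann@example.com", "ann@example", "ann lee@example.com"].map((s) =>
  looseEmail.test(s)
);

// JSON with an escaped quote inside a string value.
const raw = '{"name": "Ann \\"The Boss\\" Lee", "age": 41}';

// Naive regex: stops at the first quote character, escaped or not.
const viaRegex = raw.match(/"name":\s*"([^"]*)"/)[1];

// Parser: understands escapes.
const viaParser = JSON.parse(raw).name;

console.log({ emailChecks, viaRegex, viaParser });
```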
Catastrophic backtracking — the one bug that will bite you
Patterns like `(a+)+` or `(a|a)+` can take exponential time on adversarial input, because a backtracking engine tries every way of dividing the input between the inner and outer repetition before it gives up. This is the class of bug behind real production outages (Cloudflare 2019, Stack Exchange 2016). Avoid nested quantifiers on overlapping character classes; use possessive quantifiers or atomic groups where your engine supports them (JavaScript has neither); test regexes on hostile inputs before shipping.
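The blow-up can be observed safely on short inputs. The sketch below compares a vulnerable pattern with two rewrites: a single quantifier, and a lookahead-plus-backreference construction that is commonly used to emulate an atomic group in JavaScript. The `timeMs` helper and the input sizes are assumptions chosen to keep the demo fast; timings vary by machine and engine.

```javascript
// Vulnerable: the inner a+ and the outer + can split a run of "a" in
// exponentially many ways before the final "!" forces a failure.
const vulnerable = /^(a+)+$/;

// Same set of matching strings with one quantifier: linear time.
const linear = /^a+$/;

// Atomic-group emulation: the lookahead captures the whole run, \1 consumes
// it, and JavaScript never backtracks into a completed lookahead.
const atomicLike = /^(?:(?=(a+))\1)+$/;

// Hypothetical helper: wall-clock time of one test() call, in milliseconds.
function timeMs(re, input) {
  const start = Date.now();
  re.test(input);
  return Date.now() - start;
}

// Each extra "a" roughly doubles the work for the vulnerable pattern.
for (const n of [16, 18, 20]) {
  const hostile = "a".repeat(n) + "!";
  console.log(n, timeMs(vulnerable, hostile), "ms vs", timeMs(linear, hostile), "ms");
}

// The safe rewrites reject a much longer hostile input without trouble.
const longHostile = "a".repeat(50000) + "!";
const linearResult = linear.test(longHostile);
const atomicResult = atomicLike.test(longHostile);
console.log({ linearResult, atomicResult });
```

Keep the hostile input short when experimenting with the vulnerable pattern: a few dozen characters is enough to hang a process.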
Two golden rules
- Every regex needs a comment explaining what it matches. Future you will thank present you.
- If a regex takes more than three lines to explain, use a parser instead.
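JavaScript has no free-spacing flag for inline regex comments, so one way to follow the first rule is to assemble the pattern from commented pieces. This is only a sketch: the `isoDate` name and the date format are chosen for illustration.

```javascript
// Matches a calendar date such as 2024-02-29 (shape only; no per-month check).
const isoDate = new RegExp(
  [
    "^",
    String.raw`(?<year>\d{4})`,              // four-digit year
    "-",
    String.raw`(?<month>0[1-9]|1[0-2])`,     // month 01 to 12
    "-",
    String.raw`(?<day>0[1-9]|[12]\d|3[01])`, // day 01 to 31
    "$",
  ].join("")
);

const parts = "2024-02-29".match(isoDate).groups;
const badMonth = isoDate.test("2024-13-01");

console.log(parts.year, parts.month, parts.day, badMonth);
```

`String.raw` keeps backslashes literal, so `\d` does not need to be written as `\\d` inside the strings.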