Regex cheat sheet

Character classes, quantifiers, groups, lookaround, and the recipes worth stealing — plus the one bug (catastrophic backtracking) that has caused real production outages.

July 4, 2026•6 min read

Regular expressions are dense on purpose — every character usually matters. This is the reference we keep open when writing or reading one. Examples use the JavaScript / PCRE flavor; most patterns work identically in Python, Go, Ruby, and Java.

Character classes

Pattern	Matches
`.`	Any character except newline (add `s` flag to include newlines).
`\d`	Digit (0–9).
`\D`	Non-digit.
`\w`	Word character: letter, digit, or underscore.
`\W`	Non-word character.
`\s`	Whitespace: space, tab, newline, form feed.
`\S`	Non-whitespace.
`[abc]`	Any of a, b, or c.
`[^abc]`	Anything except a, b, or c.
`[a-z]`	Range: lowercase a through z.

Quantifiers

Pattern	Meaning
`*`	0 or more (greedy).
`+`	1 or more (greedy).
`?`	0 or 1.
`{n}`	Exactly n.
`{n,}`	n or more.
`{n,m}`	Between n and m.
`*?` `+?` `??`	Lazy versions — match as few characters as possible.

Anchors and boundaries

^ — start of string (or start of line with m flag).
$ — end of string (or end of line with m flag).
\b — word boundary (between \w and non-\w).
\B — not a word boundary.
\A / \z — absolute string start/end (some flavors; JS doesn't support them).

Groups

(abc) — capturing group, referenced as $1.
(?:abc) — non-capturing group. Faster, doesn't consume a numbered slot.
(?<name>abc) — named group, referenced as $<name>.
| — alternation. cat|dog matches either.

Lookaround

Zero-width assertions — they check a condition without consuming characters.

Pattern	Meaning
`(?=abc)`	Positive lookahead — followed by abc.
`(?!abc)`	Negative lookahead — not followed by abc.
`(?<=abc)`	Positive lookbehind — preceded by abc.
`(?<!abc)`	Negative lookbehind — not preceded by abc.

Flags (JavaScript)

g — global; find every match, not just the first.
i — case-insensitive.
m — multiline; ^ and $ match line breaks.
s — dotall; . matches newline.
u — full Unicode mode. Use it.
y — sticky; match at lastIndex only.

Recipes worth stealing

What	Pattern
Trim whitespace	`^\s+\|\s+$`
Collapse repeated spaces	`\s+` → `" "`
Split CSV line (simple)	`,(?=(?:[^"]"[^"]")[^"]$)` — but use a CSV parser for real data.
ISO 8601 date	`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z\|[+-]\d{2}:\d{2})$`
UUID (any version)	`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`
Semver core	`^\d+\.\d+\.\d+`
US-style phone	`^$?\d{3}$?[\s.-]?\d{3}[\s.-]?\d{4}$`
Hex color	`^#(?:[0-9a-fA-F]{3,4}\|[0-9a-fA-F]{6}\|[0-9a-fA-F]{8})$`

Things you should NOT use regex for

Parsing HTML. Use a real parser. Every time.
Validating email addresses precisely. The RFC 5321 grammar is 200+ lines. Accept ^[^@\s]+@[^@\s]+\.[^@\s]+$ and send a confirmation email.
Parsing JSON / YAML / XML. Use a parser.
Extracting fields from source code. Use the language's AST.

Catastrophic backtracking — the one bug that will bite you

Patterns like (a+)+ or (a|a)+ can take exponential time on adversarial input. This is the class of bug behind real production outages (Cloudflare 2019, Stack Exchange 2016). Avoid nested quantifiers on overlapping character classes; use possessive quantifiers or atomic groups where your engine supports them; test regexes on hostile inputs before shipping.

Two golden rules

Every regex needs a comment explaining what it matches. Future you will thank present you.
If a regex takes more than three lines to explain, use a parser instead.