Regex cheat sheet

Character classes, quantifiers, groups, lookaround, and the recipes worth stealing — plus the one bug (catastrophic backtracking) that has caused real production outages.


Regular expressions are dense on purpose: every character usually matters. This is the reference we keep open when writing or reading one. Examples use the JavaScript / PCRE flavor. The core syntax carries over to Python, Go, Ruby, and Java, but the flavors differ at the edges; Go's RE2 engine, for instance, supports neither lookaround nor backreferences.

Character classes

Pattern   Matches
.         Any character except a line break (add the s flag to include line breaks).
\d        Digit (0–9).
\D        Non-digit.
\w        Word character: letter, digit, or underscore ([A-Za-z0-9_] in JavaScript).
\W        Non-word character.
\s        Whitespace: space, tab, line breaks, form feed, vertical tab (in JavaScript, also Unicode spaces such as U+00A0).
\S        Non-whitespace.
[abc]     Any one of a, b, or c.
[^abc]    Any character except a, b, or c.
[a-z]     Range: lowercase a through z.
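To see the classes side by side, here is a small TypeScript sketch that runs several of them over one sample string. The helper matchesOf and the sample text are made up for illustration; the expected results are in the comments.

```typescript
// Return every match of a global regex in `text`, or [] when there is none.
function matchesOf(text: string, re: RegExp): string[] {
  return text.match(re) ?? [];
}

const sample = "Order #42: 3 x_widgets, total $19.99";

const digits = matchesOf(sample, /\d+/g);          // ["42", "3", "19", "99"]
const words = matchesOf(sample, /\w+/g);           // ["Order", "42", "3", "x_widgets", "total", "19", "99"]
const nonSpace = matchesOf(sample, /\S+/g);        // ["Order", "#42:", "3", "x_widgets,", "total", "$19.99"]
const vowels = matchesOf(sample, /[aeiou]/g);      // ["e", "i", "e", "o", "a"] (capital O is not in the set)
const noA = matchesOf("banana", /[^a]+/g);         // ["b", "n", "n"]
const lower = matchesOf("Hello World", /[a-z]+/g); // ["ello", "orld"]
```

Note that \d+ splits $19.99 in two: the class knows nothing about decimal points.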

Quantifiers

Pattern   Meaning
*         0 or more (greedy).
+         1 or more (greedy).
?         0 or 1 (greedy).
{n}       Exactly n.
{n,}      n or more.
{n,m}     Between n and m, inclusive.
*? +? ??  Lazy versions: match as few characters as possible ({n,}? and {n,m}? work the same way).
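Greedy versus lazy is easiest to see on a string with two quoted phrases. A TypeScript sketch; the sample strings are illustrative.

```typescript
const text = 'say "hi" and "bye"';

// Greedy: .+ runs to the end of the string, then backtracks to the last quote.
const greedy = text.match(/".+"/)?.[0];     // '"hi" and "bye"'
// Lazy: .+? grows one character at a time and stops at the first closing quote.
const lazy = text.match(/".+?"/)?.[0];      // '"hi"'
// With the g flag, the lazy version finds each quoted phrase separately.
const quoted = text.match(/".+?"/g) ?? [];  // ['"hi"', '"bye"']

// Counted repetition.
const exactlyThree = /^\d{3}$/.test("123");       // true
const twoOrMore = /^\d{2,}$/.test("1");           // false: only one digit
const twoToThree = "12345".match(/\d{2,3}/)?.[0]; // "123": greedy takes the upper bound
```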

Anchors and boundaries

  • ^: start of string (or start of each line with the m flag).
  • $: end of string (or end of each line with the m flag).
  • \b: word boundary, the position between a \w character and a non-\w character or the start or end of the string.
  • \B: any position that is not a word boundary.
  • \A / \z: absolute string start/end (some flavors; JS doesn't support them).
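A short TypeScript sketch of \b, \B, and the m flag; the strings are illustrative.

```typescript
const phrase = "cat concatenate cat";

// \b on both sides keeps "cat" from matching inside "concatenate".
const wholeWord = phrase.match(/\bcat\b/g) ?? []; // ["cat", "cat"]
const anywhere = phrase.match(/cat/g) ?? [];      // ["cat", "cat", "cat"]
// \B before "cat": only the occurrence in the middle of a word.
const inside = phrase.match(/\Bcat/g) ?? [];      // ["cat"] (from "concatenate")

const lines = "first\nsecond\nthird";
// Without m, ^ and $ refer to the whole string.
const noM = /^second$/.test(lines);               // false
// With m, they also match at each line start and end.
const withM = /^second$/m.test(lines);            // true
```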

Groups

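  • (abc): capturing group. Refer back to it with \1 inside the pattern and $1 in a replacement string.
  • (?:abc): non-capturing group. It groups without saving the match, so it is slightly cheaper and doesn't take a numbered slot.
  • (?<name>abc): named group. Refer back to it with \k<name> inside the pattern and $<name> in a replacement string.
  • | (pipe): alternation. cat|dog matches either.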
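The groups in action, as a TypeScript sketch using the standard String.prototype.replace and RegExp.prototype.exec; the sample strings are illustrative.

```typescript
// Numbered groups: $1..$3 in the replacement string, \1 inside the pattern.
const reordered = "2024-05-17".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1"); // "17/05/2024"
const repeated = /\b(\w+) \1\b/.exec("this is is a typo")?.[1];                // "is"

// Named groups: read them from match.groups, or use $<name> in a replacement.
const m = /(?<year>\d{4})-(?<month>\d{2})/.exec("released 2024-05");
const year = m?.groups?.year;                                                  // "2024"
const swapped = "2024-05".replace(/(?<y>\d{4})-(?<mo>\d{2})/, "$<mo>/$<y>");   // "05/2024"

// Non-capturing group: lets + repeat the pair "ab" without adding a numbered slot.
const parts = /^(?:ab)+(c)$/.exec("ababc");
const groupCount = parts ? parts.length - 1 : 0;                               // 1: only (c) captured

// Alternation.
const pets = "cat, dog, bird".match(/cat|dog/g) ?? [];                         // ["cat", "dog"]
```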

Lookaround

Zero-width assertions — they check a condition without consuming characters.

Pattern    Meaning
(?=abc)    Positive lookahead: followed by abc.
(?!abc)    Negative lookahead: not followed by abc.
(?<=abc)   Positive lookbehind: preceded by abc.
(?<!abc)   Negative lookbehind: not preceded by abc.
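Two common lookaround idioms, sketched in TypeScript: inserting thousands separators with a lookahead, and pulling out a value that follows a marker with a lookbehind. Lookbehind needs an ES2018-or-later JavaScript engine; the sample strings are illustrative.

```typescript
// Positive lookahead: put a comma at each non-boundary position (\B) that is followed
// by one or more groups of exactly three digits and then no further digit.
const withCommas = "1234567".replace(/\B(?=(\d{3})+(?!\d))/g, ","); // "1,234,567"

// Negative lookahead: "java" not followed by "script".
const javaOnly = "java javascript".match(/java(?!script)/g) ?? [];   // ["java"]

// Positive lookbehind: digits preceded by "$"; the "$" is checked but not included.
const price = "total: $42".match(/(?<=\$)\d+/)?.[0];                 // "42"

// Negative lookbehind: digits not preceded by "$" or by another digit.
const plainNumbers = "$42 and 7".match(/(?<![$\d])\d+/g) ?? [];      // ["7"]
```

Without \d inside the last lookbehind, the 2 in $42 would match on its own, because it is preceded by a digit rather than by "$".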

Flags (JavaScript)

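  • g: global; find every match, not just the first.
  • i: case-insensitive.
  • m: multiline; ^ and $ also match at the start and end of every line.
  • s: dotAll; . also matches line breaks.
  • u: full Unicode mode; matches by code point, enables \u{...} and \p{...} escapes, and rejects sloppy escapes. Use it.
  • y: sticky; the match must start exactly at lastIndex.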
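Each flag on a tiny input, as a TypeScript sketch. The last lines show a gotcha worth knowing: with g (or y), test() and exec() keep state in lastIndex between calls.

```typescript
// g: every match instead of only the first.
const firstDigit = "a1b2c3".match(/\d/)?.[0];       // "1"
const allDigits = "a1b2c3".match(/\d/g) ?? [];      // ["1", "2", "3"]

// i: case-insensitive.
const ci = /hello/i.test("HeLLo");                  // true

// m: ^ and $ work per line.
const lineStarts = "x=1\ny=2".match(/^\w/gm) ?? []; // ["x", "y"]

// s: . also matches a line break.
const noS = /a.b/.test("a\nb");                     // false
const withS = /a.b/s.test("a\nb");                  // true

// u: . matches a whole code point instead of one UTF-16 code unit.
const smile = "\u{1F600}";                          // one emoji, stored as a surrogate pair
const codeUnits = smile.match(/./g)?.length;        // 2
const codePoints = smile.match(/./gu)?.length;      // 1

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const atThree = sticky.test("barfoo");              // true
sticky.lastIndex = 0;
const atZero = sticky.test("barfoo");               // false: no "foo" at index 0

// Gotcha: a g regex reused with test() resumes from lastIndex.
const withG = /a/g;
const firstCall = withG.test("a");                  // true, lastIndex is now 1
const secondCall = withG.test("a");                 // false: search started at index 1 (lastIndex resets to 0)
```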

Recipes worth stealing

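What                      Pattern
Trim whitespace           ^\s+|\s+$ with the g flag, replaced with "" (or just call .trim())
Collapse whitespace runs  \s+ with the g flag, replaced with " "
Split CSV line (simple)   ,(?=(?:[^"]*"[^"]*")*[^"]*$) splits on commas outside double quotes; use a CSV parser for real data.
ISO 8601 date-time        ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$ (checks shape, not ranges: month 13 passes)
UUID (any version)        ^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$ (lowercase only; add the i flag to accept uppercase)
Semver core               ^\d+\.\d+\.\d+ (MAJOR.MINOR.PATCH prefix only; with no $, any suffix is ignored)
US-style phone            ^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$ (loose: also accepts unbalanced parentheses)
Hex color                 ^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$ (#rgb, #rgba, #rrggbb, #rrggbbaa)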
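The recipes wrapped as TypeScript helpers, as a sketch. The function and constant names are our own, not a library API, and UUID gets the i flag added so uppercase IDs pass; drop it if you want lowercase only.

```typescript
// Trim: the g flag matters. Without it, replace() strips only the leading run.
function trim(s: string): string {
  return s.replace(/^\s+|\s+$/g, "");
}

// Collapse every run of whitespace (spaces, tabs, line breaks) into a single space.
function collapseWhitespace(s: string): string {
  return s.replace(/\s+/g, " ");
}

// Split on commas followed by an even number of double quotes, i.e. commas outside quotes.
// Quotes are kept in the fields, not unescaped: use a CSV parser for real data.
function splitCsvLine(line: string): string[] {
  return line.split(/,(?=(?:[^"]*"[^"]*")*[^"]*$)/);
}

// Shape check only: "2024-13-45T99:99:99Z" passes too.
const ISO_TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
// i flag added here so uppercase hex digits are accepted.
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
// No trailing $: matches the MAJOR.MINOR.PATCH prefix of "1.2.3-beta.1".
const SEMVER_CORE = /^\d+\.\d+\.\d+/;
const US_PHONE = /^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$/;
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/;

console.log(trim("  hi there \n"));                                // hi there
console.log(collapseWhitespace("a  b\t\nc"));                     // a b c
console.log(splitCsvLine('a,"b,c",d'));                            // [ 'a', '"b,c"', 'd' ]
console.log(ISO_TIMESTAMP.test("2024-05-17T12:34:56.789+02:00"));  // true
console.log(UUID.test("123E4567-E89B-12D3-A456-426614174000"));    // true
console.log(SEMVER_CORE.exec("1.2.3-beta.1")?.[0]);                // 1.2.3
console.log(US_PHONE.test("(555) 123-4567"));                      // true
console.log(HEX_COLOR.test("#12345"));                             // false: five hex digits
```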

Things you should NOT use regex for

  • Parsing HTML. Use a real parser. Every time.
  • Validating email addresses precisely. The full address grammar in RFC 5321 and RFC 5322 is far too intricate for a readable pattern. Accept anything matching ^[^@\s]+@[^@\s]+\.[^@\s]+$ and send a confirmation email.
  • Parsing JSON / YAML / XML. Use a parser.
  • Extracting fields from source code. Use the language's AST.
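A TypeScript sketch of the "loose check, then confirm" approach for email, plus the parser-first approach for JSON using the built-in JSON.parse. isPlausibleEmail is a hypothetical helper name, and the confirmation email itself is not shown.

```typescript
// Deliberately loose: no whitespace, exactly one "@", and at least one "." after it.
const PLAUSIBLE_EMAIL = /^[^@\s]+@[^@\s]+\.[^@\s]+$/;

// Hypothetical helper: a cheap sanity check before sending a confirmation email.
// Only the confirmation email proves the address actually works.
function isPlausibleEmail(input: string): boolean {
  return PLAUSIBLE_EMAIL.test(input);
}

console.log(isPlausibleEmail("ada@example.com")); // true
console.log(isPlausibleEmail("ada@example"));     // false: no dot after the @
console.log(isPlausibleEmail("a@b@example.com")); // false: two @ signs

// JSON: the built-in parser handles escapes and nesting that a regex would get wrong.
const config = JSON.parse('{"name": "a \\"quoted\\" name", "tags": ["x", "y"]}') as {
  name: string;
  tags: string[];
};
console.log(config.name); // a "quoted" name
```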

Catastrophic backtracking — the one bug that will bite you

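Patterns like (a+)+ or (a|a)+ can take exponential time on adversarial input in backtracking engines (JavaScript, PCRE, Python, Java, Ruby): when a near-match finally fails, the engine retries every way of splitting the input between the nested quantifiers. This is the class of bug behind real production outages (Cloudflare 2019, Stack Exchange 2016). The Stack Exchange case was a whitespace-trimming pattern much like the trim recipe above; on a long run of whitespace it goes quadratic rather than exponential, and that was still enough. Avoid nested quantifiers on overlapping character classes; use possessive quantifiers (a++) or atomic groups ((?>a+)) where your engine supports them (PCRE and Java do, Python since 3.11, JavaScript not at all); test regexes on hostile inputs, such as a long run of a followed by !, before shipping. Linear-time engines such as Go's RE2 rule the problem out by design.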
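A TypeScript sketch for checking a pattern against hostile input before shipping, using the (a+)+ example above. NESTED, FLAT, hostileInput, and timeTest are illustrative names. Timings depend on engine and machine, so keep n small for the nested pattern: in a backtracking engine such as Node's, each extra a roughly doubles the work.

```typescript
// Both patterns accept exactly the same strings: one or more "a" characters.
// The nested form gives the engine exponentially many ways to divide a run of a's
// between the inner and outer +, and it tries them all before reporting failure.
const NESTED = /^(a+)+$/;
const FLAT = /^a+$/;

// Hostile input: a long run that almost matches, then one character that forces failure.
function hostileInput(n: number): string {
  return "a".repeat(n) + "!";
}

// Rough wall-clock time of one test() call, in milliseconds.
function timeTest(re: RegExp, input: string): { matched: boolean; ms: number } {
  const start = Date.now();
  const matched = re.test(input);
  return { matched, ms: Date.now() - start };
}

// Sanity check: the two patterns agree on ordinary inputs.
for (const s of ["", "a", "aaaa", "aa!", "b"]) {
  if (NESTED.test(s) !== FLAT.test(s)) throw new Error(`patterns disagree on "${s}"`);
}

// Safe to run: the flat pattern fails fast even on a very long hostile input.
console.log(timeTest(FLAT, hostileInput(100000)));
// Keep n small here; raise it a step at a time to watch the time grow.
console.log(timeTest(NESTED, hostileInput(18)));
```

Since JavaScript has neither possessive quantifiers nor atomic groups, rewriting the pattern (here, (a+)+ into a+) is the fix available there.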

Two golden rules

  1. Every regex needs a comment explaining what it matches. Future you will thank present you.
  2. If a regex takes more than three lines to explain, use a parser instead.
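One way to follow rule 1 in JavaScript, which has no verbose (x) flag: build a long pattern from small, individually commented pieces and join each literal's .source. A TypeScript sketch based on the ISO 8601 recipe; the constant names are illustrative, and the groups are made non-capturing.

```typescript
// Each piece is a regex literal, so escapes are checked once; .source gives its text.
const DATE = /\d{4}-\d{2}-\d{2}/.source;           // YYYY-MM-DD (shape only, ranges not checked)
const TIME = /\d{2}:\d{2}:\d{2}(?:\.\d+)?/.source; // HH:MM:SS plus optional fractional seconds
const OFFSET = /(?:Z|[+-]\d{2}:\d{2})/.source;     // "Z" for UTC, or a ±HH:MM offset

// ISO 8601 timestamp: date, a literal "T", time, offset; anchored at both ends.
const TIMESTAMP = new RegExp(`^${DATE}T${TIME}${OFFSET}$`);

console.log(TIMESTAMP.test("2024-05-17T12:34:56Z"));        // true
console.log(TIMESTAMP.test("2024-05-17T12:34:56.5-07:00")); // true
console.log(TIMESTAMP.test("2024-05-17 12:34:56Z"));        // false: space instead of T
```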
