Regex Lookaheads and Lookbehinds — Understand Them in 5 Minutes
Lookarounds are the feature that turns regular expressions from a toy into a professional tool. They let you write conditions like "check whether a word follows without consuming it" or "find a comma that isn't wrapped in quotation marks". That sounds unspectacular, but it's the difference between a regex that sometimes works and one that is correct. This article explains the four lookaround flavours and why they count as zero-width assertions, walks through five real-world patterns, and closes with browser support, performance, and the most common pitfalls. If you only know regular expressions as a cryptic built-in tool, this article will give you a much more practical sense of what you can actually structure, validate, and extract with them.
Zero-width assertions: why lookarounds don't consume characters
A normal regex token like \d does two things: it checks whether the current character is a digit, and it advances the engine position by one. Lookarounds, in contrast, are zero-width assertions: they test a condition at the current position without moving it. The engine stays put, peeks ahead or back, and only decides yes or no. That lets you stack multiple conditions on the same position, which consuming tokens cannot do because each one moves the position forward.
Concretely: a pattern like (?=.*\d)(?=.*[A-Z]) checks at the start of the string whether a digit appears somewhere and (separately) whether an uppercase letter appears somewhere — both tests begin at the same position. Without zero-width assertions you would have to spell out every order (\d.*[A-Z]|[A-Z].*\d), which explodes combinatorially at three or more conditions. Lookarounds shine here.
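The stacking idea can be sketched in a few lines of JavaScript. The leading ^ anchor and the sample strings are assumptions added for illustration; the two patterns themselves come from the paragraph above.

```javascript
// Two independent lookaheads, both evaluated at position 0.
// The ^ anchor is added here so the tests run only once, at the start.
const stacked = /^(?=.*\d)(?=.*[A-Z])/;

// The same condition without lookarounds: every order spelled out.
const spelledOut = /\d.*[A-Z]|[A-Z].*\d/;

// Hypothetical sample inputs.
const stackSamples = ["abc1X", "Xabc1", "abc1", "ABC"];

const stackedResults = stackSamples.map((s) => stacked.test(s));
const spelledOutResults = stackSamples.map((s) => spelledOut.test(s));

console.log(stackedResults);    // [ true, true, false, false ]
console.log(spelledOutResults); // [ true, true, false, false ]

// Zero-width in practice: the match succeeds but consumes nothing.
const zeroWidthMatch = stacked.exec("abc1X");
console.log(zeroWidthMatch[0].length, zeroWidthMatch.index); // 0 0
```

Both patterns agree on these inputs, but adding a third condition means one more lookahead in the first pattern and six alternatives in the second.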
The four flavours at a glance
There are exactly four lookaround operators — combinations of two axes (forward or backward) and two polarities (positive or negative):
- Positive lookahead (?=...): matches if the pattern ... follows the current position. Example: \d+(?= EUR) matches the digits in "42 EUR", but EUR stays in the remaining string.
- Negative lookahead (?!...): matches if the pattern ... does not follow. Example: foo(?!bar) matches foo in "foobaz" but not in "foobar".
- Positive lookbehind (?<=...): matches if the pattern ... precedes the position. Example: (?<=\$)\d+ matches the number after a dollar sign without consuming the sign itself.
- Negative lookbehind (?<!...): matches if the pattern ... does not precede. Example: (?<!@)\bfoo\b matches foo but not @foo, which is useful for filtering out mentions.
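The four examples from the list can be tried directly in JavaScript. This is a minimal sketch; the input strings "$100" and "@foo foo" are assumptions chosen for illustration.

```javascript
// Positive lookahead: digits only when " EUR" follows; " EUR" is not consumed.
const posAhead = "42 EUR".match(/\d+(?= EUR)/)[0];

// Negative lookahead: "foo" only when "bar" does not follow.
const negAheadHit = /foo(?!bar)/.test("foobaz");
const negAheadMiss = /foo(?!bar)/.test("foobar");

// Positive lookbehind: digits only when "$" precedes; "$" is not consumed.
const posBehind = "$100".match(/(?<=\$)\d+/)[0];

// Negative lookbehind: the word "foo" only when no "@" precedes it.
const negBehindIndex = "@foo foo".match(/(?<!@)\bfoo\b/).index;

console.log(posAhead);       // "42"
console.log(negAheadHit);    // true
console.log(negAheadMiss);   // false
console.log(posBehind);      // "100"
console.log(negBehindIndex); // 5 (the second "foo", not the mention)
```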
Example 1: password validation
A classic pattern: a password must be at least 10 characters and contain at least one uppercase letter, one digit, and one special character. Without lookaheads you would have to cover every order of the three classes. With lookaheads the regex stays short and readable: ^(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{10,}$. Each (?=...) is an independent test starting at the beginning of the string; the actual match is just .{10,} with a minimum length.
Heads up: lookaheads say nothing about the order or position of the found characters — only that they appear somewhere in the rest of the string. If you want "the uppercase letter must be at the start", you don't need a lookahead but an anchor rule. Lookaheads are ideal for "exists somewhere" conditions, not "is right here".
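A short JavaScript sketch of the password rule, using the pattern from above. The sample passwords are made up for illustration. Note that the "special character" class [^A-Za-z0-9] also accepts spaces and non-ASCII letters.

```javascript
// Three "exists somewhere" lookaheads plus a minimum length of 10.
const strongPassword = /^(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{10,}$/;

// Hypothetical candidates, each failing a different rule except the first.
const passwordSamples = [
  "Tr0ub4dor&3x",   // satisfies all rules
  "short1A!",       // too short (8 characters)
  "alllowercase1!", // no uppercase letter
  "NoDigitsHere!!", // no digit
  "NoSpecial12345", // no special character
];

const passwordResults = passwordSamples.map((pw) => strongPassword.test(pw));

console.log(passwordResults); // [ true, false, false, false, false ]
```

A single combined regex only answers yes or no. If the user should be told which rule failed, testing each condition with its own small pattern is usually the simpler design.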
Example 2: extracting currency amounts
You want to extract only the numbers preceded by a dollar sign from running text like "The price is $19.99, plus $2.50 shipping". The naive solution \$\d+\.\d+ matches $19.99 including the sign. With a positive lookbehind, (?<=\$)\d+\.\d+ matches only 19.99 and 2.50. That is worth its weight in gold when you feed the numbers straight into parseFloat() without scrubbing them first.
Mirror variant with lookahead when the currency comes after the number (German format "19,99 EUR"): \d+,\d+(?= EUR). In both variants the match covers only the number and the currency marker is never consumed, so the rest of the string remains untouched. That is ideal when you want to collect several different tokens in the same pass.
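Both variants in JavaScript, as a minimal sketch. The dollar sentence is the one from the text; the euro sample string is an assumption, and the comma-to-dot replacement is needed because parseFloat() expects a decimal point.

```javascript
const priceText = "The price is $19.99, plus $2.50 shipping";

// Naive pattern: the dollar sign is part of every match.
const withSign = priceText.match(/\$\d+\.\d+/g);

// Lookbehind: the sign is required but stays out of the match.
const amounts = priceText.match(/(?<=\$)\d+\.\d+/g);
const numbers = amounts.map((s) => parseFloat(s));

console.log(withSign); // [ "$19.99", "$2.50" ]
console.log(amounts);  // [ "19.99", "2.50" ]
console.log(numbers);  // [ 19.99, 2.5 ]

// Mirror variant: currency after the number (hypothetical sample string).
const euroText = "Total: 19,99 EUR";
const euroAmounts = euroText.match(/\d+,\d+(?= EUR)/g);
const euroNumber = parseFloat(euroAmounts[0].replace(",", "."));

console.log(euroAmounts); // [ "19,99" ]
console.log(euroNumber);  // 19.99
```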
Example 3: splitting CamelCase into words
A handy developer trick: split "getUserProfile" into ["get", "User", "Profile"]. The idea: split at every position where a lowercase letter meets an uppercase letter, without losing a letter. That's exactly what a split with lookarounds is made for: str.split(/(?<=[a-z])(?=[A-Z])/). The lookbehind confirms there's a lowercase letter on the left, the lookahead an uppercase letter on the right, and because neither consumes anything, both letters stay in the result.
The same logic also cleanly splits acronyms: "XMLHttpRequest" with /(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ yields ["XML", "Http", "Request"]. Anyone generating slugs or snake_case names from CamelCase has a clean foundation here — the CalcSI slug generator and the case converter use exactly these patterns internally.
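The two split patterns can be tried as follows. The helper toSnakeCase is a hypothetical name introduced for this sketch and is not taken from any tool mentioned above.

```javascript
// Split where a lowercase letter is followed by an uppercase letter.
const simpleBoundary = /(?<=[a-z])(?=[A-Z])/;

// Additionally split before the last capital of an acronym ("XML|Http").
const acronymAware = /(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/;

const simpleWords = "getUserProfile".split(simpleBoundary);
const acronymNaive = "XMLHttpRequest".split(simpleBoundary);
const acronymWords = "XMLHttpRequest".split(acronymAware);

console.log(simpleWords);  // [ "get", "User", "Profile" ]
console.log(acronymNaive); // [ "XMLHttp", "Request" ]
console.log(acronymWords); // [ "XML", "Http", "Request" ]

// Hypothetical helper: CamelCase to snake_case on top of the split.
function toSnakeCase(name) {
  return name
    .split(acronymAware)
    .map((word) => word.toLowerCase())
    .join("_");
}

console.log(toSnakeCase("XMLHttpRequest")); // "xml_http_request"
```

The patterns only know ASCII letters. Names with digits or non-ASCII characters would need wider character classes.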
Example 4: CSV with embedded quotes
Classic trap: a CSV splitter that breaks on commas fails on "Smith, John",42,Berlin, because the comma inside the name cuts the record. A pragmatic regex solution uses a lookahead that checks for an even number of quotes following the comma to end-of-string — meaning the comma sits outside an open quote. The pattern reads roughly: ,(?=(?:[^"]*"[^"]*")*[^"]*$).
Important caveat: for production use a proper CSV parser (RFC 4180 allows multi-line fields, escaped quotes, BOM, and more — this regex doesn't cover those). But for quick scripts and log parsing the pattern is unbeatable. With the regex tester you can immediately see which commas trigger the match — especially with combined lookarounds, live testing saves a lot of debug time.
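For a single-line record the pattern can be sketched in JavaScript like this. The quote-stripping step at the end is an added assumption about what a quick script would want; it does not handle escaped quotes.

```javascript
// A comma followed by an even number of quotes up to end-of-string,
// which means the comma sits outside any quoted field.
const commaOutsideQuotes = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/;

const csvRecord = '"Smith, John",42,Berlin';

// Naive split cuts the quoted name in two.
const naiveFields = csvRecord.split(",");

// Lookahead split keeps the quoted field intact.
const csvFields = csvRecord.split(commaOutsideQuotes);

// Optional cleanup: drop one leading and one trailing quote per field.
const cleanedFields = csvFields.map((field) => field.replace(/^"|"$/g, ""));

console.log(naiveFields);   // [ '"Smith', ' John"', '42', 'Berlin' ]
console.log(csvFields);     // [ '"Smith, John"', '42', 'Berlin' ]
console.log(cleanedFields); // [ 'Smith, John', '42', 'Berlin' ]
```

Because the lookahead scans to the end of the string for every comma, the cost grows quickly on long lines, which is one more reason to keep this to short records.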
Example 5: word boundaries beyond \b
\b is handy but Unicode-blind: \bcafé\b doesn't match reliably in many engines because é isn't a word character by default. With lookarounds you can build a precise custom word boundary, for example (?<![\p{L}\d])café(?![\p{L}\d]), a "no letters or digits before or after" rule using the Unicode property \p{L} (in JavaScript, \p{...} requires the u flag). Lookarounds make word boundaries explicit and controllable.
Another typical use: match a word that is not followed by a dot, to exclude TLDs — example(?!\.). Or match a word only when a specific preceding word is present: (?<=Dr\. )Smith. Both examples show how lookarounds express contextual conditions that classical quantifiers can only solve via capture groups and post-processing.
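A JavaScript sketch of the custom word boundary and the two contextual patterns. The sample strings are assumptions; é is written as the escape \u00e9 so the example does not depend on file encoding.

```javascript
// "\u00e9" is the precomposed letter é.
const asciiBoundary = /\bcaf\u00e9\b/;
const unicodeBoundary = /(?<![\p{L}\d])caf\u00e9(?![\p{L}\d])/u;

const frenchSentence = "un caf\u00e9 noir";

// \b fails after é: é and the following space are both non-word characters.
const asciiHit = asciiBoundary.test(frenchSentence);
// The lookaround version only requires "no letter or digit" on either side.
const unicodeHit = unicodeBoundary.test(frenchSentence);
// Inside a longer word the custom boundary correctly rejects the match.
const unicodeInsideWord = unicodeBoundary.test("caf\u00e9t\u00e9ria");

console.log(asciiHit);          // false
console.log(unicodeHit);        // true
console.log(unicodeInsideWord); // false

// Contextual conditions from the paragraph above.
const notBeforeDot = /example(?!\.)/;
const afterTitle = /(?<=Dr\. )Smith/;

console.log(notBeforeDot.test("an example here")); // true
console.log(notBeforeDot.test("example.com"));     // false
console.log(afterTitle.test("Dr. Smith"));         // true
console.log(afterTitle.test("Mr. Smith"));         // false
```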
Browser and language support in 2026
JavaScript has had lookaheads since ECMAScript 3 (1999), but lookbehinds only arrived with ECMAScript 2018 (ES9). That caused issues for a long time because Safari supported them late: Chrome/V8 from version 62 (2017), Firefox from 78 (2020), Safari only with version 16.4 (March 2023). For variable-length lookbehinds — like (?<=\w+ ) — the same applies; V8 has supported them from the start, Safari since 16.4. As of 2026 the engine landscape is uniform enough that lookbehinds can be used in modern web apps without feature tests.
Backends are more relaxed: Python has had lookaheads and lookbehinds since version 1.5, though only fixed-length in lookbehind (the third-party regex module allows variable length). Java has supported both since JDK 1.4, with variable-length lookbehinds as long as the pattern has a bounded maximum length. PHP uses PCRE, which knows all four variants; its lookbehinds are traditionally fixed-length (alternatives may differ in length), \K is the usual workaround when the prefix varies in length, and recent PCRE2 releases relax the restriction for bounded patterns. Perl has been the reference implementation forever. .NET is by far the most liberal: full variable-length lookbehinds out of the box.
Performance: when lookarounds get slow
Lookarounds themselves aren't expensive; they add a constant amount of engine work. They get expensive when the inner pattern is complex and unbounded. A pattern like (?=.*\d.*\d.*\d) with three consecutive .* quantifiers can run into "catastrophic backtracking" on long strings, because on failure the engine tries every possible way of dividing the string among them. Rule of thumb: every .* or .+ inside a lookaround is a warning sign.
Antidote: use more specific classes instead of . ([^x]* instead of .*), atomic groups or possessive quantifiers (in PCRE and Java), and above all profile patterns with the regex tester against realistic data. If a regex takes more than 100 ms on 1 MB of text, the cause is usually a lookaround with an unbounded inner pattern — and a rewrite often saves two orders of magnitude. A second rule of thumb: if your pattern contains more than three nested lookarounds, chances are a two-stage process (tokenize first, validate each token) is easier to maintain and faster. Regular expressions are a sharp tool but not a parser — once you sink into towers of lookarounds, a different algorithm is usually the better choice.
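As a sketch of the "more specific classes" advice, the three-digit lookahead can be rewritten so that each digit can only be found one way. The ^ anchor and the sample strings are assumptions. The two patterns differ on multi-line input, because . does not match line breaks while \D does.

```javascript
// Loose: three .* runs that can each absorb any characters, digits included.
const looseThreeDigits = /^(?=.*\d.*\d.*\d)/;

// Tight: "non-digits, then one digit", three times. \D* and \d never
// overlap, so there is only one way to divide the string.
const tightThreeDigits = /^(?=(?:\D*\d){3})/;

// Hypothetical single-line samples.
const digitSamples = ["a1b2c3", "a1b2", "123", "no digits"];

const looseResults = digitSamples.map((s) => looseThreeDigits.test(s));
const tightResults = digitSamples.map((s) => tightThreeDigits.test(s));

console.log(looseResults); // [ true, false, true, false ]
console.log(tightResults); // [ true, false, true, false ]
```

Both patterns give the same answers here; the difference shows up in how much work a failing match costs on long inputs, which is worth measuring against realistic data rather than assuming.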
Frequently asked questions
Why is my regex with a lookaround so slow?
Usually there's a .* or .+ inside the lookaround, and on failure the engine tries every split (catastrophic backtracking). Replace . with a specific negated class (e.g. [^,] for CSV parsing), use possessive quantifiers if the engine supports them, and test against long inputs with the CalcSI regex tester. If that's not enough: algorithmically splitting the pattern into two steps (coarse match, then validate) is often faster than a mega-regex.
Which languages support variable-length lookbehinds?
.NET supports them fully. JavaScript (V8, SpiderMonkey, JavaScriptCore since Safari 16.4) too. Java allows them up to a declared maximum length. Python's standard re module is fixed-length only; the third-party regex module allows variable length. PCRE/PHP allow fixed length plus alternation with fixed alternatives — so (?<=ab|cde) works but not (?<=\w+). Ruby (Onigmo) is fixed-length. If you write portable patterns, assume fixed length.
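A small JavaScript sketch of the portability trade-off. The "key: value" input is an assumption made up for illustration. The first pattern relies on a variable-length lookbehind; the second gets the same value with a capture group, which works in engines that only allow fixed-length lookbehinds.

```javascript
const configLine = "timeout: 30";

// Variable-length lookbehind: fine in JavaScript and .NET, rejected by
// engines that require fixed-length lookbehinds.
const viaVariableLookbehind = configLine.match(/(?<=\w+: )\d+/)[0];

// Portable alternative: consume the prefix and read the value from group 1.
const viaPortableGroup = configLine.match(/\w+: (\d+)/)[1];

console.log(viaVariableLookbehind); // "30"
console.log(viaPortableGroup);      // "30"
```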
What's the difference between a lookbehind and a capture group?
A capture group (\$)\d+ consumes the $ and exposes it as group 1, so the match string contains the character. A lookbehind (?<=\$)\d+ tests for the $ but keeps it out of the match. Practical difference: with String.replace or when slicing the match, a lookbehind gives you the clean result directly; with a capture group you have to discard the group afterwards. A lookbehind is the right choice when the prefix is a condition but not part of the result.
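The difference is easy to see side by side in JavaScript. The input string and the "double the amount" replacement are assumptions chosen for illustration.

```javascript
const totalLine = "Total: $42";

// Capture group: "$" is consumed and ends up in the match.
const groupMatch = totalLine.match(/(\$)\d+/);
console.log(groupMatch[0]); // "$42"
console.log(groupMatch[1]); // "$"

// Lookbehind: "$" is required but not part of the match.
const behindMatch = totalLine.match(/(?<=\$)\d+/);
console.log(behindMatch[0]); // "42"

// Replacing only the number: with groups the sign must be put back by hand.
const doubledViaGroup = totalLine.replace(
  /(\$)(\d+)/,
  (whole, sign, digits) => sign + String(Number(digits) * 2)
);
// With a lookbehind the sign was never touched.
const doubledViaLookbehind = totalLine.replace(
  /(?<=\$)\d+/,
  (digits) => String(Number(digits) * 2)
);

console.log(doubledViaGroup);      // "Total: $84"
console.log(doubledViaLookbehind); // "Total: $84"
```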