Commit d813518

Extend ending position of capture search.
When searching for captures, we first use the DFA to find the start and end of the match. We then pass just the matched region of text to the NFA engine to find sub-capture locations. This is a key optimization that prevents the NFA engine from searching a lot more text than what is necessary in some cases.

One problem with this is that some instructions determine their match state based on whether the engine is at the boundary of the search text. For example, `$` matches if and only if the engine is at EOF.

If we only provide the matched text region, then assertions like `\b` might not work, since it needs to examine at least one character past the end of the match. If we provide the matched text region plus one character, then `$` may match when it shouldn't. Therefore, we provide the matched text plus (at most) two characters.

Fixes #334
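The end bound described above can be sketched in isolation. This is a minimal illustration, not the crate's actual code: `next_utf8` here is a hypothetical stand-in for the helper of the same name used in `src/exec.rs`, and `capture_search_end` is a name invented for this example. It assumes the text is valid UTF-8 and that `match_end` falls on a character boundary.

```rust
use std::cmp;

/// Hypothetical stand-in for the `next_utf8` helper: returns the byte offset
/// just past the UTF-8 encoded character that starts at `i`. If `i` is at or
/// beyond the end of `text`, it returns `i + 1`.
fn next_utf8(text: &[u8], i: usize) -> usize {
    let b = match text.get(i) {
        None => return i + 1,
        Some(&b) => b,
    };
    // The leading byte determines the encoded length (valid UTF-8 assumed).
    let len = if b <= 0x7F {
        1
    } else if b <= 0xDF {
        2
    } else if b <= 0xEF {
        3
    } else {
        4
    };
    i + len
}

/// Illustrative name (not in the original): the end of the slice handed to
/// the NFA engine, i.e. the match end plus at most two characters, clamped
/// to the length of the text.
fn capture_search_end(text: &[u8], match_end: usize) -> usize {
    cmp::min(next_utf8(text, next_utf8(text, match_end)), text.len())
}
```

With these definitions, a match ending at byte 1 of `"abcbX"` would give the NFA the slice `"abc"`, while a match ending at the last byte of the text leaves the bound at `text.len()`.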
1 parent d894c63 commit d813518

2 files changed: +10 -3 lines changed

src/exec.rs

+6 -3
@@ -850,9 +850,12 @@ impl<'c> ExecNoSync<'c> {
         match_start: usize,
         match_end: usize,
     ) -> Option<(usize, usize)> {
-        // We can't use match_end directly, because we may need to examine
-        // one "character" after the end of a match for lookahead operators.
-        let e = cmp::min(next_utf8(text, match_end), text.len());
+        // We can't use match_end directly, because we may need to examine one
+        // "character" after the end of a match for lookahead operators. We
+        // need to move two characters beyond the end, since some look-around
+        // operations may falsely assume a premature end of text otherwise.
+        let e = cmp::min(
+            next_utf8(text, next_utf8(text, match_end)), text.len());
         self.captures_nfa(slots, &text[..e], match_start)
     }

tests/regression.rs

+4
@@ -86,3 +86,7 @@ mat!(wb_start_x, r"(?u:\b)^(?-u:X)", "X", Some((0, 1)));
 // See: https://github.com/rust-lang/regex/issues/321
 ismatch!(strange_anchor_non_complete_prefix, r"a^{2}", "", false);
 ismatch!(strange_anchor_non_complete_suffix, r"${2}a", "", false);
+
+// See: https://github.com/rust-lang/regex/issues/334
+mat!(captures_after_dfa_premature_end, r"a(b*(X|$))?", "abcbX",
+     Some((0, 1)), None, None);
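
To see why the regression test needs the wider window, here is a toy model of the `$` assertion evaluated against a truncated haystack. It is only a sketch: `end_of_text_matches` and `dollar_fires_in_window` are hypothetical names, and the model ignores multi-line mode and everything else the real NFA engine does.

```rust
/// Toy model of `$` without multi-line mode: it holds only when `pos` is at
/// the end of the haystack the engine was given.
fn end_of_text_matches(haystack: &[u8], pos: usize) -> bool {
    pos == haystack.len()
}

/// Hypothetical helper: would `$` hold at `pos` if the engine only saw
/// `text[..window_end]` instead of the full text?
fn dollar_fires_in_window(text: &[u8], window_end: usize, pos: usize) -> bool {
    end_of_text_matches(&text[..window_end], pos)
}
```

For `"abcbX"` the overall match is `(0, 1)`. After `b*` consumes the `b` at offset 1, the engine asks whether `$` holds at offset 2. In this model, a window ending at 2 (match plus one character) answers yes, which would let the optional group participate incorrectly. A window ending at 3 (match plus two characters) answers no, agreeing with the full text.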
