Commit b1489c8

syntax: make \p{cf} work
It turns out that 'cf' is also an abbreviation for the 'Case_Folding' property. Even though we don't actually support a 'Case_Folding' property, a quirk of our code caused 'cf' to fail since it was treated as a normal boolean property instead of a general category.

We fix it by special-casing it. Note that '\p{gc=cf}' worked and continues to work.

If we ever do add the 'Case_Folding' property, we won't be able to support its abbreviation, since it is now taken by 'Format'.

Fixes rust-lang#719
1 parent fe9b5c9 commit b1489c8

3 files changed: +23 -2 lines changed

CHANGELOG.md

+10
@@ -1,3 +1,13 @@
+1.4.1 (2020-10-13)
+==================
+This is a small bug fix release that makes `\p{cf}` work. Previously, it would
+report "property not found" even though `cf` is a valid abbreviation for the
+`Format` general category.
+
+* [BUG #719](https://github.com/rust-lang/regex/issues/719):
+  Fixes bug that prevented `\p{cf}` from working.
+
+
 1.4.0 (2020-10-11)
 ==================
 This releases has a few minor documentation fixes as well as some very minor

regex-syntax/src/unicode.rs

+10 -2
@@ -237,8 +237,16 @@ impl<'a> ClassQuery<'a> {
     fn canonical_binary(&self, name: &str) -> Result<CanonicalClassQuery> {
         let norm = symbolic_name_normalize(name);
 
-        if let Some(canon) = canonical_prop(&norm)? {
-            return Ok(CanonicalClassQuery::Binary(canon));
+        // This is a special case where 'cf' refers to the 'Format' general
+        // category, but where the 'cf' abbreviation is also an abbreviation
+        // for the 'Case_Folding' property. But we want to treat it as
+        // a general category. (Currently, we don't even support the
+        // 'Case_Folding' property. But if we do in the future, users will be
+        // required to spell it out.)
+        if norm != "cf" {
+            if let Some(canon) = canonical_prop(&norm)? {
+                return Ok(CanonicalClassQuery::Binary(canon));
+            }
         }
         if let Some(canon) = canonical_gencat(&norm)? {
             return Ok(CanonicalClassQuery::GeneralCategory(canon));
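
The effect of the lookup order can be illustrated with a toy model. This is only a sketch, not the code in regex-syntax: the `Resolved` enum, the `lookup` and `resolve` helpers, the truncated alias tables and the `special_case_cf` flag are all hypothetical names made up for illustration. It assumes, as the commit message describes, that a name matching a property alias is returned immediately as a binary property and that an unsupported property later surfaces as "property not found".

```rust
/// Outcome of resolving a one-part class name such as the `cf` in `\p{cf}`.
/// (Hypothetical type, used only by this sketch.)
#[derive(Debug, PartialEq)]
enum Resolved {
    Binary(&'static str),
    GeneralCategory(&'static str),
    NotFound,
}

// Hypothetical, heavily truncated alias tables: (normalized alias, canonical name).
const PROPERTY_ALIASES: &[(&str, &str)] = &[
    ("alpha", "Alphabetic"),
    ("cf", "Case_Folding"),
];
const GENCAT_ALIASES: &[(&str, &str)] = &[
    ("cf", "Format"),
    ("lu", "Uppercase_Letter"),
];
// Binary properties this toy model can build a class for.
// 'Case_Folding' is deliberately absent, mirroring the lack of support.
const SUPPORTED_BINARY: &[&str] = &["Alphabetic"];

/// Linear search of an alias table for an already-normalized name.
fn lookup(table: &[(&'static str, &'static str)], name: &str) -> Option<&'static str> {
    for &(alias, canon) in table {
        if alias == name {
            return Some(canon);
        }
    }
    None
}

/// Resolves a normalized name. With `special_case_cf == false` this models the
/// behavior before the commit; with `true` it models the behavior after it.
fn resolve(norm: &str, special_case_cf: bool) -> Resolved {
    if !(special_case_cf && norm == "cf") {
        if let Some(canon) = lookup(PROPERTY_ALIASES, norm) {
            // Early return: once a property alias matches, the general
            // category table is never consulted.
            return if SUPPORTED_BINARY.contains(&canon) {
                Resolved::Binary(canon)
            } else {
                Resolved::NotFound
            };
        }
    }
    match lookup(GENCAT_ALIASES, norm) {
        Some(canon) => Resolved::GeneralCategory(canon),
        None => Resolved::NotFound,
    }
}
```

In this model, `resolve("cf", false)` yields `Resolved::NotFound` because the property table shadows the general category table, while `resolve("cf", true)` skips the property table and yields `Resolved::GeneralCategory("Format")`. Other names, such as `alpha` or `lu`, resolve the same way with either flag value.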

tests/unicode.rs

+3
@@ -74,6 +74,9 @@ mat!(
     Some((0, 3))
 );
 mat!(uni_class_gencat_format, r"\p{Format}", "\u{E007F}", Some((0, 4)));
+// See: https://github.com/rust-lang/regex/issues/719
+mat!(uni_class_gencat_format_abbrev1, r"\p{cf}", "\u{E007F}", Some((0, 4)));
+mat!(uni_class_gencat_format_abbrev2, r"\p{gc=cf}", "\u{E007F}", Some((0, 4)));
 mat!(
     uni_class_gencat_initial_punctuation,
     r"\p{Initial_Punctuation}",
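
The expected span `Some((0, 4))` in the new tests may look odd for a haystack that holds a single character. Match offsets in the regex crate are byte offsets, and U+E007F lies outside the Basic Multilingual Plane, so it takes four bytes in UTF-8. The sketch below checks this with the standard library only; `single_char_span` is a hypothetical helper written for this note and is not part of the test suite.

```rust
/// Byte span that a match on `c` would cover if `c` were the whole haystack.
/// (Hypothetical helper; it only computes the UTF-8 length of one character.)
fn single_char_span(c: char) -> (usize, usize) {
    (0, c.len_utf8())
}
```

For `'\u{E007F}'` this returns `(0, 4)`, which agrees with the span asserted by both `uni_class_gencat_format_abbrev1` and `uni_class_gencat_format_abbrev2`.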
