TokensRegex cannot detect rules cross the period '.' #1396
Comments
That's an excellent guess, and in fact that's the first place I went to when trying to diagnose this. The problem is clearly that the tokenizer is splitting this into two sentences rather than keeping it as one. However, despite having
My belief is that we need to add
It's a small change, but I hesitate to make such a change without running it by my PI @manning (who is unfortunately out of town for the next couple of weeks). If you want to give the
Thank you for the fix and quick response!
Hi @AngledLuffa, I wonder if @manning had a chance to see if we can merge the current fix? Thank you.
It's already merged. I can make a new release that includes it soon if you need, or you can just use the dev branch.
That would be great if we can have a release version of it. Thanks very much for the quick response!
It's not an official release yet, since I'd like to make some more changes first, but I built a version here: https://nlp.stanford.edu/software/stanford-corenlp-4.5.5b.zip
Thanks very much for the release! We are looking forward to the official release, because our security policies only allow official releases of dependencies to move to production. May I know when the official release will occur? Thanks!
Now released in 4.5.6 (may take a little time to show up on Maven).
Thanks so much! I appreciate it @AngledLuffa
The task is to detect an apartment number via TokensRegex.
Example sentence: I live in 123 Pretty RD, APT. #456.
Here is the rule used to detect the apartment number: { ruleType: "tokens", pattern: ( /APT/ /./ /#/ [{word:/[0-9]+/}]), action: Annotate($0, ner, "APT#"), result:"APARTMENT NUMBER"}
The above rule failed to detect the pattern "APT. #456". It looks like TokensRegex cannot match a rule across the period '.'.
My guess is that a change in line 713 would do the trick …
https://github.com/stanfordnlp/CoreNLP/blob/main/src/edu/stanford/nlp/process/PTBLexer.flex
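The diagnosis in the comments (the text being split into two sentences at "APT.") can be illustrated with a toy model. The sketch below is not the CoreNLP or TokensRegex API: `find_matches`, `match_per_sentence`, and both tokenizations are hypothetical, and it assumes only that a token pattern is applied within one sentence at a time. It also escapes the period as `\.`, which differs from the `/./` in the rule as posted.

```python
import re

# Toy stand-in for the token rule: one regex per token.
# Hypothetical illustration only, NOT the CoreNLP / TokensRegex API.
PATTERN = [r"APT", r"\.", r"#", r"[0-9]+"]


def find_matches(tokens, pattern=PATTERN):
    """Return every window of tokens that matches the pattern, one regex per token."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(re.fullmatch(p, t) for p, t in zip(pattern, window)):
            hits.append(window)
    return hits


def match_per_sentence(sentences):
    """Apply the pattern inside each sentence separately, never across a boundary."""
    return [hit for sentence in sentences for hit in find_matches(sentence)]


# Two assumed tokenizations of "I live in 123 Pretty RD, APT. #456."
one_sentence = [
    ["I", "live", "in", "123", "Pretty", "RD", ",", "APT", ".", "#", "456", "."],
]
two_sentences = [
    ["I", "live", "in", "123", "Pretty", "RD", ",", "APT", "."],  # split after "APT."
    ["#", "456", "."],
]

print(match_per_sentence(one_sentence))   # the full span is found
print(match_per_sentence(two_sentences))  # nothing is found
```

In this model, the same rule finds the apartment number when the text stays in one sentence and finds nothing once the period ends the sentence, which is consistent with a fix on the tokenizer side rather than in the rule.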