Fix author extraction for single-line declarations (#4229) #4234

Open
wants to merge 4 commits into
base: develop

Conversation

alok1304
Contributor

@alok1304 alok1304 commented Apr 9, 2025

Fixes #4229

This PR currently makes two changes:

1. Split tokens on colons (:)

Previously, entries like Author:Frankie.Chu were treated as a single token. I updated the tokenizer to split on : so that Author and Frankie.Chu are recognized as separate tokens.

2. Remove leading plus sign from author token

A leading + in an author token, as in #+AUTHOR: Lee Hinman, was not supported, so I now strip the leading plus sign from the +AUTHOR token. This form is common in .org files: a GitHub code search finds more than 45k files that use it, see https://github.com/search?q=%22%2Bauthor%3A%22&type=code&p=5
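The two tokenizer changes above can be sketched roughly as follows. This is a minimal illustration, not the actual code in `cluecode/copyrights.py`: the `tokenize` function and the `_SPLIT` pattern are hypothetical names, and the sketch assumes that any leading `#` comment marker has already been removed by an earlier step.

```python
import re

# Hypothetical splitter: break on runs of whitespace and/or colons.
_SPLIT = re.compile(r"[\s:]+")


def tokenize(line):
    """Split a line into tokens, treating ':' as a separator.

    A leading '+' is dropped only from the author keyword itself
    (for example '+AUTHOR' becomes 'AUTHOR'); other tokens are kept as-is.
    """
    tokens = []
    for tok in _SPLIT.split(line):
        if tok.startswith("+") and tok[1:].lower() == "author":
            tok = tok[1:]
        if tok:
            tokens.append(tok)
    return tokens


print(tokenize("Author:Frankie.Chu"))   # ['Author', 'Frankie.Chu']
print(tokenize("+AUTHOR: Lee Hinman"))  # ['AUTHOR', 'Lee', 'Hinman']
```

Note that splitting on every colon, as this sketch does, would also break up tokens such as URLs, so a real implementation probably needs to be more selective.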

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Signed-off-by: Alok Kumar alokkumarjipura9973@gmail.com

@alok1304 alok1304 force-pushed the issue-4229-author-detection branch 5 times, most recently from 7374632 to 66dba8f Compare April 10, 2025 14:03
@alok1304
Contributor Author

The author in Author:Frankie.Chu is still not detected, because Frankie.Chu is a single token. I tried adding a rule for this to the grammar at
https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/cluecode/copyrights.py#L3414. The rule I added is:

    AUTHOR: {<AUTH>  <NNP|NN>}

This worked for the target case, but it also produced many false positives, for example on ordinary phrases such as "author be".
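To show why a rule of the form `<AUTH> <NNP|NN>` is too broad, here is a small stand-alone sketch that mimics it on pre-tagged tokens. It does not use the real grammar engine; `match_author_rule` is a hypothetical helper, and the tags assigned to the sample words are assumptions made for illustration.

```python
def match_author_rule(tagged):
    """Return (keyword, word) pairs where an AUTH tag is directly
    followed by a single NN or NNP token, like the rule above."""
    hits = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "AUTH" and t2 in ("NN", "NNP"):
            hits.append((w1, w2))
    return hits


# Intended case: a single-token name after the author keyword.
wanted = [("Author", "AUTH"), ("Frankie.Chu", "NNP")]

# Ordinary prose: any plain word tagged NN after "author" also matches.
prose = [("the", "NN"), ("author", "AUTH"), ("be", "NN"), ("notified", "NN")]

print(match_author_rule(wanted))  # [('Author', 'Frankie.Chu')]
print(match_author_rule(prose))   # [('author', 'be')]
```

Because the rule accepts any single NN token, the fix has to come from tagging name-like tokens more precisely rather than from loosening the grammar.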

@alok1304 alok1304 force-pushed the issue-4229-author-detection branch 9 times, most recently from 19b76b5 to 1630148 Compare April 13, 2025 16:46
Reference: aboutcode-org#4229
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
no need to remove single plus sign

Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Treat any single word that starts with a capital letter and contains a dot (.) between its parts as NNP.

Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
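The heuristic described in the commit message about dotted capitalized words could look roughly like the check below. This is a sketch under assumptions: the `DOTTED_NAME` pattern and the `looks_like_dotted_name` helper are hypothetical and are not taken from the actual patterns in `copyrights.py`.

```python
import re

# Hypothetical pattern: an uppercase first letter, then one or more
# dot-separated alphabetic parts, e.g. "Frankie.Chu".
DOTTED_NAME = re.compile(r"^[A-Z][A-Za-z]*(?:\.[A-Za-z]+)+$")


def looks_like_dotted_name(token):
    """Return True if the token should be tagged NNP under this heuristic."""
    return DOTTED_NAME.match(token) is not None


for token in ("Frankie.Chu", "be", "e.g.", "Frankie.", "README.md"):
    print(token, looks_like_dotted_name(token))
# Frankie.Chu True
# be False
# e.g. False
# Frankie. False
# README.md True
```

The last case shows a limitation of this sketch: capitalized file names also satisfy the pattern, so such a rule would need to be checked against existing test data for new false positives.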
@alok1304 alok1304 force-pushed the issue-4229-author-detection branch from 34f3ded to e104aff Compare April 13, 2025 17:17
Development

Successfully merging this pull request may close these issues.

Author not detected in C++ file