Support search qualifiers #8386

Open
davidsvantesson opened this issue Oct 5, 2019 · 22 comments
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented type/enhancement An improvement of existing functionality

Comments

@davidsvantesson
Contributor

davidsvantesson commented Oct 5, 2019

  • Gitea version (or commit ref): 1.10.0+dev-375-g8a828500e

Description

Gitea should support search qualifiers when searching for repositories, issues, PRs or users. This would allow more flexible options when searching.

Example when searching for repos:
topic:
is:private
is:public
owner:name

Search qualifiers should always be combined with AND (in contrast to text search terms, which are combined with OR).

Note: Gitea currently splits search terms on commas, not spaces, so this example:
"sentence of words,topic:mytopic,separate sentence"
would search for repositories where "sentence of words" OR "separate sentence" is in the name or description AND which have the topic "mytopic".
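
A minimal Go sketch of the split described above, assuming the existing comma-separated syntax and a hypothetical fixed set of qualifier names (`topic`, `is`, `owner`). The names `parseQuery`, `Query` and `Qualifier` are illustrative and do not refer to existing Gitea code.

```go
package main

import (
	"fmt"
	"strings"
)

// knownQualifiers is a hypothetical whitelist; anything else is treated as plain text.
var knownQualifiers = map[string]bool{"topic": true, "is": true, "owner": true}

// Qualifier is one key:value filter that must match (AND semantics).
type Qualifier struct {
	Key, Value string
}

// Query holds free-text terms (OR semantics) and qualifiers (AND semantics).
type Query struct {
	Terms      []string
	Qualifiers []Qualifier
}

// parseQuery splits a comma-separated search string into text terms and qualifiers.
func parseQuery(input string) Query {
	var q Query
	for _, part := range strings.Split(input, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		key, value, found := strings.Cut(part, ":")
		if found && knownQualifiers[key] && value != "" {
			q.Qualifiers = append(q.Qualifiers, Qualifier{Key: key, Value: value})
			continue
		}
		// Not a recognized qualifier: keep the whole part as a text term.
		q.Terms = append(q.Terms, part)
	}
	return q
}

func main() {
	q := parseQuery("sentence of words,topic:mytopic,separate sentence")
	fmt.Printf("terms=%q qualifiers=%v\n", q.Terms, q.Qualifiers)
}
```

A backend would then OR the entries in `Terms` and AND every entry in `Qualifiers`.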

Maybe this can be solved by indexing with bleve and using required and exclusion of fields:
https://blevesearch.com/docs/Query-String-Query/

@guillep2k
Member

* Gitea version (or commit ref): master

Gitea version (or commit ref): 1.10.0+dev-375-g8a828500e

You can use the footer from try.gitea.io. I know this is nit-picky of me, but master is a relative term. 😁

@davidsvantesson
Contributor Author

Sure, I am usually more careful when posting bugs.

@davidsvantesson
Contributor Author

Another benefit of this way of searching is that it makes the effect of selecting search options in the UI clearer to the user. For example, selecting "Assigned to you" would end up as "assignee:davidsvantesson" in the search field.

@guillep2k
Member

Perhaps we should consider using a localizable resource for this (e.g. "assignee", "asignado", "asignato", etc.), preferably with several accepted spellings per qualifier. For example, in app.ini:

[search.qualifiers]
ASSIGNEE = assignee, asignado

To avoid complicating the code too much, the system could just use the first one in the list when building the search string for the search field in the returned form, like some sort of "normalization".
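
The normalization idea could be sketched in Go roughly as follows. This assumes the alias lists have already been read from the proposed `[search.qualifiers]` section into a map; `buildLookup` and `normalize` are hypothetical names, and nothing here reflects an existing Gitea setting.

```go
package main

import (
	"fmt"
	"strings"
)

// buildLookup maps every accepted spelling to the first spelling configured
// for the same qualifier, which serves as the normalized form.
func buildLookup(config map[string][]string) map[string]string {
	lookup := make(map[string]string)
	for _, names := range config {
		if len(names) == 0 {
			continue
		}
		for _, n := range names {
			lookup[strings.ToLower(n)] = names[0]
		}
	}
	return lookup
}

// normalize rewrites "alias:value" so that it uses the first configured alias.
// Terms without a known qualifier are returned unchanged.
func normalize(term string, lookup map[string]string) string {
	key, value, found := strings.Cut(term, ":")
	if !found {
		return term
	}
	if first, ok := lookup[strings.ToLower(key)]; ok {
		return first + ":" + value
	}
	return term
}

func main() {
	// Mirrors the app.ini example: ASSIGNEE = assignee, asignado
	lookup := buildLookup(map[string][]string{
		"ASSIGNEE": {"assignee", "asignado"},
	})
	fmt.Println(normalize("asignado:davidsvantesson", lookup))
}
```

Whatever spelling the user types, the search field in the returned form would show the first alias in the list.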

As for other kinds of keywords (like "fixes", "closes", etc.), this is not something you want to change whenever you update the translations from Crowdin.

@davidsvantesson
Contributor Author

I am so used to working in English that I didn't consider localization, but it sounds like a good idea.

I also think the current behavior of splitting on "," should be reconsidered, although changing it would be a breaking change. It doesn't feel very standard and is not documented (I didn't know about it before looking into the code). I think it is better to search for each whitespace-separated word separately and allow quotation marks for an exact match. Package text/scanner could then be used with default settings to split the string.
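
A rough Go sketch of splitting with `text/scanner` at its default settings. One caveat worth noting: the default mode tokenizes like Go source, so `topic:mytopic` arrives as three tokens (`topic`, `:`, `mytopic`) and the sketch glues them back together. Hyphenated words and stray apostrophes would also be split or reported as errors, which a real implementation would need to handle. `splitQuery` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/scanner"
)

// splitQuery tokenizes a whitespace-separated search string. Double-quoted
// phrases become a single term, and "key:value" is rejoined into one term.
func splitQuery(input string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(input)) // default Mode and Whitespace

	var terms []string
	joinNext := false // true right after a ':' was appended to the last term
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		text := s.TokenText()
		if tok == scanner.String {
			// Strip the surrounding quotes from an exact-match phrase.
			if unquoted, err := strconv.Unquote(text); err == nil {
				text = unquoted
			}
		}
		switch {
		case tok == ':' && len(terms) > 0:
			terms[len(terms)-1] += ":"
			joinNext = true
		case joinNext:
			terms[len(terms)-1] += text
			joinNext = false
		default:
			terms = append(terms, text)
		}
	}
	return terms
}

func main() {
	fmt.Printf("%q\n", splitQuery(`gitea "exact phrase" topic:mytopic`))
}
```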

@guillep2k
Member

> I am so used to working in English that I didn't consider localization, but it sounds like a good idea.
>
> I also think the current behavior of splitting on "," should be reconsidered, although changing it would be a breaking change. It doesn't feel very standard and is not documented (I didn't know about it before looking into the code). I think it is better to search for each whitespace-separated word separately and allow quotation marks for an exact match. Package text/scanner could then be used with default settings to split the string.

I agree with you about the spaces. Since it is a breaking change, I see three options:

  1. Change the behavior, make users aware of it and let them get used to it (GitHub and search engines work with spaces, so I think this won't meet much resistance).
  2. Keep the current behavior.
  3. Add an app.ini flag to switch between the old and new behaviors.

On this matter I'm all for (1), but that's just my opinion. If implementing (3) is trivial, it could be added, with a sensible default decided by consensus.

@bagasme
Contributor

bagasme commented Oct 8, 2019

For 1.10, I would choose option 3 (as a transition to the proposed behavior), and for 1.11, option 1 (removing the transitional flag).

@davidsvantesson
Contributor Author

@bagasme This is only a proposal issue so far.

I think the only change in behavior would be that commas are no longer used as separators and exact searches need quotes. The previous behavior was undocumented anyway, so I think we can simply document the new behavior. The special query parameters (like &topic=1) can be kept for backward compatibility.

Preferably there should be a "best match" sorting, for example by how many of the search words occur in the repo. I don't know how hard it is to write an effective algorithm for that.

@bagasme
Contributor

bagasme commented Oct 9, 2019

@davidsvantesson Besides documenting the (proposed) new behavior, the old one should be documented too. This comes in handy when we switch to the new behavior and users complain that their old, undocumented syntax no longer works and want an explanation...

@davidsvantesson
Contributor Author

I tried to do some research on "best match" text searches in SQL. Most solutions are tied to specific SQL databases (e.g. Oracle: REGEXP_COUNT, MSSQL: Rank).
I found one solution in standard SQL that counts how many of the words occur in each row. That count could perhaps be used in ORDER BY as a best match.
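
One way such a word-count ranking might look, sketched in Go as a query builder. This is not necessarily the solution referred to above. It assumes a `repository` table with `id`, `name` and `description` columns (names chosen for illustration) and `?` placeholders, and it does not escape `%` or `_` inside the search words.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBestMatchQuery returns a SQL statement and its arguments that score
// each row by how many of the given words occur in name or description,
// and orders the matching rows by that score.
func buildBestMatchQuery(words []string) (string, []any) {
	if len(words) == 0 {
		return "", nil
	}
	cases := make([]string, 0, len(words))
	args := make([]any, 0, 2*len(words))
	for _, w := range words {
		pattern := "%" + strings.ToLower(w) + "%"
		// Each word contributes 1 to the score if it occurs in either column.
		cases = append(cases,
			"CASE WHEN LOWER(name) LIKE ? OR LOWER(description) LIKE ? THEN 1 ELSE 0 END")
		args = append(args, pattern, pattern)
	}
	query := "SELECT id, name, score FROM (" +
		"SELECT id, name, " + strings.Join(cases, " + ") + " AS score FROM repository" +
		") AS scored WHERE score > 0 ORDER BY score DESC"
	return query, args
}

func main() {
	query, args := buildBestMatchQuery([]string{"gitea", "search"})
	fmt.Println(query)
	fmt.Println(args...)
}
```

Every additional word adds two `LIKE` scans per row, which is the database cost raised later in the thread.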

@guillep2k
Member

> Preferably there should be a "best match" sorting, for example by how many of the search words occur in the repo. I don't know how hard it is to write an effective algorithm for that.

I think bleve can do that; it's the default text search engine for issues. I don't mind if simpler SQL indexing lacks this kind of feature.

For SQL search it's hard to decide what should count as "best match". Number of times a word appears in the title? (that should be low) Number of times it appears in the body? Number of comments, counting the body, that contain the word? Most of these will be very heavy on the database. Hence, bleve should be preferred.

A "best match" is more useful when you do some semantic analysis on the text, like counting any of "do", "did", "done" as synonyms for each other. And that's language dependent; we could make Gitea support x number of languages, but that's another whole can of worms.

@davidsvantesson
Contributor Author

Sounds reasonable. So a best match search for repositories would need a bleve index only for name, description and other repository metadata.
My concern is that if adding more search words only adds more matched repositories without any reasonable sorting, it might not be very useful. Usually you want to narrow down your search by adding more search terms.

@davidsvantesson
Contributor Author

I have not fully understood bleve, but it seems it can support this directly if we just index the different fields of the repo metadata:
https://blevesearch.com/docs/Query-String-Query/

@guillep2k
Member

We're not using that interface. We're one level below it, constructing the query objects manually:

// Restrict to one repository AND match the keyword in any of the three fields.
indexerQuery := bleve.NewConjunctionQuery(
    numericEqualityQuery(repoID, "RepoID"),
    bleve.NewDisjunctionQuery(
        newMatchPhraseQuery(keyword, "Title", issueIndexerAnalyzer),
        newMatchPhraseQuery(keyword, "Content", issueIndexerAnalyzer),
        newMatchPhraseQuery(keyword, "Comments", issueIndexerAnalyzer),
    ))
// Page through the results: limit is the page size, start the offset.
search := bleve.NewSearchRequestOptions(indexerQuery, limit, start, false)
result, err := b.indexer.Search(search)

But of course that means we can do it either way.

@guillep2k
Member

Here's the code where the analyzers are decided:

} else if err = mapping.AddCustomAnalyzer(issueIndexerAnalyzer, map[string]interface{}{
    "type":          custom.Name,
    "char_filters":  []string{},
    "tokenizer":     unicode.Name,
    "token_filters": []string{unicodeNormalizeName, lowercase.Name},

Those decide what kind of analysis you want on the strings (they must be decided at the moment the index is built!).

@davidsvantesson
Contributor Author

Maybe it is better to build a custom search query to have more control. If we are to support localization, we would need to do that anyway.

@lunny lunny added the type/enhancement An improvement of existing functionality label Oct 11, 2019
@lunny
Member

lunny commented Oct 11, 2019

We should define our own rules rather than follow bleve's, because we will support multiple indexer backends, e.g. Elasticsearch.
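
A hedged Go sketch of what backend-independent rules might look like: a neutral `Query` type that each indexer translates on its own. All type and method names here are hypothetical, and the in-memory backend exists only to make the intended semantics concrete (qualifiers combined with AND, text terms with OR).

```go
package main

import (
	"fmt"
	"strings"
)

// Query is a backend-neutral search request (hypothetical shape).
type Query struct {
	Terms      []string          // free text: any term may match (OR)
	Qualifiers map[string]string // key -> value: all must match (AND)
}

// Repo is a minimal stand-in for repository metadata.
type Repo struct {
	Owner, Name, Description string
	Topics                   []string
}

// Backend is what each indexer (bleve, Elasticsearch, SQL, ...) would implement.
type Backend interface {
	Search(q Query) []Repo
}

// memoryBackend is a toy implementation used only to illustrate the semantics.
type memoryBackend struct{ repos []Repo }

func (m memoryBackend) Search(q Query) []Repo {
	var out []Repo
	for _, r := range m.repos {
		if matchesQualifiers(r, q.Qualifiers) && matchesTerms(r, q.Terms) {
			out = append(out, r)
		}
	}
	return out
}

func matchesQualifiers(r Repo, quals map[string]string) bool {
	for key, value := range quals {
		switch key {
		case "owner":
			if r.Owner != value {
				return false
			}
		case "topic":
			found := false
			for _, t := range r.Topics {
				if t == value {
					found = true
				}
			}
			if !found {
				return false
			}
		default:
			return false // unknown qualifiers match nothing in this sketch
		}
	}
	return true
}

func matchesTerms(r Repo, terms []string) bool {
	if len(terms) == 0 {
		return true
	}
	text := strings.ToLower(r.Name + " " + r.Description)
	for _, t := range terms {
		if strings.Contains(text, strings.ToLower(t)) {
			return true
		}
	}
	return false
}

func main() {
	var b Backend = memoryBackend{repos: []Repo{
		{Owner: "alice", Name: "gitea-tools", Description: "helpers", Topics: []string{"git"}},
		{Owner: "bob", Name: "notes", Description: "about gitea", Topics: []string{"docs"}},
	}}
	q := Query{Terms: []string{"gitea"}, Qualifiers: map[string]string{"owner": "alice"}}
	fmt.Println(b.Search(q))
}
```

Parsing and localization would then live in one place, and each backend would only need to map `Query` onto its own query objects.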

@stale

stale bot commented Dec 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

@stale stale bot added the issue/stale label Dec 10, 2019
@lunny lunny added the issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented label Dec 10, 2019
@stale stale bot removed the issue/stale label Dec 10, 2019
@hoffmannlin

Searching by owner:name is a good idea. Could this feature be added officially?

@vexvec

vexvec commented Mar 22, 2023

@lunny any progress on this issue?
It's still a problem. Currently I'm directly manipulating the URL query parameters to search...
Not the best thing.
So while this is possible, it seems that only the UI is lacking the keyword search functionality, at least in its current basic form.

@lunny
Member

lunny commented Mar 22, 2023

> @lunny any progress on this issue?
>
> It's still a problem. Currently I'm directly manipulating the URL query parameters to search...
>
> Not the best thing.
>
> So while this is possible, it seems that only the UI is lacking the keyword search functionality, at least in its current basic form.

Nobody is working on this currently.

@bendem

bendem commented Mar 27, 2024

Would love to see this for code search as well to be able to filter a query to a subtree like path:ansible/ gitea to look for gitea only in the ansible/ folder.


7 participants