pip search returns results that only partially match the string #3354

Closed
edmorley opened this issue Jan 12, 2016 · 7 comments
Labels
auto-locked Outdated issues that have been locked by automation C: search 'pip search'

Comments

@edmorley
Contributor

STR:
(Using pip v7.1.2 with Python 2.7.10 under MSYS2.)

pip search "treeherder-client"

Expected:
One result, the same as when just the string "treeherder" is used instead, eg:

[~/src]$ pip search "treeherder"
treeherder-client     - Python library to retrieve and submit data to the Treeherder API

Actual:

[~/src]$ pip search "treeherder-client" | wc -l
311
[~/src]$ pip search "treeherder-client" | tail -n 5
basket-client                             - A Python client for Mozilla's basket service.
OnPage-HUB-API-Client                     - Send Page to pager using OnPage HUB API
egenix-mxodbc-connect-client              - eGenix mxODBC Connect Client for Python
hapi-client                               - HyperDNS HAPI Client Libraries and CLI Tools
medeox-provider-client                    - A Medeox provider client library

These results contain the string "client" but not "treeherder".

It would appear pip search is splitting strings on hyphens and then finding any results that match any of the resultant substrings. This is:

PyPI's own search does return many results too, but at least that's sorted by weight:
https://pypi.python.org/pypi?%3Aaction=search&term=treeherder-client&submit=search
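The behaviour hypothesised above (split the query on hyphens, then keep anything that matches any of the pieces) can be sketched roughly as follows. This is only an illustration of the guess, not pip's or PyPI's actual code; `loose_match` and the sample name list are made up for the example.

```python
import re


def loose_match(query, package_names):
    """Return names containing ANY hyphen-separated token of the query.

    Hypothetical model of the suspected behaviour, not real pip/PyPI code.
    """
    tokens = [t.lower() for t in re.split(r"-+", query) if t]
    return [
        name for name in package_names
        if any(tok in name.lower() for tok in tokens)
    ]


names = ["treeherder-client", "basket-client", "hapi-client", "requests"]

# "client" alone is enough to match, so unrelated packages come back too.
matches = loose_match("treeherder-client", names)
print(matches)
```

Under this model the shorter query "treeherder" is actually more selective, which would be consistent with the single result seen above.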

Perhaps the output from pip search could:

  • sort by how precise a match it is
  • offer pagination (311 results is not overly helpful)
  • either default to exact-match (particularly if there are hundreds of results), or at least offer that as an option
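
As a rough sketch of the first and third suggestions, results could be ranked client-side by how precisely the name matches the query, with an optional exact-match mode. The function names, the `exact_only` parameter and the sample names (including `treeherder-client-extras`) are hypothetical and do not exist in pip.

```python
def match_rank(query, name):
    """Lower is better: 0 exact, 1 prefix, 2 substring, 3 anything else."""
    q, n = query.lower(), name.lower()
    if n == q:
        return 0
    if n.startswith(q):
        return 1
    if q in n:
        return 2
    return 3


def rank_results(query, names, exact_only=False):
    """Sort names by match precision, or keep only exact matches."""
    if exact_only:
        return [n for n in names if n.lower() == query.lower()]
    # Ties within a rank are broken alphabetically for stable output.
    return sorted(names, key=lambda n: (match_rank(query, n), n.lower()))


names = [
    "basket-client",
    "treeherder-client",
    "treeherder-client-extras",  # made-up name for illustration
    "hapi-client",
]
ranked = rank_results("treeherder-client", names)
exact = rank_results("treeherder-client", names, exact_only=True)
print(ranked)
print(exact)
```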

Many thanks :-)

@Ivoz
Contributor

Ivoz commented Feb 2, 2016

pip is currently just doing a PyPI search (search.py#L60):

pypi.search({'name': query, 'summary': query}, 'or')

It then sorts by the score that PyPI gives in its hits (search.py#L94-L101):

# each record has a unique name now, so we will convert the dict into a
# list sorted by score
package_list = sorted(
    packages.values(),
    key=lambda x: x['score'],
    reverse=True,
)

We could perhaps do a simple sort by Levenshtein distance first, and then by score, using something like pylev. @dstufft @pfmoore @xavfernandez @qwcode, any thoughts on drawbacks to changing the current sort?
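
A minimal sketch of that idea, assuming each hit is a dict with `name` and `score` keys (the `score` key appears in the snippet above; `name` is an assumption here). To keep the example self-contained it uses a small hand-written edit-distance function rather than pylev, and `sort_hits` is a hypothetical helper, not part of pip.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def sort_hits(query, packages):
    """Sort by edit distance to the query first, then by descending score."""
    q = query.lower()
    return sorted(
        packages,
        key=lambda p: (levenshtein(q, p["name"].lower()), -p["score"]),
    )


hits = [
    {"name": "basket-client", "score": 5},
    {"name": "treeherder-client", "score": 1},
    {"name": "hapi-client", "score": 5},
]
best = sort_hits("treeherder-client", hits)[0]
print(best["name"])
```

One possible drawback of this ordering is that a close name match with a very low PyPI score would always outrank a popular package whose name is only slightly further away.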

@pfmoore
Member

pfmoore commented Feb 2, 2016

+1 on better search, but honestly I never use pip search, I use the PyPI web search directly. So I'd rather see any effort go into improving the underlying search (understood that the PyPI codebase is no longer being worked on). Honestly, the facts that if I search for "requests" on PyPI it's 14th in the list, and that requests_ntlm is nowhere in the results for requests-ntlm, are ridiculous.

But if we can improve "pip search" - and maybe add to the output the PyPI URL - then I might use it more (modulo getting my @%^&$ access back, see discussion on the thread about NTLM :-))

@xavfernandez
Member

Well, pip search is currently a "proxy" to the PyPI search API, so if the plan is to keep this API in Warehouse, I think time would be better spent improving the API instead of sidestepping/quick-fixing it in pip.

@daverickdunn

Sorry to bump this aging thread, but has this been considered for future updates? It seems like a very simple-to-implement yet extremely useful feature: just a simple option to do a strict match. It's irrelevant if the PyPI API is no longer maintained; all a user needs is the results filtered. One is often forced to pipe into grep to find what they want, but that of course doesn't help Windows users...
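
For illustration, a strict-match filter that works on any platform could be as small as the sketch below. It assumes the `name - summary` line layout shown earlier in this thread; `strict_filter` is a hypothetical helper, not an existing pip option, and in practice the lines could come from `sys.stdin`.

```python
def strict_filter(lines, query):
    """Keep only output lines whose package name equals the query exactly."""
    wanted = query.lower()
    kept = []
    for line in lines:
        # Split on the first " - " separator between name and summary.
        name, sep, _summary = line.partition(" - ")
        if sep and name.strip().lower() == wanted:
            kept.append(line)
    return kept


sample_output = [
    "treeherder-client     - Python library to retrieve and submit data to the Treeherder API",
    "basket-client                             - A Python client for Mozilla's basket service.",
    "hapi-client                               - HyperDNS HAPI Client Libraries and CLI Tools",
]
filtered = strict_filter(sample_output, "treeherder-client")
print(filtered)
```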

@jedie

jedie commented Jun 27, 2016

You get the same useless result online, e.g.: https://pypi.python.org/pypi?%3Aaction=search&term=django-meta&submit=search

@daverickdunn

daverickdunn commented Jun 27, 2016

@jedie It was actually searching for 'django-auth-ldap' that drove me here. Our queries probably return >90% the same packages... On closer inspection, there's probably somewhere in the region of 8000 - 9000 hits for your search, absolutely ridiculous.

Edit: To be fair, 'django-meta' is sorted to the top on the web page, but this doesn't help when using a CLI.

@xavfernandez xavfernandez added the C: search 'pip search' label Oct 15, 2016
@dstufft
Member

dstufft commented Mar 30, 2017

Closing this since pip is just using the PyPI search interface and thus has no mechanism to really control its searching mechanism. However, if we implement something like #395 then we could work on something like this.

@dstufft dstufft closed this as completed Mar 30, 2017
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 3, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 3, 2019
7 participants