|
| 1 | +[[faceted_search]] |
| 2 | +==== Faceted Search |
| 3 | + |
| 4 | +The library comes with a simple abstraction aimed at helping you develop |
| 5 | +faceted navigation for your data. |
| 6 | + |
| 7 | +[NOTE] |
| 8 | +==== |
| 9 | +This API is experimental and will be subject to change. Any feedback is |
| 10 | +welcome. |
| 11 | +==== |
| 12 | + |
| 13 | +===== Configuration |
| 14 | + |
| 15 | +You can provide several configuration options (as class attributes) when |
| 16 | +declaring a `FacetedSearch` subclass: |
| 17 | + |
| 18 | +- `index`: |
| 19 | + the name of the index (as string) to search through, defaults to |
| 20 | + `'_all'`. |
| 21 | +- `doc_types`: |
| 22 | + list of `Document` subclasses or strings to be used, defaults to |
| 23 | + `['_all']`. |
| 24 | +- `fields`: |
| 25 | + list of fields on the document type to search through. The list will |
| 26 | + be passed to a `MultiMatch` query, so it can contain boost values |
| 27 | + (`'title^5'`), defaults to `['*']`. |
| 28 | +- `facets`: |
| 29 | + dictionary of facets to display/filter on. The key is the name |
| 30 | + displayed and values should be instances of any `Facet` subclass, for |
| 31 | + example: `{'tags': TermsFacet(field='tags')}` |
| 32 | + |
| 33 | +====== Facets |
| 34 | + |
| 35 | +There are several different facets available: |
| 36 | + |
| 37 | +- `TermsFacet`: |
| 38 | + provides an option to split documents into groups based on a value of |
| 39 | + a field, for example `TermsFacet(field='category')` |
| 40 | +- `DateHistogramFacet`: |
| 41 | + splits documents into time intervals, for example: |
| 42 | + `DateHistogramFacet(field="published_date", calendar_interval="day")` |
| 43 | +- `HistogramFacet`: |
| 44 | + similar to `DateHistogramFacet` but for numerical values: |
| 45 | + `HistogramFacet(field="rating", interval=2)` |
| 46 | +- `RangeFacet`: |
| 47 | + allows you to define your own ranges for a numerical field: |
| 48 | + `RangeFacet(field="comment_count", ranges=[("few", (None, 2)), ("lots", (2, None))])` |
| 49 | +- `NestedFacet`: |
| 50 | + is just a simple facet that wraps another to provide access to nested |
| 51 | + documents: |
| 52 | + `NestedFacet('variants', TermsFacet(field='variants.color'))` |
| 53 | + |
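The `ranges` argument of `RangeFacet` is easier to read with a concrete mapping in hand. The sketch below is plain Python, not library code: `bucket_for` is a hypothetical helper, and it assumes the usual Elasticsearch range semantics (lower bound inclusive, upper bound exclusive, `None` meaning unbounded) and non-overlapping ranges.

```python
from typing import Optional, Sequence, Tuple

Bounds = Tuple[Optional[float], Optional[float]]


def bucket_for(value: float, ranges: Sequence[Tuple[str, Bounds]]) -> Optional[str]:
    """Return the name of the first range containing `value` (illustrative only)."""
    for name, (lower, upper) in ranges:
        above = lower is None or value >= lower  # lower bound is inclusive
        below = upper is None or value < upper   # upper bound is exclusive
        if above and below:
            return name
    return None


# Same shape as the `ranges` argument shown for RangeFacet
ranges = [("few", (None, 2)), ("lots", (2, None))]
labels = [bucket_for(n, ranges) for n in (0, 1, 2, 10)]
print(labels)
```

Under these assumptions a document with `comment_count == 2` lands in `"lots"`, not `"few"`.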
| 54 | +By default facet results only calculate the document count. If you want |
| 55 | +a different metric, you can pass in any single-value metric |
| 56 | +aggregation as the `metric` kwarg |
| 57 | +(`TermsFacet(field='tags', metric=A('max', field='timestamp'))`). When |
| 58 | +`metric` is specified, the results are sorted by that metric in |
| 59 | +descending order by default. To sort in ascending order, specify |
| 60 | +`metric_sort="asc"`; to sort by document count instead, use |
| 61 | +`metric_sort=False`. |
| 62 | + |
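To make the three `metric_sort` settings concrete, here is a small stand-alone sketch. It only mimics the resulting order in plain Python; the real ordering is done by Elasticsearch, and `order_buckets` plus the sample buckets are hypothetical.

```python
def order_buckets(buckets, metric_sort="desc"):
    """Mimic the bucket order for a given `metric_sort` value (illustrative only)."""
    if metric_sort is False:
        # fall back to ordering by document count, largest first
        return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)
    # otherwise order by the metric value, descending unless "asc" is requested
    return sorted(buckets, key=lambda b: b["metric"], reverse=(metric_sort == "desc"))


# made-up buckets: key, document count, and a single-value metric
buckets = [
    {"key": "python", "doc_count": 12, "metric": 3.0},
    {"key": "search", "doc_count": 7, "metric": 9.5},
    {"key": "web", "doc_count": 20, "metric": 5.0},
]

by_metric_desc = [b["key"] for b in order_buckets(buckets)]
by_metric_asc = [b["key"] for b in order_buckets(buckets, "asc")]
by_count = [b["key"] for b in order_buckets(buckets, False)]
print(by_metric_desc, by_metric_asc, by_count)
```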
| 63 | +====== Advanced |
| 64 | + |
| 65 | +If you require any custom behavior or modifications, simply override one |
| 66 | +or more of the methods responsible for the class's functions: |
| 67 | + |
| 68 | +- `search(self)`: |
| 69 | + is responsible for constructing the `Search` object used. Override |
| 70 | + this if you want to customize the search object (for example by adding |
| 71 | + a global filter for published articles only). |
| 72 | +- `query(self, search)`: |
| 73 | + adds the query portion of the search (if search input is specified), by |
| 74 | + default using a `MultiMatch` query. Override this if you want to modify |
| 75 | + the query type used. |
| 76 | +- `highlight(self, search)`: |
| 77 | + defines the highlighting on the `Search` object and returns a new one. |
| 78 | + Default behavior is to highlight on all fields specified for search. |
| 79 | + |
| 80 | +===== Usage |
| 81 | + |
| 82 | +The custom subclass can be instantiated empty to provide an empty search |
| 83 | +(matching everything) or with `query`, `filters` and `sort`. |
| 84 | + |
| 85 | +- `query`: |
| 86 | + is used to pass in the text of the query to be performed. If `None` is |
| 87 | + passed in (default) a `MatchAll` query will be used. For example |
| 88 | + `'python web'` |
| 89 | +- `filters`: |
| 90 | + is a dictionary containing all the facet filters that you wish to |
| 91 | + apply. Use the name of the facet (from `.facets` attribute) as the key |
| 92 | + and one of the possible values as value. For example |
| 93 | + `{'tags': 'python'}`. |
| 94 | +- `sort`: |
| 95 | + is a tuple or list of fields on which the results should be sorted. |
| 96 | + The format of the individual fields is the same as that accepted |
| 97 | + by `Search.sort()`. |
| 98 | + |
| 99 | +====== Response |
| 100 | + |
| 101 | +The response returned from the `FacetedSearch` object (by calling |
| 102 | +`.execute()`) is a subclass of the standard `Response` class that adds a |
| 103 | +property called `facets`, which contains a dictionary with lists of |
| 104 | +buckets, each represented by a tuple of key, document count and a flag |
| 105 | +indicating whether this value has been filtered on. |
| 106 | + |
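The shape of `facets` described here can be tried out without a cluster. The sketch below uses a hand-written dictionary in place of a real response; the data and the `render_facet` helper are hypothetical, and only the `(key, count, selected)` tuple layout comes from the text.

```python
# stand-in for `response.facets`: facet name -> list of (key, count, selected)
facets = {
    "tags": [("python", 12, True), ("web", 7, False)],
    "category": [("tutorial", 5, False)],
}


def render_facet(buckets):
    """Turn one facet's buckets into display lines, marking selected values."""
    return [
        f"{key}{' (SELECTED)' if selected else ''}: {count}"
        for key, count, selected in buckets
    ]


tag_lines = render_facet(facets["tags"])

# collect the values currently filtered on, per facet
active = {
    name: [key for key, _count, selected in buckets if selected]
    for name, buckets in facets.items()
}

print(tag_lines)
print(active)
```

A dictionary like `active` is roughly what a UI would feed back in as `filters` on the next request, though how that round trip is wired up is left to the application.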
| 107 | +===== Example |
| 108 | + |
| 109 | +[source,python] |
| 110 | +---- |
| 111 | +from datetime import date |
| 112 | + |
| 113 | +from elasticsearch.dsl import FacetedSearch, TermsFacet, DateHistogramFacet |
| 114 | + |
| 115 | +class BlogSearch(FacetedSearch): |
| 116 | +    doc_types = [Article]  # Article is a Document subclass defined elsewhere |
| 117 | +    # fields that should be searched |
| 118 | +    fields = ['tags', 'title', 'body'] |
| 119 | + |
| 120 | +    facets = { |
| 121 | +        # use bucket aggregations to define facets |
| 122 | +        'tags': TermsFacet(field='tags'), |
| 123 | +        'publishing_frequency': DateHistogramFacet(field='published_from', calendar_interval='month') |
| 124 | +    } |
| 125 | + |
| 126 | +    def search(self): |
| 127 | +        # override methods to add custom pieces |
| 128 | +        s = super().search() |
| 129 | +        return s.filter('range', published_from={'lte': 'now/h'}) |
| 130 | + |
| 131 | +bs = BlogSearch('python web', {'publishing_frequency': date(2015, 6, 1)}) |
| 132 | +response = bs.execute() |
| 133 | + |
| 134 | +# access hits and other attributes as usual |
| 135 | +total = response.hits.total |
| 136 | +print('total hits', total.relation, total.value) |
| 137 | +for hit in response: |
| 138 | +    print(hit.meta.score, hit.title) |
| 139 | + |
| 140 | +for (tag, count, selected) in response.facets.tags: |
| 141 | +    print(tag, ' (SELECTED):' if selected else ':', count) |
| 142 | + |
| 143 | +for (month, count, selected) in response.facets.publishing_frequency: |
| 144 | +    print(month.strftime('%B %Y'), ' (SELECTED):' if selected else ':', count) |
| 145 | +---- |
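When a date histogram facet with a monthly interval is filtered on a value such as `date(2015, 6, 1)`, a reasonable reading is that the filter covers that whole month: from the given date up to, but not including, the first day of the next month. The sketch below works that window out in plain Python. `month_window` is a hypothetical helper and the inclusive/exclusive bounds are an assumption about the facet's behavior, not a quote from the library.

```python
from datetime import date, timedelta


def month_window(start: date):
    """Return (first day of start's month, first day of the following month)."""
    first = start.replace(day=1)
    # 32 days past the 1st always lands in the next month; snap back to its 1st
    next_first = (first + timedelta(days=32)).replace(day=1)
    return first, next_first


june = month_window(date(2015, 6, 1))
december = month_window(date(2015, 12, 1))
print(june)
print(december)
```

Under that assumption, an article published on 30 June 2015 matches the filter and one published on 1 July 2015 does not.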