Commit ab6b69d

Added DSL documentation to Guide (#2761) (#2784)
* Added DSL documentation to Guide
* clarify that this is a Python DSL

(cherry picked from commit 8a27080)

Co-authored-by: Miguel Grinberg <miguel.grinberg@gmail.com>
1 parent 9b3fcf7 commit ab6b69d

13 files changed: +2329 −19 lines

docs/guide/dsl/asyncio.asciidoc

@@ -0,0 +1,103 @@
[[asyncio]]
==== Using asyncio with Elasticsearch Python DSL

The DSL module supports async/await with
https://docs.python.org/3/library/asyncio.html[asyncio]. To ensure that
you have all the required dependencies, install the `[async]`
extra:

[source,bash]
----
$ python -m pip install "elasticsearch[async]"
----

===== Connections

Use the `async_connections` module to manage your asynchronous
connections.

[source,python]
----
from elasticsearch.dsl import async_connections

async_connections.create_connection(hosts=['localhost'], timeout=20)
----

All the options available in the `connections` module can be used with
`async_connections`.

====== How to avoid 'Unclosed client session / connector' warnings on exit

These warnings come from the `aiohttp` package, which is used internally
by the `AsyncElasticsearch` client. They appear often when the
application exits and are caused by HTTP connections that are open when
they are garbage collected. To avoid these warnings, make sure that you
close your connections.

[source,python]
----
es = async_connections.get_connection()
await es.close()
----

===== Search DSL

Use the `AsyncSearch` class to perform asynchronous searches.

[source,python]
----
from elasticsearch.dsl import AsyncSearch

s = AsyncSearch().query("match", title="python")
async for hit in s:
    print(hit.title)
----

Instead of using the `AsyncSearch` object as an asynchronous iterator,
you can explicitly call the `execute()` method to get a `Response`
object.

[source,python]
----
s = AsyncSearch().query("match", title="python")
response = await s.execute()
for hit in response:
    print(hit.title)
----

An `AsyncMultiSearch` is available as well.

[source,python]
----
from elasticsearch.dsl import AsyncMultiSearch

ms = AsyncMultiSearch(index='blogs')

ms = ms.add(AsyncSearch().filter('term', tags='python'))
ms = ms.add(AsyncSearch().filter('term', tags='elasticsearch'))

responses = await ms.execute()

for response in responses:
    print("Results for query %r." % response.search.query)
    for hit in response:
        print(hit.title)
----

===== Asynchronous Documents, Indexes, and more

The `Document`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and
`FacetedSearch` classes all have asynchronous versions that use the same
name with an `Async` prefix. These classes expose the same interfaces as
the synchronous versions, but any methods that perform I/O are defined
as coroutines.
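
For illustration, here is a minimal sketch of an async document class;
the `Article` document and its fields are hypothetical examples, not
part of the library:

[source,python]
----
from elasticsearch.dsl import AsyncDocument, Date, Text

class Article(AsyncDocument):
    # hypothetical example document
    title = Text()
    published_from = Date()

    class Index:
        name = 'blogs'

async def store(article):
    # I/O methods such as save() are coroutines and must be awaited
    await article.save()
----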

Auxiliary classes that do not perform I/O do not have asynchronous
versions. The same classes can be used in synchronous and asynchronous
applications.

When using a custom analyzer in an asynchronous
application, use the `async_simulate()` method to invoke the Analyze
API on it.
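
As an illustration, here is a minimal sketch of simulating a custom
analyzer asynchronously; the analyzer definition shown is a hypothetical
example and assumes a default async connection has been created:

[source,python]
----
from elasticsearch.dsl import analyzer, tokenizer

my_analyzer = analyzer('my_analyzer',
    tokenizer=tokenizer('trigram', 'ngram', min_gram=3, max_gram=3),
    filter=['lowercase']
)

async def check_analysis():
    # invokes the Analyze API using the default async connection
    result = await my_analyzer.async_simulate('Some text to analyze')
    print([t.token for t in result.tokens])
----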

Consult the `api` section for details about each specific method.

docs/guide/dsl/configuration.asciidoc

@@ -0,0 +1,125 @@
=== Configuration

There are several ways to configure connections for the library. The
easiest and most useful approach is to define one default connection
that can be used every time an API call is made without explicitly
passing in other connections.

[NOTE]
====
Unless you want to access multiple clusters from your application, it is
highly recommended that you use the `create_connection` method to define
a default connection; all operations will then use that connection
automatically.
====

==== Default connection

To define a default connection that can be used globally, use the
`connections` module and the `create_connection` method like this:

[source,python]
----
from elasticsearch.dsl import connections

connections.create_connection(hosts=['localhost'], timeout=20)
----

===== Single connection with an alias

You can define the `alias` or name of a connection so you can easily
refer to it later. The default value for `alias` is `default`.

[source,python]
----
from elasticsearch.dsl import connections

connections.create_connection(alias='my_new_connection', hosts=['localhost'], timeout=60)
----

Additional keyword arguments (`hosts` and `timeout` in our example) will
be passed to the `Elasticsearch` class from `elasticsearch-py`.

To see all possible configuration options refer to the
https://elasticsearch-py.readthedocs.io/en/latest/api/elasticsearch.html[documentation].

==== Multiple clusters

You can define multiple connections to multiple clusters at the same
time using the `configure` method:

[source,python]
----
from elasticsearch.dsl import connections

connections.configure(
    default={'hosts': 'localhost'},
    dev={
        'hosts': ['esdev1.example.com:9200'],
        'sniff_on_start': True
    }
)
----

Such connections will be constructed lazily when requested for the first
time.

You can alternatively define multiple connections by adding them one by
one as shown in the following example:

[source,python]
----
# if you have configuration options to be passed to Elasticsearch.__init__
# this also shows creating a connection with the alias 'qa'
connections.create_connection('qa', hosts=['esqa1.example.com'], sniff_on_start=True)

# if you already have an Elasticsearch instance ready
connections.add_connection('another_qa', my_client)
----

===== Using aliases

When using multiple connections, you can refer to them using the string
alias specified when you created the connection.

This example shows how to use a connection alias:

[source,python]
----
s = Search(using='qa')
----

A `KeyError` will be raised if there is no connection registered with
that alias.
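
A registered connection can also be retrieved by its alias. The sketch
below is illustrative and assumes a connection was registered under the
`qa` alias as shown above:

[source,python]
----
from elasticsearch.dsl import connections

# returns the client registered under the 'qa' alias,
# or raises KeyError if no such connection exists
es = connections.get_connection('qa')
----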

==== Manual

If you don't want to supply a global configuration, you can always pass
in your own connection as an instance of `elasticsearch.Elasticsearch`
with the parameter `using` wherever it is accepted, like this:

[source,python]
----
s = Search(using=Elasticsearch('localhost'))
----

You can even use this approach to override any connection the object
might already be associated with:

[source,python]
----
s = s.using(Elasticsearch('otherhost:9200'))
----

[NOTE]
====
When using the `dsl` module, it is highly recommended that you
use the built-in serializer
(`elasticsearch.dsl.serializer.serializer`) to ensure your objects
are correctly serialized into `JSON` every time. The
`create_connection` method that is described here (and that the
`configure` method uses under the hood) will do that automatically for
you, unless you explicitly specify your own serializer. The built-in
serializer also allows you to serialize your own objects: just define a
`to_dict()` method on your object, and that method will be called
automatically when your custom objects are serialized to `JSON`, as
shown in the sketch below.
====
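
For illustration only, here is a minimal sketch of a custom object made
serializable through `to_dict()`; the `GeoPoint` class and the query
shown are hypothetical examples, not part of the library:

[source,python]
----
from elasticsearch.dsl import Search

class GeoPoint:
    # hypothetical custom object
    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

    def to_dict(self):
        # called automatically by the built-in serializer
        return {'lat': self.lat, 'lon': self.lon}

# the object can now be used anywhere a JSON value is expected,
# for example as a query parameter
s = Search().query('geo_distance', distance='10km', location=GeoPoint(52.37, 4.89))
----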

docs/guide/dsl/examples.asciidoc

@@ -0,0 +1,5 @@
=== Examples

Please see the
https://github.com/elastic/elasticsearch-py/tree/master/examples/dsl[DSL examples]
directory for some complex examples using the DSL module.

docs/guide/dsl/faceted_search.asciidoc

@@ -0,0 +1,145 @@
[[faceted_search]]
==== Faceted Search

The library comes with a simple abstraction aimed at helping you develop
faceted navigation for your data.

[NOTE]
====
This API is experimental and will be subject to change. Any feedback is
welcome.
====

===== Configuration

You can provide several configuration options (as class attributes) when
declaring a `FacetedSearch` subclass; a short sketch follows the list:

- `index`:
  the name of the index (as a string) to search through, defaults to
  `'_all'`.
- `doc_types`:
  list of `Document` subclasses or strings to be used, defaults to
  `['_all']`.
- `fields`:
  list of fields on the document type to search through. The list will
  be passed to the `MultiMatch` query, so it can contain boost values
  (`'title^5'`), defaults to `['*']`.
- `facets`:
  dictionary of facets to display/filter on. The key is the name
  displayed and the values should be instances of any `Facet` subclass, for
  example: `{'tags': TermsFacet(field='tags')}`
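
For illustration, here is a minimal sketch of a subclass using these
attributes; the index name, fields and facet shown are hypothetical
examples:

[source,python]
----
from elasticsearch.dsl import FacetedSearch, TermsFacet

class ArticleSearch(FacetedSearch):
    # hypothetical index and field names
    index = 'articles'
    doc_types = ['article']
    fields = ['title^5', 'body']

    facets = {
        # the key is the display name, the value a Facet instance
        'tags': TermsFacet(field='tags'),
    }
----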

====== Facets

There are several different facets available:

- `TermsFacet`:
  provides an option to split documents into groups based on a value of
  a field, for example `TermsFacet(field='category')`
- `DateHistogramFacet`:
  splits documents into time intervals, for example:
  `DateHistogramFacet(field="published_date", calendar_interval="day")`
- `HistogramFacet`:
  similar to `DateHistogramFacet` but for numerical values:
  `HistogramFacet(field="rating", interval=2)`
- `RangeFacet`:
  allows you to define your own ranges for numerical fields:
  `RangeFacet(field="comment_count", ranges=[("few", (None, 2)), ("lots", (2, None))])`
- `NestedFacet`:
  is just a simple facet that wraps another to provide access to nested
  documents:
  `NestedFacet('variants', TermsFacet(field='variants.color'))`

By default facet results will only calculate document count. If you wish
for a different metric, you can pass in any single value metric
aggregation as the `metric` kwarg
(`TermsFacet(field='tags', metric=A('max', field='timestamp'))`). When
specifying `metric`, the results will be, by default, sorted in
descending order by that metric. To change it to ascending specify
`metric_sort="asc"`, and to just sort by document count use
`metric_sort=False`.
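
For instance, here is a minimal sketch of a facet ordered by a metric;
the field names are hypothetical:

[source,python]
----
from elasticsearch.dsl import A, TermsFacet

facets = {
    # buckets are ordered by the maximum 'published_from' value in each
    # bucket, in ascending order instead of by document count
    'tags': TermsFacet(
        field='tags',
        metric=A('max', field='published_from'),
        metric_sort='asc',
    ),
}
----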

====== Advanced

If you require any custom behavior or modifications, simply override one
or more of the methods responsible for the class' functions:

- `search(self)`:
  is responsible for constructing the `Search` object used. Override
  this if you want to customize the search object (for example by adding
  a global filter for published articles only).
- `query(self, search)`:
  adds the query part of the search (if search input was specified), by
  default using a `MultiMatch` query. Override this if you want to modify
  the query type used.
- `highlight(self, search)`:
  defines the highlighting on the `Search` object and returns a new one.
  Default behavior is to highlight on all fields specified for search.

===== Usage

The custom subclass can be instantiated empty to provide an empty search
(matching everything) or with `query`, `filters` and `sort`.

- `query`:
  is used to pass in the text of the query to be performed. If `None` is
  passed in (default) a `MatchAll` query will be used. For example
  `'python web'`.
- `filters`:
  is a dictionary containing all the facet filters that you wish to
  apply. Use the name of the facet (from the `.facets` attribute) as the
  key and one of the possible values as the value. For example
  `{'tags': 'python'}`.
- `sort`:
  is a tuple or list of fields on which the results should be sorted.
  The format of the individual fields is the same as those passed
  to `Search.sort()`.

====== Response

The response returned from the `FacetedSearch` object (by calling
`.execute()`) is a subclass of the standard `Response` class that adds a
property called `facets` which contains a dictionary with lists of
buckets, each represented by a tuple of the key, the document count and
a flag indicating whether this value has been filtered on.

===== Example

[source,python]
----
from datetime import date

from elasticsearch.dsl import FacetedSearch, TermsFacet, DateHistogramFacet

class BlogSearch(FacetedSearch):
    doc_types = [Article, ]
    # fields that should be searched
    fields = ['tags', 'title', 'body']

    facets = {
        # use bucket aggregations to define facets
        'tags': TermsFacet(field='tags'),
        'publishing_frequency': DateHistogramFacet(field='published_from', interval='month')
    }

    def search(self):
        # override methods to add custom pieces
        s = super().search()
        return s.filter('range', publish_from={'lte': 'now/h'})

bs = BlogSearch('python web', {'publishing_frequency': date(2015, 6, 1)})
response = bs.execute()

# access hits and other attributes as usual
total = response.hits.total
print('total hits', total.relation, total.value)
for hit in response:
    print(hit.meta.score, hit.title)

for (tag, count, selected) in response.facets.tags:
    print(tag, ' (SELECTED):' if selected else ':', count)

for (month, count, selected) in response.facets.publishing_frequency:
    print(month.strftime('%B %Y'), ' (SELECTED):' if selected else ':', count)
----

docs/guide/dsl/howto.asciidoc

@@ -0,0 +1,7 @@
=== How-To Guides

include::search_dsl.asciidoc[]
include::persistence.asciidoc[]
include::faceted_search.asciidoc[]
include::update_by_query.asciidoc[]
include::asyncio.asciidoc[]
