Commit c6d5a13

Polishing.
See #3868
1 parent ca11738 commit c6d5a13

4 files changed

+176
-9
lines changed

src/main/antora/antora-playbook.yml

+1-1
@@ -17,7 +17,7 @@ content:
1717
- url: https://github.com/spring-projects/spring-data-commons
1818
# Refname matching:
1919
# https://docs.antora.org/antora/latest/playbook/content-refname-matching/
20-
branches: [ main, 3.4.x ]
20+
branches: [ 4.0.x ]
2121
start_path: src/main/antora
2222
asciidoc:
2323
attributes:

src/main/antora/modules/ROOT/pages/jpa/query-methods.adoc

+1-1
@@ -428,7 +428,7 @@ This is a lighter variant than paging because it does not require the total resu
428428
3. <<repositories.scrolling.keyset,Keyset-based scrolling>>.
429429
This method avoids https://use-the-index-luke.com/no-offset[the shortcomings of offset-based result retrieval by leveraging database indexes].
430430

431-
Read more on <<repositories.scrolling.guidance,which method to use best>> for your particular arrangement.
431+
Read more on xref:repositories/query-methods-details.adoc#repositories.scrolling.guidance[which method to use best] for your particular arrangement.
432432

433433
You can use the Scroll API with query methods, xref:repositories/query-by-example.adoc[Query-by-Example], and xref:repositories/core-extensions.adoc#core.extensions.querydsl[Querydsl].
434434

@@ -1,8 +1,8 @@
1-
:vector-search-intro-include: data-jpa::partial$vector-search-intro-include.adoc
2-
:vector-search-model-include: data-jpa::partial$vector-search-model-include.adoc
3-
:vector-search-repository-include: data-jpa::partial$vector-search-repository-include.adoc
4-
:vector-search-scoring-include: data-jpa::partial$vector-search-scoring-include.adoc
5-
:vector-search-method-derived-include: data-jpa::partial$vector-search-method-derived-include.adoc
6-
:vector-search-method-annotated-include: data-jpa::partial$vector-search-method-annotated-include.adoc
1+
:vector-search-intro-include: partial$vector-search-intro-include.adoc
2+
:vector-search-model-include: partial$vector-search-model-include.adoc
3+
:vector-search-repository-include: partial$vector-search-repository-include.adoc
4+
:vector-search-scoring-include: partial$vector-search-scoring-include.adoc
5+
:vector-search-method-derived-include: partial$vector-search-method-derived-include.adoc
6+
:vector-search-method-annotated-include: partial$vector-search-method-annotated-include.adoc
77

8-
include::{commons}@data-commons::page$repositories/vector-search.adoc[]
8+
include::partial$vector-search.adoc[]
@@ -0,0 +1,167 @@
1+
[[vector-search]]
2+
= Vector Search
3+
4+
With the rise of generative AI, vector databases have gained strong traction.
5+
These databases enable efficient storage and querying of high-dimensional vectors, making them well-suited for tasks such as semantic search, recommendation systems, and natural language understanding.
6+
7+
Vector search is a technique that retrieves semantically similar data by comparing vector representations (also known as embeddings) rather than relying on traditional exact-match queries.
8+
This approach enables intelligent, context-aware applications that go beyond keyword-based retrieval.
9+
10+
In the context of Spring Data, vector search opens new possibilities for building intelligent, context-aware applications, particularly in domains like natural language processing, recommendation systems, and generative AI.
11+
By modelling vector-based querying using familiar repository abstractions, Spring Data lets developers integrate vector-capable databases for similarity search with the simplicity and consistency of the Spring Data programming model.
12+
13+
ifdef::vector-search-intro-include[]
14+
include::{vector-search-intro-include}[]
15+
endif::[]
16+
17+
[[vector-search.model]]
18+
== Vector Model
19+
20+
To support vector search in a type-safe and idiomatic way, Spring Data introduces the following core abstractions:
21+
22+
* <<vector-search.model.vector,`Vector`>>
23+
* <<vector-search.model.search-result,`SearchResults<T>` and `SearchResult<T>`>>
24+
* <<vector-search.model.scoring,`Score`, `Similarity` and Scoring Functions>>
25+
26+
[[vector-search.model.vector]]
27+
=== `Vector`
28+
29+
The `Vector` type represents an n-dimensional numerical embedding, typically produced by embedding models.
30+
In Spring Data, it is defined as a lightweight wrapper around an array of floating-point numbers, ensuring immutability and consistency.
31+
This type can be used as an input for search queries or as a property on a domain entity to store the associated vector representation.
32+
33+
====
34+
[source,java]
35+
----
36+
Vector vector = Vector.of(0.23f, 0.11f, 0.77f);
37+
----
38+
====
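
To illustrate what "lightweight wrapper ensuring immutability" can mean in practice, here is a minimal sketch. `EmbeddingVector` is a hypothetical stand-in written for this example; it is not Spring Data's `Vector` implementation and only assumes that defensive copies are one reasonable way to achieve immutability.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not Spring Data's Vector type.
final class EmbeddingVector {

    private final float[] values;

    private EmbeddingVector(float[] values) {
        this.values = values;
    }

    // Copies the input so later changes to the caller's array do not leak in.
    static EmbeddingVector of(float... values) {
        return new EmbeddingVector(Arrays.copyOf(values, values.length));
    }

    // Number of dimensions of the embedding.
    int size() {
        return values.length;
    }

    // Returns a copy so callers cannot mutate the internal state.
    float[] toFloatArray() {
        return Arrays.copyOf(values, values.length);
    }

    @Override
    public boolean equals(Object o) {
        return (o instanceof EmbeddingVector)
                && Arrays.equals(values, ((EmbeddingVector) o).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }
}
```

Because both the factory method and the accessor copy the array, two instances created from the same numbers compare equal, and neither the original input array nor a returned array can change a stored vector.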
39+
40+
Using `Vector` in your domain model removes the need to work with raw arrays or lists of numbers, providing a more type-safe and expressive way to handle vector data.
41+
This abstraction also allows for easy integration with various vector databases and libraries.
42+
It also allows for implementing vendor-specific optimizations such as binary or quantized vectors that do not map to a standard floating-point representation (`float` and `double` as defined by https://en.wikipedia.org/wiki/IEEE_754[IEEE 754]).
43+
A domain object can have a vector property, which can be used for similarity searches.
44+
Consider the following example:
45+
46+
ifdef::vector-search-model-include[]
47+
include::{vector-search-model-include}[]
48+
endif::[]
49+
50+
NOTE: Associating a vector with a domain object results in the vector being loaded and stored as part of the entity lifecycle, which may introduce additional overhead on retrieval and persistence operations.
51+
52+
[[vector-search.model.search-result]]
53+
=== Search Results
54+
55+
The `SearchResult<T>` type encapsulates a single result of a vector similarity query.
56+
It includes both the matched domain object and a relevance score that indicates how closely it matches the query vector.
57+
This abstraction provides a structured way to handle result ranking and enables developers to easily work with both the data and its contextual relevance.
58+
59+
ifdef::vector-search-repository-include[]
60+
include::{vector-search-repository-include}[]
61+
endif::[]
62+
63+
In this example, the `searchByCountryAndEmbeddingNear` method returns a `SearchResults<Comment>` object, which contains a list of `SearchResult<Comment>` instances.
64+
Each result includes the matched `Comment` entity and its relevance score.
65+
66+
The relevance score is a numerical value that indicates how closely the matched vector aligns with the query vector.
67+
Depending on whether a score represents distance or similarity, a higher score can mean a closer match or a more distant one.
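
The effect of score direction on ranking can be sketched in plain Java. `ScoreOrdering` and `Scored` are hypothetical names used only for this illustration; the sketch assumes nothing about how Spring Data orders results internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spring Data.
final class ScoreOrdering {

    // Minimal result holder: some content plus its score.
    static final class Scored {
        final String content;
        final double score;

        Scored(String content, double score) {
            this.content = content;
            this.score = score;
        }
    }

    // Returns a new list with the closest match first.
    // higherIsBetter = true  -> the score is a similarity (sort descending)
    // higherIsBetter = false -> the score is a distance (sort ascending)
    static List<Scored> bestFirst(List<Scored> results, boolean higherIsBetter) {
        Comparator<Scored> byScore = Comparator.comparingDouble(r -> r.score);
        List<Scored> sorted = new ArrayList<>(results);
        sorted.sort(higherIsBetter ? byScore.reversed() : byScore);
        return sorted;
    }
}
```

The same three scores therefore produce opposite rankings depending on whether they are read as similarities or as distances, which is why the scoring function needs to be known before scores are compared.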
68+
69+
The scoring function used to calculate this score can vary based on the underlying database, index or input parameters.
70+
71+
[[vector-search.model.scoring]]
72+
=== Score, Similarity, and Scoring Functions
73+
74+
The `Score` type holds a numerical value indicating the relevance of a search result.
75+
It can be used to rank results based on their similarity to the query vector.
76+
The `Score` type is typically a floating-point number, and its interpretation (higher is better or lower is better) depends on the specific similarity function used.
77+
Scores are a by-product of vector search and are not required for a successful search operation.
78+
Score values are not part of a domain model and are therefore best represented as out-of-band data.
79+
80+
Generally, a `Score` is computed by a `ScoringFunction`.
81+
The actual scoring function used to calculate this score depends on the underlying database and can be obtained from a search index or input parameters.
82+
83+
Spring Data declares constants for commonly used functions such as:
84+
85+
Euclidean Distance:: Calculates the straight-line distance in n-dimensional space involving the square root of the sum of squared differences.
86+
Cosine Similarity:: Measures the cosine of the angle between two vectors by calculating the dot product and dividing it by the product of their lengths.
87+
Dot Product:: Computes the sum of element-wise multiplications.
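
The three functions above can be written out as a short, self-contained sketch. `ScoringFunctions` is a hypothetical helper class for this illustration; it shows the textbook formulas only and makes no claim about how any particular database or Spring Data module computes scores.

```java
// Hypothetical helper for illustration only; not part of Spring Data.
final class ScoringFunctions {

    // Sum of element-wise multiplications.
    static double dotProduct(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Square root of the sum of squared differences.
    static double euclideanDistance(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Dot product divided by the product of both vector lengths.
    // Yields NaN if either vector has zero length.
    static double cosineSimilarity(float[] a, float[] b) {
        double lengthA = Math.sqrt(dotProduct(a, a));
        double lengthB = Math.sqrt(dotProduct(b, b));
        return dotProduct(a, b) / (lengthA * lengthB);
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
    }
}
```

For the orthogonal unit vectors (1, 0) and (0, 1), the dot product and cosine similarity are both 0 while the Euclidean distance is √2. Note that Euclidean distance is a distance (lower means closer), whereas cosine similarity and dot product grow with similarity.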
88+
89+
The choice of similarity function can impact both the performance and semantics of the search and is often determined by the underlying database or index being used.
90+
Spring Data adapts to the database's native scoring function capabilities and to whether the score can be used to limit results.
91+
92+
ifdef::vector-search-scoring-include[]
93+
include::{vector-search-scoring-include}[]
94+
endif::[]
95+
96+
[[vector-search.methods]]
97+
== Vector Search Methods
98+
99+
Vector search methods are defined in repositories using the same conventions as standard Spring Data query methods.
100+
These methods return `SearchResults<T>` and require a `Vector` parameter to define the query vector.
101+
The implementation depends on the internals of the underlying data store and its vector search capabilities.
102+
103+
NOTE: If you are new to Spring Data repositories, make sure to familiarize yourself with the xref:repositories/core-concepts.adoc[basics of repository definitions and query methods].
104+
105+
Generally, you have the choice of declaring a search method using two approaches:
106+
107+
* Query Derivation
108+
* Declaring a String-based Query
109+
110+
Vector Search methods must declare a `Vector` parameter to define the query vector.
111+
112+
[[vector-search.method.derivation]]
113+
=== Derived Search Methods
114+
115+
A derived search method uses the name of the method to derive the query.
116+
Vector Search supports the following keywords to run a Vector search when declaring a search method:
117+
118+
.Query predicate keywords
119+
[options="header",cols="1,3"]
120+
|===============
121+
|Logical keyword|Keyword expressions
122+
|`NEAR`|`Near`, `IsNear`
123+
|`WITHIN`|`Within`, `IsWithin`
124+
|===============
125+
126+
ifdef::vector-search-method-derived-include[]
127+
include::{vector-search-method-derived-include}[]
128+
endif::[]
129+
130+
Derived search methods are typically easier to read and maintain, as they rely on the method name to express the query intent.
131+
However, a derived search method requires a `Score`, `Range<Score>`, or `ScoringFunction` as the second argument to the `Near`/`Within` keyword to limit search results by their score.
132+
133+
[[vector-search.method.string]]
134+
=== Annotated Search Methods
135+
136+
Annotated methods provide full control over the query semantics and parameters.
137+
Unlike derived methods, they do not rely on method name conventions.
138+
139+
ifdef::vector-search-method-annotated-include[]
140+
include::{vector-search-method-annotated-include}[]
141+
endif::[]
142+
143+
With more control over the actual query, Spring Data can make fewer assumptions about the query and its parameters.
144+
For example, `Similarity` normalization uses the native score function within the query to normalize the given similarity into a score predicate value and vice versa.
145+
If an annotated query does not define, for example, the score, then the score value in the returned `SearchResult<T>` will be zero.
146+
147+
[[vector-search.method.sorting]]
148+
=== Sorting
149+
150+
By default, search results are ordered according to their score.
151+
You can override sorting by using the `Sort` parameter:
152+
153+
.Using `Sort` in Repository Search Methods
154+
====
155+
[source,java]
156+
----
157+
interface CommentRepository extends Repository<Comment, String> {
158+
159+
SearchResults<Comment> searchByEmbeddingNearOrderByCountry(Vector vector, Score score);
160+
161+
SearchResults<Comment> searchByEmbeddingWithin(Vector vector, Score score, Sort sort);
162+
}
163+
----
164+
====
165+
166+
Please note that custom sorting does not allow expressing the score as a sorting criterion.
167+
You can only refer to domain properties.
