server : output embeddings for all tokens when pooling = none (ggml-org#10861)
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
### POST `/v1/embeddings`: OpenAI-compatible embeddings API
This endpoint requires that the model use a pooling type other than `none`. The embeddings are normalized using the Euclidean norm.
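To make "normalized using the Euclidean norm" concrete, here is a minimal C++ sketch that divides each component of a pooled embedding by the vector's L2 length. It assumes the embedding is held in a `std::vector<float>`. The name `l2_normalize` is hypothetical and is not the server's actual helper. Leaving an all-zero vector untouched is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not the server's actual function): scale a pooled
// embedding to unit Euclidean (L2) length.
std::vector<float> l2_normalize(std::vector<float> v) {
    assert(!v.empty() && "an embedding has at least one dimension");
    double sum_sq = 0.0;
    for (float x : v) {
        sum_sq += static_cast<double>(x) * x;   // accumulate squares in double
    }
    const double norm = std::sqrt(sum_sq);
    if (norm > 0.0) {                           // leave an all-zero vector unchanged
        for (float & x : v) {
            x = static_cast<float>(x / norm);
        }
    }
    return v;
}
```

With this sketch, `l2_normalize({3.0f, 4.0f})` yields `{0.6f, 0.8f}`, a vector of length 1.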
*Options:*
See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-reference/embeddings).
@@ -795,6 +797,46 @@ See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-r
}'
```
### POST `/embeddings`: non-OpenAI-compatible embeddings API
This endpoint supports all pooling types, including `--pooling none`. When the pooling is `none`, the response contains the *unnormalized* embeddings for *all* input tokens. For all other pooling types, only the pooled embeddings are returned, normalized using the Euclidean norm.
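The difference between `none` and a pooled type can be sketched as follows. This is an illustration under stated assumptions, not the server's code. Mean pooling stands in here for "all other pooling types", and the token embeddings are assumed to arrive as equally sized `std::vector<float>` rows. The names `embed_pooling_none` and `embed_mean_pooled` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Embedding = std::vector<float>;

// Pooling `none` (hypothetical helper): every token row is returned as-is,
// with no pooling and no normalization.
std::vector<Embedding> embed_pooling_none(const std::vector<Embedding> & token_embd) {
    return token_embd;
}

// Mean pooling (used only as an example of a pooled type, hypothetical helper):
// average the token rows into one vector, then L2-normalize it.
Embedding embed_mean_pooled(const std::vector<Embedding> & token_embd) {
    assert(!token_embd.empty() && "need at least one token");
    const std::size_t n_embd = token_embd[0].size();
    Embedding pooled(n_embd, 0.0f);
    for (const Embedding & tok : token_embd) {
        assert(tok.size() == n_embd && "all token rows must have the same size");
        for (std::size_t i = 0; i < n_embd; ++i) {
            pooled[i] += tok[i];
        }
    }
    double sum_sq = 0.0;
    for (float & x : pooled) {
        x /= static_cast<float>(token_embd.size());   // mean over tokens
        sum_sq += static_cast<double>(x) * x;
    }
    const double norm = std::sqrt(sum_sq);
    if (norm > 0.0) {                                  // Euclidean normalization
        for (float & x : pooled) {
            x = static_cast<float>(x / norm);
        }
    }
    return pooled;
}
```

For token rows `{2, 0}` and `{4, 8}`, the `none` variant returns both rows unchanged. The mean-pooled variant averages them to `{3, 4}` and normalizes that result to `{0.6, 0.8}`.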
Note that the response format of this endpoint is different from `/v1/embeddings`.
*Options:*
Same as the `/v1/embeddings` endpoint.
*Examples:*
Same as the `/v1/embeddings` endpoint.
**Response format**
```json
[
  {
    "index": 0,
    "embedding": [
      [ ... embeddings for token 0 ... ],
      [ ... embeddings for token 1 ... ],
      [ ... ]
      [ ... embeddings for token N-1 ... ],
    ]
  },
  ...
  {
    "index": P,
    "embedding": [
      [ ... embeddings for token 0 ... ],
      [ ... embeddings for token 1 ... ],
      [ ... ]
      [ ... embeddings for token N-1 ... ],
    ]
  }
]
```
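A client consuming this format might reorder the entries by `index` and count the token rows for each input. The sketch below assumes the JSON has already been parsed into a hypothetical `EmbeddingResult` struct (parsing is omitted). It also assumes the indices run from 0 to P without gaps. `rows_per_input` is likewise an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical client-side mirror of one element of the `/embeddings` response.
struct EmbeddingResult {
    int index;                                  // position of the input in the request
    std::vector<std::vector<float>> embedding;  // one row per token when pooling is `none`
};

// Sort results by `index`, then return the number of embedding rows for each input.
// Assumption: indices are 0..P with no gaps (checked with assert).
std::vector<std::size_t> rows_per_input(std::vector<EmbeddingResult> results) {
    std::sort(results.begin(), results.end(),
              [](const EmbeddingResult & a, const EmbeddingResult & b) { return a.index < b.index; });
    std::vector<std::size_t> counts;
    counts.reserve(results.size());
    for (std::size_t i = 0; i < results.size(); ++i) {
        assert(results[i].index == static_cast<int>(i) && "expected indices 0..P without gaps");
        counts.push_back(results[i].embedding.size());
    }
    return counts;
}
```

Sorting by `index` makes the mapping from response entries to inputs explicit, so the client does not rely on the array order.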
### GET `/slots`: Returns the current slots processing state
if (!ctx_server.params_base.reranking || ctx_server.params_base.embedding) {
res_error(res, format_error_response("This server does not support reranking. Start it with `--reranking` and without `--embedding`", ERROR_TYPE_NOT_SUPPORTED));