@@ -763,6 +763,8 @@ curl http://localhost:8080/v1/chat/completions \

### POST `/v1/embeddings`: OpenAI-compatible embeddings API

+ This endpoint requires that the model use a pooling type other than `none`.
+
*Options:*

See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-reference/embeddings).
@@ -795,7 +797,45 @@ See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-r
}'
```

- When `--pooling none` is used, the server will output an array of embeddings - one for each token in the input.
+ ### POST `/embeddings`: non-OpenAI-compatible embeddings API
+
+ This endpoint supports `--pooling none`. With that setting, the response contains an embedding for every input token.
+ Note that the response format differs slightly from `/v1/embeddings`: it has no `"data"` sub-tree, and the
+ embeddings are always returned as a vector of vectors.
+
+ *Options:*
+
+ Same as the `/v1/embeddings` endpoint.
+
+ *Examples:*
+
+ Same as the `/v1/embeddings` endpoint.
+
+ **Response format**
+
+ ```json
+ [
+   {
+     "index": 0,
+     "embedding": [
+       [ ... embeddings for token 0 ... ],
+       [ ... embeddings for token 1 ... ],
+       [ ... ],
+       [ ... embeddings for token N-1 ... ]
+     ]
+   },
+   ...
+   {
+     "index": P,
+     "embedding": [
+       [ ... embeddings for token 0 ... ],
+       [ ... embeddings for token 1 ... ],
+       [ ... ],
+       [ ... embeddings for token N-1 ... ]
+     ]
+   }
+ ]
+ ```
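
The difference between the two response shapes can be handled on the client side. The following is a minimal Python sketch, not part of the server or of this change. It assumes the `/embeddings` layout shown above and the standard OpenAI layout (a top-level object with a `"data"` list whose items carry `"index"` and `"embedding"`). The helper names `token_embeddings` and `mean_pool` are hypothetical, and the mean pooling is only an illustration of reducing per-token vectors to one vector; it is not claimed to match any pooling the server performs.

```python
from typing import Any


def token_embeddings(response: Any) -> dict[int, list[list[float]]]:
    """Map each input index to a list of embedding vectors.

    Accepts either a bare list of items (the `/embeddings` shape) or an
    object with a "data" list (the OpenAI-style `/v1/embeddings` shape).
    """
    items = response["data"] if isinstance(response, dict) else response
    result: dict[int, list[list[float]]] = {}
    for item in items:
        emb = item["embedding"]
        # Assumption: a flat list of numbers is a single pooled vector.
        # Wrap it so callers always receive a list of vectors.
        if emb and not isinstance(emb[0], list):
            emb = [emb]
        result[item["index"]] = emb
    return result


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```

For example, passing the parsed JSON body of a `/embeddings` response to `token_embeddings` yields one list of per-token vectors per input, and `mean_pool` can then collapse each list into a single vector if the application needs one.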

### GET `/slots`: Returns the current slots processing state
