Commit 3a7c001

server : update readme
ggml-ci
1 parent 7e693f9 commit 3a7c001

1 file changed: +41 -1 lines changed

examples/server/README.md

@@ -763,6 +763,8 @@ curl http://localhost:8080/v1/chat/completions \
 
 ### POST `/v1/embeddings`: OpenAI-compatible embeddings API
 
+This endpoint requires that the model uses a pooling different than type `none`.
+
 *Options:*
 
 See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-reference/embeddings).
@@ -795,7 +797,45 @@ See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-r
 }'
 ```
 
-When `--pooling none` is used, the server will output an array of embeddings - one for each token in the input.
+### POST `/embeddings`: non-OpenAI-compatible embeddings API
+
+This endpoint supports `--pooling none`. When used, the responses will contain the embeddings for all input tokens.
+Note that the response format is slightly different than `/v1/embeddings` - it does not have the `"data"` sub-tree and the
+embeddings are always returned as vector of vectors.
+
+*Options:*
+
+Same as the `/v1/embeddings` endpoint.
+
+*Examples:*
+
+Same as the `/v1/embeddings` endpoint.
+
+**Response format**
+
+```json
+[
+  {
+    "index": 0,
+    "embedding": [
+      [ ... embeddings for token 0 ... ],
+      [ ... embeddings for token 1 ... ],
+      [ ... ]
+      [ ... embeddings for token N-1 ... ],
+    ]
+  },
+  ...
+  {
+    "index": P,
+    "embedding": [
+      [ ... embeddings for token 0 ... ],
+      [ ... embeddings for token 1 ... ],
+      [ ... ]
+      [ ... embeddings for token N-1 ... ],
+    ]
+  }
+]
+```
 
 ### GET `/slots`: Returns the current slots processing state
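
Note on consuming the new `/embeddings` response: the format documented in this diff is a top-level array of objects, each with an `"index"` and an `"embedding"` that is a vector of vectors. A client might read it roughly as sketched below. This is an illustration, not code from the repository: the helper names `parse_token_embeddings` and `mean_pool` are hypothetical, the sample payload is made up, and averaging the per-token vectors on the client is only one possible way to get a single vector per input when the server runs with `--pooling none`.

```python
import json
from typing import Dict, List


def parse_token_embeddings(payload: str) -> Dict[int, List[List[float]]]:
    """Map each input's "index" to its list of embedding vectors.

    Assumes the response shape shown in the diff: a top-level JSON array
    (no "data" sub-tree) whose items carry "index" and "embedding".
    """
    items = json.loads(payload)
    if not isinstance(items, list):
        raise ValueError('expected a top-level JSON array, not a "data" object')
    return {int(item["index"]): item["embedding"] for item in items}


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average a list of equal-length vectors into one vector (client-side)."""
    if not token_vectors:
        raise ValueError("no token vectors to pool")
    count = len(token_vectors)
    # zip(*rows) walks the vectors column by column (one column per dimension).
    return [sum(column) / count for column in zip(*token_vectors)]


# Made-up response body: two inputs, embedding dimension 2.
sample = (
    '[{"index": 0, "embedding": [[1.0, 2.0], [3.0, 4.0]]},'
    ' {"index": 1, "embedding": [[0.5, 0.5]]}]'
)

by_index = parse_token_embeddings(sample)
pooled = {index: mean_pool(vectors) for index, vectors in by_index.items()}
print(pooled)  # {0: [2.0, 3.0], 1: [0.5, 0.5]}
```

Because the diff states that embeddings are always returned as a vector of vectors, the same parsing should also work when the server pools on its side; in that case each `"embedding"` would presumably hold a single inner vector and `mean_pool` would return it unchanged.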
