## <a id="models"></a> Usage

To use these models, we highly recommend installing the OpenChat package by following the [installation guide](https://github.com/imoneoi/openchat/#installation) and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using [vLLM](https://github.com/vllm-project/vllm) and can run on a GPU with at least 48GB RAM or two consumer GPUs with tensor parallelism. To enable tensor parallelism, append `--tensor-parallel-size 2` to the serving command.
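
For example, combining the OpenChat 3.2 serving command from the table below with tensor parallelism across two GPUs might look like the following sketch (adjust the model flags to the model you are serving):

```shell
# Serve OpenChat 3.2 on two consumer GPUs: this is the serving command
# from the table below with --tensor-parallel-size 2 appended.
python -m ochat.serving.openai_api_server \
    --model-type openchat_v3.2 \
    --model openchat/openchat_v3.2 \
    --engine-use-ray --worker-use-ray \
    --max-num-batched-tokens 5120 \
    --tensor-parallel-size 2
```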
When started, the server listens at `localhost:18888` for requests and is compatible with the [OpenAI ChatCompletion API specifications](https://platform.openai.com/docs/api-reference/chat). See the example request below for reference. Additionally, you can access the [OpenChat Web UI](#web-ui) for a user-friendly experience.
To deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. We recommend using a [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server for security purposes.
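
Putting these flags together, a hardened online deployment of OpenChat 3.2 could be sketched as follows (the API keys and log path are placeholders to replace with your own):

```shell
# Example online deployment: only the two listed API keys are accepted,
# and request/stat logs go to openchat.log instead of stdout.
python -m ochat.serving.openai_api_server \
    --model-type openchat_v3.2 \
    --model openchat/openchat_v3.2 \
    --engine-use-ray --worker-use-ray \
    --max-num-batched-tokens 5120 \
    --api-keys sk-KEY1 sk-KEY2 \
    --disable-log-requests --disable-log-stats \
    --log-file openchat.log
```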
<details>
<summary>Example request (click to expand)</summary>

```
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_v3.2",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
```

</details>
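
Because responses follow the OpenAI ChatCompletion schema, the assistant's reply sits in `choices[0].message.content`; assuming the server is running locally and `jq` is installed, it can be extracted like this:

```shell
# Send a request and print only the assistant's reply text
# (response fields follow the OpenAI ChatCompletion specification).
curl -s http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openchat_v3.2", "messages": [{"role": "user", "content": "Say hello"}]}' \
  | jq -r '.choices[0].message.content'
```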
| Model | Size | Context | Weights | Serving |
|---------------|------|---------|-------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
| OpenChat 3.1 | 13B | 4096 | [Huggingface](https://huggingface.co/openchat/openchat_v3.1) | `python -m ochat.serving.openai_api_server --model-type openchat_v3.1_llama2 --model openchat/openchat_v3.1 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
| OpenChat 3.2 | 13B | 4096 | [Huggingface](https://huggingface.co/openchat/openchat_v3.2) | `python -m ochat.serving.openai_api_server --model-type openchat_v3.2 --model openchat/openchat_v3.2 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below:
<details>
<summary>Conversation templates (click to expand)</summary>