[doc] update README
README.md
To deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...`
```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_v3.2",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
```

</details>
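The same request can be issued from Python using only the standard library; a minimal sketch of the curl call above (the `build_chat_request` and `chat` helper names are illustrative, and the server must already be running locally):

```python
import json
import urllib.request

def build_chat_request(prompt, model="openchat_v3.2"):
    """Return the JSON body for a /v1/chat/completions call, matching the curl example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base="http://localhost:18888/v1"):
    # Requires the OpenChat server started above to be listening locally.
    req = urllib.request.Request(
        base + "/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```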
| Model        | Size | Context | Weights                                                      | Serving |
|--------------|------|---------|--------------------------------------------------------------|---------|
| OpenChat 3.2 | 13B  | 4096    | [Huggingface](https://huggingface.co/openchat/openchat_v3.2) | `python -m ochat.serving.openai_api_server --model-type openchat_v3.2 --model openchat/openchat_v3.2 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
| OpenChat 3.1 | 13B  | 4096    | [Huggingface](https://huggingface.co/openchat/openchat_v3.1) | `python -m ochat.serving.openai_api_server --model-type openchat_v3.1_llama2 --model openchat/openchat_v3.1 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below:

<details>
<summary>Conversation templates (click to expand)</summary>
V3.2

```python
# Multi-turn V3.2
tokenize("GPT4 User: Hello<|end_of_turn|>GPT4 Assistant: Hi<|end_of_turn|>GPT4 User: How are you today?<|end_of_turn|>GPT4 Assistant:")
# Result: [1, 402, 7982, 29946, 4911, 29901, 15043, 32000, 402, 7982, 29946, 4007, 22137, 29901, 6324, 32000, 402, 7982, 29946, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 402, 7982, 29946, 4007, 22137, 29901]
```
V3.1

```python
# Single-turn V3.1
tokenize("Assistant is GPT4<|end_of_turn|>User: Hello<|end_of_turn|>Assistant:")
# Result: [1, 4007, 22137, 338, 402, 7982, 29946, 32000, 4911, 29901, 15043, 32000, 4007, 22137, 29901]

# Multi-turn V3.1
tokenize("Assistant is GPT4<|end_of_turn|>User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")
# Result: [1, 4007, 22137, 338, 402, 7982, 29946, 32000, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
```

</details>
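The V3.2 template shown above can be generated mechanically from OpenAI-style message lists; a minimal sketch (the `build_v32_prompt` helper name is illustrative, not part of the ochat package):

```python
def build_v32_prompt(messages):
    """Render OpenAI-format messages into the V3.2 conversation template.

    Every turn is terminated by <|end_of_turn|>, and the prompt ends with the
    assistant role tag so the model continues as the assistant.
    """
    parts = []
    for m in messages:
        role = "GPT4 User" if m["role"] == "user" else "GPT4 Assistant"
        parts.append(f"{role}: {m['content']}<|end_of_turn|>")
    parts.append("GPT4 Assistant:")
    return "".join(parts)
```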
## <a id="benchmarks"></a> Benchmarks

We have evaluated our models using the two most popular evaluation benchmarks, **AlpacaEval** and **MT-bench**.

To ensure consistency, we used the same routine as ChatGPT / GPT-4 to run these benchmarks. We started the OpenAI API-compatible server and set the `openai.api_base` to `http://localhost:18888/v1` in the benchmark program.
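With the legacy openai-python (v0.x) client, that override is a two-line configuration; a sketch in which the key value is illustrative:

```python
import openai

# Point the v0.x openai client at the local OpenChat server
openai.api_base = "http://localhost:18888/v1"
openai.api_key = "sk-KEY1"  # one of the keys passed via --api-keys
```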
| **Model**        | **Size** | **Context** | **💲Free** | **AlpacaEval (win rate %)** | **MT-bench (win rate adjusted %)** | **MT-bench (score)** |
|------------------|----------|-------------|------------|-----------------------------|------------------------------------|----------------------|
|                  |          |             |            | **v.s. text-davinci-003**   | **v.s. ChatGPT**                   |                      |
| GPT-4            | 1.8T*    | 8K          | ❌         | 95.3                        | 82.5                               | 8.99                 |
| ChatGPT          | 175B*    | 4K          | ❌         | 89.4                        | 50.0                               | 7.94                 |
| Llama-2-70B-Chat | 70B      | 4K          | ✅         | 92.7                        |                                    | 6.86                 |
| **OpenChat 3.2** | **13B**  | **4K**      | ✅         | **89.1**                    | **51.6**                           | **7.01**             |
| **OpenChat 3.1** | **13B**  | **4K**      | ✅         | **89.5**                    | **50.0**                           | **6.65**             |
| Llama-2-13B-Chat | 13B      | 4K          | ✅         | 81.0                        |                                    | 6.65                 |
| Vicuna 1.3       | 13B      | 2K          | ❌         | 82.1                        | 37.5                               | 6.00                 |
*: Estimated model size