[doc] update README
README.md
To deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...`
```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_v3.2",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
```

</details>
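The same request can be issued from Python using only the standard library; a minimal sketch of the curl call above (the `build_chat_request` and `chat` helper names are illustrative, and the server must already be running locally):

```python
import json
import urllib.request

def build_chat_request(prompt, model="openchat_v3.2"):
    """Return the JSON body for a /v1/chat/completions call, matching the curl example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base="http://localhost:18888/v1"):
    # Requires the OpenChat server started above to be listening locally.
    req = urllib.request.Request(
        base + "/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```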
| Model        | Size | Context | Weights                                                      | Serving |
|--------------|------|---------|--------------------------------------------------------------|---------|
| OpenChat 3.2 | 13B  | 4096    | [Huggingface](https://huggingface.co/openchat/openchat_v3.2) | `python -m ochat.serving.openai_api_server --model-type openchat_v3.2 --model openchat/openchat_v3.2 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
| OpenChat 3.1 | 13B  | 4096    | [Huggingface](https://huggingface.co/openchat/openchat_v3.1) | `python -m ochat.serving.openai_api_server --model-type openchat_v3.1_llama2 --model openchat/openchat_v3.1 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below:

<details>
<summary>Conversation templates (click to expand)</summary>
V3.2

```python
# Multi-turn V3.2
tokenize("GPT4 User: Hello<|end_of_turn|>GPT4 Assistant: Hi<|end_of_turn|>GPT4 User: How are you today?<|end_of_turn|>GPT4 Assistant:")
# Result: [1, 402, 7982, 29946, 4911, 29901, 15043, 32000, 402, 7982, 29946, 4007, 22137, 29901, 6324, 32000, 402, 7982, 29946, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 402, 7982, 29946, 4007, 22137, 29901]
```
V3.1

```python
# Single-turn V3.1
tokenize("Assistant is GPT4<|end_of_turn|>User: Hello<|end_of_turn|>Assistant:")
# Result: [1, 4007, 22137, 338, 402, 7982, 29946, 32000, 4911, 29901, 15043, 32000, 4007, 22137, 29901]

# Multi-turn V3.1
tokenize("Assistant is GPT4<|end_of_turn|>User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")
# Result: [1, 4007, 22137, 338, 402, 7982, 29946, 32000, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
```

</details>
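The V3.2 template shown above can be generated mechanically from OpenAI-style message lists; a minimal sketch (the `build_v32_prompt` helper name is illustrative, not part of the ochat package):

```python
def build_v32_prompt(messages):
    """Render OpenAI-format messages into the V3.2 conversation template.

    Every turn is terminated by <|end_of_turn|>, and the prompt ends with the
    assistant role tag so the model continues as the assistant.
    """
    parts = []
    for m in messages:
        role = "GPT4 User" if m["role"] == "user" else "GPT4 Assistant"
        parts.append(f"{role}: {m['content']}<|end_of_turn|>")
    parts.append("GPT4 Assistant:")
    return "".join(parts)
```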
## <a id="benchmarks"></a> Benchmarks

We have evaluated our models using the two most popular evaluation benchmarks, **AlpacaEval** and **MT-bench**.

To ensure consistency, we used the same routine as ChatGPT / GPT-4 to run these benchmarks. We started the OpenAI API-compatible server and set the `openai.api_base` to `http://localhost:18888/v1` in the benchmark program.
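With the legacy openai-python (v0.x) client, that override is a two-line configuration; a sketch in which the key value is illustrative:

```python
import openai

# Point the v0.x openai client at the local OpenChat server
openai.api_base = "http://localhost:18888/v1"
openai.api_key = "sk-KEY1"  # one of the keys passed via --api-keys
```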
| **Model**        | **Size** | **Context** | **💲Free** | **AlpacaEval (win rate %)** | **MT-bench (win rate adjusted %)** | **MT-bench (score)** |
|------------------|----------|-------------|------------|-----------------------------|------------------------------------|----------------------|
|                  |          |             |            | **v.s. text-davinci-003**   | **v.s. ChatGPT**                   |                      |
| GPT-4            | 1.8T*    | 8K          | ❌         | 95.3                        | 82.5                               | 8.99                 |
| ChatGPT          | 175B*    | 4K          | ❌         | 89.4                        | 50.0                               | 7.94                 |
| Llama-2-70B-Chat | 70B      | 4K          | ✅         | 92.7                        |                                    | 6.86                 |
| **OpenChat 3.2** | **13B**  | **4K**      | ✅         | **89.1**                    | **51.6**                           | **7.01**             |
| **OpenChat 3.1** | **13B**  | **4K**      | ✅         | **89.5**                    | **50.0**                           | **6.65**             |
| Llama-2-13B-Chat | 13B      | 4K          | ✅         | 81.0                        |                                    | 6.65                 |
| Vicuna 1.3       | 13B      | 2K          | ❌         | 82.1                        | 37.5                               | 6.00                 |
*: Estimated model size