Update README.md
README.md
@@ -96,6 +96,10 @@ Note: The following benchmarks are evaluated by TRT-LLM-backend

You can refer to the content in [Tencent-Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large) to get started quickly. For training and inference, you can use the code provided in this GitHub repository.

+#### Inference Framework
+- This open-source release offers two inference backend options tailored for the Hunyuan-7B model: the popular [vLLM-backend](https://github.com/quinnrong94/vllm/tree/dev_hunyuan) and the TensorRT-LLM Backend. In this release, we are initially open-sourcing the vLLM solution, with plans to release the TRT-LLM solution in the near future.
+
+
### Inference Performance

This section presents the efficiency test results of deploying various models using vLLM, including inference speed (tokens/s) under different batch sizes.
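
The section above points readers at the vLLM backend and reports inference speed in tokens/s under different batch sizes. The sketch below connects the two: it loads the model with vLLM's offline `LLM` API and estimates generation throughput at a few batch sizes. It is a minimal illustration under stated assumptions, not this repository's benchmark harness: the model path `tencent/Hunyuan-7B-Instruct` is a placeholder, and it assumes the forked vLLM linked above is installed.

```python
# Minimal sketch: run Hunyuan-7B offline with vLLM and estimate
# generation throughput (tokens/s) at several batch sizes.
# Assumptions: the forked vLLM above is installed, and the model
# path below is a placeholder -- substitute your actual checkpoint.
import time

from vllm import LLM, SamplingParams

llm = LLM(
    model="tencent/Hunyuan-7B-Instruct",  # placeholder path (assumption)
    trust_remote_code=True,               # Hunyuan models ship custom modeling code
)
sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

for batch_size in (1, 4, 16):
    prompts = ["Summarize the benefits of batched inference."] * batch_size
    start = time.perf_counter()
    outputs = llm.generate(prompts, sampling)
    elapsed = time.perf_counter() - start
    # Count generated tokens across the batch to get aggregate tokens/s.
    generated = sum(len(out.outputs[0].token_ids) for out in outputs)
    print(f"batch={batch_size:2d}  {generated / elapsed:8.1f} tokens/s")
```

A real measurement would warm the engine up before timing, average over several runs, and pin settings such as `tensor_parallel_size` to match the deployment.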