Benjamin Consolvo committed
Commit ad676d5 · Parent(s): 7645d86
doc updates 3

Files changed:
- app.py +1 -1
- info/deployment.py +7 -1
app.py CHANGED

@@ -30,7 +30,7 @@ with demo:
 follow the instructions and complete the form in the 🏎️ Submit tab. Models submitted to the leaderboard are evaluated
 on the Intel Developer Cloud ☁️. The evaluation platform consists of Gaudi Accelerators and Xeon CPUs running benchmarks from
 the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).""")
-    gr.Markdown("""
+    gr.Markdown("""Join 5000+ developers on the [Intel DevHub Discord](https://discord.gg/yNYNxK2k) to get support with your submission and
 talk about everything from GenAI, HPC, to Quantum Computing.""")
 gr.Markdown("""A special shout-out to the 🤗 [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 team for generously sharing their code and best
info/deployment.py CHANGED

@@ -95,9 +95,11 @@ The Intel® Data Center GPU Max Series is Intel's highest performing, highest de
 
 ### INT4 Inference (GPU) with Intel Extension for Transformers and Intel Extension for PyTorch
 Intel® Extension for Transformers is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms, including Intel Gaudi2, Intel CPU, and Intel GPU.
+
 👍 [Intel Extension for Transformers GitHub](https://github.com/intel/intel-extension-for-transformers)
 
 Intel® Extension for PyTorch* extends PyTorch* with up-to-date feature optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.
+
 👍 [Intel Extension for PyTorch GitHub](https://github.com/intel/intel-extension-for-pytorch)
 
 ```python
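(The Python example opened by the fence above is cut off by the diff. As a rough sketch of the pattern this section describes, assuming both extensions are installed: the model name, prompt, and generation settings below are illustrative assumptions, not content from this commit, and exact keyword arguments vary by release.)

```python
# Sketch only: INT4 weight-only inference on an Intel GPU (the "xpu" device).
# Assumes intel-extension-for-transformers and intel-extension-for-pytorch are installed.
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the xpu device)
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_name = "Intel/neural-chat-7b-v3-1"  # illustrative assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_4bit=True applies INT4 weight-only quantization at load time
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="xpu",
    load_in_4bit=True,
)

inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids.to("xpu")
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```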
@@ -125,6 +127,7 @@ The Intel® Xeon® CPUs have the most built-in accelerators of any CPU on the ma
 
 ### Optimum Intel and Intel Extension for PyTorch (no quantization)
 🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
+
 👍 [Optimum Intel GitHub](https://github.com/huggingface/optimum-intel)
 
 Requires installing/updating optimum: `pip install --upgrade-strategy eager optimum[ipex]`
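(The file's own snippet sits outside this hunk. A minimal sketch of the pattern described here, using Optimum Intel's IPEX model classes after the `pip install` above; the model id is an illustrative assumption.)

```python
# Sketch only: text generation through Optimum Intel's IPEX integration.
from optimum.intel import IPEXModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "gpt2"  # illustrative assumption; any supported causal LM works
model = IPEXModelForCausalLM.from_pretrained(model_id)  # loads with IPEX optimizations
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("The weather today is")[0]["generated_text"])
```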
@@ -179,6 +182,7 @@ Intel® Core™ Ultra Processors are optimized for premium thin and powerful lap
 
 ### Intel® NPU Acceleration Library
 The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.
+
 👍 [Intel NPU Acceleration Library GitHub](https://github.com/intel/intel-npu-acceleration-library)
 
 ```python
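(Again the example body is cut off by the diff. A hedged sketch of the library's `compile()` entry point, assuming a small causal LM; the model id and dtype are illustrative assumptions.)

```python
# Sketch only: offloading a Hugging Face model to the NPU.
import torch
import intel_npu_acceleration_library
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative assumption
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# compile() moves compatible operations onto the NPU
model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

inputs = tokenizer("What is an NPU?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```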
@@ -214,6 +218,7 @@ _ = model.generate(**generation_kwargs)
 
 ### OpenVINO Tooling with Optimum Intel
 OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
+
 👍 [OpenVINO GitHub](https://github.com/openvinotoolkit/openvino)
 
 ```python
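(The file's example is truncated here, but its closing line, `pipe("In the spring, beautiful flowers bloom...")`, is visible in the next hunk's header. A minimal sketch consistent with that ending; the model id is an illustrative assumption.)

```python
# Sketch only: OpenVINO inference through Optimum Intel's OVModel classes.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "gpt2"  # illustrative assumption
model = OVModelForCausalLM.from_pretrained(model_id, export=True)  # convert to OpenVINO IR on load
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe("In the spring, beautiful flowers bloom...")  # prompt from the file's own example
```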
@@ -235,12 +240,13 @@ pipe("In the spring, beautiful flowers bloom...")
 # Intel® Gaudi Accelerators
 The Intel Gaudi 2 accelerator is Intel's most capable deep learning chip. You can learn about Gaudi 2 [here](https://habana.ai/products/gaudi2/).
 
-
+Intel Gaudi Software supports PyTorch and DeepSpeed for accelerating LLM training and inference.
 The Intel Gaudi Software graph compiler will optimize the execution of the operations accumulated in the graph
 (e.g. operator fusion, data layout management, parallelization, pipelining and memory management,
 and graph-level optimizations).
 
 Optimum Habana provides convenient functionality for various tasks. Below is a command line snippet to run inference on Gaudi with meta-llama/Llama-2-7b-hf.
+
 👍 [Optimum Habana GitHub](https://github.com/huggingface/optimum-habana)
 
 The "run_generation.py" script below can be found [here on GitHub](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)
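(The command line snippet itself falls outside the hunks shown. A hedged sketch patterned on the linked optimum-habana text-generation examples; flag names vary by release, so verify against the examples README before use.)

```bash
# Sketch only: single-card Llama-2-7b inference on Gaudi with optimum-habana.
python run_generation.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --use_hpu_graphs \
  --use_kv_cache \
  --max_new_tokens 100 \
  --batch_size 1 \
  --prompt "Here is my prompt"
```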