---
license: mit
pipeline_tag: text-generation
tags:
 - ONNX
 - ONNXRuntime
 - ONNXRuntimeWeb
 - phi3
 - transformers.js
 - transformers
 - nlp
 - conversational
 - custom_code
inference: false
---

# Phi-3 Mini-4K-Instruct ONNX model for in-browser inference

<!-- Provide a quick summary of what the model is/does. -->
Run Phi-3-mini-4K entirely in the browser! Check out this [demo](https://guschmue.github.io/ort-webgpu/chat/index.html).

This repository hosts an optimized web version of the ONNX Phi-3-mini-4k-instruct model to accelerate inference in the browser with ONNX Runtime Web.

[Phi-3-Mini-4K-Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) is a 3.8B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3 Mini-4K-Instruct showed robust, state-of-the-art performance among models with fewer than 13 billion parameters.

## How to run

[ONNX Runtime Web](https://onnxruntime.ai/docs/tutorials/web/build-web-app.html) is a JavaScript library that lets web developers deploy machine learning models directly in web browsers, offering multiple backends that leverage hardware acceleration. The WebGPU backend is recommended for running Phi-3-mini efficiently.
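
As a quick orientation, here is a minimal sketch of creating an inference session on the WebGPU backend. The file names are placeholders (not necessarily what this repository ships), and the `externalData` session option is assumed from recent onnxruntime-web releases; the full chat pipeline (tokenizer, KV cache, generation loop) is in the E2E example linked below.

```js
// Minimal sketch: an ONNX Runtime Web session on the WebGPU backend.
// "model.onnx" / "model.onnx.data" are placeholder file names.
import * as ort from "onnxruntime-web/webgpu";

const session = await ort.InferenceSession.create("model.onnx", {
  executionProviders: ["webgpu"],
  // The int4-quantized weights live in an external data file (assumed name).
  externalData: [{ path: "model.onnx.data", data: "model.onnx.data" }],
});
console.log("inputs:", session.inputNames, "outputs:", session.outputNames);
```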

Here is an [E2E example](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/chat) of running this optimized Phi-3-mini-4K model on the web with ONNX Runtime on WebGPU.

**Supported devices and browsers with WebGPU**: Chrome 113+ and Edge 113+ on Mac, Windows, and ChromeOS, and Chrome 121+ on Android. See the [WebGPU implementation status](https://github.com/gpuweb/gpuweb/wiki/Implementation-Status#safari-in-progress) wiki to track WebGPU support across browsers.
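
If you need to gate on WebGPU availability at runtime, a standard detection check looks like the sketch below (plain web APIs, nothing library-specific):

```js
// Detect WebGPU support before selecting the webgpu execution provider.
async function hasWebGPU() {
  if (!("gpu" in navigator)) return false;          // API not exposed at all
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;                          // exposed, but no usable GPU
}

if (!(await hasWebGPU())) {
  console.warn("WebGPU unavailable; see the supported-browser list above.");
}
```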

## Performance Metrics

Performance varies between GPUs: the more powerful the GPU, the faster the generation. On an NVIDIA GeForce RTX 4090, the model generates ~42 tokens/second.
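
For reference, a tokens/second figure like the one above can be measured along these lines; `generateTokens` is a hypothetical stand-in for an app's decoding loop (see the E2E example), so only the timing logic is the point here:

```js
// Rough throughput measurement: new tokens generated per wall-clock second.
// `generateTokens(prompt, n)` is hypothetical; substitute your decode loop.
async function measureTokensPerSecond(generateTokens, prompt, n = 128) {
  const start = performance.now();
  const tokens = await generateTokens(prompt, n);
  return tokens.length / ((performance.now() - start) / 1000);
}
```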

## Additional Details

To obtain optimized Phi-3-mini-4k ONNX models for other targets (server platforms, Windows, Linux, and Mac desktops, and mobile), please visit [Phi-3-mini-4k-instruct onnx model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx). The web version differs from the other versions in the following ways:

1. The model is fp16 with int4 block quantization for the weights.
2. The 'logits' output is fp32.
3. The model uses multi-head attention (MHA) instead of grouped-query attention (GQA).
4. The .onnx file and its external data file need to stay below 2GB to be cacheable in Chromium (see the caching sketch after this list).
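
The 2GB limit in point 4 matters because Chromium rejects Cache Storage entries above that size. A minimal sketch of caching the model download with the standard Cache API (the cache name and URL handling are illustrative, not taken from this repository):

```js
// Download the model once, then serve later page loads from Cache Storage.
// Chromium rejects entries over ~2GB, hence the file-size limits above.
async function fetchModelCached(url) {
  const cache = await caches.open("onnx-model-cache"); // illustrative name
  let response = await cache.match(url);
  if (!response) {
    response = await fetch(url);
    await cache.put(url, response.clone()); // keep a copy for the next visit
  }
  return new Uint8Array(await response.arrayBuffer());
}
```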

To optimize a fine-tuned Phi-3-mini-4k model to run with ONNX Runtime Web, please follow [this Olive example](https://github.com/microsoft/Olive/tree/main/examples/phi3). [Olive](https://github.com/microsoft/OLive) is an easy-to-use model optimization tool for generating an optimized ONNX model that runs efficiently with ONNX Runtime across platforms.

## Model Description

- **Developed by:** Microsoft
- **Model type:** ONNX
- **Inference Language(s) (NLP):** JavaScript
- **License:** MIT
- **Model Description:** This is the web version of the Phi-3 Mini-4K-Instruct model for ONNX Runtime inference.

## Model Card Contact

guschmue, qining