Phi-4-mini-instruct INT4_SYM for Intel NPU

🎉 First NPU-optimized Phi-4-mini model with correct quantization for Intel NPU!

Model Description

This is microsoft/Phi-4-mini-instruct (3.8B parameters) converted to OpenVINO IR format with NPU-specific INT4 symmetric quantization.

Key Difference from Standard OpenVINO Models

Critical Discovery: Intel NPU requires INT4_SYM (symmetric, channel-wise) quantization, not the INT4_ASYM (asymmetric, grouped) used by standard OpenVINO pre-converted models.

Quantization Type          NPU Compatibility
INT4_ASYM (group_size=64)  ❌ FAILS (MatMul errors)
INT4_SYM (channel-wise)    ✅ WORKS (this model)

Quantization Details

  • Method: INT4_SYM (symmetric)
  • Group size: -1 (channel-wise, not grouped)
  • Calibration: AWQ + scale_estimation on wikitext2 dataset
  • Distribution: 84% of weights in INT4_SYM (128 layers), 16% in INT8_ASYM (1 layer)
  • Size: 2.13 GB
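
For reference, the same recipe can be expressed through NNCF's weight-compression API. This is a minimal sketch, assuming an already-converted FP16 openvino.Model at a hypothetical path and a pre-built calibration_dataset (an nncf.Dataset over tokenized wikitext2 samples); the published weights were actually produced with the optimum-cli command shown later in this card:

import nncf
import openvino as ov

model = ov.Core().read_model("phi-4-mini-fp16/openvino_model.xml")  # hypothetical FP16 export

# INT4_SYM with group_size=-1 -> symmetric, channel-wise scales (the NPU-compatible layout)
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=-1,                # channel-wise, not grouped
    awq=True,                     # activation-aware weight quantization
    scale_estimation=True,        # refine quantization scales on calibration data
    dataset=calibration_dataset,  # placeholder: nncf.Dataset of model inputs
)
ov.save_model(compressed, "phi-4-mini-int4-sym/openvino_model.xml")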

Performance on Intel NPU

Tested on Intel Core Ultra 7 155H (NPU driver v32.0.100.4297):

  • Speed: 6.8 tok/s
  • Compilation: 68.5s
  • Inference: Stable, production-ready
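
The throughput figure comes from openvino_genai's built-in performance metrics. A minimal sketch for reproducing it, assuming model_path points at a local copy of the model (see Usage below); exact numbers will vary with driver version and prompt length:

from openvino_genai import LLMPipeline

pipe = LLMPipeline(model_path, device="NPU")
result = pipe.generate("Explain quantum computing:", max_new_tokens=100)

metrics = result.perf_metrics
print(f"Throughput: {metrics.get_throughput().mean:.1f} tok/s")
print(f"Time to first token: {metrics.get_ttft().mean:.0f} ms")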

Comparison to other models on same hardware (Intel Core Ultra 7 155H):

  • Qwen2.5-1.5B-Instruct (INT4_SYM): 10.7 tok/s (0.87 GB) - Baseline performance
  • Phi-4-mini-instruct (INT4_SYM): 6.8 tok/s (2.13 GB) - ~2.5× the parameters, stronger reasoning capabilities
  • Performance ratio: ~64% of Qwen2.5-1.5B's speed, but a significantly more capable model

Usage

Requirements

pip install openvino-genai huggingface-hub

Python API

from huggingface_hub import snapshot_download
from openvino_genai import LLMPipeline

# LLMPipeline expects a local directory, so fetch the model files first
model_path = snapshot_download("AhtnaGlen/phi-4-mini-instruct-int4-sym-npu-ov")

# Load and compile on the Intel NPU (compilation takes ~60-70 s on first load)
pipe = LLMPipeline(model_path, device="NPU")

# Generate text
response = pipe.generate("Explain quantum computing:", max_new_tokens=100)
print(response)
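
Multi-turn chat

For multi-turn conversations, the pipeline can keep chat state across calls with start_chat()/finish_chat(), which also applies the model's chat template:

pipe.start_chat()
print(pipe.generate("What is an NPU?", max_new_tokens=100))
print(pipe.generate("How is it different from a GPU?", max_new_tokens=100))  # context carries over
pipe.finish_chat()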

Streaming

# openvino_genai streams via a `streamer` callback (there is no stream=True flag);
# returning None/False from the callback continues generation
pipe.generate("Write a story:", max_new_tokens=200,
              streamer=lambda subword: print(subword, end='', flush=True))

Why This Matters

Standard OpenVINO Phi-4 models (e.g., OpenVINO/Phi-4-mini-instruct-int4-ov) use INT4_ASYM quantization, which fails NPU compilation with errors like:

[ERROR] Channels count of input tensor shape and filter shape must be the same: 0 != 48

This model uses the correct NPU-optimized quantization as specified in Intel's NPU documentation:

# --sym (symmetric) and --group-size -1 (channel-wise) are the key flags for NPU
optimum-cli export openvino -m microsoft/Phi-4-mini-instruct \
    --weight-format int4 --sym --group-size -1 \
    --awq --scale-estimation --dataset wikitext2 \
    phi-4-mini-instruct-int4-sym-npu-ov
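
A quick smoke test of the exported directory: pipeline creation compiles the model, so an NPU-compatible quantization succeeds here, while an INT4_ASYM export fails with the MatMul error above. This assumes the output directory name from the command above:

from openvino_genai import LLMPipeline

# Compilation happens here; INT4_ASYM exports fail at this step on NPU
pipe = LLMPipeline("phi-4-mini-instruct-int4-sym-npu-ov", device="NPU")
print(pipe.generate("Hello", max_new_tokens=8))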

Model Capabilities

  • Instruction following: Fine-tuned for chat/instruction tasks
  • Reasoning: Enhanced reasoning capabilities (Phi-4 series)
  • Context length: 4096 tokens in this NPU export (the base model supports up to 128K)
  • NPU acceleration: Full hardware offload to Intel NPU

Hardware Requirements

  • Intel NPU: Core Ultra 7 155H (tested), or other NPU 3720/4000-series devices
  • Driver: v32.0.100.4297 or newer
  • OpenVINO: 2025.3.0 or newer
  • Memory: ~3 GB for model + inference
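
Before loading the model, you can check that OpenVINO sees the NPU:

import openvino as ov

core = ov.Core()
print(core.available_devices)                        # should include 'NPU'
print(core.get_property("NPU", "FULL_DEVICE_NAME"))  # driver-reported device name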

Limitations

  • NPU only: This model is quantized specifically for Intel NPU
  • Speed trade-off: 6.8 tok/s vs Qwen2.5-1.5B @ 10.7 tok/s on Intel Core Ultra 7 155H
  • Size vs capability: Larger model (2.13 GB) but enhanced reasoning and instruction-following
  • Hardware specific: Performance validated on Intel Core Ultra 7 155H NPU

Citation

If you use this model, please cite:

@misc{phi4-mini-npu-optimized,
  title={Phi-4-mini-instruct INT4_SYM for Intel NPU},
  author={OpenVINO Community},
  year={2025},
  howpublished={\url{https://huggingface.co/AhtnaGlen/phi-4-mini-instruct-int4-sym-npu-ov}},
}

Acknowledgments

  • Base model: Microsoft Phi-4-mini-instruct
  • Framework: Intel OpenVINO
  • Quantization: NNCF (Neural Network Compression Framework)
  • Discovery: Community finding on NPU quantization requirements

License

MIT (following base model license)

Model Card Contact

For issues or questions about NPU compatibility, please open an issue on the model repository.


Note: This model demonstrates the importance of quantization method selection for hardware-specific optimization. Always verify quantization parameters match target hardware requirements!
