---
license: mit
---
## Model Architecture

- **Base Model:** Llama 3 (70 billion parameters)
- **Quantization:** 4-bit integer quantization for memory and computational efficiency
- **Framework:** Fine-tuned with PyTorch, leveraging Hugging Face Transformers
- **PIM Optimization:** Tuned for processing-in-memory (PIM) hardware, which operates on data directly in memory to minimize latency and maximize throughput
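To make the benefit of 4-bit quantization concrete, here is a back-of-the-envelope estimate of weight storage for a 70B-parameter model. The figures are illustrative arithmetic only (weights alone, 1 GB = 1e9 bytes), and ignore quantization-constant overhead and activation memory:

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# 70 billion parameters at full precision vs. 4-bit quantized
fp32_gb = weight_memory_gb(70e9, 32)  # 280 GB
int4_gb = weight_memory_gb(70e9, 4)   # 35 GB

print(f"fp32: {fp32_gb:.0f} GB, int4: {int4_gb:.0f} GB")
# fp32: 280 GB, int4: 35 GB
```

The 8x reduction versus float32 is what makes a 70B model practical on memory-constrained hardware, including the PIM targets described above.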
## Intended Use

**Primary Use Cases:**

- Large-scale text generation
- Summarization
- Question answering
- Conversational AI
- Text classification
**Research Focus:**

This model is designed for research and industrial applications that require efficient handling of large language models under constrained hardware resources.