---
license: apache-2.0
tags:
- llm
- tinyllama
- function-calling
- cpu-optimized
- low-resource
---
# TinyLlama Function Calling (CPU Optimized)
This is a CPU-optimized version of TinyLlama that has been fine-tuned for function calling.
## Model Details
- **Base Model**: TinyLlama-1.1B-Chat-v1.0
- **Parameters**: 1.1 billion
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Data**: Function calling examples from Glaive Function Calling v2 dataset
- **Optimization**: Merged LoRA weights, converted to float32 for CPU deployment
## Key Features
1. **Function Calling Capabilities**: The model can identify when functions should be called and generate appropriate function call syntax
2. **CPU Optimized**: Ready to run efficiently on low-end hardware without GPUs
3. **Lightweight**: Only 1.1B parameters, making it suitable for older hardware
4. **Low Resource Requirements**: Requires only 4-6 GB RAM for loading
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the model
model = AutoModelForCausalLM.from_pretrained("tinyllama-function-calling-cpu-optimized")
tokenizer = AutoTokenizer.from_pretrained("tinyllama-function-calling-cpu-optimized")
# Example prompt for function calling
prompt = """### Instruction:
Given the available functions and the user query, determine which function(s) to call and with what arguments.
Available functions:
{
  "name": "get_exchange_rate",
  "description": "Get the exchange rate between two currencies",
  "parameters": {
    "type": "object",
    "properties": {
      "base_currency": {
        "type": "string",
        "description": "The currency to convert from"
      },
      "target_currency": {
        "type": "string",
        "description": "The currency to convert to"
      }
    },
    "required": [
      "base_currency",
      "target_currency"
    ]
  }
}
User query: What is the exchange rate from USD to EUR?
### Response:"""
# Tokenize and generate response
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
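Once the model has generated text, the function call still has to be extracted from the raw completion. A minimal sketch, assuming the model emits a JSON function call after the `### Response:` marker (the exact output format depends on the fine-tuning data, so the raw string below is a hypothetical example):

```python
import json

# Hypothetical raw completion; the actual output format depends on
# how the fine-tuning data formatted function calls.
raw = """### Instruction:
...prompt text...
### Response:
{"name": "get_exchange_rate", "arguments": {"base_currency": "USD", "target_currency": "EUR"}}"""

# Keep only the text after the "### Response:" marker.
response_text = raw.split("### Response:")[-1].strip()

# Parse the function call into a dict.
call = json.loads(response_text)
print(call["name"], call["arguments"])
```

In practice you would also want to validate the parsed arguments against the declared function schema before dispatching the call.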
## Performance on Low-End Hardware
The CPU-optimized model requires approximately:
- 4-6 GB RAM for loading
- 2-4 CPU cores for inference
- No GPU required
This makes it suitable for:
- Older laptops (2018 and newer)
- Low-end desktops
- Edge devices with ARM processors
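The 4-6 GB figure is consistent with back-of-envelope arithmetic: 1.1 billion parameters at 4 bytes each (float32) is roughly 4.4 GB for the weights alone, before tokenizer, KV-cache, and activation overhead:

```python
# Back-of-envelope RAM estimate for float32 weights.
num_params = 1.1e9       # 1.1 billion parameters
bytes_per_param = 4      # float32 = 4 bytes
weights_gb = num_params * bytes_per_param / 1e9
print(f"{weights_gb:.1f} GB")  # ~4.4 GB for the weights alone
```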
## Training Process
The model was fine-tuned using LoRA (Low-Rank Adaptation) on the Glaive Function Calling v2 dataset. Only a subset of 50 examples was used for demonstration purposes.
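LoRA keeps the base weights frozen and trains two small low-rank matrices beside each adapted weight, which is why fine-tuning a 1.1B model is feasible on modest hardware. For a d x k weight matrix with rank r, the trainable parameter count is r * (d + k). A quick illustration (the rank and dimensions below are hypothetical; the actual LoRA configuration for this model is not stated):

```python
# LoRA adds A (d x r) and B (r x k) beside a frozen d x k weight matrix,
# so trainable parameters per adapted matrix = r * (d + k).
def lora_trainable_params(d: int, k: int, r: int) -> int:
    return r * (d + k)

# Hypothetical example: a 2048 x 2048 attention projection at rank 8.
d = k = 2048
r = 8
full = d * k                              # parameters in the frozen matrix
lora = lora_trainable_params(d, k, r)     # parameters LoRA actually trains
print(lora, f"({100 * lora / full:.2f}% of the full matrix)")
# 32768 (0.78% of the full matrix)
```

After training, the low-rank matrices can be merged back into the base weights (as was done here) so that inference incurs no extra cost.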
## License
This model is licensed under the Apache 2.0 license.