---
license: apache-2.0
tags:
- llm
- tinyllama
- function-calling
- cpu-optimized
- low-resource
---

# TinyLlama Function Calling (CPU Optimized)

This is a CPU-optimized version of TinyLlama that has been fine-tuned for function calling.

## Model Details

- **Base Model**: TinyLlama-1.1B-Chat-v1.0
- **Parameters**: 1.1 billion
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Data**: Function calling examples from the Glaive Function Calling v2 dataset
- **Optimization**: Merged LoRA weights, converted to float32 for CPU deployment

## Key Features

1. **Function Calling**: The model can identify when functions should be called and generate the corresponding function call syntax
2. **CPU Optimized**: Runs efficiently on low-end hardware without a GPU
3. **Lightweight**: Only 1.1B parameters, making it suitable for older hardware
4. **Low Resource Requirements**: Requires only 4-6 GB of RAM to load

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("tinyllama-function-calling-cpu-optimized")
tokenizer = AutoTokenizer.from_pretrained("tinyllama-function-calling-cpu-optimized")

# Example prompt for function calling
prompt = """### Instruction:
Given the available functions and the user query, determine which function(s) to call and with what arguments.

Available functions:
{
  "name": "get_exchange_rate",
  "description": "Get the exchange rate between two currencies",
  "parameters": {
    "type": "object",
    "properties": {
      "base_currency": {
        "type": "string",
        "description": "The currency to convert from"
      },
      "target_currency": {
        "type": "string",
        "description": "The currency to convert to"
      }
    },
    "required": [
      "base_currency",
      "target_currency"
    ]
  }
}

User query: What is the exchange rate from USD to EUR?

### Response:"""

# Tokenize the prompt and generate a response
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

This prints the raw generated text; the appendix at the end of this card sketches one way to parse it back into a structured call.

## Performance on Low-End Hardware

The CPU-optimized model requires approximately:

- 4-6 GB RAM for loading
- 2-4 CPU cores for inference
- No GPU

This makes it suitable for:

- Older laptops (2018 or newer)
- Low-end desktops
- Edge devices with ARM processors

A loading recipe that helps stay within this memory budget is sketched in the appendix.

## Training Process

The model was fine-tuned using LoRA (Low-Rank Adaptation) on the Glaive Function Calling v2 dataset. Only a subset of 50 examples was used, for demonstration purposes. An illustrative LoRA configuration is sketched in the appendix.

## License

This model is licensed under the Apache 2.0 license.
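
## Appendix: Illustrative Sketches

### Parsing the generated function call

The card does not document the exact output format, so this is a minimal sketch that assumes the model emits a JSON function call after the `### Response:` marker; `extract_function_call` is a hypothetical helper, not part of this model's tooling.

```python
import json
import re

def extract_function_call(generated: str):
    """Return the first JSON object found after the last '### Response:' marker."""
    marker = generated.rfind("### Response:")
    tail = generated[marker:] if marker != -1 else generated
    # Grab everything between the first '{' and the last '}' in the tail.
    match = re.search(r"\{.*\}", tail, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# With the Usage snippet above:
# call = extract_function_call(response)
# e.g. {"name": "get_exchange_rate", "arguments": {"base_currency": "USD", "target_currency": "EUR"}}
```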
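
### Memory-efficient loading on CPU

This sketch uses standard Transformers and PyTorch options (`low_cpu_mem_usage`, `torch.set_num_threads`) to stay close to the 4-6 GB / 2-4 core budget above; nothing here is specific to this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tinyllama-function-calling-cpu-optimized"  # same id as in the Usage section

# low_cpu_mem_usage=True avoids materializing a second full copy of the
# weights in RAM during loading.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,  # the card states the merged weights are float32
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Optionally pin PyTorch to the 2-4 cores suggested above.
torch.set_num_threads(4)
```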
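
### Reproducing the LoRA setup

The card names LoRA and the merge step but not the hyperparameters. This sketch shows what a comparable setup with the PEFT library could look like; the rank, alpha, dropout, and target modules are illustrative assumptions, not the values actually used.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the published base model.
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Illustrative hyperparameters; the card does not publish the real ones.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections in the Llama architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of the 1.1B parameters train

# ... supervised fine-tuning on the function calling examples goes here ...

# Fold the adapters back into the base weights, matching the
# "Merged LoRA weights" step under Model Details.
merged = model.merge_and_unload()
```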