Successmove committed
Commit 4b7555e · verified · 1 parent: 0fcc129

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,105 @@
+ ---
+ license: apache-2.0
+ tags:
+ - llm
+ - tinyllama
+ - function-calling
+ - cpu-optimized
+ - low-resource
+ ---
+
+ # TinyLlama Function Calling (CPU Optimized)
+
+ This is a CPU-optimized version of TinyLlama, fine-tuned for function calling.
+
+ ## Model Details
+
+ - **Base Model**: TinyLlama-1.1B-Chat-v1.0
+ - **Parameters**: 1.1 billion
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+ - **Training Data**: Function calling examples from the Glaive Function Calling v2 dataset
+ - **Optimization**: Merged LoRA weights, converted to float32 for CPU deployment
+
+ ## Key Features
+
+ 1. **Function Calling**: The model can identify when functions should be called and generate the corresponding function-call syntax
+ 2. **CPU Optimized**: Runs efficiently on low-end hardware without a GPU
+ 3. **Lightweight**: Only 1.1B parameters, suitable for older hardware
+ 4. **Low Resource Requirements**: Needs only 4-6 GB of RAM to load
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ # Load the model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained("tinyllama-function-calling-cpu-optimized")
+ tokenizer = AutoTokenizer.from_pretrained("tinyllama-function-calling-cpu-optimized")
+
+ # Example prompt for function calling
+ prompt = """### Instruction:
+ Given the available functions and the user query, determine which function(s) to call and with what arguments.
+
+ Available functions:
+ {
+     "name": "get_exchange_rate",
+     "description": "Get the exchange rate between two currencies",
+     "parameters": {
+         "type": "object",
+         "properties": {
+             "base_currency": {
+                 "type": "string",
+                 "description": "The currency to convert from"
+             },
+             "target_currency": {
+                 "type": "string",
+                 "description": "The currency to convert to"
+             }
+         },
+         "required": [
+             "base_currency",
+             "target_currency"
+         ]
+     }
+ }
+
+ User query: What is the exchange rate from USD to EUR?
+
+ ### Response:"""
+
+ # Tokenize and generate a response
+ inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=150,
+         do_sample=True,
+         temperature=0.7,
+         top_k=50,
+         top_p=0.95,
+     )
+
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
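+
+ The repository also ships a `chat_template.jinja` that wraps turns in `<|user|>` / `<|assistant|>` markers. As an alternative to the raw prompt above, a minimal sketch using the bundled template (assuming it matches the format the model expects):
+
+ ```python
+ # Route the same query through the tokenizer's chat template.
+ messages = [{"role": "user", "content": "What is the exchange rate from USD to EUR?"}]
+ chat_prompt = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ inputs = tokenizer(chat_prompt, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model.generate(**inputs, max_new_tokens=150)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```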
+
+ ## Performance on Low-End Hardware
+
+ The CPU-optimized model requires approximately:
+ - 4-6 GB RAM for loading (the float32 weights alone are ~4.4 GB: 1.1B parameters × 4 bytes)
+ - 2-4 CPU cores for inference
+ - No GPU required
+
+ This makes it suitable for:
+ - Older laptops (2018 and newer)
+ - Low-end desktops
+ - Edge devices with ARM processors
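+
+ A minimal CPU-loading sketch (the thread cap is an illustrative suggestion, not a requirement):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ torch.set_num_threads(4)  # cap inference threads to the cores you have
+ model = AutoModelForCausalLM.from_pretrained(
+     "tinyllama-function-calling-cpu-optimized",
+     torch_dtype=torch.float32,  # weights are stored in float32
+     low_cpu_mem_usage=True,     # avoid holding a second full copy in RAM while loading
+ )
+ ```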
+
+ ## Training Process
+
+ The model was fine-tuned using LoRA (Low-Rank Adaptation) on the Glaive Function Calling v2 dataset. Only a subset of 50 examples was used, for demonstration purposes.
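+
+ The exact LoRA hyperparameters are not recorded in this repository. A representative PEFT setup for a run like this (all values below are illustrative assumptions, not the actual training configuration):
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+
+ # Illustrative settings; the rank, alpha, and target modules actually used
+ # for this checkpoint are not documented.
+ lora_config = LoraConfig(
+     task_type="CAUSAL_LM",
+     r=8,
+     lora_alpha=16,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "v_proj"],
+ )
+ peft_model = get_peft_model(base_model, lora_config)  # base_model: TinyLlama-1.1B-Chat-v1.0
+ peft_model.print_trainable_parameters()
+ ```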
+
+ ## License
+
+ This model is licensed under the Apache 2.0 license.
chat_template.jinja ADDED
@@ -0,0 +1,15 @@
+ {% for message in messages %}
+ {% if message['role'] == 'user' %}
+ {{ '<|user|>
+ ' + message['content'] + eos_token }}
+ {% elif message['role'] == 'system' %}
+ {{ '<|system|>
+ ' + message['content'] + eos_token }}
+ {% elif message['role'] == 'assistant' %}
+ {{ '<|assistant|>
+ ' + message['content'] + eos_token }}
+ {% endif %}
+ {% if loop.last and add_generation_prompt %}
+ {{ '<|assistant|>' }}
+ {% endif %}
+ {% endfor %}
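For reference, a single user turn rendered through this template (with `eos_token` = `</s>` per special_tokens_map.json, and `add_generation_prompt=True`) should come out roughly as:

```text
<|user|>
What is the exchange rate from USD to EUR?</s>
<|assistant|>
```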
config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "head_dim": 64,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 5632,
+   "max_position_embeddings": 2048,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 22,
+   "num_key_value_heads": 4,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.52.4",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
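As a quick sanity check, these shapes imply ~1.1B parameters; a short sketch (assuming the bias-free Llama layout declared in this config, with untied embeddings):

```python
# Parameter count implied by config.json: hidden=2048, intermediate=5632,
# 22 layers, vocab=32000, 4 KV heads x 64 head_dim, no attention/MLP biases.
h, inter, layers, vocab, kv_dim = 2048, 5632, 22, 32000, 4 * 64
attn = 2 * h * h + 2 * h * kv_dim               # q_proj + o_proj, k_proj + v_proj
mlp = 3 * h * inter                             # gate_proj, up_proj, down_proj
per_layer = attn + mlp + 2 * h                  # plus two RMSNorm weight vectors
total = layers * per_layer + 2 * vocab * h + h  # embeddings, untied lm_head, final norm
print(f"{total:,}")  # 1,100,048,384 -> ~1.1B parameters
```

At 4 bytes per float32 parameter this is ~4.4 GB, closely matching the model.safetensors size below.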
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "max_length": 2048,
+   "pad_token_id": 0,
+   "transformers_version": "4.52.4"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c815995697257be7a51dc7b41c6eafc6ffe0a37a9a11ac0d864bee27c07e8c7b
+ size 4400216536
optimization_info.txt ADDED
@@ -0,0 +1,6 @@
+ Model optimized for CPU deployment
+ ==================================
+ - Merged LoRA weights with base model
+ - Converted to float32 precision
+ - Optimized for CPU inference
+ - Ready for function calling tasks
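These steps correspond to a standard PEFT merge-and-export flow; a minimal sketch (the adapter path is hypothetical; the base model is the one named in the README):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical adapter path
model = model.merge_and_unload()  # fold the LoRA deltas into the base weights
model = model.float()             # convert to float32 for CPU inference
model.save_pretrained("tinyllama-function-calling-cpu-optimized")
```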
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "legacy": false,
+   "model_max_length": 2048,
+   "pad_token": "</s>",
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }