---
library_name: transformers
tags:
- bitnet
- falcon-e
- edge
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---

# Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Training Details](#training-details)
3. [Usage](#usage)
4. [Evaluation](#evaluation)
5. [Citation](#citation)

# TL;DR
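
Falcon-E is a series of 1.58-bit (BitNet-architecture) language models developed by [TII](https://www.tii.ae) and designed to run efficiently on edge devices. Falcon-E-1B-Base is the base (pre-trained) 1B-scale model of the series, with a memory footprint of roughly 635MB.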

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only / Base version
- **Architecture:** Pure transformer / 1.58-bit version
- **Language(s) (NLP):** English
- **License:** Falcon-LLM License
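
For intuition: a 1.58-bit model stores ternary weights, each taking one of three values {-1, 0, +1}, hence log2(3) ≈ 1.58 bits per weight. The sketch below illustrates BitNet-style absmean ternary quantization; it is illustrative only, not the exact kernel used to train or serve this model:

```python
import torch

def ternary_quantize(weight: torch.Tensor):
    """Illustrative BitNet b1.58-style absmean quantization:
    scale by the mean absolute value, then round and clip to {-1, 0, +1}."""
    scale = weight.abs().mean().clamp(min=1e-5)
    quantized = (weight / scale).round().clamp(-1, 1)
    return quantized, scale

# Example: quantize a random weight matrix
w = torch.randn(4, 4)
w_q, s = ternary_quantize(w)
print(w_q)  # entries are -1.0, 0.0, or 1.0; dequantize with w_q * s
```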

# Training Details

For more details about the training protocol of this model, please refer to the [Falcon-E technical blogpost](https://falcon-lm.github.io/blog/falcon-edge/).

# Usage

Currently, you can use this model either with the Hugging Face `transformers` library or with the [BitNet](https://github.com/microsoft/BitNet) library (as well as with `mlx-lm` on Apple Silicon; see below); there are multiple ways to interact with the model depending on your target usage. Each model in the Falcon-E series comes in three variants: the BitNet model, the prequantized checkpoint for fine-tuning, and the `bfloat16` version of the BitNet model. The latter two are published as separate branches ("revisions") of the same repository.
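
If you want to check which variants are available, you can list the branches of the repository. Below is a minimal sketch using the `huggingface_hub` client; the `bfloat16` and `prequantized` branch names are the ones used in the snippets that follow:

```python
from huggingface_hub import list_repo_refs

# Branches ("revisions") of the model repository: the default BitNet
# checkpoint lives on `main`, while `bfloat16` and `prequantized`
# hold the other two variants described above.
refs = list_repo_refs("tiiuae/Falcon-E-1B-Base")
print([branch.name for branch in refs.branches])
```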

### Inference

#### 🤗 transformers

To run inference on the BitNet checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Perform text generation
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you would rather use the classic `bfloat16` version, run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"
revision = "bfloat16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision=revision,
).to("cuda")

# Perform text generation
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### BitNet
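
To run the model with Microsoft's BitNet (`bitnet.cpp`) framework, clone the repository, set up the environment with the quantized GGUF checkpoint, and launch inference: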

```
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
python setup_env.py --hf-repo tiiuae/Falcon-E-1B-Base -q i2_s
python run_inference.py -m models/Falcon-E-1B-Base/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
```

#### mlx-lm

On Apple Silicon, you can run the model with [mlx-lm](https://github.com/ml-explore/mlx-lm). First, install the package:

```
pip install -U mlx-lm
```

Then generate text from a prompt:

```
mlx_lm.generate --model tiiuae/Falcon-E-3B-Instruct --prompt "Implement bubble sort" --max-tokens 100 --temp 0.1
```
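
The example above uses the 3B instruct checkpoint; the same command should also work with the other Falcon-E checkpoints (e.g. `tiiuae/Falcon-E-1B-Base`).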

### Fine-tuning

For fine-tuning the model, you should load the `prequantized` revision of the model and use the [`onebitllms`](https://github.com/tiiuae/onebitllms) Python package:

```diff
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
+ from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
+    revision="prequantized",
)
+ model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...
)

trainer.train()

+ quantize_to_1bit(output_directory)
```
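
Note that `output_directory` is a placeholder: pass the directory where the trainer saved the fine-tuned `bfloat16` checkpoint (e.g. your trainer's `output_dir`) so that `quantize_to_1bit` can convert it into the final 1.58-bit checkpoint.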

# Evaluation

We report our internal pipeline benchmarks in the tables below.

**Note: evaluation results are normalized scores on tasks from the former Hugging Face Open LLM Leaderboard v2.**

<details>
<summary class="bold"> For 1B scale models and below </summary>

| Model | # Params | Mem Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | **635MB** | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |

</details>

<details>
<summary class="bold"> For 3B scale models </summary>

| Model | # Params | Mem Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | **999MB** | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |

</details>

Below are the results for instruction fine-tuned models:

<details>
<summary class="bold"> For 1B scale models and below </summary>

| Model | # Params | Mem Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.52 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | **635MB** | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |

</details>

<details>
<summary class="bold"> For 3B scale models </summary>

| Model | # Params | Mem Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | **999MB** | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |

</details>

## Useful links

- View [our release blogpost](https://falcon-lm.github.io/blog/falcon-edge/).
- Learn more about the [`onebitllms` library](https://github.com/tiiuae/onebitllms).
- Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.

# Citation

If the Falcon-E family of models was helpful to your work, feel free to cite us:

```
@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    url = {https://falcon-lm.github.io/blog/falcon-edge},
    year = {2025}
}
```