---
library_name: transformers
tags:
- bitnet
- falcon-e
- edge
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---

# Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Training Details](#training-details)
3. [Usage](#usage)
4. [Evaluation](#evaluation)
5. [Citation](#citation)

# TL;DR
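
Falcon-E is a series of 1.58-bit (BitNet-architecture) language models developed by [TII](https://www.tii.ae) and designed to run efficiently on edge devices. Falcon-E-1B-Base is the base (pre-trained) 1B-scale model of the series, with a memory footprint of roughly 635MB.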

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only / Base version
- **Architecture:** Pure transformer / 1.58-bit version
- **Language(s) (NLP):** English
- **License:** Falcon-LLM License
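
For intuition: a 1.58-bit model stores ternary weights, each taking one of three values {-1, 0, +1}, hence log2(3) ≈ 1.58 bits per weight. The sketch below illustrates BitNet-style absmean ternary quantization; it is illustrative only, not the exact kernel used to train or serve this model:

```python
import torch

def ternary_quantize(weight: torch.Tensor):
    """Illustrative BitNet b1.58-style absmean quantization:
    scale by the mean absolute value, then round and clip to {-1, 0, +1}."""
    scale = weight.abs().mean().clamp(min=1e-5)
    quantized = (weight / scale).round().clamp(-1, 1)
    return quantized, scale

# Example: quantize a random weight matrix
w = torch.randn(4, 4)
w_q, s = ternary_quantize(w)
print(w_q)  # entries are -1.0, 0.0, or 1.0; dequantize with w_q * s
```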

# Training Details

For more details about the training protocol of this model, please refer to the [Falcon-E technical blogpost](https://falcon-lm.github.io/blog/falcon-edge/).

# Usage

Currently, you can use this model either with the Hugging Face `transformers` library or with the [BitNet](https://github.com/microsoft/BitNet) library (as well as with `mlx-lm` on Apple Silicon; see below); there are multiple ways to interact with the model depending on your target usage. Each model in the Falcon-E series comes in three variants: the BitNet model, the prequantized checkpoint for fine-tuning, and the `bfloat16` version of the BitNet model. The latter two are published as separate branches ("revisions") of the same repository.
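
If you want to check which variants are available, you can list the branches of the repository. Below is a minimal sketch using the `huggingface_hub` client; the `bfloat16` and `prequantized` branch names are the ones used in the snippets that follow:

```python
from huggingface_hub import list_repo_refs

# Branches ("revisions") of the model repository: the default BitNet
# checkpoint lives on `main`, while `bfloat16` and `prequantized`
# hold the other two variants described above.
refs = list_repo_refs("tiiuae/Falcon-E-1B-Base")
print([branch.name for branch in refs.branches])
```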

### Inference

#### 🤗 transformers

To run inference on the BitNet checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Perform text generation
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you would rather use the classic `bfloat16` version, run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Base"
revision = "bfloat16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision=revision,
).to("cuda")

# Perform text generation
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### BitNet
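
To run the model with Microsoft's BitNet (`bitnet.cpp`) framework, clone the repository, set up the environment with the quantized GGUF checkpoint, and launch inference: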

```
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
python setup_env.py --hf-repo tiiuae/Falcon-E-1B-Base -q i2_s
python run_inference.py -m models/Falcon-E-1B-Base/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
```

#### mlx-lm

On Apple Silicon, you can run the model with [mlx-lm](https://github.com/ml-explore/mlx-lm). First, install the package:

```
pip install -U mlx-lm
```

Then generate text from a prompt:

```
mlx_lm.generate --model tiiuae/Falcon-E-3B-Instruct --prompt "Implement bubble sort" --max-tokens 100 --temp 0.1
```
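
The example above uses the 3B instruct checkpoint; the same command should also work with the other Falcon-E checkpoints (e.g. `tiiuae/Falcon-E-1B-Base`).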

### Fine-tuning

For fine-tuning the model, you should load the `prequantized` revision of the model and use the [`onebitllms`](https://github.com/tiiuae/onebitllms) Python package:

```diff
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
+ from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
+    revision="prequantized",
)
+ model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...
)

trainer.train()

+ quantize_to_1bit(output_directory)
```
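
Note that `output_directory` is a placeholder: pass the directory where the trainer saved the fine-tuned `bfloat16` checkpoint (e.g. your trainer's `output_dir`) so that `quantize_to_1bit` can convert it into the final 1.58-bit checkpoint.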

# Evaluation

We report our internal pipeline benchmarks in the tables below.

**Note: evaluation results are normalized scores on tasks from the former Hugging Face Open LLM Leaderboard v2.**

<details>
<summary class="bold"> For 1B scale models and below </summary>

| Model | # Params | Mem Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | **635MB** | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |

</details>

<details>
<summary class="bold"> For 3B scale models </summary>

| Model | # Params | Mem Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | **999MB** | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |

</details>

Below are the results for instruction fine-tuned models:

<details>
<summary class="bold"> For 1B scale models and below </summary>

| Model | # Params | Mem Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.52 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | **635MB** | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |

</details>

<details>
<summary class="bold"> For 3B scale models </summary>

| Model | # Params | Mem Footprint | IFEval | MATH-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | **999MB** | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |

</details>

## Useful links

- View [our release blogpost](https://falcon-lm.github.io/blog/falcon-edge/).
- Learn more about the [`onebitllms` library](https://github.com/tiiuae/onebitllms).
- Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.

# Citation

If the Falcon-E family of models was helpful to your work, feel free to cite us:

```
@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    url = {https://falcon-lm.github.io/blog/falcon-edge},
    year = {2025}
}
```