nielsr HF Staff committed on
Commit 7978002 · verified · 1 Parent(s): 9641ce1

Improve model card: Add pipeline tag, library name, paper and GitHub links


This PR enhances the model card for `F2LLM-4B` by:
- Adding `pipeline_tag: feature-extraction` to correctly categorize the model's functionality on the Hub.
- Including `library_name: transformers` to enable the automated "How to use" widget, as evidenced by the existing usage snippet.
- Adding direct links to the paper ([F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data](https://huggingface.co/papers/2510.02294)) and the GitHub repository for better visibility and easier access.

These changes will improve the model's discoverability and usability on the Hugging Face Hub.
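For context on what `library_name: transformers` unlocks, below is a minimal sketch of using the model for feature extraction with plain `transformers`. It assumes the Hub id is `codefuse-ai/F2LLM-4B` and uses last-token pooling with L2 normalization, common choices for decoder-based embedding models; the usage snippet in the model card itself remains the authoritative reference.

```python
# Minimal sketch, not the official usage snippet: the repo id, pooling
# strategy, and normalization below are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "codefuse-ai/F2LLM-4B"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()

texts = [
    "How do I extract sentence embeddings?",
    "Feature extraction maps text to dense vectors.",
]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)

# Last-token pooling (an assumption; check the card's usage snippet for the
# pooling the authors actually use). Assumes right padding, the usual
# tokenizer default, so the last real token sits at index sum(mask) - 1.
last = batch["attention_mask"].sum(dim=1) - 1
embeddings = hidden[torch.arange(hidden.size(0)), last]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=-1)

print(embeddings.shape)  # (2, hidden_size)
```

Cosine similarity between the normalized vectors then scores query-document relevance, which is what the `feature-extraction` pipeline tag signals to the Hub.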

Files changed (1)
1. README.md +11 -3
README.md CHANGED

```diff
@@ -1,13 +1,21 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen3-4B
 datasets:
 - codefuse-ai/F2LLM
 language:
 - en
-base_model:
-- Qwen/Qwen3-4B
+license: apache-2.0
+pipeline_tag: feature-extraction
+library_name: transformers
 ---
 
+# F2LLM-4B: Matching SOTA Embedding Performance with 6 Million Open-Source Data
+
+This model is a part of the F2LLM family, presented in the paper [F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data](https://huggingface.co/papers/2510.02294).
+
+**Code**: [https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main/F2LLM](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main/F2LLM)
+
 F2LLMs (Foundation to Feature Large Language Models) are foundation models directly finetuned on 6 million high-quality query-document pairs (available in [codefuse-ai/F2LLM](https://huggingface.co/datasets/codefuse-ai/F2LLM)) covering a diverse range of retrieval, classification, and clustering data, curated solely from open-source datasets without any synthetic data. These models are trained with homogeneous macro batches in a single stage, without sophisticated multi-stage pipelines.
 
 ## Usage
```