# SmolLM3-3B Inference

In [1]:
!pip install -U -q transformers


## Local Inference on GPU
Model page: https://huggingface.co/HuggingFaceTB/SmolLM3-3B



### Using pipeline

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline
from transformers import AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model_id, tokenizer=tokenizer)

Device set to use cuda:0


#### With Extended Thinking Mode Enabled

Extended thinking is on by default, so no special flag or system prompt is needed.

In [None]:
messages = [
    {"role": "user", "content": "Give me a brief explanation of gravity in simple terms."},
]
pipe(messages)

[{'generated_text': [{'role': 'user',
    'content': 'Give me a brief explanation of gravity in simple terms.'},
   {'role': 'assistant',
    'content': "<think>\nOkay, I need to explain gravity in simple terms. Let me start by recalling what I know. Gravity is the force that pulls objects towards each other. Like how the Earth pulls you down to the ground. But why does that happen? I remember Newton's law of universal gravitation. It says that every mass attracts every other mass with a force proportional to their masses and inversely proportional to the square of the distance between them. So, the bigger the objects, the stronger the gravitational pull. But how do I explain that without using complex math?\n\nMaybe start with the basic idea: gravity is a force that keeps things from floating away. For example, planets orbit the sun because of gravity. Without it, they'd just float off. But why do we feel it on Earth? Because the Earth's gravity is pulling us down. If there were no Ea
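With thinking enabled, the assistant's reply begins with its reasoning inside `<think>...</think>` tags, followed by the final answer. Here is a minimal sketch for separating the two. It assumes the tags appear exactly as in the output above. `split_thinking` is a hypothetical helper, not part of `transformers`.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a reply into (reasoning, answer).

    Assumes reasoning is wrapped in <think>...</think> at the start of the reply.
    If the closing tag is missing (for example, generation was cut off), the
    whole remainder is returned as reasoning and the answer is empty.
    """
    open_tag, close_tag = "<think>", "</think>"
    stripped = text.lstrip()
    if not stripped.startswith(open_tag):
        # No reasoning block: the whole reply is the answer
        return "", text.strip()
    body = stripped[len(open_tag):]
    reasoning, found, answer = body.partition(close_tag)
    if not found:
        # Truncated reply: only reasoning was produced
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()


# Usage with the pipeline output above (the last message is the assistant's reply):
# reply = pipe(messages)[0]["generated_text"][-1]["content"]
# reasoning, answer = split_thinking(reply)
```

An empty answer with non-empty reasoning is a quick signal that the generation budget ran out before the model finished thinking.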

#### With Extended Thinking Mode Disabled

We disable it by adding `/no_think` to the system prompt.

In [None]:
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Give me a brief explanation of gravity in simple terms."},
]
pipe(messages)

[{'generated_text': [{'role': 'system', 'content': '/no_think'},
   {'role': 'user',
    'content': 'Give me a brief explanation of gravity in simple terms.'},
   {'role': 'assistant',
    'content': "Gravity is a force that pulls objects towards each other. Imagine you're holding a rubber ball. If you let go, the ball will fall towards the ground because of gravity. This force is what makes things stay on the ground, like you standing on the floor. It also keeps the moon orbiting around Earth, and the Earth orbiting around the sun. Gravity is always pulling things towards each other, but sometimes, like when the moon and Earth orbit each other, their speed keeps them moving in circles instead of falling towards each other."}]}]
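The two calls differ only in the optional `/no_think` system message, so one small helper can build either message list. This is a minimal sketch, and `build_messages` is a hypothetical name.

```python
def build_messages(prompt: str, thinking: bool = True) -> list[dict]:
    """Return a chat message list; thinking=False prepends the /no_think system prompt."""
    messages = []
    if not thinking:
        # Disable extended thinking, as in the cell above
        messages.append({"role": "system", "content": "/no_think"})
    messages.append({"role": "user", "content": prompt})
    return messages


# pipe(build_messages("Give me a brief explanation of gravity in simple terms.", thinking=False))
```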

### Loading the model directly

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/SmolLM3-3B"
device = "cuda" # for GPU usage or "cpu" for CPU usage

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
).to(device)

In [2]:
# prepare the model input
prompt = "Give me a brief explanation of gravity in simple terms."
messages_think = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages_think,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)

# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))

<think>
Okay, so I need to explain gravity in simple terms. Let me start by recalling what I know about gravity. Gravity is the force that pulls things towards each other. I remember that it's what keeps us on the ground and makes things fall when we drop them. But how exactly does it work?

Hmm, Newton's law of universal gravitation comes to mind. He said that every mass attracts every other mass. The force depends on the masses of the objects and the distance between them. The formula is F = G*(m1*m2)/r². But I need to simplify this for someone who might not know calculus or physics terms.

Wait, maybe I should start with a basic analogy. Like, imagine two balls. If you have a heavy ball and a light ball, they attract each other. The heavier the balls, the stronger the pull. The closer they are, the stronger the pull. The farther apart, the weaker. That makes sense.

But how do I explain it without using math? Maybe talk about everyday examples. Like, why do we stay on the ground? Be

#### Enabling and Disabling Extended Thinking Mode

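Extended thinking is active by default. We disable it by adding `/no_think` to the system prompt. Alternatively, pass `enable_thinking=False` to `tokenizer.apply_chat_template`, as in the agentic example below.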

In [3]:
prompt = "Give me a brief explanation of gravity in simple terms."
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)

# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))

Gravity is a force that pulls objects towards each other. Imagine you're holding a ball, and you let go of it. The ball starts to fall towards the ground because of gravity. This force is always pulling objects towards each other, no matter how far apart they are. It's what keeps the moon orbiting around the Earth, the Earth orbiting around the Sun, and keeps us grounded on the ground. Gravity is what makes things "stick" to the ground and keeps everything in place. It's a very weak force, but it's incredibly strong when you consider the vast distances between objects in space.


#### Agentic Usage

SmolLM3 supports tool calling. Pass a list of tools via the `xml_tools` argument (for standard tool calling) or via `python_tools` (for calling tools as Python functions inside a `<code>` snippet).
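The `tools` list in the next cell is written by hand. If a tool already exists as a typed Python function, you can derive a dict of the same shape from its signature and docstring. This is a minimal sketch with several assumptions: `tool_schema` and the annotation-to-type mapping are hypothetical, only simple scalar parameters are handled, and unannotated parameters fall back to `"string"`.

```python
import inspect

# Hypothetical mapping from Python annotations to JSON-schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn, param_descriptions=None) -> dict:
    """Build a tool dict shaped like the `tools` entries passed as xml_tools."""
    param_descriptions = param_descriptions or {}
    properties = {}
    for name, param in inspect.signature(fn).parameters.items():
        prop = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if name in param_descriptions:
            prop["description"] = param_descriptions[name]
        properties[name] = prop
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties},
    }


def get_weather(city: str) -> str:
    """Get the weather in a city"""
    return f"Sunny in {city}"  # placeholder implementation


tools = [tool_schema(get_weather, {"city": "The city to get the weather for"})]
```

Keeping the schema next to the function means the description the model sees cannot drift from the code that actually runs.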

In [9]:
tools = [
    {
        "name": "get_weather",
        "description": "Get the weather in a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city to get the weather for"}
            },
        },
    }
]

messages = [
    {
        "role": "user",
        "content": "Hello! How is the weather today in Copenhagen?"
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    enable_thinking=False, # True works as well, your choice!
    xml_tools=tools,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

<|im_start|>system
## Metadata

Knowledge Cutoff Date: June 2025
Today Date: 09 July 2025
Reasoning Mode: /no_think

## Custom Instructions

You are a helpful AI assistant named SmolLM, trained by Hugging Face.

### Tools

You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:

<tools>
{'name': 'get_weather', 'description': 'Get the weather in a city', 'parameters': {'type': 'object', 'properties': {'city': {'type':'string', 'description': 'The city to get the weather for'}}}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>

<|im_end|>
<|im_start|>user
Hello! How is the weather today in Copenhagen?<|im_end|>
<|im_start|>assistant
<think>

</think>
<tool_call>
{"name": "get_weather", "arguments": {"city": "Copenhagen"}}
</tool_call>
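The model only *emits* the call, and running the tool is up to you. Below is a minimal sketch that extracts the JSON from `<tool_call>` tags and dispatches it to a local function. It makes some assumptions: `parse_tool_calls`, `dispatch`, and the stub `get_weather` are hypothetical, and the input should ideally be the decoded *new* tokens only. Decoding the full sequence, as in the cell above, also includes the system prompt's template text between `<tool_call>` tags. That text is not valid JSON, so this sketch skips it.

```python
import json
import re

# Non-greedy match so several <tool_call> blocks in one reply are found separately
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return every JSON object found between <tool_call> and </tool_call>."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed or placeholder blocks instead of crashing
    return calls


def get_weather(city: str) -> str:
    # Hypothetical stub; a real tool would query a weather service
    return f"It is sunny in {city}."


# Map tool names (as declared in `tools`) to local Python callables
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch(call: dict) -> str:
    """Run the local function named in a parsed tool call."""
    fn = TOOL_REGISTRY[call["name"]]
    return fn(**call.get("arguments", {}))


reply = '<think>\n\n</think>\n<tool_call>\n{"name": "get_weather", "arguments": {"city": "Copenhagen"}}\n</tool_call>'
for call in parse_tool_calls(reply):
    print(dispatch(call))  # prints: It is sunny in Copenhagen.
```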


## Remote Inference via Inference Providers
Make sure a valid **HF_TOKEN** is set in your environment. Running this may bill your account once usage exceeds the free tier.
The following Python example runs the model remotely on HF Inference Providers using the **auto** provider setting, which automatically selects an available inference provider.

In [13]:
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="auto",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="HuggingFaceTB/SmolLM3-3B",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)

ChatCompletionOutputMessage(role='assistant', content="<think>\nOkay, so the user is asking for the capital of France. Let me start by recalling what I know. I think the capital is Paris. But wait, maybe I should double-check to be sure. Sometimes people might confuse it with other cities like Lyon or Marseille, but those are major cities in France, not the capital. \n\nI remember that Paris is known for the Eiffel Tower and the Louvre Museum. Also, it's the heart of French culture and fashion. The government is based there, right? The President of France lives in the Élysée Palace, which is in Paris. \n\nWait, could there be any other city that's considered the capital? Maybe in the past? I don't think so. The capital has been Paris since the city became the capital of the Kingdom of France in the 12th century, and it's been the capital ever since. \n\nI should also consider if there's any recent change. For example, sometimes cities like Lyon or Marseille might have been proposed as 
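`os.environ["HF_TOKEN"]` raises a bare `KeyError` when the variable is missing. The small check below fails with a clearer message before any request is sent. `require_hf_token` is a hypothetical helper, and the mapping argument exists only so it can be tested without touching the real environment.

```python
import os
from collections.abc import Mapping


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return HF_TOKEN from the environment, or raise a descriptive error."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Create an access token in your Hugging Face "
            "account settings and export it, e.g. `export HF_TOKEN=...`."
        )
    return token


# client = InferenceClient(provider="auto", api_key=require_hf_token())
```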