Multimodal to Text-Only Model Converter
Overview
This Python script is a utility designed to convert a sharded, multimodal (text and vision) Mistral-based model into a text-only version. It achieves this by selectively removing the vision-related weights from the model's safetensors files and restructuring the remaining tensors to create a valid, language-only model.
This is particularly useful for adapting multimodal finetunes for tasks that only require the language model, such as merging with other text-based models (e.g., via SLERP) or for more efficient deployment in text-only environments.
Features
- Handles Sharded Models: Automatically processes models split across multiple safetensors files.
- Targeted Weight Removal: Removes tensors based on specific prefixes, targeting the vision tower and multimodal projector layers.
- Tensor Renaming: Correctly renames the language model tensors by stripping the multimodal prefix (e.g., language_model.model... becomes model...), ensuring compatibility with the standard MistralForCausalLM architecture.
- Automated Index Generation: Creates a new, clean model.safetensors.index.json for the converted model.
- Efficient Processing: Skips creating new files for shards that contained only vision weights, saving disk space.
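The features above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the prefix values `vision_tower.` and `multi_modal_projector.`, and the function names `convert_key` and `convert_model`, are assumptions made for the example, so check the tensor names in your own model's index before relying on them.

```python
import json
from pathlib import Path

# Assumed prefixes for illustration; the real script's values may differ.
VISION_PREFIXES = ("vision_tower.", "multi_modal_projector.")
LANGUAGE_PREFIX = "language_model."


def convert_key(name):
    """Return the text-only tensor name, or None if the tensor should be dropped."""
    if name.startswith(VISION_PREFIXES):
        return None
    if name.startswith(LANGUAGE_PREFIX):
        return name[len(LANGUAGE_PREFIX):]
    return name


def convert_model(input_dir, output_dir):
    """Filter and rename every shard, then write a new index file."""
    # Imported here so convert_key can be used without torch installed.
    from safetensors.torch import load_file, save_file

    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    weight_map, total_size = {}, 0

    for shard in sorted(input_dir.glob("model-*.safetensors")):
        kept = {}
        for name, tensor in load_file(str(shard)).items():
            new_name = convert_key(name)
            if new_name is not None:
                kept[new_name] = tensor
        if not kept:
            continue  # shard held only vision weights, so no file is written
        save_file(kept, str(output_dir / shard.name), metadata={"format": "pt"})
        for new_name, tensor in kept.items():
            weight_map[new_name] = shard.name
            total_size += tensor.numel() * tensor.element_size()

    index = {"metadata": {"total_size": total_size}, "weight_map": weight_map}
    with open(output_dir / "model.safetensors.index.json", "w") as f:
        json.dump(index, f, indent=2)
    return index
```

In this sketch a kept shard reuses its original file name, so the output folder may have gaps in the shard numbering where vision-only shards were skipped.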
Prerequisites
- Python 3.6+
- PyTorch
- Safetensors
Install the required libraries using pip:
pip install torch safetensors
How to Use
Prepare Directories:
- Have your original multimodal model in an input directory. This folder should contain the model-*.safetensors files and the model.safetensors.index.json.
- Create a new, empty directory where the converted text-only model will be saved.
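Before running the conversion, it may help to confirm that both directories are in the expected state. The sketch below is not part of the script; `missing_shards` and `is_empty_dir` are hypothetical helper names. It relies only on the standard layout of model.safetensors.index.json, where `weight_map` maps each tensor name to the shard file that stores it.

```python
import json
from pathlib import Path


def missing_shards(input_dir):
    """Return shard file names listed in the index but absent from input_dir."""
    input_dir = Path(input_dir)
    index_path = input_dir / "model.safetensors.index.json"
    if not index_path.is_file():
        raise FileNotFoundError(f"missing index file: {index_path}")
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    shards = set(weight_map.values())
    return sorted(s for s in shards if not (input_dir / s).is_file())


def is_empty_dir(output_dir):
    """Create output_dir if needed and report whether it is empty."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    return not any(output_dir.iterdir())
```

An empty list from `missing_shards` and `True` from `is_empty_dir` suggest the directories are ready.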
Configure the Script:
- Open the Python script (vision_stripper.py or your chosen name).
- Locate the if __name__ == "__main__": block at the bottom of the file.
- Set the input_model_directory variable to the path of your original multimodal model.
- Set the output_model_directory variable to the path of your new, empty output folder.

    # --- Example Configuration ---
    # On Windows, use raw strings (r"...") to avoid path errors
    input_model_directory = r"C:\path\to\your\multimodal_model"
    output_model_directory = r"C:\path\to\your\new_text_only_model"
Run the Conversion:
- Execute the script from your terminal:

    python vision_stripper.py

Finalize Model Files:
- After the script completes, copy any other necessary non-weight files (like config.json, tokenizer_config.json, chat_template.jinja.txt, etc.) to your new output directory.
- Crucially, ensure the config.json in the output directory is updated to reflect a text-only architecture (e.g., changing the architectures value to ["MistralForCausalLM"] and removing the vision_config section).
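The finalization step can also be scripted. The following is a minimal sketch under stated assumptions: the file list simply repeats the examples named above, the helper names `to_text_only_config` and `finalize` are hypothetical, and only the two config changes mentioned here are applied. Any other fields your model's config.json needs are left untouched and may still require manual edits.

```python
import json
import shutil
from pathlib import Path

# Illustrative list taken from the examples above; adjust to the files your model ships.
NON_WEIGHT_FILES = ["config.json", "tokenizer_config.json", "chat_template.jinja.txt"]


def to_text_only_config(config):
    """Return a copy of config with a text-only architecture and no vision_config."""
    config = dict(config)
    config["architectures"] = ["MistralForCausalLM"]
    config.pop("vision_config", None)
    return config


def finalize(input_dir, output_dir):
    """Copy the non-weight files that exist, then rewrite config.json in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    for name in NON_WEIGHT_FILES:
        src = input_dir / name
        if src.is_file():
            shutil.copy2(src, output_dir / name)
    config_path = output_dir / "config.json"
    with open(config_path) as f:
        config = json.load(f)
    with open(config_path, "w") as f:
        json.dump(to_text_only_config(config), f, indent=2)
```

Calling `finalize(input_model_directory, output_model_directory)` after the conversion would apply both steps in one go.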
The script will report its progress in the console, and upon completion, your output directory will contain the converted, text-only model, ready for use.
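As a quick sanity check on the result, the new index file can be scanned for tensor names that should no longer be present. This is a hedged sketch, not part of the script: `leftover_keys` is a hypothetical helper, and the default prefixes are assumptions that should be matched to your model's actual tensor names.

```python
import json
from pathlib import Path


def leftover_keys(output_dir,
                  prefixes=("vision_tower.", "multi_modal_projector.", "language_model.")):
    """Return tensor names in the output index that still carry a multimodal prefix."""
    index_path = Path(output_dir) / "model.safetensors.index.json"
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    return sorted(name for name in weight_map if name.startswith(prefixes))
```

An empty list means no tensor in the index still uses one of the listed prefixes.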