model_tools / textonly_ripper.md

Multimodal to Text-Only Model Converter

Overview

This Python script converts a sharded, multimodal (text and vision) Mistral-based model into a text-only version. It does this by selectively removing the vision-related weights from the model's safetensors shards and renaming the remaining tensors so they form a valid, language-only model.

This is particularly useful for adapting multimodal finetunes for tasks that only require the language model, such as merging with other text-based models (e.g., via SLERP) or for more efficient deployment in text-only environments.

Features

  • Handles Sharded Models: Automatically processes models split across multiple safetensors files.
  • Targeted Weight Removal: Removes tensors based on specific prefixes, targeting the vision tower and multimodal projector layers.
  • Tensor Renaming: Correctly renames the language model tensors by stripping the multimodal prefix (e.g., language_model.model... becomes model...), ensuring compatibility with standard MistralForCausalLM architecture.
  • Automated Index Generation: Creates a new, clean model.safetensors.index.json for the converted model.
  • Efficient Processing: Skips writing output files for shards that contain only vision weights, saving disk space.
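The removal and renaming described above can be sketched as a small helper (an illustrative sketch, not the actual script; the vision prefix names are assumptions and should match whatever the real script targets):

```python
VISION_PREFIXES = ("vision_tower.", "multi_modal_projector.")  # assumed prefixes
LM_PREFIX = "language_model."

def strip_and_rename(tensors: dict) -> dict:
    """Drop vision tensors and strip the multimodal prefix from the rest.

    `tensors` maps tensor names to their values, in the shape returned by
    safetensors.torch.load_file for one shard.
    """
    kept = {}
    for name, tensor in tensors.items():
        if name.startswith(VISION_PREFIXES):
            continue  # vision tower / projector weight: discard
        # "language_model.model.embed_tokens.weight" -> "model.embed_tokens.weight"
        new_name = name[len(LM_PREFIX):] if name.startswith(LM_PREFIX) else name
        kept[new_name] = tensor
    return kept
```

In the real script this filter runs per shard: each shard is loaded with safetensors.torch.load_file, filtered, and (if any tensors remain) written back with save_file, while the new names and their output shard filenames are accumulated into the weight_map of the regenerated model.safetensors.index.json.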

Prerequisites

  • Python 3.6+
  • PyTorch
  • Safetensors

Install the required libraries using pip:

pip install torch safetensors

How to Use

  1. Prepare Directories:

    • Have your original multimodal model in an input directory. This folder should contain the model-*.safetensors files and the model.safetensors.index.json.
    • Create a new, empty directory where the converted text-only model will be saved.
  2. Configure the Script:

    • Open the Python script (vision_stripper.py or your chosen name).
    • Locate the if __name__ == "__main__": block at the bottom of the file.
    • Set the input_model_directory variable to the path of your original multimodal model.
    • Set the output_model_directory variable to the path of your new, empty output folder.
    # --- Example Configuration ---
    # On Windows, use raw strings (r"...") to avoid path errors
    input_model_directory = r"C:\path\to\your\multimodal_model"
    output_model_directory = r"C:\path\to\your\new_text_only_model"
    
  3. Run the Conversion:

    • Execute the script from your terminal:
    python vision_stripper.py
    
  4. Finalize Model Files:

    • After the script completes, copy any other necessary non-weight files (like config.json, tokenizer_config.json, chat_template.jinja.txt, etc.) to your new output directory.
    • Crucially, ensure the config.json in the output directory is updated to reflect a text-only architecture (e.g., changing the architectures value to ["MistralForCausalLM"] and removing the vision_config section).
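The config edit can be done by hand, or with a small helper like the following (a hedged sketch; key names such as vision_config follow common multimodal Mistral configs and should be checked against your actual file):

```python
def make_text_only_config(config: dict) -> dict:
    """Return a copy of a multimodal config adjusted for a text-only model."""
    cfg = dict(config)
    cfg["architectures"] = ["MistralForCausalLM"]
    cfg.pop("vision_config", None)  # drop the vision section if present
    # Note: some multimodal configs nest the LM settings under a "text_config"
    # key; if yours does, those fields may need promoting to the top level.
    return cfg
```

Apply it to the output directory's config.json with json.load / json.dump, then review the result before loading the model.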

The script will report its progress in the console, and upon completion, your output directory will contain the converted, text-only model, ready for use.
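As a final sanity check, you can verify that every shard referenced by the new index actually exists on disk (a small illustrative helper, not part of the original script):

```python
import json
import os

def missing_shards(model_dir: str) -> list:
    """Return shard filenames referenced by the index but absent from model_dir."""
    with open(os.path.join(model_dir, "model.safetensors.index.json")) as f:
        index = json.load(f)
    shards = set(index["weight_map"].values())
    return sorted(s for s in shards if not os.path.exists(os.path.join(model_dir, s)))
```

An empty list means the index and the shard files are consistent; any names returned point to shards the conversion (or a later copy step) failed to produce.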