# Multimodal to Text-Only Model Converter

## Overview

This Python script is a utility designed to convert a sharded, multimodal (text and vision) Mistral-based model into a text-only version. It does so by selectively removing the vision-related weights from the model's `safetensors` files and restructuring the remaining tensors to produce a valid, language-only model.

This is particularly useful for adapting multimodal finetunes to tasks that only require the language model, such as merging with other text-based models (e.g., via SLERP) or more efficient deployment in text-only environments.

## Features

- **Handles Sharded Models**: Automatically processes models split across multiple `safetensors` files.
- **Targeted Weight Removal**: Removes tensors based on specific prefixes, targeting the vision tower and multimodal projector layers.
- **Tensor Renaming**: Renames the language-model tensors by stripping the multimodal prefix (e.g., `language_model.model...` becomes `model...`), ensuring compatibility with the standard `MistralForCausalLM` architecture.
- **Automated Index Generation**: Creates a new, clean `model.safetensors.index.json` for the converted model.
- **Efficient Processing**: Skips creating new files for shards that contained only vision weights, saving disk space.
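The filtering and renaming rules above can be sketched as two small pure functions. The exact prefix strings (`vision_tower.`, `multi_modal_projector.`, `language_model.`) are assumptions based on common multimodal Mistral checkpoints; check them against your model's actual tensor names.

```python
# Sketch of the tensor-filtering and renaming rules, assuming common
# multimodal Mistral prefix conventions (verify against your checkpoint).

VISION_PREFIXES = ("vision_tower.", "multi_modal_projector.")
LANGUAGE_PREFIX = "language_model."

def is_vision_tensor(name: str) -> bool:
    """True for tensors belonging to the vision stack (to be dropped)."""
    return name.startswith(VISION_PREFIXES)

def rename_tensor(name: str) -> str:
    """Strip the multimodal wrapper prefix so names match MistralForCausalLM."""
    if name.startswith(LANGUAGE_PREFIX):
        return name[len(LANGUAGE_PREFIX):]
    return name
```

Applied to every entry in every shard, these two rules produce the text-only tensor set that gets written back out.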
## Prerequisites

- Python 3.8+
- PyTorch
- Safetensors

Install the required libraries with pip:

```bash
pip install torch safetensors
```
## How to Use

1. **Prepare Directories**:
   - Place your original multimodal model in an input directory. This folder should contain the `model-*.safetensors` files and the `model.safetensors.index.json`.
   - Create a new, empty directory where the converted text-only model will be saved.
2. **Configure the Script**:
   - Open the Python script (`vision_stripper.py`, or whatever you named it).
   - Locate the `if __name__ == "__main__":` block at the bottom of the file.
   - Set the `input_model_directory` variable to the path of your original multimodal model.
   - Set the `output_model_directory` variable to the path of your new, empty output folder.

   ```python
   # --- Example Configuration ---
   # On Windows, use raw strings (r"...") to avoid backslash-escape errors
   input_model_directory = r"C:\path\to\your\multimodal_model"
   output_model_directory = r"C:\path\to\your\new_text_only_model"
   ```
3. **Run the Conversion**:
   - Execute the script from your terminal:

   ```bash
   python vision_stripper.py
   ```
4. **Finalize Model Files**:
   - After the script completes, copy the remaining non-weight files (such as `config.json`, `tokenizer_config.json`, `chat_template.jinja.txt`, etc.) to your new output directory.
   - **Crucially**, update the `config.json` in the output directory to reflect a text-only architecture (e.g., set the `architectures` value to `["MistralForCausalLM"]` and remove the `vision_config` section).
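The `config.json` edit can also be done programmatically. This is a minimal sketch: the key names (`architectures`, `vision_config`, `model_type`) are assumptions about how your multimodal config is laid out, so inspect your own file before relying on it.

```python
# Hedged sketch: patch a loaded config.json dict for a text-only model.
# Key names are assumptions; verify against your actual config.json.

def make_text_only_config(config: dict) -> dict:
    cfg = dict(config)                       # shallow copy; leave the input intact
    cfg["architectures"] = ["MistralForCausalLM"]
    cfg.pop("vision_config", None)           # drop the vision sub-config if present
    cfg["model_type"] = "mistral"            # assumption: multimodal configs often use
                                             # a different model_type (e.g. "llava")
    return cfg
```

Load the file with `json.load`, pass the dict through this function, and write it back with `json.dump(..., indent=2)`.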
The script reports its progress in the console; upon completion, your output directory will contain the converted, text-only model, ready for use.
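As an optional sanity check, you can scan the new `model.safetensors.index.json` and confirm that no vision or still-prefixed tensor names survived the conversion. This is a sketch assuming the standard sharded-index layout with a top-level `weight_map` object:

```python
# Sketch: verify the converted index contains only text-only tensor names.
# Assumes the standard sharded-checkpoint index layout ({"weight_map": {...}}).

def index_is_text_only(index: dict) -> bool:
    """Return True if no tensor name carries a vision or multimodal prefix."""
    leftovers = [
        name for name in index.get("weight_map", {})
        if name.startswith(("vision_tower.", "multi_modal_projector.", "language_model."))
    ]
    return not leftovers
```

Load the index with `json.load` and pass the resulting dict to this function; a `False` result means some tensors were missed by the stripping pass.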