Pretrained models for the paper "Scaling up Masked Diffusion Models on Text"
Scaling law experiments: We provide all pretrained models in the ar_safetensors and mdm_safetensors folders.
For instance, the checkpoint mdm-1028M-1600e18.safetensors is an MDM model with 1,028 million non-embedding
parameters trained with 1,600e18 FLOPs. Similarly, the checkpoint mdm-170M-100e18-rsl-0.01.safetensors is an
MDM model with 170 million non-embedding parameters and 100e18 training FLOPs, where 1% of the training data
was subjected to random sequence lengths during pretraining.
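As a quick illustration, here is a minimal sketch of how the naming scheme above could be parsed programmatically; the regex and example filenames are our own assumptions, not part of the released code.

```python
# Hedged sketch: parse the checkpoint naming scheme described above.
# The pattern covers the optional -rsl-<ratio> suffix; filenames are illustrative.
import re

PATTERN = re.compile(r"mdm-(\d+)M-(\d+)e18(?:-rsl-([\d.]+))?\.safetensors")

for name in ["mdm-1028M-1600e18.safetensors",
             "mdm-170M-100e18-rsl-0.01.safetensors"]:
    m = PATTERN.fullmatch(name)
    params, flops, rsl = m.groups()
    print(f"{params}M non-embedding params, {flops}e18 FLOPs, "
          f"RSL ratio: {rsl or 'none'}")
```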
Math reasoning: please see the gsm8k_safetensors folder.
Conditional generation: please see the sharegpt_safetensors folder.
Reversal curse: please see the reverse_safetensors folder.
For all models, we provide checkpoints in both .pth and .safetensors formats.
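As a hedged example (the paths are illustrative and assume the torch and safetensors packages are installed), either format can be loaded into a plain state dict:

```python
# Minimal sketch: load a checkpoint downloaded from this repo.
import torch
from safetensors.torch import load_file

# .safetensors variant
state_dict = load_file("mdm_safetensors/mdm-170M-100e18.safetensors")
# or the .pth variant of the same checkpoint:
state_dict = torch.load("mdm-170M-100e18.pth", map_location="cpu")
```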