Mangosteen, a 47-billion-token Thai corpus built with a Thai-adapted pipeline, improves language model performance on Thai benchmarks.
			
	
Wannaphong Phatthiyaphaibun (wannaphong)
AI & ML interests: None yet
Recent Activity
updated a dataset about 13 hours ago: wannaphong/cc_dolma
published a dataset about 21 hours ago: wannaphong/fineweb2-thai
updated a dataset 4 days ago: wannaphong/eclektic