combines reinforcement learning (RL) and large language models (LLMs) to improve exploration using diverse tool generation during inference
			
	
Gabriel Bo (gabrielbo)
AI & ML interests: NLP, Scaling, Test-time Compute

Datasets (9)
gabrielbo/swirl-trajectories-mmlu-pro • 24.8k • 18 • 2
gabrielbo/explore-rl-hotpota-trajectories • 4
gabrielbo/gpqa-llama-3-8b-verifier • 910 • 13
gabrielbo/mmlu-college-llama-3-8b-verifiers • 870 • 9
gabrielbo/mmlu-pro-specific-choice-scored • 870 • 1
gabrielbo/mmlu-pro-baseline-scored • 87 • 4
gabrielbo/mmlu-pro-verifiers-specific-choice • 870 • 3
gabrielbo/mmlu-pro-verifiers-baseline • 87 • 12
gabrielbo/mmlu-pro-justifications-llama-3 • 87 • 7
