# Qwen2.5 0.5B Model Capability for MCP Instruction Translation

## Model Assessment

### Strengths for This Task

1. **Instruction Following**: Qwen2.5 is designed for instruction following and is strong at understanding and executing complex instructions.
2. **Code Understanding**: As a coding-focused model, it comprehends APIs, protocols, and structured data formats such as JSON.
3. **Task-Specific Prompting**: Your implementation can provide specific examples and context that guide the model toward correct translations.
4. **Context Awareness**: The model can use the detailed game state provided via MCP to make informed decisions.

### Limitations to Consider

1. **Size Constraint**: At 0.5B parameters, it has far less capacity than larger models, which may limit complex multi-step reasoning.
2. **Specialized Knowledge**: It likely has no training data on the MCP protocol itself, though it can pick up the concept from in-prompt examples.
3. **Consistency**: Smaller models tend to be less consistent in output quality.

## Recommended Approach

### Prompt Engineering Strategy

The key to success is providing the model with clear, structured prompts that guide it toward correct behavior:
```python
import json

def create_translation_prompt(user_instruction: str, game_state: dict) -> str:
    return f"""
You are an RTS game command interpreter. Convert natural language instructions
into specific MCP tool calls for an RTS game.

GAME CONTEXT:
- You are controlling the PLAYER (player_id: 0)
- Enemy is player_id: 1
- Game uses a grid coordinate system
- Units have specific capabilities and movement patterns

AVAILABLE MCP TOOLS:
1. get_game_state() - Retrieve current game situation
2. move_units(unit_ids: List[str], target_x: float, target_y: float)
3. attack_unit(attacker_ids: List[str], target_id: str)
4. build_building(building_type: str, position_x: float, position_y: float, player_id: int)
5. build_unit(unit_type: str, player_id: int, building_id: str)
6. get_ai_analysis(language: str) - Get tactical advice

CURRENT GAME STATE:
{json.dumps(game_state, indent=2)}

USER INSTRUCTION: "{user_instruction}"

TRANSLATION GUIDELINES:
1. ALWAYS verify that referenced units/buildings exist in the game state
2. Check that the player has sufficient resources for construction actions
3. Ensure coordinates are valid (within map bounds, not in water)
4. Use appropriate unit types for actions (infantry for barracks, etc.)
5. Return ONLY a JSON array of tool calls in this exact format:
[{{"tool": "move_units", "arguments": {{"unit_ids": ["unit1"], "target_x": 100, "target_y": 200}}}}]

EXAMPLE TRANSLATIONS:
User: "Move my tanks to position 200,300"
AI: [{{"tool": "move_units", "arguments": {{"unit_ids": ["tank1", "tank2"], "target_x": 200, "target_y": 300}}}}]

User: "Build a barracks near my HQ"
AI: [{{"tool": "build_building", "arguments": {{"building_type": "barracks", "position_x": 240, "position_y": 240, "player_id": 0}}}}]

Now translate the user instruction:
"""
```
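Because the prompt asks for a raw JSON array, the caller still has to extract that array from whatever the model actually emits, which for small models often includes surrounding prose or code fences. A minimal parsing sketch (the `parse_tool_calls` helper is a hypothetical name, not part of the MCP interface):

```python
import json
import re

def parse_tool_calls(model_output: str) -> list:
    """Extract the first JSON array of tool calls from raw model output.

    Small models often wrap JSON in prose or code fences, so search for
    the outermost [...] span instead of parsing the whole string.
    """
    match = re.search(r"\[.*\]", model_output, re.DOTALL)
    if not match:
        return []
    try:
        calls = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Keep only well-formed entries: dicts with "tool" and "arguments" keys
    return [c for c in calls if isinstance(c, dict)
            and "tool" in c and "arguments" in c]
```

Rejecting malformed entries here, before validation, keeps downstream code from having to defend against missing keys.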
### Few-Shot Learning Approach

Provide several examples in the prompt to guide the model:

```python
EXAMPLES = [
    {
        "instruction": "Attack the enemy with my infantry",
        "game_state_context": "Player has infantry1, infantry2. Enemy has a barracks with id barracks1",
        "translation": [
            {"tool": "attack_unit", "arguments": {"attacker_ids": ["infantry1", "infantry2"], "target_id": "barracks1"}}
        ]
    },
    {
        "instruction": "I need more power",
        "game_state_context": "Player has 500 credits, HQ at 100,100",
        "translation": [
            {"tool": "build_building", "arguments": {"building_type": "power_plant", "position_x": 140, "position_y": 100, "player_id": 0}}
        ]
    }
]
```
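Records in this shape can be rendered into the few-shot section of the prompt mechanically. A sketch, assuming the record structure above (`render_examples` is a hypothetical helper name):

```python
import json

def render_examples(examples: list) -> str:
    """Render example records into User/AI few-shot lines for the prompt."""
    lines = []
    for ex in examples:
        lines.append(f'User: "{ex["instruction"]}"')
        # Serialize the expected tool calls exactly as the model should emit them
        lines.append("AI: " + json.dumps(ex["translation"]))
    return "\n".join(lines)
```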
## Implementation Strategies

### 1. Validation Layer

Implement a validation system that checks AI-generated tool calls:

```python
from typing import Tuple

def validate_tool_call(tool_call: dict, game_state: dict) -> Tuple[bool, str]:
    """Validate that an AI-generated tool call is reasonable."""
    tool_name = tool_call.get("tool")
    args = tool_call.get("arguments", {})
    if tool_name == "move_units":
        # Check that the referenced units exist
        unit_ids = args.get("unit_ids", [])
        for unit_id in unit_ids:
            if unit_id not in game_state.get("units", {}):
                return False, f"Unit {unit_id} not found"
        # Check coordinate bounds
        x, y = args.get("target_x", 0), args.get("target_y", 0)
        if not (0 <= x <= 3840 and 0 <= y <= 2880):  # Map bounds
            return False, "Target coordinates out of bounds"
    elif tool_name == "build_building":
        # Check resources (BUILDING_COSTS is a module-level cost table)
        building_type = args.get("building_type")
        cost = BUILDING_COSTS.get(building_type, 0)
        player_credits = game_state.get("players", {}).get("0", {}).get("credits", 0)
        if player_credits < cost:
            return False, "Insufficient credits"
    return True, "Valid"
```
### 2. Iterative Refinement

Implement a feedback loop to improve translations:

```python
from typing import List

class MCPTranslationEngine:
    def __init__(self):
        self.successful_translations = []
        self.failed_translations = []

    def translate_instruction(self, instruction: str, game_state: dict) -> List[dict]:
        """Translate an instruction, reusing past successes as few-shot examples."""
        prompt = self.build_prompt_with_examples(instruction, game_state)
        response = self.query_model(prompt)
        return self.parse_response(response)

    def record_result(self, instruction: str, translation: List[dict], success: bool):
        """Record translation results for future learning."""
        if success:
            self.successful_translations.append((instruction, translation))
        else:
            self.failed_translations.append((instruction, translation))
```
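A minimal sketch of the prompt-building step, written as a standalone function for clarity. It assumes successes are stored as `(instruction, tool_calls)` pairs, matching `record_result` above; everything else is illustrative:

```python
import json

def build_prompt_with_examples(instruction: str, game_state: dict,
                               successes: list, max_examples: int = 3) -> str:
    """Build a prompt that prepends recent successful translations as few-shot examples."""
    parts = ["Translate RTS instructions into MCP tool-call JSON.", ""]
    # The last few recorded successes serve as demonstrations
    for past_instruction, past_calls in successes[-max_examples:]:
        parts.append(f'User: "{past_instruction}"')
        parts.append("AI: " + json.dumps(past_calls))
    parts.append("GAME STATE: " + json.dumps(game_state))
    parts.append(f'User: "{instruction}"')
    parts.append("AI:")
    return "\n".join(parts)
```

Capping the example count matters here: a 0.5B model's context budget fills up quickly, so the most recent few successes are usually the best trade-off.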
### 3. Fallback Mechanisms

Implement fallback strategies for complex instructions:

```python
from typing import List

def translate_with_fallback(instruction: str, game_state: dict) -> List[dict]:
    """Attempt translation with multiple strategies."""
    # Try direct translation first
    try:
        direct_result = attempt_direct_translation(instruction, game_state)
        if validate_translation(direct_result, game_state):
            return direct_result
    except Exception:
        pass
    # Try breaking the instruction into simpler steps
    try:
        steps = break_into_simple_steps(instruction)
        results = []
        for step in steps:
            step_result = attempt_direct_translation(step, game_state)
            if validate_translation(step_result, game_state):
                results.extend(step_result)
        return results
    except Exception:
        pass
    # Fall back to asking the AI for tactical analysis
    return [{"tool": "get_ai_analysis", "arguments": {"language": "en"}}]
```
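The `break_into_simple_steps` helper is left undefined in the sketch above. One naive implementation, purely illustrative, splits on common connectives:

```python
import re

def break_into_simple_steps(instruction: str) -> list:
    """Naively split a compound instruction on connectives like 'then' / 'and then'."""
    parts = re.split(r"\s*\b(?:and then|then|and)\b\s*",
                     instruction, flags=re.IGNORECASE)
    # Drop empty fragments and trailing punctuation
    return [p.strip(" ,.") for p in parts if p.strip(" ,.")]
```

This will over-split instructions where "and" joins objects rather than steps ("attack tanks and infantry"), so a real implementation would want the LLM itself, or at least part-of-speech cues, to do the decomposition.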
## Performance Expectations

### Likely Success Cases

1. **Simple Commands**: "Move tanks to position X,Y" - High accuracy
2. **Basic Strategy**: "Build a power plant" - High accuracy
3. **Direct Attacks**: "Attack enemy barracks" - High accuracy
4. **Resource Management**: "Build more harvesters" - Moderate to high accuracy

### Challenging Cases

1. **Complex Tactics**: "Flank the enemy while defending our base" - Moderate accuracy
2. **Abstract Concepts**: "Win the game" - Lower accuracy, needs breakdown
3. **Multi-step Plans**: "Expand economy then build army" - Needs iterative approach
4. **Contextual Nuances**: "Defend aggressively" - Interpretation challenges

## Enhancement Recommendations

### 1. Model Fine-Tuning

If possible, fine-tune the model on RTS command examples:

- Collect successful translation examples
- Create a dataset of instruction → tool call mappings
- Fine-tune for better consistency
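A common on-disk format for such a dataset is one JSON chat record per line (JSONL). The record below is illustrative; the `messages` field names follow the widely used chat-format convention for instruction tuning, and the game-state summary is an assumption:

```python
import json

# One hypothetical training record: instruction plus a compact game-state
# summary in, the expected tool-call array out.
record = {
    "messages": [
        {"role": "system",
         "content": "Convert RTS instructions to MCP tool calls."},
        {"role": "user",
         "content": 'Game state: player 0 has tank1, tank2.\n'
                    'Instruction: "Move my tanks to 200,300"'},
        {"role": "assistant",
         "content": json.dumps([
             {"tool": "move_units",
              "arguments": {"unit_ids": ["tank1", "tank2"],
                            "target_x": 200, "target_y": 300}}
         ])},
    ]
}
line = json.dumps(record)  # one JSONL line per example
```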
### 2. Hybrid Approach

Combine the LLM with rule-based systems:

```python
def smart_translate(instruction: str, game_state: dict):
    lowered = instruction.lower()
    # Simple pattern matching for common commands
    if "move" in lowered and "to" in lowered:
        return pattern_based_move_translation(instruction, game_state)
    # Complex reasoning for abstract commands
    elif "win" in lowered or "strategy" in lowered:
        return ai_assisted_strategic_translation(instruction, game_state)
    # Default to the LLM for everything else
    else:
        return llm_based_translation(instruction, game_state)
```
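The `pattern_based_move_translation` branch can be a simple regex. A sketch, where the helper name and the select-all-friendly-units heuristic are assumptions:

```python
import re

def pattern_based_move_translation(instruction: str, game_state: dict):
    """Regex-match 'move ... to <x>,<y>' and emit a move_units call, else None."""
    m = re.search(r"move .*?to\s*(\d+)\s*,\s*(\d+)", instruction, re.IGNORECASE)
    if not m:
        return None
    x, y = int(m.group(1)), int(m.group(2))
    # Heuristic: move every unit owned by the player (player_id 0)
    unit_ids = [uid for uid, u in game_state.get("units", {}).items()
                if u.get("player_id") == 0]
    return [{"tool": "move_units",
             "arguments": {"unit_ids": unit_ids, "target_x": x, "target_y": y}}]
```

Returning `None` on no match lets `smart_translate` fall through to the LLM path instead of failing.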
### 3. Confidence Scoring

Implement confidence scoring for translations:

```python
from typing import List, Tuple

def translate_with_confidence(instruction: str, game_state: dict) -> Tuple[List[dict], float]:
    """Return a translation with a confidence score (0.0 to 1.0)."""
    translation = generate_translation(instruction, game_state)
    confidence = calculate_confidence(translation, instruction, game_state)
    return translation, confidence

# Only execute high-confidence translations automatically;
# ask the user to confirm low-confidence ones.
```
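`calculate_confidence` can start as a crude heuristic. The penalties below are arbitrary placeholders; model token log-probabilities would be a stronger signal if the inference stack exposes them:

```python
def calculate_confidence(translation, instruction: str, game_state: dict) -> float:
    """Crude heuristic confidence: start at 1.0, subtract for warning signs."""
    if not translation:
        return 0.0
    score = 1.0
    known_tools = {"get_game_state", "move_units", "attack_unit",
                   "build_building", "build_unit", "get_ai_analysis"}
    for call in translation:
        # Hallucinated tool names are the strongest failure signal
        if call.get("tool") not in known_tools:
            score -= 0.5
    if len(translation) > 3:           # long plans from a 0.5B model are risky
        score -= 0.2
    if len(instruction.split()) > 15:  # long instructions are harder to parse
        score -= 0.1
    return max(0.0, min(1.0, score))
```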
## Testing Strategy

### Unit Tests for Translation

```python
def test_translation_accuracy():
    test_cases = [
        ("Move my tanks to 200,300", expected_tank_move_call),
        ("Build a barracks", expected_build_barracks_call),
        ("Attack enemy HQ", expected_attack_call),
    ]
    for instruction, expected in test_cases:
        result = translate_instruction(instruction, sample_game_state)
        assert result == expected, f"Failed for: {instruction}"
```

### A/B Testing Framework

```python
def compare_translation_strategies():
    instructions = load_test_instructions()
    game_state = load_test_game_state()
    strategy_a_results = []
    strategy_b_results = []
    for instruction in instructions:
        # Run both approaches on the same instruction
        result_a = strategy_a(instruction, game_state)
        result_b = strategy_b(instruction, game_state)
        # Measure success (manual or automated evaluation)
        strategy_a_results.append(evaluate_success(result_a))
        strategy_b_results.append(evaluate_success(result_b))
    # Compare average effectiveness
    avg_a = sum(strategy_a_results) / len(strategy_a_results)
    avg_b = sum(strategy_b_results) / len(strategy_b_results)
    return avg_a, avg_b
```
## Conclusion

While Qwen2.5 0.5B is far from the largest model available, it is capable of translating user instructions into MCP tool calls for your RTS game, especially with proper:

1. **Structured prompting** with clear examples
2. **Validation layers** to catch errors
3. **Fallback mechanisms** for complex cases
4. **Iterative improvement** through learning

The key is not raw model size but intelligent implementation that plays to the model's strengths while compensating for its limitations. Your existing investment in the Qwen2.5 model, combined with the robust MCP interface, provides a solid foundation for natural language game control.