Coding Dataset
Production-grade dataset for training AI coding agents.
Dataset Summary
- Total Examples: 6 (demo)
- Languages: Python, JavaScript, Java
- Task Types: Code Generation
- License: CC0-1.0
Dataset Structure
Data Splits
- train: 70% of data
- validation: 15% of data
- test: 15% of data
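The card does not say how the splits were produced. As a minimal sketch, assuming a seeded shuffle followed by a cut at the stated proportions, the index arithmetic could look like the following. The function name `split_indices` and the seed value are hypothetical and not part of the dataset's tooling.

```python
import random


def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Hypothetical helper: the fractions mirror the proportions listed above,
    but the actual procedure used for this dataset is not documented.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_train = round(n * train_frac)
    n_val = round(n * val_frac)

    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains, roughly 15%
    return train, validation, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

With very small inputs such as the 6-example demo, rounding decides the split sizes, so the realized proportions can differ noticeably from 70/15/15.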
Features
- id (string): Unique identifier
- code (string): Source code snippet
- code_description (string): Natural language description
- programming_language (string): Language (python, javascript, java, etc.)
- task_type (string): Type of task
- difficulty_level (string): Difficulty (beginner, intermediate, advanced, expert)
- quality_score (float): Quality score, 0.0-1.0
- is_tested (bool): Whether the code is tested
- has_bugs (bool): Whether known bugs exist
- lines_of_code (int): Number of lines
- collected_at (string): Collection timestamp
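A record can be checked against the feature list above with a small validator. This is only a sketch: the helper `validate_example` and the sample record are hypothetical, and the checks cover just the types and value ranges stated in this card.

```python
# Field names and types as listed in the Features section.
EXPECTED_TYPES = {
    "id": str,
    "code": str,
    "code_description": str,
    "programming_language": str,
    "task_type": str,
    "difficulty_level": str,
    "quality_score": float,
    "is_tested": bool,
    "has_bugs": bool,
    "lines_of_code": int,
    "collected_at": str,
}
DIFFICULTY_LEVELS = {"beginner", "intermediate", "advanced", "expert"}


def validate_example(example):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in example:
            problems.append(f"missing field: {name}")
            continue
        value = example[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    score = example.get("quality_score")
    if isinstance(score, float) and not 0.0 <= score <= 1.0:
        problems.append("quality_score outside 0.0-1.0")
    level = example.get("difficulty_level")
    if isinstance(level, str) and level not in DIFFICULTY_LEVELS:
        problems.append(f"unknown difficulty_level: {level}")
    return problems


# Hypothetical record, invented for illustration only.
sample = {
    "id": "demo-001",
    "code": "def add(a, b):\n    return a + b",
    "code_description": "Add two numbers.",
    "programming_language": "python",
    "task_type": "code_generation",
    "difficulty_level": "beginner",
    "quality_score": 0.9,
    "is_tested": True,
    "has_bugs": False,
    "lines_of_code": 2,
    "collected_at": "2025-10-25T00:00:00Z",
}
print(validate_example(sample))
```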
Usage
from datasets import load_dataset
# Load dataset
dataset = load_dataset("romcmu863/code-dataset")
# Access splits
train = dataset['train']
validation = dataset['validation']
test = dataset['test']
# Get first example
example = train[0]
print(example['code_description'])
print(example['code'])
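The metadata fields also make it possible to select a subset of examples. The sketch below assumes a threshold of 0.8 on quality_score, which is an arbitrary choice for illustration; the predicate name `is_clean_python` is hypothetical. It takes a single example as a dict, so it can be tried on plain records or passed to the `filter` method of a loaded split.

```python
def is_clean_python(example, min_quality=0.8):
    """True for tested, bug-free Python examples at or above min_quality."""
    return (
        example["programming_language"] == "python"
        and example["is_tested"]
        and not example["has_bugs"]
        and example["quality_score"] >= min_quality
    )


# Hypothetical records standing in for dataset rows.
records = [
    {"programming_language": "python", "is_tested": True,
     "has_bugs": False, "quality_score": 0.92},
    {"programming_language": "java", "is_tested": True,
     "has_bugs": False, "quality_score": 0.95},
    {"programming_language": "python", "is_tested": False,
     "has_bugs": True, "quality_score": 0.60},
]
selected = [r for r in records if is_clean_python(r)]
print(len(selected))

# With a split loaded as in the usage snippet:
# clean_train = train.filter(is_clean_python)
```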
License
CC0-1.0
Created
2025-10-25