Mangosteen, a 47 billion-token Thai corpus built with a Thai-adapted pipeline, improves language model performance on Thai benchmarks.
Wannaphong Phatthiyaphaibun
wannaphong
Recent Activity
updated a dataset about 16 hours ago: wannaphong/enwikivoyage-20251101-markdown
published a dataset about 16 hours ago: wannaphong/enwikivoyage-20251101-markdown
updated a dataset about 19 hours ago: wannaphong/thwiki-20251101-markdown-fix