Apply for community grant: Academic project (gpu)

#1
by robinhad - opened
Lapa LLM org

Lapa LLM is a cutting-edge open large language model based on Gemma-3-12B with a focus on Ukrainian language processing. The project is the result of many months of work by a team of Ukrainian researchers in artificial intelligence from the Ukrainian Catholic University, AGH University of Krakow, Igor Sikorsky Kyiv Polytechnic Institute, and Lviv Polytechnic, who united to create the best model for Ukrainian language processing.

Key Achievements:

  • Best tokenizer for the Ukrainian language, thanks to a SOTA method for tokenizer adaptation developed by Mykola Haltiuk as part of this project. Compared to the original Gemma 3, our model requires 1.5 times fewer tokens for Ukrainian text, resulting in fewer computations and better results.
  • The instruction-tuned version of the model is only slightly behind the current leader, MamayLM, in some benchmark categories. We are working on new open datasets to close the gap.
  • Best English-to-Ukrainian translator, and vice versa, with a result of 33 BLEU on FLORES, which enables natural and cost-effective translation of new NLP datasets into Ukrainian
  • One of the best models in its size class for summarization and Q&A (which means excellent performance for RAG) and for image processing in Ukrainian
  • The best pretraining checkpoint as measured across 18 Ukrainian-specific benchmarks, thanks to the Kobza dataset and Institutional Books texts from Harvard Library
  • Tests on propaganda and disinformation show the effectiveness of our filtering approach at the pretraining stage and during instruction fine-tuning
  • The model is open for commercial use. 25 training datasets will be published soon, along with the training code
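
The token-count comparison in the first bullet can be checked with a few lines of Python. This is only a minimal sketch, not the evaluation code used in the project: `token_ratio`, `char_tokenizer`, and `word_tokenizer` are hypothetical names, and the two toy tokenizers stand in for the real Gemma 3 and Lapa LLM tokenizers, which would be passed in as callables instead.

```python
from typing import Callable, Iterable, List

# A tokenizer here is any callable that maps a string to a list of tokens.
Tokenizer = Callable[[str], List[str]]


def token_ratio(texts: Iterable[str], baseline: Tokenizer, adapted: Tokenizer) -> float:
    """Return (baseline token count) / (adapted token count) over the same texts.

    A value of 1.5 would mean the adapted tokenizer needs 1.5 times fewer tokens.
    """
    texts = list(texts)
    baseline_total = sum(len(baseline(t)) for t in texts)
    adapted_total = sum(len(adapted(t)) for t in texts)
    if adapted_total == 0:
        raise ValueError("adapted tokenizer produced no tokens")
    return baseline_total / adapted_total


# Toy stand-ins (assumptions for illustration, NOT the real tokenizers).
def char_tokenizer(text: str) -> List[str]:
    # One token per non-whitespace character.
    return [c for c in text if not c.isspace()]


def word_tokenizer(text: str) -> List[str]:
    # One token per whitespace-separated word.
    return text.split()


if __name__ == "__main__":
    sample = ["Привіт, світе!", "Слава Україні"]
    print(token_ratio(sample, char_tokenizer, word_tokenizer))
```

To reproduce the comparison on real data, the same function could be called with the `tokenize` methods of the two actual tokenizers and a representative Ukrainian corpus.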

Next steps:

  • We are collecting community feedback on the model's performance to build a dataset of community prompts. A GPU-accelerated Space would help us gather that dataset faster.

Could you please support our project with a GPU for this Space?

@michellehbn @lvwerra might be good points of contact here.
