Announcing RealPerformance, a dataset of functional issues of language models that mirrors failure patterns identified through rigorous testing in real LLM agents
Extending datasets just got a whole lot easier! 🚀 With Sheets, I was able to create a Spanish version of the popular fka/awesome-chatgpt-prompts dataset in just a few minutes ⏱️.
Want to try it out for yourself? Head over to the Sheets space and see how easy it is to extend and modify existing datasets 🤯. The possibilities are endless! 🌐
Hey! I built RAG MCP Server Space, a simple Gradio MCP server for RAG systems that lets you search for relevant results without passing huge contexts to your LLM.
You can use this space to integrate with your agents and improve the efficiency of your search results. Feel free to try it out and let me know if you have any feedback or questions!
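To illustrate the general idea (this is not the Space's actual code), here is a minimal sketch of retrieving only the top-scoring passages instead of handing the whole corpus to the LLM. It assumes a toy bag-of-words cosine score in place of real embeddings, and the names `search`, `DOCS`, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, documents, top_k=2):
    # Score every document against the query and keep the best top_k
    # that share at least one term with it.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


# Hypothetical corpus, for illustration only.
DOCS = [
    "Gradio lets you build web demos for machine learning models.",
    "Retrieval augmented generation fetches relevant passages before answering.",
    "Bananas are rich in potassium.",
]

hits = search("how does retrieval augmented generation work", DOCS, top_k=1)
print(hits)
```

Only the passages returned by `search` would go into the prompt, so the context stays small no matter how large the corpus grows.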
📜 Accepted at ACL 2025! "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs". We propose fine-tuning LLMs to generate diverse chains of thought (DCoT) in a single inference step. This enables within-inference refinement of the CoTs, with no external feedback needed! 🔗 https://arxiv.org/abs/2407.03181