TITLE = """<h1 align="center" id="space-title">FinanceBench Leaderboard</h1>"""

INTRODUCTION_TEXT = """
FinanceBench is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities from added tooling, efficient prompting, access to search, etc.).

## Leaderboard
Submissions made by our team are labelled "financebench". While we report average scores over multiple runs in our paper when possible, the leaderboard only shows the best run.
See below for submissions.
"""
SUBMISSION_TEXT = """
## Submissions
Results can be submitted for both validation and test. Scores are expressed as the percentage of correct answers for a given split.
Each question calls for an answer that is either a string (one or a few words), a number, or a comma-separated list of strings or floats, unless specified otherwise. There is only one correct answer.
Hence, evaluation is done via quasi-exact match between a model’s answer and the ground truth (up to some normalization tied to the “type” of the ground truth).
In our evaluation, we use a system prompt to instruct the model about the required format:
```
You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER]. YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise. If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.
```
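For illustration only, the sketch below mirrors the normalization rules stated in this prompt. It is not the exact scorer we run, and the function names are ours:
```
import string

ARTICLES = {"a", "an", "the"}

def _looks_numeric(text):
    # True if the element parses as a number once commas, "$" and "%" are stripped
    try:
        float(text.replace(",", "").replace("$", "").replace("%", ""))
        return True
    except ValueError:
        return False

def normalize(answer, answer_type):
    # illustrative normalization only; the official scorer may differ in details
    answer = str(answer).strip()
    if answer_type == "number":
        # numbers are compared without commas, "$" or "%" signs
        return float(answer.replace(",", "").replace("$", "").replace("%", ""))
    if answer_type == "list":
        # comma-separated lists are normalized element by element (number or string)
        items = [item.strip() for item in answer.split(",")]
        return [normalize(i, "number" if _looks_numeric(i) else "string") for i in items]
    # strings are compared lowercased, without punctuation or articles
    cleaned = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in cleaned.split() if word not in ARTICLES)

def quasi_exact_match(model_answer, ground_truth, answer_type):
    return normalize(model_answer, answer_type) == normalize(ground_truth, answer_type)
```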
We advise you to use the system prompt provided in the paper to ensure your agents answer using the correct and expected format. In practice, GPT-4-level models follow it easily.
We expect submissions to be files in JSON Lines format, as shown below. The first two fields are mandatory; `reasoning_trace` is optional:
```
{"task_id": "task_id_1", "model_answer": "Answer 1 from your model", "reasoning_trace": "The different steps by which your model reached answer 1"}
{"task_id": "task_id_2", "model_answer": "Answer 2 from your model", "reasoning_trace": "The different steps by which your model reached answer 2"}
```
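For reference, such a file can be produced with Python's standard `json` module; `predictions` and the file name below are placeholders for your own outputs:
```
import json

# placeholder predictions: replace with your own model outputs
predictions = [
    {"task_id": "task_id_1", "model_answer": "Answer 1 from your model",
     "reasoning_trace": "The different steps by which your model reached answer 1"},
]

with open("submission.jsonl", "w") as f:
    for row in predictions:
        # one JSON object per line
        print(json.dumps(row), file=f)
```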
| """ | |
| CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results" | |
| CITATION_BUTTON_TEXT = r"""@misc{ | |
| }""" | |
def format_error(msg):
    return f"<p style='color: red; font-size: 20px; text-align: center;'>{msg}</p>"


def format_warning(msg):
    return f"<p style='color: orange; font-size: 20px; text-align: center;'>{msg}</p>"


def format_log(msg):
    return f"<p style='color: green; font-size: 20px; text-align: center;'>{msg}</p>"


def model_hyperlink(link, model_name):
    return f'<a target="_blank" href="{link}" style="color: var(--link-text-color); text-decoration: underline; text-decoration-style: dotted;">{model_name}</a>'