Update src/tasks_content.py
src/tasks_content.py +7 -7
@@ -38,7 +38,7 @@ TASKS_DESCRIPTIONS = {
 As a context, we pass a prefix of the list of APIs available in the target library.
 We select the prefix based on their BM-25 similarity with the provided instruction.

-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `library_based_code_generation` directory in [our baselines repository](https://
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `library_based_code_generation` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).

 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
@@ -57,7 +57,7 @@ TASKS_DESCRIPTIONS = {
 * `oracle: files` – ground truth diffs are used to select files that should be corrected to fix the issue;
 * `oracle: files, lines` – ground truth diffs are used to select files and code blocks that should be corrected to fix the issue;

-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `ci-builds-repair` directory in [our baselines repository](https://
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `ci-builds-repair` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).

 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
@@ -82,7 +82,7 @@ TASKS_DESCRIPTIONS = {
 * *non-informative* – short/long lines, import/print lines, or comment lines;
 * *random* – lines that don't fit any of the previous categories.

-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `project_level_code_completion` directory in [our baselines repository](https://
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `project_level_code_completion` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).

 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
@@ -97,7 +97,7 @@ TASKS_DESCRIPTIONS = {
 * [ChrF](https://huggingface.co/spaces/evaluate-metric/chrf)
 * [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore)

-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `commit_message_generation` directory in [our baselines repository](https://
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `commit_message_generation` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).

 **Note.** The leaderboard is sorted by the `ROUGE-1` metric by default.
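Both metrics are available through the Hugging Face `evaluate` library; a generic usage sketch (not the benchmark's evaluation script — the example strings are made up) follows.

```python
import evaluate

chrf = evaluate.load("chrf")
bertscore = evaluate.load("bertscore")

preds = ["Fix off-by-one error in pagination"]
refs = ["Fix off-by-one bug in the pagination logic"]

# ChrF expects one list of reference strings per prediction.
chrf_score = chrf.compute(predictions=preds, references=[[r] for r in refs])["score"]
# BERTScore needs a language (or model) to pick its underlying encoder.
bert_f1 = bertscore.compute(predictions=preds, references=refs, lang="en")["f1"]
print(chrf_score, bert_f1)
```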
@@ -119,7 +119,7 @@ TASKS_DESCRIPTIONS = {
 * **All incorrect** – percentage of cases where all buggy files were incorrectly identified;
 * **# Output** – average number of buggy files detected, to further assess performance, particularly concerning high **FPR**.

-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `bug_localization` directory in [our baselines repository](https://
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `bug_localization` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).

 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,
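For clarity, here is one way the **All incorrect** and **# Output** statistics could be computed over pairs of predicted and ground-truth file sets; the benchmark's exact definitions may differ, so treat this as an assumption-laden sketch.

```python
# Sketch: aggregate bug-localization statistics over evaluation cases.
# Each case pairs the model's predicted files with the true buggy files.
def localization_stats(cases: list[tuple[set[str], set[str]]]) -> dict[str, float]:
    n = len(cases)
    # "All incorrect": no truly buggy file appears among the predictions.
    all_incorrect = sum(1 for pred, gt in cases if not (pred & gt))
    avg_output = sum(len(pred) for pred, _ in cases) / n
    return {"all_incorrect_pct": 100 * all_incorrect / n, "avg_num_output": avg_output}
```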
@@ -129,9 +129,9 @@ TASKS_DESCRIPTIONS = {
 The model is required to generate such a description, given the relevant context code and the intent behind the documentation.

 We use a novel metric for evaluation:
-* `CompScore`: the new metric based on an LLM as an assessor, proposed for this task. Our approach involves feeding the LLM the relevant code and two versions of the documentation: the ground truth and the model-generated text. More details on how it is calculated can be found in [our baselines repository](https://
+* `CompScore`: the new metric based on an LLM as an assessor, proposed for this task. Our approach involves feeding the LLM the relevant code and two versions of the documentation: the ground truth and the model-generated text. More details on how it is calculated can be found in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/tree/main/module_summarization).

-For further details on the dataset and the baselines from the Long Code Arena team, refer to the `module_summarization` directory in [our baselines repository](https://
+For further details on the dataset and the baselines from the Long Code Arena team, refer to the `module_summarization` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines/).

 **Terms of use**. As this dataset is collected from GitHub, researchers may use it for research purposes only if any publications resulting from that research are open access (see [GitHub Acceptable Use Policies](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions)).
 """,