---
title: AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection
emoji: 🍭
colorFrom: yellow
colorTo: pink
sdk: gradio
python_version: "3.12.9"
sdk_version: "5.35.0"
app_file: app.py
pinned: true
---
# AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection 🍭
## Results
| Subtask | Submission | Model | (strict) F1 Score | Notebook |
|---------|------------|--------------------|------------------:|----------|
| 1 | 1 | Qwen3-Embedding-8B | 0.875 | [Notebook](https://github.com/dslaborg/germeval2025/blob/main/subtask_1/submission_subtask1.ipynb) |
| 1 | 2 | XLM-RoBERTa-Large | 0.891 | [Notebook](https://github.com/dslaborg/germeval2025/blob/main/subtask_1/submission_subtask1-2.ipynb) |
| 2 | 1 | GBERT-Large | 0.623 | [Notebook](https://github.com/dslaborg/germeval2025/blob/main/subtask_2/submission_subtask2.ipynb) |
| 2 | 2 | XLM-RoBERTa-Large | 0.631 | [Notebook](https://github.com/dslaborg/germeval2025/blob/main/subtask_2/submission_subtask2-2.ipynb) |
## Setup
```bash
python_version="$(cat .python-version)"
# install the interpreter if it’s missing
pyenv install -s "${python_version}"
# select python version for current shell
pyenv shell "${python_version}"
# create venv if missing
if [[ ! -d venv ]]; then
python -m venv venv
fi
# activate venv & install packages
source venv/bin/activate
pip install -U pip setuptools wheel
pip install -r requirements.txt
```
---
# :trophy: Model
The fine-tuned model is available on [Hugging Face](https://huggingface.co/cortex359/germeval2025).
## Model Details
- **Model Type:** Transformer-based encoder (XLM-RoBERTa-Large)
- **Developed by:** Christian Rene Thelen, Patrick Gustav Blaneck, Tobias Bornheim, Niklas Grieger, Stephan Bialonski (FH Aachen, RWTH Aachen, ORDIX AG, Utrecht University)
- **Paper:** [AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training](https://arxiv.org/abs/2509.07459v2)
- **Base Model:** [XLM-RoBERTa-Large](https://huggingface.co/FacebookAI/xlm-roberta-large) (Conneau et al., 2020)
- **Fine-tuning Objective:** Detection of *candy speech* (positive/supportive language) in German YouTube comments.
## Model Description
This model is a fine-tuned **XLM-RoBERTa-Large** adapted for the **GermEval 2025 Shared Task on Candy Speech Detection**.
It was trained to identify *candy speech* at both:
- **Binary level:** Classify whether a comment contains candy speech.
- **Span level:** Detect the exact spans and categories of candy speech within comments, using a BIO tagging scheme across **10 categories** (positive feedback, compliment, affection declaration, encouragement, gratitude, agreement, ambiguous, implicit, group membership, sympathy).
The span-level model also proved effective for binary detection by classifying a comment as candy speech if at least one positive span was detected.
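The relationship between the two levels can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the label strings (`B-compliment`, `I-compliment`, ...) and the helper names `build_bio_labels` and `contains_candy_speech` are assumptions made for this example. It shows how 10 categories yield 21 BIO labels and how a comment-level decision follows from token-level tags.

```python
# Category names as listed above. The exact label strings used in the
# released model are not stated here, so these identifiers are assumptions.
CATEGORIES = [
    "positive feedback", "compliment", "affection declaration",
    "encouragement", "gratitude", "agreement", "ambiguous",
    "implicit", "group membership", "sympathy",
]


def build_bio_labels(categories):
    """Return ["O", "B-<cat>", "I-<cat>", ...] for the given categories."""
    labels = ["O"]
    for cat in categories:
        labels.append(f"B-{cat}")  # first token of a span
        labels.append(f"I-{cat}")  # continuation token of a span
    return labels


def contains_candy_speech(tags):
    """Binary decision: True if at least one token is tagged as part of a span."""
    return any(tag != "O" for tag in tags)


LABELS = build_bio_labels(CATEGORIES)
print(len(LABELS))  # 21 = 10 categories * 2 prefixes + "O"
print(contains_candy_speech(["O", "B-compliment", "I-compliment", "O"]))  # True
print(contains_candy_speech(["O", "O", "O"]))  # False
```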
## Intended Uses
- **Research:** Analysis of positive/supportive communication in German social media.
- **Applications:** Social media analytics, conversational AI safety (mitigating sycophancy), computational social science.
- **Not for:** Deployments without fairness/robustness testing on out-of-domain data.
## Performance
- **Dataset:** 46k German YouTube comments, annotated with candy speech spans.
- **Data Split:** 37,057 comments (train), 9,229 (test).
- **Shared Task Results:**
  - **Subtask 1 (binary detection):** Positive F1 = **0.891** (ranked 1st)
  - **Subtask 2 (span detection):** Strict F1 = **0.631** (ranked 1st)
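To give an intuition for the span-level metric, here is a hedged sketch of a strict span F1. It assumes that "strict" means a predicted span counts only if its start, end, and category all match a gold span exactly; the official shared-task scorer may differ in details such as averaging across comments. The function name `strict_span_f1` and the `(start, end, category)` tuple format are hypothetical.

```python
def strict_span_f1(gold, predicted):
    """F1 over exact (start, end, category) matches between two span lists."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One exact match, one prediction with a wrong end offset.
gold = [(0, 5, "compliment"), (10, 20, "gratitude")]
pred = [(0, 5, "compliment"), (10, 18, "gratitude")]
print(strict_span_f1(gold, pred))  # 0.5
```

Under this definition a span that is off by a single character contributes both a false positive and a false negative, which is one reason strict span scores are typically much lower than binary detection scores.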
## Training Procedure
- **Architecture:** XLM-RoBERTa-Large + linear classification layer (BIO tagging, 21 labels including “O”).
- **Optimizer:** AdamW
- **Learning Rate:** Peak 2e-5 with linear decay and warmup (500 steps).
- **Epochs:** 20 (with early stopping).
- **Batch Size:** 32
- **Regularization:** Dropout (0.1), weight decay (0.01), gradient clipping (L2 norm 1.0).
- **Postprocessing:** BIO tag correction and subword alignment.
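The BIO tag correction step can be illustrated as follows. The exact repair rule used for the submissions is not described here, so this sketch assumes one common convention: an `I-` tag that does not continue a span of the same category is promoted to `B-`. The function name `repair_bio` is hypothetical.

```python
def repair_bio(tags):
    """Promote orphaned I- tags to B- so every span starts with a B- tag."""
    repaired = []
    open_category = None  # category of the span currently open, if any
    for tag in tags:
        if tag == "O":
            open_category = None
            repaired.append(tag)
            continue
        prefix, category = tag.split("-", 1)
        if prefix == "I" and category != open_category:
            # An I- tag with no matching open span: treat it as a span start.
            tag = f"B-{category}"
        open_category = category
        repaired.append(tag)
    return repaired


raw = ["O", "I-compliment", "I-compliment", "I-gratitude", "O"]
print(repair_bio(raw))
# ['O', 'B-compliment', 'I-compliment', 'B-gratitude', 'O']
```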
## Limitations
- **Domain Specificity:** Trained only on German YouTube comments; performance may degrade on other platforms, genres, or languages.
- **Overlapping Spans:** The model cannot predict overlapping spans. These were rare (<2%) in the training data.
- **Biases:** May reflect biases present in the dataset (e.g., demographic skews in YouTube communities).
- **Generalization:** Needs evaluation before deployment in real-world moderation systems.
## Ethical Considerations
- **Positive speech detection** is less studied than toxic speech, but automatic labeling of “supportiveness” may reinforce cultural biases about what counts as “positive.”
- Must be complemented with **human-in-the-loop moderation** to avoid misuse.
## Citation
If you use this model, please cite:
```bibtex
@inproceedings{thelen-etal-2025-aixcellent,
title = "{AI}xcellent Vibes at {G}erm{E}val 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training",
author = "Thelen, Christian Rene and
Blaneck, Patrick Gustav and
Bornheim, Tobias and
Grieger, Niklas and
Bialonski, Stephan",
editor = "Wartena, Christian and
Heid, Ulrich",
booktitle = "Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops",
month = sep,
year = "2025",
address = "Hannover, Germany",
publisher = "HsH Applied Academics",
url = "https://aclanthology.org/2025.konvens-2.33/",
pages = "398--403"
}
```