Update app.py
app.py
CHANGED
@@ -31,11 +31,7 @@ conforms to common characteristics.
 I used all variants of the open-source GPT-2 model except xl size to compute the PPL (both text-level and sentence-level PPLs) of the collected \
 texts. It is observed that, regardless of whether it is at the text level or the sentence level, the content generated by LLMs have relatively \
 lower PPLs compared to the text written by humans. LLM captured common patterns and structures in the text it was trained on, and is very good at \
-reproducing them. As a result, text generated by LLMs have relatively concentrated low PPLs
-
-Humans have the ability to express themselves in a wide variety of ways, depending on the context, audience, and purpose of the text they are \
-writing. This can include using creative or imaginative elements, such as metaphors, similes, and unique word choices, which can make it more \
-difficult for GPT2 to predict. The PPL distributions of text written by humans and text generated by LLMs are shown in the figure below.\
+reproducing them. As a result, text generated by LLMs have relatively concentrated low PPLs.\
 """
 
 
@@ -124,11 +120,11 @@ with gr.Blocks() as demo:
 ## Detect text generated using LLMs 🤖
 
 Linguistic features such as Perplexity and other SOTA methods such as GLTR were used to classify between Human written and LLM Generated \
-texts. This solution scored an ROC of 0.956 and 8th position in the DAIGT LLM Competition on Kaggle.
+texts. This solution scored an ROC of 0.956 and 8th position in the DAIGT LLM Competition on Kaggle.
 
+- Source & Credits: [https://github.com/Hello-SimpleAI/chatgpt-comparison-detection](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)
 - Competition: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard)
 - Solution WriteUp: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224)
-- Source & Credits: [https://github.com/Hello-SimpleAI/chatgpt-comparison-detection](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)
 
 ### Linguistic Analysis: Language Model Perplexity
 The perplexity (PPL) is commonly used as a metric for evaluating the performance of language models (LM). It is defined as the exponential \
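The perplexity measure that the app's description relies on is the exponential of the average negative log-likelihood a language model assigns to the tokens of a text. A minimal sketch of that computation follows; the `perplexity` helper and the per-token probability lists are illustrative stand-ins, not the app's actual GPT-2 scoring pipeline:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood
    over the tokens of a text. Lower = more predictable to the model."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical per-token probabilities a language model might assign:
human_like = [0.05, 0.10, 0.02, 0.20]  # less predictable word choices
llm_like = [0.60, 0.70, 0.50, 0.80]    # highly predictable continuations

print(perplexity(human_like))  # higher PPL, human-like text
print(perplexity(llm_like))    # lower PPL, LLM-like text
```

This is the intuition behind the diff's claim that LLM output clusters at low PPL: a model that assigns each token probability 0.5 yields a perplexity of exactly 2, and the more confidently the tokens are predicted, the closer the value falls to 1.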