Update app.py
app.py CHANGED
@@ -116,7 +116,7 @@ def predict(text):
 
 with gr.Blocks() as demo:
     gr.Markdown(
-        """
+        """\
         ## Detect text generated using LLMs 🤖
 
         Linguistic features such as Perplexity and other SOTA methods such as GLTR were used to classify between Human written and LLM Generated \
@@ -124,31 +124,36 @@ with gr.Blocks() as demo:
 
         - Source & Credits: [https://github.com/Hello-SimpleAI/chatgpt-comparison-detection](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)
         - Competition: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard)
-        - Solution WriteUp: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224)
-
-        ### Linguistic Analysis: Language Model Perplexity
-        The perplexity (PPL) is commonly used as a metric for evaluating the performance of language models (LM). It is defined as the exponential \
-        of the negative average log-likelihood of the text under the LM. A lower PPL indicates that the language model is more confident in its \
-        predictions, and is therefore considered to be a better model. The training of LMs is carried out on large-scale text corpora, it can \
-        be considered that it has learned some common language patterns and text structures. Therefore, PPL can be used to measure how \
-        well a text conforms to common characteristics.
-
-        ### GLTR: Giant Language Model Test Room
-        This idea originates from the following paper: arxiv.org/pdf/1906.04043.pdf. It studies 3 tests to compute features of an input text. Their \
-        major assumption is that to generate fluent and natural-looking text, most decoding strategies sample high probability tokens from the head \
-        of the distribution. I selected the most powerful Test-2 feature, which is the number of tokens in the Top-10, Top-100, Top-1000, and 1000+ \
-        ranks from the LM predicted probability distributions.
-
-        ### Modelling
-        Scikit-learn's VotingClassifier consisting of XGBClassifier, LGBMClassifier, CatBoostClassifier and RandomForestClassifier with default parameters
+        - Solution WriteUp: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224)\
         """
     )
-
-
-
-
-
-
-
+    with gr.Column():
+        gr.Markdown(
+            """\
+            ### Linguistic Analysis: Language Model Perplexity
+            Perplexity (PPL) is commonly used as a metric for evaluating the performance of language models (LMs). It is defined as the exponential \
+            of the negative average log-likelihood of the text under the LM. A lower PPL indicates that the language model is more confident in its \
+            predictions and is therefore considered a better model. Since LMs are trained on large-scale text corpora, they can be assumed to have \
+            learned common language patterns and text structures. Therefore, PPL can be used to measure how well a text conforms to common \
+            characteristics.
+
+            ### GLTR: Giant Language Model Test Room
+            This idea originates from the following paper: arxiv.org/pdf/1906.04043.pdf. It proposes 3 tests to compute features of an input text. The \
+            major assumption is that to generate fluent and natural-looking text, most decoding strategies sample high-probability tokens from the head \
+            of the distribution. I selected the most powerful Test-2 feature, which is the number of tokens in the Top-10, Top-100, Top-1000, and 1000+ \
+            ranks of the LM's predicted probability distributions.
+
+            ### Modelling
+            Scikit-learn's VotingClassifier consisting of XGBClassifier, LGBMClassifier, CatBoostClassifier and RandomForestClassifier with default parameters\
+            """
+        )
+        with gr.Group():
+            a1 = gr.Textbox(lines=7, label='Text', value=example)
+            button1 = gr.Button("🤖 Predict!")
+            gr.Markdown("Prediction:")
+            label1 = gr.Textbox(lines=1, label='Predicted Label')
+            score1 = gr.Textbox(lines=1, label='Predicted Probability')
+
+        button1.click(predict, inputs=[a1], outputs=[label1, score1])
 
 demo.launch()
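
For reference, the perplexity feature described in the Markdown above can be computed roughly as follows. This is a minimal sketch that assumes a GPT-2 model loaded via Hugging Face transformers; the model actually used by predict() is not visible in this diff.

```python
# Minimal perplexity sketch: exp of the negative average log-likelihood
# of the text under a causal LM. GPT-2 is an assumption; the Space's
# actual model is not shown in this diff.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # With labels=input_ids the model returns the mean cross-entropy,
        # i.e. the negative average log-likelihood of the sequence.
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()
```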
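The GLTR Test-2 counts mentioned above could be derived along these lines, again assuming the same GPT-2 model and tokenizer; the exact implementation used by the Space is not part of this diff.

```python
# Sketch of the GLTR Test-2 feature: for each observed token, find its rank
# in the LM's predicted distribution at that position, then count how many
# tokens fall in the Top-10, Top-100, Top-1000, and 1000+ buckets.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # assumed model choice
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def gltr_counts(text: str) -> list:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    input_ids = enc["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits  # (1, seq_len, vocab)
    # Predictions at position i are for the token at position i + 1.
    preds = logits[0, :-1]
    targets = input_ids[0, 1:]
    # Rank of each observed token = number of vocab entries with a higher logit.
    target_logits = preds.gather(1, targets.unsqueeze(1))
    ranks = (preds > target_logits).sum(dim=1)
    counts = [0, 0, 0, 0]  # Top-10, Top-100, Top-1000, 1000+
    for r in ranks.tolist():
        if r < 10:
            counts[0] += 1
        elif r < 100:
            counts[1] += 1
        elif r < 1000:
            counts[2] += 1
        else:
            counts[3] += 1
    return counts
```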
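The "Modelling" section names a VotingClassifier over four estimators. A rough sketch is below, under the assumption that the inputs are the perplexity and GLTR counts from the sketches above and that soft voting is used; feature extraction and training data are not part of this diff.

```python
# Sketch of the ensemble described in the Markdown: XGBoost, LightGBM,
# CatBoost and RandomForest with default parameters, combined by a
# scikit-learn VotingClassifier. Soft voting is an assumption.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

ensemble = VotingClassifier(
    estimators=[
        ("xgb", XGBClassifier()),
        ("lgbm", LGBMClassifier()),
        ("cat", CatBoostClassifier(verbose=0)),
        ("rf", RandomForestClassifier()),
    ],
    voting="soft",  # average predicted probabilities across the four models
)

# Hypothetical usage: X holds rows of [perplexity, top10, top100, top1000,
# rank1000plus] features, y holds labels (0 = human, 1 = LLM generated).
# ensemble.fit(X, y)
# proba = ensemble.predict_proba(X_new)
```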