Update app.py
app.py
CHANGED
@@ -10,14 +10,14 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 
 
 print("Loading model & Tokenizer...")
-model_id = 'gpt2
+model_id = 'gpt2'
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id)
 
 print("Loading NLTL & and scikit-learn model...")
 NLTK = nltk_load('data/english.pickle')
 sent_cut_en = NLTK.tokenize
-clf = joblib.load(f'data/gpt2-
+clf = joblib.load(f'data/gpt2-small-model')
 
 CROSS_ENTROPY = torch.nn.CrossEntropyLoss(reduction='none')
 
@@ -126,9 +126,9 @@ with gr.Blocks() as demo:
 Linguistic features such as Perplexity and other SOTA methods such as GLTR were used to classify between Human written and LLM Generated \
 texts. This solution scored an ROC of 0.956 and 8th position in the DAIGT LLM Competition on Kaggle. Fork of and credits to this github repo
 
-Competition: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard)
-Solution WriteUp: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224)
-Source & Credits: [https://github.com/Hello-SimpleAI/chatgpt-comparison-detection](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)
+- Competition: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard)
+- Solution WriteUp: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224)
+- Source & Credits: [https://github.com/Hello-SimpleAI/chatgpt-comparison-detection](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)
 
 ### Linguistic Analysis: Language Model Perplexity
 The perplexity (PPL) is commonly used as a metric for evaluating the performance of language models (LM). It is defined as the exponential \
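The description string in the last hunk introduces perplexity but is cut off mid-definition in this view. As a reading aid, here is a minimal sketch of the standard definition (the exponential of the average negative log-likelihood per token). It is not taken from `app.py`: the function name `perplexity` and the example log-probabilities are hypothetical, and a real pipeline would presumably get per-token losses from the model, for instance via a `CrossEntropyLoss(reduction='none')` like the one the diff sets up.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp of the mean negative log-likelihood (natural log).

    `token_log_probs` is a hypothetical input: one log-probability per
    token, as a language model would assign to the observed text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative values only: a model that assigns probability 0.25 to
# every token is as uncertain as a uniform choice among four tokens,
# so its perplexity comes out close to 4.
example_log_probs = [math.log(0.25)] * 4
print(perplexity(example_log_probs))
```

Lower values mean the model found the text more predictable, which is the intuition behind using perplexity as a feature for separating human-written from LLM-generated text.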