Update README.md
README.md
CHANGED
|
@@ -42,11 +42,11 @@ pip install transformers
|
|
| 42 |
```
|
| 43 |
|
| 44 |
```python
|
| 45 |
-
from transformers import AutoModelForSequenceClassification,
|
| 46 |
import torch
|
| 47 |
|
| 48 |
# Load tokenizer and model
|
| 49 |
-
tokenizer =
|
| 50 |
model = AutoModelForSequenceClassification.from_pretrained("LocalDoc/language_detection")
|
| 51 |
|
| 52 |
# Prepare text
|
|
@@ -67,6 +67,35 @@ predicted_label = labels[predicted_class_index]
|
|
| 67 |
print(f"Predicted Language: {predicted_label}")
|
| 68 |
```
|
| 69 |
|
| 70 |
|
| 71 |
|
| 72 |
Training Performance
|
|
|
|
| 42 |
```
|
| 43 |
|
| 44 |
```python
|
| 45 |
+
from transformers import AutoModelForSequenceClassification, XLMRobertaTokenizer
|
| 46 |
import torch
|
| 47 |
|
| 48 |
# Load tokenizer and model
|
| 49 |
+
tokenizer = XLMRobertaTokenizer.from_pretrained("LocalDoc/language_detection")
|
| 50 |
model = AutoModelForSequenceClassification.from_pretrained("LocalDoc/language_detection")
|
| 51 |
|
| 52 |
# Prepare text
|
|
|
|
| 67 |
print(f"Predicted Language: {predicted_label}")
|
| 68 |
```
|
| 69 |
|
| 70 |
+
## Language Label Information
|
| 71 |
+
|
| 72 |
+
The model outputs a numeric label for each prediction. Each label corresponds to one language, identified by the language code and name in the following table:
|
| 73 |
+
|
| 74 |
+
| Label | Language Code | Language Name |
|
| 75 |
+
|-------|---------------|---------------|
|
| 76 |
+
| 0 | az | Azerbaijani |
|
| 77 |
+
| 1 | ar | Arabic |
|
| 78 |
+
| 2 | bg | Bulgarian |
|
| 79 |
+
| 3 | de | German |
|
| 80 |
+
| 4 | el | Greek |
|
| 81 |
+
| 5 | en | English |
|
| 82 |
+
| 6 | es | Spanish |
|
| 83 |
+
| 7 | fr | French |
|
| 84 |
+
| 8 | hi | Hindi |
|
| 85 |
+
| 9 | it | Italian |
|
| 86 |
+
| 10 | ja | Japanese |
|
| 87 |
+
| 11 | nl | Dutch |
|
| 88 |
+
| 12 | pl | Polish |
|
| 89 |
+
| 13 | pt | Portuguese |
|
| 90 |
+
| 14 | ru | Russian |
|
| 91 |
+
| 15 | sw | Swahili |
|
| 92 |
+
| 16 | th | Thai |
|
| 93 |
+
| 17 | tr | Turkish |
|
| 94 |
+
| 18 | ur | Urdu |
|
| 95 |
+
| 19 | vi | Vietnamese |
|
| 96 |
+
| 20 | zh | Chinese |
|
| 97 |
+
|
| 98 |
+
Use this mapping to decode the model's predictions into language names for further processing or analysis.
|
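As a rough illustration of the table above, the mapping can be kept as an ordered list and indexed by the predicted class. This is only a sketch: `LANGUAGES` and `decode_label` are hypothetical names that do not appear in the README, and the snippet assumes the model's class indices follow the table order exactly.

```python
from typing import Tuple

# Label order copied from the table above.
# Assumption: the position in this list equals the model's class index.
LANGUAGES = [
    ("az", "Azerbaijani"), ("ar", "Arabic"), ("bg", "Bulgarian"),
    ("de", "German"), ("el", "Greek"), ("en", "English"),
    ("es", "Spanish"), ("fr", "French"), ("hi", "Hindi"),
    ("it", "Italian"), ("ja", "Japanese"), ("nl", "Dutch"),
    ("pl", "Polish"), ("pt", "Portuguese"), ("ru", "Russian"),
    ("sw", "Swahili"), ("th", "Thai"), ("tr", "Turkish"),
    ("ur", "Urdu"), ("vi", "Vietnamese"), ("zh", "Chinese"),
]


def decode_label(class_index: int) -> Tuple[str, str]:
    """Return (language code, language name) for a predicted class index.

    Hypothetical helper for illustration; not part of the model repository.
    """
    if not 0 <= class_index < len(LANGUAGES):
        raise ValueError(f"Unknown label: {class_index}")
    return LANGUAGES[class_index]


# Example: decode label 5 using the table order.
code, name = decode_label(5)
print(f"Label 5 -> {code} ({name})")
```

In the usage snippet earlier in the README, the value passed to `decode_label` would be `predicted_class_index`. Raising on an out-of-range index, rather than returning a default, makes a mismatch between the table and the model's output size visible early.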
| 99 |
|
| 100 |
|
| 101 |
Training Performance
|