---
Multilingual language models are typically large, requiring significant computational resources.

Can we create multilingual models that maintain performance comparable to their larger counterparts while reducing model size, lowering latency, and improving inference speed, even when running in production with large batch sizes?
# Techniques:
- Pruning (see the pruning sketch after this list)
  - Unstructured Pruning
  - Structured Pruning
  - Semi-Structured Pruning

  - Methods Used
    - SparseGPT | [GitHub](https://github.com/VishnuVardhanSaiLanka/sparsegpt/tree/aya)
    - ShortGPT | [KLDBasedPruning & Perplexity Sensitivities](https://github.com/rsk2327/DistAya/tree/main)

- Knowledge Distillation (see the distillation sketch after this list)
  - Hidden State-Based Distillation ~ [DistillKit](https://arcee-ai-distillkit.my.canva.site/) | [GitHub](https://github.com/ShayekhBinIslam/DistillKit)
  - Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  - On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
  - Minitron: Compact Language Models via Pruning & Knowledge Distillation
  - DistiLLM: Towards Streamlined Distillation for Large Language Models

- Quantization (see the quantization sketch after this list)
  - Quantization Aware Training (QAT)
  - Post Training Quantization (PTQ)
  - KV Cache Quantization
  - Weight & Activation Quantization

- Low-Rank Factorization (see the factorization sketch after this list)

- Fine-Tuning | [GitHub](https://github.com/rsk2327/DistAya/tree/track/fine-tuning)
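
Unstructured pruning zeroes individual weights, structured pruning removes whole rows, columns, or heads (which actually shrinks the matrices), and semi-structured pruning enforces patterns such as 2:4 sparsity that hardware can exploit. As a rough, hedged illustration (not the SparseGPT or ShortGPT code linked above), the sketch below applies the first two variants to a single toy linear layer; the layer size and the 50% sparsity level are arbitrary.

```python
import torch
import torch.nn as nn

# Toy layer standing in for one projection matrix of a transformer block.
layer = nn.Linear(in_features=512, out_features=512, bias=False)
w = layer.weight.data

# Unstructured pruning: zero the 50% of individual weights with the
# smallest magnitude, wherever they sit in the matrix.
threshold = w.abs().flatten().kthvalue(w.numel() // 2).values
unstructured = w * (w.abs() > threshold).float()

# Structured pruning: drop the 50% of output neurons (rows) with the
# smallest L2 norm, which shrinks the layer instead of just sparsifying it.
keep = w.norm(dim=1).topk(k=w.shape[0] // 2).indices.sort().values
structured = w[keep]

print(f"unstructured sparsity: {(unstructured == 0).float().mean().item():.2f}")
print(f"structured weight shape: {tuple(structured.shape)}")
```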
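
Hidden state-based distillation (the idea behind DistillKit) trains the small student to match the teacher's intermediate representations as well as its output distribution. The sketch below is a generic illustration, not the DistillKit or DistiLLM API: the toy models, hidden sizes, temperature, and loss weight are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for a large teacher and a smaller student (assumed sizes).
teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 32000))
student = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 32000))
proj = nn.Linear(256, 1024)  # aligns the student hidden size with the teacher's

x = torch.randn(8, 128)      # a batch of already-embedded inputs
temperature = 2.0

with torch.no_grad():        # the teacher is frozen
    t_hidden = teacher[1](teacher[0](x))
    t_logits = teacher[2](t_hidden)

s_hidden = student[1](student[0](x))
s_logits = student[2](s_hidden)

# Soft-label loss: the student mimics the teacher's output distribution.
kd_loss = F.kl_div(
    F.log_softmax(s_logits / temperature, dim=-1),
    F.softmax(t_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

# Hidden-state loss: the projected student representation tracks the teacher's.
hidden_loss = F.mse_loss(proj(s_hidden), t_hidden)

loss = kd_loss + 0.1 * hidden_loss  # the 0.1 weight is arbitrary
loss.backward()
```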
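
For quantization, the smallest self-contained example is post-training quantization of a single weight tensor. The sketch below only illustrates the idea behind PTQ (symmetric int8 with one per-tensor scale); it is not the QAT, KV-cache, or activation quantization pipelines listed above, and the tensor shape is arbitrary.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor PTQ: map floats to int8 with a single scale."""
    scale = w.abs().max() / 127.0  # largest magnitude maps to +/-127
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Toy weight matrix standing in for one layer of the model.
w = torch.randn(1024, 1024)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"storage: {q.numel()} bytes as int8 vs {w.numel() * 4} bytes as fp32")
print(f"mean absolute round-trip error: {(w - w_hat).abs().mean().item():.5f}")
```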
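
Low-rank factorization replaces a weight matrix W with the product of two thin matrices A and B, cutting both parameters and multiply-adds. A minimal truncated-SVD sketch follows; the matrix size and rank are assumptions chosen for the example (real weight matrices are usually far closer to low rank than the random one used here).

```python
import torch

d, rank = 1024, 64             # assumed layer size and target rank
w = torch.randn(d, d)          # toy weight matrix

# Truncated SVD: keep only the top-`rank` singular directions.
u, s, vh = torch.linalg.svd(w, full_matrices=False)
a = u[:, :rank] * s[:rank]     # (d, rank)
b = vh[:rank, :]               # (rank, d)

w_approx = a @ b               # low-rank reconstruction of w
print(f"parameters: {w.numel()} -> {a.numel() + b.numel()}")
print(f"relative reconstruction error: {((w - w_approx).norm() / w.norm()).item():.3f}")
```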
# Datasets: