<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# DeBERTa

## Overview
The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. It is based on Google's
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.

It builds on RoBERTa with disentangled attention and an enhanced mask decoder, while being trained on half of the data
used for RoBERTa.
The abstract from the paper is the following:

*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
position, respectively, and the attention weights among words are computed using disentangled matrices on their
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
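
To make the disentangled attention idea concrete, the sketch below computes the content-to-content, content-to-position, and position-to-content terms for a single attention head. It is a deliberately simplified illustration of the mechanism described in the paper, not the actual implementation in `modeling_deberta.py`, and all tensor names here are made up for the example.

```python
import torch

seq_len, dim = 8, 64
torch.manual_seed(0)

H = torch.randn(seq_len, dim)       # content representations of the tokens
P = torch.randn(2 * seq_len, dim)   # shared relative-position embeddings

# separate ("disentangled") projections for content and position
W_qc, W_kc = torch.randn(dim, dim), torch.randn(dim, dim)
W_qr, W_kr = torch.randn(dim, dim), torch.randn(dim, dim)

Q_c, K_c = H @ W_qc, H @ W_kc       # content queries / keys
Q_r, K_r = P @ W_qr, P @ W_kr       # position queries / keys

# bucketed relative distance for every (i, j) pair, shifted into [0, 2 * seq_len)
idx = torch.arange(seq_len)
delta = (idx[None, :] - idx[:, None]).clamp(-seq_len, seq_len - 1) + seq_len

c2c = Q_c @ K_c.T                            # content-to-content
c2p = torch.gather(Q_c @ K_r.T, 1, delta)    # content-to-position
p2c = torch.gather(K_c @ Q_r.T, 1, delta).T  # position-to-content

# the paper scales the summed score by 1 / sqrt(3d) before the softmax
attn = torch.softmax((c2c + c2p + p2c) / (3 * dim) ** 0.5, dim=-1)
print(attn.shape)  # torch.Size([8, 8])
```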
This model was contributed by [DeBERTa](https://huggingface.co/DeBERTa). The TF 2.0 implementation of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa).
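
To get started, the model can be loaded through the Auto classes; the snippet below uses the `microsoft/deberta-base` checkpoint and simply extracts the encoder's hidden states.

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("DeBERTa builds on BERT and RoBERTa.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```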
## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DeBERTa. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
| <PipelineTag pipeline="text-classification"/> | |
| - A blog post on how to [Accelerate Large Model Training using DeepSpeed](https://huggingface.co/blog/accelerate-deepspeed) with DeBERTa. | |
| - A blog post on [Supercharged Customer Service with Machine Learning](https://huggingface.co/blog/supercharge-customer-service-with-machine-learning) with DeBERTa. | |
| - [`DebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb). | |
| - [`TFDebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb). | |
| - [Text classification task guide](../tasks/sequence_classification) | |
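
As a complement to the scripts and notebooks above, here is a minimal sequence-classification sketch. The `num_labels=2` setting is illustrative, and the classification head is newly initialized when loading the base checkpoint, so the prediction is meaningless until the model is fine-tuned:

```python
import torch
from transformers import AutoTokenizer, DebertaForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
# the classification head is newly initialized; fine-tune before relying on its outputs
model = DebertaForSequenceClassification.from_pretrained("microsoft/deberta-base", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax(dim=-1).item()
```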
| <PipelineTag pipeline="token-classification" /> | |
| - [`DebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb). | |
| - [`TFDebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb). | |
| - [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the π€ Hugging Face Course. | |
| - [Byte-Pair Encoding tokenization](https://huggingface.co/course/chapter6/5?fw=pt) chapter of the π€ Hugging Face Course. | |
| - [Token classification task guide](../tasks/token_classification) | |
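
A similar sketch for token classification; the `num_labels=9` value stands in for a CoNLL-style NER tag set, and the token-classification head is freshly initialized from the base checkpoint, so it needs fine-tuning before its labels mean anything:

```python
import torch
from transformers import AutoTokenizer, DebertaForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
# the token-classification head is newly initialized; fine-tune on labeled data first
model = DebertaForTokenClassification.from_pretrained("microsoft/deberta-base", num_labels=9)

inputs = tokenizer("Hugging Face is based in New York City.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_token_ids = logits.argmax(dim=-1)  # one label id per input token
```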
| <PipelineTag pipeline="fill-mask"/> | |
| - [`DebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb). | |
| - [`TFDebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb). | |
| - [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the π€ Hugging Face Course. | |
| - [Masked language modeling task guide](../tasks/masked_language_modeling) | |
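
A minimal masked-language-modeling sketch with [`DebertaForMaskedLM`] is shown below. It assumes the checkpoint ships masked-language-modeling head weights; if it does not, the head is newly initialized and the prediction will be random:

```python
import torch
from transformers import AutoTokenizer, DebertaForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = DebertaForMaskedLM.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# locate the [MASK] position, then take its highest-scoring vocabulary entry
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```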
| <PipelineTag pipeline="question-answering"/> | |
| - [`DebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb). | |
| - [`TFDebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb). | |
| - [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the π€ Hugging Face Course. | |
| - [Question answering task guide](../tasks/question_answering) | |
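
And a question-answering sketch with [`DebertaForQuestionAnswering`]; the span-prediction head is newly initialized from the base checkpoint, so fine-tune on SQuAD-style data before trusting the extracted span (the question and context strings here are made up):

```python
import torch
from transformers import AutoTokenizer, DebertaForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
# the span-prediction head is newly initialized; fine-tune on SQuAD-style data first
model = DebertaForQuestionAnswering.from_pretrained("microsoft/deberta-base")

question = "Who proposed DeBERTa?"
context = "DeBERTa was proposed by researchers at Microsoft Research."
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# greedy span extraction: highest-scoring start and end positions
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
answer = tokenizer.decode(inputs.input_ids[0, start : end + 1])
```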
## DebertaConfig

[[autodoc]] DebertaConfig

## DebertaTokenizer

[[autodoc]] DebertaTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## DebertaTokenizerFast

[[autodoc]] DebertaTokenizerFast
    - build_inputs_with_special_tokens
    - create_token_type_ids_from_sequences

## DebertaModel

[[autodoc]] DebertaModel
    - forward

## DebertaPreTrainedModel

[[autodoc]] DebertaPreTrainedModel

## DebertaForMaskedLM

[[autodoc]] DebertaForMaskedLM
    - forward

## DebertaForSequenceClassification

[[autodoc]] DebertaForSequenceClassification
    - forward

## DebertaForTokenClassification

[[autodoc]] DebertaForTokenClassification
    - forward

## DebertaForQuestionAnswering

[[autodoc]] DebertaForQuestionAnswering
    - forward

## TFDebertaModel

[[autodoc]] TFDebertaModel
    - call

## TFDebertaPreTrainedModel

[[autodoc]] TFDebertaPreTrainedModel
    - call

## TFDebertaForMaskedLM

[[autodoc]] TFDebertaForMaskedLM
    - call

## TFDebertaForSequenceClassification

[[autodoc]] TFDebertaForSequenceClassification
    - call

## TFDebertaForTokenClassification

[[autodoc]] TFDebertaForTokenClassification
    - call

## TFDebertaForQuestionAnswering

[[autodoc]] TFDebertaForQuestionAnswering
    - call