Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/dslim/bert-base-NER/README.md
    	
README.md (added)

---
language: en
datasets:
- conll2003
---

# bert-base-NER

## Model description

**bert-base-NER** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** and achieves **state-of-the-art performance** for the NER task. It has been trained to recognize four types of entities: location (LOC), organization (ORG), person (PER) and miscellaneous (MISC).

Specifically, this model is a *bert-base-cased* model that was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset.

## Intended uses & limitations

#### How to use

You can use this model with the Transformers *pipeline* for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
```
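
The pipeline returns a list of dictionaries, one per predicted entity token; each entry includes at least the predicted tag (`entity`), a confidence score (`score`), and the matched (sub)word (`word`). As a small illustrative sketch (field names follow the Transformers token-classification pipeline output; exact scores will vary):

```python
# Print one line per predicted entity token: word, tag and confidence.
for ent in ner_results:
    print(f"{ent['word']:<12} {ent['entity']:<7} {ent['score']:.3f}")
```

For the example sentence above you should see "Wolfgang" tagged as a person and "Berlin" as a location.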

#### Limitations and bias

This model is limited by its training dataset of entity-annotated news articles from a specific span of time. It may not generalize well to all use cases in different domains. Furthermore, the model occasionally tags subword tokens as entities, so post-processing of the results may be necessary to handle those cases.
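
One way to do that post-processing, assuming a reasonably recent Transformers release, is to let the pipeline merge subword pieces into whole entity spans via its aggregation options; this is only a sketch, not part of the original card (older releases expose the same behaviour as `grouped_entities=True`):

```python
from transformers import pipeline

# Merge subword pieces and adjacent tokens into whole entity spans.
# aggregation_strategy is available in newer Transformers releases;
# on older ones, pass grouped_entities=True instead.
nlp_grouped = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print(nlp_grouped("My name is Wolfgang and I live in Berlin"))
```

Each returned item then describes a full entity span (e.g. an `entity_group` of `PER` or `LOC`) rather than an individual subword token.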

## Training data

This model was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset.

The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes:

Abbreviation|Description
-|-
O|Outside of a named entity
B-MISC |Beginning of a miscellaneous entity right after another miscellaneous entity
I-MISC |Miscellaneous entity
B-PER |Beginning of a person’s name right after another person’s name
I-PER |Person’s name
B-ORG |Beginning of an organization right after another organization
I-ORG |Organization
B-LOC |Beginning of a location right after another location
I-LOC |Location
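
As a hypothetical illustration of this convention (not taken from the dataset): a `B-` tag only appears when an entity directly follows another entity of the same type; otherwise entity tokens carry `I-` tags.

```python
# Hypothetical token/tag pairs illustrating the tagging convention above.
tagged = [
    ("Germany", "I-LOC"),  # a location entity (no B- needed: it does not follow another location)
    ("France",  "B-LOC"),  # a new location entity immediately after another location
    ("signed",  "O"),
    ("a",       "O"),
    ("treaty",  "O"),
]
```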

### CoNLL-2003 English Dataset Statistics

This dataset was derived from the Reuters corpus, which consists of Reuters news stories. You can read more about how this dataset was created in the CoNLL-2003 paper.

#### # of training examples per entity type

Dataset|LOC|MISC|ORG|PER
-|-|-|-|-
Train|7140|3438|6321|6600
Dev|1837|922|1341|1842
Test|1668|702|1661|1617

#### # of articles/sentences/tokens per dataset

Dataset |Articles |Sentences |Tokens
-|-|-|-
Train |946 |14,987 |203,621
Dev |216 |3,466 |51,362
Test |231 |3,684 |46,435

## Training procedure

This model was trained on a single NVIDIA V100 GPU, with the recommended hyperparameters from the [original BERT paper](https://arxiv.org/pdf/1810.04805), which trained and evaluated the model on the CoNLL-2003 NER task.
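
The card does not include the training script itself, so the following is only a sketch of what a comparable fine-tuning run could look like with the `datasets` and `Trainer` APIs. The hyperparameter values shown are assumptions drawn from the ranges the BERT paper recommends (learning rate 2e-5 to 5e-5, batch size 16 or 32, 2 to 4 epochs), not the values actually used for this model:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

# CoNLL-2003 as distributed on the Hugging Face Hub.
ds = load_dataset("conll2003")
label_list = ds["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_and_align(batch):
    # Tokenize pre-split words and align one label per sub-token: only the
    # first sub-token of each word keeps the word's label, the rest get -100
    # so that the loss ignores them.
    tokenized = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, word_labels in enumerate(batch["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        previous = None
        labels = []
        for word_id in word_ids:
            if word_id is None:
                labels.append(-100)              # special tokens ([CLS], [SEP])
            elif word_id != previous:
                labels.append(word_labels[word_id])
            else:
                labels.append(-100)              # continuation sub-token
            previous = word_id
        all_labels.append(labels)
    tokenized["labels"] = all_labels
    return tokenized

encoded = ds.map(tokenize_and_align, batched=True)

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(label_list)
)

# Assumed hyperparameters, chosen from the ranges recommended in the BERT paper.
args = TrainingArguments(
    output_dir="bert-base-NER",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```

Masking continuation sub-tokens with -100 means only the first piece of each word contributes to training, which relates to the subword behaviour discussed under limitations.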

## Eval results

metric|dev|test
-|-|-
f1 |95.1 |91.3
precision |95.0 |90.7
recall |95.3 |91.9

The test metrics are a little lower than the official Google BERT results, which encoded document context and experimented with CRF. More on replicating the original results [here](https://github.com/google-research/bert/issues/223).
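
Entity-level precision, recall and F1 of this kind are commonly computed with the `seqeval` package; the card does not state which tool was used, so the following is only an illustrative sketch:

```python
from seqeval.metrics import classification_report, f1_score

# Toy gold and predicted tag sequences (illustrative only, not model output).
y_true = [["O", "I-PER", "I-PER", "O", "I-LOC"]]
y_pred = [["O", "I-PER", "I-PER", "O", "I-ORG"]]

print(f1_score(y_true, y_pred))              # entity-level F1
print(classification_report(y_true, y_pred))
```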

### BibTeX entry and citation info

```
@article{DBLP:journals/corr/abs-1810-04805,
  author    = {Jacob Devlin and
               Ming{-}Wei Chang and
               Kenton Lee and
               Kristina Toutanova},
  title     = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
               Understanding},
  journal   = {CoRR},
  volume    = {abs/1810.04805},
  year      = {2018},
  url       = {http://arxiv.org/abs/1810.04805},
  archivePrefix = {arXiv},
  eprint    = {1810.04805},
  timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
```
@inproceedings{tjong-kim-sang-de-meulder-2003-introduction,
    title = "Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition",
    author = "Tjong Kim Sang, Erik F.  and
      De Meulder, Fien",
    booktitle = "Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003",
    year = "2003",
    url = "https://www.aclweb.org/anthology/W03-0419",
    pages = "142--147",
}
```