Upload folder using huggingface_hub

- README.md +263 -80
- config.json +1 -1

README.md CHANGED
> library. Slight numerical differences may be observed between the original model and the optimized
> model. For instructions on how to install TransformerEngine, please refer to the
> [official documentation](https://github.com/NVIDIA/TransformerEngine?tab=readme-ov-file#installation).

# AMPLIFY (TransformerEngine-Optimized) Overview

## Description:

AMPLIFY is an efficient, state-of-the-art protein language model (pLM). AMPLIFY can generate residue and protein
embeddings, suggest mutations, and differentiate disordered proteins from non-protein sequences. AMPLIFY is available
in two sizes, 120M and 350M parameters.

This version of the AMPLIFY model is optimized with NVIDIA's
[TransformerEngine](https://github.com/NVIDIA/TransformerEngine) library. It is based on the original AMPLIFY model from
Chandar Research Lab (CRL), and (within numerical precision) has identical weights and outputs.

This model is ready for commercial/non-commercial use.

## Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements
for this application and use case; see the Non-NVIDIA [AMPLIFY Model
Card](https://huggingface.co/chandar-lab/AMPLIFY_120M).

### License/Terms of Use:

AMPLIFY is provided under the [MIT license](https://github.com/chandar-lab/AMPLIFY/blob/main/LICENSE).

### Deployment Geography:

Global

### Use Case:

Protein design, mutation prediction, and function analysis.

### Release Date:

Hugging Face 06/12/2025 via [https://huggingface.co/nvidia/AMPLIFY_120M](https://huggingface.co/nvidia/AMPLIFY_120M)

## References:

- [Protein Language Models: Is Scaling
  Necessary?](https://www.biorxiv.org/content/biorxiv/early/2024/09/23/2024.09.23.614603.full.pdf) - detailed
  information on the model architecture and training data.

## Model Architecture:

**Architecture Type:** Transformer <br>
**Network Architecture:** ESM-2

**This model was developed based on:** [AMPLIFY](https://huggingface.co/chandar-lab/AMPLIFY_120M) <br>
**Number of model parameters:** 1.2 x 10^8

## Input:

**Input Type:** Text (Protein Sequences) <br>
**Input Format:** String <br>
**Input Parameters:** One-Dimensional (1D) <br>
**Other Properties Related to Input:** Protein sequence represented as a string of canonical amino acids. The maximum
context length is 2048 residues.

## Output:

**Output Type:** Embeddings (Amino acid and sequence-level) <br>
**Output Format:** Numeric vector <br>
**Output Parameters:** One-Dimensional (1D) <br>
**Other Properties Related to Output:** Numeric vector with floating-point values corresponding to an embedding for each
amino acid in the input protein sequence.
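The per-residue embeddings described above can be reduced to a single sequence-level vector by mean pooling over residue positions. A minimal sketch of that pooling step with synthetic values (the embedding rows and the special-token layout are illustrative stand-ins, not real AMPLIFY outputs):

```python
# Per-residue embeddings (rows) for a 4-token input; the values are
# synthetic stand-ins for real hidden states (hidden_dim = 2 here).
hidden_states = [
    [1.0, 2.0],  # start-of-sequence token (excluded from pooling)
    [2.0, 4.0],  # residue 1
    [4.0, 6.0],  # residue 2
    [3.0, 8.0],  # end-of-sequence token (excluded from pooling)
]
residue_mask = [False, True, True, False]

# Keep residue rows only, then mean-pool them into one sequence vector.
residues = [row for row, keep in zip(hidden_states, residue_mask) if keep]
sequence_embedding = [sum(col) / len(residues) for col in zip(*residues)]
print(sequence_embedding)  # [3.0, 5.0]
```

With real model outputs, the mask would be derived from the tokenizer's special-token positions rather than written by hand.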
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware
(e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times
compared to CPU-only solutions.

## Software Integration:

**Runtime Engines:**

- Hugging Face Transformers

**Supported Hardware Microarchitecture Compatibility:**

- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper

**Preferred Operating System(s):**

- Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific
data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at
both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure
compliance with safety and ethical standards before deployment.

## Model and Checkpoint Versions:

- [AMPLIFY_350M](https://huggingface.co/nvidia/AMPLIFY_350M)
- [AMPLIFY_120M](https://huggingface.co/nvidia/AMPLIFY_120M)

**Get Started**

```python
from transformers import AutoModel
from transformers import AutoTokenizer
from datasets import load_dataset

# Load AMPLIFY and tokenizer
model = AutoModel.from_pretrained("nvidia/AMPLIFY_120M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    "nvidia/AMPLIFY_120M", trust_remote_code=True
)

# Move the model to GPU (required due to Flash Attention)
model = model.to("cuda")

...

for sample in dataset:
    ...
    break
```
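One common way masked protein language models such as AMPLIFY suggest mutations is by masking a position and ranking amino acids by the probability the model assigns them there. A minimal sketch of that ranking step with synthetic logits (the amino-acid list and values are illustrative, not real model output):

```python
import math

# Synthetic logits for one masked position; with a real model these
# would come from the language-model head's output at that position.
amino_acids = ["A", "L", "V"]
logits = [2.0, 1.0, 0.1]

# Softmax turns logits into a probability distribution over residues.
exps = [math.exp(x - max(logits)) for x in logits]
probs = [e / sum(exps) for e in exps]

# Rank candidate substitutions from most to least likely.
ranking = sorted(zip(amino_acids, probs), key=lambda p: p[1], reverse=True)
print([aa for aa, _ in ranking])  # ['A', 'L', 'V']
```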
## Training and Evaluation Datasets:

## Training Datasets:

**Link:** [UniRef100](https://www.uniprot.org/uniref?query=identity%3A1.0)

**Data Modality:**

- Text (Protein Sequences)

**Text Training Data Size:**

- 1 Billion to 10 Trillion Tokens

**Data Collection Method:**

- Human

**Labeling Method:**

- N/A

**Properties (Quantity, Dataset Descriptions, Sensor(s)):** UniRef100 contains all records in the UniProt Knowledgebase
and selected UniParc records. In UniRef100, identical sequences and subfragments are placed into a single cluster using
the CD-HIT algorithm. The longest members of the cluster (seed sequences) are used to generate UniRef90. However, the
longest sequence is not always the most informative. There is often more biologically relevant information and
annotation (name, function, cross-references) available on other cluster members. All the proteins in each cluster are
ranked to facilitate the selection of a biologically relevant representative for the cluster.

**Link:** [Observed Antibody Space (OAS)](https://opig.stats.ox.ac.uk/webapps/oas/downloads_paired/)

**Data Modality:**

- Text (Protein Sequences)

**Text Training Data Size:**

- 1 Billion to 10 Trillion Tokens

**Data Collection Method:**

- Human

**Labeling Method:**

- Human

**Properties:** The Observed Antibody Space (OAS) database is a project to collect and annotate immune repertoires for
use in large-scale analysis. It currently contains over one billion sequences, from over 80 different studies. These
repertoires cover diverse immune states, organisms (primarily human and mouse), and individuals.

**Link:** [Structural Classification of Proteins (SCOP)](https://www.ebi.ac.uk/pdbe/scop/download)

**Data Modality:**

- Text (Protein Sequences)

**Text Training Data Size:**

- 1 Billion to 10 Trillion Tokens

**Data Collection Method:**

- Hybrid: Human, Automated

**Labeling Method:**

- Hybrid: Human, Automated

**Properties:** The main levels of classification in SCOP are:

- Class: Groups proteins based on their secondary structure content, such as all-alpha, all-beta, alpha/beta, and
  alpha+beta.
- Fold: Proteins within the same fold have the same major secondary structures arranged in the same way with the same
  topological connections.
- Superfamily: Groups protein domains with a probable common evolutionary ancestry based on shared structural and
  functional features, even if sequence similarity is low.
- Family: Groups closely related proteins with clear evidence of a common evolutionary origin, often detectable through
  sequence comparison methods.
- Species: Represents a distinct protein sequence.
- Protein: Groups similar sequences with the same function.

## Evaluation Datasets:

**Link:** [Continuous Automated Model EvaluatiOn (CAMEO)](https://pmc.ncbi.nlm.nih.gov/articles/PMC8673552/)

**Benchmark Score:** LR P@L of 17.8±14.1

**Data Collection Method:**

- Human

**Labeling Method:**

- N/A

**Properties:** The data is collected by taking sequences of protein structures that are about to be released weekly by
the Protein Data Bank (PDB). These sequences are sent as "blind targets" to participating protein structure prediction
servers, which then return their predictions.

**Link:** [CASP14 (Critical Assessment of Methods of Protein Structure
Prediction)](https://pubmed.ncbi.nlm.nih.gov/34533838/)

**Benchmark Score:** LR P@L of 12.4±11.3

**Data Collection Method:**

- Human

**Labeling Method:**

- N/A

**Properties:** The data for CASP14 targets is collected from protein structures that are newly solved by experimental
structural biologists. The CASP organizers receive the amino acid sequences of these proteins before their full,
three-dimensional structures are publicly released in the Protein Data Bank (PDB). They then provide these sequences to
participating research groups and servers, who must submit their predicted structures within a specific time frame.

**Link:** [CASP15 (Critical Assessment of Methods of Protein Structure
Prediction)](https://pubmed.ncbi.nlm.nih.gov/37920879/)

**Benchmark Score:** LR P@L of 16.9±13.2

**Data Collection Method:**

- Human

**Labeling Method:**

- N/A

**Properties:** The data for CASP15 targets is collected from protein structures that are newly solved by experimental
structural biologists. The CASP organizers receive the amino acid sequences of these proteins before their full,
three-dimensional structures are publicly released in the Protein Data Bank (PDB). They then provide these sequences to
participating research groups and servers, who must submit their predicted structures within a specific time frame.
| 285 | 
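The LR P@L score above is commonly read as long-range precision at L: for a target of length L, take the model's L most confident contact predictions among residue pairs separated by at least 24 positions in sequence, and report the fraction that are true contacts. A minimal sketch of that metric under this reading (the function name, data layout, and the separation threshold of 24 are conventional assumptions, not taken from this card):

```python
def long_range_p_at_l(pred_scores, true_contacts, seq_len, min_sep=24):
    """Precision of the top-L long-range contact predictions.

    pred_scores:   dict mapping residue index pairs (i, j) to a confidence score
    true_contacts: set of residue index pairs (i, j) that are real contacts
    seq_len:       protein length L (number of top predictions to keep)
    min_sep:       minimum sequence separation |i - j| to count as long-range
    """
    # Keep only long-range pairs, then take the L highest-scoring ones.
    long_range = [(pair, score) for pair, score in pred_scores.items()
                  if abs(pair[0] - pair[1]) >= min_sep]
    top = sorted(long_range, key=lambda item: item[1], reverse=True)[:seq_len]
    if not top:
        return 0.0
    hits = sum(1 for pair, _ in top if pair in true_contacts)
    return hits / len(top)

# Tiny worked example: (3, 10) is excluded (separation 7 < 24);
# the top-2 long-range predictions are (0, 30) and (1, 40), one of
# which is a true contact, giving a precision of 0.5.
scores = {(0, 30): 0.9, (1, 40): 0.8, (2, 50): 0.1, (3, 10): 0.99}
truth = {(0, 30), (2, 50)}
print(long_range_p_at_l(scores, truth, seq_len=2))  # -> 0.5
```

Reported scores like the ones above average this per-target precision (as a percentage) over all targets in the benchmark, which is why a standard deviation accompanies the mean.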
         
## Inference:

**Acceleration Engine:**

- Hugging Face Transformers

**Test Hardware:**

- A100
- H100
- H200
- GB200
            ## Ethical Considerations:
         
     | 
| 299 | 
         
            +
             
     | 
| 300 | 
         
            +
            NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable
         
     | 
| 301 | 
         
            +
            development for a wide array of AI applications. When downloaded or used in accordance with our terms of service,
         
     | 
| 302 | 
         
            +
            developers should work with their internal model team to ensure this model meets requirements for the relevant industry
         
     | 
| 303 | 
         
            +
            and use case and addresses unforeseen product misuse.
         
     | 
| 304 | 
         
            +
             
     | 
| 305 | 
         
            +
            Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and
         
     | 
| 306 | 
         
            +
            comply with applicable safety regulations and ethical standards.
         
     | 
| 307 | 
         
            +
             
     | 
| 308 | 
         
            +
            Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns
         
     | 
| 309 | 
         
            +
            [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
         
     | 
    	
config.json CHANGED

@@ -32,7 +32,7 @@
   "padded_vocab_size": 32,
   "pre_activation_layer_norm": true,
   "rms_norm": true,
-  "transformers_version": "4.56.
+  "transformers_version": "4.56.2",
   "unk_token_id": 1,
   "vocab_path": "conf/tokenizer/amplify_vocab.txt",
   "vocab_size": 27
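One detail worth noting in the config above: `vocab_size` is 27 while `padded_vocab_size` is 32. Padding the embedding table up to a multiple of 8 is a common GPU-efficiency trick (aligned shapes suit Tensor Core and FP8 kernels); the extra slots are never produced by the tokenizer. The helper below is a hypothetical illustration of that rounding, not code from this repository:

```python
def padded_size(vocab_size: int, multiple: int = 8) -> int:
    """Round a vocabulary size up to the next multiple of `multiple`.

    Padding embedding dimensions this way is a common optimization for
    GPU kernels; the choice of 8 here is an assumption that happens to
    reproduce the 27 -> 32 padding seen in this config.
    """
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(padded_size(27))  # -> 32, matching padded_vocab_size in the config
```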