Update README.md with appropriate references

README.md (CHANGED):
---
license: cc-by-4.0
---

# Protein Sequence Modelling with Bayesian Flow Networks

Welcome to the model weights for the paper ["Protein Sequence Modelling with Bayesian Flow Networks"](https://www.biorxiv.org/content/10.1101/2024.09.24.614734v1). Using the [code on our GitHub page](https://github.com/instadeepai/protein-sequence-bfn), you can sample from our trained models: ProtBFN, for general proteins, and AbBFN, for antibody VH chains.
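
If you want to fetch the weights programmatically, a minimal sketch using the `huggingface_hub` client is shown below. The `repo_id` is a placeholder for this model card's actual id, and sampling itself is run with the GitHub code linked above.

```python
# Minimal sketch: download the model weights with huggingface_hub.
# NOTE: the repo_id below is a placeholder -- substitute the id shown
# on this model card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="InstaDeepAI/protein-sequence-bfn",  # placeholder repo id
    local_dir="./protbfn_weights",               # where to store the files
)
print(f"Weights downloaded to {local_dir}")
```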

[Bayesian Flow Networks](https://arxiv.org/abs/2308.07037) are a new approach to generative modelling and can be viewed as an extension of diffusion models to the parameter space of probability distributions. They define a continuous-time process that maps between a naive prior distribution and a pseudo-deterministic posterior distribution for each variable independently. By training a neural network to 'denoise' the current posterior, taking mutual information between variables into account, we implicitly minimise a variational lower bound. We can then use the trained network to generate samples from the learned distribution.
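
Concretely, assuming the notation of the BFN paper, the continuous-time form of that bound for discrete data reduces to a time-weighted squared error between the one-hot data $\mathbf{e}_{\mathbf{x}}$ and the network's output distribution $\hat{\mathbf{e}}(\theta, t)$:

$$
L^{\infty}(\mathbf{x}) = K\beta(1)\, \mathbb{E}_{t \sim U(0,1),\, p_F(\theta \mid \mathbf{x}; t)} \left[ t\, \| \mathbf{e}_{\mathbf{x}} - \hat{\mathbf{e}}(\theta, t) \|^2 \right]
$$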

One of the benefits of defining such a process in probability parameter space is that it can be applied to *any* family of distributions with continuous-valued parameters. This means that BFNs can be applied directly to discrete data, allowing diffusion-like generative modelling of sequences without restrictive left-to-right inductive biases or reliance on discrete-time stochastic processes. The main focus of our work is to investigate the application of BFNs to *protein sequences*, represented as sequences of amino acids. The ProtBFN methodology is broadly summarised below:

![image/png](ProtBFN.png)

Having trained ProtBFN, we find that it is exceptionally performant at unconditional generation of *de novo* protein sequences. For example, we are able to rediscover a variety of structural motifs, according to structures predicted by ESMFold, with high sequence novelty:

![image/png](CATH.png)

## Cite our work

If you have used ProtBFN or AbBFN in your work, you can cite us using the following BibTeX entry:

```bibtex
@article{Atkinson2024.09.24.614734,
  author = {Atkinson, Timothy and Barrett, Thomas D. and Cameron, Scott and Guloglu, Bora and Greenig, Matthew and Robinson, Louis and Graves, Alex and Copoiu, Liviu and Laterre, Alexandre},
  title = {Protein Sequence Modelling with Bayesian Flow Networks},
  elocation-id = {2024.09.24.614734},
  year = {2024},
  doi = {10.1101/2024.09.24.614734},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2024/09/26/2024.09.24.614734},
  eprint = {https://www.biorxiv.org/content/early/2024/09/26/2024.09.24.614734.full.pdf},
  journal = {bioRxiv}
}
```