---
license: cc-by-nc-nd-4.0
---

<h1 align='center'>Token-Level Guided Discrete Diffusion for Membrane Protein Design</h1>

<div align="center">
  <a href="https://shreygoel09.github.io/" target="_blank">Shrey Goel</a><sup>1</sup>&ensp;<b>&middot;</b>&ensp;
  <a href="https://www.linkedin.com/in/perin-schray-96855a32b/" target="_blank">Perin Schray</a><sup>2</sup>&ensp;<b>&middot;</b>&ensp;
  <a href="https://www.linkedin.com/in/yinuozhang98/" target="_blank">Yinuo Zhang</a><sup>3</sup>&ensp;<b>&middot;</b>&ensp;
  <a href="https://www.linkedin.com/in/sophia-vincoff-185192146/" target="_blank">Sophia Vincoff</a><sup>4</sup>&ensp;<b>&middot;</b>&ensp;
  <a href="https://www.linkedin.com/in/htkratochvil/" target="_blank">Huong T. Kratochvil</a><sup>2</sup>&ensp;<b>&middot;</b>&ensp;
  <a href="https://www.chatterjeelab.com/" target="_blank">Pranam Chatterjee</a><sup>4</sup>
  <br>
  <p style="font-size: 16px;">
  <sup>1</sup> Duke University &emsp; 
  <sup>2</sup> UNC—Chapel Hill &emsp; 
  <sup>3</sup> Duke-NUS Medical School &emsp; 
  <sup>4</sup> University of Pennsylvania
  </p>
</div>
    
<div align="center">
 <a href="https://arxiv.org/abs/2410.16735"><img src="https://img.shields.io/badge/Arxiv-2506.09007-red?style=for-the-badge&logo=Arxiv" alt="arXiv"/></a>

</div>




![MemDLM diagram](./memdlm_schematic.png)


Reparameterized diffusion models (RDMs) have recently matched autoregressive methods in protein generation, motivating their use for challenging tasks such as designing membrane proteins, which possess interleaved soluble and transmembrane (TM) regions.  

We introduce ***Membrane Diffusion Language Model (MemDLM)***, a fine-tuned RDM-based protein language model that enables controllable membrane protein sequence design. MemDLM-generated sequences recapitulate the TM residue density and structural features of natural membrane proteins, achieving comparable biological plausibility and outperforming state-of-the-art diffusion baselines in motif scaffolding tasks by producing:  

- Lower perplexity  
- Higher BLOSUM-62 scores  
- Improved pLDDT confidence  
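
For readers unfamiliar with the first metric, perplexity is the exponential of the mean negative log-likelihood per residue, so lower values mean the scoring model finds the sequence more probable. The snippet below is a minimal sketch of that formula only; the log-probabilities are made-up illustrative numbers, and how MemDLM's evaluation obtains them (which scoring model, which masking scheme) is not specified here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-residue natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood across residues.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities for a 4-residue sequence (illustration only).
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(log_probs))
```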

To enhance controllability, we develop ***Per-Token Guidance (PET)***, a novel classifier-guided sampling strategy that selectively solubilizes residues while preserving conserved TM domains. This yields sequences with reduced TM density but intact functional cores.  
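
The idea of editing some residues while leaving conserved TM positions untouched can be illustrated with a toy position-selection step. This is a sketch under stated assumptions, not the PET algorithm from this repository: the per-residue TM scores stand in for the output of a hypothetical classifier, the `conserved` mask and the budget `k` are illustrative inputs, and the function names are invented for this example. In a real pipeline, the masked positions would then be resampled by the diffusion model.

```python
def select_positions_to_solubilize(tm_scores, conserved, k):
    """Pick the k non-conserved positions with the highest TM score.

    tm_scores: hypothetical per-residue TM-likeness scores in [0, 1].
    conserved: booleans; True marks residues that must not be changed.
    k: maximum number of positions to select (illustrative budget).
    """
    if len(tm_scores) != len(conserved):
        raise ValueError("tm_scores and conserved must have equal length")
    candidates = [i for i, keep in enumerate(conserved) if not keep]
    candidates.sort(key=lambda i: tm_scores[i], reverse=True)
    return sorted(candidates[:k])


def mask_positions(sequence, positions, mask_token="<mask>"):
    """Replace the chosen residues with a mask token for later resampling."""
    tokens = list(sequence)
    for i in positions:
        tokens[i] = mask_token
    return tokens


# Toy inputs (all values are made up for illustration).
sequence = "MLLVAG"
tm_scores = [0.1, 0.9, 0.8, 0.7, 0.2, 0.1]
conserved = [False, True, False, False, False, False]

positions = select_positions_to_solubilize(tm_scores, conserved, k=2)
print(positions)
print(mask_positions(sequence, positions))
```

Position 1 has the highest score but is marked conserved, so it is never selected; only non-conserved residues compete for the edit budget.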

Importantly, MemDLM designs validated in TOXCAT β-lactamase growth assays demonstrate successful TM insertion, distinguishing high-quality generated sequences from poor ones.  

Together, our framework establishes the first experimentally validated diffusion-based model for rational membrane protein generation, integrating *de novo* design, motif scaffolding, and targeted property optimization.  



## **Repository Authors**
- <u>[Shrey Goel](https://shreygoel09.github.io/)</u> – undergraduate student at Duke University  
- <u>[Pranam Chatterjee](mailto:pranam@seas.upenn.edu)</u> – Assistant Professor at University of Pennsylvania