MemMamba: Rethinking Memory Patterns in State Space Model • Paper • 2510.03279 • Published about 1 month ago
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling • Paper • 2507.07955 • Published Jul 10
Energy-Based Transformers are Scalable Learners and Thinkers • Paper • 2507.02092 • Published Jul 2