Hi, could you consider training a 34B model using the RWKV architecture and comparing it with Transformers + Mamba?

#3
by win10 - opened

Hi, could you consider training a 34B model using the RWKV architecture and comparing it with Transformers + Mamba?
https://github.com/BlinkDL/RWKV-LM

Technology Innovation Institute org

Hi there!

RWKV is indeed an interesting architecture design. Preliminary studies or small-scale benchmarking will unblock the next steps.

Stay tuned!
