Hi, could you consider training a 34B model using the RWKV architecture and comparing it with Transformers + Mamba?
#3 by win10 - opened
https://github.com/BlinkDL/RWKV-LM
Hi there!
RWKV is indeed an interesting architecture. Preliminary studies or small-scale benchmarks will help us decide on next steps.
Stay tuned!