feat: Add CPU support
#18 by gabegoodhart
Description
This PR adds support to modeling_nemotron.py for running inference on CPU. It is a cleaned-up version of the edits I made while working on support in llama.cpp.
Changes
- Handle failed imports of `rmsnorm_fn`
- Add un-optimized implementation of `MambaRMSNormGated.forward`
- Fix `NemotronHMamba2Mixer.torch_forward` to use `repeat_interleave` for `B` and `C` (see discussion here)
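
For reviewers, here is a rough sketch of the three ideas above. It is not the code in this PR. The import path for `rmsnorm_fn` is an assumption based on the `mamba_ssm` package layout, `gated_rmsnorm_reference` is a hypothetical helper name, and the fallback normalizes over the whole last dimension (no grouping) with the gate applied before the norm, which may differ from what the model actually does.

```python
import torch

# Guarded import: the fused kernel is optional, so a missing package
# should not break CPU-only inference. (Module path is an assumption.)
try:
    from mamba_ssm.ops.triton.layernorm_gated import rmsnorm_fn
except ImportError:
    rmsnorm_fn = None


def gated_rmsnorm_reference(hidden_states, weight, gate=None, eps=1e-5):
    """Un-optimized gated RMSNorm in plain PyTorch (hypothetical helper)."""
    input_dtype = hidden_states.dtype
    hidden_states = hidden_states.to(torch.float32)
    if gate is not None:
        # Assumption: gate with SiLU before normalizing.
        hidden_states = hidden_states * torch.nn.functional.silu(gate.to(torch.float32))
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    hidden_states = hidden_states * torch.rsqrt(variance + eps)
    return (weight * hidden_states).to(input_dtype)


# repeat_interleave vs. repeat when expanding per-group values to per-head values.
# Toy tensor: 2 groups, 1 state value each.
groups = torch.tensor([[1.0], [2.0]])
heads_per_group = 2
interleaved = groups.repeat_interleave(heads_per_group, dim=0)  # rows: 1, 1, 2, 2
tiled = groups.repeat(heads_per_group, 1)                       # rows: 1, 2, 1, 2
```

If the head axis is laid out group by group, `repeat_interleave` keeps each group's `B` and `C` values next to the heads that share them, while `repeat` cycles through the groups and pairs heads with the wrong group.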