Hi, thank you for releasing this model! While using it, I noticed that it seems to exhibit some characteristics similar to RL-trained models. May I kindly ask whether this model was distilled using only SFT, or through a combination of SFT and RL?