Was this model distilled using only SFT, or through a combination of SFT and RL?

#23
by wizardII - opened

Hi, thank you for releasing this model! While using it, I noticed that it seems to exhibit some characteristics similar to RL-trained models. May I kindly ask whether this model was distilled using only SFT, or through a combination of SFT and RL?
