arxiv:2509.22611
Junkang Wu
junkang0909
AI & ML interests
LLM alignment
Recent Activity
upvoted a paper about 1 month ago
EPO: Entropy-regularized Policy Optimization for LLM Agents
Reinforcement Learning
authored a paper about 1 month ago
Aligning Multimodal LLM with Human Preference: A Survey
authored a paper about 1 month ago
Robust Preference Optimization via Dynamic Target Margins
Organizations
None yet