Reinforcement Learning (RL)
Proficient in Python/PyTorch, with extensive experience in RL algorithm design, development, and performance optimization. In-depth research, with publications at top venues, in frontier areas including world-model-based RL, Monte Carlo Tree Search (MCTS), efficient exploration, and multi-task learning.
Large Language Models (LLMs)
Hands-on post-training experience with multimodal large models at the 10B+ parameter scale. Proficient in RL fine-tuning techniques such as PPO and GRPO, with independent research on sample efficiency and training stability. Closely follows frontier work on integrating RLVR/RLHF with world models.