<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Training on Peng Tan's AI Blog</title><link>https://c44db530.hobbytp-github-io.pages.dev/zh/tags/training/</link><description>A topical blog covering all areas of AI</description><atom:link href="https://c44db530.hobbytp-github-io.pages.dev/zh/tags/training/index.xml" rel="self" type="application/rss+xml"/><item><title>Reflect, Retry, Reward: A New Paradigm for Self-Evolution in Large Language Models</title><link>https://c44db530.hobbytp-github-io.pages.dev/zh/papers/reflect_retry_reward_rl_finetunning/</link><pubDate>Fri, 04 Jul 2025 22:30:00 +0800</pubDate><guid>https://c44db530.hobbytp-github-io.pages.dev/zh/papers/reflect_retry_reward_rl_finetunning/</guid><description>Reflect, Retry, Reward: A New Paradigm for Self-Evolution in Large Language Models</description></item><item><title>Fine-Tuning</title><link>https://c44db530.hobbytp-github-io.pages.dev/zh/training/finetuning/</link><pubDate>Wed, 26 Feb 2025 22:14:00 +0800</pubDate><guid>https://c44db530.hobbytp-github-io.pages.dev/zh/training/finetuning/</guid><description>This article covers common fine-tuning challenges and how to overcome them, and explains in detail how to fine-tune DeepSeek-R1 on a consumer-grade GPU using Unsloth.</description></item></channel></rss>