<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reflect, Retry, Reward on Peng Tan's AI Blog</title><link>https://c44db530.hobbytp-github-io.pages.dev/zh/tags/reflect-retry-reward/</link><description>A topical blog covering all areas of AI</description><atom:link href="https://c44db530.hobbytp-github-io.pages.dev/zh/tags/reflect-retry-reward/index.xml" rel="self" type="application/rss+xml"/><item><title>Reflect, Retry, Reward: A New Paradigm for Self-Evolving Large Language Models</title><link>https://c44db530.hobbytp-github-io.pages.dev/zh/papers/reflect_retry_reward_rl_finetunning/</link><pubDate>Fri, 04 Jul 2025 22:30:00 +0800</pubDate><guid>https://c44db530.hobbytp-github-io.pages.dev/zh/papers/reflect_retry_reward_rl_finetunning/</guid><description>Reflect, Retry, Reward: A New Paradigm for Self-Evolving Large Language Models</description></item></channel></rss>