<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reinforced on Peng Tan's AI Blog</title><link>https://c44db530.hobbytp-github-io.pages.dev/zh/tags/reinforced/</link><description>A topical blog covering all areas of AI</description><atom:link href="https://c44db530.hobbytp-github-io.pages.dev/zh/tags/reinforced/index.xml" rel="self" type="application/rss+xml"/><item><title>Reinforced Self-play Reasoning with Zero Data: Paper Walkthrough</title><link>https://c44db530.hobbytp-github-io.pages.dev/zh/training/reinforced_selfplay_reasoning_w_zero_data/</link><pubDate>Sun, 11 May 2025 20:10:00 +0800</pubDate><guid>https://c44db530.hobbytp-github-io.pages.dev/zh/training/reinforced_selfplay_reasoning_w_zero_data/</guid><description>The paper introduces a zero-data paradigm for reinforced self-play reasoning: through self-play, the model generates its own tasks and verifies them, learning to reason autonomously without relying on human-annotated data or predefined tasks.</description></item></channel></rss>