The tweet was deleted by the author.
But we saved everything 🙂.
Tanishq Mathew Abraham asked about using torchtitan for reinforcement learning (RL) training.
He noted that torchtitan, despite its popularity, does not directly support GRPO (Group Relative Policy Optimization), so users end up writing their own implementations.
Abraham asked whether any reliable open-source forks of torchtitan add GRPO support. The question points to a possible gap in tooling for researchers and developers who build on torchtitan.
Practitioners may need to collaborate on a shared implementation, or look for existing forks that already cover this use case.
Abraham’s observations highlight recurring challenges in the reinforcement learning community, particularly around tooling and resource accessibility. They follow his recent work on data efficiency, including the introduction of DEPO for reinforcement learning. His earlier reflections on the durability of vital AI coding tools likewise underscore the importance of robust, well-supported platforms, a need that the current gaps in torchtitan’s ecosystem make more apparent.