The tweet was deleted by the author.
OpenAI has revealed that a limited amount of accidental chain-of-thought (CoT) grading affected some of its released AI models.
The company said it is committed to preserving monitorability in AI agents by not penalizing misaligned reasoning during reinforcement learning. OpenAI is now sharing its analysis of the grading issue to inform users and the broader AI community. The move underscores ongoing industry efforts to mitigate AI agent misalignment and strengthen quality control.
OpenAI recently unveiled advanced security measures for ChatGPT accounts, adding phishing-resistant sign-ins and enhanced user protections. The company has also spotlighted how young developers are building new AI applications using its accessible tools. These moves reflect ongoing product updates and engagement with the developer community.