Microsoft’s strategy for artificial intelligence remains clear: instead of trying to build an in-house model, the Redmond giant is cementing its position as the great orchestrator of a global AI ecosystem. This week, the Windows company launched two new features for Copilot Researcher, Critique and Council, which break the exclusivity barrier with OpenAI and integrate Claude, from rival Anthropic, into a unified workflow. The move confirms that Microsoft intends to be the “home” where the best models in the world collaborate.
This paradigm shift, which puts the market’s two biggest rivals (GPT and Claude) to work side by side, promises to solve the biggest obstacle to the mass adoption of generative AI in the corporate world: the lack of factual rigor and recurring “hallucinations”. By combining the generative power of GPT with the analytical rigor of Claude, Microsoft establishes what many experts now consider the new gold standard for computer-assisted research.
Since the start of the generative AI craze, Microsoft has followed a distinct path from competitors like Google or Meta. While others try to develop “verticalized” models from scratch, Microsoft has invested many billions to ensure it is the first to commercialize third-party innovations. The partnership with OpenAI was the first step on a path that now becomes more complex with the integration of Anthropic’s Claude 4. Acting as a “conductor”, Microsoft uses a technological layer called Work IQ, integrated into the Microsoft 365 ecosystem, to manage these external models.
Instead of processing an entire request in a single engine, Copilot Researcher works as a decision center that delegates specific parts of a complex task to the model that best performs it.
Critique: a ‘peer review’ system
The isolation of a single model is often the cause of its failure. A solitary system lacks an independent review mechanism, which leads to answers that, although they seem convincing, are sometimes wrong. This is where Critique mode comes in: designed to replicate the rigor of an audit firm or a newsroom, it replaces the “lonely AI” with a sequential verification process.
In this flow, the process begins with the OpenAI model, recognized for its versatility and planning capacity, which takes on the role of ‘main writer’. GPT is responsible for planning the research, accessing sources in real time through Bing (Microsoft’s search engine), structuring the fundamental topics and writing the first draft of the technical report. Once this draft is completed, the system delegates the auditing task to Anthropic’s Claude. (Microsoft notes that the assignment is dynamic and may change as the system evolves.)
This model, often praised for its logical precision and strict adherence to ethical guidelines, serves here as a ‘senior editor’. Its sole mission is to look for inconsistencies, verify that the cited quotations and sources actually exist, and ensure that the information is factual. If Claude detects errors or omissions, the system triggers an automatic refinement pass, returning the document to GPT with specific correction instructions. The end user only receives the report when both systems validate the integrity of the information.
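The draft-audit-refine loop described above can be sketched in a few lines of Python. This is purely an illustrative sketch of the pattern the article describes; every function name here (`gpt_draft`, `claude_critique`, `gpt_refine`) is a hypothetical stand-in, not Microsoft’s actual API.

```python
# Illustrative sketch of the Critique flow: writer drafts, auditor reviews,
# writer revises until the auditor finds no issues. All names are
# hypothetical stand-ins, not a real Copilot Researcher API.

def gpt_draft(briefing: str) -> str:
    """'Main writer': plan the research and produce a first draft."""
    return f"Draft report for: {briefing}"

def claude_critique(draft: str) -> list[str]:
    """'Senior editor': return factual issues found (empty list if clean)."""
    return []  # placeholder; a real auditor would verify sources and claims

def gpt_refine(draft: str, issues: list[str]) -> str:
    """Return the draft to the writer with specific correction instructions."""
    return draft + " [revised per: " + "; ".join(issues) + "]"

def critique_workflow(briefing: str, max_rounds: int = 3) -> str:
    draft = gpt_draft(briefing)
    for _ in range(max_rounds):
        issues = claude_critique(draft)
        if not issues:          # auditor is satisfied: report is validated
            return draft
        draft = gpt_refine(draft, issues)
    return draft                # stop refining after max_rounds

print(critique_workflow("EU AI Act compliance overview"))
```

The key design choice is that the user never sees an unaudited draft: the loop only exits early when the reviewing model raises no objections.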
According to Microsoft, Critique significantly improves factual accuracy, reflecting a 13.88% gain over Perplexity Deep Research, the Claude Opus 4.6-based system that previously led the ranking (more on this below).
Council: collective intelligence
In addition to sequential collaboration, Microsoft also introduced Council mode, which explores the diversity of “personalities” and architectures of its partners’ models. In this scenario, the logic stops being one of auditing and becomes one of debate: GPT and Claude receive the same briefing and work completely independently on the proposed topic, without prior knowledge of what the other is producing.
At the end of the process, a third AI model acts as a synthesizer, presenting the user with a comparative picture of both approaches. This is where the multiple-partner strategy reveals its greatest competitive advantage. While GPT may focus on profitability metrics and dynamic market trends, Claude can identify compliance risks or ethical dilemmas that the first model overlooked. This plurality of views, generated by algorithms trained with different philosophies and data sets, can offer decision makers a 360-degree analysis that no single model could replicate.
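The Council pattern, as described, is independent answers followed by a synthesis step. The sketch below illustrates that shape only; the function names (`gpt_answer`, `claude_answer`, `synthesize`) are hypothetical and do not correspond to any published Microsoft interface.

```python
# Illustrative sketch of the Council flow: two models answer the same
# briefing independently, and a third synthesizes a comparative view.
# All names are hypothetical stand-ins, not a real API.

def gpt_answer(briefing: str) -> str:
    return f"GPT view on {briefing}: market trends and profitability"

def claude_answer(briefing: str) -> str:
    return f"Claude view on {briefing}: compliance risks and ethics"

def synthesize(answers: dict[str, str]) -> str:
    """Third model: present both approaches side by side for the user."""
    lines = [f"- {model}: {view}" for model, view in answers.items()]
    return "Comparative summary:\n" + "\n".join(lines)

def council_workflow(briefing: str) -> str:
    # Each model answers without seeing the other's output.
    answers = {"GPT": gpt_answer(briefing), "Claude": claude_answer(briefing)}
    return synthesize(answers)

print(council_workflow("expansion into a new market"))
```

Note the contrast with Critique: here neither model corrects the other; the value comes from the synthesizer surfacing where the two independent analyses diverge.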
The DRACO benchmark demonstrates effectiveness
Empirical validation of this research “super AI” comes through the DRACO benchmark (Deep Research Accuracy, Completeness, and Objectivity), a test that assesses the quality of research in critical areas such as Law, Medicine, Finance and Technology, with tasks ranging from complex legal cases to advanced engineering problems. According to the North American giant, the results confirm that the orchestrated whole is more intelligent than each model individually.
In DRACO, Researcher with Critique recorded a 7-point increase in aggregate score: as mentioned, a 13.88% improvement compared to Perplexity Deep Research, the reference system based on Claude Opus 4.6.
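Taken together, the two figures imply a rough baseline. Assuming the 13.88% is measured relative to Perplexity’s aggregate score (an assumption, since the article does not state the denominator), a 7-point gain puts that baseline at about 50 points:

```python
# Back-of-the-envelope check (assumption: 13.88% is relative to the
# baseline's aggregate DRACO score): 7 points / 0.1388 gives the
# implied Perplexity Deep Research baseline.
baseline = 7 / 0.1388
print(round(baseline, 1))  # prints 50.4
```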
This nearly 14% improvement over the previous leading system, ahead of heavyweight solutions like Google Gemini and Perplexity’s deep research tool, points to significant gains in citation fidelity and depth of critical analysis, areas where collaboration between disparate partner models has proven to be the deciding factor.
