A benchmark that measures models' MCP calling ability has finally arrived. GPT-5 is far ahead... All of its power has been put to use here.
Michael Qizhe Shieh, 26 Aug 2025
Introducing MCPMark, a collaboration with @EvalSysOrg and @lobehub! We created a challenging benchmark to stress-test MCP use in comprehensive contexts.
- 127 high-quality data samples created by experts.
- GPT-5 takes the current lead and achieves a Pass@1 of 46.96% while the other models fall in the range of 10-30%.
- Diverse test cases on Notion, Github, Filesystem, Playwright (browser), and Postgres.
9🧵s ahead