Benchmarks that measure a model's MCP-calling ability are finally here. GPT-5 is far ahead... Every skill gets put to use here.
Michael Qizhe Shieh, Aug 26, 2025
Introducing MCPMark, a collaboration with @EvalSysOrg and @lobehub! We created a challenging benchmark to stress-test MCP use in comprehensive contexts.
- 127 high-quality data samples created by experts.
- GPT-5 takes the current lead and achieves a Pass@1 of 46.96% while the other models fall in the range of 10-30%.
- Diverse test cases on Notion, GitHub, Filesystem, Playwright (browser), and Postgres.
9🧵s ahead