Benchmark reveals flaws: Microsoft's DELEGATE-52 benchmark shows top AI models corrupt around 25% of document content in long ...