This one is from Cockton and Woolrych, back in 2001. I can’t find an online version of the study, but Cockton’s site mentions it (no link).
The study had 75 heuristic evaluators, who detected only 74% of the actual problems. Five problems experienced by users were missed by the evaluators entirely. For the 14 problems that were accurately predicted overall, 71 heuristics were applied, and 61% of those were applied inappropriately.
So why the hell do we continue to do heuristic (or expert) evaluations?
Because they are cheap and fast, and the problems they find are often easy to act on. That doesn’t mean you are getting all the problems, but it should make you feel good.
Personally, I vote for heuristability.