Performance measurement in the public sector is largely based on `hard' metrics, which have the benefit of being transparent but may be subject to gaming behavior. Subjective performance evaluation offers the potential advantage of `measuring what matters', but is open to manipulation by the bureaucrats charged with oversight. This paper investigates a novel school inspection system in which independent inspectors visit schools at very short notice, write and disclose school quality reports, and sanction schools rated `Fail'. First, I demonstrate that inspection ratings can help distinguish between more and less effective schools, even after controlling for standard observed school characteristics. Second, I evaluate the causal effect of a fail inspection on subsequent student performance. The evidence shows that a fail inspection leads to test score gains. The largest gains accrue to students with lower prior ability; this result cannot be accounted for by `ceiling effects' for high-ability students. The evidence also shows that at least some of these gains persist in the medium term. Furthermore, and in contrast with much of the evidence from test-based accountability regimes, I find no evidence that fail schools are able to inflate test score performance by gaming the system. Oversight by inspectors may play an important role in mitigating such strategic behavior.

JEL: H11, I20, I28

Keywords: subjective performance evaluation; gaming behavior; school inspections.