The first approach we used is called ‘topic modelling’: a family of computational methods for automatically extracting topics from sets of documents. The method we used, Latent Dirichlet Allocation (LDA), finds the topics that documents belong to on the basis of the words in each document, assuming that documents with similar topics will use similar words. In a second approach, we used sentiment analysis to check whether the tone of reports changed over time. Finally, we also looked at the ‘semantic similarity’ of reports around 2014 and 2019.
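To make the idea behind LDA more concrete, here is a minimal sketch of one common way to fit it, a collapsed Gibbs sampler, written in plain Python. It is an illustration only and not the pipeline used for the analysis: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions made for this example.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, n_top=3, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the study's code).

    docs is a list of token lists. Returns the per-document topic
    proportions and the most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * n_topics for _ in docs]          # tokens per (document, topic)
    topic_word = [Counter() for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                        # tokens per topic
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # A topic is more likely if it is common in this document
                # and if this word is common in that topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [
        [w for w, c in topic_word[k].most_common(n_top) if c > 0]
        for k in range(n_topics)
    ]
    return theta, top_words


# Invented toy documents, for illustration only.
toy_docs = [
    ["curriculum", "reading", "phonics", "curriculum"],
    ["leadership", "teachers", "support", "leadership"],
    ["reading", "phonics", "curriculum"],
]
theta, top_words = lda_gibbs(toy_docs, n_topics=2)
```

Each row of `theta` gives the estimated share of each topic in one document, and `top_words` lists the words most often assigned to each topic. With real reports one would normally rely on an established library rather than a hand-written sampler.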
What did we find? Before and after the publication of the myths document (10,544 reports with 2,097 unique terms), there was limited change in the sentiment and topics of the reports. If anything, semantic similarity fell slightly. However, the introduction of the EIF in 2019 did lead to an increase in certain themes and topics from 2018 to 2022, as demonstrated in the figure.
Notable topics in the 11,245 reports, with 1,944 terms, are Topics 12 (blue-green), 13 (pink) and 14 (pale green), which show more emphasis on curriculum overall, in line with the emphasis on the ‘Quality of Education’ in the new framework. Other notable topics were leadership and subject specialism, including themes of reading and phonics, and an emphasis on specific subjects. Framework changes also included an increased emphasis on leadership, especially in the context of supporting teachers in schools. The semantic similarity of inspection reports also increased with the introduction of the EIF.
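One simple way to put a number on how similar a set of reports is to one another is the average cosine similarity between their word-count vectors. The sketch below is an assumption for illustration: the analysis does not state which similarity measure was used, word overlap is only a crude proxy for meaning, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(texts):
    """Average cosine similarity over all distinct pairs of texts."""
    pairs = [(i, j) for i in range(len(texts)) for j in range(i + 1, len(texts))]
    if not pairs:
        return 0.0
    total = sum(cosine_similarity(texts[i], texts[j]) for i, j in pairs)
    return total / len(pairs)
```

Computing `mean_pairwise_similarity` separately for the reports from each year would give a rough trend line: a rising value means reports are drawing on an increasingly shared vocabulary.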
Analysis of sentiments
An additional analysis of the language in the most recent inspection reports shows that, since reports became much shorter in 2020, only the reports for Outstanding and Inadequate schools show ‘sentiments’ commensurate with the judgement (very positive and very negative respectively), as can be seen in the figure.
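As a rough illustration of how sentiment can be compared across judgements, the sketch below scores each report with a small word list and averages the scores per grade. Everything in it is an assumption for the example: the word lists are invented, the function names are hypothetical, and the analysis itself may have used a different sentiment method.

```python
# Tiny invented lexicons, for illustration only.
POSITIVE = {"good", "strong", "excellent", "effective", "well"}
NEGATIVE = {"weak", "poor", "inadequate", "ineffective", "low"}


def sentiment_score(text):
    """Score from -1 (all negative words) to +1 (all positive words)."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 0.0  # no sentiment-bearing words found
    return (pos - neg) / (pos + neg)


def mean_score_by_grade(reports):
    """Average sentiment per grade; reports is an iterable of (grade, text)."""
    by_grade = {}
    for grade, text in reports:
        by_grade.setdefault(grade, []).append(sentiment_score(text))
    return {grade: sum(scores) / len(scores) for grade, scores in by_grade.items()}
```

If sentiment tracked the judgement closely, the averages returned by `mean_score_by_grade` would fall steadily from the best grade to the worst.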