University of Sussex

Automated fact-checking for supporting sub-editing

Posted on 2023-07-05, 15:28, authored by Nestor Prieto Chavana

Fact-checking is an important step in the sub-editing process in the newsroom. As part of this process, sub-editors verify the factual information in articles before publication to ensure that it is accurate. It is a time-consuming task that stands to benefit from automation. This thesis investigates techniques for automating some of the steps in fact-checking.

While fact-checking more generally has gained attention from the natural language processing research community in recent years, comparatively little research has addressed its application to sub-editing. In this thesis, a training and evaluation dataset is developed from an archive of published news by replicating the fact-checking process as performed by sub-editors.

The first step in fact-checking is the detection of claims requiring verification. Existing research in the field has mainly focused on claims made by political figures and verified by external organisations. This focus is contrasted with the requirements of newsroom fact-checkers, and the transferability of existing work to the sub-editing task is evaluated.

Fact-checking requires access to relevant evidence that can be used for verification. In sub-editing, the source of this evidence must be constantly up to date, since the facts needing verification may concern breaking news. Web search is an ideal source of evidence for this task. Several conditional query generation methods are used to generate queries, conditioned on the claims requiring verification, that retrieve relevant evidence from web search engines. An approach that combines the outputs of the different methods is explored and improves performance.
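
The abstract does not state how the outputs of the query generation methods are combined. One common way to merge several ranked result lists is reciprocal rank fusion (RRF). The sketch below assumes that each method returns an ordered list of result URLs and fuses them with RRF; the method names, the URLs and the constant k = 60 are illustrative assumptions, not details taken from the thesis.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives the sum of 1 / (k + rank) over every list it
    appears in, with ranks starting at 1. k = 60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first), breaking ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one claim, one ranked list per query generation method.
results_by_method = {
    "keyword_query": ["a.com", "b.com", "c.com"],
    "seq2seq_query": ["b.com", "d.com", "a.com"],
    "question_query": ["b.com", "a.com", "e.com"],
}

fused = reciprocal_rank_fusion(results_by_method.values())
print(fused)  # ['b.com', 'a.com', 'd.com', 'c.com', 'e.com']
```

Here `b.com` comes first because it is near the top of all three lists, while results found by only one method fall to the bottom. RRF uses only ranks, not raw scores, which makes it convenient when the individual methods or search engines score results on incomparable scales.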

To streamline the fact-checking process, it is not enough to present sources of evidence to the user: the precise spans of text that contain the information needed to verify the target claims should also be identified. A method for ranking evidence spans that leverages transfer learning is presented. Using an intermediate training task improves results compared with training exclusively on the target-task data.
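
The abstract does not describe the span-ranking model in detail. As a minimal sketch of the ranking step only, assuming evidence documents are split into windows of consecutive sentences, the code below scores each candidate span against the claim and returns the highest-scoring spans. The lexical-overlap scorer is a hypothetical stand-in for the transfer-learned model described in the thesis; any function mapping a (claim, span) pair to a score, for example a pretrained model fine-tuned first on an intermediate task and then on the target data, could be passed in as `scorer`.

```python
import re
from typing import Callable, List, Set, Tuple

# A scorer maps (claim, span) to a relevance score; higher means more useful evidence.
Scorer = Callable[[str, str], float]


def candidate_spans(document: str, window: int = 2) -> List[str]:
    """Split a document into sentences and return overlapping windows of `window` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    if len(sentences) <= window:
        # Short documents yield a single span, or none if the document is empty.
        return [" ".join(sentences)] if sentences else []
    return [" ".join(sentences[i:i + window]) for i in range(len(sentences) - window + 1)]


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(claim: str, span: str) -> float:
    """Hypothetical placeholder scorer: fraction of claim tokens that also occur in the span.

    A transfer-learning approach would replace this with a pretrained model
    fine-tuned to score claim-span pairs.
    """
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(span)) / len(claim_tokens)


def rank_spans(claim: str, document: str, scorer: Scorer = overlap_score,
               window: int = 2, top_k: int = 3) -> List[Tuple[float, str]]:
    """Score every candidate span against the claim and return the top_k (score, span) pairs."""
    scored = [(scorer(claim, span), span) for span in candidate_spans(document, window)]
    # Stable sort: spans with equal scores keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical claim and evidence document.
claim = "The bridge opened in 1932."
document = ("The council met on Monday. Officials discussed the budget. "
            "The bridge opened to traffic in 1932. It carries 160,000 vehicles a day.")
for score, span in rank_spans(claim, document, window=1, top_k=2):
    print(f"{score:.2f}  {span}")
```

With single-sentence windows, the sentence about the bridge opening in 1932 is ranked first with a score of 1.00. Keeping the scorer as a parameter separates span extraction from relevance modelling, so a stronger learned scorer can be swapped in without changing the rest of the pipeline.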


File Version

  • Published version



Department affiliated with

  • Informatics Theses

Qualification level

  • doctoral

Qualification name

  • phd


Language

  • eng

Institution

  • University of Sussex

Full text available

  • Yes
