University of Sussex

Towards generalisation in machine learning under subpopulation shifts: active, passive, and spectral bias perspectives

thesis
posted on 2025-07-03, 19:48 authored by Yeat Jeng Ng

Subpopulation shift refers to a type of distributional shift where the performance of a machine learning model degrades on specific subpopulations in environments that differ from the training data. This issue is of critical concern in real-world applications, as it can lead to unfairness and discrimination in machine learning models. In this thesis, I investigate the problem of subpopulation shift from multiple perspectives. First, I analyse a specific form of subpopulation shift known as spurious correlation, where certain irrelevant features in the training data are correlated with target labels. In such cases, models tend to rely excessively on these spurious features, resulting in poor generalisation in the test environment. I explore this issue using the deep learning framework of the neural tangent kernel and identify that the disparity in complexity between spurious and core features is a key factor contributing to poor generalisation. Based on this observation, I propose a method that adjusts the spectral properties of neural networks to mitigate bias, without requiring explicit knowledge of spurious attributes. Second, I recognise that some state-of-the-art methods addressing subpopulation shifts share underlying principles with active learning. In response, I propose two active learning algorithms designed to acquire new samples that help to debias the existing biased training data. The first algorithm focuses on minimising the distributional gap between the training and test data, and requires annotations of spurious or sensitive attributes. The second algorithm removes this requirement by leveraging the training dynamics to identify informative samples that can help reduce bias in the existing labelled pool.

History

File Version

  • Published version

Pages

215

Department affiliated with

  • Informatics Theses

Qualification level

  • doctoral

Qualification name

  • phd

Language

  • eng

Institution

University of Sussex

Full text available

  • Yes

Supervisor

Novi Quadrianto and Viktoriia Sharmanska

