Master's thesis: Application of Convolutional networks in the Multiple Hypothesis Testing

UC3M master's thesis in Statistics for Data Science

Abstract

Multiple hypothesis testing comprises a family of statistical procedures for testing many hypotheses on high-dimensional data, each considered marginally. Several approaches have been developed to address this problem. Classical methods define and control a specific error rate, and usually rely on p-values to quantify the evidence against the null hypothesis. Alternatives have also been developed under a semi-supervised approach, in which the large-scale testing problem is solved using a null training sample collected via endogenous or exogenous mechanisms. In this project, we combine both approaches. Using simulations, in which the user knows and controls the ground truth, we explore the possibility of solving multiple hypothesis testing problems in a supervised framework. Starting with p-values that are calibrated under the null hypothesis, we express the p-values in terms of odds, converting them into lower bounds of Bayes factors. We then go a step further and build a matrix of the relative evidence among tests, whose entries are the quotients of the previously ordered lower bounds. This matrix representation is treated as analogous to an image. With these ingredients, we train Convolutional Neural Networks (CNNs) to determine whether this framework can detect the cases where the null hypothesis is rejected. We explore the ability of the CNNs to correctly classify the hypotheses in two primary examples. First, we study the effectiveness of a diet applied to a sample of female mice under several scenarios. Then, we explore the difference in means of two independent populations sampled from the normal distribution.
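
As a rough illustration of the representation described above, the following sketch turns a vector of p-values into lower bounds of Bayes factors and then into a matrix of pairwise quotients. The abstract does not say which bound is used, so the sketch assumes the well-known calibration -e p ln(p) for p < 1/e (and 1 otherwise). The function names, the ascending ordering, and the row-over-column orientation of the quotients are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np


def bayes_factor_lower_bound(p):
    """Assumed calibration: -e * p * ln(p) for p < 1/e, and 1 otherwise."""
    p = np.asarray(p, dtype=float)
    safe_p = np.clip(p, 1e-300, 1.0)  # guard against log(0)
    bound = -np.e * safe_p * np.log(safe_p)
    return np.where(p < 1.0 / np.e, bound, 1.0)


def relative_evidence_matrix(p_values):
    """Matrix whose (i, j) entry is the quotient of the i-th and j-th ordered bounds."""
    bounds = np.sort(bayes_factor_lower_bound(p_values))  # ascending order (assumption)
    return bounds[:, None] / bounds[None, :]


# Hypothetical p-values from four tests.
p_values = [0.05, 0.5, 0.01, 0.2]
image = relative_evidence_matrix(p_values)
print(image.shape)
```

The resulting square array has ones on its diagonal and reciprocal entries across it, and it is this kind of array that could be passed to a CNN as a single-channel image.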

Dedication

In memory of my father, Jorge Conejo Solis (1959-2021). Requiescat in pace.

Cesar Conejo Villalobos
Graduate Student/Data Scientist

My research interests include anomaly detection, imbalanced data, and fraud detection.