Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

Citation Data: Applied Soft Computing, ISSN: 1568-4946, Vol: 111, Page: 107692

Publication Year: 2021

Citations: 31
Usage: 0
Captures: 79
Mentions: 0
Social Media: 0


Metrics Details

Citations
31
- Citation Indexes
  31
Captures
79
- Readers
  79

Article Description

A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and quickly as possible, in a cheap and efficient manner. The application of deep learning for image classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such contexts the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. We therefore propose a simple approach for correcting data imbalance: each observation is re-weighted in the loss function, with a higher weight given to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
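The re-weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the inverse-frequency weighting scheme, the function names (`class_weights`, `weighted_labelled_loss`, `weighted_unlabelled_loss`) and the toy numbers are all assumptions made for illustration. The abstract states only that under-represented classes receive higher weights and that pseudo-labels select the weight for unlabelled observations.

```python
import math


def class_weights(labels, num_classes):
    """Inverse-frequency class weights, normalised to sum to 1.

    Assumed scheme: the abstract does not give the exact weight formula.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    # Classes with no labelled observations get weight 0 in this sketch.
    inverse = [1.0 / c if c > 0 else 0.0 for c in counts]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_labelled_loss(probs, labels, weights):
    """Cross-entropy where each observation is scaled by its class weight."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def weighted_unlabelled_loss(probs, pseudo_labels, weights):
    """Squared error against soft pseudo-labels, scaled per observation.

    The weight is chosen from the most likely class of the pseudo-label
    (an assumed selection rule).
    """
    losses = []
    for p, q in zip(probs, pseudo_labels):
        k = max(range(len(q)), key=lambda i: q[i])  # arg max of pseudo-label
        squared_error = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
        losses.append(weights[k] * squared_error)
    return sum(losses) / len(losses)


# Hypothetical toy data: 8 normal (class 0) and 2 COVID-19 positive (class 1).
labels = [0] * 8 + [1] * 2
weights = class_weights(labels, num_classes=2)

labelled = weighted_labelled_loss([[0.5, 0.5]], [1], weights)
unlabelled = weighted_unlabelled_loss([[0.5, 0.5]], [[0.0, 1.0]], weights)
```

With this toy split the minority class receives roughly four times the weight of the majority class, so an error on a COVID-19 positive observation contributes more to both loss terms than the same error on a normal case.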

Bibliographic Details

DOI: 10.1016/j.asoc.2021.107692

PMID: 34276263

URL ID: http://www.sciencedirect.com/science/article/pii/S156849462100613X; http://dx.doi.org/10.1016/j.asoc.2021.107692; http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85111056484&origin=inward; http://www.ncbi.nlm.nih.gov/pubmed/34276263; https://linkinghub.elsevier.com/retrieve/pii/S156849462100613X; https://dx.doi.org/10.1016/j.asoc.2021.107692

AUTHOR(S)

Calderon-Ramirez, Saul; Yang, Shengxiang; Moemeni, Armaghan; Elizondo, David; Colreavy-Donnelly, Simon; Chavarría-Estrada, Luis Fernando; Molina-Cabello, Miguel A

PUBLISHER(S)

Elsevier BV

TAG(S)

Computer Science
