
Outlier detection for questionnaire data in biobanks

International Journal of Epidemiology, ISSN: 1464-3685, Vol: 48, Issue: 4, Page: 1305-1315, 2019
  • 10
    Citations
  • 0
    Usage
  • 29
    Captures
  • 0
    Mentions
  • 21
    Social Media

Metrics Details

  • Citations
    10
  • Captures
    29
  • Social Media
    21
    • Shares, Likes & Comments
      21
      • Facebook
        21

Article Description

Background: Biobanks increasingly collect, process and store omics with more conventional epidemiologic information necessitating considerable effort in data cleaning. An efficient outlier detection method that reduces manual labour is highly desirable.

Method: We develop an unsupervised machine-learning method for outlier detection, namely kurPCA, that uses principal component analysis combined with kurtosis to ascertain the existence of outliers. In addition, we propose a novel regression adjustment approach to improve detection, namely the regression adjustment for data by systematic missing patterns (RAMP).

Result: Application to epidemiological record data in a large-scale biobank (Tohoku Medical Megabank Organization, Japan) shows that a combination of kurPCA and RAMP effectively detects known errors or inconsistent patterns.

Conclusions: We confirm through the results of the simulation and the application that our methods showed good performance. The proposed methods are useful for many practical analysis scenarios.
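
The abstract names the two ingredients of kurPCA (principal component analysis and kurtosis) but does not say how they are combined. The sketch below is therefore only one plausible reading, not the authors' algorithm: it projects the data onto the leading principal components, treats a component whose scores are heavy-tailed (high excess kurtosis) as evidence that outliers exist, and flags the records with extreme scores on that component. The function names, the thresholds `kurt_cut` and `z_cut`, and the power-iteration PCA are all assumptions made for illustration. RAMP is not sketched, because the abstract gives no detail on it.

```python
import math
import random


def center(rows):
    """Subtract the column mean from every column."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(xc):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(xc), len(xc[0])
    return [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_components(cov, k, iters=500, seed=0):
    """Leading k eigenvectors via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]
    vecs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p))
                  for i in range(p))
        vecs.append(v)
        # Deflate so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)]
             for i in range(p)]
    return vecs


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (about 0 for Gaussian data)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    if m2 == 0:
        return 0.0
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / (m2 * m2) - 3.0


def flag_outliers(rows, k=2, kurt_cut=1.0, z_cut=4.0):
    """Return sorted row indices flagged on heavy-tailed components.

    kurt_cut and z_cut are illustrative thresholds, not values from the paper.
    """
    xc = center(rows)
    flagged = set()
    for v in top_components(covariance(xc), k):
        scores = [sum(a * b for a, b in zip(r, v)) for r in xc]
        if excess_kurtosis(scores) > kurt_cut:
            # Scores have mean zero because the data were centered.
            sd = math.sqrt(sum(s * s for s in scores) / len(scores))
            flagged.update(i for i, s in enumerate(scores)
                           if abs(s) > z_cut * sd)
    return sorted(flagged)
```

Calling `flag_outliers(rows)` on a list of equal-length numeric records returns the indices of the suspect rows. Real questionnaire data would need missing values and categorical items handled first, which this sketch does not attempt.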

Bibliographic Details

Sakurai, Rieko; Ueki, Masao; Makino, Satoshi; Hozawa, Atsushi; Kuriyama, Shinichi; Takai-Igarashi, Takako; Kinoshita, Kengo; Yamamoto, Masayuki; Tamiya, Gen

Oxford University Press (OUP)

Medicine
