TACKLING SIMPSON'S PARADOX IN BIG DATA USING CLASSIFICATION & REGRESSION TREES

Publication Year: 2014

Artifact Description

This work aims to find potential Simpson's paradoxes in Big Data. Simpson's paradox (SP) arises when choosing the level of data aggregation for causal inference. It describes the phenomenon in which the direction of a cause's effect is reversed when examining the aggregate versus the disaggregates of a sample or population. The practical decision-making dilemma that SP raises is which level of data aggregation presents the right answer.

We propose a tree-based approach for detecting SP in data. Classification and regression trees are popular predictive algorithms that capture relationships between an outcome and a set of inputs. They are used for record-level predictions and for variable selection. We introduce a novel usage for a cause-and-effect scenario with potential confounding variables. A tree is used to capture the relationship between the effect and the set consisting of the cause and the potential confounders. We show that the tree structure determines whether a paradox is possible. The resulting tree graphically displays potential confounders and the confounding direction, allowing researchers or decision makers to identify potential SPs to be further investigated with a causal toolkit. We illustrate our SP detection approach using real data, for both a single confounder and multiple confounders, in a large dataset on kidney transplant waiting time.
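The reversal described in the abstract can be made concrete with a small sketch. The code below is not the authors' tree algorithm: it is a hand-rolled check that behaves like a tree whose first split is on a single confounder, comparing the direction of the cause's effect in the aggregate with its direction inside each resulting group. The counts are the widely cited kidney-stone treatment figures, used here only as an illustration (they are not the kidney transplant waiting-time data the paper analyzes), and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative counts only (classic kidney-stone example, NOT the paper's data):
# (cause, confounder level) -> (successes, trials)
COUNTS = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def aggregate_rates(counts):
    """Success rate per cause, pooling over all confounder levels."""
    totals = defaultdict(lambda: [0, 0])
    for (cause, _level), (successes, trials) in counts.items():
        totals[cause][0] += successes
        totals[cause][1] += trials
    return {cause: s / n for cause, (s, n) in totals.items()}


def stratum_rates(counts):
    """Success rate per cause within each confounder level."""
    strata = defaultdict(dict)
    for (cause, level), (successes, trials) in counts.items():
        strata[level][cause] = successes / trials
    return dict(strata)


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def detect_reversal(counts, treated="A", control="B"):
    """Flag a potential Simpson's paradox for one confounder.

    Returns the aggregate effect direction, the direction within each
    confounder level, and whether every level contradicts the aggregate.
    """
    agg = aggregate_rates(counts)
    agg_sign = sign(agg[treated] - agg[control])
    strata_signs = {
        level: sign(rates[treated] - rates[control])
        for level, rates in stratum_rates(counts).items()
    }
    reversed_everywhere = agg_sign != 0 and all(
        s == -agg_sign for s in strata_signs.values()
    )
    return agg_sign, strata_signs, reversed_everywhere


if __name__ == "__main__":
    print(detect_reversal(COUNTS))
```

In this illustration the aggregate comparison favors B while both confounder levels favor A, so the check flags a potential paradox. As the abstract notes, such a flag is only a candidate for further investigation with causal tools; it does not say which level of aggregation gives the right answer.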

Bibliographic Details

REPOSITORY URL: https://aisel.aisnet.org/ecis2014/proceedings/track08/2

URL ID: https://aisel.aisnet.org/ecis2014/proceedings/track08/2; https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1272&context=ecis2014

AUTHOR(S)

Galit Shmueli; Inbal Yahav
