Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach

Chen, Wei
logistic regression; shrinkage method; rheumatoid arthritis clinical trial data
This paper analyzes a clinical dataset for a rheumatoid arthritis (RA) medicine, with an ordinal response, in order to study this new medicine. The dataset contains four features: sex, age, treatment, and preliminary. Sex is a binary categorical variable, with 1 indicating male and 0 indicating female. Age is the patient’s numerical age. Treatment is a binary categorical variable, with 1 indicating has RA and 0 indicating does not have RA. Preliminary is a five-class categorical variable indicating the patient’s RA severity before taking the medication. The response Y is a five-class ordinal variable indicating the patient’s RA severity after taking the medication. The primary aim of this study is to determine which factors play a significant role in determining the response after taking the medicine. First, cumulative logistic regression is applied to the dataset to examine the effect of the various factors on the ordinal response. Second, the ordinal response is collapsed into two classes, and logistic regression is fitted to the RA dataset to see whether the variable selection differs. In addition, two shrinkage methods, elastic net and lasso, are used for variable selection on the RA dataset with the two-class response, adding penalization to increase the model’s robustness. The results of the four models are compared at the end of the paper. Based on P-values, the comparison indicates that logistic regression performs better at variable selection than the other three approaches.
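For orientation, the models named in the abstract can be written in a common textbook form. This is only a sketch: the abstract does not give the paper's notation, sign convention, or penalty parametrization, so the symbols below (cutpoints \theta_j, coefficients \beta, dichotomized response Y^*, tuning parameters \lambda and \alpha) are assumptions. The sketch assumes the proportional-odds form of the cumulative logit model and a glmnet-style elastic net penalty.

```latex
% Cumulative (proportional-odds) logit model for the 5-class ordinal response Y.
% x = (sex, age, treatment, preliminary); the cutpoints \theta_j are assumed notation.
\[
  \operatorname{logit} P(Y \le j \mid x) = \theta_j + x^{\top}\beta,
  \qquad j = 1,\dots,4, \quad \theta_1 < \theta_2 < \theta_3 < \theta_4 .
\]

% Binary logistic regression after collapsing Y into a two-class response Y^*.
\[
  \operatorname{logit} P(Y^{*} = 1 \mid x) = \beta_0 + x^{\top}\beta .
\]

% Elastic net penalized logistic regression (assumed glmnet-style parametrization).
% \ell is the binomial log-likelihood, n the sample size, \lambda \ge 0, 0 \le \alpha \le 1.
\[
  (\hat\beta_0, \hat\beta) = \arg\min_{\beta_0,\,\beta}
  \left\{ -\frac{1}{n}\,\ell(\beta_0, \beta)
  + \lambda \left[ \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right] \right\} .
\]
```

Under this parametrization the lasso is the special case \alpha = 1. A coefficient shrunk exactly to zero corresponds to a variable dropped from the model, which is how the penalized fits perform variable selection.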