Statistical Industry Classification

Citation data:

SSRN Electronic Journal

Publication Year:
2016
SSRN
SSRN Id:
2802753
DOI:
10.2139/ssrn.2802753
Author(s):
Kakushadze, Zura ; Yu, Willie
Publisher(s):
Elsevier BV
Tags:
industry classification; clustering; cluster numbers; machine learning; statistical risk models; industry risk factors; optimization; regression; mean-reversion; correlation matrix; factor loadings; principal components; hierarchical agglomerative clustering; k-means; statistical methods; multilevel
Article description
We give complete algorithms and source code for constructing (multilevel) statistical industry classifications, including methods for fixing the number of clusters at each level (and the number of levels). Under the hood, there are clustering algorithms (e.g., k-means). However, what should we cluster? Correlations? Returns? The answer turns out to be neither, and our backtests suggest that these details make a sizable difference. We also give an algorithm and source code for building "hybrid" industry classifications, which improve off-the-shelf "fundamental" industry classifications by applying our statistical industry classification methods to them. The presentation is intended to be pedagogical and geared toward practical applications in quantitative trading.
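To make the clustering idea concrete, here is a minimal single-level sketch in Python. It is not the authors' algorithm or source code. The abstract says that neither correlations nor returns should be clustered, but it does not say what should be, so the sketch assumes, purely for illustration, that each stock's return series is demeaned and scaled by its own standard deviation before clustering. The synthetic data, the function names (`standardize`, `farthest_point_init`, `kmeans`), the deterministic farthest-point initialization, and all parameter values are hypothetical choices made for this example.

```python
import random
import statistics

def standardize(series):
    """Demean a return series and scale it to unit standard deviation (an assumed feature choice)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [(x - mean) / std for x in series]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic init: start at the first point, then repeatedly add the point farthest from all centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels

# Synthetic data (hypothetical): six stocks, two hidden "industries",
# each driven by its own common factor plus idiosyncratic noise.
rng = random.Random(42)
n_days = 250
factors = [[rng.gauss(0.0, 0.01) for _ in range(n_days)] for _ in range(2)]
group = [0, 0, 0, 1, 1, 1]                # hidden industry of each stock
vol_scale = [1.0, 3.0, 0.5, 2.0, 0.7, 4.0]  # very different volatilities per stock
returns = [
    [scale * (factors[g][t] + rng.gauss(0.0, 0.005)) for t in range(n_days)]
    for g, scale in zip(group, vol_scale)
]

features = [standardize(r) for r in returns]
labels = kmeans(features, k=2)
print(labels)
```

In this toy setup the stocks that share a hidden factor should end up with the same label, even though their volatilities differ by almost an order of magnitude. A multilevel classification could, under the same assumptions, be sketched by re-running the clustering within each resulting cluster or on cluster-level aggregates; how the number of clusters and levels is fixed is not covered by this example.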