Emphasizing personal information for Author Profiling: New approaches for term selection and weighting

Citation data:

Knowledge-Based Systems, ISSN: 0950-7051, Vol: 145, Page: 169-181

Publication Year:
2018
Captures 15
Readers 15
Social Media 89
Shares, Likes & Comments 86
Tweets 3
Citations 3
Citation Indexes 3
DOI:
10.1016/j.knosys.2018.01.014
Author(s):
Rosa María Ortega-Mendoza; A. Pastor López-Monroy; Anilu Franco-Arcega; Manuel Montes-y-Gómez
Publisher(s):
Elsevier BV
Tags:
Computer Science; Business, Management and Accounting; Decision Sciences
article description
The Author Profiling (AP) task aims to predict specific profile characteristics of authors by analyzing their written documents. Its relevance has grown thanks to several applications in computer forensics, security and marketing. Most previous contributions in AP have been devoted to determining a suitable set of features to model the writing profile of authors. In social media, however, this task is challenging because communication is informal. We therefore present a novel approach based on the idea that terms located in phrases exposing personal information have special value for discriminating the author’s profile. The aim of this work is to emphasize the value of such personal phrases through two new proposals, a feature selection method and a term weighting scheme, both based on a novel measure called Personal Expression Intensity (PEI), which scores the quantity of personal information revealed by a term. To evaluate these ideas, we report experimental results on age and gender prediction of social media users on six different collections. Average improvements of 7.34% and 5.76% for age and gender classification, respectively, were obtained compared with the best state-of-the-art result, indicating that selecting and weighting terms according to personal phrases plays a key role in the AP task.
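The abstract does not give the PEI formula, so the following is only a rough sketch of the general idea and not the authors' measure. It assumes, purely for illustration, that a sentence is "personal" when it contains a first-person pronoun, and it scores each term by the share of its occurrences that fall inside such sentences. The names `personal_share`, `select_terms`, and `weight_document` are hypothetical. The same score is then used in the two ways the abstract describes: to select terms and to weight them.

```python
import re
from collections import Counter

# Assumption (not from the paper): a sentence is "personal" if it
# contains one of these first-person pronouns.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on ., ! and ?."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def personal_share(documents):
    """Hypothetical stand-in for PEI: for each term, the fraction of its
    occurrences that appear inside personal sentences."""
    total = Counter()
    personal = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            tokens = tokenize(sentence)
            is_personal = any(t in FIRST_PERSON for t in tokens)
            for t in tokens:
                total[t] += 1
                if is_personal:
                    personal[t] += 1
    return {t: personal[t] / total[t] for t in total}


def select_terms(scores, k):
    """Feature selection: keep the k highest-scoring terms
    (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def weight_document(doc, scores):
    """Term weighting: raw term frequency scaled by the term's score."""
    counts = Counter(tokenize(doc))
    return {t: c * scores.get(t, 0.0) for t, c in counts.items()}


docs = [
    "I love my guitar. The weather is nice.",
    "My guitar is old. The news is on.",
]
scores = personal_share(docs)
top_terms = select_terms(scores, 2)
weights = weight_document("guitar guitar weather", scores)
```

In this toy corpus, "guitar" only ever occurs in first-person sentences and so receives the maximum score, while "weather" occurs only in impersonal ones and is weighted to zero. A real system would need a far more careful notion of a personal phrase than this pronoun check.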