A parametrized approach for linear regression of interval data

Citation data:

Knowledge-Based Systems, ISSN: 0950-7051, Vol: 131, Page: 149-159

Publication Year:
2017
DOI:
10.1016/j.knosys.2017.06.012
Author(s):
Leandro C. Souza; Renata M.C.R. Souza; Getúlio J.A. Amaral; Telmo M. Silva Filho
Publisher(s):
Elsevier BV
Tags:
Computer Science; Business, Management and Accounting; Decision Sciences
article description
Interval symbolic data is a complex data type that can often be obtained by summarizing large datasets. All existing linear regression approaches for interval data use certain fixed reference points to model intervals, such as midpoints, ranges and lower and upper bounds. This is a limitation, because different datasets might be better represented by different reference points. In this paper, we propose a new method for extracting knowledge from interval data. Our parametrized approach automatically extracts the best reference points from the regressor variables. These reference points are then used to build two linear regressions: one for the lower bounds of the response variable and another for its upper bounds. Before the regressions are applied, we compute a criterion to verify the mathematical coherence of predicted values. Mathematical coherence means that the upper bounds are greater than the lower bounds. If the criterion shows that the coherence is not guaranteed, we suggest the use of a novel interval Box-Cox transformation of the response variable. Experimental evaluations with synthetic and real interval datasets illustrate the advantages and the usefulness of the proposed method to perform interval linear regression.