Activist: A New Framework for Dataset Labelling

Conference papers

Publication Year:
2016
Repository URL:
http://arrow.dit.ie/scschcomcon/199
DOI:
10.21427/d7qk8m
Author(s):
O'Neill, Jack; Delany, Sarah Jane; MacNamee, Brian
Publisher(s):
Dublin Institute of Technology
Tags:
Active Learning; Labelling; Software; Semi-Supervised Learning; Computer Sciences; Computer Engineering
Abstract:
Acquiring labels for large datasets can be a costly and time-consuming process. This has motivated the development of semi-supervised learning, which uses unlabelled data in conjunction with a small amount of labelled data to infer the correct labels of a partially labelled dataset. Active learning is one of the most successful approaches to semi-supervised learning, and has been shown to reduce the cost and time taken to produce a fully labelled dataset. In this paper we present Activist, a free, online, state-of-the-art platform which leverages active learning techniques to improve the efficiency of dataset labelling. Using a simulated crowd-sourced label-gathering scenario on a number of datasets, we show that the Activist software can speed up label acquisition and ultimately reduce its cost.
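For readers unfamiliar with active learning, the following is a minimal sketch of a pool-based labelling loop using uncertainty sampling, a common query strategy. It is not Activist's implementation, and the abstract does not say which query strategy Activist uses. The toy nearest-centroid model and the names `predict_proba`, `most_uncertain`, `active_labelling`, and `oracle` are all assumptions made for illustration, with `oracle` standing in for a simulated annotator.

```python
import math


def predict_proba(x, labelled):
    """Toy model (an assumption): estimate P(class 1) for a 1-D point from
    its distances to the two class centroids. Needs both classes labelled."""
    xs0 = [v for v, y in labelled.items() if y == 0]
    xs1 = [v for v, y in labelled.items() if y == 1]
    c0 = sum(xs0) / len(xs0)
    c1 = sum(xs1) / len(xs1)
    d0, d1 = abs(x - c0), abs(x - c1)
    # Closer to the class-1 centroid gives a probability nearer to 1.
    return 1.0 / (1.0 + math.exp(d1 - d0))


def most_uncertain(pool, labelled):
    """Uncertainty sampling: pick the point whose prediction is nearest 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x, labelled) - 0.5))


def active_labelling(pool, labelled, oracle, budget):
    """Query the oracle for at most `budget` labels, most uncertain first.
    Returns the enlarged labelled set and the still-unlabelled pool."""
    pool = list(pool)
    labelled = dict(labelled)
    for _ in range(min(budget, len(pool))):
        x = most_uncertain(pool, labelled)
        labelled[x] = oracle(x)  # simulated annotator supplies the label
        pool.remove(x)
    return labelled, pool


# Hypothetical data: the simulated annotator labels points >= 5.0 as class 1.
seed = {0.0: 0, 10.0: 1}
unlabelled = [0.5, 1.0, 4.0, 5.5, 9.0, 9.5]
labelled, remaining = active_labelling(
    unlabelled, seed, oracle=lambda x: int(x >= 5.0), budget=2
)
print(labelled)
print(remaining)
```

With a budget of two queries, the loop spends its labels on the points nearest the current decision boundary rather than on points the model already classifies with confidence, which is the intuition behind the cost savings described in the abstract.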