PlumX Metrics

Collective Human Opinions in Semantic Textual Similarity

Transactions of the Association for Computational Linguistics, ISSN: 2307-387X, Vol: 11, Page: 997-1013
2023
  • Citations: 5
  • Usage: 205
  • Captures: 8
  • Mentions: 2
  • Social Media: 0


Article Description

Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as the gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ∼15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgments adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.
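
The point about averaging can be illustrated with a minimal sketch. The ratings below are hypothetical, invented for illustration and not drawn from USTS; the 0-5 scale and the helper name `summarize` are assumptions, and ten ratings per pair is chosen only to match the roughly ten labels per pair implied by the dataset size. Two sentence pairs receive the same averaged label even though annotators agree on one and split sharply on the other.

```python
from statistics import mean, pstdev

# Hypothetical ratings on an assumed 0-5 similarity scale, ten annotators each.
# These values are made up for illustration and are not taken from USTS.
high_agreement = [2, 3, 3, 2, 3, 2, 3, 2, 3, 2]
low_agreement = [0, 5, 0, 5, 1, 4, 0, 5, 5, 0]


def summarize(ratings):
    """Return the averaged label and the spread that averaging discards."""
    return mean(ratings), pstdev(ratings)


for name, ratings in [("high agreement", high_agreement),
                      ("low agreement", low_agreement)]:
    avg, spread = summarize(ratings)
    print(f"{name}: mean={avg:.2f}, std={spread:.2f}")
```

Both lists average to the same score, so a benchmark that stores only the mean cannot tell them apart; the standard deviation separates them. The second list is also bimodal, which suggests why a single Gaussian might fit such judgments poorly, though this toy example does not reproduce the paper's analysis.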
