Ranking comments: An Entropy-based Method with Word Embedding Clustering

Publication Year: 2020

Article Description

Automatically ranking comments by relevance plays an important role in text mining and text summarization. In this thesis, we first introduce a new text digitalization method: the bag of word clusters model. Unlike the traditional bag of words model, which treats each word as an independent item, we group semantically related words into clusters using pre-trained word2vec word embeddings and represent each comment as a distribution over word clusters. This representation captures both semantic and statistical information from the text. Next, we propose an unsupervised ranking algorithm that identifies relevant comments by their distance to the “ideal” comment. The “ideal” comment is the maximum general entropy comment with respect to the global word cluster distribution. The intuition is that the “ideal” comment highlights the aspects of a product that many other comments mention, so it can serve as a standard for judging a comment’s relevance to that product. Finally, we analyze our algorithm’s performance on a real Amazon product.
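
The two steps in the abstract (representing a comment as a distribution over word clusters, then ranking by distance to a reference comment) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `WORD_TO_CLUSTER` mapping is hard-coded rather than derived from clustering word2vec vectors, the function names are hypothetical, and the reference point is simply the pooled cluster distribution of all comments with Euclidean distance. It is a stand-in for, not a reproduction of, the thesis's maximum general entropy "ideal" comment.

```python
import math

# Hypothetical word -> cluster id mapping. In the thesis the clusters come from
# pre-trained word2vec embeddings; here they are hard-coded for illustration.
WORD_TO_CLUSTER = {
    "battery": 0, "charge": 0, "power": 0,
    "screen": 1, "display": 1,
    "price": 2, "cheap": 2, "expensive": 2,
}
N_CLUSTERS = 3


def cluster_distribution(tokens, word_to_cluster=WORD_TO_CLUSTER, n_clusters=N_CLUSTERS):
    """Map a tokenized comment to a probability distribution over word clusters."""
    counts = [0.0] * n_clusters
    for tok in tokens:
        cid = word_to_cluster.get(tok)
        if cid is not None:  # words without a cluster are ignored
            counts[cid] += 1.0
    total = sum(counts)
    # A comment with no known words stays an all-zero vector.
    return [c / total for c in counts] if total > 0 else counts


def rank_comments(comments):
    """Rank comments by Euclidean distance to a reference distribution.

    Assumption: the reference is the pooled cluster distribution of all
    comments, used as a simple proxy for the thesis's "ideal" comment.
    """
    tokenized = [c.lower().split() for c in comments]
    reference = cluster_distribution([t for toks in tokenized for t in toks])
    scored = [
        (text, math.dist(cluster_distribution(toks), reference))
        for text, toks in zip(comments, tokenized)
    ]
    # Smaller distance means closer to the reference, hence ranked higher.
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for text, dist in rank_comments(
        ["battery screen price", "price price", "nothing relevant"]
    ):
        print(f"{dist:.3f}  {text}")
```

In this toy setup a comment that touches several frequently mentioned clusters lands closer to the reference than a comment that dwells on one cluster or mentions none, which mirrors the intuition stated in the abstract.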

Bibliographic Details

REPOSITORY URL: https://ir.lib.uwo.ca/etd/7300

URL ID: https://ir.lib.uwo.ca/etd/7300; https://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=9798&context=etd

AUTHOR(S)

Yuyang Zhang
