PlumX Metrics

Cleaning out web spam by entropy-based cascade outlier detection

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN: 1611-3349, Vol: 10439 LNCS, Page: 232-246
2017
  • Citations: 2
  • Usage: 0
  • Captures: 9
  • Mentions: 0
  • Social Media: 0

Conference Paper Description

Web spam refers to Web pages that use tricks to mislead search engines into ranking them higher than they deserve. It causes serious damage to e-commerce and Web users, and threatens Web security, so combating it is an urgent task. In this paper, Web quality and semantic measurements are integrated with content and link features to construct a more representative feature set. A cascade detection mechanism based on an entropy-based outlier mining (EOM) algorithm is proposed. The mechanism consists of three stages, each using a different feature group. Experiments on WEBSPAM-UK2007 show that the quality and semantic features effectively improve detection, and that the EOM algorithm outperforms many classic classification algorithms when the data are imbalanced. The cascade detection mechanism can clean out more spam.
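
The abstract does not spell out the EOM algorithm, so the following is only a minimal sketch of the general idea of an entropy-style outlier cascade, not the authors' method. It assumes features have already been discretized, scores each page by the summed surprisal (-log2 of the empirical frequency) of its feature values, and flags the top-scoring pages at each stage before passing the rest on. The names `surprisal_scores`, `cascade_detect`, and `top_k`, the toy rows, and the choice of one feature per stage are all hypothetical.

```python
import math
from collections import Counter


def surprisal_scores(rows, feature_idx):
    """Score each row by the summed surprisal (-log2 p) of its values
    over the selected feature columns; rare values give high scores."""
    n = len(rows)
    counts = {j: Counter(r[j] for r in rows) for j in feature_idx}
    return [
        sum(-math.log2(counts[j][r[j]] / n) for j in feature_idx)
        for r in rows
    ]


def cascade_detect(rows, feature_groups, top_k):
    """Run one outlier pass per feature group (one stage each).
    Rows flagged at a stage are removed before the next stage runs."""
    remaining = list(range(len(rows)))
    flagged = []
    for group in feature_groups:
        if not remaining:
            break
        subset = [rows[i] for i in remaining]
        scores = surprisal_scores(subset, group)
        order = sorted(range(len(subset)), key=lambda t: scores[t], reverse=True)
        hits = {remaining[t] for t in order[:top_k]}
        flagged.extend(sorted(hits))
        remaining = [i for i in remaining if i not in hits]
    return flagged


# Hypothetical toy data: (content feature, link feature, quality feature).
rows = [
    ("low", "few", "ok"),
    ("low", "few", "ok"),
    ("low", "few", "ok"),
    ("high", "few", "ok"),   # unusual content feature
    ("low", "many", "ok"),   # unusual link feature
    ("low", "few", "bad"),   # unusual quality feature
    ("low", "few", "ok"),
    ("low", "few", "ok"),
]

# Three stages, each looking at a different (assumed) feature group.
flagged = cascade_detect(rows, feature_groups=[[0], [1], [2]], top_k=1)
print(flagged)  # [3, 4, 5]
```

In this toy run each stage catches one page that looks ordinary on the other feature groups, which illustrates why a cascade over several feature groups could flag more pages than a single pass. A fixed `top_k` per stage is a simplification; a threshold on the score would be an equally plausible choice.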
