Clustering validation by distribution hypothesis learning

Citation Data: Statistics and Computing, ISSN: 1573-1375, Vol: 34, Issue: 6

Publication Year: 2024

Most Recent News

New Data from National Scientific and Technical Research Council (CONICET) Illuminate Findings in Statistics and Computing (Clustering Validation By Distribution Hypothesis Learning)

December 4, 2024
Computer News Today

2024 DEC 04 (NewsRx) -- By a News Reporter-Staff News Editor at Computer News Today -- Data detailed on Statistics and Computing have been presented.

Article Description

We present a new clustering validation technique named “Hypothesis Learning”. We build our method on three concepts: (1) clustering cohesion, (2) clustering dispersion, and (3) hypothesis quality. The first two notions focus on individual cluster quality. We measure them using a classifier that estimates tightness and separation as a likelihood. The third notion evaluates the complexity of learning the clustering partition; as with cohesion and dispersion, it yields a likelihood value. We then aggregate these three measures into a single index that reports clustering quality. Previous methods in the literature have already used supervised and unsupervised algorithms and stability concepts to validate clustering solutions. Our motivation is not only to improve on these methods but also to use learning algorithms in a novel manner to learn key clustering concepts such as cohesion and dispersion. Furthermore, we include a technical discussion of how to regularize a classifier to handle overfitting, thus explaining the symbiosis between supervised and unsupervised algorithms. In our experimental setup, we tested “Hypothesis Learning” with a fast classifier, K Nearest Neighbour (KNN). In the discussion of the method, however, we also explore other classifiers such as CART and Random Forest. The experimental results compare our approach with a similar method and with many other well-known clustering indexes.
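
The abstract does not give the formulas for the three measures, so the following is only a rough illustration of the general idea of scoring a partition by how easily a classifier can learn it. It is a minimal sketch, not the index defined in the article: the function name `knn_learnability`, the leave-one-out scheme, the majority vote, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_learnability(points, labels, k=3):
    """Leave-one-out KNN accuracy of predicting each point's cluster label.

    Illustrative proxy only (an assumption of this sketch): values near 1.0
    mean the partition is easy for KNN to learn. This is NOT the
    "Hypothesis Learning" index described in the article.
    """
    n = len(points)
    if n != len(labels):
        raise ValueError("points and labels must have the same length")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    correct = 0
    for i in range(n):
        # Distances from point i to every other point (point i is held out).
        neighbours = sorted(
            (math.dist(points[i], points[j]), labels[j])
            for j in range(n)
            if j != i
        )
        # Majority vote among the k nearest neighbours.
        votes = Counter(label for _, label in neighbours[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / n


# Two well-separated groups in the plane (made-up toy data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.3, 5.3)]
good_partition = [0, 0, 0, 0, 1, 1, 1, 1]   # matches the two groups
bad_partition = [0, 1, 0, 1, 0, 1, 0, 1]    # ignores the geometry

good_score = knn_learnability(points, good_partition)
bad_score = knn_learnability(points, bad_partition)
```

On this toy data the partition that follows the two groups scores 1.0 and the alternating partition scores 0.0, which shows in the simplest possible way why a hard-to-learn labelling can be read as a sign of poor clustering.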

Bibliographic Details

DOI: 10.1007/s11222-024-10511-8

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85206245075&origin=inward; http://dx.doi.org/10.1007/s11222-024-10511-8; https://link.springer.com/10.1007/s11222-024-10511-8; https://dx.doi.org/10.1007/s11222-024-10511-8; https://link.springer.com/article/10.1007/s11222-024-10511-8

AUTHOR(S)

Ariel E. Bayá; Mónica G. Larese

PUBLISHER(S)

Springer Science and Business Media LLC

TAG(S)

Mathematics; Decision Sciences; Computer Science
