Multi-modal alignment via hyperbolic geometry

Citation Data

Page: 1-48

Publication Year: 2024

Thesis / Dissertation Description

Strong generalization to unseen domains is vital for deep neural networks. While existing methods have shown promising results without access to source domains, they mostly either rely on models that are extensively pre-trained on source domains or overlook the intricate hierarchical structures inherent in visual and textual features. These limitations can degrade performance, especially on datasets with many classes. To overcome them, we propose a novel approach that projects the model into hyperbolic space and employs geometric optimal transport to align cross-modal features in an unsupervised manner. Unlike Euclidean geometry, hyperbolic geometry is well suited to representing hierarchical data structures, which can facilitate understanding of diverse classes. To fully capture hierarchical information from text, we enrich the model with fine-grained concepts extracted from WordNet, enhancing its understanding of diverse classes. Extensive experiments on standard benchmarks demonstrate the superior performance of our method compared to strong baselines.
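
As a rough illustration of the two ingredients named in the description, the sketch below maps Euclidean features into a Poincare ball and computes an entropic optimal transport plan between two feature sets using hyperbolic distance as the cost. It is not the thesis's implementation, whose details are not given on this page. The function names (`expmap0`, `poincare_dist`, `sinkhorn`), the curvature parameter `c`, the regularization `reg`, the uniform marginals, and the random stand-in features are all assumptions made for illustration, and Sinkhorn iterations are swapped in here as one common way to approximate optimal transport.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin: Euclidean rows of v (n, d) -> Poincare ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0, eps=1e-9):
    """Pairwise geodesic distance between rows of x (n, d) and rows of y (m, d)."""
    x2 = np.sum(x * x, axis=-1)[:, None]
    y2 = np.sum(y * y, axis=-1)[None, :]
    diff2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    denom = np.maximum((1.0 - c * x2) * (1.0 - c * y2), eps)
    arg = 1.0 + 2.0 * c * diff2 / denom
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(c)

def sinkhorn(cost, reg=0.5, n_iter=200):
    """Entropic OT plan between uniform marginals (an assumption) for a given cost matrix."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

# Hypothetical stand-ins for image and text features (random, for illustration only).
rng = np.random.default_rng(0)
image_feats = expmap0(0.3 * rng.normal(size=(6, 8)))
text_feats = expmap0(0.3 * rng.normal(size=(4, 8)))

cost = poincare_dist(image_feats, text_feats)
plan = sinkhorn(cost)                # plan[i, j]: mass moved from image i to text j
```

Each row of `plan` can be read as a soft assignment of one image feature to the text features; how such a plan would feed into a training objective is left open here, since the page does not describe it.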

Bibliographic Details

REPOSITORY URL: https://ink.library.smu.edu.sg/etd_coll/651

URL ID: https://ink.library.smu.edu.sg/etd_coll/651; https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1649&context=etd_coll

AUTHOR(S)

Suyu LIU

PUBLISHER(S)

Singapore Management University
