SciND: a new triplet-based dataset for scientific novelty detection via knowledge graphs
International Journal on Digital Libraries, ISSN: 1432-1300, Vol: 25, Issue: 4, Page: 639-659
2024
- 2 Citations
- 5 Captures
Article Description
Detecting texts that contain semantic-level new information is not straightforward, and the problem becomes more challenging for research articles. Over the years, many datasets and techniques have been developed for automatic novelty detection. However, the majority of existing textual novelty detection studies target general domains such as newswire, and a comprehensive dataset for scientific novelty detection is not available in the literature. In this paper, we present a new triplet-based corpus (SciND) for scientific novelty detection from research articles via knowledge graphs. The proposed dataset consists of three types of triplets: (i) triplets for the knowledge graph, (ii) novel triplets, and (iii) non-novel triplets. We build a domain-specific scientific knowledge graph for research articles using triplets across seven natural language processing (NLP) domains and extract novel triplets from papers published in 2021. For the non-novel triplets, we use blog post summaries of the research articles. We further provide a feature-based novelty detection scheme for research articles as a baseline and demonstrate the applicability of our proposed dataset using this baseline algorithm, which yields an F1 score of 72%. We present an analysis and discuss future directions for our proposed dataset. To the best of our knowledge, this is the first dataset for scientific novelty detection via a knowledge graph. We make our code and dataset publicly available at https://github.com/92Komal/Scientific_Novelty_Detection.
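The core idea described above, deciding whether an extracted (subject, relation, object) triplet is novel with respect to an existing knowledge graph, can be sketched minimally as set membership. Note this is an illustrative assumption, not the authors' actual pipeline; the triplet schema, example facts, and the `is_novel` function are hypothetical.

```python
# Minimal sketch of triplet-based novelty detection against a knowledge graph.
# The knowledge graph is modeled as a set of (subject, relation, object) triplets;
# a candidate triplet is "novel" if it is absent from the graph. All names and
# example facts below are illustrative, not taken from the SciND dataset.

def is_novel(triplet: tuple[str, str, str], kg: set[tuple[str, str, str]]) -> bool:
    """Return True if the triplet does not already appear in the knowledge graph."""
    return triplet not in kg

# A toy domain-specific knowledge graph for the NLP domain.
kg = {
    ("BERT", "is-a", "language model"),
    ("BERT", "used-for", "question answering"),
}

print(is_novel(("BERT", "used-for", "novelty detection"), kg))  # True: unseen fact
print(is_novel(("BERT", "is-a", "language model"), kg))         # False: already known
```

A real system would of course need fuzzy matching (entity normalization, relation paraphrases) rather than exact set lookup, which is presumably where the paper's feature-based baseline comes in.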
Bibliographic Details
Springer Science and Business Media LLC