CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision

Citation Data: Bioinformatics, ISSN: 1460-2059, Vol: 36, Issue: 1, Page: 264-271

Publication Year: 2020

Citations: 17
Usage: 0
Captures: 60
Mentions: 0
Social Media: 0


Metrics Details

Citations: 17
- Citation Indexes: 17
Captures: 60
- Readers: 60

Article Description

Motivation: Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities. Most existing approaches based on natural language processing extract relations from single sentence-level co-mentions, ignoring co-occurrence statistics over the whole corpus. Existing approaches counting entity co-occurrences ignore the textual context of each co-occurrence.

Results: We propose a novel corpus-wide co-occurrence scoring approach to relation extraction that takes the textual context of each co-mention into account. Our method, called CoCoScore, scores the certainty of stating an association for each sentence that co-mentions two entities. CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest. Instead of requiring a manually annotated training corpus, co-mentions are labeled as positives/negatives according to their presence/absence in the gold standard. We show that CoCoScore outperforms previous approaches in identifying human disease-gene and tissue-gene associations as well as in identifying physical and functional protein-protein associations in different species. CoCoScore is a versatile text mining tool to uncover pairwise associations via co-occurrence mining, within and beyond biomedical applications.
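The distant-supervision labeling step described in the abstract can be illustrated with a minimal sketch. This is not the CoCoScore implementation: the `CoMention` type, the `label_co_mentions` function, and the example sentences and gold-standard pair are all hypothetical, chosen only to show how co-mentions might be labeled by presence or absence of their entity pair in a gold standard.

```python
from typing import NamedTuple


class CoMention(NamedTuple):
    """One sentence that mentions two entities (hypothetical structure)."""
    sentence: str
    entity_a: str
    entity_b: str


def label_co_mentions(co_mentions, gold_standard):
    """Label a co-mention 1 if its unordered entity pair is in the gold
    standard, else 0. No manual annotation of sentences is involved."""
    labeled = []
    for cm in co_mentions:
        pair = frozenset((cm.entity_a, cm.entity_b))
        labeled.append((cm, 1 if pair in gold_standard else 0))
    return labeled


# Hypothetical gold standard of known associations, stored as unordered pairs.
gold_standard = {frozenset(("BRCA1", "breast cancer"))}

# Hypothetical co-mentions, made up for illustration.
co_mentions = [
    CoMention("BRCA1 mutations increase the risk of breast cancer.",
              "BRCA1", "breast cancer"),
    CoMention("TP53 was measured in asthma samples.", "TP53", "asthma"),
    CoMention("Breast cancer samples were not tested for BRCA1.",
              "breast cancer", "BRCA1"),
]

labeled = label_co_mentions(co_mentions, gold_standard)
labels = [label for _, label in labeled]
print(labels)
```

In this sketch the third sentence is labeled positive even though it does not state an association, because labels come from the entity pair alone. That kind of label noise is inherent to distant supervision, and a sentence-level scoring model would be trained on labels of this form.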

Bibliographic Details

DOI: 10.1093/bioinformatics/btz490

PMID: 31199464

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85077791150&origin=inward; http://dx.doi.org/10.1093/bioinformatics/btz490; http://www.ncbi.nlm.nih.gov/pubmed/31199464; https://academic.oup.com/bioinformatics/article/36/1/264/5519116; https://dx.doi.org/10.1093/bioinformatics/btz490

AUTHOR(S)

Alexander Junge; Lars Juhl Jensen; Jonathan Wren

PUBLISHER(S)

Oxford University Press (OUP)

TAG(S)

Mathematics; Biochemistry, Genetics and Molecular Biology; Computer Science
