Towards a Scientific Language Processing Model
2023
Artifact Description
Natural Language Processing is an effective tool for analyzing large volumes of text. However, most scientific articles contain sophisticated language that is difficult to understand quickly. To address this, I tuned a model that quickly classifies datasets of abstracts on scientific topics into specific subcategories. Using the ArXiv corpus of over 2.2 million abstracts, I created a dataset of climate change articles, on which I ran pretrained HuggingFace models. Using observational and quantitative measures (ROUGE, cosine similarity, etc.), I tuned the parameters of various keyword extraction models and analyzed the keyword frequency of the dataset. Then, using the BERTopic model with various embedding techniques (SentenceTransformers, spaCy, etc.), I grouped the dataset into clusters that could be analyzed individually. I applied abstractive and extractive summarization models to each cluster to concisely describe the general progress of particular climate change topics. Using dynamic topic modeling, I then plotted the prevalence of different topics over time, which provided insight into interest in climate change topics over the past decade. This weakly supervised approach allows analysts and researchers to quickly derive general conclusions about specific scientific topics and visualize their relevance in the scientific community over time.
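Two of the quantitative steps mentioned in the description, cosine similarity scoring and topic prevalence over time, can be illustrated with a minimal sketch. This is not the author's code: it swaps the embedding models named above for plain term-count vectors so that it runs with the Python standard library alone, and the function names and toy inputs are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def topic_prevalence_by_year(records):
    """Turn (year, topic_label) pairs into {year: {topic: share of that year}}."""
    counts = defaultdict(Counter)
    for year, topic in records:
        counts[year][topic] += 1
    return {
        year: {topic: n / sum(c.values()) for topic, n in c.items()}
        for year, c in sorted(counts.items())
    }


# Hypothetical toy inputs, for illustration only.
summary = Counter(tokenize("Sea level rise threatens coastal cities"))
abstract = Counter(tokenize("Coastal cities face accelerating sea level rise"))
score = cosine_similarity(summary, abstract)

labels = [(2020, "sea level"), (2020, "emissions"), (2021, "sea level")]
prevalence = topic_prevalence_by_year(labels)
```

In a pipeline like the one described, the term-count vectors would presumably be replaced by sentence embeddings, and the `(year, topic_label)` pairs would come from the topic model's cluster assignments rather than from a hand-written list.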