Reviving the Past: Enhancing Language Models with Historical Text Optimization
2024
Metrics Details
- Usage: 118
- Downloads: 78
- Abstract Views: 40
Thesis / Dissertation Description
Recent advancements in Natural Language Processing (NLP) have brought attention to the significant potential for widespread applications of Large Language Models (LLMs). As demands and expectations for LLMs rise, ensuring efficiency and accuracy becomes paramount. Addressing these challenges requires more than optimizing current techniques; it calls for novel approaches to NLP as a whole. This study investigates data preprocessing methods designed to enhance LLM performance by mitigating inefficiencies rooted in natural language, particularly the complexities presented by historical texts. Using the classical text The Odyssey by Homer, two preprocessing techniques are introduced: tokenization of names and places, and substitution of outdated terms. After optimizing a Long Short-Term Memory (LSTM) network to perform well on the original text, the study examined how each technique influenced the model's efficiency and precision by analyzing training time and loss metrics. Tokenization significantly reduced training time by simplifying complex names and places, albeit with a slight degradation in output quality. Substitution of outdated terms not only decreased training time but also improved the model's comprehension. This study demonstrated novel preprocessing methods for improving the efficiency of LLMs, providing insight for future research and contributing to the ongoing mitigation of NLP challenges.
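
The thesis does not reproduce its preprocessing code here, but a minimal Python sketch of the two techniques described above might look as follows. The name list, the substitution dictionary, and the <ENT_n> placeholder format are illustrative assumptions for demonstration, not the study's actual vocabulary or implementation.

```python
import re

# Illustrative examples only; the thesis's full name list and
# substitution dictionary are not published in this abstract.
NAMES_AND_PLACES = ["Odysseus", "Telemachus", "Penelope", "Ithaca", "Calypso"]
OUTDATED_TERMS = {
    "thou": "you",
    "thee": "you",
    "thy": "your",
    "hither": "here",
    "whence": "from where",
}

def tokenize_names(text, names=NAMES_AND_PLACES):
    """Replace each proper name or place with a short placeholder token,
    so the model sees one simple symbol instead of a rare complex string."""
    for i, name in enumerate(names):
        text = re.sub(rf"\b{re.escape(name)}\b", f"<ENT_{i}>", text)
    return text

def substitute_outdated_terms(text, mapping=OUTDATED_TERMS):
    """Swap archaic words for modern equivalents (case-insensitive;
    replacements are lowercased for simplicity in this sketch)."""
    for old, new in mapping.items():
        text = re.sub(rf"\b{old}\b", new, text, flags=re.IGNORECASE)
    return text

if __name__ == "__main__":
    sample = "Thou shalt find Odysseus in Ithaca, whence Telemachus departed."
    print(substitute_outdated_terms(tokenize_names(sample)))
    # -> "you shalt find <ENT_0> in <ENT_3>, from where <ENT_1> departed."
```

Collapsing rare proper nouns into shared placeholder tokens shrinks the effective vocabulary the LSTM must model, which is consistent with the reduced training time the study reports for the tokenization step.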