Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval

Citation Data: Transactions of the Association for Computational Linguistics, ISSN: 2307-387X, Vol: 12, Pages: 1197-1213

Publication Year: 2024

Citations: 0
Usage: 0
Captures: 1
Mentions: 1
Social Media: 0


Metrics Details

Captures: 1
- Readers: 1
Mentions: 1
- News Mentions: 1

Most Recent News

Reports Outline Computational Linguistics Findings from Tel Aviv University (Retrieval-pretrained Transformer: Long-range Language Modeling With Self-retrieval)

October 30, 2024
Middle East Daily

2024 OCT 30 (NewsRx) -- By a News Reporter-Staff News Editor at Middle East Daily -- Research findings on Linguistics - Computational Linguistics are discussed

Article Description

Retrieval-augmented language models (LMs) have received much attention recently. However, the retriever is typically not trained jointly as a native component of the LM but is added post hoc to an already-pretrained LM, which limits the ability of the LM and the retriever to adapt to one another. In this work, we propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch and applying it to the task of modeling long texts. Given a recently generated text chunk in a long document, the LM computes query representations, which are then used to retrieve earlier chunks in the document, potentially located tens of thousands of tokens earlier. Information from retrieved chunks is fused into the LM representations to predict the next target chunk. We train the retriever component with a semantic objective, where the goal is to retrieve chunks that increase the probability of the next chunk according to a reference LM. We evaluate RPT on four long-range language modeling tasks, spanning books, code, and mathematical writing, and demonstrate that RPT improves retrieval quality and, in turn, perplexity across the board compared to strong baselines.
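The semantic retrieval objective can be illustrated with a minimal, standard-library-only sketch. The code is built on several assumptions that are not taken from the paper. The "reference LM" is a hypothetical add-one smoothed unigram model. A candidate chunk's gain is defined as the reference log-probability of the target chunk with the candidate prepended to the query chunk, minus the log-probability without it. The retriever loss is a swapped-in softmax cross-entropy. All function names are illustrative. RPT itself computes queries and keys from learned LM representations and may define its scoring and training loss differently.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Chunk = List[str]
LogProbFn = Callable[[Sequence[str], Sequence[str]], float]


def split_into_chunks(tokens: Sequence[str], chunk_size: int) -> List[Chunk]:
    """Split a long token sequence into consecutive fixed-size chunks."""
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def make_toy_reference_lm(vocab_size: int) -> LogProbFn:
    """Hypothetical stand-in for a reference LM: an add-one smoothed unigram
    model fitted to the context alone, returning log p(target | context)."""
    def logprob(target: Sequence[str], context: Sequence[str]) -> float:
        counts = Counter(context)
        denom = len(context) + vocab_size
        return sum(math.log((counts[tok] + 1) / denom) for tok in target)
    return logprob


def semantic_gains(chunks: List[Chunk], t: int, logprob: LogProbFn) -> List[float]:
    """For target chunk t, score each chunk before the query chunk t-1 by how much
    prepending it to the query chunk raises the reference log-prob of chunk t
    (assumed gain definition, for illustration only)."""
    query_chunk, target = chunks[t - 1], chunks[t]
    base = logprob(target, query_chunk)
    return [logprob(target, chunks[i] + query_chunk) - base for i in range(t - 1)]


def log_softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def retriever_loss(scores: Sequence[float], gains: Sequence[float],
                   temperature: float = 1.0) -> float:
    """Swapped-in objective (not necessarily RPT's): cross-entropy between the
    softmax of the gains (target) and the softmax of the retriever's scores."""
    target = [math.exp(lp) for lp in log_softmax([g / temperature for g in gains])]
    return -sum(p * lq for p, lq in zip(target, log_softmax(scores)))


def retrieve(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring earlier chunks; ties keep document order."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    tokens = ("alpha beta gamma delta red green blue pink one two three four "
              "x y z w red green blue x").split()
    chunks = split_into_chunks(tokens, 4)
    lm = make_toy_reference_lm(vocab_size=len(set(tokens)))
    gains = semantic_gains(chunks, t=4, logprob=lm)
    print([round(g, 3) for g in gains])
    print(retrieve(gains, k=1))
```

In the toy document, the query chunk (`x y z w`) shares no tokens with the useful chunk (`red green blue pink`). That chunk is still the only one with a positive gain, because it raises the probability of the target chunk. This is the intuition behind supervising the retriever with a reference LM rather than with lexical overlap. A retriever whose scores rank that chunk highest incurs a lower `retriever_loss`.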

Bibliographic Details

DOI: 10.1162/tacl_a_00693

URL ID:
- http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85206360071&origin=inward
- http://dx.doi.org/10.1162/tacl_a_00693
- https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00693/124629/Retrieval-Pretrained-Transformer-Long-range

AUTHOR(S)

Ohad Rubin; Jonathan Berant

PUBLISHER(S)

MIT Press

TAG(S)

Social Sciences; Computer Science
