Textual signatures: Identifying text-types using latent semantic analysis to measure the cohesion of text structures
- Citation data:
Natural Language Processing and Text Mining, Page: 107-122
- Computer Science; text-types; Information Storage and Retrieval; Computing Methodologies; Processor Architectures; Theory of Computation; Information Systems Applications; internet; Administrative Data Processing; Computer Sciences; Psychology; Speech and Rhetorical Studies
book chapter description
Just as a sentence is far more than a mere concatenation of words, a text is far more than a mere concatenation of sentences. Texts contain pertinent information that co-refers across sentences and paragraphs; texts contain relations between phrases, clauses, and sentences that are often causally linked; and texts that depend on relating a series of chronological events contain temporal features that help the reader to build a coherent representation of the text. We refer to textual features such as these as cohesive elements, and they occur within paragraphs (locally), across paragraphs (globally), and in forms such as referential, causal, temporal, and structural. But cohesive elements, and by consequence cohesion, do not simply feature in a text as dialogues tend to feature in narratives, or as cartoons tend to feature in newspapers. That is, cohesion is not present or absent in a binary or optional sense. Instead, cohesion in text exists on a continuum of presence, which is sometimes indicative of the text-type in question and sometimes indicative of the audience for which the text was written. In this chapter, we discuss the nature and importance of cohesion; we demonstrate a computational tool that measures cohesion; and, most importantly, we demonstrate a novel approach to identifying text-types by incorporating contrasting rates of cohesion. © 2007 Springer-Verlag London Limited.