Newcastle University author(s): Dr Hermann Moisl
This is the author's accepted manuscript of a book chapter that has been published in its final definitive form by Peter Lang, 2009.
For re-use rights please refer to the publisher's terms and conditions.
The proliferation of computational technology has led to an explosive growth in the production of electronically encoded information of all kinds. In the face of this, traditional philological methods for searching and interpreting data have been overwhelmed by its sheer volume, and a variety of computational methods have been developed in an attempt to make the deluge tractable. These developments have clear implications for corpus-based linguistics in general, and for corpus-based study of historical dialectology in particular: as more and larger historical text corpora become available, effective analysis of them will increasingly be feasible only by adapting the interpretative methods developed by the statistical, information retrieval, pattern recognition, and related communities. To use such analytical methods effectively, however, one must understand the issues that arise when data are abstracted from corpora. This paper addresses an issue that has a fundamental bearing on the validity of analytical results based on such data: variation in document length. The discussion is in four main parts. The first shows how a particular class of computational methods, exploratory multivariate analysis, can be used in historical dialectology research; the second explains why variation in document length can be a problem in such analysis; the third proposes document length normalization as a solution to that problem; and the fourth points out some difficulties associated with document length normalization.
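The abstract names document length normalization but does not give its formulation. The following is a minimal Python sketch of why length variation matters for distance-based multivariate analysis. It assumes one common form of normalization: each document's raw feature frequencies are scaled by the ratio of the mean document length to that document's length. This is not necessarily the exact procedure the chapter adopts. The function names, feature counts, and document lengths are hypothetical, invented for illustration.

```python
import math


def normalize_by_mean_length(counts, lengths):
    """Scale each document's raw counts to the mean document length.

    Assumed scheme (hypothetical helper): count * (mean_length / doc_length).
    counts:  one row per document, one column per feature
    lengths: total token count of each document
    """
    mean_len = sum(lengths) / len(lengths)
    return [[c * (mean_len / n) for c in row] for row, n in zip(counts, lengths)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Invented counts of three spelling variants in three texts.
# Text B uses the variants in exactly the same proportions as Text A,
# but is twice as long; Text C has a genuinely different profile.
counts = [
    [10, 5, 5],    # Text A, 1,000 tokens
    [20, 10, 10],  # Text B, 2,000 tokens
    [5, 10, 5],    # Text C, 1,000 tokens
]
lengths = [1000, 2000, 1000]

raw_ab = euclidean(counts[0], counts[1])
raw_ac = euclidean(counts[0], counts[2])

norm = normalize_by_mean_length(counts, lengths)
norm_ab = euclidean(norm[0], norm[1])
norm_ac = euclidean(norm[0], norm[2])

# On raw counts, A looks closer to C than to B, purely because B is longer.
print("raw:        A-B", round(raw_ab, 2), " A-C", round(raw_ac, 2))
# After normalization, A and B coincide and C is the outlier.
print("normalized: A-B", round(norm_ab, 2), " A-C", round(norm_ac, 2))
```

Scaling to the mean length, rather than converting to plain relative frequencies, multiplies every distance by the same constant. It therefore leaves the relative structure seen by clustering or other distance-based methods unchanged, while keeping values on a familiar count-like scale.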
Author(s): Moisl HL
Editor(s): Dossena, M; Lass, R
Publication type: Book Chapter
Publication status: Published
Book Title: Studies in English and European Historical Dialectology
Year: 2009
Pages: 67-90
Print publication date: 20/05/2009
Series Title: Studies in Language and Communication
Publisher: Peter Lang
Place Published: Bern
URL: https://www.peterlang.com/view/title/34600
ISBN: 9783034300247