
Open Access ePrints

Using electronic corpora in historical dialectology research: the problem of document length variation

NU author(s): Dr Hermann Moisl

Licence

This is the author's accepted manuscript of a book chapter that has been published in its final definitive form by Peter Lang, 2009.

For re-use rights please refer to the publisher's terms and conditions.


Abstract

The proliferation of computational technology has generated an explosive production of electronically encoded information of all kinds. In the face of this, traditional philological methods for search and interpretation of data have been overwhelmed by volume, and a variety of computational methods have been developed in an attempt to make the deluge tractable. These developments have clear implications for corpus-based linguistics in general, and for corpus-based study of historical dialectology in particular: as more and larger historical text corpora become available, effective analysis of them will increasingly be tractable only by adapting the interpretative methods developed by the statistical, information retrieval, pattern recognition, and related communities. To use such analytical methods effectively, however, issues that arise with respect to the abstraction of data from corpora have to be understood. This paper addresses an issue that has a fundamental bearing on the validity of analytical results based on such data: variation in document length. The discussion is in four main parts. The first part shows how a particular class of computational methods, exploratory multivariate analysis, can be used in historical dialectology research, the second explains why variation in document length can be a problem in such analysis, the third proposes document length normalization as a solution to that problem, and the fourth points out some difficulties associated with document length normalization.
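The abstract names document length normalization without giving a formula. As a minimal sketch, assume one common form: each document's raw variable frequencies are rescaled to the mean document length of the corpus, so that documents of different sizes refer to a common length before multivariate analysis. The function name `normalize_by_length`, its inputs, and the example counts are hypothetical, and the chapter's own procedure may differ.

```python
import math


def normalize_by_length(counts, lengths):
    """Rescale each document's raw frequencies to the corpus mean length.

    counts  -- one row per document, each row a list of raw variable counts
    lengths -- total token count of each document (assumed input, > 0)
    """
    if len(counts) != len(lengths):
        raise ValueError("expected one length per document")
    if any(n <= 0 for n in lengths):
        raise ValueError("document lengths must be positive")
    mean_len = sum(lengths) / len(lengths)
    # Multiply row i by mean_len / length_i: short documents are scaled up,
    # long documents are scaled down, so all counts refer to one length.
    return [[c * mean_len / n for c in row] for row, n in zip(counts, lengths)]


# Two hypothetical documents with the same proportional usage of two
# variables; the second is four times as long as the first.
raw = [[10, 5], [40, 20]]
lengths = [1000, 4000]
norm = normalize_by_length(raw, lengths)

print(math.dist(*raw))   # raw Euclidean distance, driven by length alone
print(norm)              # [[25.0, 12.5], [25.0, 12.5]]
print(math.dist(*norm))  # 0.0
```

In a distance-based exploratory method such as cluster analysis, the raw vectors would place these two documents far apart purely because one is longer; after normalization they coincide.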


Publication metadata

Author(s): Moisl HL

Editor(s): Dossena, M; Lass, R

Publication type: Book Chapter

Publication status: Published

Book Title: Studies in English and European Historical Dialectology

Year: 2009

Pages: 67-90

Print publication date: 20/05/2009

Series Title: Studies in Language and Communication

Publisher: Peter Lang

Place Published: Bern

URL: https://www.peterlang.com/view/title/34600


ISBN: 9783034300247

