Using electronic corpora to study language variation: the problem of data sparsity

Lookup NU author(s): Dr Hermann Moisl


Abstract

As more and larger digital electronic corpora of natural language text appear, effective linguistic analysis of them will increasingly be tractable only by using the computational interpretative methods developed by the statistical, information retrieval, and related communities. To use such analytical methods effectively, however, issues that arise with respect to the abstraction of data from corpora have to be understood. This paper addresses an issue that has a fundamental bearing on the validity of analytical results based on such data: sparsity. The discussion is in three main parts. The first part shows how a particular class of computational methods, exploratory multivariate analysis, can be used in language variation research, the second explains why data sparsity can be a problem in such analysis, and the third outlines a solution.


Publication metadata

Author(s): Moisl HL

Editor(s): Tsiplakou, S; Karyolemou, M; Pavlou, P

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: Language Variation: European Perspectives v. 2: Selected Papers From the 4th International Conference on Language Variation in Europe (ICLaVE 4)

Year of Conference: 2009

Pages: 169-178

Date deposited: 16/08/2010

Publisher: John Benjamins

Series Title: Studies in Language Variation

ISBN: 9789027234858

