SpecGram--A Corpus-Linguistic Approach to Demography--Morten H. Albert

A Corpus-Linguistic Approach to Demography

Morten H. Albert
Rose College

This paper develops a corpus-linguistic approach to the demography of North American cities. In a groundbreaking study, Chomsky (1957:17) convincingly showed that it can be proven on linguistic grounds alone that more people live in New York than in Dayton, Ohio. Unfortunately, Chomsky did not go on to develop his corpus methods any further.

In the present study, all occurrences of the search string "I live in" were extracted from the American National Corpus. The word forms immediately to the right of the search string were then taken from the concordance and ranked by frequency. Table 1 shows the ten items with the highest token frequencies:

Table 1: America's 10 largest cities

CITY	TOKENS
New York	4223
Los Angeles	3986
Harmony	3669
Chicago	3478
London	3300
Houston	2906
Philadelphia	2495
Tokyo	2335
Phoenix	1399
The Past	599
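The extraction step described before Table 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `rank_right_collocates`, the regular expression, and the four toy sentences are invented here and are not taken from the study or from the American National Corpus. In particular, treating a run of capitalised words as a single item is an assumption made so that multi-word items such as "New York" are counted as one form.

```python
import re
from collections import Counter

# Assumption: a run of capitalised words directly after the search string
# counts as one item (so "New York" is one form, not two).
PATTERN = re.compile(r"\bI live in ((?:[A-Z][\w'-]*)(?: [A-Z][\w'-]*)*)")


def rank_right_collocates(sentences):
    """Count the items found immediately to the right of "I live in"
    and return (item, token_count) pairs, most frequent first."""
    counts = Counter()
    for sentence in sentences:
        counts.update(PATTERN.findall(sentence))
    return counts.most_common()


# Hypothetical toy sentences, invented for illustration only.
toy_corpus = [
    "I live in New York now.",
    "Well, I live in New York and work downtown.",
    "I live in Harmony with my neighbours.",
    "I live in The Past, mostly.",
]

ranking = rank_right_collocates(toy_corpus)
for item, tokens in ranking:
    print(f"{item}\t{tokens}")
```

Note that a purely string-based search of this kind has no way of telling a city from any other capitalised expression that happens to follow the search string.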

The table reveals a number of surprising results. Besides previously unknown cities, a number of cities previously believed to lie elsewhere turn out to be located in North America. It will be up to geographers to face the challenge and actually put these on the American map.

To obtain demographic information about these cities, the token counts must be related to numbers of inhabitants. This requires knowing the exact number of inhabitants of one city: that number, divided by the city's token frequency, gives us the token-inhabitant ratio that we need. Chomsky (personal communication) estimates the population of Dayton, Ohio at 166,179. Dividing this figure by Dayton's token frequency of 88, we find that the corpus-linguistic demographical constant (CLDC) is exactly

1888.4
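The arithmetic behind the constant, and the way it would be applied to the token counts in Table 1, can be sketched as follows. The function names `cldc` and `estimate_population` are hypothetical, introduced only for this illustration; the population figure, the token frequency of 88, and the Table 1 counts are the ones given in the text. Multiplying each token count by the constant is my reading of how the ratio is meant to be used, not a procedure spelled out in the paper.

```python
# Figures from the text: Dayton's estimated population and token frequency.
DAYTON_POPULATION = 166_179
DAYTON_TOKENS = 88

# Token counts as given in Table 1.
TABLE_1 = {
    "New York": 4223,
    "Los Angeles": 3986,
    "Harmony": 3669,
    "Chicago": 3478,
    "London": 3300,
    "Houston": 2906,
    "Philadelphia": 2495,
    "Tokyo": 2335,
    "Phoenix": 1399,
    "The Past": 599,
}


def cldc(population, tokens):
    """Inhabitants per corpus token for the reference city."""
    return population / tokens


def estimate_population(tokens, constant):
    """Scale a token count by the constant (assumed usage of the ratio)."""
    return round(tokens * constant)


constant = cldc(DAYTON_POPULATION, DAYTON_TOKENS)
print(f"CLDC = {constant:.1f}")

estimates = {city: estimate_population(n, constant) for city, n in TABLE_1.items()}
for city, inhabitants in estimates.items():
    print(f"{city}\t{inhabitants}")
```

Because every estimate is a fixed multiple of the token count, the population ranking is necessarily identical to the frequency ranking in Table 1.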

With this figure in mind, I believe that corpus-based armchair demography has a number of advantages over more traditional methods of statistical demography. Not only is it fast and economical, it also points toward new phenomena that traditional demography could not account for.

	SpecGram Vol CL, No 1 Contents

