This paper develops a corpus-
In the present study, all occurrences of the search string "I live in" were extracted from the American National Corpus. The word forms immediately to the right of the search string were extracted from the concordance and ranked according to frequency. Table 1 shows the ten items with the highest token frequencies:
Table 1: America's 10 largest cities
CITY | TOKENS |
New York | 4223 |
Los Angeles | 3986 |
Harmony | 3669 |
Chicago | 3478 |
London | 3300 |
Houston | 2906 |
Philadelphia | 2495 |
Tokyo | 2335 |
Phoenix | 1399 |
The Past | 599 |
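The extraction and ranking procedure behind Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sentences are invented and are not taken from the American National Corpus, the function name rank_collocates is hypothetical, and the right-hand collocate is assumed to be a run of capitalized word forms so that multi-word items such as "New York" are counted as one unit.

```python
import re
from collections import Counter

# Hypothetical toy sentences standing in for corpus text (not from the ANC).
SAMPLE = [
    "I live in New York and work downtown.",
    "Well, I live in Harmony with my neighbours.",
    "I live in New York, too.",
    "Some days I live in The Past.",
]

# Assumption: the item to the right of the search string is a run of
# capitalized words, e.g. "New York" or "The Past".
PATTERN = re.compile(r"\bI live in ((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*)")


def rank_collocates(sentences, top_n=10):
    """Count the items following 'I live in' and rank them by token frequency."""
    counts = Counter()
    for sentence in sentences:
        counts.update(PATTERN.findall(sentence))
    return counts.most_common(top_n)


if __name__ == "__main__":
    for city, tokens in rank_collocates(SAMPLE):
        print(f"{city} | {tokens} |")
```

A real concordancer would work on tokenized corpus files rather than a list of strings; the regular expression here merely stands in for that step.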
The table reveals a number of surprising results. In addition to previously unknown cities, several cities previously believed to lie elsewhere turn out to be located in North America. It will be up to geographers to face the challenge and put these on the American map.
To obtain demographic information about these cities, the token
numbers must be related to the numbers of inhabitants. To calculate
these figures, it is necessary to know the exact number of inhabitants
of one city. This number, divided by its token frequency, gives us the
token-
1888.4
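The arithmetic can be made explicit with a short Python sketch. It assumes that 1888.4 is the ratio just described (inhabitants of the reference city divided by its token frequency) and that multiplying a city's token frequency by this ratio yields an estimate of its inhabitants. The reference city is not named here, so the ratio is simply taken as given; the function name estimate_inhabitants is hypothetical.

```python
# Token frequencies as listed in Table 1.
TOKENS = {
    "New York": 4223,
    "Los Angeles": 3986,
    "Harmony": 3669,
    "Chicago": 3478,
    "London": 3300,
    "Houston": 2906,
    "Philadelphia": 2495,
    "Tokyo": 2335,
    "Phoenix": 1399,
    "The Past": 599,
}

# Assumption: the figure reported in the text is inhabitants per token.
INHABITANTS_PER_TOKEN = 1888.4


def estimate_inhabitants(tokens, ratio=INHABITANTS_PER_TOKEN):
    """Scale a token frequency by the ratio and round to whole inhabitants."""
    return round(tokens * ratio)


if __name__ == "__main__":
    for city, tokens in TOKENS.items():
        print(f"{city} | {estimate_inhabitants(tokens)} |")
```

Because every estimate is the token frequency scaled by the same constant, the ranking of the cities by estimated inhabitants is identical to their ranking in Table 1.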
With this figure in mind, I believe that corpus-