Itre.cis.upenn.edu

Mechanisms for gradual language change

2014-02-09

A few years ago, I wrote about a presentation by Bridget Jankowski on the trend towards increasing use of 's as opposed to of, in phrases like "the government's responsibility" vs. "the responsibility of the government". My post was "The genitive of lifeless things", 10/11/2009, and the slides from her talk are here.

I was reminded of this recently, while looking at usage changes in State of the Union messages over the centuries. Apostrophe-s has seen a recent radical increase in SOTU frequency, reflecting in amplified form a more gradual increase in the English language as a whole. Such gradual, long-term trends are a puzzle: why and how do linguistic changes keep going for several centuries in the same direction, as they often do? You could ask the same question about other cultural changes, I guess, but for linguistic features that are preserved in the written form of a language with a textual history, like English, we have quantitative evidence over hundreds of years.

In the case of gradual genetic changes, there's a clear explanation in terms of gene-pool proportions driven by a fitness gradient. But in the case of most linguistic changes, it's not obvious what the analogue to "fitness" is, or whether a concept like fitness is even relevant, unless it's defined circularly as whatever it is that causes the population frequency of a feature to increase.

I don't have a general answer to this question. But in the course of a dinner conversation the other day, Gareth Roberts and I came up with a couple of new (at least to me) ideas that might work for the on-going increase in s-genitive frequency, and for some other cases as well. The general idea is to combine a simple "linear learner" (which adjusts internal probability estimates based on current experience) with two additional kinds of influence: first, effects that bias the learner to pay a bit more attention to certain inputs; and second, forces that exert a small biasing effect on probability estimates due to weak analogies across constructions.

Let's start by establishing that there's something here to explain. This plot shows the decade-by-decade change in the SOTU messages and in the Corpus of Historical American English:

These changes are a mixture of s-contractions ("That's why I believe…") and s-genitives ("America's graduation rate"). But both constructions have been getting more frequent in written English – we can isolate the s-genitive almost completely by searching COHA for patterns like

children 's [n*] vs. [n*] of children

where [n*] means "any noun" — and the relative frequency of the s-genitive forms has more than doubled over the past 150 years or so:

We see a similar — or even larger — increase for patterns like

china 's [n*] vs. [n*] of china

where contraction is possible but rare (in a random sample of 100 instances of this pattern from 2010-2012, none were s-contractions):

And similarly for patterns like

god 's [n*] vs. [n*] of god

This change is part of a much older story — of-phrases and s-genitives have been waxing and waning in English for more than a millennium. According to Benedikt Szmrecsanyi et al., "Culturally conditioned language change? A multi-variate analysis of genitive constructions in ARCHER", 2013:

Historically speaking, the of-genitive is of course the incoming form, which appeared during the ninth century. [...] [T]he inflected genitive vastly outnumbered the periphrasis with of up until the twelfth century. In the Middle English period, we begin to witness “a strong tendency to replace the inflectional genitive by periphrastic constructions, above all by periphrasis with the preposition of”. The Early Modern English period, however, sees a revival of the s-genitive, “against all odds”. [...] [T]he s-genitive is comparatively – and increasingly – popular in Present-Day English, especially American English ….

Here are some numbers illustrating this process, from Charles C. Fries, "On the Development of the Structural Use of Word-Order in Modern English", Language 1940:

The "periphrastic genitive" is what we've called the "of-genitive" — the other cases are inflected genitives, the ancestors in some sense of the modern s-genitive. In Old English, these followed the head noun about as often as they preceded it. During the transition to Middle English, the post-head genitives died out and the pre-head genitives replaced them, only to be replaced in their turn by the of-genitives, except in the case of human possessors. But then, according to Anette Rosenbach ("Emerging variation: determiner genitives and noun modifiers in English", English Language and Linguistics 2007), "s-genitives have been shown to become more frequent from about late Middle/early Modern English onwards", due to a gradual increase in the frequency of s-genitives with collectives and inanimates:

In "A correlate of animacy", 9/27/2008, I sketched a crude and somewhat tongue-in-cheek picture of the gradient animacy relationship in modern web texts:

              __'s      of __     ratio
Giuliani      1.14M     140K      8.14
McCain        23.6M     4.42M     5.34
Clinton       11.6M     2.81M     4.13
Obama         26M       7.6M      3.42
Apple         22.6M     9.39M     2.41
IBM           6.97M     4.03M     1.73
Microsoft     35.5M     21.3M     1.67
Google        17M       13.4M     1.27
America       113M      131M      0.863
Canada        26.8M     60.5M     0.443
Thailand      3.96M     11.8M     0.336
England       10.9M     48M       0.227
Belgium       799K      6.31M     0.127
lithium       60.7K     1.73M     0.035
arsenic       21.7K     1.19M     0.018
silicon       50.9K     5.93M     0.0086
hydrogen      45.3K     9.44M     0.0048
cadmium       4.01K     2.2M      0.0018
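The ratio column is simply the count for the 's form divided by the count for the of form. Here is a minimal sketch of that arithmetic, assuming that the K and M suffixes stand for thousands and millions; the helper names (`parse_count`, `s_to_of_ratio`, `s_share`) are hypothetical and not from the original post.

```python
# Hypothetical helpers. The K/M suffix convention is an assumption about
# how the web counts in the table were abbreviated.
SUFFIXES = {"K": 1e3, "M": 1e6}

def parse_count(text):
    """Turn an abbreviated count such as '1.14M' or '140K' into a float."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1].upper()]
    return float(text)

def s_to_of_ratio(s_count, of_count):
    """Ratio of "X's" hits to "of X" hits (the 'ratio' column)."""
    return parse_count(s_count) / parse_count(of_count)

def s_share(s_count, of_count):
    """Proportion of 's forms among the two alternatives."""
    s, of = parse_count(s_count), parse_count(of_count)
    return s / (s + of)

# Three rows copied from the table above.
rows = [
    ("Giuliani", "1.14M", "140K"),
    ("America", "113M", "131M"),
    ("cadmium", "4.01K", "2.2M"),
]

for name, s_count, of_count in rows:
    ratio = s_to_of_ratio(s_count, of_count)
    share = s_share(s_count, of_count)
    print(f"{name:10s} ratio={ratio:.3g} share={share:.1%}")
```

The share (s-count over the sum of both counts) is the same kind of percentage that is used for register comparisons, so it may be the more convenient of the two numbers when comparing across sources.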

There's a more serious animacy hierarchy in Anette Rosenbach, "Animacy and grammatical variation", Lingua 2008:

human         the boy's bike
animal        the dog's collar
collective    the company's director
temporal      Monday's mail
locative      London's suburbs
inanimate     the building's door

There's also an effect of length, with longer possessor-phrases preferring the of-genitive. And I'm glossing over many syntactic, semantic, and lexical issues here: for example, "women's groups" and "groups of women" are not simply alternative ways of saying the same thing. But this post is already way too long. So trust me, doing a more careful job of characterizing the underlying phenomena leaves us faced with the same fact, which is that s-genitives declined relative to of-genitives for several hundred years, and then turned around and increased for a few hundred years.

Why? One possible source of gradual changes in text statistics might be contact effects — we start with two different languages or language varieties, and patterns from one source gradually leak into the other. One plausible general case of this type would be gradually diminishing diglossia, where vernacular patterns gradually leak into the formal written language. This seems to me to be a plausible account of the increase over time in the frequency of contractions in English text — see e.g. "True Grit isn't true", 12/29/2010, for some relevant background.

There are clearly some genre differences with respect to English genitive choice — but in the end I don't believe that this is a plausible place to look for the forces driving the long-term trends.

In the 1940s, Otto Jespersen noted that poetic language tended to use more s-genitives, apparently due to rhetorical personification (A Modern English Grammar on Historical Principles: Volume 7, 1949, p. 346):

In poetry and in higher literary style the gen. of lifeless things is used in many cases where of would be used in ordinary speech; the gen. here conveys more or less a notion of personification [...]

For example, he quotes Elizabeth Barrett Browning "at poetry's divine first finger-touch". But 19th-century poetic diction is surely not where we're going to find the forces driving long-term changes in English genitive choices.

Jespersen also notes a tendency for quite prosaic text to go in the same direction, if not by exactly the same route:

During the last few decades the genitive of lifeless things has been gaining ground in writing (especially among journalists): in instances like the following the of-constructions would be more natural and colloquially the only one possible:

His examples include several where the s-genitive now seems entirely colloquial to me, at least in their choice of genitive form:

the rapidity of the heart's action

a glass knob was the door's sole fitting

to affect a book's good or evil fortune

There's some quantitative evidence for Jespersen's remark about journalism in Bridget Jankowski's work, which shows that s-genitive proportion has increased over time in two corpora of Canadian English text, with Maclean's (a magazine) always ahead of the Hansards (parliamentary proceedings):

A comparison of s-genitive percentages for various head nouns, across COCA registers and the LDC's archive of conversational telephone speech, shows a somewhat similar picture:

              Spoken    Fiction   Magazine  Newspapers  Academic  LDC CTS
"children"    63.9%     53.3%     71.1%     76.9%       53.7%     68.6%
"women"       51.0%     56.5%     62.6%     72.1%       55.8%     55.4%
"men"         46.6%     46.0%     53.6%     72.9%       42.6%     51.2%
"china"       45.3%     20.6%     48.8%     59.5%       49.8%     62.3%
"russia"      39.5%     17.9%     53.5%     55.4%       59.1%     56.8%

But if journalism has somehow been driving other varieties, this simply raises the question of what forces have been driving the journalists. And the vernacular doesn't seem to be different enough from other registers to be the source of the long-term trends.

So what could be the mechanism?

Whatever else is happening, we can assume that individuals are adjusting some internal probability-like "belief" about the distribution of forms, which derives from their linguistic experience and also governs their linguistic behavior. One relevant source of evidence for this process can be found in the literature on syntactic priming, e.g. Martin Pickering and Holly Branigan, "Syntactic Priming in Language Production", Trends in Cognitive Sciences 1999.

In this situation, any bias that causes certain cases to have a somewhat larger impact, or to fade a bit more gradually, and to feed more strongly from short-time (priming-like) effects into long-term (baseline) values, could potentially drive a gradual long-term change in a speech community's distribution of probability estimates.

Thus in the history of s-genitive increase over the past few hundred years, we might hypothesize that animate references are on average more salient than inanimate ones; and therefore that animate genitives have a slightly greater impact on the relevant underlying constructional variable(s); and therefore that the perceived proportion of s-genitives tends to be slightly over-estimated, leading in turn to a slight increase in production; and recursively onwards.
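Here is a minimal sketch of how that salience story might look as a simulation. It is not a tested model of the historical data. Everything specific in it is an assumption made for illustration: the function names, the particular parameter values, the idea that animate and inanimate possessors differ by a fixed gap in s-genitive rate, and the shortcut of tracking the expected (average) update of a community-wide estimate `p` rather than sampling individual speakers.

```python
def clip(x, lo=0.0, hi=1.0):
    """Keep a probability inside [lo, hi]."""
    return max(lo, min(hi, x))

def expected_update(p, rate=0.05, animate_share=0.5, animacy_gap=0.2,
                    w_animate=1.1, w_inanimate=1.0):
    """One expected linear-learner step for the s-genitive estimate p.

    All parameter values are illustrative assumptions.
    """
    # Conditional s-genitive rates, set up so that their mixture is p:
    # animate possessors sit above p, inanimate ones below it.
    p_anim = clip(p + (1 - animate_share) * animacy_gap)
    p_inan = clip(p - animate_share * animacy_gap)
    # Linear learner: the estimate moves toward what was heard, and each
    # kind of token counts in proportion to its salience weight.
    pull_anim = animate_share * w_animate * (p_anim - p)
    pull_inan = (1 - animate_share) * w_inanimate * (p_inan - p)
    return clip(p + rate * (pull_anim + pull_inan))

def simulate(p0, steps, **params):
    """Iterate the expected update and return the whole trajectory."""
    trajectory = [p0]
    for _ in range(steps):
        trajectory.append(expected_update(trajectory[-1], **params))
    return trajectory

# Equal salience: the two pulls cancel and the estimate stays put.
neutral = simulate(0.3, 400, w_animate=1.0, w_inanimate=1.0)
# Animate tokens 10% more salient: a slow, steady upward drift.
biased = simulate(0.3, 400, w_animate=1.1, w_inanimate=1.0)

print(f"neutral: {neutral[0]:.3f} -> {neutral[-1]:.3f}")
print(f"biased:  {biased[0]:.3f} -> {biased[-1]:.3f}")
```

The design point is that with equal weights the learner is neutral, so any trend in the biased run comes from the salience difference alone. A more serious version would sample tokens, use a population of learners, and let the animacy gap itself change over time.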

A similar trend might also arise due to affinities across constructions, as Fries (1940) suggests:

The progressive fixing of the word-order pattern for modification can be illustrated by the facts concerning the position of the inflected genitive modifying a noun. Adjectival in its function, the inflected adnominal genitive in Old English appears, like the adjective, either before or after the noun it modifies. [...]

Before the end of the 13th century the post-positive inflected genitive has completely disappeared. By this time the general word-order pattern to express the direction of modification has become well established: single word modifiers of the noun or adjective class preceding the nouns they modify remain in that position, whereas single word modifiers in other positions are not so kept.

If possessive constructions are treated as a form of modification, then ever since modifiers came to precede heads in English, maybe every adjective+noun or noun+noun modifier/head sequence has exerted a small force on speakers' internal estimates of s-genitive proportions.
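Again as a rough sketch rather than a worked-out model, that cross-constructional force can be written as a small extra term in the same kind of linear learner. The assumptions here are mine: the names, the parameter values, the use of 1.0 to stand for "modifier before head", and the simplification that the community's s-genitive input matches its current estimate `p` on average, so that ordinary experience contributes no net change.

```python
def analogy_step(p, rate=0.1, analogy=0.002, prenominal_target=1.0):
    """One expected update of the s-genitive estimate p.

    rate     : how strongly genitive experience moves the estimate
    analogy  : assumed small pull from modifier+head sequences in general
    """
    # On average the community produces s-genitives at rate p, so the
    # ordinary linear-learner term has no net effect.
    expected_genitive_input = p
    experience_pull = rate * (expected_genitive_input - p)
    # Every adjective+noun or noun+noun sequence nudges the estimate
    # toward the modifier-first order.
    analogy_pull = analogy * (prenominal_target - p)
    return p + experience_pull + analogy_pull

def trajectory(p0, steps, **params):
    """Iterate analogy_step and return all intermediate estimates."""
    values = [p0]
    for _ in range(steps):
        values.append(analogy_step(values[-1], **params))
    return values

def closed_form(p0, steps, analogy=0.002, prenominal_target=1.0):
    """The same recurrence solved directly: geometric approach to the target."""
    return prenominal_target - (prenominal_target - p0) * (1 - analogy) ** steps

run = trajectory(0.3, 500)
print(f"simulated:   {run[-1]:.4f}")
print(f"closed form: {closed_form(0.3, 500):.4f}")
```

Under these assumptions the change is monotonic and slows down as the estimate approaches the target, which is at least the right general shape for a trend lasting several centuries. Whether the real analogical force is anywhere near this simple is exactly what would need experimental and historical support.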

To make these ideas plausible, we'd have to start by showing that the right kind of thing happens in simulations, and then look for experimental as well as historical evidence to support the assumptions involved. It's possible that someone has already done this, or part of it; relevant references will be appreciated.

Update: a relevant reference has already arrived, namely Catherine O'Connor, Joan Maling & Barbora Skarabela, "Nominal categories and the expression of possession", in Kersi Börjars et al., Eds., Morphosyntactic Categories and the Expression of Possession, 2013. The abstract:

In this cross-linguistic study we present parallels between (a) the stochastic patterns found in corpus studies of English prenominal possessives, and (b) the rule-governed, categorical features of a highly constrained prenominal possessive construction found in some Germanic, Slavic, and Romance languages. The well-known English tendency for prenominal possessor NPs to be low-weight, animate, and discourse-old or highly accessible corresponds to categorical requirements in what we call the Monolexemic Possessor Construction (MLP). This construction is recognizable by its pre-nominal, one-word, animate possessor that is highly accessible in the discourse context. We identify an accessibility hierarchy of nominal categories in which the MLP can be expressed. This hierarchy is consistent with all 17 languages with MLPs we have found. We show that this accessibility hierarchy (pronoun < proper noun < kinship term < common noun) is a function of the intrinsic discourse-pragmatic features of these nominal categories. While the categorical restriction to pronoun and proper noun possessors in Icelandic, German, and Russian may be largely grammaticized, we show that the discourse-pragmatic constraint is recognizably active in Czech and Bosnian/Croatian/Serbian. These results complement studies that attempt to understand how language structure responds to communicative forces and processing constraints.