Y-chromosomal STR haplotype analysis reveals surname-associated strata in the East-German population
Sunday, 25. November 2007, 18:08:14
European Journal of Human Genetics (2006), 1–6
© 2006 Nature Publishing Group All rights reserved 1018-4813/06 $30.00
www.nature.com/ejhg
Uta-Dorothee Immel*,1, Michael Krawczak2, Jürgen Udolph3, Angela Richter4, Heike Rodig5,
Manfred Kleiber1 and Michael Klintschar1
1Department of Legal Medicine, Martin-Luther-University, Halle (Saale), Germany; 2Institute of Medical Informatics and Statistics, Christian-Albrechts-University, Kiel, Germany; 3Institute for Slavistics, University of Leipzig, Leipzig, Germany; 4Institute for Slavistics, Martin-Luther-University, Halle (Saale), Germany; 5Biotype AG, Dresden, Germany
In human populations, the correct historical interpretation of a genetic structure is often hampered by an almost inherent inability to differentiate between ancient and more recent influences upon extant gene pools. One method to trace recent population movements is the analysis of surnames, which, at least in Central Europe, can be thought of as traits ‘linked’ to the Y chromosome. Illegitimacy, extramarital birth and changes of surnames may have substantially obscured this linkage. In order to assess the actual extent of correlation between surnames and Y-chromosomal haplotypes in Central Europe, we typed Y-chromosomal short tandem repeat markers in 419 German males from Halle. These individuals were
subdivided into three groups according to the origin of their respective surname, namely German (G), Slavic (S) or ‘Mixed’ (M). The distribution of the haplotypes was compared by Analysis of Molecular Variance. While the M group was indistinguishable from group G (ΦST = −0.0008, P > 0.5), a highly significant difference (ΦST = 0.0277, P < 0.001) was observed between the S group and the combined G+M group. This surprisingly strong differentiation is comparable to that of European populations of much larger geographic and linguistic difference. In view of the major migration from Slavic countries into Germany in the 19th century, it appears likely that the observed concurrence of Slavic surnames and Y chromosomes is of a recent rather than an early origin. Our results suggest that surnames may provide a simple means to stratify, and thereby to render more efficient, Y-chromosomal analyses of Central Europeans that target more ancient events.
European Journal of Human Genetics advance online publication, 25 January 2006; doi:10.1038/sj.ejhg.5201572
Keywords: Y chromosome; onomastics; surnames; Germans; Slavs
Introduction
The study of surnames, also called ‘the poor man’s
population genetics’,1 has a long tradition in genealogical
research and was extensively used long before the term
‘genetics’ was first coined.2 Indeed, the fact that surnames
are patrilineally inherited in many parts of the world,3–5
including Central Europe,2 implies that names should be of
considerable interest to geneticists too. Surnames in
combination with genetic studies have proved useful for
describing population structures, for example, in France,
Sicily and the Netherlands.6–8
However, Y-chromosomal DNA polymorphisms are
ideally suited for studies of male demography themselves,
and the availability of rapidly evolving markers on the
Y chromosome has lately rendered the onomastics of
human surnames an outsider discipline. Thus, the ability
of hypervariable Y-chromosomal short tandem repeats
(Y-STRs) to discriminate even between closely related and
co-localized male populations has been demonstrated
for Germans and Dutch,9 for the Baltic populations,10 for
Central England and North Wales,11 and for Poland and
Germany.12 At an even larger scale, the recent identification
of previously unrecognized population strata in the
Y-STR haplotype distribution of more than 12 000 males
from 91 European localities13 has once more highlighted
the usefulness of this approach. Nevertheless, only a
small number of studies have so far addressed the actual
relationship between the distribution of surnames and
Y-chromosomal haplotypes.14–17
Surnames vary substantially both between and within
European countries. In Germany, for example, although
the majority of the one million different surnames are
typically German (eg ‘Müller’, ‘Schmidt’ or ‘Berger’), names
with foreign roots are also abundant. The majority of the
latter are of Slavic origin18 (approximately 20%) and many
of them are easy to recognize by consonant combinations
that are otherwise unfamiliar to the German language.
Examples include the names of the German writer Kurt
Tucholsky and of the second author of this article. In many
cases, however, the foreign origin of a surname may not be
immediately apparent, as is the case, for example, for the
name of the 18th Century playwright Gotthold Ephraim
Lessing.
The patrilineal inheritance of both surnames and
Y chromosomes suggests that different strata of surnames
should correspond to different strata of Y chromosomes.
Since this relationship is likely to have become obscured
not only by mutation but also by illegitimate births and
the change of surnames, quantifying the residual correlation
between the two characteristics would be of both
theoretical and practical relevance. On the one hand,
information about the history of patrilines is useful for the
precise estimation of mutation rates and for the assessment
of migration behaviours. On the other hand, surnames
potentially provide a simple means of stratifying populations
prior to Y-chromosomal analyses that target prehistoric
events, thereby increasing their efficiency through a
reduction in genotyping load. The aim of the present study
was thus to assess the extent to which Y-STR haplotypes of
German males, born and living in the region of Halle
(Saale), are indicative of a German, Slavic or mixed
German-Slavic descent of their surnames.
Materials and methods
DNA samples
DNA samples were obtained from 419 German males, born
around Halle (Saale), located in the South-East of Germany
(Figure 1), who identified themselves as Germans. An
additional group of 29 German males were sampled from
the Sorbish minority, a Slavic-speaking community living
in the Lausitz area near the Polish border.19
Y-STR analysis
Eight Y-STR loci were analysed, namely DYS19, DYS385,
DYS389I, DYS389II, DYS390, DYS391, DYS392 and
DYS393. Locus information and PCR primer sequences
can be found in Kayser et al20 or at the Y-STR Haplotype
Reference Database (YHRD) web site (www.yhrd.org). The
YHRD nomenclature was used here in accordance with
recommendations by the International Society of Forensic
Genetics,18 designating Y-STR alleles by the number of
repeats included. DNA was amplified in two multiplex
reactions, following Elmoznino and Prinz.21 Consistent
allele designation and genotyping quality were assured by
the concurrent electrophoretic analysis of sequenced allelic
ladders or sequenced reference DNA samples. PCR products
were analysed by capillary electrophoresis using an ABI 310
Genetic Analyzer (Applied Biosystems, Weiterstadt, Germany)
and the Genotyper software.22 DNA of the Sorbish
males was analysed using the Mentypes Argus Y-MH PCR
Amplification Kit (Biotype, Dresden).
Analysis of surnames
The Halle samples were divided into three subgroups,
according to surname. Two larger groups comprised 195
males with surnames that were definitely German (‘G’) and
185 males with definitely Slavic surnames (‘S’). The third
group contained 39 males with mixed German-Slavic
surnames (‘M’). Samples of 29 Sorbs19 and some 1313
published haplotypes from Polish males13 were used for
comparison. Surname groups were defined on the basis of
spelling, using certain combinations of consonants and
surname suffixes to categorize the origin of the name in
question. Suffixes ‘-er’, ‘-mann’ and ‘-burg’, for example,
are typically German whereas ‘-ke’, ‘-ka’, ‘-ow’ and ‘-ski’ are
typically Slavic. In addition, the root morphemes of
surnames were also examined. Examples of Slavic roots
include ‘Lessing’, which sounds German but was derived
from the Slavic expression for ‘forest settler’, and ‘Kafka’,
which in Czech means ‘jackdaw’. Mixed surnames include
both German and Slavic elements, that is, a German basis
and a Slavic ending, or vice versa (‘Wudtke’ or ‘Kuppke’).
These surnames are the result of a long parallel usage of
both German and Slavic languages in the eastern part of
Germany.
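The suffix criterion described above can be sketched as a first-pass filter. The following Python snippet is only an illustration under stated assumptions: the function name `classify_by_suffix` and the label `unresolved` are hypothetical, the suffix lists contain only the examples quoted in the text, and the examination of root morphemes (needed for names such as ‘Lessing’ and for the mixed group) is not modelled.

```python
# Suffix examples quoted in the text; the actual classification used
# further criteria (consonant combinations, root morphemes).
GERMAN_SUFFIXES = ("er", "mann", "burg")
SLAVIC_SUFFIXES = ("ke", "ka", "ow", "ski")


def classify_by_suffix(surname: str) -> str:
    """Return 'G', 'S' or 'unresolved' from the surname ending alone."""
    name = surname.strip().lower()
    if name.endswith(SLAVIC_SUFFIXES):
        return "S"
    if name.endswith(GERMAN_SUFFIXES):
        return "G"
    # No listed suffix matched: the root morpheme would have to be examined.
    return "unresolved"


if __name__ == "__main__":
    for example in ("Berger", "Kafka", "Virchow", "Lessing", "Wudtke"):
        print(example, classify_by_suffix(example))
```

This heuristic assigns ‘Wudtke’ to ‘S’ although the study treats it as a mixed surname, and it leaves ‘Lessing’ unresolved, which illustrates why suffixes alone were not sufficient for the grouping.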
Statistical analysis
The genetic relationship between the German, Sorbish and
Polish samples was assessed by Analysis of Molecular
Variance (AMOVA) using ΦST, an analogue of Wright’s FST
that takes the evolutionary distance between individual
Y-STR haplotypes into account.23,24 The analysis was
confined to the so-called ‘core’ haplotype, comprising all
markers but DYS385. Marker DYS385 had to be excluded
since its multilocal nature hampers the unambiguous
assignment of evolutionary distances to allele pairs.
Populations were recursively clustered by combining, in
each step, that pair of samples or clusters that yielded the
minimum global ΦST value for the core haplotype.
Clustering was carried out until only one cluster remained.
Estimates of pairwise and global ΦST values were obtained
using the ARLEQUIN software25 with a single step mutation
model, and tested for statistical significance by means
of random permutation of samples in 10 000 replicates.13
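To make the AMOVA step more concrete, the following Python sketch computes a ΦST-type statistic from the standard AMOVA variance components and tests it by permuting haplotypes between groups. It is not the ARLEQUIN implementation: the distance used here (sum of squared repeat-number differences over loci) is an assumption chosen to mimic a stepwise mutation model, the function names are hypothetical, and the toy haplotypes are invented for illustration only.

```python
import random
from itertools import combinations


def sq_distance(h1, h2):
    """Assumed distance: sum of squared repeat-number differences."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def ssd(sample):
    """Sum of squared deviations: pairwise distances divided by sample size."""
    return sum(sq_distance(a, b) for a, b in combinations(sample, 2)) / len(sample)


def phi_st(groups):
    """Phi_ST for a list of groups (each with at least two haplotypes).

    Assumes that the haplotypes are not all identical.
    """
    pooled = [h for g in groups for h in g]
    n_total, n_groups = len(pooled), len(groups)
    ssd_within = sum(ssd(g) for g in groups)
    ssd_among = ssd(pooled) - ssd_within
    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    # Average group size corrected for unequal sampling.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_among = (msd_among - msd_within) / n0
    return var_among / (var_among + msd_within)


def permutation_p(groups, n_perm=1000, seed=0):
    """One-sided P value from random reassignment of haplotypes to groups."""
    rng = random.Random(seed)
    observed = phi_st(groups)
    pooled = [h for g in groups for h in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented two-locus toy haplotypes, not data from the study.
    group_a = [(14, 13), (14, 13), (15, 13), (14, 12)]
    group_b = [(16, 11), (17, 11), (16, 10), (16, 11)]
    print(phi_st([group_a, group_b]), permutation_p([group_a, group_b]))
```

The estimate can become slightly negative when the groups are no more different than expected by chance, which is how values close to zero with large P values should be read.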
Results
In the 419 East-German males analysed in the present
study, a total of 270 different Y-STR haplotypes were
observed. While the most frequent haplotype occurred 10
times, 146 haplotypes were unique (data available from the
authors upon request). Group G comprised 139 different
surnames, 18 of which occurred twice. Five surnames were
observed more than two times (3 × 3, 1 × 5, 1 × 6). There
was only one instance in group G of a surname being
shared by two males with the same haplotype. In group S,
177 different surnames occurred, four of which were found
twice. No two males with the same surname had the same
haplotype. Finally, no shared surnames and haplotypes
were observed in group M.
Upon AMOVA, the core Y-STR haplotype distributions of
males with German (‘G’) and mixed surnames (‘M’) were
found to be indistinguishable (ΦST = −0.0008, P > 0.5). The
two samples were therefore combined into one group
(‘G+M’). Note that this joint consideration of G and
M was retrospectively justified in that an analysis of group
G alone yielded virtually identical results (not shown). A
highly significant difference emerged between the combined
G+M group and the group of males with a Slavic
surname (‘S’; see Table 1). The observed level of differentiation
(pairwise ΦST = 0.0277, P < 0.001) between groups
G+M and S was surprisingly large and approximates that
seen between European populations of much larger
geographical and linguistic distances (eg Cologne and
Budapest; see www.yhrd.org). Cluster analysis based upon
global ΦST (Figure 2) revealed that the Y-STR core haplotype
distribution of the German S group is substantially closer
to that of the Polish population than to that of the G+M
group. The Sorbish males appear to be similarly close to
both the S group and the Polish group, although their
positioning in the tree may be less robust owing to small
sample size.
In a recent study of European Y-STR haplotypes, several
population clusters were identified; among them were
clearly defined ‘Eastern European’ and ‘Western European’
groupings.13 Haplotypes from these fringe clusters, as well
as their one-step neighbours, were classified as either
‘Western’ or ‘Eastern’, depending upon where they were
more frequent. A similar characterization of the present
samples in terms of the relative proportion of the fringe
haplotypes resulted in highly significant differences between
the two surname-defined German subgroups, G+M
and S (χ² = 13.094, 2 df, P = 0.001). While 88 of the 234
haplotypes (38%) in the combined G+M group were
classified as ‘Western’, this was the case for only 42 of the
185 haplotypes (23%) in group S. In contrast, 80 G+M
haplotypes (34%) were of ‘Eastern’ type compared to 91 S
haplotypes (49%). The portion of unclassifiable haplotypes
was 28% in both groups (66 in G+M, 52 in S).
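The chi-square comparison of the two surname groups can be retraced from the counts given above. The short Python sketch below is an illustration only: it recomputes the Pearson statistic for the 2 × 3 table of ‘Western’, ‘Eastern’ and unclassifiable haplotypes, with variable and function names chosen here for convenience. The closed-form P value used in the code is valid only for exactly two degrees of freedom.

```python
import math

# Counts quoted in the text: (Western, Eastern, unclassifiable).
observed = {"G+M": (88, 80, 66), "S": (42, 91, 52)}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, count in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (count - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square(observed)
# With 2 degrees of freedom the upper tail probability is exp(-x / 2).
p_value = math.exp(-stat / 2) if df == 2 else float("nan")
print(round(stat, 3), df, round(p_value, 4))
```

The recomputed statistic agrees with the reported value of 13.094, and the corresponding tail probability of roughly 0.0014 is consistent with the rounded P value given in the text.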
The seeming characterization of surname-defined male
samples from Halle as either ‘Western’ or ‘Eastern’ was
further corroborated by comparing the frequency of all
haplotypes observed in groups G+M and S with the
current release of YHRD (Release 15), comprising 17 214
haplotypes from 125 samples of European or Near-Eastern
extraction (Figure 3). Males from group G+M shared the
majority of their Y-STR haplotypes with western populations
whereas the distribution in group S was closer to
that of eastern, most notably Polish, populations. The
proportion of haplotypes shared between group S and
Polish males was higher than that with any other German
sample.
Discussion
How can the profound stratification observed among East-
German male lineages and their correlation with surnames
be best explained? Although the name ‘Germany’ appears
to imply a homogeneous origin of the German people, the
country has always been a gateway for migration, mostly
from east to west. The best documented wave of migration
was that of Eastern Germanic tribes and Slavs, driven by
the Huns, that led to the downfall of the Roman Empire. In
historic times, two major instances of assimilation of Slavic
people into the German nation occurred. Around 950 AD,
the German Empire started to put pressure upon the Slavic
peoples inhabiting large areas of what was to become, in
the middle of the 20th Century, the German Democratic
Republic.26 By 1100 AD, after more than 100 years of wars
and proselytization, the complete area of contemporary
Germany had come under the influence of the German
Empire. During the following centuries, most of the
non-Germanic tribes (like the Baltic Prussians) completely
abandoned their language, and their descendants are today
regarded as ‘typically German’. Only in a small area,
southeast of Berlin, known as the Lausitz, the Slavic-speaking
Sorb people maintained their language and
culture, and their descendants today represent the only
recognized, non-immigrant minority in East Germany. In
any case, the names of many cities, including Berlin
(meaning ‘little swamp’), and some surnames, most
notably those of ‘typically Prussian’ nature like ‘von
Clausewitz’ or ‘Virchow’, still reflect the Slavic roots of
this part of Germany. The second major assimilation of
people with Slavic ancestry occurred during the Industrial
Revolution in the 19th Century. Thousands of people from
Eastern Europe migrated to the West to work in the surging
industrial areas of Germany (Silesia, Ruhr-Area). Although
they brought their surnames with them, they nevertheless
became culturally amalgamated quite rapidly by the
German majority.
The Halle region is located exactly at the intersection
of the Germanic and Slavic spheres of influence of the
10th century, but it is also a traditional mining and
chemical industry area (Halle-Leipzig-Bitterfeld) that has
attracted Slavic workers during the Industrial Revolution.
Both of these factors should have had an impact upon
the male-specific genetic structure of the local population
where surnames of Germanic and Slavic origin are
about equally frequent. In terms of the relative importance
of the two historic instances for the observed correlation
between Y-STR haplotypes and surname characteristics,
it is interesting to note that surnames first
occurred in Europe in Venice during the 9th Century.
From there, the law of name bearing was adopted in France
and Catalonia in the 11th, and in England, and Western
and Southern Germany in the 12th Century. In the North
and East of Germany, the custom was practised no earlier
than the 15th Century and, in some rural regions,
surnames became fashionable only in the 18th century,
nearly 900 years after their first appearance in Europe.27,28
Furthermore, surnames frequently changed or became
modified until the beginning of the 19th century. Therefore,
it appears unlikely that the correlation between
surnames and Y-STR haplotypes observed in our study
dates back to the Middle Ages; it is more likely to be the
result of the immigration of industrial workers in the 19th
Century. In this respect, Central Europe appears to
differ from England and Ireland where patrilineally
inherited names are presumed to have a much deeper
rooting.14–17
Our results highlight the fact that the Y-chromosomal
genetic structure of modern Central European populations
is heterogeneous and that, particularly in East
Germany, the concomitant strata may be resolvable by
the consideration of surnames. This implies that future
studies targeted at more ancient population movements
inside or outside the region through the use of slowly
evolving Y-chromosomal markers (ie SNPs) may gain
efficiency from allotting the genotyping load according
to surnames.
Acknowledgements
We thank Tim Lu for helpful comments on the manuscript and Gerald
Bothe for graphical work.
REFERENCES