How It's Done
Behind the Process
HeredityLab provides ethnicity estimates that reflect how similar your DNA is to populations in a reference panel. Estimates are provided on an ethnic and regional basis: the ethnic terms reflect established ethnic groups, and the regional terms reflect regional, subregional, and micro-level regions. To achieve this, we followed four steps: data gathering, establishing the panel, creating an inference method, and refining the methodology. The genome is composed of a series of nucleotides, and the variants found at certain positions occur at different frequencies in different ethnic groups. This allows us to reliably predict which segments of your DNA correspond to each of the ethnic groups in our panel.
1. Data Gathering
Representative genomes from worldwide populations were gathered. Information was first collected from the main publicly accessible research data sets. These include, but are not limited to, the 1000 Genomes Project, the Human Genome Diversity Project (CEPH-HGDP), the International HapMap Project (HapMap3), and the Population Reference Sample Project. To create a stronger reference panel that includes micro-level regions, privately available reference data sets were collected and incorporated into the wider sample set. After all sample data were aggregated, samples were clustered through principal component analysis, and samples that were not consistent with the characteristics of established clusters were removed. Within the DNA, pairs of alleles (termed ‘genotypes’) are found at variable positions known as single nucleotide polymorphisms (SNPs), and the probability of a given genotype appearing at a position depends on the ethnic group. The likelihood that a segment of your DNA matches an ethnic group is based on the genotypes observed at these positions, compared across the ethnic groups in the panel.
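The idea that genotype probabilities depend on the group can be illustrated with a minimal sketch. It is not our production method: it assumes Hardy-Weinberg proportions and independent SNPs, and the group names, allele frequencies, and function names are hypothetical values chosen only for illustration.

```python
import math

def genotype_probability(alt_copies, alt_freq):
    """Probability of carrying 0, 1, or 2 copies of the alternate allele.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    p = alt_freq
    if alt_copies == 0:
        return (1 - p) ** 2
    if alt_copies == 1:
        return 2 * p * (1 - p)
    if alt_copies == 2:
        return p ** 2
    raise ValueError("alt_copies must be 0, 1, or 2")

def segment_log_likelihood(genotypes, alt_freqs):
    """Sum of per-SNP log probabilities, treating SNPs as independent.

    Frequencies must lie strictly between 0 and 1 so the logarithm is defined.
    """
    return sum(
        math.log(genotype_probability(g, p))
        for g, p in zip(genotypes, alt_freqs)
    )

def most_likely_group(genotypes, freqs_by_group):
    """Return the best-scoring group and the score of every group."""
    scores = {
        group: segment_log_likelihood(genotypes, freqs)
        for group, freqs in freqs_by_group.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical alternate-allele frequencies at three SNPs for two groups.
freqs_by_group = {
    "group_a": [0.9, 0.8, 0.1],
    "group_b": [0.2, 0.3, 0.7],
}
genotypes = [2, 2, 0]  # copies of the alternate allele at each SNP

best, scores = most_likely_group(genotypes, freqs_by_group)
print(best, scores)
```

Working in log space keeps the product of many small probabilities from underflowing when a segment contains a large number of SNPs.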
2. Establishing the Panel
Groups of neighboring SNPs vary in prevalence from one population to another, and the occurrence of particular haplotypes was recorded for each reference cluster. The term ‘haplotype’ refers to segments of adjoining DNA within chromosomes, which are extremely informative when looking for genetic signatures that correspond to particular ethnic groups. To match parts of your genome with our reference groups, the digital version of your genome is virtually separated into hundreds of small divisions (commonly termed ‘windows’), each covering 4-12 centimorgans. The haplotype within each window is compared to those in the reference groups, and a mathematical model (a Hidden Markov Model) evaluates ethnicity at each position, examining the SNP configurations to determine statistically which group the DNA corresponds to. We used the BEAGLE program, which implements a Hidden Markov Model, to create a series of haplotype cluster models and further refine the data. Given a set of genotypes, the model estimates the most likely haplotype combinations. Each haplotype cluster model contains clusters of similar haplotypes, and these haplotype clusters are linked to reference groups.
3. Inference Method Created
To reliably assign segments of DNA to reference panel groups, models were created to predict the likelihood that the haplotypes in each segment belong to a particular reference group. A multiclass support vector machine (SVM) algorithm was used to learn the boundaries between reference populations from the existing data. The SVM assigns each segment to a reference group, and a Hidden Markov Model (HMM) then corrects errors in that output. The HMM assigns a likelihood for every reference group to the haplotypes present in your genome and selects the combination of ethnicities with the highest likelihood. The final ethnicity estimate reports the amount of each ethnicity as a proportion of that most likely combination.
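As a rough illustration of how a chromosome might be divided into windows, the sketch below groups SNPs by their genetic-map position. The fixed 8-centimorgan width, the function name, and the example positions are assumptions made for this example; the actual windows vary between 4 and 12 centimorgans and are not necessarily built this way.

```python
def split_into_windows(snp_positions_cm, window_cm=8.0):
    """Group SNP indices into consecutive windows of about window_cm centimorgans.

    snp_positions_cm must be sorted in ascending order for a single chromosome.
    A new window starts once a SNP lies window_cm or more past the window start.
    """
    if window_cm <= 0:
        raise ValueError("window_cm must be positive")
    windows = []
    current = []
    window_start = None
    for index, position in enumerate(snp_positions_cm):
        if window_start is None:
            window_start = position
        elif position - window_start >= window_cm:
            windows.append(current)
            current = []
            window_start = position
        current.append(index)
    if current:
        windows.append(current)
    return windows

# Hypothetical genetic-map positions (in centimorgans) of seven SNPs.
positions = [0.0, 1.5, 3.0, 8.0, 9.5, 16.5, 17.0]
windows = split_into_windows(positions, window_cm=8.0)
print(windows)  # [[0, 1, 2], [3, 4], [5, 6]]
```

Each inner list holds the indices of the SNPs that would be compared, as one haplotype block, against the reference groups.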
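The error-correction step can be sketched with a small Viterbi decoder. This is a simplified illustration rather than our actual model: it assumes the classifier produces a probability for each group in each window, that the chance of switching groups between neighboring windows is a single fixed value (`switch_prob`), and that every window carries equal weight in the final estimate. All names and numbers are hypothetical.

```python
import math
from collections import Counter

def viterbi_smooth(window_probs, groups, switch_prob=0.05):
    """Most probable sequence of groups across windows under a simple HMM.

    window_probs: one dict per window mapping group -> classifier probability
    (all values must be greater than 0). Needs at least two groups.
    """
    n = len(groups)
    stay = math.log(1 - switch_prob)
    switch = math.log(switch_prob / (n - 1))

    def step(prev, cur):
        return stay if prev == cur else switch

    # Uniform prior over groups for the first window.
    score = {g: math.log(1 / n) + math.log(window_probs[0][g]) for g in groups}
    back = []
    for probs in window_probs[1:]:
        new_score, pointers = {}, {}
        for g in groups:
            best_prev = max(groups, key=lambda h: score[h] + step(h, g))
            new_score[g] = score[best_prev] + step(best_prev, g) + math.log(probs[g])
            pointers[g] = best_prev
        back.append(pointers)
        score = new_score

    # Trace the best path backwards from the best final state.
    path = [max(groups, key=lambda g: score[g])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def estimate_proportions(path):
    """Share of windows assigned to each group (equal window weights assumed)."""
    counts = Counter(path)
    return {group: count / len(path) for group, count in counts.items()}

# Hypothetical classifier output: window 3 looks like an isolated error.
window_probs = [
    {"A": 0.9, "B": 0.1}, {"A": 0.8, "B": 0.2}, {"A": 0.4, "B": 0.6},
    {"A": 0.9, "B": 0.1}, {"A": 0.85, "B": 0.15}, {"A": 0.1, "B": 0.9},
    {"A": 0.2, "B": 0.8}, {"A": 0.1, "B": 0.9},
]
path = viterbi_smooth(window_probs, ["A", "B"])
proportions = estimate_proportions(path)
print(path, proportions)
```

In this example the lone window that leans toward group B is overruled by its neighbors, while the sustained run of B at the end is kept, which is the kind of correction a Hidden Markov Model is suited to.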
4. Extensive Refinement of Methods
The results of these algorithms were tested for validity, and parameters were adjusted as needed. Testing continued until the predictive models reached a reasonable degree of accuracy. To validate the process, we tested individuals from the reference panel as well as individuals from a separate panel created specifically for testing. This second panel contained samples with known ethnicity, and the models were adjusted until the accuracy of the results reliably fell within an acceptable range.
Common metrics for the accuracy of results include precision and recall. Precision is the amount of an ethnicity that was correctly assigned divided by the total amount of that ethnicity predicted, and recall is the amount correctly assigned divided by the actual amount of that ethnicity. High-precision systems with low recall tend to assign DNA segments to a particular group only when there is a high level of confidence, but usually leave a significant amount of DNA unassigned. High-recall systems with low precision tend to assign DNA segments to reference groups readily but can be incorrect at times. Precision is highest at the continental level and declines as the assignments become more granular. Our system attempts several degrees of classification, so low recall is seen where groups are hard to distinguish from others.
Accuracy of Results
How accurate are these values?
Assigning DNA segments to groups in the reference panel depends on the models used, and all values are approximate. While the models are built to be as accurate as possible, they are not always completely correct, and the results should be viewed as estimates calculated to the best of our computational ability.
Reference Panel
The following is a list of the groups that are part of the reference panel. Note that the quantity isn’t static and more samples may be added to the list as needed.
Europe:
German, Dutch, French (North), French (South), French, Swiss, Breton, Frisian, Cypriot, Albanian, Sicilian, Greek, Serbian, Slovenian, Italian (North), Italian (South), Croatian, Maltese, Sardinian, Bosniak, Corsican, Irish, Welsh, English, Scottish, Swedish, Finnish, Danish, Norwegian, Sami, Latvian, Lithuanian, Estonian, Hungarian, Ashkenazi Jewish, Slovak, Czech, Austrian, Portuguese, Spanish, Basque, Belarusian, Romanian, Bulgarian, Ukrainian, Russian, Polish, Lipka Tatar, Romanian Tatar
Caucasus:
Georgian, Armenian, Azeri, Dargin, Adyghe, Ossetian, Kabardian, Abazin, Tatar
West Africa:
Sierra Leone, Liberia, Guinea, Kru, Ewe, Wolof, Sahelian, Yoruba, Mandinka, Bambara, Fon, Abron, Akan, Igbo, Nigerian (Central, East), Fulani (Nigeria, Guinea), Mende, Angolan (North, South), Burkina Faso.
Southern Africa:
Kalahari Forager (Khoekhoen, San), Botswana (Tswana people), South Africa (Northeast, Southeast), Zimbabwe (Shona people)
East Africa:
Kalenjin, Kikuyu, Somali (North, South), Luhya, Sudan (North, Central, South), South Sudan, Maasai, Luo, Merina, Makua, Wolayta, Amhara, Oromo, Tigray, Sidama, Afar, Hadiya
Central Africa:
Central African Forager (Mbuti, Biaka), Cameroon (Fang, Bamoun), Congolese West (Anamongo, EsiKongo), Congolese East (Baluba), Central African Republic
Southeast Asia:
Filipino (North, South), Indonesian (East, West), Malaysian, Thai, Cambodian (Khmer), Bamar, She, Miao, Dayak, Batak, Nyishi, Tangsa, Yi, Lahu, Mon, Vietnamese, Assamese (Bodo Kachari, Rajbongshi).
South Asia:
Koli, Bengali, Marathi, Baloch, Uttar Pradesh, Odisha, Karnataka, Pathan, Sindhi, Gujarati, Punjabi, Rajasthani, Malayali, Telugu, Sinhalese, Bhojpuri, Nepalese, Makrani, Kho, Bhil, Kannada, Tamil.
The Americas:
Inuit, Indigenous North, Indigenous Central Plains, Indigenous Southwest US, Mexican Indigenous North, Mexican Indigenous South, Indigenous Central America, Indigenous Caribbean (Taino), Indigenous South America (Northwest, Andes, Southern Cone, Amazonian).
Oceania:
Polynesian, Maori, Melanesian, Aboriginal Australian, Hawaiian, Moluccan, Tahitian, Tonga and Samoa.
East Asia and Siberia:
Evenk, Orok, Chinese (North, Central, South), Han, Naxi, Japanese, Yi, She, Korean, Manchu, Tujia, Zhuang, Ryukyuan, Nenets
Mideast:
Arabian Peninsula (Eastern), Arabian Peninsula (Western), Turkish (West, East, Central, Southeast, Southwest, Northwest), Druze, Lebanese, Egyptian (North, South), Jordanian, Yemeni (Western, Eastern), Syrian (West), Assyrian, Palestinian, Bedouin, Iranian (West, Central, East), Kurdish (West, East), Iraqi (Central, South), Sephardic Jewish, Palestinian Christian, Lebanese Christian, Egyptian Coptic.
North Africa:
Algerian, Tunisian, Moroccan (North, South), Libyan (East, West), Berber (Algeria)
The Jewish Diaspora Genetic Test focuses on these Jewish genetic groups:
Persian Jewish, Ashkenazi, Eastern Caucasus Jewish, Northern-Central Mesopotamia Jewish, Eastern Maghrebi Jewish (Moroccan, Algerian), Western Maghrebi Jewish (Tunisian, Libyan), Sephardic (Turkey) Jewish, Northern Levantine Jewish, Ethiopian Jewish, Yemeni Jewish, Indian Jewish.
The Central Asian Genetic Test focuses on these groups:
Hazara, Kyrgyz, Kazakh, Uzbek (East, West), Uyghur (West, East, Southwest), Pashtun (Afghanistan, Pakistan), Turkmen (Southwest, West, Southeast), Iranian (Northeast, East, Southeast, Central), Mongolian.
Reference Panel Sample Details
The following is a list of the number of samples used per group within the reference panel. Note that the quantities are not fixed, and samples may be added or removed depending on changes we make to the reference panel. We may create new categories and merge existing ones depending on the level of accuracy. The categories most likely to be merged are those that have a high degree of overlap with others or low precision and recall. Also note that some of the categories under the Jewish Diaspora Genetic Test use a slightly different set of reference panel samples than the same categories under the general reference panel, so their precision and recall statistics differ.
In the table below, the values under the Precision column represent how much of the assigned ethnicity is correct, while the values under the Recall column represent how much of the actual ethnicity is assigned by the process. For example, if the algorithm assigns someone 80% Makua but only 40% is truly Makua, the precision is 0.50. Likewise, if someone has 100% Irish ancestry but the model estimates only 20%, the recall is 0.20.
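The two worked examples above can be reproduced with a few lines of code. The function and parameter names are hypothetical and serve only to make the arithmetic explicit; amounts are expressed as percentages of the genome.

```python
def precision(correctly_assigned, total_assigned):
    """Share of the assigned amount of an ethnicity that is actually correct."""
    if total_assigned <= 0:
        raise ValueError("total_assigned must be positive")
    return correctly_assigned / total_assigned

def recall(correctly_assigned, actual_amount):
    """Share of the actual amount of an ethnicity that the model recovered."""
    if actual_amount <= 0:
        raise ValueError("actual_amount must be positive")
    return correctly_assigned / actual_amount

# Makua example: 80% assigned by the algorithm, of which 40% is truly Makua.
makua_precision = precision(correctly_assigned=40, total_assigned=80)

# Irish example: 100% actual ancestry, of which the model estimates 20%.
irish_recall = recall(correctly_assigned=20, actual_amount=100)

print(makua_precision, irish_recall)  # 0.5 0.2
```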
Group | Number of Samples | Precision | Recall |
Europe | |||
1. German | 93 | 0.69 | 0.72 |
2. Dutch | 57 | 0.88 | 0.74 |
3. French (North) | 38 | 0.86 | 0.76 |
4. French (South) | 52 | 0.85 | 0.92 |
5. French | 61 | 0.88 | 0.91 |
6. Swiss | 43 | 0.76 | 0.71 |
7. Breton | 29 | 0.65 | 0.64 |
8. Frisian | 47 | 0.68 | 0.93 |
9. Cypriot | 30 | 0.63 | 0.88 |
10. Albanian | 19 | 0.61 | 0.62 |
11. Sicilian | 28 | 0.73 | 0.97 |
12. Greek | 52 | 0.91 | 0.75 |
13. Serbian | 24 | 0.72 | 0.93 |
14. Slovenian | 22 | 0.56 | 0.9 |
15. Italian (North) | 109 | 0.62 | 0.65 |
16. Italian (South) | 103 | 0.9 | 0.77 |
17. Croatian | 42 | 0.65 | 0.86 |
18. Maltese | 37 | 0.78 | 0.79 |
19. Sardinian | 51 | 0.88 | 0.93 |
20. Bosniak | 64 | 0.81 | 0.61 |
21. Corsican | 21 | 0.52 | 0.83 |
22. Irish | 29 | 0.61 | 0.94 |
23. Welsh | 135 | 0.93 | 0.67 |
24. English | 62 | 0.7 | 0.91 |
25. Scottish | 148 | 0.84 | 0.94 |
26. Swedish | 121 | 0.82 | 0.81 |
27. Finnish | 105 | 0.96 | 0.79 |
28. Danish | 94 | 0.73 | 0.7 |
29. Norwegian | 81 | 0.81 | 0.66 |
30. Saami | 107 | 0.89 | 0.83 |
31. Latvian | 10 | 0.71 | 0.69 |
32. Lithuanian | 87 | 0.65 | 0.61 |
33. Estonian | 55 | 0.85 | 0.77 |
34. Hungarian | 38 | 0.75 | 0.83 |
35. Ashkenazi Jewish | 118 | 0.77 | 0.86 |
36. Slovak | 34 | 0.89 | 0.78 |
37. Czech | 32 | 0.52 | 0.94 |
38. Austrian | 75 | 0.6 | 0.83 |
39. Portuguese | 52 | 0.57 | 0.65 |
40. Spanish | 105 | 0.64 | 0.95 |
41. Basque | 92 | 0.84 | 0.77 |
42. Belarusian | 54 | 0.89 | 0.94 |
43. Romanian | 45 | 0.91 | 0.96 |
44. Bulgarian | 58 | 0.96 | 0.94 |
45. Ukrainian | 81 | 0.68 | 0.72 |
46. Russian | 104 | 0.86 | 0.61 |
47. Polish | 114 | 0.77 | 0.87 |
48. Eastern European Tatar | 12 | 0.63 | 0.92 |
49. Greek Mainlander | 56 | 0.65 | 0.8 |
50. Greek Islander | 59 | 0.61 | 0.86 |
51. Macaronesia | 35 | 0.54 | 0.73 |
Caucasus | |||
1. West Caucasus | 51 | 0.87 | 0.82 |
2. Armenian | 75 | 0.71 | 0.96 |
3. Azeri | 46 | 0.86 | 0.94 |
4. North Caucasus (Ossetian, Adygei, Balkar, Kabardin) | 37 | 0.62 | 0.65 |
5. Northeast Caucasus (Lezgin, Tabassaran, Chechen) | 33 | 0.85 | 0.74 |
West Africa | |||
1. Senegal (Wolof) | 47 | 0.8 | 0.85 |
2. Fulani (Guinea) | 24 | 0.91 | 0.68 |
3. Krou | 40 | 0.8 | 0.91 |
4. Sierra Leone | 53 | 0.94 | 0.75 |
5. Kwa | 61 | 0.83 | 0.72 |
6. Nigeria West | 106 | 0.87 | 0.94 |
7. Nigeria East | 113 | 0.95 | 0.83 |
8. Nigeria North | 36 | 0.54 | 0.74 |
9. Gabon and Cameroon | 28 | 0.97 | 0.68 |
10. Western Congo | 52 | 0.65 | 0.54 |
11. South Angola | 6 | 0.67 | 0.73 |
12. Malinke | 38 | 0.6 | 0.64 |
13. Burkina Faso | 4 | 0.58 | 0.62 |
14. Sahelian West | 15 | 0.65 | 0.7 |
Southern Africa: | |||
1. Kalahari Forager | 42 | 0.85 | 0.87 |
2. Tswana | 25 | 0.82 | 0.79 |
3. South African Bantu (Northeast, Southeast) | 94 | 0.75 | 0.91 |
4. Shona | 29 | 0.68 | 0.83 |
5. Bemba | 3 | 0.52 | 0.75 |
East Africa: | |||
1. Kenya West (Kalenjin, Luo) | 47 | 0.66 | 0.79 |
2. Kenya Central (Kikuyu) | 16 | 0.96 | 0.89 |
3. Somali (North, South) | 92 | 0.69 | 0.9 |
4. Sudan (North, Central, South) | 82 | 0.61 | 0.94 |
5. South Sudan (Dinka, Nuer) | 31 | 0.79 | 0.62 |
6. Maasai | 22 | 0.67 | 0.85 |
7. Malagasy Highlander | 29 | 0.69 | 0.74 |
8. Malagasy Coast | 24 | 0.72 | 0.61 |
9. Mozambique (Makua) | 17 | 0.78 | 0.67 |
10. Ethiopia North (Tigray, Amhara, includes Eritrean) | 48 | 0.88 | 0.71 |
11. Ethiopia South (Oromo, Wolayta) | 11 | 0.55 | 0.58 |
12. Rwanda (Hutu, Tutsi) | 14 | 0.63 | 0.54 |
13. Uganda (Baganda) | 10 | ||
Central Africa: | |||
1. Central African Forager (Mbuti, Biaka) | 46 | 0.94 | 0.83 |
2. Congolese East (Baluba) | 29 | 0.69 | 0.95 |
3. Central African Republic | 8 | 0.43 | 0.65 |
Southeast Asia: | |||
1. Filipino (North, South) | 103 | 0.87 | 0.64 |
2. Indonesian (East, West) | 115 | 0.75 | 0.96 |
3. Eastern SEA (Vietnamese Kinh, Lao) | 72 | 0.62 | 0.97 |
4. Yunnan Hills (Lahu) | 58 | 0.77 | 0.73 |
5. Central SEA (Thai, Mon, Cambodian, Peninsular Malaysian) | 32 | 0.63 | 0.7 |
6. Western SEA (Burmese – Bamar) | 41 | 0.63 | 0.64 |
7. Dai (Yunnan) | 24 | 0.58 | 0.88 |
8. Indigenous Borneo | 16 | 0.64 | 0.92 |
9. Indigenous Sumatra (Batak) | 10 | 0.63 | 0.7 |
10. Nyishi (Arunachal Pradesh) | 62 | 0.72 | 0.79 |
11. Tangsa (Arunachal Pradesh) | 64 | 0.83 | 0.61 |
12. Assamese (Bodo Kachari, Rajbongshi) | 11 | 0.73 | 0.72 |
13. Bodo | 14 | 0.64 | 0.62 |
14. Rajbanshi | 9 | 0.89 | 0.74 |
South Asia: | |||
1. Koli | 10 | 0.67 | 0.8 |
2. Bengali | 67 | 0.87 | 0.92 |
3. Marathi | 114 | 0.83 | 0.75 |
4. Baloch | 46 | 0.68 | 0.72 |
5. Uttar Pradesh | 26 | 0.9 | 0.62 |
6. Odisha | 74 | 0.92 | 0.91 |
7. Karnataka | 63 | 0.79 | 0.8 |
8. Pathan | 46 | 0.9 | 0.78 |
9. Sindhi | 85 | 0.74 | 0.7 |
10. Gujarati | 92 | 0.83 | 0.68 |
11. Rajasthani | 78 | 0.85 | 0.77 |
12. Malayali | 20 | 0.67 | 0.82 |
13. Telugu | 74 | 0.69 | 0.77 |
14. Sinhalese | 42 | 0.72 | 0.6 |
15. Bhojpuri | 19 | 0.87 | 0.69 |
16. Nepalese | 106 | 0.89 | 0.83 |
17. Punjabi | 25 | 0.69 | 0.9 |
18. Makrani | 13 | 0.47 | 0.65 |
19. Kho | 26 | 0.49 | 0.59 |
20. Bhil | 60 | 0.75 | 0.84 |
21. Tamil | 71 | 0.85 | 0.91 |
22. Pakistani GI (Kalash) | 20 | 0.66 | 0.9 |
The Americas: | |||
1. Inuit | 61 | 0.77 | 0.72 |
2. Indigenous North | 38 | 0.91 | 0.58 |
3. Indigenous Central Plains | 53 | 0.84 | 0.66 |
4. Indigenous Southwest US | 61 | 0.81 | 0.65 |
5. Mexican Indigenous (North, South) | 107 | 0.92 | 0.87 |
6. Indigenous Central America | 32 | 0.66 | 0.82 |
7. Indigenous Caribbean (Taino) | 23 | 0.73 | 0.65 |
8. Indigenous South America (Northwest, Andes, Southern Cone, Amazonian) | 104 | 0.75 | 0.68 |
Oceania: | |||
1. Polynesian | 55 | 0.9 | 0.94 |
2. Maori | 50 | 0.75 | 0.84 |
3. Melanesian | 47 | 0.64 | 0.95 |
4. Aboriginal Australian | 26 | 0.86 | 0.96 |
5. Hawaiian | 29 | 0.85 | 0.92 |
6. Moluccan | 26 | 0.72 | 0.64 |
7. Tahitian | 5 | 0.78 | 0.78 |
8. Tonga and Samoa | 47 | 0.77 | 0.66 |
East Asia and Siberia: | |||
1. Western Siberian (Chukchi) | 15 | 0.53 | 0.77 |
2. Central Siberian GI (Nganasan) | 8 | 0.48 | 0.79 |
3. Chinese North (Han) | 75 | 0.89 | 0.9 |
4. Chinese Central (Han) | 45 | 0.83 | 0.88 |
5. Chinese South (Han) | 72 | 0.83 | 0.67 |
6. Japanese | 20 | 0.79 | 0.66 |
7. Korean (South) | 117 | 0.85 | 0.73 |
8. Manchu (Northeast China) | 43 | 0.72 | 0.72 |
9. Zhuang (South China) | 5 | 0.85 | 0.81 |
10. Ryukyuan | 101 | 0.78 | 0.97 |
11. Miao (South China) | 48 | 0.69 | 0.86 |
12. Wuling Mountains (Central China South) (She, Miao, Tujia) | 39 | 0.65 | 0.67 |
13. Hengduan Mountains (Yi, Naxi) | 7 | 0.61 | 0.6 |
14. Tungusic Manchurian (Xibo, Hezhen) | 32 | 0.52 | 0.43 |
15. Mongolian | 94 | 0.76 | 0.94 |
Mideast: | |||
1. Arabian Peninsula (Eastern) | 49 | 0.8 | 0.64 |
2. Arabian Peninsula (Western) | 34 | 0.84 | 0.83 |
3. Turkish (West, East, Central, Southeast, Southwest, Northwest) | 117 | 0.81 | 0.67 |
4. Druze | 41 | 0.79 | 0.66 |
5. Lebanese | 38 | 0.62 | 0.77 |
6. Egyptian (North, South) | 106 | 0.95 | 0.66 |
7. Jordanian | 64 | 0.65 | 0.76 |
8. Yemeni (Western, Eastern) | 57 | 0.94 | 0.93 |
9. Syrian (West) | 58 | 0.81 | 0.74 |
10. Assyrian | 47 | 0.8 | 0.63 |
11. Palestinian | 35 | 0.86 | 0.61 |
12. Bedouin (Sinai) | 12 | 0.7 | 0.92 |
13. Iranian (West, Central, East) | 107 | 0.78 | 0.71 |
14. Kurdish (West, East) | 75 | 0.77 | 0.82 |
15. Iraqi (Central, South) | 41 | 0.8 | 0.73 |
16. Sephardic Jewish | 72 | 0.63 | 0.81 |
17. Palestinian GI (Palestinian Christians, Palestinian Samaritans, Palestinian Muslims (Nablus)) | 68 | 0.66 | 0.85 |
18. Lebanese GI (Lebanese Christians, rural Lebanese Shia) | 66 | 0.62 | 0.89 |
19. Egyptian GI (Coptic Christian) | 70 | 0.89 | 0.87 |
20. Iranian GI (Zoroastrian) | 15 | 0.63 | 0.8 |
North Africa: | |||
1. Algerian | 22 | 0.78 | 0.87 |
2. Tunisian | 50 | 0.63 | 0.72 |
3. Moroccan (North, South) | 75 | 0.62 | 0.81 |
4. Libyan (East, West) | 44 | 0.75 | 0.74 |
5. Berber (Algeria) | 36 | 0.66 | 0.79 |
Jewish Diaspora Genetic Test: | |||
1. Persian Jewish | 102 | 0.74 | 0.71 |
2. Ashkenazi | 118 | 0.89 | 0.65 |
3. Caucasus Jewish | 64 | 0.91 | 0.62 |
4. Northern-Central Mesopotamia Jewish | 24 | 0.84 | 0.74 |
5. Eastern Maghrebi Jewish (Moroccan, Algerian) | 50 | 0.66 | 0.76 |
6. Western Maghrebi Jewish (Tunisian, Libyan) | 87 | 0.93 | 0.88 |
7. Sephardic Jewish (Turkey) | 61 | 0.78 | 0.68 |
8. Northern Levantine Jewish | 92 | 0.69 | 0.89 |
9. Ethiopian Jewish | 37 | 0.87 | 0.66 |
10. Yemeni Jewish | 34 | 0.86 | 0.81 |
11. Indian Jewish | 15 | 0.6 | 0.77 |
Central Asian Genetic Test: | |||
1. Hazara | 36 | 0.62 | 0.68 |
2. Kyrgyz | 28 | 0.69 | 0.95 |
3. Kazakh | 85 | 0.78 | 0.94 |
4. Uzbek (East, West) | 65 | 0.62 | 0.72 |
5. Uyghur (West, East, Southwest) | 58 | 0.58 | 0.7 |
6. Pashtun (Afghanistan, Pakistan) | 82 | 0.81 | 0.9 |
7. Turkmen (Southwest, West, Southeast) | 81 | 0.74 | 0.72 |
8. Iranian (Northeast, East, Southeast, Central) | 104 | 0.89 | 0.73 |
9. Mongolian | 45 | 0.68 | 0.71 |
Sample Sources
The following is a list of the samples within the reference panel, with a description of the ethno-geographic sources for a majority of the known samples within each sample set. The samples come from both public data sets and our outreach program. Because of this, the samples fall into two categories: those for which we could verify the ancestral regions of origin, and those for which we could not verify the precise ancestral regions of origin. The regions of origin are listed for samples that could be verified. Participants were screened for a history of ancestry in a particular region, based on their parents or grandparents being from that region. Some of the samples whose regions of origin we could not verify were compared to known samples from each country, and we listed those that clearly clustered with the known samples in each region-specific category. This process is meant to approximate the ancestral locations of origin.
Group | Source of a Plurality of Samples |
Europe | |
1. German | North Rhine Westphalia, Hesse, Rhineland Palatinate, Baden Wurttemberg |
2. Dutch | South Holland, North Brabant, Utrecht, North Holland |
3. French (North) | Hauts de France, Normandie, Grand Est, Centre Val de Loire |
4. French (South) | Provence-Alpes-Côte d’Azur, Auvergne-Rhône-Alpes, Occitanie |
5. French | France: unspecified |
6. Swiss | Switzerland: unspecified |
7. Breton | Bretagne |
8. Frisian | Friesland, Nordfriesland |
9. Cypriot | Limassol, Nicosia |
10. Albanian | Tirana, Fier, Durres, Lezhe, Shkoder |
11. Sicilian | Palermo, Catania |
12. Greek | Kentriki Makedonia, Attiki |
13. Serbian | Sumadija, Kolubara |
14. Slovenian | Savinja, Drava |
15. Italian (North) | Lombardy, Emilia Romagna |
16. Italian (South) | Campania, Apulia, Calabria |
17. Croatian | Split Dalmatia, Primorje Gorski Kotar |
18. Maltese | Valletta, Birkirkara |
19. Sardinian | Cagliari, Sassari, Nuoro, Oristano |
20. Bosniak | Sarajevo, Tuzla, Zenica-Doboj |
21. Corsican | Upper Corsica, Southern Corsica |
22. Irish | Cork, Kildare, Meath, Antrim, Down, Londonderry, Wicklow, Louth, Wexford, Armagh, Tyrone |
23. Welsh | Cardiff, Swansea, Rhondda Cynon Taf |
24. English | Greater Manchester, West Midlands, West Yorkshire, Merseyside, South Yorkshire |
25. Scottish | Fife, North Lanarkshire, South Lanarkshire, Highland, West Lothian |
26. Swedish | Vastra Gotalands lan, Skane lan, Ostergotlands lan, Uppsala lan, Sodermanlands lan, Jonkopings lan |
27. Finnish | Uusimaa, Pirkanmaa, Varsinais-Suomi, Pohjois-Pohjanmaa, Satakunta |
28. Danish | Syddanmark, Midtjylland, Zealand, Nordjylland |
29. Norwegian | Viken, Vestland, Rogaland, Trondelag, Innlandet |
30. Saami | Finnmark (Norway) |
31. Latvian | Pierigas, Zemgales, Kursemes, Vidzemes |
32. Lithuanian | Vilnius, Kaunas, Klaipeda, Siauliai |
33. Estonian | Harju, Ida-Viru, Tartu, Parnu, Laane-Viru |
34. Hungarian | Pest, Gyor-Moson-Sopron, Hajdu-Bihar |
35. Ashkenazi Jewish | Warsaw, Budapest, Lodz, Lviv |
36. Slovak | Bratislava, Kosice, Presov, Zilina |
37. Czech | Central Bohemian Region, Praha, South Moravian Region, Moravian-Silesian Region |
38. Austrian | Lower Austria, Upper Austria, Styria, Tyrol |
39. Portuguese | Vila Real, Aveiro, Braga, Setubal, Santarem |
40. Spanish | Andalusia, Catalonia, Valencia, Castilla y Leon, Galicia, Castilla La Mancha |
41. Basque | Pais Vasco |
42. Belarusian | Homel, Vitebsk, Brest, Hrodna |
43. Romanian | Cluj, Timis, Iasi, Constanta, Prahova, Brasov |
44. Bulgarian | Plovdiv, Varna, Burgas, Stara Zagora, Blagoevgrad |
45. Ukrainian | Kharkiv, Dnipropetrovsk, Lviv, Odessa |
46. Russian | Krasnodar Krai, Sverdlovsk Oblast, Rostov Oblast |
47. Polish | Masovian Voivodeship, Silesian Voivodeship, Lesser Poland Voivodeship, Greater Poland Voivodeship, Lower Silesian Voivodeship |
48. Eastern European Tatar | Eastern Poland (Lipka Tatars), Constanta (Romanian Tatars) |
49. Greek Mainlander | Attiki, Kentriki Makedonia, Sterea Ellada, Thessalia, Peloponnisos |
50. Greek Islander | Kriti, Evia, Rodos |
51. Macaronesia | Tenerife, Gran Canaria, Lanzarote, Funchal |
Caucasus | |
1. West Caucasus | Georgia: Imereti, Shida Kartli, Adjara AR, Abkhazia AR. Ethnic Groups: Georgian, Abkhazian. |
2. Armenian | Armenia: Ararat, Kotayk, Lori |
3. Azeri | Azerbaijan: Central Aran, Shirvan-Salyan. |
4. North Caucasus (Ossetian, Adygei, Balkar, Kabardin) | Ethnic Groups: Ossetian, Adygei, Balkar, Kabardin |
5. Northeast Caucasus (Lezgin, Tabassaran, Chechen) | Ethnic Groups: Lezgin, Tabassaran, Chechen |
West Africa | |
1. Senegal (Wolof) | Senegal: Thies, Diourbel, Fatick, Ziguinchor |
2. Fulani (Guinea) | Guinea: Kankan, Nzerekore |
3. Krou | Liberia: Nimba County. Ivory Coast: Montagnes. |
4. Sierra Leone | Sierra Leone: Kenema, Bo. Ethnic Groups: Temne, Mende. |
5. Kwa | Southern Ghana: Volta, Ashanti, Central. Eastern Ivory Coast: Lacs, Comoe, Lagunes. Togo: Plateaux |
6. Nigeria West | Nigeria: Oyo, Ogun, Ondo. Benin: Plateau. Ethnic Groups: Yoruba |
7. Nigeria East | Nigeria: Cross River, Imo, Benue. Ethnic Groups: Igbo, Ijaw, Ibibio, Ekoi, Tiv, Jukun. |
8. Nigeria North | Nigeria: Kano, Jigawa. Niger: Tahoua, Maradi. Ethnic Groups: Hausa, Fulani. |
9. Gabon and Cameroon | Cameroon: Bamileke tribe (Ouest), Pahouin tribe (Sud). Gabon: Pahouin tribe (Woleu-Ntem). |
10. Western Congo | Democratic Republic of the Congo: EsiKongo Tribe. Angola: Ambundu Tribe |
11. South Angola | Angola: Ovambo Ethnic Group (Cunene) |
12. Malinke | Mali: Sikasso. Guinea: Kankan |
13. Burkina Faso | Burkina Faso: Mossi Tribe (Centre-Sud, Centre-Ouest) |
14. Sahelian West | Mauritania: Beidane Ethnic Group |
Southern Africa: | |
1. Kalahari Forager | Namibia: Otjozondjupa. Ethnic Groups: Khoekhoen, San. |
2. Tswana | Botswana: Central District, Kweneng District. |
3. South African Bantu (Northeast, Southeast) | Eastern South Africa: KwaZulu-Natal, Mpumalanga |
4. Shona | Zimbabwe: Manicaland Province, Midlands Province. |
5. Bemba | Zambia: Lusaka Province, Central Province |
East Africa: | |
1. Kenya West (Kalenjin, Luo) | Kenya: Western Province, Nyanza Province, Rift Valley Province |
2. Kenya Central (Kikuyu) | Kenya: Rift Valley Province, Eastern Province |
3. Somali (North, South) | Somalia North: Somaliland, Puntland. Somalia Central: Galmudug, Hirshabelle. Somalia South: South West Region, Jubaland. |
4. Sudan (North, Central, South) | North: River Nile State. Central: El Gazira, North Kordofan. South: Blue Nile, South Kordofan |
5. South Sudan (Dinka, Nuer) | South Sudan: Central Equatoria, Eastern Equatoria |
6. Maasai | Kenya: Rift Valley Province |
7. Malagasy Highlander | Madagascar: Antananarivo Province |
8. Malagasy Coast | Madagascar: Atsinanana. Ethnic Group: Betsimisaraka People |
9. Mozambique (Makua) | Mozambique: Zambezia, Nampula |
10. Ethiopia North (Tigray, Amhara, includes Eritrean) | Ethiopia: Amhara and Tigray Regions, Eritrea. |
11. Ethiopia South (Oromo, Wolayta) | Ethiopia: Oromia and Wolaita Regions |
12. Rwanda (Hutu, Tutsi) | Rwanda: Eastern Province |
13. Uganda (Baganda) | Central Uganda |
Central Africa: | |
1. Central African Forager (Mbuti, Biaka) | Democratic Republic of the Congo: Mbenga and Mbuti ethnic groups. |
2. Congolese East (Baluba) | Democratic Republic of the Congo: Kasai Province |
3. Central African Republic | Central African Republic: Sango ethnic group. |
Southeast Asia: | |
1. Filipino (North, South) | The Philippines: Luzon (North), Mindanao (South). |
2. Indonesian (East, West) | Indonesia: Java, Borneo, Sumatra, Sulawesi. |
3. Eastern SEA (Vietnamese Kinh, Lao) | Vietnam: Northwest Region, Northeast Region, Central Highlands Region. Laos: Salavan, Savannakhet. |
4. Yunnan Hills (Lahu) | China: Yunnan Province. Vietnam: Lai Chau Province |
5. Central SEA (Thai, Mon, Cambodian, Peninsular Malaysian) | Peninsular Malaysia: Eastern Region, Southern Region. Cambodia: Mekong Lowlands, Eastern Region. Thailand: Northeast Region, Central Region. Myanmar: Mon State. Thailand: Pathum Thani Province. |
6. Western SEA (Burmese – Bamar) | Myanmar: Magwe, Sagaing |
7. Dai (Yunnan) | China: Yunnan Province |
8. Indigenous Borneo | Indonesia: Borneo |
9. Indigenous Sumatra (Batak) | Indonesia: Sumatra |
10. Nyishi (Arunachal Pradesh) | India: Arunachal Pradesh |
11. Tangsa (Arunachal Pradesh) | India: Arunachal Pradesh |
12. Assamese (Bodo Kachari, Rajbongshi) | India: Assam State. |
13. Bodo | India: Assam State (Bodoland Region) |
14. Rajbanshi | India: Assam State |
South Asia: | |
1. Koli | India: Gujarat |
2. Bengali | Bangladesh: Rajshahi, Sylhet |
3. Marathi | India: Maharashtra (Aurangabad Region) |
4. Baloch | Pakistan: Balochistan (Quetta, Sibi) |
5. Uttar Pradesh | India: Uttar Pradesh (Allahabad, Lucknow, Bareilly) |
6. Odisha | India: Odisha (Northern District, Central District). Primary group: Odia speakers |
7. Karnataka | India: Karnataka (Belagavi, Mysuru) |
8. Pathan | Afghanistan: Zabol, Kandahar, Paktika |
9. Sindhi | Pakistan: Sindh (Jamshoro, Sanghar) |
10. Gujarati | India: Gujarat (Vadodara, Navsari, Anand) |
11. Rajasthani | India: Rajasthan (Bhilwara, Pali, Ajmer) |
12. Malayali | India: Kerala (Kottayam, Malappuram) |
13. Telugu | India: Telangana (Nalgonda, Khammam) |
14. Sinhalese | Sri Lanka: Uva Province, Central Province |
15. Bhojpuri | India: Uttar Pradesh (Azamgarh, Varanasi, Mirzapur) |
16. Nepalese | Nepal: Koshi Pradesh, Madhesh Pradesh |
17. Punjabi | Pakistan: Punjab (Chiniot, Sargodha, Okara). India: Punjab (Jalandhar, Patiala) |
18. Makrani | Pakistan: Balochistan (Makran) |
19. Kho | Pakistan: Chitral |
20. Bhil | India: Madhya Pradesh |
21. Tamil | India: Tamil Nadu (Chola Naadu, Kongu Naadu) |
22. Pakistani GI (Kalash) | Pakistan: Chitral |
The Americas: | |
1. Inuit | Canada: Quebec (Nunavik). United States: Alaska (Northwest Arctic Borough) |
2. Indigenous North | Canada: Cree tribe (Manitoba) |
3. Indigenous Central Plains | United States: South Dakota (Sioux) |
4. Indigenous Southwest US | United States: Pima and Navajo ethnic groups. |
5. Mexican Indigenous (North, South) | Mexico: Otomi, Tarahumara, Mixtec, Seri, Huichol |
6. Indigenous Central America | Guatemala: Maya. El Salvador: Lenca, Pipil |
7. Indigenous Caribbean (Taino) | Puerto Rico, Dominican Republic, Cuba |
8. Indigenous South America (Northwest, Andes, Southern Cone, Amazonian) | Brazil: Guarani. Peru: Quechua. Argentina: Toba. Colombia: Piapoco |
Oceania: | |
1. Polynesian | Tuvalu, French Polynesia, Cook Islands |
2. Maori | New Zealand: North Island |
3. Melanesian | Eastern Papua New Guinea, Fiji, Vanuatu |
4. Aboriginal Australian | Australia: Northern Territory |
5. Hawaiian | United States: Hawaii |
6. Moluccan | Indonesia: Maluku Islands (North Maluku) |
7. Tahitian | French Polynesia: Tahiti |
8. Tonga and Samoa | Tonga, Samoa |
East Asia and Siberia: | |
1. Western Siberian (Chukchi) | Russia: Chukchi Peninsula |
2. Central Siberian GI (Nganasan) | Russia: Taymyr Peninsula |
3. Chinese North (Han) | China: Heilongjiang, Liaoning |
4. Chinese Central (Han) | China: Anhui, Jiangsu, Zhejiang |
5. Chinese South (Han) | China: Guangdong, Fujian, Guizhou |
6. Japanese | Japan: Yamagata, Niigata, Gifu |
7. Korean (South) | South Korea: Chungcheongnam-do, Gangwon-do |
8. Manchu (Northeast China) | China: Jilin, Inner Mongolia Autonomous Region |
9. Zhuang (South China) | China: Guangxi Zhuang Autonomous Region |
10. Ryukyuan | Japan: Ryukyu Islands |
11. Miao (South China) | China: Yunnan, Guizhou |
12. Wuling Mountains (Central China South) (She, Miao, Tujia) | China: Guizhou, Hunan, Jiangxi |
13. Hengduan Mountains (Yi, Naxi) | China: Hengduan Mountains |
14. Tungusic Manchurian (Xibo, Hezhen) | China: Manchuria |
15. Mongolian | Mongolia: Tov, Selenge. |
Mideast: | |
1. Arabian Peninsula (Eastern) | Oman: Al Batinah South, Muscat. UAE: Bani Yas Tribe. Qatar: Qatari Bedouin. |
2. Arabian Peninsula (Western) | Saudi Arabia: Hejaz Region |
3. Turkish (West, East, Central, Southeast, Southwest, Northwest) | Turkey: Eastern Anatolia Region, Black Sea Region, Southeastern Anatolia Region, Central Anatolia Region, Mediterranean Region, Aegean Region, Marmara Region. |
4. Druze | Lebanon: Mount Lebanon Governorate, Chouf district. |
5. Lebanese | Lebanon: North Lebanon Region, Mount Lebanon Region, South Lebanon Region |
6. Egyptian (North, South) | Egypt: Faiyum, Sharqia, Qena. |
7. Jordanian | Jordan: Ajlun, Al Balqa |
8. Yemeni (Western, Eastern) | Yemen: West (Aden, Janad, Azal). East (Hadhramaut: Mukalla). |
9. Syrian (West) | Syria: Homs, Hama, Rif Damashq |
10. Assyrian | Syria: Al Hasakah, Khabur River Valley |
11. Palestinian | Palestine: Acre, Haifa, Ramallah, Hebron |
12. Bedouin (Sinai) | Sinai: Jabaliya Tribe |
13. Iranian (West, Central, East) | Iran: Mazandaran, Qom, Semnan, Qazvin, Markazi, Gilan |
14. Kurdish (West, East) | Turkey: Eastern Anatolia Region. Iraq: Dohuk, Sulaymaniyah. |
15. Iraqi (Central, South) | Iraq: Babil, Dhi Qar, Wasit |
16. Sephardic Jewish | Turkey: Istanbul, Izmir |
17. Palestinian GI (Palestinian Christians, Palestinian Samaritans, Palestinian Muslims (Nablus)) | Palestine: Nablus, Nazareth, Zababdeh |
18. Lebanese GI (Lebanese Christians, rural Lebanese Shia) | Lebanon: Keserwan-Jbeil Governorate, Mount Lebanon Governorate, Baalbek, Hermel. |
19. Egyptian GI (Coptic Christian) | Egypt: Alexandria |
20. Iranian GI (Zoroastrian) | Iran: Tehran |
North Africa: | |
1. Algerian | Algeria: Saida, Mila, Bouira |
2. Tunisian | Tunisia: Siliana, Monastir, Zaghouan |
3. Moroccan (North, South) | Morocco: Taza, Fez, Khenifra, Azilal, El Kelaa |
4. Libyan (East, West) | Libya: Cyrenaica, Tripolitania |
5. Berber (Algeria) | Algeria: Chaoui Ethnic Group |
Jewish Diaspora Genetic Test: | |
1. Persian Jewish | Iran: Shiraz, Tehran |
2. Ashkenazi | Eastern Europe: Warsaw, Budapest, Lodz, Lviv |
3. Caucasus Jewish | Caucasus: Chechnya, Dagestan, Ingushetia, Azerbaijan |
4. Northern-Central Mesopotamia Jewish | Iraq: Kurdistan, Baghdad |
5. Eastern Maghrebi Jewish (Moroccan, Algerian) | Eastern Maghreb: Morocco, Algeria |
6. Western Maghrebi Jewish (Tunisian, Libyan) | Western Maghreb: Tunisia, Libya |
7. Sephardic Jewish (Turkey) | Turkey: Istanbul, Izmir |
8. Northern Levantine Jewish | Syria: Damascus. Lebanon: Beirut. |
9. Ethiopian Jewish | Ethiopia: Amhara Region, Tigray Region. |
10. Yemeni Jewish | Yemen: Hadhramaut, Aden, Habban |
11. Indian Jewish | India: Kerala, Mumbai |
Central Asian Genetic Test: | |
1. Hazara | Afghanistan: Hazarajat |
2. Kyrgyz | Kyrgyzstan: Osh, Chuy |
3. Kazakh | Kazakhstan: Almaty, Jambyl, Jetisu |
4. Uzbek (East, West) | Uzbekistan: Syrdarya, Samarkand, Kashkadarya |
5. Uyghur (West, East, Southwest) | China: Hotan, Aksu |
6. Pashtun (Afghanistan, Pakistan) | Afghanistan: Ghazni, Paktika, Zabol, Nangarhar |
7. Turkmen (Southwest, West, Southeast) | Turkmenistan: Bagtyyarlyk Etraby, Buzmeyin Etraby, Kopetdag Etraby |
8. Iranian (Northeast, East, Southeast, Central) | Iran: Mazandaran, Qom, Semnan, Qazvin, Markazi, Gilan |
9. Mongolian | Mongolia: Tov, Selenge. |