Frequently Asked Questions

1. What does our site do?

HeredityLab helps users clarify their ancestral heritage by combining genetics with data science methods. We are based in the United States and are a team of two engineers whose goal is to bring you results that provide insight into your genetic past. Users upload their raw DNA file to our platform, and we provide results after our systems process the file and compare the DNA to the reference populations in our database through a series of algorithmic computations.

To build the data set for our reference panel, we used data from multiple sources. These primarily include data obtained through a public outreach program that expands the reference panel to include underrepresented ethnic groups, along with publicly available data sets such as the Human Genome Diversity Project, the Population Reference Sample data set, the 1000 Genomes Project, the Pan-Asian SNP Consortium, the Indonesian Genome Diversity Project, the European Genome-phenome Archive, the Estonian Biocenter Data Archive, and the International HapMap Project.

2. How does this site work?

The user downloads their raw DNA data file (as a text file or a .zip file) from a mainstream DNA company such as 23andMe or AncestryDNA, then uploads it to our site using the designated file upload platform. We place the file in our processing queue, and our system analyzes it to determine the likely ethnic composition. After a set amount of time, the user receives an email indicating that the results are ready. The email contains a link for payment (all payments are processed via Stripe). The user then receives the results by email as an informative .pdf report.

3. What is our methodology?

See the Methodology section for more details. Using a series of algorithms, we match segments of a customer’s DNA with similar segments in our reference panel to infer approximate ethnic percentages.
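As a rough illustration of the matching idea only (not our production algorithm), the sketch below assigns each customer segment to whichever reference population it differs from least, then reports the share of segments per population. The panel contents, the population names, and the functions `hamming` and `estimate_percentages` are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical toy reference panel: for each population, one reference
# haplotype string per genomic segment (same segment order for everyone).
REFERENCE_PANEL = {
    "Population A": ["AACGT", "GGTCA", "TTAGC"],
    "Population B": ["AACGA", "GCTCA", "TAAGC"],
}


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return sum(x != y for x, y in zip(a, b))


def estimate_percentages(segments, panel=REFERENCE_PANEL):
    """Assign each segment to its closest population and tally the shares."""
    if not segments:
        raise ValueError("at least one segment is required")
    counts = Counter()
    for index, segment in enumerate(segments):
        # Ties go to the population listed first in the panel.
        best = min(panel, key=lambda pop: hamming(segment, panel[pop][index]))
        counts[best] += 1
    total = len(segments)
    return {pop: 100.0 * counts[pop] / total for pop in panel}


customer_segments = ["AACGT", "GCTCA", "TTAGC"]
print(estimate_percentages(customer_segments))
```

In this toy case two of the three segments sit closest to Population A and one to Population B, so the estimate is roughly two thirds and one third. A single differing letter can flip an assignment, which is one reason real estimates are approximate.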

4. What ethnic groups are covered by your reference panel?

The following ethnic groups are present within the reference panel for the General Genetic Test:

Europe:
German, Dutch, French (North), French (South), French, Swiss, Breton, Frisian, Cypriot, Albanian, Sicilian, Greek, Serbian, Slovenian, Italian (North), Italian (South), Croatian, Maltese, Sardinian, Bosniak, Corsican, Irish, Welsh, English, Scottish, Swedish, Finnish, Danish, Norwegian, Sami, Latvian, Lithuanian, Estonian, Hungarian, Ashkenazi Jewish, Slovak, Czech, Austrian, Portuguese, Spanish, Basque, Belarusian, Romanian, Bulgarian, Ukrainian, Russian, Polish, Lipka Tatar, Romanian Tatar

Caucasus:
Georgian, Armenian, Azeri, Dargin, Adyghe, Ossetian, Kabardian, Abazin, Tatar

West Africa:
Sierra Leone, Liberia, Guinea, Kru, Ewe, Wolof, Sahelian, Yoruba, Mandinka, Bambara, Fon, Abron, Akan, Igbo, Nigerian (Central, East), Fulani (Nigeria, Guinea), Mende, Angolan (North, South), Burkina Faso.

Southern Africa:
Kalahari Forager (Khoekhoen, San), Botswana (Tswana people), South Africa (Northeast, Southeast), Zimbabwe (Shona people)

East Africa:
Kalenjin, Kikuyu, Somali (North, South), Luhya, Sudan (North, Central, South), South Sudan, Maasai, Luo, Merina, Makua, Wolayta, Amhara, Oromo, Tigray, Sidama, Afar, Hadiya

Central Africa:
Central African Forager (Mbuti, Biaka), Cameroon (Fang, Bamoun),  Congolese West (Anamongo, EsiKongo), Congolese East (Baluba), Central African Republic

Southeast Asia: Filipino (North, South), Indonesian (East, West), Malaysian, Thai, Cambodian (Khmer), Bamar, She, Miao, Filipino, Dayak, Batak, Nyishi, Tangsa, Yi, Lahu, Mon, Vietnamese, Assamese (Bodo Kachari, Rajbongshi).

South Asia:
Koli, Bengali, Marathi, Baloch, Uttar Pradesh, Odisha, Karnataka, Pathan, Sindhi, Gujarati, Punjabi, Rajasthani, Malayali, Telugu, Sinhalese, Bhojpuri, Nepalese, Makrani, Kho, Bhil, Kannada, Tamil.

The Americas:
Inuit, Indigenous North, Indigenous Central Plains, Indigenous Southwest US, Mexican Indigenous North, Mexican Indigenous South, Indigenous Central America, Indigenous Caribbean (Taino), Indigenous South America (Northwest, Andes, Southern Cone,  Amazonian).

Oceania:
Polynesian, Maori, Melanesian, Aboriginal Australian, Hawaiian, Moluccan, Tahitian, Tonga and Samoa.

East Asia and Siberia:
Evenk, Orok, Chinese (North, Central, South), Han, Nashi, Japanese, Yi, She, Korean, Manchu, Tujia, Zhuang, Ryukyuan, Nenet

Mideast: Arabian Peninsula (Eastern), Arabian Peninsula (Western), Turkish (West, East, Central, Southeast, Southwest, Northwest), Druze, Lebanese, Egyptian (North, South), Jordanian, Yemeni (Western, Eastern), Syrian (West), Assyrian, Palestinian, Bedouin, Iranian (West, Central, East), Kurdish (West, East), Iraqi (Central, South), Sephardic Jewish, Palestinian Christian, Lebanese Christian, Egyptian Coptic.

North Africa: Algerian, Tunisian, Moroccan (North, South), Libyan (East, West), Berber (Algeria)

The Jewish Diaspora Genetic Test focuses on these Jewish genetic groups:
Persian Jewish, Ashkenazi, Caucasus Jewish, Northern-Central Mesopotamia Jewish, Eastern Maghrebi Jewish (Moroccan, Algerian), Western Maghrebi Jewish (Tunisian, Libyan), Sephardic (Turkey) Jewish, Northern Levantine Jewish, Ethiopian Jewish, Yemeni Jewish, Indian Jewish.

The Central Asian Genetic Test focuses on these groups:
Hazara, Kyrgyz, Kazakh, Uzbek (East, West), Uyghur (West, East, Southwest), Pashtun (Afghanistan, Pakistan), Turkmen (Southwest, West, Southeast), Iranian (Northeast, East, Southeast, Central), Mongolian. 

5. What archaic hominin ancestry do you test for?
Our platform assesses your DNA file for Neanderthal and Denisovan ancestry, with the Denisovan ancestry divided further into three subgroups (D0, D1, D2). A section is also included for African Ghost Hominin ancestry, but note that the existence of this population is speculative.

6. Will the results come out differently if I test with AncestryDNA versus 23andMe or MyHeritage?

Due to differences in the SNPs tested when comparing these three companies, there is a possibility that results will turn out differently depending on the company.
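To get a feel for how much two providers' files differ, the marker IDs in each file can be compared directly. The sketch below is a minimal example, assuming tab-separated raw files whose first column is the marker ID, with `#` comment lines and an optional `rsid` header row. The function names `marker_ids` and `overlap_report` and the sample data are hypothetical.

```python
def marker_ids(raw_text: str) -> set:
    """Collect marker IDs (first column) from tab-separated raw data text."""
    ids = set()
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip blanks, '#' comments, and a column-header row if present.
        if not line or line.startswith("#") or line.lower().startswith("rsid"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def overlap_report(text_a: str, text_b: str) -> dict:
    """Summarize how many markers two raw files share."""
    a, b = marker_ids(text_a), marker_ids(text_b)
    shared, union = a & b, a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up miniature files standing in for two providers' exports.
file_a = "# comment\nrs1\t1\t100\tAA\nrs2\t1\t200\tAG\nrs3\t2\t300\tCC\n"
file_b = (
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs2\t1\t200\tA\tG\nrs3\t2\t300\tC\tC\nrs4\t3\t400\tT\tT\n"
)
print(overlap_report(file_a, file_b))
```

Markers that appear in only one file cannot contribute to the comparison for the other file, which is one way the same person can receive somewhat different estimates from different uploads.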

7. Is DNA evenly inherited? Will an uneven inheritance of DNA segments affect my results?

No, DNA isn’t evenly inherited, and DNA is passed down randomly. In practice, this means that your parents will have DNA segments corresponding to ethnic groups which may not be passed down to you, or passed down in amounts that may not be detectable by our system. In the case of inheriting DNA from your grandparents, you will most likely not receive an even 25% from each grandparent. This uneven inheritance of DNA segments may cause our algorithms to interpret the DNA differently, thus resulting in variations of the regions assigned to you in comparison to a close relative such as your sibling.
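The uneven split between grandparents can be illustrated with a toy simulation. This is a simplified sketch, not a model of real recombination: it assumes each parent passes on a fixed number of independent blocks, and that each block comes from one of that parent's own two parents with equal probability. The function name `grandparent_shares` and the block count are hypothetical choices, not calibrated values.

```python
import random


def grandparent_shares(n_blocks: int = 100, rng=None) -> dict:
    """Simulate the share of a child's DNA traced to each grandparent.

    Toy model: each parent passes on n_blocks independent blocks, and each
    block comes from one of that parent's own two parents with equal chance.
    """
    rng = rng or random.Random()
    total = 2 * n_blocks  # blocks from the mother plus blocks from the father
    shares = {}
    for side in ("maternal", "paternal"):
        from_grandmother = sum(rng.random() < 0.5 for _ in range(n_blocks))
        shares[f"{side} grandmother"] = from_grandmother / total
        shares[f"{side} grandfather"] = (n_blocks - from_grandmother) / total
    return shares


# Two simulated siblings: the same four grandparents, different shares.
for sibling in (1, 2):
    shares = grandparent_shares(rng=random.Random(sibling))
    print(sibling, {name: f"{value:.1%}" for name, value in shares.items()})
```

Each parent's side always adds up to 50%, but the split within a side usually lands close to 25% per grandparent rather than exactly on it, and it differs from one simulated sibling to the next.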

8. Why was I assigned the wrong ethnicity?

Unfortunately, ethnicity estimation isn’t an exact science. Segments of your DNA are matched to reference populations in our reference panel using algorithms, and at times the segments themselves may resemble those of neighboring populations. Some ethnic groups have a higher degree of genetic isolation than others, and these are generally easier to pinpoint. However, due to historical intermarriage and migration between populations, there is a tremendous degree of variability within ethnic groups, which often results in variability within ethnic estimations. To compound this, DNA isn’t inherited evenly, so some combinations of DNA segments may resemble ethnic groups other than the ones they actually come from.

9. How far back does your DNA test go?

Our test can reasonably discern ancestral ethnic groups up to the level of grandparents to great-grandparents. Beyond that, there is a high degree of variability in what can be detected, since some smaller DNA segments may look more similar to other ethnic groups than to the ones they originate from. The further back you look, the more inaccuracy is introduced.

10. How do I get my DNA file from my provider? What format do you support?

Each DNA company generally makes its download instructions readily available on its site. To use our site, follow the instructions to download the raw DNA file from your provider, which is generally in Build 37 format as a .zip file (or .txt file).

23andMe: https://customercare.23andme.com/hc/en-us/articles/212196868-Accessing-Your-Raw-Genetic-Data

AncestryDNA: https://support.ancestry.com/s/article/Downloading-DNA-Data?language=en_US

MyHeritage DNA: https://www.myheritage.com/help-center?a=How-do-I-download-my-raw-DNA-data-file-from-MyHeritage---id--TcyMs9SCRpe9xXSv1UGPHg

Family Tree DNA: https://help.familytreedna.com/hc/en-us/articles/4415184836367-Downloading-Family-Finder-Data-
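If you want to check a downloaded file before uploading it, the sketch below shows one way to read it locally. It is a minimal example, assuming a tab-separated layout with the marker ID, chromosome, and position in the first three columns, followed by either one genotype column or two allele columns. The function names `read_raw_text` and `parse_genotypes` are hypothetical, and this is not the parser our platform uses.

```python
import zipfile


def read_raw_text(path: str) -> str:
    """Return the raw data text from a .txt file or the first file in a .zip."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            first = archive.namelist()[0]
            return archive.read(first).decode("utf-8", errors="replace")
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def parse_genotypes(raw_text: str) -> dict:
    """Map marker ID -> (chromosome, position, genotype)."""
    genotypes = {}
    for line in raw_text.splitlines():
        fields = line.strip().split("\t")
        # Skip comments, short or blank lines, and a column-header row.
        if line.startswith("#") or len(fields) < 4 or fields[0].lower() == "rsid":
            continue
        marker, chromosome, position = fields[0], fields[1], int(fields[2])
        # Four columns: one genotype field. Five columns: two alleles, joined.
        genotypes[marker] = (chromosome, position, "".join(fields[3:5]))
    return genotypes
```

A typical use would be `parse_genotypes(read_raw_text("my_raw_data.zip"))`, where the file name is a placeholder for your own download.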

11. Do you allow refunds?

We generally do not allow refunds unless our system is unable to process your file. Please email us at contact@hereditylab.com with any additional questions.

12. How is our data protected? What is your privacy policy?

Please see our page labeled Privacy Policy / Terms and Conditions for further details. In summary, we do our best to uphold the safety and security of your data. We collect the minimum information needed (name, email address, DNA file), and delete your DNA file within 7 days after it is processed. All payments are handled via Stripe, so our platform does not have access to any further customer data. We also don’t sell data to third parties, and our HeredityLab service isn’t affiliated with any third-party companies.

13. How do we know if your service is up and running?

We post periodic updates to the page labeled “Status” in the event that there are any service outages or related issues.

14. Where is your company based? Who is behind your company and why did you start this company?

We are based in the United States. Our company consists of two colleagues working in the field of engineering. We started this site since we both have a passion for genetic genealogy, and wanted to apply our knowledge of data science to help people discern further information on their genetic history. We are both full time engineers, and due to our existing commitments, we run this site on a part time basis. Our goals for this site are strictly limited to giving customers more details on genetic ancestry, and we don’t have any plans to expand the scope of the company beyond that stated goal. Although we work together at an engineering firm, the site itself is cloud based and doesn’t have a dedicated office. If the customer base expands beyond what we can manage, we will hire more people if necessary.

15. What does the reference panel consist of?

Our reference panel consists of a collection of both publicly available samples and private DNA samples which we meticulously collected over a period of two years. The reference panel is inclusive of most main ethnic groups and our platform has the ability to reliably discern between them.

16. Why do you offer three different types of ancestry tests?

The general genetic test is designed for all groups. The Jewish Diaspora and Central Asian genetic tests are designed to give those particular groups more genetic insight that includes percentages of more ethnic groups that are specific to their populations. The issue encountered on most DNA tests, including our general DNA test, is that these groups tend to have varying degrees of overlap with neighboring regions and thus the results don’t come out as specific as users may intend. For example, someone who is Hazara on a DNA test (including our general one) may see their results including components like Pashtun, Mongolian, Chinese, Siberian, and South Asian DNA in varying quantities; whereas the Central Asian test will accurately assign them as 100% Hazara or something like 80% Hazara, 10% Pashtun, 5% Kyrgyz, 5% Tajik.

17. Why is there a section on your DNA test which displays ethnicities not factored into the DNA test?

On many DNA tests, assigned ethnic groups already lump together any background ethnic contributions with each ethnicity label. For example, Portuguese DNA contains background genetic contribution from North Africa, but this background contribution is only displayed if it is above a certain threshold. This “ethnicities not factored in” section on our report displays ethnic groups that the algorithm isn’t 100% certain of assigning to the main ethnic estimates section.

18. Your test lists Denisovan and Neanderthal ancestry. Can you tell me more about this?

Archaic admixture is present at varying trace levels throughout most of the world’s populations. Over the years, studies have assessed and reassessed the levels of archaic admixture in humans, and the exact amounts vary depending on the model used to identify genetic segments with archaic origins.

When it comes to Neanderthal DNA, all humans with trace Neanderthal DNA share a closer affinity to the Neanderthal genetic group the Vindija Neanderthal belonged to, relative to that of the Altai Neanderthal.

For Denisovan admixture, it has been found that there are likely three Denisovan lineages present in groups of modern humans, termed D0, D1, and D2. It appears that D1 and D2 split from D0 (the Altai Denisovan) around 283,000 years ago and 363,000 years ago, respectively. D2 appears to be divergent to the extent that it may represent a third lineage separate from both Neanderthals and Denisovans. It has been found that Papuans have segments from both D1 and D2, while East Asians and Siberians have ancestry from D0 and D2, with other mainland Asian groups having trace amounts of D0 and D2 as well. To complicate matters further, research indicates some West African populations may have trace ancestry from an unknown archaic hominin population, termed the “Ghost African Hominin” (2).

Within your report, there are estimates for these populations. Please note that these estimates are speculative, since assignment of archaic hominin ancestry is dependent on computational models based on the methodologies used in these studies and is naturally subject to change as new information is found and model parameters are modified.

(1) https://www.cell.com/cell/fulltext/S0092-8674(19)30218-1
(2) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015685/

19. What is the number of samples per group in your reference panel?

The number of samples is highly variable. Groups drawn from public data sets have many more samples than groups obtained through our outreach program, which tend to be underrepresented in public data sets. Accurately matching segments of your genome to ethnic groups in the reference panel normally requires larger sample numbers, so the accuracy of our algorithms drops for segments matched to groups with fewer samples. The exact numbers can be found here: Reference Panel. Note that the exact numbers are subject to change, since the panel will be modified depending on several factors, including the availability of new samples and the results of testing to determine whether matching is less accurate for some sample sets than for others.
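One general way to see why sample counts matter is the standard error of an allele frequency estimated from a group of samples. The sketch below uses the ordinary binomial sampling formula and is not our internal accuracy metric. The function name `allele_frequency_se` and the example frequency of 0.3 are assumptions made for illustration.

```python
import math


def allele_frequency_se(p: float, n_individuals: int) -> float:
    """Standard error of an allele frequency estimated from n diploid samples."""
    if not 0.0 <= p <= 1.0 or n_individuals <= 0:
        raise ValueError("need 0 <= p <= 1 and at least one individual")
    # Each diploid individual contributes two copies of the marker.
    return math.sqrt(p * (1.0 - p) / (2 * n_individuals))


# The uncertainty shrinks as the number of sampled individuals grows.
for n in (10, 100, 1000):
    print(n, round(allele_frequency_se(0.3, n), 4))
```

Because the error shrinks only with the square root of the sample count, a group with ten samples has a much fuzzier genetic profile than one with a thousand.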

20. How does your model work?

A substantial amount of data is necessary to gain insight into the ancestral backgrounds of global populations. We assembled data from a wide variety of public sources, along with a long-running public outreach program that collects information on underrepresented groups. The creation and calibration of the reference panel took place over a period of 36 months, and we pruned the data sets by removing a large number of individuals who had substantial genetic overlap with neighboring populations. The individuals remaining in the reference panel have a long history in the regions they represent, at the level of grandparents to great-grandparents. This was established by self-identification and then checked by comparing each genome with other known samples from the region.

For some of the more underrepresented groups, obtaining representative data sets was a challenge. In particular, some groups, such as the Taino and other Indigenous ethnic groups, no longer have extant populations available for reference; genetic segments pertaining to these groups were therefore isolated and used as representative data sets within the reference panel. To improve accuracy, we tested these data sets thoroughly.

As for the architecture of the project, the bulk of the ancestral analysis pipeline runs on Amazon Web Services spot instances. Genomes are processed in parallel, and the results are stored in Amazon Web Services S3 storage. To double-check the results, we repeat the process on a separate server. The process itself is costly, and we are building an automated bidding system for spot instances to maximize efficiency.

The process has been very resource intensive, and we have invested a high degree of effort into focusing on the quality of the results. The present step represents a ‘soft launch’ to test the process on a small user base to iron out technical problems before we scale up to a larger user base.

21. Some of your reference groups have the label “Genetic Isolate” (GI). What does this indicate?

We have labeled several populations a “genetic isolate”. These are populations that have remained mostly endogamous over roughly the last 1,000 years (more or less in some cases) and have received minimal gene flow from other populations relative to their neighboring population groups.

We have done this to emphasize the fact that they represent an older genetic profile which is relatively close to the genetic profiles of individuals that lived in their respective home regions ~1,000 years ago, more or less depending on the case. These include the following: Egyptian GI, Palestinian GI, Iranian GI, Lebanese GI, and Pakistani GI.

The Egyptian GI consists of Coptic Christian Egyptians and Sudanese Coptic Christians that are descended from Egyptian Coptic Christians.

The Palestinian GI consists of a wider series of groups. It primarily includes samples of Samaritans, Palestinian Muslims, and Palestinian Christians that cluster closely together. Many of the samples from the latter two groups are from Nablus, which has a large population of descendants of Samaritans who converted to the other two groups over the generations but retained the same genetic profile.

The Iranian GI consists of Iranian Zoroastrians, a group that has retained a relatively isolated genetic profile since some time between approximately 570 BCE and 746 CE. [1]

The Lebanese GI consists of Lebanese Christians and Lebanese Shias (samples from two rural Shia villages) which cluster closely together. The Lebanese Druze also represent a genetic isolate, but these aren’t included within this group and have their own category. 

The Pakistani GI consists of samples from the Kalash tribe, which has been heavily genetically isolated from larger populations in the region; one study estimates the duration of this relative genetic isolation to be up to 8,000+ years. [2]

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5590844/
[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570283/

22. For the Jewish Diaspora Genetic Test, what populations are included?

This genetic test includes the following groups: Persian Jewish, Ashkenazi, Caucasus Jewish, Northern-Central Mesopotamia Jewish, Eastern Maghrebi Jewish (Moroccan, Algerian), Western Maghrebi Jewish (Tunisian, Libyan), Sephardic (Turkey) Jewish, Northern Levantine Jewish, Ethiopian Jewish, Yemeni Jewish, Indian Jewish.

Some of these categories draw their reference data from several different groups that roughly cluster together. Due to historic migrations and mixing over the last few generations, there is a substantial degree of overlap, but these categories are our best approximations of the genetic clusters of these groups. The Caucasus Jewish category includes samples from Georgian Jews, Mountain Jews, and Azerbaijani Jews. The Northern-Central Mesopotamia Jewish category includes samples from Iraqi Jews and Kurdish Jews. The Sephardic Jewish category includes samples from Sephardic Jews, mostly from Turkey, whose origins are mostly in the Jewish communities expelled from Iberia in the late 1400s. The Northern Levantine Jewish category is composed mostly of Syrian Jews, whose origins are a mix of the original Levantine Jewish communities plus Mesopotamian Jewish and Sephardic Jewish ancestry. The Indian Jewish category is composed of samples from both the Cochin Jewish community and the Bene Israel, both of whom have ancestry that combines Middle Eastern Jewish and South Asian origins.