This article will explain the main differences between types of genetic testing.
What Is the Genome?
Imagine you’re studying a massive book with 3 billion letters, where each letter represents a DNA base. This is your genome, and there are several ways we can read it. Let’s explore each method and when to use it.
Genotyping: The Basic Approach
Think of genotyping as checking specific letters at predetermined positions in that book. A typical genotyping test checks slightly fewer than 1 million positions, which is about 0.033% of the human genome.
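The 0.033% figure follows from simple arithmetic. Here is a minimal sketch, assuming the round numbers used in this article (1 million tested positions and a 3-billion-base genome); the function and variable names are illustrative only.

```python
# Round figures from the text; real chips and genome builds differ slightly.
GENOME_SIZE_BASES = 3_000_000_000
POSITIONS_ON_CHIP = 1_000_000

def coverage_percent(bases_read: int, genome_size: int = GENOME_SIZE_BASES) -> float:
    """Return the share of the genome covered, as a percentage."""
    return bases_read / genome_size * 100

genotyping_pct = coverage_percent(POSITIONS_ON_CHIP)
print(f"Genotyping covers about {genotyping_pct:.3f}% of the genome")
```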
For example, if we know that position 1,000,000 can have either an ‘A’ or a ‘G’, and this variation is associated with lactose intolerance, genotyping will tell us which letter you have at that position.
Direct-to-consumer (D2C) companies often use a pre-set list of genetic variants for their tests. This list is placed onto a chip that is mass-produced and sent to many labs.
When you get your results, they come as a text file. This file shows which specific DNA positions were tested and what nucleotides were found at those spots. However, it’s important to note that these results do not include any assessment of how accurate or reliable the detection is.
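The exact layout of such a file varies by company. The sketch below assumes a tab-separated layout with four columns (rsid, chromosome, position, genotype) and `#` comment lines, similar to what several D2C providers export. The sample rows and identifiers are made up for illustration.

```python
import csv
import io

# Hypothetical sample of a raw genotyping export (made-up rows).
SAMPLE_RAW = """# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t1000000\tAG
rs0000002\t2\t2500000\tCC
rs0000003\t7\t117000000\t--
"""

def read_genotypes(text: str) -> dict:
    """Parse tab-separated genotype rows, skipping '#' comment lines."""
    rows = (line for line in io.StringIO(text) if not line.startswith("#"))
    result = {}
    for rsid, chrom, pos, genotype in csv.reader(rows, delimiter="\t"):
        result[rsid] = {
            "chromosome": chrom,
            "position": int(pos),
            # '--' is assumed here to mean "no call" (the chip could not read it).
            "genotype": None if genotype == "--" else genotype,
        }
    return result

genotypes = read_genotypes(SAMPLE_RAW)
print(genotypes["rs0000001"])
```

Note that nothing in a row says how confident the call is, which is the limitation described above.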
Key characteristics:
- Examines about 1 million specific positions (0.033% of DNA)
- High rate of false-positive results (reported to reach 40% in some cases)
- Costs around $50-100
- Takes 2-4 weeks to get results
- Popular through companies like 23andMe and AncestryDNA
While genotyping can provide basic insights about certain genetic traits, its ability to predict disease risk is limited. For complex conditions like Type 2 Diabetes, genotyping only captures a tiny fraction of potential genetic factors. However, it can be useful for specific medication responses where single genetic variants have strong effects, like warfarin sensitivity.
Whole Exome Sequencing (WES): Focus on Protein-Coding Regions
Imagine our genome book has 20,000 special chapters (genes) that contain instructions for making proteins. WES reads only these chapters in detail. This is particularly useful because about 85% of known disease-causing mutations occur in these regions.
Key characteristics:
- Reads roughly 1-1.5% of the genome (about 30-45 million bases)
- Costs $300-1500
- Results are available in 4-8 weeks
- Predominantly used in clinical settings and infrequently in direct-to-consumer contexts
Example: If a child has developmental delays and unusual facial features, WES might identify a mutation in a gene like FOXP1, which could explain these symptoms and guide treatment decisions.
WES uses the same sequencing technology as WGS but targets a much smaller portion of the genome. Because the resulting files are much smaller, the raw data is easier for labs and users to handle.
Raw data is usually delivered as VCF files, which list the detected variants. Labs often share FASTQ files as well; these contain the raw reads produced by the sequencer. Sometimes you can also get a BAM or CRAM file, which holds the reads aligned to a version of the reference genome.
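To make the VCF format less abstract, here is a minimal sketch of reading one in plain Python. It assumes a single-sample file with the standard fixed columns and a numeric QUAL value. The two variants are made up, and the `Variant` class and `parse_vcf` function are illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Made-up two-variant VCF; real files carry many more header lines and fields.
SAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1
1\t1000000\t.\tA\tG\t812.5\tPASS\t.\tGT:DP\t0/1:34
3\t71000000\t.\tC\tT\t45.1\tLowQual\t.\tGT:DP\t1/1:6
"""

@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    filter_status: str
    sample: dict  # FORMAT keys mapped to this sample's values

def parse_vcf(text: str) -> list:
    variants = []
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blank, meta and header lines
            continue
        chrom, pos, _id, ref, alt, qual, flt, _info, fmt, sample = line.split("\t")
        variants.append(Variant(
            chrom, int(pos), ref, alt, float(qual), flt,
            dict(zip(fmt.split(":"), sample.split(":"))),
        ))
    return variants

for v in parse_vcf(SAMPLE_VCF):
    print(v.chrom, v.pos, f"{v.ref}>{v.alt}", v.filter_status, v.sample["GT"])
```

For real files, an established VCF library is a safer choice than hand-written parsing, since the format allows multiple samples, missing values and multi-allelic sites that this sketch ignores.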
Whole Genome Sequencing (WGS): The Complete Picture
WGS is like reading every single letter in the genome book, including parts between genes that control when and how genes are activated.
Key characteristics:
- Reads all 3 billion base pairs
- Costs $300-3000
- Takes 4-12 weeks for results
- Provides the most comprehensive genetic information
- Available both from clinical labs and as a direct-to-consumer service
Example: In cancer treatment, WGS can identify not just mutations in genes, but also complex structural changes that might affect treatment response.
It’s common to assume that WGS covers the entire genome, but in practice, the commonly used sequencing technology (short-read sequencing) has limitations and struggles to digitize some complex areas of DNA. Additionally, the quality or precision of sequencing can vary within the same genome file.
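One practical consequence is that individual calls are often screened by their per-site quality values before anyone interprets them. The sketch below assumes each call carries a read depth (DP) and genotype quality (GQ), as these commonly appear in VCF files. The calls are made up, and the thresholds are illustrative placeholders rather than recommendations.

```python
# Hypothetical calls; DP = read depth, GQ = genotype quality.
calls = [
    {"site": "1:1000000", "DP": 34, "GQ": 99},
    {"site": "3:71000000", "DP": 6, "GQ": 15},
    {"site": "X:5000000", "DP": 28, "GQ": 12},
]

MIN_DEPTH = 10   # illustrative threshold, not a standard
MIN_GQ = 20      # illustrative threshold, not a standard

def is_reliable(call: dict) -> bool:
    """Keep a call only if both depth and genotype quality clear the thresholds."""
    return call["DP"] >= MIN_DEPTH and call["GQ"] >= MIN_GQ

reliable = [c["site"] for c in calls if is_reliable(c)]
uncertain = [c["site"] for c in calls if not is_reliable(c)]
print("reliable:", reliable)
print("needs review:", uncertain)
```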
Limitations of Genetic Testing
Genotyping Limitations
Imagine trying to understand a story by reading only certain predetermined words. You might miss crucial plot points.
Real-world example: A patient with a rare form of anemia might test negative on a genotyping panel because their specific mutation isn’t one of the predetermined positions being checked. With access to only 0.033% of the genome, this limitation is a critical blocker for using genotyping in health research.
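The key point is that "not on the chip" is different from "negative". Here is a small sketch of that distinction, using a made-up set of chip positions; the `chip_results` dictionary and `lookup` function are hypothetical.

```python
# Hypothetical chip content and results: (chromosome, position) -> genotype found.
chip_results = {
    ("1", 1_000_000): "AG",
    ("2", 2_500_000): "CC",
}

def lookup(chrom: str, pos: int) -> str:
    """Report a genotype, or say explicitly that the position was never tested."""
    genotype = chip_results.get((chrom, pos))
    if genotype is None:
        return "not tested"   # absence of data, not a negative result
    return genotype

print(lookup("1", 1_000_000))
print(lookup("11", 5_200_000))
```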
WGS and WES Technical Limitations
Unphased Data: Imagine having two similar copies of a book (one from each parent), but the pages get mixed up. You can see all the differences but can’t tell which version came from which parent. In most cases, WGS/WES data doesn’t provide information that helps to distinguish which variants come from one parent or the other.
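In a VCF file, this shows up in the genotype (GT) field: a slash, as in `0/1`, marks an unphased call, while a pipe, as in `0|1`, marks a phased one. A minimal sketch of telling them apart follows; the function name is illustrative.

```python
def describe_genotype(gt: str) -> str:
    """Classify a VCF GT value such as '0/1' (unphased) or '0|1' (phased)."""
    if "|" in gt:
        first, second = gt.split("|")
        return f"phased: copy 1 carries allele {first}, copy 2 carries allele {second}"
    if "/" in gt:
        return "unphased: alleles known, but not which copy carries which"
    return "single allele or missing call"

for gt in ("0/1", "0|1", "1|0"):
    print(gt, "->", describe_genotype(gt))
```

Even phased data only says which variants sit together on the same copy. Assigning a copy to a specific parent usually requires the parents' data as well.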
Quality Issues: Reading some parts of the genome is like trying to read smudged text – they’re harder to sequence accurately. This often happens in regions with repeated sequences.
Interpretation Challenges: Finding a mutation is like finding an unusual word in the book – we might see that it is different, but we don’t always know if this change is important or harmful.
Evaluating Genetic Variants: From Discovery to Clinical Significance
After identifying genetic variants through whole genome or exome sequencing, practitioners can use several key tools to assess their clinical impact:
Clinical Variant Database
ClinVar serves as the primary public repository where geneticists document variants and their known effects on health. Like any collaborative database, its value depends on the quality and quantity of submitted evidence.
Predictive Assessment Tools
REVEL (Rare Exome Variant Ensemble Learner)
REVEL is tailored for assessing rare missense variants by integrating various predictive algorithms to produce scores ranging from 0 to 1. A higher score suggests a greater chance that a variant affects protein function. This tool is particularly useful for examining newly identified variants.
The pathogenicity probabilities for these rare variants were calculated by researchers in 2016, and the dataset has not been updated since.
CADD (Combined Annotation Dependent Depletion)
CADD provides comprehensive variant assessment across the entire genome, acting as a “damage score” for DNA changes. Scaled scores above 20 indicate variants in the top 1% most likely to cause disruption. Unlike REVEL, CADD can evaluate both coding and non-coding regions.
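The "top 1%" reading comes from the way CADD scales its scores: the scaled score is PHRED-like, meaning minus ten times the base-10 logarithm of a variant's rank fraction. A short sketch of that conversion follows; the function names are illustrative.

```python
import math

def top_fraction(scaled_score: float) -> float:
    """Fraction of ranked variants at or above this scaled score."""
    return 10 ** (-scaled_score / 10)

def scaled_score(rank_fraction: float) -> float:
    """Inverse: scaled score for a variant ranked in the given top fraction."""
    return -10 * math.log10(rank_fraction)

for score in (10, 20, 30):
    print(f"scaled score {score}: top {top_fraction(score):.1%}")
```

So a score of 10 corresponds to the top 10%, 20 to the top 1%, and 30 to the top 0.1%.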
Conservation Analysis (PhyloP, PhastCons)
These tools examine how well-preserved a DNA position remains across different species. Highly conserved regions typically serve critical biological functions – much like finding words that have remained unchanged across languages for millennia.
Understanding Tool Limitations
While these tools provide valuable insights, they have important limitations. No single tool offers perfect predictions, and different tools may provide conflicting assessments. For example, a variant might receive a high CADD score suggesting harmful effects, while conservation scores indicate tolerance. In such cases, geneticists must integrate multiple lines of evidence, including clinical observations and functional studies, to determine true clinical significance.
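A simple way to spot such disagreements is to compare each tool's score against a cut-off and check whether the verdicts match. The sketch below is only an illustration: the cut-offs for REVEL and PhyloP are hypothetical placeholders, the CADD cut-off reuses the "above 20" rule of thumb mentioned earlier, and the example variant is made up.

```python
# Illustrative cut-offs only; labs choose their own and these are not standards.
CADD_CUTOFF = 20.0     # the "top 1%" rule of thumb
REVEL_CUTOFF = 0.5     # hypothetical midpoint of the 0-1 range
PHYLOP_CUTOFF = 2.0    # hypothetical "well conserved" threshold

def summarize(scores: dict) -> str:
    """Return whether the three tools agree or conflict for one variant."""
    votes = [
        scores["cadd"] >= CADD_CUTOFF,
        scores["revel"] >= REVEL_CUTOFF,
        scores["phylop"] >= PHYLOP_CUTOFF,
    ]
    if all(votes):
        return "concordant damaging"
    if not any(votes):
        return "concordant tolerated"
    return "conflicting: needs clinical and functional evidence"

example = {"cadd": 24.1, "revel": 0.31, "phylop": -0.4}  # made-up variant
print(summarize(example))
```

A "conflicting" result is not an answer in itself; it is a flag that computational scores alone cannot settle the question.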
The most reliable variant assessments combine computational predictions with real-world clinical evidence and laboratory studies of variant effects. This comprehensive approach helps ensure more accurate interpretation of genetic findings for patient care.
When Genotyping Falls Short
Consider a patient with unexplained seizures. Genotyping might miss rare mutations because it only looks at common variants. This is why genotyping isn’t suitable for diagnosing rare conditions, and why sequencing (WES or WGS) is used instead. Below are some situations where sequencing can and cannot help.
Can Help:
- Identifying genetic causes of rare diseases
- Understanding complex developmental disorders
- Guiding cancer treatment decisions
- Detecting structural variations in DNA
Cannot Help:
- Conditions primarily caused by environmental factors
- Predicting exact disease outcomes
- Understanding conditions where multiple genes and environmental factors interact in complex ways
Example: A patient with autism might undergo WGS, but even if we find genetic variations, we cannot predict exactly how they will affect the patient’s development or what interventions will work best.
Choosing the Right Test
I would not rely on genotyping for my health investigations because of its limited scope and the risk of missing crucial genetic information. For my own genetic analysis, I strongly prefer Whole Genome Sequencing (WGS), as it gives me the most complete view of my genetic makeup currently available.
WGS aims to read all 3 billion base pairs in the human genome, so it can uncover rare mutations, structural variations, and other genetic factors that genotyping might overlook. This thorough approach is essential for accurate diagnosis and personalized treatment plans.
Tools for Citizen Scientists
Several online platforms allow individuals to analyze their genetic data:
- Panels for metabolic pathways of vitamins and minerals
- Automatic search for potentially risky variants
- Simple access to all genes
- Supports WES/WGS raw data in the form of VCF files
- Scope of analysis is limited to ClinVar records
- Basic methylation and detox reports
- Free service
- Supports raw data from genotyping and from WES/WGS as VCF files
- Desktop software for variant analysis
- Supports VCF files from WGS/WES
- Technical interface requiring genetics knowledge
Summary
As a developer who works with genetic data analysis tools, I’ve come to understand both the capabilities and constraints of different testing methods. While genotyping is accessible and popular, its coverage of just 0.033% of the genome and high risk of false-positive results suggest caution when using it for health investigations.
Whole Exome Sequencing offers a middle ground by focusing on protein-coding regions where many disease-causing mutations occur, and Whole Genome Sequencing provides the most comprehensive genetic analysis available today.
Photo by National Cancer Institute on Unsplash.
Thanks so much for writing this, Sergey.
I’d like to remind everybody that the big databases are constantly changed by scientists who may not have very much experience with very sick people.
Since the time I first got whole genome sequencing on my family, I’ve had three variants in whole genome sequencing, but the SNPs were re-classified later so they went from pathogenic to being of unknown significance.
This information is in constant flux.
I will say also that we don’t have very much information about intronic DNA, which is more likely where the instruction set exists for how to use the exome, but we certainly don’t know enough about that yet.
I really like getting information from genome-wide association studies, which look at the probabilities of whether a particular SNP is associated with a particular condition.
It is likely that data from your genes that you get today will be a lot easier to interpret in a few years because there is so much effort being done to understand DNA.
We certainly appreciate your efforts, Sergey, and especially your willingness to share what you know.