Author information
- Received November 12, 2012
- Revision received January 29, 2013
- Accepted February 19, 2013
- Published online May 21, 2013.
- Reprint requests and correspondence:
Dr. Robert Roberts, University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, Ontario K1Y 4W7, Canada
A paradigm shift toward biology occurred in the 1990s and was subsequently catalyzed by the sequencing of the human genome in 2000. The cost of deoxyribonucleic acid (DNA) sequencing has gone from millions to thousands of dollars, with sequencing of one's entire genome costing only $1,000. Rapid DNA sequencing is being embraced for single gene disorders, particularly for sporadic cases and those from small families. A couple carrying a lethal gene, such as that associated with Huntington's disease, can avoid passing it on to their offspring through in vitro fertilization. DNA sequencing will meet the challenge of elucidating the genetic predisposition for common polygenic diseases, especially in determining the function of the novel common genetic risk variants and identifying the rare variants, which may also partially ascertain the source of the missing heritability. The challenge for DNA sequencing remains great: although human genome sequences are 99.5% identical, each genome carries the 3 million single nucleotide polymorphisms responsible for most of its unique features, and 40 to 60 new mutations arise per person, which, for 7 billion people, amounts to 300 to 400 billion new mutations. It is claimed that DNA sequencing capacity has increased 10,000-fold, while information storage and retrieval has increased only 16-fold. The physician and health user will be challenged by the convergence of 2 major trends: whole genome sequencing, and the storage, retrieval, and integration of the data.
Captain Cook wrote in his log upon reaching Australia that “I have not only travelled farther than any other man, but I have travelled as far as man can travel” (1). Thus, by the 18th century, all the continents had now been discovered and named. It appeared logical and perhaps appropriate for mankind to pursue the inner treasures of the planet. This coincided with the industrial revolution that led to the harnessing of energy from coal, electricity, and oil as well as the discovery of all the marvelous elements including uranium, which enabled many human endeavors, from cancer therapy to the invention of the atomic bomb. While this trend continues, in the 1990s a major worldwide shift occurred in which mankind became interested in the inner workings of human biology. The word “biology” is today often associated with excitement and activity, not just in science but also in medicine and commerce. This revolutionary concept received a major boost with the sequencing of the human genome in 2000 (2). In fact, sequencing of the human genome may be to the 21st century as invention of the vowels and development of democracy was to the 6th century BC or the industrial revolution was to the 18th century.
The Human Genome: New Developments
The double-stranded human genome of each cell contains 6.4 billion nucleotides. While proteins are the molecules that do the work, only about 1% of the human genome sequence encodes messenger ribonucleic acids (RNAs) for protein coding (3). Until recently, most deoxyribonucleic acid (DNA) was considered junk (3), but we now know that virtually all DNA is transcribed into RNA (3). The ENCODE (Encyclopedia of DNA Elements) project has enabled us to assign biochemical functions to 80% of the genome (4). It is of note that only a small proportion of the transcribed RNAs are translated into protein, with the remainder performing a host of functions that affect those sequences (genes) that encode protein. The RNAs that do not code for protein are referred to as a group as noncoding RNA. Most genes coding for protein are in some way regulated by these noncoding RNAs (5). These noncoding RNAs are very promiscuous: each RNA can affect multiple different genes on the same or different chromosomes.
The Source of Human Genetic Biodiversity
All genomes from all species share most of their DNA sequences, having acquired them over a 3.8-billion-year evolutionary history since the origin of life. Despite the common sequence ancestry, each individual genome within each species has maintained itself as unique. The development of biodiversity and the unique sequence of each genome, whether within or between species, is due primarily to errors in the process of copying DNA. Copying errors during the replication of one's DNA induce primarily single base changes through substitution of a single base (nucleotide) for another (e.g., thymine for adenine). These substitutions are passed on from generation to generation and are referred to as single nucleotide polymorphisms (SNPs). SNP substitutions account for 94% of the errors from copying or replicating DNA, while deletions of 1 to 4 bp account for 4.5%, and the remainder are due to insertions of 1 to 4 bp (6,7). Other types of DNA variation exist, such as chromosomal rearrangements, duplications (copy number variants), and translocations. The mutations induced by DNA copying errors, if beneficial, are conserved and their frequency increases, while deleterious mutations remain rare or are eliminated. Fortunately, many of these SNPs have modest to minimal effects or are neutral. Human DNA (6 billion bases) replicates itself every few days, and although it makes only 1 error per 1 billion bases created, it can accumulate a significant number of mutations over generations. Kruglyak and Nickerson (8) estimated that, with a mutation rate of 2 × 10⁻⁸ per base pair per generation and a human genome of over 3 billion base pairs, each genome carries 60 new mutations per generation. Sun et al. (9) estimated a mutation rate of 1.4 × 10⁻⁸, which would give about 40 new mutations per generation. The world population of 7 billion therefore has about 300 to 400 billion new mutations in the current generation.
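The arithmetic behind these estimates can be checked with a short calculation. The sketch below is only an illustration: it assumes a round genome size of 3 billion base pairs (the text says "over 3 billion") and multiplies the two published mutation rates by that size and by the world population. The function name and constants are ours, not taken from the cited studies.

```python
# Back-of-the-envelope check of the new-mutation estimates quoted in the text.
GENOME_BP = 3.0e9         # assumed round figure; the text says "over 3 billion base pairs"
WORLD_POPULATION = 7.0e9  # population figure used in the text


def new_mutations_per_genome(rate_per_bp, genome_bp=GENOME_BP):
    """Expected new mutations per genome per generation = rate x genome size."""
    return rate_per_bp * genome_bp


# Mutation rates per base pair per generation, as quoted in the text.
estimates = {
    "Kruglyak and Nickerson (8)": 2.0e-8,
    "Sun et al. (9)": 1.4e-8,
}

for source, rate in estimates.items():
    per_genome = new_mutations_per_genome(rate)
    worldwide_billions = per_genome * WORLD_POPULATION / 1e9
    print(f"{source}: about {per_genome:.0f} new mutations per genome, "
          f"about {worldwide_billions:.0f} billion worldwide")
```

Under these assumptions the products come to roughly 40 to 60 new mutations per genome and roughly 290 to 420 billion worldwide, in line with the rounded figure of 300 to 400 billion.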
The genetic diversity of mankind is exemplified by the observation that the exons (protein coding regions) of each individual genome, referred to collectively as the exome, encompass ∼13,000 nonsynonymous and ∼7,000 potentially functional variants, posing considerable challenges in the identification of disease-causing DNA sequence variants (DSVs) (10,11). Although the sequence of the human genome is 99.5% identical between individuals, the remaining 0.5% is more than adequate to provide each of us with a unique genome that, until sequenced, will hold many hidden surprises. Current knowledge indicates there are 3 million SNPs per genome, which account for over 80% of human phenotype variation, whether it is the color of one's eyes or the susceptibility to disease (12).
The Search for Disease Related Genes
A major goal is to identify DNA regions that predispose or cause cardiovascular disease. This refers to the ongoing studies that correlate physical or biochemical features (phenotype) to that of the genotype. Defining the phenotype precisely is fundamental to the discovery of the associated or causal genotype. The role of the clinician in detecting the phenotype has been crucial to this pursuit and will continue to be even more so as we further refine and specify subphenotypes. DNA can be obtained from the blood, other body fluids such as saliva, or body tissue. The approach to identify the causal genes and variants has evolved dramatically over the past 3 decades. The conventional approach of genetic linkage analysis in large families, which was very successful in linking causal DNA mutations to rare single gene disorders, has all but been replaced with the newer approaches of genome-wide association studies (GWAS) and next generation DNA sequencing (NGS) in small families and individual cases. The newer approaches not only have partially overcome a major limitation of genetic linkage in identifying the causal variant in small size families but also have afforded the opportunity to identify the causal alleles in sporadic cases with single gene diseases and the susceptibility (risk) alleles in those with the complex phenotypes.
Single Gene Disorders: The Success of Genetic Linkage Analysis
Single gene disorders are the phenotypic consequences of rare DSVs that impart large effect sizes. The mutation is both necessary and sufficient to induce the disease. Familial hypertrophic cardiomyopathy was the first cardiovascular single gene disorder for which the responsible mutation was discovered: a missense mutation in the gene that encodes the beta-cardiac myosin heavy chain (13). Introducing the human mutant gene as a transgene induced the disease in both the mouse (14) and the rabbit (15). While the rare variant is sufficient to cause the disease, there is often variable expressivity (severity of the phenotype), determined by other genetic and nongenetic factors. The conventional approach for mapping the chromosomal location (locus) of the gene responsible for a single gene disorder has been genetic linkage analysis. In this technique, DNA from members of a 2-generation to 3-generation pedigree affected with the disease is genotyped using a few hundred short tandem repeat DNA markers. Markers that are inherited by the affected members of the family more often than expected by chance are in close physical proximity to the DNA region containing the responsible gene. Sequencing of candidate genes at the mapped locus usually identifies the causal variant. This approach has been exceedingly successful in mapping the causal genes for various single gene disorders, typically in large and moderate size families. It is estimated there are about 6,000 single gene disorders, of which causative genes have been discovered for over 3,500 (16). Accordingly, several dozen genes have been identified for hereditary cardiomyopathies, including dilated, hypertrophic, and arrhythmogenic cardiomyopathies; hereditary arrhythmias, such as atrial fibrillation, long QT syndromes, short QT syndromes, and catecholaminergic polymorphic ventricular tachycardia; and cardiac conduction defects (17).
In addition to linkage analysis, the candidate gene approach, guided by the biological and functional similarities between the known causal genes and the candidate gene, has been used to screen and identify new causal genes for single gene disorders. Both approaches are limited by not offering sufficient resolution to identify the causal genes in small families or in sporadic cases.
Single Gene Disorders: DNA Sequencing, a Paradigm Shift
The advent of NGS platforms has eased 1 of the bottlenecks to complete elucidation of the genetic causes of single gene disorders (10), including those occurring in small families or sporadically, and NGS has emerged as the preferred method. The unbiased approach of whole exome sequencing (WES), in which all of the exons in the genome are sequenced, or whole genome sequencing (WGS) enables identification of all DSVs and hence offers the opportunity to discover not only the causal variants but also modifier variants that influence phenotypic expression of the disease.
The NGS technologies are based on parallel sequencing of millions of DNA fragments simultaneously. The sequencing reads are relatively short, typically comprising 35 to 100 bases, but can be as long as 1,000 bases, depending on the platform. The reads are aligned with the reference sequence, and multiple reads of the same DNA fragments are compared to identify the variants. The existing technologies afford the opportunity to generate up to ∼600 Gbp of sequence per run in about 1 to 2 weeks. Given that each genome is ∼3.2 Gbp and each exome is ∼30 Mbp, such platforms afford the opportunity to sequence 1 genome or a dozen or so exomes at a high mean coverage rate (×100). The coverage rate refers to the number of times each DNA fragment is sequenced and mapped to the reference sequence. A new approach to sequencing is being developed based on nanopore technology, in which a pore is small enough to allow only a single strand of DNA to pass through it. Detection of the specific nucleotide is based on the changes in conductivity as each DNA (or RNA) nucleotide passes through the pore (18). There is no need for fluorescence or chemicals; hence, it should be relatively inexpensive. Oxford Nanopore Technologies (18) recently announced the generation of a plastic pore with an attached enzyme that pulls the single strand of DNA through at a given speed. It is estimated that 25,000 of these pores would fit into the diameter of a human hair. The simultaneous operation of a large number of nanopores makes it possible to sequence a human genome within hours at <$1,000 per genome. The machine would be a small laptop device and also relatively inexpensive. The company has announced that it will deliver testing machines before the end of 2012, and mass production is expected in 2013.
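The ideas of aligning short reads to a reference, counting coverage, and comparing overlapping reads to identify variants can be illustrated with a toy example. The sketch below is not how any production variant caller works: it assumes the reads are already aligned (each read is given with its start position), handles only single base substitutions, and uses arbitrary thresholds (`min_depth`, `min_fraction`) chosen for illustration. The reference sequence and reads are invented.

```python
from collections import Counter


def pileup(reference, reads):
    """Stack aligned reads over the reference.

    Each read is a (0-based start position, sequence) pair; the result holds
    one Counter of observed bases per reference position.
    """
    columns = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1
    return columns


def mean_coverage(columns):
    """Average number of reads covering each reference position."""
    return sum(sum(c.values()) for c in columns) / len(columns)


def call_variants(reference, columns, min_depth=3, min_fraction=0.8):
    """Report positions where most reads disagree with the reference.

    The thresholds are illustrative assumptions; with min_fraction=0.8 only
    variants carried by nearly all reads are reported.
    """
    variants = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust a call at this position
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth))
    return variants


# Invented 10-base reference and four overlapping reads carrying A>T at position 4.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTC"), (2, "GTTCGT"), (3, "TTCGTA"), (4, "TCGTAC")]

columns = pileup(reference, reads)
print("mean coverage:", mean_coverage(columns))
print("variants (position, reference base, observed base, depth):",
      call_variants(reference, columns))
```

In this toy data set the only reported variant is the A-to-T substitution at position 4, which is covered by all four reads. Positions near the ends of the reference are covered by a single read and are skipped, which is one reason a high mean coverage is demanded in practice.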
The most commonly used approach is sequencing of the ∼180,000 protein coding exons in the 21,000 genes in the genome, which encompass approximately 30 Mbp of genomic DNA. The approach is referred to as WES, as opposed to WGS, wherein the entire genome is sequenced. In view of the large number of DNA sequence variants in each exome/genome, skillful interpretation of the genetic data utilizing various bioinformatics and genetic resources, as well as exquisite phenotyping, is necessary to reduce the number of putative causal variants. WES has other shortcomings, including incomplete capture and inadequate coverage (per read) of all exons, as well as incorrect mapping of the reads. In general, approximately 500 of the 21,000 genes may not be correctly sequenced due to inherent errors in WES. For medical sequencing, i.e., genetic testing, all DSVs identified by the NGS platforms should be validated by repeat independent NGS reactions, by Sanger sequencing, or at least by genotyping.
The NGS platforms have already been successful for many Mendelian disorders (10), as shown in Table 1. Utilizing this approach, TTN, encoding the giant protein titin, was identified as a major causal gene for hereditary and sporadic dilated cardiomyopathy (19). While WES and WGS are useful for identification of the causal genes/variants in small families, robust study design is necessary to filter out the large number of variants that typically segregate with the phenotype in such families and make identification of the true causal variants challenging. Various study designs and approaches have been suggested to strengthen the likelihood of success (10). Various bioinformatics programs, such as PolyPhen2 (20) and SIFT (21), as well as genetic databases, such as the National Heart, Lung, and Blood Institute Exome Sequencing Project and 1000 Genomes, are available to filter the DNA sequence variants identified by WES or WGS experiments and thereby restrict the number of putative candidate causal genes.
Determining the causal mutation in autosomal recessive disorders is facilitated by the fact that the causal mutation must be homozygous to induce the disease as opposed to heterozygous in autosomal dominant disease. In autosomal dominant disease, WES typically leads to identification of several dozen putative candidates that cosegregate with the phenotype in small or medium size families and hence, it is difficult to discern the causal variant. Despite the advantage of ascertaining the significance of polymorphisms within families, there will remain many polymorphisms that cannot be annotated definitively as causative for disease. While techniques such as bioinformatics and filtering mechanisms can reduce the number of putative causal variants, for some it will ultimately require extensive in vitro and in vivo studies to delineate biological and functional significance of these variants. Identification of non-synonymous variants by WES has the advantage of being in a protein coding region, which considerably facilitates functional analysis and the search for a corresponding phenotype. These points have been discussed in greater detail in a recent review by Marian (22).
DNA Sequencing as a Genetic Screen for Single Gene Disorders
Targeted subgenomic sequencing approach as opposed to WES may be used to screen for mutations in the known genes for single gene disorders (30). However, the approach is restricted to the known genes and does not lend itself to identification of the novel genes. It might also be used to identify double or triple causal mutations and as a part of cascade screening of family members. Cascade screening refers to genetic testing of family members of a proband in whom the causal mutation has already been identified. The cascade screening may entail simple genotyping for the presence of the specific mutation, Sanger sequencing, subgenomic sequencing, and even WES. While the latter seems excessive for cascade screening and currently not covered by the insurance companies, it affords the opportunity for identification of potentially additional mutations that might contribute to the phenotype and define the genetic structure of the individual. Technical aspects of WES, as a genetic screening tool in autosomal dominant diseases are similar to those relevant to gene discovery by NGS. Typically, a much higher coverage is demanded for medical sequencing than for gene discovery studies.
Genetics of CAD: An Archetypical Polygenic Disorder
It has been recognized for some time (31) that genetic predisposition to common diseases such as CAD would be due to multiple common genes, each with minimal to modest effect on the phenotype. In polygenic disorders, unlike single gene disorders, one gene is not sufficient or necessary to induce the phenotype (32). Genetic linkage analysis, which utilizes a few hundred DNA markers, lacks the necessary resolution to identify the predisposing genes in polygenic disorders. It was recognized that the case-control association would be the better approach, but would require hundreds of thousands of DNA markers to span the genome, which were not available (33). In 2005, HapMap (34) annotated the chromosomal location of millions of SNPs, which provided the necessary DNA markers to perform GWAS. At the same time, platforms for high-throughput genotyping were developed (32,35) which enabled mapping of the first genetic variant for CAD, 9p21 in 2007 (36,37). This was followed by 1 of the largest collaborative efforts (38) in cardiology involving 2 continents, CARDIoGRAM, with a sample size of 143,000 dedicated to mapping genes for CAD, followed by CARDIoGRAMplusC4D with a sample of 193,000. In just 5 years, 36 genetic variants have been confirmed to be associated with increased risk for CAD (39). Each of these 36 genetic risk variants for CAD was confirmed in populations independent of the discovery population and most recently underwent a meta-analysis in a total sample size of 190,000 (40). Based on this sample size, the chances of even 1 of these loci being false is very unlikely (41). It is important to realize that the DNA risk region is indicated by a SNP. This SNP serves as a marker and in most cases is not the SNP causing the disease risk. Thus, the actual sequence responsible for the risk in most cases is yet to be identified but will be markedly facilitated by the availability of rapid and inexpensive sequencing. 
Furthermore, most of the SNPs (23 of 36) mediate their risk independent of known risk factors (e.g., hypertension and cholesterol) through mechanisms as yet unknown. Functional analysis of the independent 23 risk variants for CAD is currently being pursued. Functional analysis is confounded by the observation that most of these SNPs are in non–protein-coding regions. Determining the function and identifying the polymorphism will be extremely difficult because the effect of any one risk variant is small and its specific intermediary phenotype that contributes to coronary atherosclerosis or myocardial infarction is unknown. Functional analysis is further confounded by the many contributing components to atherosclerosis such as macrophage formation, plaque rupture, platelet adhesiveness or thrombosis to name just a few. One approach to function is the pursuit of network modeling techniques (42–44) in an attempt to identify DNA, RNA, and protein pathways that involve the DNA region containing the disease associated DNA marker. This is pursued along with conventional analysis of in vitro (cells) and in vivo (animal) expression studies.
The common risk variants for CAD discovered by GWAS have several features in common, as discussed in detail in a recent review (39): 1) the common genetic risk variants occur frequently, with 10 of the variants occurring in ≥75% of the population and one-half of them in ≥50% of the population; 2) the risk effect per variant is small, averaging a risk increase of about 18%; 3) 10 of the variants act through known conventional risk factors: 7 through cholesterol (SORT1, PCSK9, LPA, ZNF259/APOA5, TRIB1, APOE, ABCG8, LDLR), 2 through hypertension (CYP17A1 and SH2B3), and the ABO locus (9q34) through increased propensity for coronary thrombosis; 4) two-thirds of the genetic risk variants act through mechanisms independent of conventional risk factors; 5) most of the SNPs signaling a risk variant are found in non–protein coding regions; 6) risk is proportional to the total number of risk variants inherited by an individual, rather than to a specific risk variant; and 7) in our analysis of 23 risk variants for CAD, we observed that while the maximum number of risk alleles present in any 1 individual could be 46, the average was 17, with a maximum observed of 26 and a minimum of 7.
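Points 6 and 7 amount to counting risk alleles. The sketch below shows that count and, purely as an illustration, converts it to odds relative to the average carrier. The multiplicative model and the use of the "about 18%" average as a per-allele odds ratio are our assumptions, not a method given in the text, and the variant names are placeholders. It is not a validated risk calculator.

```python
N_VARIANTS = 23  # number of CAD risk variants in the analysis described in the text


def risk_allele_count(genotype):
    """Sum risk-allele copies across variants; each variant contributes 0, 1, or 2."""
    for variant, copies in genotype.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"{variant}: expected 0, 1, or 2 copies, got {copies}")
    return sum(genotype.values())


def relative_odds(count, reference_count=17, per_allele_odds_ratio=1.18):
    """Odds relative to a person carrying the average number of risk alleles.

    Assumes each allele multiplies the odds by the same factor, which is an
    illustrative simplification.
    """
    return per_allele_odds_ratio ** (count - reference_count)


# Two copies of every risk allele gives the theoretical maximum of 2 x 23.
all_risk = {f"variant_{i}": 2 for i in range(1, N_VARIANTS + 1)}
print("maximum possible risk alleles:", risk_allele_count(all_risk))

# Minimum, average, and maximum counts observed in the analysis.
for count in (7, 17, 26):
    print(count, "risk alleles -> relative odds", round(relative_odds(count), 2))
```

The first line of output reproduces the maximum of 46 quoted in point 7. The relative odds depend entirely on the assumed model and should be read only as a demonstration that risk scales with the total allele count.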
Genetic Risk of CAD and Clinical Application
The clinical application of the genetic risk factors for complex diseases such as CAD is yet to be recommended. One approach is to wait until we have specific therapy related to these genetic risk variants before recommending genetic testing. This is likely to require many years, as drug therapy as a rule could require a minimum of 10 years for development and approval. Another approach would be to incorporate them as risk factors into the current prevention guidelines. For example, current guidelines for prevention of CAD recommend lowering low-density lipoprotein cholesterol to 160 mg/dl if one has no conventional risk factors, but if another independent risk factor such as hypertension is present, low-density lipoprotein cholesterol should be decreased to 140 mg/dl. Since genetic risk variants such as 9p21 have been proven to be independent risk factors, they could be incorporated into current guidelines and would lead to more intense treatment of known risk factors such as cholesterol. Currently, the genetic risk variants for CAD are not recommended for routine prevention and treatment of CAD. The independent genetic risk factors imply that several mechanisms involved in the pathogenesis of atherosclerosis have yet to be discovered. While GWAS has not specifically identified the culprits, the implications for the pathogenesis and biology of atherosclerosis provide tremendous potential for development of new drug targets and innovative therapy.
Missing Heritability: The Need for DNA Sequencing
Despite the many common genetic risk variants for CAD, they account for only a small percentage of the expected heritability (45). It is estimated that about 50% of predisposition for CAD is genetic (31), yet the 36 risk variants account for only about 10% of the expected heritability. There are several possibilities to account for this discrepancy: rare risk variants (minor allele frequency ≤5%), undiscovered common variants, epistasis (gene-gene interactions), or miscalculations. GWAS has the resolution to detect common SNPs but not rare SNPs, which can be detected only by direct DNA sequencing. An ongoing approach that uses WES instead of WGS is the National Heart, Lung, and Blood Institute–sponsored "Exome Sequencing Project" for rare variants, which involves sequencing about 30 million bases encompassing all 180,000 exons in the 23,000 genes in the genome (30). The initial results confirm the expectation that there are many more rare variants (46) than common variants. Based on a sample of 202 genes in 14,000 Europeans, investigators observed that 1 base pair per 21 base pairs had undergone mutation to a rare polymorphism. These variants are very rare (minor allele frequency <1%), with 75% of them having a frequency of only 1 per 200 to 300 individuals (47).
While sequencing is necessary to detect rare polymorphisms, it does not determine their function or whether they are disease related. Once a rare polymorphism or SNP is identified, one must, through case-control association studies, determine whether the SNP is statistically more common in cases than in controls. The advantage of functional rare variants associated with disease is that they occur primarily in protein coding regions and are associated with several-fold increased risk (47,48). Rare variants that cause single gene diseases such as hypertrophic cardiomyopathy (3) and Wolff-Parkinson-White syndrome (49) are associated with several-fold increased risk and are in themselves potent enough to induce the phenotype, as shown in transgenic animals (50). The sample size required for 30 rare risk variants with an average frequency of 1% and power ≥80% is over 6,000 cases and controls, if risk is increased 2-fold. If one is assessing 30 rare risk variants with an average frequency of 0.1%, it would require 60,000 cases and controls.
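The case-control comparison described above can be illustrated with a simple test on a 2 × 2 table of carriers and noncarriers. The sketch below uses a Pearson chi-square test with 1 degree of freedom, implemented with the standard library only. The counts are invented, and the example tests a single variant, so it does not reproduce the sample size and power figures quoted in the text. Rare-variant studies often use other methods, such as tests that pool variants within a gene.

```python
import math


def chi_square_2x2(case_carriers, case_total, control_carriers, control_total):
    """Pearson chi-square test (1 degree of freedom) on carrier counts.

    Returns the chi-square statistic, its p-value, and the odds ratio.
    """
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio


# Invented counts: 4% carriers among 6,000 cases vs. 2% among 6,000 controls.
chi2, p_value, odds_ratio = chi_square_2x2(240, 6000, 120, 6000)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}, odds ratio = {odds_ratio:.2f}")
```

With these invented counts the odds ratio is close to 2 and the association is strongly significant. With a carrier frequency 10 times lower and the same sample size, the expected counts become too small for the same conclusion, which is the intuition behind the larger samples required for rarer variants.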
In determining the biological or pathological function, rare variants have certain advantages over common variants. Common variants occur primarily in non–protein coding regions (51) as opposed to disease related rare variants, which predominantly occur in protein coding regions (47,48). Thus, WES, in which only the protein coding regions are sequenced, is appropriate for rare variants and is much more economical than WGS (52). Because most of the rare variants occur in known proteins, detection of the phenotype will be greatly facilitated with prior knowledge of the protein expressed, whether performed in vitro or in vivo.
It remains to be determined whether rare variants, despite their greater effect contribute significantly to the missing heritability. It is important to emphasize that the frequency of the genetic risk variant has nothing to do with its importance as a therapeutic target. Their importance as a therapeutic target is highly enriched by the greater effect over that of common variants. This is illustrated by the cholesterol receptor that was identified back in the 1970s by Brown and Goldstein (53). This inherited defect referred to as familial hypercholesterolemia only occurs in 1 in 5,000 people, yet this rare disorder was the tipping point to recognize that cholesterol played a major role in precipitating premature CAD in these individuals. This led to the development of statins, which inhibit the synthesis of cholesterol, and, today, statins are the mainstay in the prevention of CAD (54). A more relevant and recent example of the potency of rare variants is the rare polymorphism discovered in PCSK9, which has a frequency of about 1% (55). An antibody to PCSK9 was associated with a 60% further reduction in low-density lipoprotein cholesterol over that of statin therapy (55).
The other possibility is the overly stringent statistical requirement of p ≤ 5 × 10⁻⁸ demanded by GWAS. There is considerable evidence, as indicated by Visscher et al. (56,57), that common variants of less than genome-wide significance may account for much of the genetic "missing heritability." In the genetics of height, Yang et al. (57) showed that more than 40% of the expected heritability can be accounted for by utilizing less significant common variants. A more recent study by Simonson et al. (58) also indicates that common variants of less than genome-wide significance do account for some of the missing heritability.
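The stringency of this threshold is easier to appreciate with a small calculation. The text does not say how the figure is derived. The sketch below uses the conventional explanation, a Bonferroni correction of a 0.05 significance level for roughly 1 million independent common-variant tests, which is our assumption. The SNP labels and p-values are invented.

```python
import math


def bonferroni_threshold(alpha=0.05, independent_tests=1_000_000):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / independent_tests


def classify(p_value, threshold):
    """Label a p-value relative to the genome-wide threshold."""
    if p_value <= threshold:
        return "genome-wide significant"
    return "below genome-wide significance"


threshold = bonferroni_threshold()
print(f"threshold = {threshold:.0e}")

# Invented p-values: only the first would be reported by a conventional GWAS.
for snp, p in {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.04}.items():
    print(snp, p, classify(p, threshold))
```

A variant such as the hypothetical snp_b, with p = 2 × 10⁻⁶, would be discarded by this rule even though such variants may in aggregate carry real signal, which is the argument summarized above.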
The current approach to assessing the total risk effect of common variants is by simply adding their individual effects. A major proportion of this missing heritability may be due to epistasis or gene-to-gene interaction, which is not accounted for in our current calculations (59). In their natural state, genes exert their effect through combined networks rather than as single units and likely have synergistic effects over and above that of their individual effects. As more genes are discovered and their functions elucidated together with their interacting networks, it should be possible to ascertain and confirm the source of the missing heritability. To resolve this issue, it will be necessary to have genome sequencing in massive sample sizes to identify the rare variants and elucidate their function.
Pharmacogenetics is rapidly expanding in defining the relationship of DNA sequence variation and drug response. This has been most notable with 2 drugs commonly prescribed for cardiovascular therapy, clopidogrel and warfarin. Since the establishment of dual antiplatelet therapy as the gold standard therapy following coronary stent placement, clopidogrel has become 1 of the most widely prescribed cardiac drugs. SNPs in the gene encoding cytochrome P450 2C19 have been shown to affect the degree to which clopidogrel attenuates platelet aggregation. In both the PLATO (Platelet Inhibition and Patient Outcomes) trial (60) and the TRITON−TIMI 38 (TRial to assess Improvement in Therapeutic Outcomes by optimizing platelet InhibitioN with prasugrel–Thrombolysis In Myocardial Infarction 38) (61) carriage of CYP P450 2C19 polymorphisms was associated with major adverse cardiac events including the potentially catastrophic outcome of stent thrombosis. It is unclear at this point as to whether tailoring antiplatelet therapy can favorably modify outcomes in those individuals that are carriers. However, it has been demonstrated that identification of carriers by point-of-care testing and tailored prescription of a dual-platelet regimen successfully eliminates high residual platelet activity (62). A randomized study using such technology needs to be executed to determine whether such testing reduces adverse outcomes.
Polymorphisms in cytochrome P450 2C9 and vitamin K epoxide reductase have been shown to modify warfarin response. Several pharmacogenetic models have been developed in order to predict warfarin-dosing requirements. These include CYP P450 2C9 and vitamin K epoxide reductase 1 genotype, smoking status, relevant medications, age, sex, and body mass index (63). The application of these algorithms has been investigated in several prospective studies demonstrating feasibility of this approach. However, only a few were randomized and all suffered from small sample sizes. A recent publication compared standard dosing regimen with 2 genotype-guided algorithms (64). Primary outcomes were percent out of range international normalized ratios and time in therapeutic range at 3 months. The combined genotype-guided prescription cohort demonstrated superior outcomes with respect to both primary endpoints. Moreover, serious events were significantly less frequent in the genotype-guided cohort (4.5% vs. 9.4% of patients; p < 0.001). It should be noted that there was no difference in the primary outcome between the 2 genotype-based algorithms. As a consequence, routine use of such algorithms has not been endorsed in the guidelines.
While the primary thrust of pharmacogenomic inquiry has been to define sequence variation that modifies drug efficacy, some work has been done on sequence variation that predisposes to adverse effects. One striking example is the association between an SLCO1B1 polymorphism and HMG CoA reductase inhibitor–induced myopathy, in which homozygosity confers a relative risk of 16.9 compared with noncarriers (65).
Individual Genome Sequencing: A New Reality
While a draft of the human genome sequence was completed in 2000 (2), the first individual human genome completed in its entirety was that of Venter et al. (66), at a time when sequencing an individual genome cost millions of dollars. The introduction of NGS (30) revolutionized the rate and cost of DNA sequencing; sequencing a human genome today costs $5,000 and is expected to cost less than $1,000 within 1 to 2 years. It is estimated that over 30,000 individuals will have had their whole genome sequenced by the end of this year (67). Recent reviews on genome sequencing are listed in Table 2.
What Does It Mean to Have One's Genome Sequenced?
If parallel sequencing simply continues on its current course, whole genome sequencing will be feasible, inexpensive, and rapid, and is expected to be routine within the next 5 years, perhaps sooner if the nanopore approach proves robust. What does it mean to have one's genome completely sequenced on the basis of a venipuncture, a buccal smear, or a sample of one's hair? The knowledge of one's DNA disease risk variants gained from such a single measurement is overwhelming, considering that these variants will not change in one's lifetime. These DNA risk variants are not influenced by meals, the time of day, age, gender, or medications. A record of one's DNA variants can be stored and attached to one's medical record as a permanent, unchanging blueprint of the individual's genetic makeup. The National Institutes of Health has already launched a project referred to as “eMERGE” involving 5 medical centers in the United States, in which individuals' DNA sequences and medical records will be analyzed for genotype-phenotype correlations (72). This could be the prototype for the future, whereby one's buccal smear, blood, or tissue is stored in a biorepository and the genetic analysis is correlated with the stored electronic phenotypic data. Similar projects are ongoing for other diseases such as cancer. This information will be routinely available and will be part of the hospital record. A couple known to carry a gene for a lethal disease, such as Huntington's disease, can avoid transmitting it to their children through in vitro fertilization, selecting embryos derived from their own egg and sperm that do not carry the mutation. Having one's genome sequenced avoids misinterpretation and immediately determines whether one has 1 or more mutations proven to be associated with disease.
Despite the utility of the GWAS and NGS platforms in offering robust strategies to elucidate the genetic basis of complex diseases, clinical application of such discoveries confronts a number of challenges. Among them is the daunting task of identifying the true causal allele from the vast number of variants that are present in each genome or exome, including nonsynonymous single-nucleotide variations (nsSNVs) and insertion/deletion variants. Bioinformatics algorithms might offer information about the potential pathogenicity of the variants, but such predictions are often discordant across different platforms. Likewise, large-scale high-throughput screening tools to identify the pathogenic variants are currently not available. The focus of NGS in identifying risk or causal variants is on the rare alleles, which are expected to exert larger effect sizes than the common alleles. However, a significant number of rare variants are not expected to be pathogenic. Therefore, a practical approach is to identify the variants that have already been linked to the phenotype. Such variants are typically rare and are often nonsense, missense, or frameshift mutations that either have been shown to cause cardiovascular pathology or are located in a gene that is known to be causal for a Mendelian disease. Each genome contains a handful of such variants that might be used for early identification of those at risk. However, whether NGS-based early identification and intervention could influence the outcome in cardiovascular disease is an empiric question that remains to be tested.
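The practical approach described above (keep variants that are rare, protein-altering, and either previously linked to the phenotype or located in a known Mendelian disease gene) can be sketched as a simple filter. This is a minimal sketch only: the frequency cutoff, the placeholder gene names, and the example variants are hypothetical assumptions made for illustration and are not taken from the article or from any real data set.

```python
from dataclasses import dataclass

# Hypothetical settings for illustration; none of these values come from the article.
RARE_ALLELE_FREQUENCY = 0.01                        # assumed cutoff for "rare"
DAMAGING_CONSEQUENCES = {"nonsense", "missense", "frameshift"}
KNOWN_MENDELIAN_GENES = {"GENE_A", "GENE_B"}        # placeholder gene names


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str          # e.g. "missense", "synonymous"
    allele_frequency: float   # population frequency of the variant allele
    previously_linked: bool   # already reported in association with the phenotype


def is_candidate(v: Variant) -> bool:
    """Keep rare, protein-altering variants that have prior supporting evidence."""
    rare = v.allele_frequency < RARE_ALLELE_FREQUENCY
    damaging = v.consequence in DAMAGING_CONSEQUENCES
    has_prior_evidence = v.previously_linked or v.gene in KNOWN_MENDELIAN_GENES
    return rare and damaging and has_prior_evidence


# Made-up example variants.
variants = [
    Variant("GENE_A", "missense", 0.0004, False),    # rare, damaging, known gene
    Variant("GENE_C", "nonsense", 0.0001, False),    # rare, damaging, no prior evidence
    Variant("GENE_B", "synonymous", 0.0002, False),  # not protein-altering
    Variant("GENE_A", "missense", 0.12, True),       # common allele
    Variant("GENE_D", "frameshift", 0.0003, True),   # rare, damaging, previously linked
]

candidates = [v for v in variants if is_candidate(v)]
for v in candidates:
    print(v.gene, v.consequence, v.allele_frequency)
```

A filter of this kind only narrows the list to a handful of variants per genome; it does not establish pathogenicity, which still requires the clinical and functional assessment discussed in the text.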
The Convergence of 2 Technologies: A Challenge for Personalized Medicine
A major challenge to healthcare policy makers, physicians, caregivers, and end users is being created by the convergence of 2 major technologies: cost-effective DNA sequencing of the whole genome and digitization of patient data. DNA sequencing is said to have improved 10,000-fold in the past 8 years (73), while our ability to store, retrieve, and analyze data has improved only 16-fold (73,74). Some claim that the convergence of these 2 technologies is the tipping point for personalized medicine. It could be costly not to realize that we are at the cusp of a new era of personalized medicine. Detailed genome knowledge is rapidly becoming available, as DNA sequencing is accelerating much faster than our ability to store and analyze the data. Interpreting the data will probably require elucidation of the function of the DNA risk variants. The era of population medicine, in which “1 drug fits all,” will be replaced by medicine based on one's genetic composition and molecular makeup and on how they affect the particular disease phenotype in that individual. Given the etiological and phenotypic complexity of the common cardiovascular disorders, and in view of the difficulties in identifying the true risk alleles, one has to avoid a cavalier approach in assigning clinical implications to the genetic data. Experienced clinicians with training and expertise in medical genetics, working alone or in conjunction with medical geneticists, should carefully assess the clinical significance of the genetic discoveries. The field is clearly not ready for a direct-to-consumer approach, which has the potential to offer false information with considerable medical and psychological implications.
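To make the mismatch between the two growth figures concrete, the quoted 10,000-fold and 16-fold improvements over 8 years can be converted into yearly growth factors and doubling times. This is a sketch that assumes constant exponential growth over the whole period, which the cited sources may not claim; the function names are illustrative.

```python
import math

YEARS = 8  # period quoted in the text


def annual_factor(total_fold: float, years: float) -> float:
    """Constant yearly growth factor that compounds to total_fold over the period."""
    return total_fold ** (1.0 / years)


def doubling_time_years(total_fold: float, years: float) -> float:
    """Years needed to double at that constant growth rate."""
    return years * math.log(2) / math.log(total_fold)


sequencing_fold = 10_000   # improvement in DNA sequencing, as quoted
storage_fold = 16          # improvement in storage, retrieval, and analysis, as quoted

for label, fold in [("sequencing", sequencing_fold), ("storage/analysis", storage_fold)]:
    print(
        f"{label}: x{annual_factor(fold, YEARS):.2f} per year, "
        f"doubling every {doubling_time_years(fold, YEARS):.2f} years"
    )

# How far sequencing has outpaced storage and analysis over the period.
gap = sequencing_fold / storage_fold
print(f"gap: {gap:.0f}-fold")
```

Under the constant-rate assumption, sequencing capacity doubles roughly every 7 months while storage and analysis capacity doubles every 2 years, leaving a 625-fold gap over the 8 years.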
The effects of the human genome have hardly been felt by some, but 1 effect is obvious to all of society and was best put by Leroy Hood, one of the pioneers: “Revolutions that have been generated by the first draft of the Human Genome Project, have barely been felt, but there is 1 profound change that has already occurred and that is the realization that biology is fundamentally an informational science” (75). This informational revolution could not be more unlike the industrial revolution. It has minimal, if any, unfavorable effects on the environment, being performed in cybernetic space that most of us regard as intangible, untouchable, and lily-white clean. The immense scale of the informational revolution was recently summarized in a book by Firestein (76). From 5,000 years ago until 2003, humanity created a total of 5 exabytes of information (1 exabyte is a billion gigabytes). From 2003 to 2010, we created this amount every 2 days, and in 2013 we create this amount every 10 min. Put another way, every few hours we create more information than all of the information created by humanity since the start of civilization.
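The rates quoted above are easier to compare when expressed in the same unit. The short calculation below converts them to exabytes per day, using only the figures given in the text (5 exabytes every 2 days, then every 10 min); the function and variable names are illustrative.

```python
EXABYTES_PER_BATCH = 5.0          # amount quoted in the text
GIGABYTES_PER_EXABYTE = 1e9       # 1 exabyte is a billion gigabytes
MINUTES_PER_DAY = 24 * 60


def exabytes_per_day(minutes_per_batch: float) -> float:
    """Daily output when one 5-exabyte batch is produced every minutes_per_batch."""
    return EXABYTES_PER_BATCH * MINUTES_PER_DAY / minutes_per_batch


rate_2010 = exabytes_per_day(2 * MINUTES_PER_DAY)   # one batch every 2 days
rate_2013 = exabytes_per_day(10)                    # one batch every 10 min
speedup = rate_2013 / rate_2010

print(f"2010: {rate_2010} exabytes/day")
print(f"2013: {rate_2013} exabytes/day")
print(f"increase: {speedup:.0f}-fold")
print(f"5 exabytes = {EXABYTES_PER_BATCH * GIGABYTES_PER_EXABYTE:.0e} gigabytes")
```

On these figures, daily output rose from 2.5 to 720 exabytes, a 288-fold increase between the two quoted rates.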
The authors acknowledge Peggy Offley for her assistance in the preparation of this manuscript.
Dr. Roberts receives grant support from CIHR#MOP82810 (RR)/Canada, CIHR#MOP77682 (AFRS)/Canada, and CFI#11966 (RR)/Canada; and is a consultant to Cumberland Pharmaceuticals. Dr. Marian receives grant support from R01-088498/PHS HHS/United States, R21 AG038597-01/AG/NIA NIH HHS/United States, and R34HL105563/HL/NHLBI NIH HHS/United States. Dr. Stewart receives grant support from CIHR#MOP82810 (RR) & CIHR#MOP77682 (AFRS)/Canada. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- Abbreviations and acronyms
- CAD = coronary artery disease
- DNA = deoxyribonucleic acid
- DSV = deoxyribonucleic acid sequence variants
- GWAS = genome-wide association studies
- NGS = next generation DNA sequencing
- nsSNVs = nonsynonymous single-nucleotide variations
- RNA = ribonucleic acid
- SNP = single nucleotide polymorphism
- WES = whole exome sequencing
- WGS = whole genome sequencing
- American College of Cardiology Foundation
- Boorstin D.
- Roberts R., McNally E.M.
- Amaral P.P., Dinger M.E., Mercer T.R., Mattick J.S.
- Carlson C.
- Bhangale T.R., Rieder M.J., Livingston R.J., Nickerson D.A.
- Marian A.J., Belmont J.
- Stranger B.E.F.M., Dunning M., Ingle C.E., et al.
- Hamosh A., Scott A.F., Amberger J.S., Bocchini C.A., McKusick V.A.
- Marian A.J., Brugada R., Roberts R.
- Pennisi E.
- Theis J.L., Sharpe K.M., Matsumoto M.E., et al.
- Granados-Riveron J.T., Ghosh T.K., Pope M., et al.
- Meder B., Haas J., Keller A., et al.
- McPherson R., Pertsemlidis A., Kavaslar N.
- Helgadottir A., Thorleifsson G., Manolescu A., et al.
- Preuss M., Konig I.R., Thompson J.R., et al.
- Roberts R., Stewart A.F.
- Tennessen J.A., Bigham A.W., O'Connor T.D., et al.
- Nelson M.R., Wegmann D., Ehm M.G., et al.
- Dewey F.E., Pan S., Wheeler M.T., et al.
- Sidhu J.S., Rajawat Y.S., Rami T.G., et al.
- Hindorff L.A., Sethupathy P., Junkins H.A., et al.
- Brown M.S., Goldstein J.L.
- Zuk O., Hechter E., Sunyaev S.R., Lander E.S.
- Carlquist J.F., Anderson J.L.
- Anderson J.L., Horne B.D., Stevens S.M., et al.
- Brunham L.R., Hayden M.R.
- Drmanac R.
- Schnabel R.B., Baccarelli A., Lin H., et al.
- Zerbino D.R., Paten B., Haussler D.
- DeSalle R.Y.M.
- Firestein S.