The human genome is the complete set of genetic material needed to create and maintain an individual. This genetic material is DNA, or deoxyribonucleic acid. It is packaged into 23 pairs of distinct structures called “chromosomes”. Half of your chromosomes can be traced back to your mother, and the other half to your father. Men and women share 22 pairs of chromosomes, labeled from 1 to 22. In addition, women have two X chromosomes, and men have one X and one Y chromosome. Beyond the 23 pairs of chromosomes, genetic information is also contained in DNA found in the mitochondria, the energy-producing organelles present in each cell of your body. Mitochondrial DNA is inherited exclusively from your mother.


DNA is made of two long strands of 4 building blocks, called “nucleotides” or “bases”: adenine (A), cytosine (C), guanine (G), and thymine (T). To fit into a cell, DNA forms a characteristic double helix structure in which bases on the two DNA strands align, such that adenine (A) pairs with thymine (T) and guanine (G) pairs with cytosine (C). Because of this pairing, DNA sequence length is given in base pairs or “bp”. The human genome has about 3.2 billion base pairs.
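The pairing rule described above can be written as a small lookup. This is only an illustrative sketch: the function name `complementary_strand` is made up for this example, and the partner strand is listed position by position without modeling strand direction.

```python
# Pairing rules from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the base that pairs with each base, position by position."""
    return "".join(PAIRS[base] for base in strand)


strand = "ATGAAC"
partner = complementary_strand(strand)
print(strand)   # ATGAAC
print(partner)  # TACTTG
```

Each position holds one base pair, which is why a six-letter sequence like this one is said to be 6 bp long.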

The information stored in DNA is used by cells to produce proteins. Proteins are large molecules that produce energy, create tissues, digest food, enable movement, and perform many other critical roles in the body. A string of DNA bases that contains instructions to make one protein is called a “gene”. It is estimated that the human genome contains about 20,000-25,000 genes. You have two copies of each gene; one is inherited from your mother and the other is inherited from your father.

To produce a protein, the sequence of DNA bases in a gene has to be translated from DNA building blocks (A, C, G, T) to protein building blocks called “amino acids” (you may remember some amino acids by name, for example, proline, arginine, methionine, and others). DNA is first copied to a type of molecule called messenger ribonucleic acid (mRNA) in a process called transcription. mRNA is then translated into a protein when the nucleotides A, C, G, and U (U replaces T in mRNA) are used to build a set of amino acids into a protein.
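The two steps above, transcription and translation, can be sketched in a few lines. This is a simplified illustration, not a model of what happens in a cell: it assumes the DNA shown is the strand that reads the same as the mRNA (so transcription is just T replaced by U), it uses only a handful of entries from the standard genetic code, and the example sequence is made up.

```python
# A small excerpt of the standard genetic code (mRNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",   # methionine, also the usual start signal
    "CCU": "Pro",   # proline
    "CGU": "Arg",   # arginine
    "AAC": "Asn",   # asparagine
    "UAA": "Stop",  # stop signal, no amino acid added
}


def transcribe(dna: str) -> str:
    """Copy DNA into mRNA; U replaces T."""
    return dna.replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time and build the amino acid chain."""
    protein = []
    for start in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("ATGCCTCGTTAA")  # hypothetical example sequence
print(mrna)             # AUGCCUCGUUAA
print(translate(mrna))  # ['Met', 'Pro', 'Arg']
```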

What happens when one DNA base is replaced with a different DNA base, for example, when the sequence ATGAAC becomes ATGACC? This difference is called a “genetic variant”. In most cases, such changes in DNA do not change the proteins that are produced. In some cases, however, genetic variants lead to proteins with altered functions or change how much protein is made. A change in a protein is not always harmful; it can also be favorable or neutral. Unfortunately, some changes in DNA are harmful and lead to proteins that cannot perform their function, either because of errors in their structure or because too little protein is made. When a protein cannot perform its function, a disease can develop. Genetic variants with effects so strong that a single one is sufficient to cause a disease are rare, and we call them “monogenic variants”. Harmful variants with small or medium effects are more frequent. They cannot cause disease by themselves, but having several of them in different genes can predispose you to developing a disease, and their combined effects can be used to calculate “polygenic risk scores”.
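To see why some single-base changes alter a protein and others do not, the sketch below reads a DNA sequence three bases at a time and compares the resulting amino acids. It is an illustration under simplifying assumptions: the table holds only four entries of the standard genetic code (written with DNA letters), and the function names are invented for this example. In it, ATGAAC to ATGACC swaps one amino acid for another, while a different change, ATGAAC to ATGAAT, leaves the protein untouched.

```python
# Four entries of the standard genetic code, written with DNA letters.
CODON_TO_AMINO_ACID = {
    "ATG": "Met",
    "AAC": "Asn",
    "AAT": "Asn",  # a second spelling of the same amino acid
    "ACC": "Thr",
}


def amino_acids(dna: str) -> list[str]:
    """Split the sequence into three-base codons and look each one up."""
    return [CODON_TO_AMINO_ACID[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def describe_change(reference: str, variant: str) -> str:
    """Report whether a variant changes the encoded amino acids."""
    if amino_acids(reference) == amino_acids(variant):
        return "protein unchanged"
    return "protein changed"


print(amino_acids("ATGAAC"))                 # ['Met', 'Asn']
print(amino_acids("ATGACC"))                 # ['Met', 'Thr']
print(describe_change("ATGAAC", "ATGACC"))   # protein changed
print(describe_change("ATGAAC", "ATGAAT"))   # protein unchanged
```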

To learn which genetic variants an individual has, their DNA is sequenced, which means reading the order of the bases A, C, G, and T across all chromosomes. That DNA sequence is then compared to a sequence that represents the DNA of an average healthy human. Any differences, or genetic variants, are noted and analyzed. Over four million variants in approximately 22,000 genes are evaluated. Of those 22,000 genes, 3,533 are known to be associated with diseases. Further in-depth genomic analysis is performed on 210 genes that are known to be associated with an increased risk of developing cancer and heart and vascular, metabolic, and neurodegenerative diseases. Finally, genetic variants are analyzed by a set of statistical models that provide important clues about an individual’s disease risk.
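The comparison step can be pictured as lining up two sequences and recording every position where they differ. The sketch below is a deliberately minimal illustration, not a description of any actual analysis pipeline: it assumes both sequences have the same length and differ only by single-base substitutions, and the function name is hypothetical.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, sample base) wherever the two differ.

    Positions are counted from 1. Assumes equal-length sequences.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles sequences of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


print(find_substitutions("ATGAAC", "ATGACC"))  # [(5, 'A', 'C')]
```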

High-risk genetic variants that have strong effects on protein function are rare, and the diseases they cause are called “rare diseases”, “Mendelian diseases”, or “monogenic diseases”. “Mono” indicates that disruption of the function of a single gene is sufficient to cause the disease.

A different type of genetic variant, the low-risk variant, contributes to the development of common diseases, including cancers. Each such variant has only a small impact on protein function or on a body process such as inflammation, and does not cause disease by itself. These small-effect variants are common: they are found in at least 1 in 20 individuals, and sometimes in 1 in 5 or more (in contrast, high-risk variants are found in fewer than 1 in 100 individuals). Only when several low-risk variants occur together in one individual can their combined impact be strong enough to be associated with an increased genetic risk for a common disorder.

In reality, many diseases have both monogenic and polygenic subtypes. For example, variants in the BRCA1 and BRCA2 genes are considered disease-causing because the lifetime risk of developing breast cancer for women who carry such variants is high: it reaches 38% to 87%, well above the risk of breast cancer in the general population, estimated at 12% [1]. Thus, BRCA1- and BRCA2-associated hereditary breast and ovarian cancer is considered a monogenic condition. However, only 5% to 10% of breast cancer patients have variants in BRCA1, BRCA2, or other high-risk genes [2,3]. This leaves a large number of breast cancer cases in which genetic factors contribute significantly but environment and lifestyle play the principal role.

While high-risk variants can impart risk independently, the risk conveyed by low-risk variants has to be evaluated as a group. To assess risk in this way, an individual’s polygenic risk score for a particular disease is calculated by summing the effects of multiple variants present in that individual’s genome. That score is compared to the scores of other people. If the individual’s score is higher than average, then they have increased risk, whereas if the score is lower, they have decreased risk relative to the general population.
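The calculation described above can be sketched as a weighted sum followed by a comparison with other people's scores. Everything specific here is an assumption made for illustration: the variant names, the effect weights, and the population scores are invented, and counting each variant as 0, 1, or 2 copies is one common way to do the sum rather than a method stated in this article.

```python
def polygenic_score(copies: dict[str, int], weights: dict[str, float]) -> float:
    """Sum each variant's effect weight times the number of copies carried (0, 1, or 2)."""
    return sum(weight * copies.get(variant, 0) for variant, weight in weights.items())


def percentile(score: float, population_scores: list[float]) -> float:
    """Percentage of people in the comparison group with a lower score."""
    lower = sum(1 for other in population_scores if other < score)
    return 100.0 * lower / len(population_scores)


# Hypothetical effect weights for three low-risk variants.
weights = {"variant_a": 0.10, "variant_b": 0.05, "variant_c": 0.20}
# Hypothetical individual: two copies of a, none of b, one copy of c.
individual = {"variant_a": 2, "variant_b": 0, "variant_c": 1}
# Hypothetical scores of five other people.
population = [0.10, 0.15, 0.25, 0.30, 0.45]

score = polygenic_score(individual, weights)
average = sum(population) / len(population)

print(round(score, 2))                # 0.4
print(percentile(score, population))  # 80.0
print("higher than average" if score > average else "not higher than average")
```

With these made-up numbers the individual scores above four of the five people in the comparison group, which is the kind of result that would be reported as increased risk relative to the general population.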

Having a high polygenic risk is not the same as having a variant that causes a rare disease, but it does mean that your genetic predisposition is increased. Environment and lifestyle are also important contributors to disease risk, so if you have a high polygenic score, you should feel especially motivated to maintain a healthy lifestyle. Minimizing the risks you can control keeps genetic and lifestyle-related risk factors from affecting your health at the same time.

Using polygenic scores for disease risk prediction is a relatively new approach in clinical practice, and as with any new technology, there are limitations. First, most polygenic risk models created so far have relied primarily on data from people with European ancestry, which means they may be less accurate for people of other backgrounds. Second, polygenic risk scores are still rather limited in predicting who will actually develop a disease; incorporating new variants as they are discovered will likely improve risk scores in the future.


1. A. Torkamani et al., The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 19, 581-590 (2018).

2. S. A. Lambert et al., Towards Clinical Utility of Polygenic Risk Scores. Hum Mol Genet. 28, R133-R142 (2019).

3. N. J. Wald et al., The illusion of polygenic disease risk prediction. Genet Med. 21, 1705-1707 (2019).

4. A. R. Martin et al., Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 51, 584-591 (2019).


1. “Polygenic risk scores” by National Human Genome Research Institute. Online.

2. “What are polygenic risk scores?” by Illumina. Online.

