Like the one ring of power in Tolkien's "Lord of the Rings," deoxyribonucleic acid (DNA) is the master molecule of every cell. It contains vital information that gets passed on to each successive generation. It coordinates the making of itself as well as other molecules (proteins). If it is changed slightly, serious consequences may result. If it is destroyed beyond repair, the cell dies.
Changes in the DNA of cells in multicellular organisms produce variations in the characteristics of a species. Over long periods of time, natural selection acts on these variations to evolve or change the species.
The presence or absence of DNA evidence at a crime scene could mean the difference between a guilty verdict and an acquittal. DNA is so important that the United States government has spent enormous amounts of money to unravel the sequence of DNA in the human genome in hopes of understanding and finding cures for many genetic diseases. Finally, from the DNA of one cell, we can clone an animal, a plant or perhaps even a human being.
But what is DNA? Where is it found? What makes it so special? How does it work? In this article, we will look deep into the structure of DNA and explain how it makes itself and how it determines all of your traits. First, let's look at how DNA was discovered.
DNA is one of a class of molecules called nucleic acids. Nucleic acids were originally discovered in 1868 by Friedrich Miescher, a Swiss biologist, who isolated DNA from pus cells on bandages. Although Miescher suspected that nucleic acids might contain genetic information, he could not confirm it.
In 1944, Oswald Avery and colleagues at Rockefeller University showed that DNA taken from a bacterium, Streptococcus pneumoniae, could make non-infectious bacteria become infectious. These results indicated that DNA was the information-containing molecule in the cell. The information role of DNA was further supported in 1952 when Alfred Hershey and Martha Chase demonstrated that to make new viruses, a bacteriophage virus injected DNA, not protein, into the host cell (see How Viruses Work for more information).
So scientists had theorized about the informational role of DNA for a long time, but nobody knew how this information was encoded and transmitted. Many scientists guessed that the structure of the molecule was important to this process. In 1953, James D. Watson and Francis Crick discovered the structure of DNA at Cambridge University. The story was described in James Watson's book "The Double Helix" and brought to the screen in the movie, "The Race for the Double Helix." Basically, Watson and Crick used molecular modeling techniques and data from other investigators (including Maurice Wilkins, Rosalind Franklin, Erwin Chargaff and Linus Pauling) to solve the structure of DNA. Watson, Crick and Wilkins received the 1962 Nobel Prize in Physiology or Medicine for the discovery of DNA's structure (Franklin, who was Wilkins' collaborator and provided a key piece of data that revealed the structure to Watson and Crick, died before the prize was awarded).
DNA is one of the nucleic acids, information-containing molecules in the cell (ribonucleic acid, or RNA, is the other nucleic acid). DNA is found in the nucleus of nearly every human cell. (See the sidebar at the bottom of the page for more about RNA and different types of cells.) The information in DNA:
- guides the cell (along with RNA) in making new proteins that determine all of our biological traits
- gets passed (copied) from one generation to the next
The key to all of these functions is found in the molecular structure of DNA, as described by Watson and Crick.
Although it may look complicated, the DNA in a cell is really just a pattern made up of four different parts called nucleotides. Imagine a set of blocks that has only four shapes, or an alphabet that has only four letters. DNA is a long string of these blocks or letters. Each nucleotide consists of a sugar (deoxyribose) bound on one side to a phosphate group and bound on the other side to a nitrogenous base.
There are two classes of nitrogenous bases called purines (double-ringed structures) and pyrimidines (single-ringed structures). The four bases in DNA's alphabet are:
- adenine (A) - a purine
- cytosine (C) - a pyrimidine
- guanine (G) - a purine
- thymine (T) - a pyrimidine
Watson and Crick discovered that DNA had two sides, or strands, and that these strands were twisted together like a twisted ladder -- the double helix. The sides of the ladder are made of the sugar-phosphate portions of adjacent nucleotides bonded together. The phosphate of one nucleotide is covalently bound (a bond in which one or more pairs of electrons are shared by two atoms) to the sugar of the next nucleotide. The geometry of this sugar-phosphate backbone, together with the way the flat base pairs stack on top of one another, causes the ladder to twist. The nitrogenous bases point inward on the ladder and form pairs with bases on the other side, like rungs. Each base pair is formed from two complementary nucleotides (purine with pyrimidine) bound together by hydrogen bonds. The base pairs in DNA are adenine with thymine and cytosine with guanine.
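The pairing rule is simple enough to write down in a few lines of code. The sketch below is in Python; the names PAIRS, PURINES, PYRIMIDINES, complement_strand and is_purine_pyrimidine_pair are made up for illustration, and the sequence is an arbitrary example. It builds the partner strand base by base and lets you check that every rung joins a purine to a pyrimidine.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

PURINES = {"A", "G"}       # double-ringed bases
PYRIMIDINES = {"C", "T"}   # single-ringed bases


def complement_strand(strand):
    """Return the base that sits opposite each base of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand)


def is_purine_pyrimidine_pair(a, b):
    """True when one base is a purine and the other is a pyrimidine."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


strand = "ATGCCGTA"                   # arbitrary example sequence
partner = complement_strand(strand)
print(strand)
print(partner)
```

Because each base has exactly one partner, applying complement_strand twice gives back the original strand. That is the property that makes copying DNA possible.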
In the next section we'll find out how long DNA strands fit inside a tiny cell.
Fitting Inside a Cell
DNA is a long molecule. For example, a typical bacterium, like E. coli, has one DNA molecule with about 4,000 genes (a gene is a specific sequence of DNA nucleotides that codes for a protein; we'll talk about this later). If drawn out, this DNA molecule would be about 1 millimeter long. However, a typical E. coli is only 3 microns long (3 one-thousandths of a millimeter). So to fit inside the cell, the DNA is highly coiled and twisted into one circular chromosome.
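To get a feel for the packing problem, you can check the arithmetic with the two lengths quoted above (about 1 millimeter of DNA in a cell about 3 microns long). This is a back-of-the-envelope sketch in Python, and the variable names are illustrative.

```python
# Lengths quoted in the text, converted to microns (1 mm = 1,000 microns).
dna_length_microns = 1.0 * 1000   # about 1 millimeter of DNA
cell_length_microns = 3.0         # a typical E. coli cell

# How many cell lengths would the stretched-out DNA span?
fold = dna_length_microns / cell_length_microns
print(f"The DNA is roughly {fold:.0f} times longer than the cell.")
```

In other words, the molecule is a few hundred times longer than the cell that holds it.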
Complex organisms, like plants and animals, have tens of thousands of genes on many different chromosomes (humans have about 20,000 protein-coding genes on 46 chromosomes). In the cells of these organisms, the DNA is twisted around bead-like proteins called histones. The histone-wrapped DNA is then coiled tightly to form chromosomes, which are located in the nucleus of the cell. When a cell reproduces, the chromosomes (DNA) get copied and distributed to each offspring, or daughter, cell. Non-sex cells have two copies of each chromosome; these get copied, and each daughter cell receives two copies (mitosis). During meiosis, precursor cells have two copies of each chromosome; these get copied and distributed equally to four sex cells. The sex cells (sperm and egg) have only one copy of each chromosome. When sperm and egg unite in fertilization, the offspring have two copies of each chromosome (see How Sex Works).
In the next section we'll look at how the DNA replication process works.
DNA carries the information for making all of the cell's proteins. These proteins implement all of the functions of a living organism and determine the organism's characteristics. When the cell reproduces, it has to pass all of this information on to the daughter cells.
Before a cell can reproduce, it must first replicate, or make a copy of, its DNA. Where DNA replication occurs depends upon whether the cell is a prokaryote or a eukaryote (see the RNA sidebar on the previous page for more about the types of cells). DNA replication occurs in the cytoplasm of prokaryotes and in the nucleus of eukaryotes. Regardless of where DNA replication occurs, the basic process is the same.
The structure of DNA lends itself easily to DNA replication. The two sides of the double helix run in opposite (anti-parallel) directions. The beauty of this structure is that it can unzip down the middle and each side can serve as a pattern or template for the other side (called semi-conservative replication). However, DNA does not unzip entirely. It unzips in a small area called a replication fork, which then moves down the entire length of the molecule.
Let's look at the details:
- An enzyme called DNA gyrase makes a nick in the double helix and each side separates
- An enzyme called helicase unwinds the double-stranded DNA
- Several small proteins called single strand binding proteins (SSB) temporarily bind to each side and keep them separated
- An enzyme complex called DNA polymerase "walks" down the DNA strands and adds new nucleotides to each strand. The nucleotides pair with the complementary nucleotides on the existing strand (A with T, G with C).
- A subunit of the DNA polymerase proofreads the new DNA
- An enzyme called DNA ligase seals up the fragments into one long continuous strand
- The new copies automatically wind up again
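The steps above can be boiled down to a toy model of semi-conservative replication. The Python sketch below is heavily simplified: it ignores the enzymes, the replication fork and strand direction, and writes the two strands position by position. The function name replicate and the example sequence are made up for illustration.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def replicate(helix):
    """Copy a double helix given as a (strand, partner) tuple.

    Each old strand serves as the template for a newly built partner,
    so both daughter helices keep one old strand and gain one new strand.
    """
    old_a, old_b = helix                               # "unzip" the two sides
    new_for_a = "".join(PAIRS[b] for b in old_a)       # new partner built against side A
    new_for_b = "".join(PAIRS[b] for b in old_b)       # new partner built against side B
    return (old_a, new_for_a), (new_for_b, old_b)


parent = ("ATGCCGTA", "TACGGCAT")
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)
print(daughter_2)
```

Both daughters carry the same information as the parent, and each contains exactly one of the parent's original strands. That is what "semi-conservative" means.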
Different types of cells replicate their DNA at different rates. Some cells divide constantly, like those that produce your hair and fingernails and those in your bone marrow. Other cells go through several rounds of cell division and stop (including specialized cells, like those in your brain, muscle and heart). Finally, some cells stop dividing, but can be induced to divide to repair injury (such as skin cells and liver cells). In cells that do not constantly divide, the cues for DNA replication/cell division come in the form of chemicals. These chemicals can come from other parts of the body (hormones) or from the environment.
What DNA Does
DNA carries all of the information for your physical characteristics, which are essentially determined by proteins. So, DNA contains the instructions for making a protein. In DNA, each protein is encoded by a gene (a specific sequence of DNA nucleotides that specifies how a single protein is to be made). Specifically, the order of nucleotides within a gene specifies the order and types of amino acids that must be put together to make a protein.
A protein is made of a long chain of chemicals called amino acids. Proteins have many functions:
- Enzymes that carry out chemical reactions (such as digestive enzymes)
- Structural proteins that are building materials (such as collagen and nail keratin)
- Transport proteins that carry substances (such as oxygen-carrying hemoglobin in blood)
- Contraction proteins that cause muscles to contract (such as actin and myosin)
- Storage proteins that hold on to substances (such as albumin in egg whites and iron-storing ferritin in your spleen)
- Hormones - chemical messengers between cells (including insulin, glucagon and growth hormone; steroid hormones such as estrogen, testosterone and cortisol are not proteins)
- Protective proteins - antibodies of the immune system, clotting proteins in blood
- Toxins - poisonous substances (such as bee venom and snake venom)
The particular sequence of amino acids in the chain is what makes one protein different from another. This sequence is encoded in the DNA, where one gene encodes one protein.
How does DNA encode the information for a protein? There are only four DNA bases, but there are 20 amino acids that can be used for proteins. So, groups of three nucleotides form a word (codon) that specifies which of the 20 amino acids goes into the protein. A 3-base codon yields 64 possible patterns (4*4*4), which is more than enough to specify 20 amino acids. Because there are 64 possible codons and only 20 amino acids, there is some repetition in the genetic code: most amino acids are specified by more than one codon. Also, the order of codons in the gene specifies the order of amino acids in the protein. It may require anywhere from 100 to 1,000 codons (300 to 3,000 nucleotides) to specify a given protein. Each gene also has codons to designate the beginning (start codon) and end (stop codon) of the gene.
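You can check the counting argument by listing every possible codon. The short Python sketch below uses the standard library's itertools.product; the names BASES, codons, pairs and nucleotides_needed are illustrative.

```python
from itertools import product

BASES = "ACGT"

# Every ordered group of three bases is one possible codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))            # 4 * 4 * 4

# Groups of only two bases would not cover 20 amino acids.
pairs = ["".join(p) for p in product(BASES, repeat=2)]
print(len(pairs))             # 4 * 4


def nucleotides_needed(codon_count):
    """Each codon is three nucleotides long."""
    return 3 * codon_count
```

Two-base words give only 16 patterns, which is too few. Three-base words give 64, which leaves room for repetition and for start and stop signals.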
In the next few sections, we'll see how proteins are built.
Building a Protein: Transcription
Building proteins is very much like building a house:
- The master blueprint is DNA, which contains all of the information to build the new protein (house).
- The working copy of the master blueprint is called messenger RNA (mRNA), which is copied from DNA.
- The construction site is either the cytoplasm in a prokaryote or the endoplasmic reticulum (ER) in a eukaryote.
- The building materials are amino acids.
- The construction workers are ribosomes and transfer RNA molecules.
Let's look at each phase of the new construction more closely.
In a eukaryote, DNA never leaves the nucleus, so its information must be copied. This copying process is called transcription and the copy is mRNA. Transcription takes place in the cytoplasm (prokaryote) or in the nucleus (eukaryote). The transcription is performed by an enzyme called RNA polymerase. To make mRNA, RNA polymerase:
- Binds to the DNA strand at a specific sequence of the gene called a promoter
- Unwinds and unlinks the two strands of DNA
- Uses one of the DNA strands as a guide or template
- Matches new nucleotides with their complements on the DNA strand (G with C, A with U -- remember that RNA has uracil (U) instead of thymine (T))
- Binds these new RNA nucleotides together to form a complementary copy of the DNA strand (mRNA)
- Stops when it encounters a termination sequence of bases (a terminator, which is different from the stop codon that ends translation)
mRNA is happy to live in a single-stranded state (as opposed to DNA's desire to form complementary double-stranded helixes). In prokaryotes, all of the nucleotides in the mRNA are part of codons for the new protein. In eukaryotes, however, the DNA and the freshly made mRNA contain extra sequences called introns that don't code for protein. This mRNA is then further processed:
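The matching step in the list above can be sketched in a few lines of Python. This toy version skips the promoter, the unwinding and the termination signal, and simply pairs each base of a made-up DNA template with its RNA partner. The names DNA_TO_RNA and transcribe are illustrative.

```python
# DNA template base -> RNA base that pairs with it (RNA uses U instead of T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template):
    """Build the mRNA complementary to a DNA template strand, position by position."""
    return "".join(DNA_TO_RNA[base] for base in template)


template = "TACAAATGT"          # made-up template strand
print(transcribe(template))
```

Note that the mRNA is the complement of the template strand, so it carries the same message as the other, non-template DNA strand with U in place of T.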
- Introns get cut out
- The coding sequences get spliced together
- A special nucleotide "cap" gets added to one end
- A long tail consisting of 100 to 200 adenine nucleotides is added to the other end
The cap and tail help protect the mRNA from being broken down and help the cell export and translate it. Why eukaryotic genes are interrupted by introns in the first place is less well understood. Finally, at any one moment, many genes are being transcribed simultaneously according to the cell's needs for specific proteins.
The working copy of the blueprint (mRNA) must now go to the construction site where the workers will build the new protein. If the cell is a prokaryote such as an E. coli bacterium, then the site is the cytoplasm. If the cell is a eukaryote, such as a human cell, then the mRNA leaves the nucleus through large holes in the nuclear membrane (nuclear pores) and goes to the endoplasmic reticulum (ER).
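The processing steps listed above (cut out introns, splice, add a cap, add a tail) can be mimicked on a text string. In the Python sketch below everything is hypothetical: the sequence is made up, the intron is written in lowercase only so it is easy to spot, its position is passed in by hand, and the cap is shown as the placeholder text "cap-" rather than a real nucleotide.

```python
def process_mrna(pre_mrna, introns, tail_length=100):
    """Cut out introns, splice the remaining pieces, then add a cap and a poly-A tail.

    `introns` holds (start, end) index pairs, end exclusive, in left-to-right order.
    """
    pieces = []
    position = 0
    for start, end in introns:
        pieces.append(pre_mrna[position:start])   # keep the coding stretch before the intron
        position = end                            # skip over the intron
    pieces.append(pre_mrna[position:])            # keep whatever follows the last intron
    spliced = "".join(pieces)
    return "cap-" + spliced + "A" * tail_length


pre_mrna = "AUGUUUgggcccACAUGA"                   # lowercase = made-up intron
print(process_mrna(pre_mrna, [(6, 12)], tail_length=5))
```

In a real cell the splicing machinery finds intron boundaries from signals in the sequence itself. Here the boundaries are simply given.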
Next, we'll learn about translation -- the assembly process.
Building a Protein: Translation
To continue with our house example, once the working copy of the blueprint has reached the site, the workers must assemble the materials according to the instructions; this process is called translation. In the case of a protein, the workers are the ribosomes and special RNA molecules called transfer RNA (tRNA). The construction materials are the amino acids.
First, let's look at the ribosome. The ribosome is made of proteins and RNA called ribosomal RNA (rRNA). In prokaryotes, rRNA is made in the cytoplasm; in eukaryotes, rRNA is made in the nucleolus. The ribosome has two parts, which bind on either side of the mRNA. Within the large part are two "rooms" (P and A sites) that will fit two adjacent codons of the mRNA, two tRNA molecules and two amino acids. At first, the P site holds the first codon in the mRNA and the A site holds the next codon.
Next, let's look at the tRNA molecules. Each tRNA has a binding site for an amino acid. Because each tRNA is specific for a single amino acid, it must be able to recognize the codon on the mRNA that codes for that particular amino acid. Therefore, each tRNA has a specific three-nucleotide sequence called an anti-codon that matches up with the appropriate mRNA codon, like a lock and key. For example, if a codon on mRNA has the sequence ...-uracil-uracil-uracil-... (UUU) which codes for the amino acid phenylalanine, then the anti-codon on the phenylalanine tRNA will be adenine-adenine-adenine (AAA); remember that A binds with U in RNA. The tRNA molecules float in the cytoplasm and bind free amino acids. Once bound to amino acids, the tRNAs (also called amino-acyl tRNAs) will seek out ribosomes.
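The codon-to-anti-codon matching described above follows the same pairing rule as before, with U in place of T. Here is a minimal Python sketch. It writes the anti-codon base for base against the codon, as this article does, and it ignores strand direction and the looser "wobble" pairing that real tRNAs allow at the third position. The names RNA_PAIRS and anticodon_for are illustrative.

```python
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def anticodon_for(codon):
    """Return the anti-codon written base for base against the codon."""
    return "".join(RNA_PAIRS[base] for base in codon)


print(anticodon_for("UUU"))   # the phenylalanine codon from the text
```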
Finally, let's look at the events in the synthesis of new proteins. For example, let's consider a small mRNA molecule with the following sequence:
AUG - UUU - ACA - (stop codon)
All mRNA molecules begin with AUG (the start codon). UGA, UAA, and UAG are stop codons; stop codons have no corresponding tRNA molecules. (Actual mRNA molecules have hundreds of codons.)
The corresponding sequence of tRNA anti-codons will be:
UAC - AAA - UGU
There is no tRNA corresponding to the stop codons.
The amino acid sequence specified by this small mRNA is:
methionine - phenylalanine - threonine
We know this sequence of amino acids by using a table of the genetic code. The genetic code table below is for mRNA and specifies the bases in the first, second and third positions of the codon with their corresponding amino acids.
Let's read the amino acid specified by the mRNA codon, AUG. First, place your left finger on the first-position base (A) in the first column of the table. Move your left finger across that row until it is under the second-position base (U) in the top row. Now, place your right finger on the third-position base (G) in the last column, within the same row. Move your right finger across the row until it meets your left finger and read the amino acid (methionine).
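The same lookup can be done in code. The Python sketch below stores the standard genetic code as a 64-letter string of one-letter amino acid abbreviations, ordered by first, second and third base in the conventional U, C, A, G order, with "*" marking a stop codon. The names BASES, AMINO_ACIDS and look_up are illustrative, and the index arithmetic plays the role of your two fingers.

```python
BASES = "UCAG"   # conventional row and column order of genetic code tables

# Standard genetic code, one letter per codon. "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base U
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)


def look_up(codon):
    """Find a codon's amino acid the way you would in a printed table."""
    first, second, third = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * first + 4 * second + third]


print(look_up("AUG"))   # M, the one-letter code for methionine
```

The first base picks a block of 16 entries, the second base picks a group of four within that block, and the third base picks the single entry.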
In the next section we'll look at the protein synthesis process.
The Protein Synthesis Process
Now let's look at the order of events in the synthesis of our protein from our sample mRNA:
- A ribosome binds to mRNA with the AUG codon in the P-site and the UUU codon in the A-site.
- An amino acyl-tRNA (anti-codon = UAC) with an attached methionine comes into the P-site of the ribosome
- An amino acyl-tRNA (anti-codon = AAA) with an attached phenylalanine comes into the A-site of the ribosome
- A chemical bond forms between the methionine and phenylalanine (in a protein, this covalent bond is called a peptide bond).
- The methionine-specific tRNA leaves the P-site and goes off to gather another methionine
- The ribosome shifts so that the P-site now contains the UUU codon with the attached phenylalanine tRNA and the next codon (ACA) now occupies the A-site.
- An amino acyl-tRNA (anti-codon = UGU) with an attached threonine comes into the A-site of the ribosome.
- A peptide bond forms between the phenylalanine and the threonine.
- The phenylalanine-specific tRNA leaves the P-site and goes off to find another phenylalanine.
- The ribosome shifts down one codon so that the stop sequence is now in the A-site. Upon encountering the stop sequence, the ribosome detaches from the mRNA and splits into its two parts; the threonine-specific tRNA releases its threonine and leaves; and the new protein floats away.
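The walk-through above amounts to a loop: read three bases, add the matching amino acid, shift, and repeat until a stop codon shows up. Here is a minimal Python sketch of that loop. The table holds only the codons this example needs, and the choice of UGA as the stop codon in the sample string is an assumption, since any of the three stop codons would do. The names CODON_TABLE and translate are illustrative.

```python
# Only the codons needed for this example; "STOP" marks a stop codon.
CODON_TABLE = {
    "AUG": "methionine",
    "UUU": "phenylalanine",
    "ACA": "threonine",
    "UGA": "STOP", "UAA": "STOP", "UAG": "STOP",
}


def translate(mrna):
    """Read an mRNA three bases at a time from the start codon to a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []                           # no start codon, nothing to build
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break                           # the finished chain is released
        chain.append(amino_acid)            # joined to the growing chain
    return chain


print(translate("AUGUUUACAUGA"))
```

A codon missing from this small table raises a KeyError. A fuller version would use all 64 codons.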
Several ribosomes can attach to a molecule of mRNA one after another and begin making proteins. So several proteins can be made from one mRNA. In fact, in E. coli bacteria, translation of the mRNA begins even before transcription is finished.
In the next section we'll look at DNA mutation.
DNA Mutation, Variation and Sequencing
In the human genome, there are roughly 20,000 protein-coding genes. As DNA polymerase copies the DNA sequence, some mistakes occur. For example, one DNA base in a gene might get substituted for another. This is called a mutation (specifically a point mutation) or variation in the gene. Because the genetic code has built-in redundancies, this mistake might not have much effect on the protein made by the gene. In some cases, the error might be in the third base of a codon and still specify the same amino acid in the protein. In other cases, it may be elsewhere in the codon and specify a different amino acid. If the changed amino acid is not in a crucial part of the protein, then there may be no adverse effect. However, if the changed amino acid is in a crucial part of the protein, then the protein may be defective and not work as well or at all; this type of change can lead to disease.
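The difference between a harmless and a potentially harmful point mutation can be shown with three codons. In the Python sketch below, the labels "silent" and "missense" are the standard names for the two outcomes, although this article does not use them. The tiny table and the function name classify_point_mutation are illustrative.

```python
CODON_TABLE = {"UUU": "phenylalanine", "UUC": "phenylalanine", "UCU": "serine"}


def classify_point_mutation(original, mutated):
    """Compare the amino acids specified before and after a one-base change."""
    if CODON_TABLE[original] == CODON_TABLE[mutated]:
        return "silent"       # same amino acid, protein unchanged
    return "missense"         # different amino acid, protein may be affected


print(classify_point_mutation("UUU", "UUC"))   # third base changed
print(classify_point_mutation("UUU", "UCU"))   # second base changed
```

Changing the third base of UUU to C still gives phenylalanine, while changing the second base swaps in serine.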
Other types of mutations in DNA can occur when small segments of DNA break off the chromosome. These segments can get placed back at another spot in the chromosome and interrupt the normal flow of information. These types of mutations (deletions, insertions, inversions) usually have severe consequences.
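These larger rearrangements can be pictured as edits to a string of letters. The Python sketch below treats a DNA sequence as plain text, which is a simplification: in a real chromosome an inverted segment also swaps strands, and that is not modeled here. The function names and sequences are made up for illustration.

```python
def delete(seq, start, end):
    """Remove the segment seq[start:end]."""
    return seq[:start] + seq[end:]


def insert(seq, position, segment):
    """Place `segment` into `seq` at `position`."""
    return seq[:position] + segment + seq[position:]


def invert(seq, start, end):
    """Flip the segment seq[start:end] end to end."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


original = "AAACCCGGGTTT"
broken = delete(original, 3, 6)          # the CCC segment breaks off
moved = insert(broken, 6, "CCC")         # and lands at a different spot
print(broken)
print(moved)
print(invert("AAACGGTTT", 3, 6))
```

Because codons are read three bases at a time, removing or adding a stretch whose length is not a multiple of three shifts every codon after it. That is one reason these changes tend to be severe.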
As noted above, there is lots of extra DNA in the human genome that does not code for proteins. What this extra non-coding DNA does is actively being researched. Perhaps some of it is merely spacing to hold the genes a certain distance apart for the enzymes of transcription. Some might be places where environmental chemicals might bind and affect DNA transcription and/or translation. Also, within this extra DNA, there are many variation sequences that are used in DNA typing (see How DNA Evidence Works).
The Human Genome Project (HGP) was initiated in the 1990s with the goal of determining the sequence of the entire human genome. What genes were present? Where were they located? What were the sequences of the genes and the intervening DNA (non-coding DNA)? This task was monumental, on the order of the US Apollo Project to place a man on the Moon. The HGP scientists and contractors developed new technologies to sequence DNA that were automated and less expensive.
Basically, to sequence DNA, you place all of the enzymes and nucleotides (A, G, C and T) necessary to copy DNA into a test tube. A small percentage of the nucleotides have a fluorescent dye attached to them (a different color for each type). You then place the DNA that you want to sequence into the test tube and let it incubate for a while.
During the incubation process, the sample DNA gets copied over and over again. For any given copy, the copying process stops when a fluorescent nucleotide gets placed into it. So, at the end of the incubation process, you have many fragments of the original DNA of varying sizes and ending in one of the fluorescent nucleotides. For an animation of this process of DNA sequencing, visit DNA Interactive, go to Techniques, then Sorting and sequencing.
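Here is one way to see why a pile of fragments reveals the sequence. The Python sketch below is an idealized model, not a lab protocol: it assumes exactly one copy stops at every position, records each fragment only as its length and the "color" of its final base, and then sorts the fragments by size. The sorting step is an assumption here, since the text above does not describe it. The function names are illustrative.

```python
import random


def terminated_fragments(copy_sequence):
    """Idealized incubation: one copy stops at every position of the new strand.

    Each fragment is recorded as (length, last_base), where last_base stands for
    the color of the fluorescent nucleotide that ended the copy.
    """
    fragments = [(i + 1, copy_sequence[i]) for i in range(len(copy_sequence))]
    random.shuffle(fragments)          # fragments end up mixed together in the tube
    return fragments


def read_sequence(fragments):
    """Sort the fragments by size and read off the end label of each one."""
    return "".join(base for _, base in sorted(fragments))


mixture = terminated_fragments("ATGCCGTA")
print(read_sequence(mixture))
```

The shortest fragment reports the first base, the next shortest reports the second, and so on, so reading the colors in order of size spells out the copied strand.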
DNA technology will continue to develop as we try to understand how the elements of the human genome work and interact with the environment.
For lots more information about DNA and related topics, check out the links below.
- How Cells Work
- How DNA Evidence Works
- How Gene Pools Work
- How Evolution Works
- How Cloning Works
- How DNA Computers Will Work
- How Human Reproduction Works
- How Cipro Works
- How AIDS Works
- Is the U.S. government building a "Gattaca"-level DNA database?
- How does your body know the difference between dominant and recessive genes?
More Great Links
- "The Discovery of the Molecular Structure of DNA - the Double Helix." NobelPrize.org. http://nobelprize.org/educational_games/medicine/dna_double_helix/readmore.html
- DNA Interactive. http://www.dnai.org/index.htm
- "Double Helix: 50 years of DNA." Nature, March 15, 2007. http://www.nature.com/nature/dna50/index.html
- "Genomics and Its Impact on Science and Society: A 2003 Primer." Human Genome Program, U.S. Department of Energy, 2003. http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer2001/index.shtml
- "Molecular Biology Review." National Center for Biotechnology Information. November 4, 2002. http://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/MolBioReview/index.html
- NIH/NCBI Science Primer. http://www.ncbi.nlm.nih.gov/About/primer/index.html
- "The Science Behind The Human Genome Project." Human Genome Project Information. August 29, 2006. http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml
- "The Search for DNA - The Birth of Molecular Biology." Access Excellence: The National Health Museum, 1990. http://www.accessexcellence.org/RC/AB/BC/Search_for_DNA.html
- "The Secret of Photo 51." PBS Nova. http://www.pbs.org/wgbh/nova/photo51/