How DNA Works

dna — Deoxyribonucleic acid, more commonly known as DNA, is a complex molecule that contains all of the information necessary to build and maintain an organism. All living things have DNA within their cells. Stanislaw Pytel/Getty Images

Like the one ring of power in Tolkien's "Lord of the Rings," deoxyribonucleic acid (DNA) is the master molecule of every cell. It contains vital information that gets passed on to each successive generation. It coordinates the making of itself as well as other molecules (proteins). If it is changed slightly, serious consequences may result. If it is destroyed beyond repair, the cell dies.

Changes in the DNA of cells in multicellular organisms produce variations in the characteristics of a species. Over long periods of time, natural selection acts on these variations to evolve or change the species.

The presence or absence of DNA evidence at a crime scene could mean the difference between a guilty verdict and an acquittal. DNA is so important that the United States government has spent enormous amounts of money to sequence DNA in the human genome in hopes of understanding and finding cures for many genetic diseases. Finally, from the DNA of one cell, we can clone an animal, a plant or perhaps even a human being.

But what is DNA? Where is it found? What makes it so special? How does it work? In this article, we will look deep into the structure of DNA and explain how it makes itself and how it determines all of your traits. First, let's look at how DNA was discovered.

DNA is one of a class of molecules called nucleic acids. Nucleic acids were originally discovered in 1868 by Friedrich Miescher, a Swiss biologist, who isolated DNA from pus cells on bandages. Although Miescher suspected that nucleic acids might contain genetic information, he could not confirm it.

In 1943, Oswald Avery and colleagues at Rockefeller University showed that DNA taken from a bacterium, Streptococcus pneumoniae, could make noninfectious bacteria become infectious. These results indicated that DNA was the information-containing molecule in the cell. The information role of DNA was further supported in 1952 when Alfred Hershey and Martha Chase demonstrated that to make new viruses, a bacteriophage virus injected DNA, not protein, into the host cell.

So, scientists had theorized about the informational role of DNA for a long time, but nobody knew how this information was encoded and transmitted. Many scientists guessed that the structure of the molecule was important to this process. In 1953, James D. Watson and Francis Crick discovered the structure of DNA at Cambridge University.

Basically, Watson and Crick used molecular modeling techniques and data from other investigators (including Maurice Wilkins, Rosalind Franklin, Erwin Chargaff and Linus Pauling) to solve the structure of DNA. Watson, Crick and Wilkins received the Nobel Prize in Medicine in 1962 for the discovery (Franklin, who was Wilkins' collaborator and provided a key piece of data that revealed the structure to Watson and Crick, died before the prize was awarded).

DNA Structure

A nucleotide is the basic building block of nucleic acids (RNA and DNA). HowStuffWorks

DNA is one of the nucleic acids, information-containing molecules in the cell (ribonucleic acid, or RNA, is the other nucleic acid). DNA is found in the nucleus of nearly every human cell (mature red blood cells, which have no nucleus, are an exception). The information in DNA:

guides the cell (along with RNA) in making new proteins that determine all our biological traits
gets passed (copied) from one generation to the next

The key to all of these functions is found in the molecular structure of DNA, as described by Watson and Crick.

Although it may look complicated, the DNA in a cell is just a pattern made up of four different parts called nucleotides. Imagine a set of blocks that has only four shapes, or an alphabet that has only four letters. DNA is a long string of these blocks or letters. Each nucleotide consists of a sugar (deoxyribose) bound on one side to a phosphate group and bound on the other side to a nitrogenous base.

There are two classes of nitrogenous bases: purines (double-ringed structures) and pyrimidines (single-ringed structures). The four bases in DNA's alphabet are:

adenine (A): a purine
cytosine (C): a pyrimidine
guanine (G): a purine
thymine (T): a pyrimidine

Watson and Crick discovered that DNA had two sides or strands, and that these strands were twisted together like a twisted ladder — the double helix. The sides of the ladder comprise the sugar-phosphate portions of adjacent nucleotides bonded together.

The phosphate of one nucleotide is covalently bound (a bond in which one or more pairs of electrons are shared by two atoms) to the sugar of the next nucleotide. The strands twist into a helix largely because the flat bases stack tightly on top of one another. The nitrogenous bases point inward on the ladder and form pairs with bases on the other side, like rungs. Each base pair is formed from two complementary nucleotides (purine with pyrimidine) bound together by hydrogen bonds. The base pairs in DNA are adenine with thymine and cytosine with guanine.
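The pairing rules can be written down as a small lookup. The following is only an illustrative sketch; the names `PAIRS`, `is_base_pair` and `is_purine_pyrimidine` are invented for this example and do not come from any library.

```python
# Base classes as listed in the article.
PURINES = {"A", "G"}       # double-ringed structures
PYRIMIDINES = {"C", "T"}   # single-ringed structures

# The base pairs in DNA: adenine with thymine, cytosine with guanine.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_base_pair(left, right):
    """Return True if the two bases form one of DNA's complementary pairs."""
    return PAIRS.get(left) == right


def is_purine_pyrimidine(left, right):
    """Return True if one base is a purine and the other is a pyrimidine."""
    return (left in PURINES and right in PYRIMIDINES) or (
        left in PYRIMIDINES and right in PURINES
    )


if __name__ == "__main__":
    for left, right in [("A", "T"), ("C", "G"), ("A", "C"), ("A", "G")]:
        print(left, right, is_base_pair(left, right), is_purine_pyrimidine(left, right))
```

Note that A with C is a purine with a pyrimidine but still not a valid pair: the purine-pyrimidine rule is necessary, not sufficient.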

Hydrogen Bond

A hydrogen bond is a weak chemical bond that occurs between hydrogen atoms and more electronegative atoms, like oxygen, nitrogen and fluorine. The participating atoms can be located on the same molecule (adjacent nucleotides) or on different molecules (adjacent nucleotides on different DNA strands). Hydrogen bonds do not involve the exchange or sharing of electrons like covalent and ionic bonds. The weak attraction is like that between the opposite poles of a magnet. Hydrogen bonds occur over short distances and can be easily formed and broken. They can also stabilize a molecule.

Fitting Inside a Cell

E. coli bacterium: A typical E. coli bacterium is 3 microns long, but its DNA is more than 300 times longer. So the DNA is tightly coiled and twisted to fit inside. HowStuffWorks

DNA is a long molecule. For example, a typical bacterium, like E. coli, has one DNA molecule with about 3,000 genes. If drawn out, this DNA molecule would be about 1 millimeter long. However, a typical E. coli is only 3 microns long (3 one-thousandths of a millimeter). So to fit inside the cell, the DNA is highly coiled and twisted into one circular chromosome.
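The size comparison is simple arithmetic. As a quick check, using only the two figures quoted above (the variable names are made up for this sketch):

```python
DNA_LENGTH_MM = 1.0         # stretched-out E. coli DNA, from the text
CELL_LENGTH_MICRONS = 3.0   # typical E. coli cell length, from the text
MICRONS_PER_MM = 1000.0

dna_length_microns = DNA_LENGTH_MM * MICRONS_PER_MM
ratio = dna_length_microns / CELL_LENGTH_MICRONS

print(f"The DNA is about {ratio:.0f} times longer than the cell")
```

The ratio comes out at roughly 333, which is why the DNA has to be coiled to fit.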

Complex organisms, like plants and animals, have tens of thousands of genes on many different chromosomes (most humans have 46 chromosomes). In the cells of these organisms, the DNA is twisted around bead-like proteins called histones. The histones are also coiled tightly to form chromosomes, which are in the nucleus of the cell.

When a cell reproduces, the chromosomes (DNA) get copied and distributed to each offspring or daughter cell. Non-sex cells have two copies of each chromosome that get copied, and each daughter cell receives two copies (mitosis). During meiosis, precursor cells have two copies of each chromosome that get copied and distributed equally to four sex cells. The sex cells (sperm and egg) have only one copy of each chromosome. When sperm and egg unite in fertilization, the offspring have two copies of each chromosome.

DNA Replication

DNA carries the information for making all the cell's proteins. These proteins implement all the functions of a living organism and determine the organism's characteristics. When the cell reproduces, it must pass all of this information on to the daughter cells.

Before a cell can reproduce, it must first replicate — or make a copy of — its DNA. Where DNA replication occurs depends upon whether the cell is a prokaryote or a eukaryote. DNA replication occurs in the cytoplasm of prokaryotes and in the nucleus of eukaryotes. Regardless of where DNA replication occurs, the basic process is the same.

The structure of DNA lends itself easily to DNA replication. The two sides of the double helix run in opposite (antiparallel) directions. The beauty of this structure is that it can unzip down the middle and each side can serve as a pattern or template for the other side (called semi-conservative replication). However, DNA does not unzip entirely. It unzips in a small area called a replication fork, which then moves down the entire length of the molecule.

Let's look at the details:

An enzyme called DNA gyrase makes a nick in the double helix and each side separates.
An enzyme called helicase unwinds the double-stranded DNA.
Several small proteins called single strand binding proteins (SSB) temporarily bind to each side and keep them separated.
An enzyme complex called DNA polymerase "walks" down the DNA strands and adds new nucleotides to each strand. The nucleotides pair with the complementary nucleotides on the existing strand (A with T, G with C).
A subunit of the DNA polymerase proofreads the new DNA.
An enzyme called DNA ligase seals up the fragments into one long continuous strand.
The new copies automatically wind up again.
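The template idea behind these steps can be sketched in a few lines. This is a deliberately simplified model, not a description of the enzymes: it ignores strand direction and fragments, writes the two strands position by position, and uses the invented helper names `complement_strand` and `replicate`.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(template: str) -> str:
    """Build the strand that pairs with `template`, base by base."""
    return "".join(COMPLEMENT[base] for base in template)


def replicate(duplex: Tuple[str, str]) -> List[Tuple[str, str]]:
    """Semi-conservative replication: each daughter keeps one old strand
    and gains one newly built complementary strand."""
    old_a, old_b = duplex
    daughter_1 = (old_a, complement_strand(old_a))   # old strand A + new partner
    daughter_2 = (complement_strand(old_b), old_b)   # new partner + old strand B
    return [daughter_1, daughter_2]


if __name__ == "__main__":
    parent = ("ATGC", complement_strand("ATGC"))
    for daughter in replicate(parent):
        print(daughter)
```

Both daughters come out identical to the parent, which is the point of using each side as a template for the other.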

Different types of cells replicate their DNA at different rates. Some cells constantly divide, like those in your hair follicles, fingernails and bone marrow. Other cells go through several rounds of cell division and stop (including specialized cells, like those in your brain, muscles and heart). Finally, some cells stop dividing, but can be induced to divide to repair injury (such as skin cells and liver cells). In cells that do not constantly divide, the cues for DNA replication/cell division come in the form of chemicals. These chemicals can come from other parts of the body (hormones) or from the environment.

Animal vs. Plant DNA

The DNA of all living organisms has the same structure and code, although some viruses use RNA as the information carrier instead of DNA. Most animals have two copies of each chromosome. In contrast, plants may have more than two copies of several chromosomes, which usually arise from errors in the distribution of the chromosomes during cell reproduction. In animals, this type of error usually causes genetic diseases that are often fatal. For reasons that are not well understood, this type of error is not as devastating to plants.

What DNA Does

DNA carries all the information for your physical characteristics, which are essentially determined by proteins. DNA contains the instructions for making a protein. In DNA, each protein is encoded by a gene (a specific sequence of DNA nucleotides that specifies how a single protein is to be made). Specifically, the order of nucleotides within a gene specifies the order and types of amino acids that must be put together to make a protein.

A protein is made of a long chain of chemicals called amino acids. There are various types of proteins with different functions:

Enzymes carry out chemical reactions (such as digestive enzymes).
Structural proteins are building materials (such as collagen and nail keratin).
Transport proteins carry substances (such as oxygen-carrying hemoglobin in blood).
Contractile proteins (such as actin and myosin) cause muscles to contract.
Storage proteins hold on to substances (such as albumin in egg whites and iron-storing ferritin in your spleen).
Hormones such as insulin send signals between the cells in the body (estrogen, testosterone and cortisol are also hormones, but they are steroids rather than proteins).

The particular sequence of amino acids in the chain is what makes one protein different from another. This sequence is encoded in the DNA where one gene encodes for one protein.

How does DNA encode the information for a protein? There are only four DNA bases, but there are 20 amino acids that can be used for proteins. So, groups of three nucleotides form a codon that specifies which of the 20 amino acids goes into the protein. A 3-base codon yields 64 possible patterns (4*4*4), which is more than enough to specify 20 amino acids.

Because there are 64 possible codons and only 20 amino acids, there is some repetition in the genetic code. Also, the order of codons in the gene specifies the order of amino acids in the protein. It may require anywhere from 100 to 1,000 codons (300 to 3,000 nucleotides) to specify a given protein. Each gene also has codons to designate the beginning (start codon) and end (stop codon) of the gene.
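The counting argument can be checked directly. The sketch below builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes, with "*" marking a stop codon) and counts the repetition. Codons are written here with DNA letters; replacing T with U gives the mRNA version. The variable names are chosen for this example only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_amino = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, a in CODON_TABLE.items() if a == "*")
distinct_amino_acids = set(CODON_TABLE.values()) - {"*"}

print(len(CODON_TABLE))            # 4 * 4 * 4 possible codons
print(len(distinct_amino_acids))   # amino acids they specify
print(stop_codons)                 # codons that specify no amino acid
print(codons_per_amino["L"], codons_per_amino["M"])  # leucine vs. methionine
```

Leucine is specified by six different codons while methionine has only one (ATG, which doubles as the start codon), so the repetition is not spread evenly.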

Building a Protein: Transcription

Building proteins is very much like building a house:

The master blueprint is DNA, which contains all the information to build the new protein (house).
The working copy of the master blueprint is called messenger RNA (mRNA), which is copied from DNA.
The construction site is either the cytoplasm in a prokaryote or the endoplasmic reticulum (ER) in a eukaryote.
The building materials are amino acids.
The construction workers are ribosomes and transfer RNA molecules.

Let's look at each phase of the new construction more closely.

In a eukaryote, DNA never leaves the nucleus, so its information must be copied. This copying process is called transcription and the copy is mRNA. Transcription takes place in the cytoplasm (prokaryote) or in the nucleus (eukaryote). The transcription is performed by an enzyme called RNA polymerase.

To make mRNA, RNA polymerase:

Binds to the DNA strand at a specific sequence of the gene called a promoter
Unwinds and unlinks the two strands of DNA
Uses one of the DNA strands as a guide or template
Matches new nucleotides with their complements on the DNA strand (G with C, A with U — note that RNA has uracil (U) instead of thymine (T))
Binds these new RNA nucleotides together to form a copy of the DNA strand (mRNA)
Stops when it encounters a termination sequence of bases (a terminator; stop codons, by contrast, signal the end of translation)
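The matching step can be sketched as a lookup from each DNA template base to the RNA base that pairs with it. This is a simplification under stated assumptions: it skips the promoter, the unwinding and the termination sequence, ignores strand direction, and the function name `transcribe` and the template string are made up for illustration.

```python
# DNA template base -> RNA base that pairs with it (RNA uses U instead of T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA built by pairing RNA nucleotides against a DNA template."""
    return "".join(DNA_TO_RNA[base] for base in template_strand)


if __name__ == "__main__":
    # Hypothetical 12-base template strand.
    print(transcribe("TACAAATGTACT"))  # prints AUGUUUACAUGA
```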

mRNA is happy to live in a single-stranded state (as opposed to DNA's desire to form complementary double-stranded helixes). In prokaryotes, all the nucleotides in the mRNA are part of codons for the new protein. However, in eukaryotes only, there are extra sequences in the DNA and mRNA called introns, which don't code for proteins.

This mRNA is then further processed:

Introns get cut out.
The coding sequences get spliced together.
A special nucleotide "cap" gets added to one end.
A long tail consisting of 100 to 200 adenine nucleotides is added to the other end.
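These processing steps amount to cutting, joining and adding ends, which can be sketched with string slicing. Everything specific here is an assumption made for the example: the intron positions are supplied by hand, the cap is shown as a `[cap]` marker rather than a real nucleotide, and `process_pre_mrna` is an invented name.

```python
def process_pre_mrna(pre_mrna, introns, tail_length=150):
    """Cut out introns, splice the remaining pieces, then add a cap and tail.

    `introns` is a list of (start, end) index pairs, end exclusive
    (hypothetical input; a real cell recognizes introns by sequence signals).
    """
    pieces = []
    position = 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[position:start])  # keep the coding stretch
        position = end                           # skip over the intron
    pieces.append(pre_mrna[position:])           # keep whatever follows the last intron
    spliced = "".join(pieces)
    # Tail of adenines; the text gives 100 to 200, so 150 is used as a default.
    return "[cap]" + spliced + "A" * tail_length


if __name__ == "__main__":
    # Made-up transcript: coding AUGUUU, intron GUAAGU, coding ACAUGA.
    print(process_pre_mrna("AUGUUUGUAAGUACAUGA", [(6, 12)], tail_length=5))
```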

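The cap and tail help protect the mRNA from being broken down and help the cell recognize it for export and translation, but why eukaryotic genes carry introns in the first place is not fully understood. Finally, at any one moment, many genes are being transcribed simultaneously according to the cell's needs for specific proteins.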

The working copy of the blueprint (mRNA) must now go to the construction site where the workers will build the new protein. If the cell is a prokaryote such as an E. coli bacterium, then the site is the cytoplasm. If the cell is a eukaryote, such as a human cell, then the mRNA leaves the nucleus through large holes in the nuclear membrane (nuclear pores) and goes to the endoplasmic reticulum (ER).

Next, we'll learn about translation — the assembly process.

Building a Protein: Translation

This is a table of the genetic code based on mRNA codons. Some tables are based on the DNA codons. HowStuffWorks

To continue with our house metaphor, once the working copy of the blueprint has reached the site, the workers must assemble the materials according to the instructions; this process is called translation. In the case of a protein, the workers are the ribosomes and special RNA molecules called transfer RNA (tRNA). The construction materials are the amino acids.

First, let's look at the ribosome. The ribosome is made of proteins and RNA called ribosomal RNA (rRNA). In prokaryotes, rRNA is made in the cytoplasm; in eukaryotes, rRNA is made in the nucleolus. The ribosome has two parts, which bind on either side of the mRNA. Within the large part are two "rooms" (P- and A-sites) that will fit two adjacent codons of the mRNA, two tRNA molecules and two amino acids. At first, the P-site holds the first codon in the mRNA and the A-site holds the next codon.

Next, let's look at the tRNA molecules. Each tRNA has a binding site for an amino acid. Because each tRNA is specific for a single amino acid, it must be able to recognize the codon on the mRNA that codes for that particular amino acid. Therefore, each tRNA has a specific three-nucleotide sequence called an anti-codon that matches up with the appropriate mRNA codon, like a lock and key.

For example, if a codon on mRNA has the sequence ... uracil-uracil-uracil ... (UUU), which codes for the amino acid phenylalanine, then the anti-codon on the phenylalanine tRNA will be adenine-adenine-adenine (AAA); remember that A binds with U in RNA. The tRNA molecules float in the cytoplasm and bind free amino acids. Once bound to amino acids, the tRNAs (also called amino-acyl tRNAs) will seek out ribosomes.
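The lock-and-key match between codon and anti-codon is the same base-pairing rule applied to RNA. A minimal sketch, pairing the bases position by position as the example above does (the name `anticodon_for` is invented here):

```python
# RNA base pairing: A with U, G with C.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anti-codon whose bases pair with the given mRNA codon."""
    return "".join(RNA_PAIRS[base] for base in codon)


if __name__ == "__main__":
    print(anticodon_for("UUU"))  # prints AAA
```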

Finally, let's look at the events in the synthesis of new proteins. For example, let's consider a small mRNA molecule with the following sequence: AUG-UUU-ACA, followed by a stop codon.

All mRNA molecules begin with the start codon AUG. UGA, UAA and UAG are stop codons; stop codons have no corresponding tRNA molecules. (Actual mRNA molecules have hundreds of codons.)
The corresponding sequence of tRNA anti-codons will be UAC-AAA-UGU.
There is no tRNA corresponding to the stop codons.
The amino acid sequence specified by this small mRNA is methionine-phenylalanine-threonine.

We know this sequence of amino acids by using a table of the genetic code. The genetic code table above is for mRNA and specifies the bases in the first, second and third positions of the codon with their corresponding amino acids.

Let's read the amino acid specified by the mRNA codon, AUG, using the table at the top of the page. First, place your left finger on the first position codon (A), in the first column of the table. Move your left finger across the row under the second position codon (U) in the first row. Now, place your right finger over the third position codon (G) in the same row of the last column (G). Move your right finger across the row until it meets your left finger and read the amino acid. You should see methionine.

The Protein Synthesis Process

Now let's look at the order of events in the synthesis of our protein from our sample mRNA:

A ribosome binds to mRNA with the AUG codon in the P-site and the UUU codon in the A-site.
An amino acyl-tRNA (anti-codon = UAC) with an attached methionine comes into the P-site of the ribosome.
An amino acyl-tRNA (anti-codon = AAA) with an attached phenylalanine comes into the A-site of the ribosome.
A chemical bond forms between the methionine and phenylalanine (in a protein, this covalent bond is called a peptide bond).
The methionine-specific tRNA leaves the P-site and goes off to gather another methionine.
The ribosome shifts so that the P-site now contains the UUU codon with the attached phenylalanine tRNA and the next codon (ACA) now occupies the A-site.
An amino acyl-tRNA (anti-codon = UGU) with an attached threonine comes into the A-site of the ribosome.
A peptide bond forms between the phenylalanine and the threonine.
The phenylalanine-specific tRNA leaves the P-site and goes off to find another phenylalanine.
The ribosome shifts down one codon so that the stop sequence is now in the A-site. Upon encountering the stop sequence, the ribosome detaches from the mRNA and splits into its two parts. The threonine-specific tRNA releases its threonine and leaves and the new protein floats away.
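The sequence of events above boils down to reading the mRNA three bases at a time until a stop codon appears. The sketch below does that for the codons named in the steps (AUG, UUU, ACA). The final UGA is an assumption, since the steps only say "the stop sequence", and the names `CODON_TABLE`, `NAMES` and `translate` are chosen for this example. It models only the reading of codons, not the P-site and A-site mechanics.

```python
from itertools import product
from typing import List

BASES = "UCAG"
# Standard genetic code for mRNA codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Full names for just the amino acids in the example.
NAMES = {"M": "methionine", "F": "phenylalanine", "T": "threonine"}


def translate(mrna: str) -> List[str]:
    """Read codons from the first AUG until a stop codon; return one-letter codes."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing to build
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break  # stop codon: no tRNA matches, the chain is released
        chain.append(amino)
    return chain


if __name__ == "__main__":
    sample = "AUGUUUACAUGA"  # AUG-UUU-ACA plus an assumed UGA stop codon
    print("-".join(NAMES.get(a, a) for a in translate(sample)))
    # prints methionine-phenylalanine-threonine
```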

Several ribosomes can attach to a molecule of mRNA one after another and begin making proteins. So, several proteins can be made from one mRNA. In fact, in E. coli bacteria, translation of the mRNA begins even before transcription is finished.

Mitochondrial DNA

Mitochondria, which carry their own DNA, resemble an early form of bacteria that is thought to have been captured by eukaryotic cells early in the history of life on Earth. The bacteria coexisted with the cell (endosymbiosis) and evolved into mitochondria. Another unique aspect of mitochondrial DNA is that you inherit it only from your mother (the mitochondria that exist in the egg cell). Although the sperm cell that fertilizes the egg contains a mitochondrion from the father, it does not get released and passed on.

DNA Mutation, Variation and Sequencing

DNA abnormalities — Chromosome abnormalities can be inherited from a parent (such as a translocation) or be "de novo" (new to the individual). This is why, when a child is found to have an abnormality, chromosome studies are often performed on the parents. National Human Genome Research Institute

In the human genome, there are roughly 20,000 protein-coding genes. As DNA polymerase copies the DNA sequence, some mistakes occur. For example, one DNA base in a gene might get substituted for another. This is called a mutation (specifically a point mutation) or variation in the gene. Because the genetic code has built-in redundancies, this mistake might not have much effect on the protein made by the gene.

In some cases, the error might be in the third base of a codon and still specify the same amino acid in the protein. In other cases, it may be elsewhere in the codon and specify a different amino acid. If the changed amino acid is not in a crucial part of the protein, then there may be no adverse effect. However, if the changed amino acid is in a crucial part of the protein, then the protein may be defective and not work as well or at all; this type of change can lead to disease.
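The difference between these two cases can be seen by changing one base of a codon and looking up the result. The sketch below uses mRNA codons and the standard genetic code; `point_mutation` is a hypothetical helper written for this illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code for mRNA codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def point_mutation(codon, position, new_base):
    """Substitute one base (position 0, 1 or 2) and report the amino acid
    before and after, as one-letter codes."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return mutated, CODON_TABLE[codon], CODON_TABLE[mutated]


if __name__ == "__main__":
    # Third-base change: ACA and ACG both specify threonine (T).
    print(point_mutation("ACA", 2, "G"))
    # First-base change: ACA (threonine, T) becomes CCA (proline, P).
    print(point_mutation("ACA", 0, "C"))
```

Whether the second kind of change matters then depends on where in the protein the swapped amino acid sits, which this lookup cannot tell you.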

Other types of mutations in DNA can occur when small segments of DNA break off the chromosome. These segments can get placed back at another spot in the chromosome and interrupt the normal flow of information. These types of mutations (deletions, insertions, inversions) usually have severe consequences.

There is lots of extra DNA in the human genome that does not code for proteins. Scientists used to believe noncoding DNA served no real purpose. However, they have recently discovered that at least some of it is integral to the function of cells, such as determining when and where some genes are turned on or off or assisting in protein assembly. These aren't necessarily benign duties. For example, a small change in noncoding DNA that alters the pattern of a critical protein can disrupt normal development and lead to health problems.

The Human Genome Project (HGP) was initiated in the 1990s with the goal of determining the sequence of the entire human genome and answering some basic questions: What genes were present? Where were they located? What were the sequences of the genes and the intervening DNA (noncoding DNA)? The HGP was an overwhelming success, delivering the first rough draft human genome sequence in 2000 and the final high-quality version in 2003. In the years since, it has paved the way for medical advances in everything from identifying more targeted cancer treatments to diagnosing rare genetic diseases.

This task was monumental, on the order of the Apollo space program to put the first man on the moon. The HGP scientists and contractors developed new technologies to sequence DNA that were automated and less expensive.

Basically, to sequence DNA, you place all the enzymes and nucleotides (A, G, C and T) necessary to copy DNA into a test tube. A small percentage of the nucleotides have a fluorescent dye attached to them (a different color for each type). You then place the DNA that you want to sequence into the test tube and let it incubate.

During the incubation process, the sample DNA gets copied over and over again. For any given copy, the copying process stops when a fluorescent nucleotide gets placed into it. So, at the end of the incubation process, you have many fragments of the original DNA of varying sizes and ending in one of the fluorescent nucleotides.
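A toy simulation shows why a pile of dye-terminated fragments is enough to recover a sequence. It rests on several assumptions that go beyond the description above: the fragments are afterwards sorted by length and the dye on each one is read in order, `sequence` stands for the newly made copy rather than the original template, and the 5 percent dye fraction and 2,000 copies are arbitrary illustrative values. The function names are invented for this sketch.

```python
import random


def terminated_fragments(sequence, n_copies=2000, dye_fraction=0.05, seed=0):
    """Simulate many copies; each stops at the first dye-labeled nucleotide."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_copies):
        for i in range(len(sequence)):
            if rng.random() < dye_fraction:       # a dye-labeled base was added
                fragments.add(sequence[: i + 1])  # copying stops here
                break
    return fragments


def read_sequence(fragments):
    """Assumed read-out step: order fragments by size, read each final (dyed) base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


if __name__ == "__main__":
    fragments = terminated_fragments("ATGTTTACATGA")
    print(len(fragments), "distinct fragment lengths")
    print(read_sequence(fragments))
```

With enough copies, every position ends up as the last base of some fragment, so the colors read from shortest to longest spell out the sequence.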

DNA technology will continue to develop as we try to understand how the elements of the human genome work and interact with the environment.