The mRNA transcribed from the DNA is translated into a protein, which requires rules for converting one alphabet into the other. These rules are called the genetic code: every three bases in the transcribed mRNA correspond to a particular amino acid. (We can also treat this correspondence as a relationship between DNA sequence and amino acids, since we know the correspondence between DNA and mRNA.) These triplets of bases are called codons, and the code is non-overlapping with no ``space'' or ``pause'' symbols, so a sequence of 6 bases corresponds to exactly two amino acids. A codon can take 4^3 = 64 possible values, 61 of which correspond to an amino acid. The remaining three (UAA, UAG, UGA) code for chain termination. This is known as the ``universal code'' for nucleotide translation, and it is almost universal. A few less common codes do exist, particularly in mitochondria.
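The triplet rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `CODON_TABLE` and `translate`, and the use of one-letter amino acid symbols with `*` for termination, are conventions chosen here and do not come from the text.

```python
from itertools import product

# Standard genetic code over the RNA alphabet, codons enumerated in the
# order U, C, A, G. One-letter amino acid symbols; '*' marks termination.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Read non-overlapping codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
two_residues = translate("AUGGAG")  # six bases give two amino acids (Met, Glu)
```

Here `sense_codons` holds the 61 codons that specify an amino acid and `stop_codons` the three that terminate the chain.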
Note that for a given sequence of bases, there are three possible ways to divide it into a sequence of trinucleotides, obtained by shifting the boundaries between codons by one base. These are called reading frames; the frame that corresponds to how the sequence is actually read is called an open reading frame.
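As a small sketch of the three reading frames, the following Python function splits a sequence into codons at each of the three possible offsets. The function name `reading_frames` and the example sequence are illustrative assumptions.

```python
def reading_frames(seq):
    """Return the list of complete codons for each frame offset (0, 1, 2)."""
    frames = []
    for offset in range(3):
        # Step through the sequence three bases at a time, dropping any
        # incomplete codon left over at the end.
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append(codons)
    return frames


frames = reading_frames("AUGGAGUAA")
```

For this example, offset 0 yields AUG, GAG, UAA, while the other two offsets yield entirely different codons from the same bases.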
The main molecules involved in translation are the mRNA, the transfer RNAs (tRNAs), and the ribosome.
Figure 2: A diagram of transfer RNA. The RNA forms a 3-looped structure, and on one
of the loops is the ``anticodon'', which binds to the codon. tRNAs associate
specific amino acids with specific codons, according to the genetic code. The
ribosome is the complex of RNA and protein that facilitates this translation from codons to amino acids.
Note that the mapping from the 61 amino acid codons to the 20 amino acids used in proteins is many-to-one: every such codon specifies exactly one amino acid, but most amino acids are represented by several codons (called synonyms). What is the reason for this degeneracy? One explanation is that this type of code is reasonably resistant to mutations. Often, substituting one amino acid for another won't have a terribly serious effect on the functionality of the translated protein. However, a mutation of an amino acid codon to a terminating codon would probably render the translated protein useless, so a code with one codon for each of the twenty amino acids and 44 terminating codons would be quite sensitive to mutations.
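The degeneracy argument can be made concrete with a short Python sketch that groups codons by amino acid and classifies every single-base substitution of an amino acid codon. The names `synonyms` and `classify_point_mutations` and the labels "synonymous", "missense" and "nonsense" are choices made for this illustration, and all substitutions are treated as equally likely, which is a simplifying assumption.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group the amino acid codons by the amino acid they specify.
synonyms = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        synonyms[aa].append(codon)


def classify_point_mutations(table):
    """Count single-base changes of amino acid codons by their effect."""
    counts = Counter()
    for codon, aa in table.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = table[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1   # same amino acid
                elif new_aa == "*":
                    counts["nonsense"] += 1     # becomes a stop codon
                else:
                    counts["missense"] += 1     # different amino acid
    return counts


mutation_counts = classify_point_mutations(CODON_TABLE)
```

Under the standard code, only 23 of the 549 possible single-base changes to an amino acid codon produce a stop codon, which is the kind of robustness the paragraph above describes.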
Nevertheless, single nucleotide mutations can still have serious effects.
Take human beta-globin for example, a protein of roughly a hundred
and fifty amino acids. A single A-to-T mutation in one codon
replaces Glu (coded here by GAG) with Val (GTG), producing a form of
haemoglobin that causes the disease sickle-cell anemia.
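The substitution described above can be checked against the standard code with a few lines of Python. The table is written over the DNA alphabet to match the codons quoted in the text; the helper `point_mutate` is a hypothetical name used only for this sketch.

```python
from itertools import product

# Standard genetic code over the DNA alphabet (T in place of U).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def point_mutate(codon, pos, base):
    """Return the codon with the base at index pos replaced."""
    return codon[:pos] + base + codon[pos + 1:]


normal = "GAG"                          # codes for Glu (E)
mutant = point_mutate(normal, 1, "T")   # A -> T at the middle position
before, after = CODON_TABLE[normal], CODON_TABLE[mutant]
```

The middle-base change turns GAG into GTG, and the table lookup changes from E (Glu) to V (Val).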
Another possible reason for the degeneracy of the code is that it allows the same proteins to be encoded under different base compositions. Organisms differ in their G + C content. For example, organisms in hotter environments tend to have more G + C in their DNA: a G-C base pair is held together by three hydrogen bonds versus two for an A-T pair, so a higher G + C content raises the temperature at which the DNA denatures. Such organisms can therefore use different trinucleotides to code for the same amino acids.
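To illustrate how synonymous codons let the same protein be written at different base compositions, here is a minimal Python sketch. The two nine-base sequences are invented for the example (they are not taken from any real organism), and `gc_content` and `translate` are names chosen here.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def translate(dna):
    """Translate complete codons starting at the first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Two made-up sequences that encode the same peptide (Leu, Arg, Lys)
# using A/T-rich versus G/C-rich synonymous codons.
low_gc = "TTAAGAAAA"
high_gc = "CTGCGCAAG"
same_protein = translate(low_gc) == translate(high_gc)
```

Both sequences translate to the same three residues, yet `gc_content(high_gc)` is several times larger than `gc_content(low_gc)`.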
As mentioned above, three of the 64 possible codons code for termination of
the amino acid sequence. These codons are read by proteins called
release factors instead of by tRNAs, and when the ribosome reaches such a
stop codon the release factor is incorporated and terminates the
translation. The start signal is not as
straightforward. One of the 64 codons (AUG) codes for the start, but it is also
the only codon that codes for the amino acid Met within a protein. So
the signal for starting translation involves more than just looking
for AUG. In prokaryotes the start codon is often preceded, about 10 nucleotides
upstream, by a purine-rich region (a high concentration of A and G), whereas in
eukaryotes the start codon is often just the first AUG encountered after the
5' end of the mRNA.
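The two start-signal heuristics described above can be sketched in Python as follows. This is a rough illustration only: the function names, the window length of 6 bases, the distance of 10 bases and the purine threshold of 0.8 are all assumptions chosen for the example, and real start-site recognition is more involved than either rule.

```python
def first_aug(mrna):
    """Eukaryote-style rule: index of the first AUG from the 5' end, or -1."""
    return mrna.find("AUG")


def purine_fraction(region):
    """Fraction of bases in region that are purines (A or G)."""
    return sum(base in "AG" for base in region) / len(region)


def purine_rich_starts(mrna, distance=10, window=6, threshold=0.8):
    """Prokaryote-style rule: AUGs with a purine-rich window upstream.

    The window starts `distance` bases before the AUG and is `window`
    bases long; both values and `threshold` are illustrative guesses.
    """
    hits = []
    for i in range(distance, len(mrna) - 2):
        if mrna[i:i + 3] != "AUG":
            continue
        region = mrna[i - distance:i - distance + window]
        if purine_fraction(region) >= threshold:
            hits.append(i)
    return hits


with_signal = "CCAGGAGGUUUUAUGGAGUAA"   # made-up mRNA with AGGAGG upstream
without_signal = "CCCUCUCUCUUUAUGGAG"   # made-up mRNA, pyrimidine-rich upstream
```

With these toy sequences, both contain an AUG at the same position, but only the first has a purine-rich stretch upstream of it.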