Genetic information is stored in the sequence of nucleotide bases in the DNA molecule. When this information is presented in an organism, we say that it has been expressed. The evidence of expression is the presence of a protein, made using the instructions present in the genetic code.

The first step in the expression of a protein is transcription. This is the construction of a specialised nucleic acid called messenger RNA (mRNA) based on one strand of the DNA molecule.

RNA differs from DNA in that it is single stranded (although it may double over on itself in regions), it contains the pentose ribose instead of deoxyribose, and it contains the nucleotide base uracil rather than thymine.

RNA synthesis is catalysed by enzymes called RNA polymerases (in eukaryotes, mRNA is made by RNA polymerase II). RNA polymerase binds to the promoter, a sequence of nucleotides on the DNA molecule located just before the gene to be transcribed. The promoter tells the RNA polymerase where to begin transcription, in which direction to proceed, and which strand of DNA to transcribe. Promoters bind RNA polymerase with varying strengths, which is one reason why some proteins are expressed more than others (i.e. their promoters bind more strongly).

Figure: The process of transcription.

The RNA polymerase (along with some “helper” proteins) then unwinds a small portion of the DNA duplex. Using one strand as a template, it assembles a strand of mRNA from ribonucleotides.

One strand of DNA is designated the “coding strand”, while the other is the “template strand”. mRNA is constructed using the template strand, with the enzyme moving along this strand from 3’ to 5’. Ribonucleotides are added to the 3’ end of the growing mRNA, so the mRNA itself is built from 5’ to 3’. As RNA polymerase moves along the DNA molecule, the separated strands rejoin behind it.
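The base-pairing rule behind this step can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the nine-base example sequence are made up for this page, and the template strand is assumed to be written 3' to 5' so that reading it left to right gives the mRNA 5' to 3'.

```python
# Pairing used when RNA is built on a DNA template:
# an A in the DNA pairs with U in the RNA (not T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Build the mRNA (5'->3') from a template strand written 3'->5'."""
    # The polymerase reads the template 3'->5', so walking the string
    # left to right yields the mRNA in 5'->3' order.
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

coding_5_to_3 = "ATGGCCTAA"    # hypothetical coding strand
template_3_to_5 = "TACCGGATT"  # its complementary template strand

mrna = transcribe(template_3_to_5)
print(mrna)  # AUGGCCUAA

# The mRNA matches the coding strand with every T swapped for U.
print(mrna == coding_5_to_3.replace("T", "U"))  # True
```

Note that the mRNA is never copied from the coding strand directly; it simply ends up with the same sequence because both are complementary to the template.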

When the RNA polymerase reaches a termination sequence, the enzyme drops off the DNA molecule and the mRNA molecule finds its way to a ribosome for the process of translation. In prokaryotes, where there is no nucleus, translation occurs immediately. In eukaryotes, the mRNA is first modified so that it can leave the nucleus and enter the cytoplasm. The 5’ end is modified (“capping”), a long stretch of adenosines is added to the 3’ end (the “poly-A tail”), and sections of non-coding RNA (introns) are removed and the remaining sections (exons) are joined back together.
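The three eukaryotic modifications can be pictured as simple string operations. The sketch below is a simplified illustration only: the function `process_pre_mrna`, the intron coordinates, and the example sequence are all hypothetical, and the text marker `cap-` merely stands in for the chemically modified 5’ end.

```python
def process_pre_mrna(pre_mrna: str, introns: list, tail_length: int = 20) -> str:
    """Splice out introns, then add a cap marker and a poly-A tail.

    introns is a list of (start, end) index pairs with end exclusive;
    this coordinate format is an assumption made for the example.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    spliced = "".join(exons)
    return "cap-" + spliced + "A" * tail_length

exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"  # made-up intron sequence
exon_2 = "UUUUAA"
pre_mrna = exon_1 + intron + exon_2

mature = process_pre_mrna(pre_mrna, introns=[(6, 18)], tail_length=5)
print(mature)
```

In the cell these steps are carried out by separate enzyme complexes; the code only shows what each step does to the sequence.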

The mRNA attaches to the ribosome, either through a binding sequence on the mRNA (in prokaryotes) or via the 5’ cap (in eukaryotes). The ribosome then moves down the mRNA until it finds the start sequence (AUG).

Floating around the ribosome are various tRNA (transfer RNA) molecules. Each of these has a specific sequence of three bases which is complementary to three bases in the mRNA. The sequences on the tRNA are called “anti-codons”, and they bind to their complementary “codons” on the mRNA. Each tRNA also carries the one amino acid that corresponds to its anti-codon.

e.g. The start codon is AUG. The anti-codon for this is UAC (written 3’ to 5’, so that it lines up base for base with the codon). This sequence is found on a tRNA which only binds to the amino acid methionine.
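The codon and anti-codon pairing follows the same complementarity rule as before, this time between two RNA sequences. As a small sketch (the function name `anticodon` is invented for this example):

```python
# RNA-RNA base pairing between codon and anti-codon.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon: str) -> str:
    """Return the anti-codon written 3'->5', aligned with the codon (5'->3')."""
    return "".join(RNA_PAIR[base] for base in codon.upper())

aligned = anticodon("AUG")
print(aligned)        # UAC  (3'->5', lines up with the codon)
print(aligned[::-1])  # CAU  (the same anti-codon written 5'->3')
```

Both print-outs describe the same three bases; only the direction in which they are written differs.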

There are 20 different amino acids and 64 different combinations of A, U, G and C. This allows for a certain level of redundancy (i.e. one amino acid may be coded for by more than one codon) as well as three “stop” codons. The list of codons which code for each amino acid is known as the genetic code. You can access a table showing the genetic code here.
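The numbers above can be checked by building the table in Python. The sketch below uses the standard genetic code written as a 64-letter string of one-letter amino acid symbols, with `*` marking a stop codon; the variable names are chosen for this example only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for the standard genetic code, listed in
# the order UUU, UUC, UUA, UUG, UCU, ... GGG. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
GENETIC_CODE = dict(zip(codons, AMINO_ACIDS))

stop_codons = sorted(c for c, aa in GENETIC_CODE.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in GENETIC_CODE.values() if aa != "*")

print(len(GENETIC_CODE))           # 64 codons (4 x 4 x 4)
print(stop_codons)                 # ['UAA', 'UAG', 'UGA']
print(len(codons_per_amino_acid))  # 20 amino acids
print(codons_per_amino_acid["L"])  # 6 codons for leucine
print(codons_per_amino_acid["M"])  # 1 codon for methionine (AUG)
```

The redundancy is uneven: leucine has six codons, while methionine and tryptophan have only one each.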

The ribosome holds the mRNA in place until the tRNA with the corresponding anti-codon binds onto the codon on the mRNA. Then the tRNA containing the anti-codon for the next codon binds next to it. The amino acids bound to each of these tRNA molecules are brought close together and a peptide bond formed between them by the enzyme peptidyl transferase.

The ribosome then moves down the mRNA to the next codon and the process is repeated. New amino acids are added to the C-terminus of the growing peptide chain. As each amino acid is joined, the growing chain is passed to the newest tRNA, and the previous tRNA, now empty, is released from the ribosome.

When the ribosome reaches a stop codon, no tRNA with an amino acid exists to bind, so the peptide chain separates and goes on to fold into its secondary and tertiary structures.
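Taken together, the steps of translation amount to a simple loop: find the start codon, read three bases at a time, and stop at a stop codon. The following is a minimal sketch, not a full implementation. The function `translate`, the example mRNA, and the small codon table (a hand-picked excerpt of the genetic code) are assumptions made for illustration.

```python
# Excerpt of the genetic code; a complete table has 64 entries.
# A codon missing from this excerpt would raise a KeyError.
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UUU": "Phe", "GGA": "Gly",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def translate(mrna: str) -> list:
    """Read an mRNA (5'->3') from the first AUG to the first stop codon."""
    start = mrna.find("AUG")  # the ribosome scans for the start codon
    if start == -1:
        return []             # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break             # no amino acid is added at a stop codon
        peptide.append(residue)  # new residue joins the C-terminus
    return peptide

print(translate("GGAUGGCCUUUGGAUAAGCC"))  # ['Met', 'Ala', 'Phe', 'Gly']
```

The two bases before AUG and the bases after UAA are ignored, just as the untranslated regions of a real mRNA are not turned into protein.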

It is important to understand the relationship between the DNA sequence and the sequence of amino acids in the peptide chain. The coding strand of the DNA, read from 5’ to 3’, gives the sequence of bases which codes for the protein (substituting “U”s for “T”s). In other words, the coding sequence, written 5’ to 3’, gives the amino acid sequence from N-terminus to C-terminus.

Interactive Concepts in Biochemistry provides an excellent animation explaining this process here (note: this link opens an external website).