Essentials of molecular and cellular biology

This chapter progresses through the essentials of molecular and cellular biology, from the sequence whereby the information in DNA is used to synthesise RNA and then protein, to the means by which this generates the cell and extracellular structures, coordinates the interactions of cells with each other and regulates the more complex responses in a multicellular environment. In a sense, the chapter follows an evolutionary theme, as increasing levels of complexity are reached. Unfortunately, this necessitates starting with some of the most difficult material in the chapter. The reader may need to revisit the first section and should not be deterred from following the chapter through. We aim to review preclinical material with a particular emphasis on processes relevant to clinical medicine, providing selective examples of disease states. We also use the opportunity to introduce tools of investigative analysis in a manner which we hope will render them less daunting, and provide the basis for interpretation of scientific medical journals. Supplementary information useful for reference is given in Chapter 20. An accompanying figure gives an overview of the sort of processes which will be described in detail later in the chapter.


Cellular DNA (deoxyribonucleic acid) contains all of the information required to synthesise cellular and extracellular structures, and to regulate the cell's development in the environment of the whole organism. This is possible because strict and orderly pairing of bases between two nucleic acid strands provides both coding potential and the capacity to be replicated faithfully from generation to generation.

As illustrated, a single strand of DNA is a linear polymer of nucleotide units, each consisting of a pentose sugar (deoxyribose) linked via its 5 prime (5') carbon to a phosphate group, and via its 1' carbon to one of four bases. Nucleotide units polymerise in a single strand by the formation of a sugar-phosphate-sugar-phosphate backbone, in which the orientation of the sugar carbons designates the direction of the strand, 5' to 3'. The accompanying four bases can occur in any order, and comprise purines (guanine (G) and adenine (A), which consist of two rings) and pyrimidines (cytosine (C) and thymine (T), which consist of only one ring).

The critical feature of DNA is its ability to direct the sequential association of incoming nucleotide units, which polymerise running in the opposite direction to the parent strand. If these are deoxyribonucleotides, the second polymer will also be DNA and the two polymers will form a double-stranded structure through hydrogen bonding. Only two base-pairings are possible: G-C and A-T. This is because a stable structure requires three rings between the sugar-phosphate backbones, and optimal hydrogen bonding between bases on opposite strands. The double-stranded DNA unit of two nucleotides is referred to as a base pair (bp).
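The pairing rules can be made concrete with a short sketch. The function names and the example sequence below are illustrative assumptions, not part of the original text; the hydrogen bond counts are those listed in Table 1.1.

```python
# Watson-Crick pairing: G pairs with C, A pairs with T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair, keyed by the base on the parent strand.
H_BONDS = {"G": 3, "C": 3, "A": 2, "T": 2}


def complementary_strand(parent: str) -> str:
    """Return the strand that pairs with `parent`, written 5' to 3'.

    The new strand runs in the opposite direction to the parent, so the
    parent is read backwards while each base is swapped for its partner.
    """
    return "".join(PAIRS[base] for base in reversed(parent.upper()))


def hydrogen_bonds(parent: str) -> int:
    """Total hydrogen bonds holding the duplex of `parent` together."""
    return sum(H_BONDS[base] for base in parent.upper())


if __name__ == "__main__":
    strand = "ATGC"  # hypothetical 4-base example, written 5' to 3'
    print(complementary_strand(strand))  # GCAT
    print(hydrogen_bonds(strand))        # 10
```

Because each base has exactly one permitted partner, either strand is sufficient to reconstruct the other, which is the basis of faithful replication.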

Table 1.1 Principles of Watson-Crick base-pairing

Base      Base type   Number of rings  Hydrogen bonds  Base pair
Guanine   Purine      2                3               G-C
Cytosine  Pyrimidine  1                3               G-C
Adenine   Purine      2                2               A-T
Thymine   Pyrimidine  1                2               A-T

In turn, the most stable three-dimensional structure for the double-stranded DNA polymer to adopt is a helix, in which the sugar-phosphate backbones spiral around the bases, which are protected in the centre, with 10 base pairs per turn. Since this is a spiralling duplex, two different grooves are generated in the structure: a narrow minor groove spanning the base-paired strands, and a wider major groove between consecutive spirals. Base pairs create specific patterns within the major groove, allowing their recognition by DNA binding proteins without breaking the helix. The helix is disrupted when DNA is required to act as a template, as in the natural and experimental situations illustrated. DNA helicases catalyse the unwinding of DNA and separation of the hydrogen bonds between bases using energy derived from adenosine triphosphate (ATP) hydrolysis. Inherited defects in genes encoding helicases are responsible for a number of diseases characterised by premature age-related defects and malignancy, such as Werner's syndrome.

The genome

In humans, one copy of the entire double-stranded DNA content is referred to as the haploid genome and consists of approximately 3 x 10^9 base pairs in 23 separate molecules, each part of a different chromosome. However, virtually all cells are diploid, with two copies of this genome in 46 chromosomes: 22 pairs of autosomes (numbered 1 to 22, generally in order of size), and two sex chromosomes (X and X for women; X and Y for men). In addition to DNA, the chromosomes contain a chromatin scaffold which packages DNA with histones and other small chromosomal proteins. This is subjected to different orders of higher packing according to need. For example, DNA which is not required by the cell as a template at a given time is kept in a particularly inert state, tightly packaged within the chromatin scaffold of the chromosomes, as exemplified by the structure of inactive chromosomes segregating to daughter cells during cell division.

The genetic code, genes and loci

The genetic code by which DNA directs the synthesis of the protein constituents of the cell is a series of words running 5' to 3' along the linear coding strand of DNA. Each word is a three-nucleotide unit (triplet) which specifies a particular amino acid to be incorporated into the mature protein. There are 4^3 (64) different triplets: 61 specify one of the 20 amino acids. Three, TAA, TAG and TGA, are nonsense codons which do not specify an amino acid and instead terminate the growing polypeptide chain.
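The triplet arithmetic can be checked by enumeration. The following is a minimal sketch; the helper name and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
STOP_TRIPLETS = {"TAA", "TAG", "TGA"}  # the three nonsense codons

# Every ordered choice of three bases from four: 4 * 4 * 4 triplets.
ALL_TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]
SENSE_TRIPLETS = [t for t in ALL_TRIPLETS if t not in STOP_TRIPLETS]


def split_into_triplets(coding_strand: str) -> list[str]:
    """Read a coding strand 5' to 3' as consecutive three-base words.

    Trailing bases that do not fill a complete triplet are ignored.
    """
    usable = len(coding_strand) - len(coding_strand) % 3
    return [coding_strand[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(len(ALL_TRIPLETS))                 # 64
    print(len(SENSE_TRIPLETS))               # 61
    print(split_into_triplets("ATGGCCTAA"))  # ['ATG', 'GCC', 'TAA']
```

With 61 triplets available for only 20 amino acids, most amino acids are necessarily specified by more than one triplet.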

Only about 1% of human DNA is decoded into protein sequences; these discrete areas within the genome are referred to as genes. By contrast, a locus can be any area of the genome. Not all of the DNA within a gene codes for the eventual protein: sequences within the gene include coding regions (exons), non-coding regions (introns) and regulatory sequences. DNA is not decoded directly into protein, since during transcription, chromosomal DNA remains in the nucleus whereas protein synthesis requires metabolic apparatus associated with ribosomes in the cytoplasm. Instead, a mobile molecule (messenger RNA, mRNA) carries the DNA sequence from nucleus to cytoplasm.


RNA synthesis and processing

To synthesise RNA in the process of DNA transcription, the enzyme RNA polymerase II and associated enzymes distort the chromatin structure to expose the underlying DNA, unwind a section of the double-stranded DNA helix, and disrupt the hydrogen bonds between the bases. As a result, a section of DNA can serve as a template for a new polymer, based on incoming ribonucleotide triphosphate units (in which the pentose sugars are ribose with a 2' -OH group). Note that in RNA the base uracil (U) replaces thymine (T) to base-pair with adenine (A), and since the transcribed RNA has the same sequence as the coding parental DNA strand, the template strand for RNA synthesis is actually the noncoding or complementary strand.
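The relationship between the coding strand, the template strand and the transcript can be sketched as follows. The function names and the nine-base example are illustrative assumptions; all sequences are written 5' to 3'.

```python
# DNA-DNA pairing, used to derive the template strand from the coding strand.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Template DNA base -> RNA base laid down opposite it (U replaces T in RNA).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(coding: str) -> str:
    """Return the non-coding (template) strand for a coding strand."""
    return "".join(DNA_PAIRS[b] for b in reversed(coding))


def transcribe(template: str) -> str:
    """Build the RNA that is complementary and antiparallel to the template."""
    return "".join(TEMPLATE_TO_RNA[b] for b in reversed(template))


if __name__ == "__main__":
    coding = "ATGGCCTGA"                # hypothetical coding strand
    template = template_strand(coding)  # TCAGGCCAT
    mrna = transcribe(template)         # AUGGCCUGA
    print(template, mrna)
    # The transcript matches the coding strand, with U in place of T.
    print(mrna == coding.replace("T", "U"))  # True
```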

Immediately after synthesis of the primary mRNA transcript, nuclear proteins associate with the newly transcribed polymer which, as it is complementary to the parent DNA, includes introns. As illustrated in Figure 1.3, the primary RNA transcript is modified first by addition of a 5' CAP structure and a 3' polyadenine (polyA) tail, which stabilise the ends of the short single-stranded molecule to protect it against intracellular breakdown. Secondly, highly accurate splicing machinery removes introns, joins adjacent exons, and creates an exon-only complement. The components involved include small nuclear RNAs complexed to proteins (snRNPs), such as U1 and U2. These assemble in a spliceosome complex which coordinates the recognition of splice site consensus sequences demarcating the exon-intron boundaries, and catalyses the requisite biochemical reactions. Splicing usually occurs exclusively between adjacent donor and acceptor sites to excise a single intron accurately, although many genes have alternative splicing patterns, which may be exhibited in different tissues, at specific developmental stages or in response to exogenous stimuli.
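As a rough illustration of intron removal and alternative splicing, the sketch below joins exons from a made-up primary transcript. The sequence, the exon coordinates and the function name are all hypothetical; real splice sites are recognised by the spliceosome from consensus sequences rather than supplied as coordinates.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the listed exon intervals in order, discarding everything else.

    Intervals are 0-based and end-exclusive, as in Python slicing.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical primary transcript: exon, intron, exon, intron, exon.
PRE_MRNA = (
    "AUGGCC"        # exon 1   (positions 0-6)
    "GUAAGUCCAG"    # intron 1 (positions 6-16)
    "AAAUUU"        # exon 2   (positions 16-22)
    "GUGAGUUUAG"    # intron 2 (positions 22-32)
    "GGGUAA"        # exon 3   (positions 32-38)
)
EXONS = [(0, 6), (16, 22), (32, 38)]

if __name__ == "__main__":
    # Usual pattern: every intron removed, adjacent exons joined.
    print(splice(PRE_MRNA, EXONS))                 # AUGGCCAAAUUUGGGUAA
    # One alternative pattern: exon 2 is skipped.
    print(splice(PRE_MRNA, [EXONS[0], EXONS[2]]))  # AUGGCCGGGUAA
```

The same primary transcript therefore yields different mature mRNAs depending only on which exons are joined.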

The processed mRNA is then exported to the cytoplasm for translation into protein. Additional modifications may occur in specific tissues, and include mRNA editing, by which the actual coding sequence of the mRNA changes. For example, in apolipoprotein B, a C to U editing change produces a new termination codon, resulting in a truncated apoB48 protein rather than apoB100 (see p. 535). In other settings, prior to the translation process, precise signals can induce the premature decay of mRNA, through progressive shortening of the polyA tail, decapping and enzymatic cleavage.
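The effect of a single C to U change can be sketched on a toy message. The sequence and function names below are hypothetical, and the example assumes the edited codon is CAA, which becomes the stop codon UAA; it is not the actual apoB sequence.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_c_to_u(mrna: str, position: int) -> str:
    """Return a copy of `mrna` with the C at `position` (0-based) changed to U."""
    if mrna[position] != "C":
        raise ValueError("editing site must be a C")
    return mrna[:position] + "U" + mrna[position + 1:]


def codons_before_stop(mrna: str) -> int:
    """Count in-frame codons read from position 0 before the first stop codon."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


if __name__ == "__main__":
    message = "AUGGCUCAAGGAUAA"       # AUG GCU CAA GGA UAA (toy example)
    edited = edit_c_to_u(message, 6)  # CAA -> UAA
    print(codons_before_stop(message))  # 4
    print(codons_before_stop(edited))   # 2
```

The unedited message is read through to its original stop codon, whereas the edited message terminates early and would give a shorter protein.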

Regulation and initiation of DNA transcription

Transcription of individual genes is regulated and finely tuned to the requirements of the cell and its environment. Before initiating the RNA synthetic process outlined above, RNA polymerase II identifies a gene to be transcribed by the recognition of upstream DNA regulatory sequences (the promoter) to which it binds. Additional transcription factors also bind to the promoter and enhancers to increase or decrease the rate of transcription of the gene at specific times. The means by which transcription factors interact with DNA is illustrated. Structural features distinguish different families of transcription factors, such as helix-turn-helix, zinc finger, leucine zipper and helix-loop-helix motifs. An example which will be discussed in later sections is the family of homo- or heterodimers of c-Jun, c-Fos, bZIP and other proteins that interact at the common activating protein-1 (AP-1) DNA binding site.


Proteins are synthesised from mRNA in cytoplasmic RNA-protein complexes known as ribosomes. These contain synthetic enzymes, and bind small single-stranded transfer RNAs (tRNAs). Each tRNA carries one of the 20 amino acids and the complementary sequence to the corresponding triplet codon, known as the anticodon. To commence protein synthesis, an mRNA molecule binds via its 5' CAP structure to the small 40S ribosomal subunit, which scans along the mRNA for the start codon AUG. The ribosome-bound initiator tRNA^Met molecule carrying the AUG anticodon (UAC) base-pairs to the mRNA, whereupon it activates the methionine which it carries. The activated methionine can then form a peptide bond with the amino acid brought in by the next aminoacyl tRNA, before the ribosome releases the now-empty initiator tRNA. The polymerisation of the amino acids to form a peptide chain is catalysed by peptidyl transferase in the large 60S ribosomal subunit, and proceeds rapidly, with about 1000 amino acids polymerised per minute. The stop codons UAA, UAG and UGA are not recognised by tRNAs, but by other proteins termed release factors, which lead to peptidyl transferase adding a water molecule rather than an amino acid to the activated peptide bond.
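The scanning and decoding steps can be sketched in a few lines. This is a simplified model under stated assumptions: it uses the standard genetic code with one-letter amino acid symbols, starts at the first AUG encountered, and ignores initiation factors and sequence context. The function names and the example message are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with U, C, A, G at each position;
# '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Scan 5' to 3' for the first AUG, then decode triplets until a stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no initiation in this model
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: release factor step, chain terminates
        peptide.append(amino_acid)
    return "".join(peptide)


def synthesis_seconds(n_amino_acids: int, rate_per_minute: float = 1000.0) -> float:
    """Elongation time at the rate of about 1000 amino acids per minute."""
    return n_amino_acids / rate_per_minute * 60.0


if __name__ == "__main__":
    print(translate("GGAUGGCUAAAUGA"))  # MAK
    print(synthesis_seconds(300))       # 18.0
```

At the quoted rate, a 300-residue chain would take roughly 18 seconds to elongate, which is why initiation rather than elongation tends to be limiting.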

The mRNA, peptide chain and ribosomes then disassemble, leaving the ribosomes free to associate with another mRNA molecule. However, one of the signals which promotes mRNA decay is impaired translation, for example failure to initiate translation, an incorrect site of initiation, or arrest at a premature stop codon. If the mRNA does not decay, it may at any one time be associated with multiple ribosomes translating different sections of the code. The limiting event in this setting is the rate of initiation of synthesis, which depends upon the supply of ribosome-bound initiator tRNA and adequate ribosomal initiation factors.


Genes and the proteins which they encode are often designated by the same name. Common means of distinguishing the two include referring to the gene in italics (c-abl in italics for the gene; c-abl in roman type for the protein) or adding a small 'p' to designate the protein (RB: pRB). In addition, many proteins are designated by their molecular weight in kilodaltons (kDa). Sometimes the name of the same protein described in a different organism is given as a superscript (e.g. p34^CDC2).