Biology 101 · Chapter 2

DNA & the genetic code

Try this first

A cell must store the full recipe to build and run itself, then copy that recipe perfectly every time it divides — using nothing but molecules. If you had to design such a storage medium from scratch, how would you encode the information so that a copy can be checked and rebuilt with almost no errors?

Inside one of your cells, coiled into a nucleus about six micrometres across, sits roughly two metres of DNA. Laid end to end, it would be one long thread spelled in just four chemical letters (A, C, G, T), about three billion of them in order per copy of the genome, and most of your cells carry two copies. That order is the complete instruction set for building and operating you. Nothing about the chemistry is exotic; the magic is entirely in the sequence, the same way a book's magic is in the order of its letters, not the ink.

The one idea

DNA stores information as a sequence of four letters, and each letter pairs with exactly one partner — A with T, G with C. So the molecule is two complementary strands carrying the same message twice. That single fact makes it self-copying: split the strands, and each one is a perfect template to rebuild the other.

Why pairing is the whole trick

The four letters are bases: adenine, thymine, guanine, cytosine. Their shapes only fit one way — an A across from a T, a G across from a C — held by weak hydrogen bonds. Two strands zip together into the famous double helix, but the helix is just packaging. The payload is the pairing rule.

Because the rule is rigid, the second strand carries no new information — it is the first strand's mirror image. That looks wasteful until you want to copy: the cell simply unzips the two strands and lets each one attract fresh partner letters, A calling in T, G calling in C, until two identical double helices stand where one did. Redundancy is the copy mechanism, and it doubles as error-checking — a mismatched pair bulges and gets caught.
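The unzip-and-rebuild idea can be sketched in a few lines of Python. This is a toy model, not a description of the real enzymes: the names `partner`, `replicate` and `mismatches` are made up for illustration, and strands are written position by position, ignoring their direction.

```python
# Pairing rule from the text: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def partner(strand: str) -> str:
    """Build the partner strand letter by letter from the pairing rule."""
    return "".join(PAIR[base] for base in strand)

def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Unzip a duplex and grow a fresh partner on each old strand."""
    top, bottom = duplex
    return [(top, partner(top)), (partner(bottom), bottom)]

def mismatches(duplex: tuple[str, str]) -> list[int]:
    """Positions where the two strands break the pairing rule."""
    top, bottom = duplex
    return [i for i, (a, b) in enumerate(zip(top, bottom)) if PAIR[a] != b]

original = ("ATGCAT", "TACGTA")
copies = replicate(original)
print(copies)                              # two duplexes, both equal to the original
print(mismatches(("ATGCAT", "TACCTA")))    # [3]
```

Each daughter duplex keeps one old strand and gains one new one, and the mismatch check works only because the second strand is predictable from the first.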

[Figure: two strands, zipped by base pairs. One strand runs 5' to 3', the other 3' to 5'; the rungs are A=T and G≡C pairs.]
Each rung is a base pair: A always with T, G always with C. The strands run in opposite directions (antiparallel), so each is a template for the other.

From sequence to protein: the code

Storing the recipe is half the story; the cell also has to run it. The instructions in DNA mostly spell out proteins — the molecular machines that do nearly all the work. Getting from letters to a working protein takes two steps, together called the central dogma:

DNA → (transcription) → mRNA → (translation) → protein
The central dogma. DNA is copied to a portable mRNA message (transcription); a ribosome reads that message three letters at a time and links amino acids into a protein (translation).
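As a rough sketch of the first step: RNA uses the letter U where DNA uses T, so the mRNA can be read off either strand of the gene. The function names and the example sequence are invented for illustration, and strand direction is ignored to keep the sketch short.

```python
# RNA pairing: like the DNA rule, but A on the DNA template calls in U, not T.
DNA_TO_RNA = str.maketrans("ATGC", "UACG")

def transcribe_from_template(template: str) -> str:
    """Pair each template letter with its RNA partner."""
    return template.translate(DNA_TO_RNA)

def transcribe_from_coding(coding: str) -> str:
    """Shortcut: the mRNA spells the same message as the coding strand, with U for T."""
    return coding.replace("T", "U")

coding = "ATGTTTGGTTACTAA"     # hypothetical gene fragment
template = "TACAAACCAATGATT"   # its partner strand, by the pairing rule

print(transcribe_from_template(template))  # AUGUUUGGUUACUAA
print(transcribe_from_coding(coding))      # AUGUUUGGUUACUAA
```

Both routes give the same message, which is the pairing rule at work again.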

The unit of reading is the codon: the ribosome steps along the mRNA in groups of three letters, and each triplet names one amino acid to add to the growing chain. With 4 letters in groups of 3 there are 4³ = 64 possible codons but only 20 standard amino acids, so the code is redundant: several codons can mean the same amino acid, which softens the blow of a typo. A few codons are punctuation: AUG says "start" (and means methionine), and three others (UAA, UAG, UGA) mean "stop".
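The count of 64 is easy to check by brute force, and the same few lines show what "reading in groups of three" means. The helper `split_codons` and the example message are illustrative only.

```python
from itertools import product

LETTERS = "ACGU"  # the four mRNA letters

# Every ordered triplet of letters: 4 * 4 * 4 of them.
codons = ["".join(triplet) for triplet in product(LETTERS, repeat=3)]
print(len(codons))   # 64
print(codons[:4])    # ['AAA', 'AAC', 'AAG', 'AAU']

def split_codons(mrna: str) -> list[str]:
    """Cut a message into consecutive triplets, dropping any leftover letters."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

print(split_codons("AUGUUUGGUUACUAA"))  # ['AUG', 'UUU', 'GGU', 'UAC', 'UAA']
```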

A few entries from the genetic code
mRNA codon | Means
AUG | Methionine (also "start")
UUU | Phenylalanine
GGU | Glycine
UAC | Tyrosine
UAA | "stop" (end of protein)
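Those five entries are enough for a toy translator. The sketch below assumes a simplified reading rule (start at the first AUG, stop at the first stop codon); the table is deliberately partial, and the three-letter abbreviations and the example message are chosen for illustration.

```python
# Only the five entries listed above; the full code has 64.
CODON_TABLE = {
    "AUG": "Met",   # methionine, also "start"
    "UUU": "Phe",   # phenylalanine
    "GGU": "Gly",   # glycine
    "UAC": "Tyr",   # tyrosine
    "UAA": "STOP",
}

def translate(mrna: str) -> list[str]:
    """Read triplets from the first AUG until a stop codon or the end of the message."""
    start = mrna.find("AUG")
    if start == -1:
        return []                               # no start codon, nothing to build
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = not in this partial table
        if residue == "STOP":
            break
        protein.append(residue)
    return protein

print(translate("GGAUGUUUGGUUACUAAGG"))  # ['Met', 'Phe', 'Gly', 'Tyr']
```

The letters before AUG and after UAA are ignored, which is the sense in which those codons act as punctuation.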

Work one, then finish one

Worked: Suppose one DNA strand reads A-T-G-C-A-T. Apply the pairing rule (A↔T, G↔C) to each letter and you are forced into the partner strand: T-A-C-G-T-A. There's no choice and no extra information needed — that's exactly why one strand can rebuild the other during copying.

Your turn: Write the complementary strand for G-G-C-T-A-A. (Answer: C-C-G-A-T-T.)
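If you want to check your answers mechanically, here is one possible helper. It keeps the hyphens used in the examples above; `complement` and `reverse_complement` are names picked for this sketch. The second function writes the same partner strand in its own 5'-to-3' direction, since the two strands run antiparallel.

```python
# Pairing rule as a translation table: A<->T, G<->C. Hyphens pass through untouched.
PAIRING = str.maketrans("ATGC", "TACG")

def complement(strand: str) -> str:
    """Partner strand, written position by position under the original."""
    return strand.translate(PAIRING)

def reverse_complement(strand: str) -> str:
    """Same partner strand, read in its own direction (i.e. reversed)."""
    return complement(strand)[::-1]

print(complement("A-T-G-C-A-T"))          # T-A-C-G-T-A
print(complement("G-G-C-T-A-A"))          # C-C-G-A-T-T
print(reverse_complement("G-G-C-T-A-A"))  # T-T-A-G-C-C
```

Applying `complement` twice returns the strand you started with, which is another way of saying the second strand carries no new information.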

Why this earns a place in your toolkit

DNA is a genuinely digital code: four symbols, two bits each, so your entire ~3.2-billion-letter genome is only about 800 MB of raw information. That is a blueprint for a human, smaller than a movie. That's why the life sciences became an information science. Genomes are stored and searched as strings; sequence alignment uses the same kind of dynamic programming you'd use to diff two files; and complementarity is literally error-correcting redundancy. The frontier now runs both ways: "genomic language models" are trained on DNA the way LLMs are trained on text, and tools like AlphaFold read a protein's amino-acid sequence and predict the three-dimensional shape it folds into, turning the last step of biology's central dogma into a problem you can compute.
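The "two bits each" claim and the size estimate can both be checked directly. The particular bit assignment below (A=00, C=01, G=10, T=11) is an arbitrary choice for this sketch; any one-to-one mapping works. With 3.2 billion letters the arithmetic gives about 800 MB, and with a round 3 billion it gives 750 MB.

```python
# Four symbols fit in two bits; this specific assignment is an assumption.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTER = "ACGT"  # index = 2-bit value

def pack(seq: str) -> int:
    """Pack a sequence into one integer, two bits per letter."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value

def unpack(value: int, length: int) -> str:
    """Recover the letters; the length is needed because leading A's are zeros."""
    shifts = range(2 * (length - 1), -1, -2)
    return "".join(LETTER[(value >> s) & 0b11] for s in shifts)

packed = pack("ATGCAT")
print(packed.bit_length() <= 2 * 6)   # True: six letters fit in twelve bits
print(unpack(packed, 6))              # ATGCAT

def raw_megabytes(letters: int) -> float:
    """Letters * 2 bits, converted to bytes, then to decimal megabytes."""
    return letters * 2 / 8 / 1e6

print(raw_megabytes(3_200_000_000))   # 800.0
print(raw_megabytes(3_000_000_000))   # 750.0
```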

Recall check · no peeking

  1. State the base-pairing rules. Why does complementarity make DNA self-copying?
  2. What is a codon, how many possible codons are there, and why are there more codons than amino acids?
  3. Write the central dogma and name its two steps.
  4. Given one strand of DNA, how do you determine its partner strand?
  5. In what sense is each DNA letter "two bits" of information?

Explain it back

In one plain sentence, tell a friend why simply unzipping the two strands of DNA is enough to end up with two perfect copies.

Learn · Shawon Chowdhury · a study guide, kept rough on purpose