Even further afield, historical linguists think that the root is related to other words suggesting birth, such as native (=indigenous), nativity, natal, nation, and nature. Not to mention, pregnant.
The point here is that the word "genetics" has many rich associations with the means by which life has come to be. And that is precisely what the subject is about. We have the additional scientific terms ontogeny (how individual creatures develop) and phylogeny (how the tree of life itself developed). The word could not be more apt, because molecular genetics is absolutely fundamental to understanding both these processes.
The original scientific meaning of the word gene was some (at the time unknown) unit representing genetic information. It was the something in the hereditary material of peas which (Mendel presumed) made them wrinkled or smooth. It is the something in our makeup (we presume) that gives us blue eyes or brown.
But that meaning, we now know, is a vast oversimplification. The full truth is much more complicated, but also much more amazing. Unfortunately, we too often encounter a misuse of the outdated concept. There are certainly genetic factors in personality and intelligence, for example. But it isn't just one gene. Traits of this sort are undoubtedly influenced by dozens or even hundreds of genes, to say nothing of environmental influences.
The term "molecular genetics" refers to our modern, rigorous science of genes, "the genome", DNA, etc. The subject has received a lot of attention since it was announced in 2000 that the "sequencing" of the human genome was (almost) complete. Does that mean there are no more important open questions left in this area?
Hardly.
More than a year after the announcement, there were still disagreements over some pretty basic details. Are there only 30,000 human genes? Or more like twice that? Or somewhere in between? How is it that we can know the precise order of (99% of) the 3 billion base pairs in the human genome, and yet not know what all the genes are?
The answer is simple: knowing the order of all the base pairs is a good start, but it's only a start. It simply brings us to the point where we can reasonably ask (about any given genome, not necessarily that of humans) three fundamental questions:
The third constituent is one of a set of 5 "bases", each of which is a relatively simple molecule consisting of carbon, nitrogen, oxygen, and hydrogen. The 5 bases are adenine, guanine, cytosine, thymine, and uracil, usually abbreviated A, G, C, T, U respectively. Thymine occurs only in DNA, while uracil occurs only in RNA, but chemically they are very similar. In DNA and RNA one base is attached to each sugar molecule, like charms on a bracelet.
Each base has a tendency to pair off with a specific one of the others, by means of "hydrogen bonds". G always pairs with C. A pairs with either T or U. There is one more important difference between RNA and DNA. RNA usually occurs as a single strand. In DNA, however, two strands usually occur together, with corresponding bases pairing together in the manner described. This usually results in the "double helix" structure, with the two strands twining around each other, their bases in the center and the sugar-phosphate backbone on the outside.
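The pairing rules above are simple enough to state as a lookup table. The following Python sketch is illustrative only: `can_pair` and `strands_pair` are hypothetical helper names, and the table covers just the standard pairs named in the text (G with C, A with T or U), ignoring the less common pairings real nucleic acids can form.

```python
# Standard (Watson-Crick) partners described above: G pairs with C, and
# A pairs with T (in DNA) or with U (in RNA).
PAIRS = {
    ("G", "C"), ("C", "G"),
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
}


def can_pair(x: str, y: str) -> bool:
    """Return True if bases x and y form one of the standard pairs."""
    return (x.upper(), y.upper()) in PAIRS


def strands_pair(top: str, bottom: str) -> bool:
    """Check that two equal-length strands, written position by position
    opposite each other, pair at every position (direction is ignored)."""
    return len(top) == len(bottom) and all(
        can_pair(a, b) for a, b in zip(top, bottom)
    )


print(can_pair("G", "C"))                  # True
print(can_pair("A", "U"))                  # True
print(can_pair("G", "T"))                  # False
print(strands_pair("GATTACA", "CTAATGT"))  # True
```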
Given all this, the process of making a protein from a gene is easy enough to describe at a high level. In the first step, an enzyme (i. e. a special type of protein) called RNA polymerase (because it builds RNA polymers) does the work. The RNA polymerase attaches to a strand of DNA at a particular marker sequence which indicates the start of a gene. It then moves down the strand, building an RNA molecule as it goes along. For each nucleotide on the DNA strand, the RNA polymerase adds a new unit to the growing chain using a nucleotide which contains the complementary base, selecting nucleotides from the surrounding aqueous medium. This "transcription" process stops when the polymerase encounters an appropriate base-sequence marker at the end of the gene.
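The base-by-base copying step can be sketched in a few lines of Python. This is a toy model under stated assumptions: the function name `transcribe` is hypothetical, the DNA template is given as a plain string written 5' to 3' (the usual convention for writing sequences), and the start and stop markers, along with everything the polymerase does physically, are ignored. Because the polymerase walks along the template from its 3' end toward its 5' end, the sketch visits the template string in reverse.

```python
# RNA base that pairs with each DNA template base. RNA uses U where DNA
# would use T, so a template A calls for a U in the RNA.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template strand
    (also written 5'->3').

    The template is read from its 3' end toward its 5' end, adding one
    complementary RNA nucleotide per template base.
    """
    return "".join(
        DNA_TO_RNA_COMPLEMENT[base] for base in reversed(template_5to3.upper())
    )


print(transcribe("TTAGGCCAT"))  # AUGGCCUAA
```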
The resulting RNA is called "messenger RNA" (mRNA). Once completed, it drifts away from the DNA to another part of the cell (outside the cell nucleus if there is one). Eventually the mRNA encounters an important cellular device called a ribosome. The ribosome is a complex of many proteins and another type of RNA molecule (ribosomal RNA or rRNA). It is the machine that "translates" mRNA into proteins. Basically what the ribosome does is to read the mRNA sequence just as the RNA polymerase read the original DNA, and to build up proteins from the given sequence information.
At this point, one important fact comes in. Proteins are polymers of another sort of small molecule called an amino acid. A large number of amino acids are known, but only 20 are the standard building blocks of proteins. However, there are only 4 different bases in RNA, so there can't be a simple 1:1 correspondence between bases and amino acids. Pairs of bases would not suffice either, since they give only 4 × 4 = 16 combinations. But if you take any string of 3 bases together, that yields 4 × 4 × 4 = 64 possible sequences, which is more than enough to specify any needed amino acid. That is exactly what happens, and the resulting mapping from base triplets to amino acids is called the "genetic code". It's a four-letter alphabet that forms 64 possible 3-letter words. Each word specifies a particular amino acid (and in most cases a particular amino acid can be specified by more than one word). A few words are reserved to specify the end of a coding sequence, like the period at the end of a sentence. It is a remarkable fact that this code is universal to all forms of life (with only a few almost trivial exceptions). This is very strong evidence for a single common ancestor of all presently existing organisms.
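The counting argument and the overall shape of the code can be checked directly. The sketch below assumes the standard genetic code, written in the compact 64-letter form often used in bioinformatics: codons are listed in U, C, A, G order for the first, second and third positions, and `*` marks a stop codon. The variable names are illustrative, and the few variant codes mentioned above (in mitochondria, for instance) are not modeled.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# How many distinct "words" a 4-letter alphabet can spell at each length:
# 4, 16, 64. Only length 3 exceeds the 20 amino acids.
for length in (1, 2, 3):
    print(length, len(BASES) ** length)

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
code = dict(zip(codons, AMINO_ACIDS))
counts = Counter(code.values())

print(len(code))                                # 64 codons
print(counts["*"])                              # 3 stop codons
print(len([aa for aa in counts if aa != "*"]))  # 20 amino acids
print(counts["L"], counts["M"])                 # 6 1 (leucine vs. methionine)
```

The last line shows the redundancy the text mentions: leucine can be spelled six different ways, methionine only one.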
The ribosome, then, simply employs this genetic code to construct proteins one amino acid at a time, given the sequence information in the mRNA. The code itself is actually not built into the ribosome, but rather resides in yet another type of RNA called "transfer RNA" (tRNA). Each tRNA is a relatively short molecule which has an appropriate 3-base sequence at one end and a docking location at the other which attaches only one specific amino acid. The tRNA molecules with attached amino acids are found floating around in the ambient medium, and fundamentally all that the ribosome does is find the right one to use at any particular point in its reading of the mRNA.
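The ribosome's reading step can likewise be caricatured in code. This Python sketch is a simplification, not a model of the real machinery: it assumes translation begins at the first AUG (the usual start codon), it looks each codon up in the standard genetic code in place of finding the matching tRNA, and it stops at the first stop codon. The function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code in compact UCAG order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Read an mRNA three bases at a time from its first AUG, adding one
    amino acid (one-letter code) per codon until a stop codon is reached.
    The table lookup stands in for finding the tRNA with the right amino acid."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":  # the "period at the end of the sentence"
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUGGUAAGC"))  # MAFW
```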
There are a number of important details omitted from this description, but at a high level, there isn't anything more to the process of building proteins from a DNA template than the mechanical process of matching one sort of thing with the appropriate other sort of thing.
Note also how three different types of RNA have played crucial roles: mRNA acts as a copy of the original (DNA) template. rRNA makes up a key functional part of the ribosome. And tRNA implements the genetic code. The several active roles that RNA plays in this process have led to speculation that RNA was one of the original molecules of life, and that it was somehow, at an early stage, the key chemical player in a primordial "RNA world". This is one of the principal scenarios proposed for the origins of life. It seems easier to imagine that different types of RNA working together eventually managed to "invent" proteins -- which today actually make up the material of living things as we know them -- than to suppose that proteins somehow came first and "invented" RNA.
But we're certainly not sure what happened. Proteins definitely play crucial roles at every stage of copying RNA and DNA. It is possible that proteins somehow came about first and eventually managed to encode their own blueprints in a crude primal form of RNA. In other words, at some point in time, there may have been relatively simple proteins which were able to "read" the sequence of other proteins and record it as RNA. This is doubtful mainly because we don't know of any such process going on today. Because of its linear nature, RNA (or DNA) is much easier to "read" than proteins are, making it much more convenient for information storage and retrieval.
The subject we're looking at here is sometimes called "gene regulation": How does the cell chemistry distinguish genes from all the rest of the DNA in the first place? And how does it decide whether to actually transcribe the gene into mRNA, i. e., whether to "express" the gene? One key fact we must allow for in a more detailed examination of gene regulation is that not all cells do it in the same way. In other words, unlike the genetic code itself, gene regulation is not the same in all life forms. Hence it must have developed gradually during the evolutionary process. (Yet most of the details do seem to have appeared early enough to occur in most life forms other than bacteria.)
Very broadly speaking, there are two types of cells. The first, called prokaryotic cells or prokaryotes, are very small, simple, and presumably primitive cells -- with bacteria being the prime example. The second type, called eukaryotic cells or eukaryotes, are larger, more complex, and presumably a later evolutionary development. The cells in almost all types of life other than bacteria -- i. e. protists, fungi, plants, and animals -- are eukaryotic. Because it's more complex and interesting, we'll focus on gene regulation in eukaryotes, but point out how it differs in prokaryotes. (One other type of cell, found in organisms known as archaea, is somewhat intermediate in this regard, but we'll leave it out, for simplicity.)
There's also one key fact we need to note about nucleic acids, either RNA or DNA. That is, they are not just symmetrical strings of nucleotides in which there is no sense of "forward" or "backward". On the contrary, there is a definite directionality. It arises because the orientation of the sugar units in each nucleotide of the chain is important. Each sugar is conventionally said to have a 3' end and a 5' end (the numbers referring to numbering of the 5 carbon atoms in the sugar molecule). The 5' position is where the phosphate group is attached in a single nucleotide. (The base of a nucleotide is attached at the 1' position.) In the process of polymerization, one nucleotide is attached to another by establishing a bond between the phosphate on one nucleotide and the 3' position of the sugar of another nucleotide. As a result, a string of nucleotides itself has a 3' end and a 5' end, and each additional nucleotide can be added only at the 3' end.
What is the practical significance of this? It is simply that polymerase enzymes which construct DNA or RNA from an existing string of nucleotides must build the new string in the 5' to 3' direction, since the 3' end is the only one that can "grow". Since the new string pairs with the existing one in the opposite orientation, it follows that RNA polymerase "reads" a nucleotide string only in the 3' to 5' direction. That is, RNA polymerase looks for the "next" nucleotide to be transcribed at the 5' end of the "current" nucleotide. (DNA polymerase, which replicates an existing strand of DNA, also reads in the 3' to 5' direction.)
Somewhat confusingly, by convention, nucleotide sequences in single strands of DNA and RNA are written in the 5' to 3' direction, which is the order in which they are assembled, but the opposite of the way they are read by polymerase. One more fact is that in double-stranded DNA, the two strands run in opposite directions, so in this form of DNA, the choice of direction is again arbitrary -- how would one choose which strand defines it? One final bit of terminology: the 5' to 3' direction of a single strand is said to be "downstream", while the opposite 3' to 5' direction is said to be "upstream".
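The antiparallel arrangement is easy to get wrong when writing sequences down, so a small sketch may help. Assuming both strands are written in the conventional 5' to 3' order, the partner of a strand is its reverse complement: complement every base, then reverse the result. The helper name `reverse_complement` is our own choice for illustration.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand_5to3: str) -> str:
    """Return the partner of a DNA strand, both written 5'->3'.

    The strands are antiparallel, so the partner's 5' end lies opposite
    the given strand's 3' end: complement each base, then reverse.
    """
    return "".join(DNA_COMPLEMENT[b] for b in reversed(strand_5to3.upper()))


top = "ATGGCCTAA"
bottom = reverse_complement(top)
print("5'-" + top + "-3'")
print("3'-" + bottom[::-1] + "-5'")       # partner shown aligned underneath
print(bottom)                             # TTAGGCCAT: same strand, written 5'->3'
print(reverse_complement(bottom) == top)  # True: either strand determines the other
```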
We now have enough terminology to talk about how gene transcription (the copying of nucleotide sequences from DNA to mRNA) and regulation occurs. Most importantly, there are specific short sequences which are markers -- usually called "promoters" -- of the approximate start of a gene. Many promoters include a short element beginning with the sequence TATA (typically something like TATAAA), which is accordingly called a TATA box, although not every promoter has one. The RNA polymerase enzyme attaches to the promoter sequence in a specific direction -- "downstream", so that it is oriented, on the strand containing the promoter, in the 5' to 3' direction relative to the promoter. (Hence the promoter occurs upstream of the gene itself.)
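Scanning a sequence for a marker like TATA is, at its simplest, a string search. The Python sketch below uses a hypothetical helper, `find_motif`, and made-up example sequences; it reports every position where the motif occurs, overlaps included. It is only a caricature of promoter recognition: real promoters are recognized by proteins using far more context than four bases, and in random sequence TATA would turn up by chance about once every 4^4 = 256 positions on average.

```python
def find_motif(sequence: str, motif: str = "TATA") -> list[int]:
    """Return every 0-based start position of motif in sequence
    (both written 5'->3'), including overlapping occurrences."""
    sequence, motif = sequence.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == motif]


print(find_motif("GCGCTATAAAAGGCCGCG"))  # [4]
print(find_motif("TATATA"))              # [0, 2]
```

Positions are counted from the 5' end, so anything at a larger index than a hit lies downstream of it.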
In eukaryotes, however, there is a special protein, one of a class called "transcription factors", which first attaches to the marker region. Additional transcription factors then attach to the first, and only when enough of the "right" transcription factors are present can the RNA polymerase attach and begin working. The transcription factors are one of the main ways in which gene expression is regulated. Since transcription factors are proteins, which are manufactured under the direction of other genes, this is how one gene can affect the expression of another. Indeed, there can be a whole sequence of such relationships between genes. In this way, a set of genes may become turned on in sequence, altering the characteristics of the cell at each stage. This seems to be the basic "secret" of how specialized cells arise from more generalized types of cells in the embryonic development process of multicellular organisms (which are almost always composed of eukaryotic cells).
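The idea of one gene's product switching on the next can be illustrated with a toy simulation. Everything here is hypothetical: the gene names, the rule that a gene is transcribed only when all of its required transcription factors are present, and the simplification that proteins, once made, never degrade. It is meant only to show how a chain of such dependencies turns genes on in sequence.

```python
# Hypothetical cascade: gene -> transcription factors it needs before
# RNA polymerase can start on it. A gene's protein product, once made,
# becomes available as a transcription factor for other genes.
REQUIRES = {
    "geneA": set(),               # needs nothing extra in this toy model
    "geneB": {"geneA"},           # needs the product of geneA
    "geneC": {"geneA", "geneB"},  # needs the products of A and B together
    "geneD": {"geneX"},           # geneX is never made, so D stays off
}


def run_cascade(requires: dict[str, set[str]], steps: int) -> list[set[str]]:
    """Return the set of genes expressed at each step."""
    present: set[str] = set()  # protein products currently in the cell
    history = []
    for _ in range(steps):
        expressed = {gene for gene, needs in requires.items() if needs <= present}
        present |= expressed   # newly made proteins join the pool
        history.append(expressed)
    return history


for step, on in enumerate(run_cascade(REQUIRES, 3), start=1):
    print(step, sorted(on))
# 1 ['geneA']
# 2 ['geneA', 'geneB']
# 3 ['geneA', 'geneB', 'geneC']
```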
If you've followed this discussion closely, one detail may be bothering you. We said that RNA polymerase reads in the 3' to 5' direction. Yet it is oriented in the 5' to 3' direction relative to the marker where it attaches. The resolution of this apparent contradiction depends on the fact that DNA contains two strands oriented in opposite directions. The marker region may occur on either strand (when read in the appropriate sense). But the polymerase actually reads the strand opposite the one containing the marker. Because that opposite strand runs in the "correct" (3' to 5') direction to be read by the polymerase, everything works out just right. The net result is that either strand may be the one which is actually transcribed, but there is no ambiguity, because the location and orientation of the marker determine which one it is.
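The strand-selection logic can be sketched as follows. Two assumptions deserve flagging. First, the marker used here, TATAAA, is a stand-in for a real promoter: a bare TATA could not signal a direction on its own, because its reverse complement is also TATA. Second, the function `find_transcripts` and its fixed `length` parameter are hypothetical. The point is only that wherever the marker occurs, the polymerase copies the opposite strand, so the RNA ends up spelling the marker-bearing strand's downstream bases (with U in place of T).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(strand))


def find_transcripts(top: str, marker: str = "TATAAA",
                     length: int = 6) -> list[tuple[str, str]]:
    """Toy model: look for the marker on both strands (each written 5'->3').
    Wherever it occurs, copy `length` bases just downstream of it, using
    the opposite strand as the template."""
    found = []
    for name, strand in (("top", top), ("bottom", reverse_complement(top))):
        pos = strand.find(marker)
        if pos == -1:
            continue
        begin = pos + len(marker)
        downstream = strand[begin:begin + length]
        # The template is the opposite strand, paired base by base...
        template = "".join(COMPLEMENT[b] for b in downstream)
        # ...and the RNA pairs with the template (U in place of T).
        rna = "".join("U" if COMPLEMENT[b] == "T" else COMPLEMENT[b]
                      for b in template)
        found.append((name, rna))
    return found


print(find_transcripts("GGTATAAAATGGCCGG"))  # [('top', 'AUGGCC')]
print(find_transcripts("CCGGCCATTTTATACC"))  # [('bottom', 'AUGGCC')]
```

The second call uses the same double-stranded DNA written from the other strand, and the same RNA comes out: the marker, not the way we happened to write the sequence, decides which strand is transcribed.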
Is this perhaps a reason why DNA is double-stranded? Not necessarily. There are other clear advantages of the double-strandedness. Redundancy may be the main one: Any damage that affects one strand may be detected and even corrected by various mechanisms which use the information from the other (hopefully undamaged) strand. (This is in addition to the redundancy which occurs because genes are duplicated when they are found on paired chromosomes.) The double-stranded architecture of DNA has a variety of ramifications, and it's impressive how it all fits together.
We must note at this point that prokaryotic cells lack the elaborate assemblies of transcription factors found in eukaryotes, so things are a little simpler. In this case, the RNA polymerase is ready to begin working as soon as it attaches to the promoter. (Even so, a protein subunit called a "sigma factor" helps the polymerase recognize the promoter.) There are, however, proteins -- called "repressors" -- which can bind to the DNA at or near the promoter and prevent polymerase from transcribing the gene. This is called "negative regulation". But a repressor can itself be switched off: a molecule called an "inducer" (often a small molecule, such as a sugar the cell can use) can bind to a repressor and make it let go of the DNA, thus enabling transcription. There are also "activator" proteins that help polymerase bind to a promoter; this is an example of "positive regulation". The genes that specify such repressors and activators are thus able to affect the expression of other genes. This mechanism seems best suited to allow prokaryotic cells (which are usually single-cell organisms like bacteria) to change behavior depending on their environment.
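The repressor-and-inducer switch amounts to a small piece of Boolean logic. The sketch below, loosely in the spirit of the classic lac operon of E. coli, is an idealization with a hypothetical function name; real regulation is graded rather than strictly on or off, and other inputs are ignored.

```python
def is_transcribed(repressor_present: bool, inducer_present: bool) -> bool:
    """Toy on/off model of repressor-based control of one gene."""
    # The repressor stays on the DNA unless an inducer has pulled it off.
    repressor_bound = repressor_present and not inducer_present
    # RNA polymerase can proceed only when no repressor is blocking it.
    return not repressor_bound


for repressor in (False, True):
    for inducer in (False, True):
        print(f"repressor={repressor}, inducer={inducer} -> "
              f"transcribed={is_transcribed(repressor, inducer)}")
```

Only one of the four combinations, repressor present with no inducer, keeps the gene silent.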
Something similar seems to happen in eukaryotes. But in this case there may be multiple regions of the DNA, not necessarily upstream of or close to the affected gene, to which proteins may attach and either facilitate or inhibit transcription of the gene. Such regions of DNA do not code for proteins. They are simply noncoding regions called enhancers or silencers, depending on their effect. The proteins which bind to such regions to make them effective are called activators and repressors, respectively. Here again, the genes which code for the activators and repressors are able to affect the expression of other genes, in a potentially multiple-step chain.
There is a totally different mechanism of gene regulation which works in eukaryotes but is entirely absent from prokaryotes. To discuss it, we have to know a few more facts about eukaryotic DNA. Unlike prokaryotic DNA, it does not just occur "loose" within the cell. Instead, the main cellular DNA (as opposed to that found in mitochondria) is contained in the chromosomes.
Since eukaryotic cells are complex (and usually part of even more complex organisms), their DNA needs to contain a lot of nucleotides (although much of it seems to be "junk" of no apparent function). The roughly 3 billion base pairs in one set of human chromosomes would stretch out about 1 meter if laid out straight, and since most human cells carry two sets, that comes to about 2 meters of DNA per cell. Yet this needs to be packed within a cell nucleus that's only a few millionths of a meter in diameter. And all without getting impossibly tangled up!
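The 2-meter figure is easy to check with a little arithmetic. This sketch assumes about 0.34 nanometers of length per base pair (the usual value for the common B form of DNA) and a nucleus about 6 micrometers across (a typical size, chosen here for illustration). On those assumptions, one set of 3 billion base pairs comes to roughly 1 meter, and the two sets in most human cells to roughly 2 meters.

```python
BASE_PAIRS_PER_SET = 3.0e9  # ~3 billion base pairs in one set of human DNA
RISE_PER_BP_M = 0.34e-9     # ~0.34 nm of length per base pair (B-form DNA)
NUCLEUS_DIAMETER_M = 6e-6   # assumed typical nucleus, ~6 micrometres across

one_set_m = BASE_PAIRS_PER_SET * RISE_PER_BP_M  # about 1.02 m
both_sets_m = 2 * one_set_m                     # about 2.04 m (two sets per cell)
compaction = both_sets_m / NUCLEUS_DIAMETER_M   # DNA length vs. nucleus width

print(f"one set: {one_set_m:.2f} m, both sets: {both_sets_m:.2f} m")
print(f"DNA length is about {compaction:,.0f} times the nucleus diameter")
```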
There is a systematic way this is accomplished. The DNA is wound around small proteins called histones: a stretch of about 147 base pairs wraps roughly twice (about 1.7 turns) around a core of eight histone proteins, forming what is called a nucleosome. The DNA then continues on to the next histone core, and so on. The histones, which are positively charged, neutralize negative charges on the DNA sugar-phosphate backbone and allow it to be stored quite compactly. The entire string of nucleosomes, DNA and protein together, is called chromatin, which is the "stuff" of the chromosomes.
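A rough count of nucleosomes follows from the genome size. This sketch assumes about 147 base pairs wrapped per histone core plus some linker DNA between cores, taking a repeat length of about 200 base pairs as an illustrative average; the real spacing varies between cell types, so the result is an order-of-magnitude estimate only.

```python
GENOME_BP = 3.0e9  # one set of human DNA, as in the text
REPEAT_BP = 200    # assumed average: ~147 bp wrapped per core plus linker DNA

nucleosomes_per_set = GENOME_BP / REPEAT_BP
nucleosomes_per_cell = 2 * nucleosomes_per_set  # two sets in most human cells

print(f"about {nucleosomes_per_set:,.0f} nucleosomes per set of chromosomes")
print(f"about {nucleosomes_per_cell:,.0f} nucleosomes per cell")
```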
As you might suspect, it is not as easy for proteins to get access to a segment of DNA when it's bound up this way as it would be if the DNA were simply floating around naked. Although we are just beginning to understand what's happening here, it appears that chemical markers attached to the histones affect how proteins such as RNA polymerase, transcription factors, activators, and repressors interact with DNA. From our description of the transcription process, it's easy to see that anything which interferes with the interaction between DNA and these proteins is going to affect gene expression.
For example, the attachment of acetyl groups to histone proteins (a process called acetylation) may play a role in turning genes on. Likewise, the attachment of methyl groups to DNA itself ("methylation") seems to have the effect of turning genes off. (Or perhaps, more exactly, it prevents genes that are turned off from being turned back on.) In the process known as "imprinting", this seems (in a few cases) to allow for disabling genes which come specifically from either the mother or father.
Researchers are now speculating that there may be such a thing as a "histone code", analogous to the genetic code of DNA, that governs gene regulation at a high level. In other words, the particular combination of chemical markers (such as acetyl and methyl groups) attached to the histones at a specific place may be meaningful in gene regulation.
Copyright © 2002-04 by Charles Daney, All Rights Reserved