Prerequisites: Molecular biology and genetics
See also: Proteomics
Proteins are, if anything, even more important than DNA and RNA -- which are the blueprints for making proteins. However, life as we know it is not even conceivable without both proteins and nucleic acids. The simplest forms of life -- viruses (treating them for the sake of discussion as "living") -- consist of little more than protein and nucleic acid.
Schoolchildren learn that "proteins" are an essential component of food, among others such as carbohydrates, fats, etc. Interestingly enough, however, proteins consumed as food are usually broken down by the process of digestion into their constituent amino acids. They are seldom used by the body directly. Instead, the amino acids are the genuinely important nutrients. All the proteins which a living body actually needs -- which may be as many as a million in humans -- are manufactured within cells under direction from the genes.
Just to show the diversity of functions that proteins have in living cells, here is a partial list of the principal ones:
The peptide bond forms between the carboxyl group of one amino acid and the amino group of another (expelling one water molecule in the process). Proteins may contain upwards of 1500 amino acids linked in this way. A generic name for such a chain is a "polypeptide". Shorter chains, of fewer than about 40 amino acids, are conventionally referred to as "peptides" rather than proteins. Peptides often occur as hormones and neurotransmitters. In addition, they can usually be synthesized chemically in the laboratory, whereas biologically active proteins usually can't.
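The water bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the function names and the small mass table are assumptions introduced here (approximate average molecular masses, in daltons, for five free amino acids), not part of the original discussion. Joining n amino acids forms n - 1 peptide bonds, and each bond expels one water molecule.

```python
# Approximate average molecular masses (daltons) of a few free amino acids.
# Illustrative subset only; a complete table would list all 20.
AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
    "V": 117.15,  # valine
    "L": 131.17,  # leucine
}
WATER_MASS = 18.015  # one water molecule is expelled per peptide bond


def polypeptide_mass(sequence):
    """Return (mass, peptide_bonds) for a chain given as one-letter codes."""
    if not sequence:
        return 0.0, 0
    bonds = len(sequence) - 1  # n residues are joined by n - 1 bonds
    total = sum(AMINO_ACID_MASS[aa] for aa in sequence)
    return total - bonds * WATER_MASS, bonds


def chain_kind(sequence, peptide_limit=40):
    """Apply the rough convention: fewer than ~40 residues is a 'peptide'."""
    return "peptide" if len(sequence) < peptide_limit else "protein"


if __name__ == "__main__":
    mass, bonds = polypeptide_mass("GAS")
    print(f"GAS: {bonds} peptide bonds, about {mass:.2f} Da, a {chain_kind('GAS')}")
```

The 40-residue cutoff is a parameter rather than a constant because, as noted above, it is a loose convention and not a sharp boundary.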
The side chain of an amino acid can be as simple as a single hydrogen atom, yielding "glycine". Glycine therefore has two hydrogen atoms attached to the central carbon, so its mirror image is identical to itself. Except for glycine, all amino acids can have their four parts arranged around the carbon in one of two different ways -- clockwise or counter-clockwise. This leads to the circumstance that each such amino acid can exist in two mirror-image forms, known as "stereo-isomers" or "enantiomers", which have the same chemical composition but cannot be superimposed on one another. This handedness property is known as "chirality". In solution, the two enantiomers rotate the plane of polarized light passing through in opposite directions.
But more importantly, all amino acids that occur naturally in proteins are of one type rather than the other. These types are designated as L- or D- enantiomers (from the Latin laevus and dexter, "left" and "right"). The L-form is the one that occurs naturally in proteins. The D-forms are not built into proteins by the cell's normal protein-making machinery, although a few do turn up elsewhere in nature (in bacterial cell walls, for example). When D-amino acids or polypeptides based on them are synthesized chemically, living cells generally cannot use them in place of the L-forms. It is a rather mysterious open question as to how this asymmetry happened to develop as it did in all forms of life.
Another unanswered question is why only 20 different amino acids (with a couple of rare exceptions) -- out of a large number that are chemically possible -- actually occur in proteins. Presumably these choices reflect chemical conditions which existed when protein-based life first emerged. We just don't happen to know what those conditions were.
Since any given protein is a linear sequence of specific amino acids, the protein is chemically defined by the sequence. This sequence is known as the "primary structure" of the protein. Most proteins, however, do not actually have a linear shape. They tend to fold up in very complex ways into compact forms, which is the secret of their biological activity. (A few structural proteins that are fibrous, such as the keratin in hair, are mostly linear.)
It turns out that most proteins fold up into their effective 3-dimensional shapes in just a single way. That is, the amino acid sequence usually determines the shape, and hence the function, of the protein. (And the sequence was itself determined by the sequence of nucleotides in the gene which ultimately specified the protein.)
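The step from nucleotide sequence to amino acid sequence can be illustrated with a short Python sketch. It uses the standard genetic code, written here in terms of DNA codons; the function and variable names are illustrative assumptions, and the sketch ignores real-world complications such as RNA processing and the variant codes used by some organisms.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons listed in
# TCAG order (third base varying fastest). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads codons from the first base, stops at the first stop codon,
    and ignores any incomplete codon at the end.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # ATG -> M, GCC -> A, TAA -> stop
```

The result is only the primary structure; nothing in a lookup table like this says how the chain will fold.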
The few cases where more than one shape may result from a given amino acid sequence can cause all sorts of havoc. This circumstance is exactly what gives rise to so-called "prion diseases", such as the famous mad cow disease. What happens here is that a very few proteins which can assume an alternate form are also capable of converting other proteins of the same kind to the alternate form, destroying whatever functionality they may have had.
Anyhow, there are three more levels of structure beyond the primary structure. The second level -- the "secondary structure" -- consists of the "local" shapes assumed by portions of the protein. Certain shapes occur commonly enough that they have names such as the "alpha helix" and the "beta-pleated sheet".
The way that the various components of the secondary structure fold together into some definite configuration relative to each other creates the "tertiary structure". The various bumps and pockets thus formed in the protein molecule become the main way that other molecules attach to the protein, or vice versa.
There is, lastly, a "quaternary structure". This comes about in many proteins which actually consist of two or more polypeptide chains (subunits). Each of these assumes some tertiary structure, and then chemical bonds and weaker attractions form in certain locations to hold the subunits together.
As it happens, most complete polypeptide chains will spontaneously assume just one tertiary structure determined by the amino acid sequence (the primary structure). This 3-dimensional shape generally represents the lowest energy configuration for the sequence. At least, this is what has long been thought. It now appears that many, perhaps most, proteins may have alternative tertiary structures. The issue remains somewhat murky.
In any case, whether there is just one tertiary structure or several that are possible, the shape or shapes which can result should in principle be computable knowing just the primary structure. But in our present state of computational technology, actually performing this calculation from first principles for largish proteins is enormously difficult -- mostly beyond the capabilities of our most powerful existing supercomputers, even in the "terascale" class. Finding some more effective techniques or algorithms for doing this computation is known as the protein folding problem.
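To get a feel for why a brute-force search is hopeless, here is a back-of-the-envelope sketch in Python. The numbers are assumptions chosen only for illustration: three possible conformations per amino acid, and a machine able to evaluate 10^12 conformations per second (roughly what "terascale" suggests). Real folding calculations do not simply enumerate conformations, so this estimates the size of the search space and is not a folding algorithm.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_brute_force_years(n_residues, states_per_residue=3,
                            evals_per_second=1e12):
    """Return log10 of the years needed to enumerate every conformation.

    Assumes (hypothetically) that each residue independently takes one of
    `states_per_residue` conformations, giving states ** n in total.
    Working in logarithms avoids overflow for long chains.
    """
    log_conformations = n_residues * math.log10(states_per_residue)
    return (log_conformations
            - math.log10(evals_per_second)
            - math.log10(SECONDS_PER_YEAR))


if __name__ == "__main__":
    for n in (20, 100, 1500):
        exponent = log10_brute_force_years(n)
        print(f"{n:5d} residues: about 10^{exponent:.0f} years")
```

Under these assumptions even a 100-residue chain would take far longer than the age of the universe to enumerate, which is why better techniques and algorithms, rather than merely faster machines, are what the problem calls for.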
There's one more wrinkle to this folding issue which might be noted. All the observations on the general uniqueness of the 3-dimensional structure for a particular protein assume that the polypeptide already exists as a complete entity. But of course, during the synthesis of proteins on the ribosomes of cells, each polypeptide chain is only partially complete until the very end. What prevents a chain from folding in an inappropriate way when it is, say, only half complete? The answer to this is partially known. It turns out that there are specialized proteins called, appropriately enough, "chaperones", which assist in the process to ensure that the partially completed polypeptides don't fold up the "wrong" way.
But there is another problem, even for relatively small proteins. Some simple proteins, such as blood serum albumin, really are composed of nothing but amino acids. Unfortunately, most proteins actually consist of more than just amino acids. These are called "conjugated" proteins, and they are decorated with a variety of other molecules -- like sugars, fats, or nucleic acids. Somehow, cells know how to do this in just the right way. Protein chemists don't know how to do this, yet.
Genetic engineering of bacteria theoretically makes possible the production of at least smaller proteins, by inserting the appropriate DNA into the bacterial genome. (There's a limit on how much DNA can be inserted, limiting the size of proteins that can be produced.) Unfortunately, there are glitches even here. It appears that mammalian cells (including human ones) attach different sugars to their proteins than bacteria do. (The process is called "glycosylation".) Mammalian cells may also fold the proteins differently, and they can make larger proteins than bacteria can.
The net result is that, even with genetic engineering of other organisms, it isn't at all easy to manufacture proteins. The best we've done so far is to learn how to use other mammalian cells (such as Chinese hamster ovary cells) to make proteins acceptable for humans.
Copyright © 2002 by Charles Daney, All Rights Reserved