See also: Number theory -- The Riemann hypothesis -- The Langlands program
Muhammad ben Musa al-Khwarizmi seems to have been the first person whose writing uses the term al-jebr. As he used it, the term referred to a technique for solving equations by performing operations such as addition or multiplication to both sides of the equation -- just as is taught in first-year high school algebra. al-Khwarizmi, of course, didn't use our modern notation with Roman letters for unknowns and symbols like "+", "×", and "=". Instead, he expressed everything in ordinary words, but in a way equivalent to our modern symbolism.
The word al-jebr itself is a metaphor, as the usual meaning of the word referred to the setting or straightening out of broken bones. The same metaphor exists in Latin and related languages, as in the English words "reduce" and "reduction". Although they now usually refer to making something smaller, the older meaning refers to making somethng simpler or straighter. The Latin root is the verb ducere, to lead -- hence to re-duce is to lead something back to a simpler from a more convoluted state. In elementary algebra still one talks of "reducing" fractions to lowest terms and simplifying equations.
The essence of the study of algebra, then, is solving or "reducing" equations to the simplest possible form. The emphasis is on finding and describing explicit methods for performing this simplification. Such methods are known as algorithms -- in honor of al-Khwarizmi. Different types of methods can be used. Guessing at solutions, for instance, is a method. One can often, by trying long enough, guess the exact solution of a simple equation. And if one has a guess that is close but not exact, by changing this guess a little one can get a better solution by an iterative process of successive approximation. This is a perfectly acceptable method of "solving" equations for many practical purposes -- so much so that it is the method generally used by computers (where irrational numbers can be specificed only approximately anyhow). Some approximation methods are fairly sophisticated, such as "Newton's method" for finding the roots of polynomial equations -- but they're still based essentially on guessing an initial rough answer.
Another method for finding solutions of equations is by means of geometric construction. One can construct geometric figures in which the length of a certain line segment is a solution to some given equation. This works well, for example, when square roots are needed, since the hypotenuse of a right triangle has a length which is the square root of the sum of squares of the other two sides of the triangle. That is, if the lengths of the sides are a, b, and c, then a2 + b2 = c2 and hence c = √(a2 + b2). If a and b are whole numbers, so is the sum of their squares. Algorithms for finding the approximate square root of a whole number were known, so c could be computed approximately. However, with a geometric construction, c could be found simply be measuring the length of the right line segment. For future reference, note that an interesting problem is finding two numbers a and b such that for some given number d, d = a2 + b2. This is because if d is given, finding a and b enables one to find the square root of d by a geometric construction.
In addition to such approximation and geometric methods, al-Khwarizmi was interested in methods for finding exact solutions by a sequence of steps that could be described in cookbook style -- algorithms. This can be done completely successfully for linear equations of the form ax + b = c (in modern notation) using just the arithmetic operations of addition, subtraction, multiplication, and division. Just as important, the operations can be performed symbolically -- not just on particular numbers, but on symbols that stand for "any number". And so, one seeks to express the solution of a given equation in a symbolic form.
An interesting question is that of when the idea of representing equations in symbolic form arose. It's not easy to answer such a question, in part because symbolic representations were used before their full and considerable utility was recognized. For instance, Greek geometers labeled the lines in their figures with single letters, so it was natural to write what we now recognize as the Pythagorean theorem in the form a2 + b2 = c2. But the importance of this representation was somewhat blurred, since the distinction between a line and the length of a line was not fully appreciated. In fact, although Greeks and other early mathematicians (e. g. in India) used symbolic equations, al-Khwarizmi did not. (Hence it is likely he didn't know of Greek mathematics and much of his work was original, if not always as advanced as that of the Greeks.)
In modern notation, polynomial equations can be classified in terms of the highest power of any variable which occurs in them. We call an equation linear if the highest power is one, because its graph is a straight line. If there is just one variable, such an equation has the most general form ax + b = 0. If the highest power is two, the equation is called "quadratic", and has the form ax2 + bx + c = 0. (Why does the Latin prefix quad, usually associated with the number 4, occur here? Simply because the word for "square" is quadra in Latin.) In spite of lacking a symbolic representation of equations, al-Khwarizmi effectively did know the quadratic forumula which says that there are two solutions of the last equation, that can be written as x = {-b±√(b2-4ac)}/2a. He also realized that the equation has solutions at all in terms of "real" numbers only if the quantity we now call the discriminant, b2-4ac, is not negative.
The highest power of an unknown which occurs in a given polynomial equation is known as the degree of the equation. Although al-Khwarizmi doesn't seem to have studied equations of degree 3, called cubic equations, a more famous successor, who lived about 250 years later, did: Omar Khayyam (ca. 1050-1123), a Persian. Like his predecessor, Khayyam did not work with symbolic expressions for the equations. But he was able to produce solutions using geometric constructions (involving conic sections), provided a positive solution exists. He also thought, mistakenly, that such solutions couldn't be found by algebraic (algorithmic) methods of the sort al-Khwarizmi used.
The next substantial advance in solving equations began when scholars in Western Europe began to study and appreciate the work of people like al-Khwarizmi and Khayyam. Most notable among these scholars was Leonardo of Pisa, more commonly known as Fibonacci (ca. 1180-1250). He showed that algorithmic (as opposed to geometric) methods could be used to find solutions of some cubic equations. Fibonnacci had a much more obscure contemporary, Jordanus Nemorarius, of whom little is known apart from several books attributed to him on arithmetic, mechanics, geometry, and astronomy. He made a more systematic use of letters to stand for "variable" (not necessarily unknown) quantities in equations, but the importance of this technique was still not widely appreciated
With the advent of the Renaissance, progress in mathematics began to speed up. One of the first notable names was a German, Regiomontanus (1436-76). Though he produced less original work than others, he was widely read in the classic works of both the Greeks and the Muslim world. In particular, he had studied the Arithmetic of Diophantus of Alexandria (who was active around 250 CE) in the original Greek, and even considered publishing a Latin translation, though he never got around to it. Diophantus was in some respects more advanced than any other mathematician before the Renaissance, and among the problems he considered were what are now called Diophantine equations. The relevance of such problems will be explained shortly.
Somewhat more original than Regiomontanus was a Frenchman, Nicolas Chuquet, who died around 1500. He used expressions involving nested radicals farily close to the modern style, such as √(14-√180)), to represent solutions of 4th degree equations.
The real breakthrough came in the work of several Italians in the 16th century. In 1545 Girolamo Cardano (1501-76) published explicit algebraic solutions (that is, using arithmetic operations plus extraction of roots) of both cubic and quartic (4th degree) equations. Cardano, however, did not discover the solutions himself. The result for cubics was known before 1541 by Niccolo Tartaglia (ca. 1500-57), though apparently discovered even earlier by Scipione del Ferro (ca. 1465-1526). Cardano admitted he had not discovered the solution, but apparently he did break a promise to Tartaglia to keep the results a secret. (Just as now, precedence in publishing new scientific results was a matter of great prestige.) As for the quartic, Cardano states that the solution was discovered by Ludovico Ferrari (1522-65), though at his [Cardano's] request.
Such rapid progress naturally raised the question of solutions to equations of 5th degree (quintics) and higher, either by algebraic means (using arithmetic operations and radicals) or at least by means of geometric constructions (using only straightedge and compass). Surprisingly, it was proven almost 300 years later that solutions of either sort were not possible in general, i. e. for all cases. This was done independently by two young men, Niels Henrik Abel (1802-29) in 1824 and Évariste Galois (1811-32) in 1832. Galois' result is especially important, as it is based on very novel methods of abstract algebra -- the theory of groups -- and in fact Galois' ideas thoroughly permeate the theory of algebraic numbers we will discuss.
In spite of that astonishing negative result, only a few year earlier Carl Friedrich Gauss (1777-1855) had proven in his doctoral thesis of 1798 that polynomial equations of any degree n must have exactly n solutions in a certain very specific sense. This result was so important it became known as the fundamental theorem of algebra. The exact sense in which that theorem is true is the subject of the other part of this story: "numbers".
The next most "natural" sort of number includes the negatives of all natural numbers. Collectively all natural numbers and their negatives are known as integers. Mathematicians use the symbol Z to denote the set of all integers. The idea of negative numbers seems to have existed in China before 400 CE. The Chinese had specific tools for reckoning with negative quantities (e. g. debts), but they had even less algebra than the Greeks. For their part, the Greeks seem to have had no concept of negative quantities as such. Negative numbers may have made their first appearance in the written record in the work of the Indian mathematician Brahmagupta early in the 7th century CE. He seems to have been the first to use both 0 and negative numbers systematically, and even recognized that a negative number could be the root of a quadratic equation. (For instance, both +2 and -2 are solutions of x2-4=0.) But since it was not easy to see a negative number of tangible things or count with negative numbers on one's fingers, suspicion of them as mere fictions was widespread for centuries in the West. (Just as many today still regard "imaginary" numbers with deep suspicion.)
If the concept of symbolic equations involving unknown quantities had been more well understood, negative numbers would have been accepted much more readily. They provide a means, after all, of solving even the simplest equations, such as x+1=0, a first degree equation in which all the coefficients are natural numbers.
The operation of division is the inverse of multiplication, and so the reciprocal of a nonzero number n is 1/n -- 1 divided by n. Negative numbers are merely formed using subtraction (the inverse of addition), since -n = 0-n. So it's curious that the Greeks didn't think of negative numbers, though they, and other ancient people, did embrace fractions readily. One assumes this is because fractions arise naturally in geometry, measurement, commerce, and so forth. Fractions are just ratios of two natural numbers, in the form a/b for positive integers a and b, and so they are called rational numbers. If the numbers involved were allowed to be negative as well, the rational numbers too could be negative. Mathematicians use Q for the set of all such rational numbers. If the Greeks had been more capable of thinking abstractly in terms of solutions to equations, it would have been easy to define rational numbers as possible solutions to any linear equation of the form ax+b=0, where a (≠ 0) and b are integers.
Geometry was the most developed form of mathematics in ancient Greece, so it was natural to think of numbers (apart from simple counting) as the lengths of lines, areas of circles, volumes of solids, etc. In other words, it was easy to perceive that arithmetic rules of working with counting numbers behaved in the same way as rules for adding and subtracting the lengths of lines, or computing areas and volumes by multiplication and division. It looked as though, perhaps, all numbers of any consequence should be rational. It thus came as a shocking revelation to the classical Greeks that there were "numbers" that could occur as lenghts of lines in a geometric figure which could not be rational numbers. A proof of this was discovered by followers of Pythagoras, specifically that the length of the diagonal of a square whose sides had 1 unit of length could not be a rational number. In modern notation this length is simply √2.
The proof that √2 is not rational is simple. Suppose it were rational. Then √2 = a/b for natural numbers a and b. Hence a2 = 2b2. We may suppose that the fraction is in lowest terms, so that a and b have no whole number factors in common. (Otherwise, just divide those out.) a2 is clearly an even whole number, so a must be even also. (If 2 divides a2, it has to divide a as well, by the rule of unique factorization into prime numbers, also known as the fundamental theorem of arithmentic. As we shall see, this can be proven fairly easily.) So a is divisible by 2; say a = 2A. Then 4A2 = 2b2, hence 2A2 = b2. It follows that b2 is even, so b must be also. But that is a contradiction, since we could assume a and b had no factor in common. This contradiction means that the original assumption that √2 was rational must be false. QED.
This was so shocking to its discoverers that everyone who learned of it was pledged to secrecy. After all, "not rational", or "irrational" meant to the Greeks (just as in English) "unreasonable". This linguistic fluke suggested that the whole field of endeavor of Greek mathematicians was deeply flawed, so it would be devastating to their prestige if this notion became widely known.
In truth, there is nothing inherently contradictory or unreasonable about "irrational" numbers. They simply are not ratios of integers, but they can occur as solutions of polynomial equations with rational coefficients: for example, ±√2, which are solutions of x2-2 = 0. Numbers of this sort are called algebraic numbers, for obvious reasons. This class of algebraic numbers is the principal subject dealt with in algebraic number theory.
Algebraic numbers clearly exist, since the length of the diagonal of a unit square is certainly a meaningful concept. We've just seen the proof that some algebraic rational numbers are not rational. What are they then? In some sense, answering this question is what the subject of algebraic number theory is largely about. The theory attempts to say what they are in terms of mathematical properties they have. We will be spending most of our time on this issue.
Before we dive into that, let's look at the broader context. Recall the result Gauss proved in his thesis, the fundamental theorem of algebra. This theorem is about the roots of a polynomial equation of the form
anxn + an-1xn-1 + ... + a1x + a0 = 0where n is a positive integer, x is an "unknown", an ≠ 0, and for 0≤j≤n all aj are rational (symbolically, aj∈Q). Such an equation, as we noted, is said to be of degree n. This can be simplified a little, because if an ≠ 0, then we can divide both sides of the equation by it, without affecting any of the solutions of the equation (known as roots), and therefore assume that the coefficient of xn is 1. The polynomial in such an equation is said to be monic. For simplicity, we often write the polynomial part of an equation as f(x) so that the equation is f(x) = 0.
What Gauss actually proved is that the polynomial in this equation factors completely into polynomials which have degree at most 2 -- even if the coefficients are any real numbers, not necessarily rationals. Intuitively, a real number is one that can be represented in decimal form as a whole number plus a fractional part that is an infinite series of decimal digits. The decimal digits in the fractional part may or may not repeat in groups, from a certain point on. (For instance, .000123123123... is a repeating decimal, while the fractional parts of the numbers √2 and π never repeat.) It is not hard to show that a number is rational if, and only if, its fractional part is a repeating decimal. Thus the rational numbers form a subset of all real numbers. The set of all real numbers is denoted by R.
Irrational numbers such as √2 are examples of real numbers which are not rational. However, √2 is an algebraic number because it is a root of x2-2=0. Yet not all real numbers are algebraic. In fact, there are vastly more real numbers than there are algebraic numbers. In some sense, there are just as many rational numbers (or even integers) as there are algebraic numbers, because the set of all algebraic numbers can be put into a 1-to-1 correspondence with either Z or Q. (Algebraic numbers may actually be "complex", as will be discussed shortly, but for now just think about real algebraic numbers.) All of these sets are subsets of R which are strictly smaller than R, because they cannot be put into 1-to-1 correspondence with R. (The argument is simple. If A is the set of all (real) algebraic numbers, it has a 1:1 correspondence with positive integers, which means one can, in principle write down all members of A in some order. Now we can define a new number r as the number whose nth decimal digit is one more than the corresponding digit of the nth member of A in the list (or 0 if that digit is 9). r is therefore a real number which cannot appear anywhere in the list, since it differs from every one of them in at least one place. So the supposed list of all
A real number which is not algebraic is said to be transcendental. Curiously, even though there is a vast quantity of transcendental numbers, it is quite difficult to prove that specific numbers, such as π, are transcendental. In fact, it was not until 1873 that a "familiar" number (e, the base of the natural logarithms in this case) was shown to be transcendental, by Charles Hermite. π was shown to be transcendental in 1882, by C. F. Lindemann.
Let's return to Gauss' fundamental theorem of algebra. It is now possible to prove something more general than what Gauss showed. Namely, we can work in any algebraic structure called a field. We'll explain more carefully in the next section what that structure is, but for now just accept that it is any system of objects (like numbers) that allow the arithmetic operations of addition, subtraction, multiplication, and division (except division by 0) following rules just like those of rational or real numbers. In this case, let F be any field, and f(x) be a monic polynomial with coefficients in F. Then it is possible to construct a slightly larger field E that contains F be adding certain new elements which are defined by simple polynomial relations. For instance, if F=Q, we can add or adjoin another element which we will denote by α and which has the property that α2=2. This new field, which we denote by F(α), consists of all possible sums and differences of &alpha with elements of F, as well as all products and quotients of such expressions. There are standard ways to do this construction rigorously and to prove that the result E=F(α) is a field, called an extension field. F is said to be a subfield of E. (Note that as sets, F⊆E.) You may think of the extension E, if you wish, as a collection of formal expressions of sums, differences, products, and quotients involving α and elements of F, always simplified by using the relationship α2=2 whenever possible. That is, always replace α2 by 2 whenever it occurs.
Given all this, it can be shown that there is one root of f(x)=0 in some extension field of the field F that contains the coeffiecients of f(x). Call this root α, so that f(α)=0. With polynomials, there is a process very much like long division of integers which allows one to compute the quotient of f(x) divided by x-α, yielding another polynomial g(x) = f(x)/(x-α). This algorithm guarantees the coefficients of g(x) are in E=F(α) if the coefficients of f(x) are. (In particular, if the coefficients are actually in F.) Consequently, f(x) = (x-α)g(x), where g(x) is monic and has degree exactly one less than that of f(x). We can repeat this process with g(x), and so after exactly n steps, we will arrive at a complete factorization of f(x) into linear factors with coefficients (the constant terms) that are in some extension field of F. We might have to adjoin n different symbols (the roots of f(x)), but at least it can be done. (In fact, it can be shown there is a single additional element θ, called a primitive element, or a generator of the field, which is the only element that needs to be adjoined to F to produce an extension field E=F(θ) in which f(x) splits into linear factors. In other words, this field E contains all the roots of f(x)=0.)
Note that unlike other sorts of numbers we considered before, the "numbers" in an extension field of Q may be somewhat abstract objects, such as formal expressions. They certainly can't be just expressions involving radicals, if the degree of the lowest degree polynomial they satisfy is 5 or more (as Abel and Galois proved). Nevertheless, as long as they are elements of R, i. e. real algebraic numbers, they still "make sense", say, as the length of a geometric object.
The real numbers themselves are rather abstract objects when one tries to construct them rigorously. There are ways to do this other than using decimal expansions. One such method, involving set theory, is called the method of Dedekind cuts, after its inventor Richard Dedekind (1831-1916). (Dedekind's name will come up again, because he was one of the primary contributors to algebraic number theory.) More generally, we can adjoin to Q all possible limits of sequences {an} of rational numbers to form the completion of Q considered as a metric space. We won't attempt to describe these abstract constructions further. The point is that once one goes beyond the field Q of rationals, larger fields consist of objects which are somewhat more abstract -- and to an extent arbitrary, subject only to the rules which define a field.
A perfect example of this is the field of complex numbers, which is obtained by adjoining the element i to the field R of real numbers, subject only to the relation i2=-1. So we can say that i=√-1. What is i "actually"? It doesn't matter. The only thing one needs to know is i2=-1. This should not be cause for suspiciousness or skepticism about such imaginary numbers. Their existence is just as secure as any other abstract object of modern mathematics. If we adjoin i to R the field C = R(i) of complex numbers is what we get.
Another way to describe C is as the set of all "numbers" of the form a+bi with a,b∈R, i. e. C = {a+bi | a,b∈R}. Addition and multiplication are defined on this set by the rules (a+bi)+(c+di) = (a+c)+(b+d)i, and (a+bi)×(c+di) = (ac-bd)+(bc+ad)i. This is very much as if i were an "unknown" symbol like x, except that we always simplify expressions by using the relation i2=-1.
There are other ways to think of this field. For instance, we can take it to be the set of all ordered pairs {(a,b) | a,b∈R} where addition and multiplication are given by (a,b)+(c,d) = (a+c,b+d) and (a,b)×(c,d) = (ac-bd,bc+ad), as suggested by the preceding paragraph. In this notation, it is apparent that C is "nothing but" the Cartesian plane R×R with a peculiar sort of multiplication. (Indeed, topologically, C and R×R are the same.)
One of the requirements of a field is that division by any element of the field except 0 is always possible -- that is, all nonzero elements have a multiplicative inverse. What is 1/(a+bi), the inverse of a+bi? First, we use the notation (a+bi)* = a-bi for the operation of complex conjugation. a-bi is said to be the complex conjugate of a+bi. This is used quite frequently. Next we note that (a+bi)×(a+bi)* = a2 + b2, a non-negative real number that is 0 if and only if a=b=0. So the square root of this is a real number, and we use the notation |a+bi| = √((a+bi)×(a+bi)*). This is called the norm of the complex number a+bi. It follows that if a+bi≠0, then its inverse is given by 1/(a+bi) = (a+bi)*/|a+bi|2.
Just a little more teminology and we can move on. The set of all polynomials in one variable that have coefficients in a field F is denoted by F[x]. A polynomial f(x)∈F[x] is said to be irreducible over F if it has no factors other than 1 and itself belonging to F[x]. An irreducible polynomial is completely analogous to a prime number in the integers. Suppose an element α is a member of some extension E of F. f(x)∈F[x] is said to be a minimal polynomial for α if f(α) = 0 and this is true of no polynomial in F[x] that has degree less than that of f(x). It's easy to show that when f(α)=0, f(x) is a minimal polynomial for α if and only if f(x) is irreducible over F. α is said to have degree n over F if n is the degree of its minimal polynomial. The degree of an extension E⊇F, denoted by [E:F], can be defined in various ways, but what it boils down to is that [E:F] is the degree of a primitive element θ such that E=F(θ). In some ways, a better definition of the degree [E:F] comes about when we regard E as a vector space over F. This is a concept from linear algebra. In these terms, [E:F] is the dimension of E as a vector space over F.
Given all that, we note that [C:R]=2 and that i is a primitive element for the extension C⊇R. C has the fairly special property of being algebraically closed. This means that any polynomial in C[x] factors completely into linear factors in C[x]. In other words, there are no irreducible polynomials in C[x] having degree more than 1, and all roots of any f(x)∈C[x] actually lie in C itself. These facts follow from Gauss' fundamental theorem of algebra. (C does have extensions of infinite degree, such as the field of rational functions of one variable, but we won't go into that now.)
In order to avoid ambiguity, whenever discussing extension fields of some field F, we always assume the extensions are subfields of some fixed algebraically closed field that contains F. A smallest such field is known as an algebraic closure of F. For instance, C is an algebraic closure of R. Q has an algebraic closure (the field of all algebraic numbers) contained in C that is of infinite degree over Q, but much smaller than C itself. (For instance, the algebraic closure of Q contains no transcendental numbers.)
A group G is a mathematical system consisting of a set of elements and one operation between any two elements of the set. If "∘" denotes the operation, in a group there are three requirements:
These axioms can be stated in slightly different ways, but we don't need to get into that.
Note one respect in which these group rules are different from the usual rules of either addition or multiplication in arithmetic: the commutative property x∘y = y∘x is not required for elements of a group, though it might hold for some or even all elements. If it does hold for all group elements, the group is said to be commutative or abelian (after Niels Abel). In the theory of algebraic numbers, whenever groups consist of actual algebraic numbers they will necessarily be commutative, since the rules of arithmentic (both addition and multiplication) still hold. But we will encounter groups that are defined in different ways which definitely won't be commutative. Some of the hardest problems of the theory, in fact, occur in the non-commutative cases.
For a nontrivial example of a commutative group that's important in algebraic number theory, just look at the set of all units, as we discussed in reference to Pell's equation. As you recall, we denoted by Z[√n] the set of numbers of the form a+b√n, where a and b are integers, and n is a positive integer that's not a perfect square. (Z[√n] is in fact a ring, as we'll define the term in a moment.)
Within that set, consider the subset of numbers such that the equation a2 - nb2 = ±1 holds. In other words, the "norm" of a+b√n, N(a+b√n), as defined by the left hand side of the equation, has the value ±1. We noted that this condition is necessary and sufficient for a number in the subset to have a multiplicative inverse. We called such numbers units of the ring Z[√n]. Note that 0 is not a unit, but 1 is, and that the existence of a multiplicative inverse of any unit makes the set of units into a commutative group under multiplication (with ordinary addition being irrelevant in this group – indeed, the sums and differences of units are not units).
Another thing to note is that the requirement for an identity element is a requirement for a solution to a certain simple equation, and we have seen this in action several times. For instance, the natural numbers N (nonzero integers) do not form a group under addition, because there is in general no solution to an equation of the form x+a = 0 with arbitrary a∈N. But if we extend N to the integers Z by "adjoining" all negative numbers, we have in effect simply included all formal solutions of equations x+a=0 for each a∈N and gotten lucky in that the enlarged domain satisfies the group axioms without difficulty.
Very much the same thing happened with respect to the operation of multiplication when we passed from the integers Z to the rationals Q. Again, with respect to multiplication, Z satisfies the group axioms except for the existence of inverses. That is, we are not able to solve the equation xa = 1 for arbitrary a∈Z. In fact, a solution exists only for a = ± 1. But in defining the rational numbers Q in effect we just formally adjoined the inverses (reciprocals) 1/a for each a∈Z (except a=0). The resulting group with the operation of multiplication consists of all nonzero elements of Q. This group is sometimes denoted by Q×.
If we wanted to preserve the additive structure of Z at the same time as providing multiplicative inverses, in order to construct the ring Q, we would need to have been a little subtler. This process is a standard one. It is called constructing a field of fractions, and we will come back to it.
In addition to the requirements on the operations of addition and multiplication seprately, they must satisfy a compatibility condition, known as the distributive law of multiplication with respect to addition:
The integers Z are the most obvious example of a ring. For any field F containing Q, there is also the concept of a ring of integers of the field F. This ring is a direct generalization of Z, and it is one of the central objects of study. One wants to know as much as possible about the structure of such rings, because this knowledge has extensive practical application to the study of Diophantine equations, as we shall see. Elements of a such a ring of integers are called, simply, algebraic integers.
In many respects, of groups, rings, and fields, it is rings which are most interesting. They have the complexity due to possessing two operations, but the freedom of a less restrictive set of axioms than fields. This results in many more special situations, though not all of the strong theorems about fields (such as Galois theory) apply to rings.
The most important set of facts about fields for our purposes lie in what is known as Galois theory. This is the theory developed originally by Évariste Galois to deal (among other things) with the solvability or non-solvability, using radicals, of algebraic equations. It tells us a lot about the structure of field extensions in terms of certain groups – called Galois groups – which are constructed using permutations of roots of a polynomial which determines the extension. (Permutations are 1-to-1 mappings of a set to itself that interchange elements.) A little more precisely, a Galois group consists of automorphisms of a field – i. e. maps (functions) of the field to itself which preserve the field structure. All such automorphisms, it turns out, can be derived from permutations of the roots of a polynomial – under the right conditions.
The importance of Galois theory is that it sketches out some of the "easy" background facts about a given field extension, into which some of the more difficult facts about the algebraic integers of the extension must fit.
Before we proceed, let's review some notations and definitions that will be used frequently. Suppose F is a field. For now, we will assume F is a subset of the complex numbers C, but not necessarily a subset of the real numbers R. If x is an indeterminate (an "unknown"), then F[x] is the set of polynomials in powers of x with coefficients in F. F[x] is obviously a ring. If f(x)∈F[x] is a polynomial, it has degree n if n is the highest power of x in the polynomial. f(x) is monic if the coefficient of its highest power of x is 1. If f(x) has degree n, it is said to be irreducible over F if it is not the product of two (or more) nonconstant polynomials in F[x] having degree less than n.
A complex number α, which is not in F, is algebraic over F if f(α)=0 for some f(x)∈F[x]. f(x) is said to be a minimal polynomial for α over F if f(x) is monic, f(α)=0, and no polynomial g(x) whose degree is less than that of f(x) has g(α)=0. (Note that any polynomial such that f(α)=0 can be made monic without changing its degree.) A minimal polynomial is therefore irreducible over F. F(α) is defined to be the set of all quotients g(α)/h(α) where g(x) and h(x) are in F[x] and h(α)≠0. F(α) is obviously a field, and it is referred to as the field obtained by adjoining α to F.
If E is any field that contains F, such as F(α), the degree of E over F, written [E:F], is the dimension of E as a vector space over F. (Usually this is assumed to be finite, but there are infinite dimensional extensions also.) It is relatively easily proven that if α is algebraic over F and if the minimal polynomial of α has degree n, then [F(α):F]=n. Of course, more than one element can be adjoined to form an extension. For instance, with two elements α and β we write F(α,β), which means (F(α))(β). (Or (F(β))(α) – the order doesn't matter.)
We will frequently need one more important fact. Suppose we have two successive extensions, involving three fields, say D⊇E⊇F. This is called a tower of fields. Then D is a vector space over E, as is E over F. From basic linear algebra, D is also a vector space over F, and vector space dimensions multiply. Consequently, in this situation we have the rule that degrees of field extensions multiply in towers: [D:F]=[D:E][E:F].
Now we're almost ready to define a group, called the Galois group, corresponding to an extension field E⊇F. However, Galois groups can't be properly defined for all field extensions E⊇F. The extension must have a certain property. Here is the problem: The group we want should be a group of permutations on a certain set – the set of all roots of a polynomial equation. But consider this equation: x3-2=0. One root of this equation is the (real) cube root of 2, 21/3. The other two roots are ω21/3 and ω221/3 where ω=(-1+√-3)/2. You can check that ω3=1 and ω satisfies the second degree equation x2+x+1=0. ω is called a root of unity, a cube root of unity in particular. (Roots of unity, as we'll see, are very important in algebraic number theory.) Now, the extension field E=Q(21/3) is contained in R, but the other roots of x3-2=0 are complex, so not in the extension E. This means that it isn't possible to find an automorphism of E which permutes the roots of the equation. Hence we can't have the Galois group we need for an extension like E.
The property of an extension E⊇F that we need to have is that for any polynomial f(x)∈F[x] which is irreducible (has no nontrivial factors) over F, if f(x) has one root in E, then all of its roots are in E, and so f(x) splits completely in E, i. e. f(x) splits into linear (first degree) factors in E. An equivalent condition (as it turns out), though seemingly weaker, is that there be even one irreducible f(x)∈F[x] such that f(x) splits completely in E but in no subfield of E. That is, E must be the smallest field containing F in which the irreducible polynomial f(x)∈F[x] splits completely. E is said to be a splitting field of f(x). The factorization can be written
f(x) = ∏1≤i≤n (x - αi)with all αi∈E, where n is the degree of f(x). (Remember that we are assuming f(x) is monic.) When this is the case, E is generated over F by adjoining all the roots of f(x) to F. In this case it can be shown that the degree [E:F] is the same as the degree of f(x).
An extension that satisfies these conditions is said to be a Galois extension, and it is the kind of extension we need in order to define the Galois group G(E/F). (Sometimes the type of extension just described is called a normal extension, and a further property known as separability is required for a Galois extension. As long as we are dealing with subfields of C, fields are automaticaly separable, so the concepts of Galois and normal are the same in this case.)
Suppose E⊇F isn't a Galois extension. If E is a proper extensions of F (i. e. E≠F), if α∈E but α∉F, and if f(x) is a minimal polynomial for α over F, then the degree [E:F] of the extension is larger than the degree of f(x). This is because all the roots of f(x) must be adjoined to F to obtain a Galois extension, not just a single root. If α is (any) one of the roots, [F(α):F] is equal to the degree of f(x). But this is the degree [E:F] only if α happens to be a primitive element for the extension, so that E=F(α), which isn't usually the case, and certainly isn't if E isn't a Galois extension of F.
In the example above with f(x)=x3-2, we have E = Q(ω,21/3) = Q(ω)(21/3), [Q(ω):Q]=2 and [Q(ω,21/3):Q(ω)]=3, so the degree of the splitting field of f(x) over Q is 6, because degrees multiply. Q(21/3)⊇Q is an example of a field extension that is not Galois. But Q(ω,21/3)⊇Q(ω) is Galois, since f(x) is irreducible over Q(ω) but splits completely in the larger field. Likewise, Q(ω)⊇Q is Galois, and in fact all extensions of degree 2 are Galois. (If f(x)∈Z[X] is a quadratic which is irreducible over Q and has one root in E, then the roots are given by the quadratic formula and involve √d for some d∈Z, so if one is in E, both are.)
We'll come back to this example, but first we'll look at a simpler one to get some idea of how Galois groups work. Consider the two equations x2-2=0 and x2-3=0. The roots of the first are x=±√2, and the roots of the second are x=±√3. We will start from the field Q and adjoin one root of each equation. This yields two different fields: E2=Q(√2) and E3=Q(√3). If we adjoin a root from both equations we get a larger field that contains the others as subfields: E=Q(√2,√3).
Consider the field extension E2⊇Q first. We use the notation G(E2/Q) to denote the Galois group of the extension. In this example, call it G2 for short. We will use Greek letters σ and τ to denote Galois group elements in general. G2 consists of two elements. One of these is the identity (which we denote by "1") which acts on elements of the field E2 but (by definition) leaves them unchanged. This can be symbolized as 1(α)=α for all α∈E2. The action of a Galois group element can be fully determined by how it acts on a generator of the field, meaning √2 in this case. So it is enough to specify that 1(√2) = √2. This Galois group has just one other element σ2, which is defined by σ2(√2)=-√2. An important property that a Galois group must satisfy is that the action of all its elements leaves the base field (Q in this case) unchanged. A Galois group is an example of a group that acts on a set – a very important concept in group theory. But there is an additional requirement on Galois groups: each group element must preserve the structure of the field it acts on. In technical terms, it must be a field automorphism. We'll see the importance of this condition very soon.
As you can probably anticipate, the Galois group G3=G(E3/Q) has elements 1 and σ3 defined by σ3(√3)=-√3. We can now ask: what is the Galois group of the larger extension E⊇Q? It must contain 1, σ2 and σ3. We have to think about how (for instance) σ2 acts on √3. The clever thing about Galois theory is that it's easy to say what this action should be: σ2 should leave √3 unchanged: σ2(√3)=√3. In particular, σ2(√3) cannot be ±√2 The reason is that σ2 leaves the coefficients of x2-3=0 unchanged, and because σ2 is a structure-preserving field automorphism it cannot map something that is a root of that equation (such as √3) to something that is not a root of that equation (±√2).
For any finite group G, the order of the group is the number of distinct elements. We symbolize the order of G by #(G). In Galois theory it is shown that the order of a Galois group is the same as the degree of the corresponding field extension. Symbolically: #(G(E/F))=[E:F]. Basically this is because we can always find a primitive element θ such that E=F(θ), and θ satisfies an equation f(x)=0, where the degree of f(x) is [E:F]. The other n-1 roots of that equation are said to be conjugate roots. We get n automorphisms, the elements of G(E/F), generated from mapping θ to one of its conjugates (or to itself, giving the identity automorphism). Since the degrees of field extensions in towers multiply, so too do the orders of Galois groups in field towers, as long as each extension is Galois. That is, if D⊇E⊇F, where each extension is Galois, then #(G(D/F)) = #(G(D/E))#(G(E/F)). In our example, the degree of the extension is [Q(√2,√3):Q] = [Q(√2,√3):Q(√2)][Q(√2):Q] = 4. So this is also the order of the Galois group G=G(Q(√2,√3)/Q), and therefore we need to find 4 elements.
We've already identified three of the elements (1, σ2 and σ3). It's pretty clear that the remaining element must be a product of group elements: τ=σ2σ3. The product of Galois group elements is just the composition of the elements, which are field automorphisms (which happen to be derived from permutations on roots of equations), and hence they compose like any other function (or permutation). (Composition is just another term for the the function which is the result of applying one function after another.) Because of how σ2 and σ3 are defined, it must be the case that τ(√2)=-√2 and τ(√3)=-√3. Since E⊇Q is generated by √2 and √3, and τ is a field automorphism, we can figure out what τ(α) must be for any other α∈E. For instance, τ(√6)=√6, since √6=√2√3.
(Remember that we specified σ2(√3)=√3. You may have been wondering why we didn't just define the action of σ2 as an element of the full Galois group G=G(E/Q) by σ2(√3)=-√3. Had we done that, σ2 would have been what we found as τ, while the τ we got as the product of σ2 and σ3 would turn out to be the "old" σ2, so the only difference would be a relabeling of group elements.)
For a slightly more complicated example, suppose f(x)=x2+x+1 and g(x)=x3-2, with roots ω and 21/3 respectively, as above. Then in the tower Q(ω,21/3) ⊇ Q(ω) ⊇ Q both the extensions are Galois. (We already saw this isn't so with the tower Q(ω,21/3) ⊇ Q(21/3) ⊇ Q – order matters.) So the full extension E=Q(ω,21/3) ⊇ Q is Galois. Its Galois group G=G(E/Q) has order 6, because 6 is the degree of the whole extension, since the intermediate extensions are of degree 3 and 2 and the degrees of the extensions multiply.
It turns out to be easy to determine the Galois group of this extension, although there are some tedious calculations needed to verify this. So bear with us a moment here. We can define two automorphisms of E that leave Q fixed, as follows. It suffices to specify them on generators of the field. Let one automorphism σ be defined by σ(&omega)=ω2 and σ(21/3)=21/3. Let the other automorphism τ be defined by &tau(21/3)=ω21/3 and τ(ω)=ω. σ and τ are defined to leave elements of Q unchanged. For sums and products elements of E, σ and τ are defined to preserve the field structure, so they really are automorphisms (though, to be rigorous, this should be checked). So σ and τ are elements of the Galois group G=G(E/Q).
We can also see that σ2(ω) = σ(σ(ω)) = σ(ω2) = ω4 = ω, because ω3 = 1. So σ2 is the identity automorphism. (Note that the exponents on σ and τ refer to repeated composition, not ordinary exponentiation, because composition "is" multiplication in the group G.) If we compute τ2 and τ3 in the same way, applied to 21/3, we find that τ2(21/3) = ω221/3, and τ3(21/3) = 21/3, again because ω3 = 1. Thus τ2 isn't the identity automorphism, but τ3 is.
Now let's compute with the composed automorphisms στ and τσ. First, στ(21/3) = σ(ω21/3) = ω221/3. However, τσ(21/3) = τ(21/3) = ω21/3. So we have στ ≠ τσ, because ω≠ω2. Instead, we will find by a similar calculation that στ(21/3) = ω221/3 = τ2σ(21/3). Hence στ = τ2σ. A little more checking will show that 1 (the identity automorphism), σ, τ, τ2, τσ, and στ give a complete list of distinct automorphisms that can be formed from σ and τ. That's just right, because G must be a group of order 6.
In abstract group theory there are only three distinct groups of order 6. (That is, distinct up to an isomorphism, which is a 1-to-1 structure-preserving map between groups that shows they are essentiall the "same" group.) One is the cyclic group of order 6, denoted by C6. Another is the direct product of a cyclic group of order two and one of order 3, i. e. the group C2×C3. However, both of those are abelian groups, and since στ ≠ τσ, G isn't abelian, it cannot be either one. But the third group of order 6 is (up to isomorphism) S3, the group of permutations of three distinct objects, also known as the symmetric group. (An isomorphic group is the dihedral group D3, the group of symmetries of an equilateral triangle.) Since this group is the only nonabelian group of order 6, G(E/Q) must be isomorphic to it.
There's a whole lot more that could be said about Galois theory, but that would take up quite a bit of space, and the intention here is only to give a feel for what it is about. The basic idea to take away is this: A great deal is known about abstract groups and their subgroup structure. Galois theory is a way to "map" extensions of fields to groups and their subgroups in such a way that most of the interesting details about the extension are reflected in details about the groups, and vice versa. The group structure is sensitive to relationships among elements in the subextensions of a Galois extension. In Galois theory it is proven that there is a precise correspondence between subextensions and subgroups of the Galois group.
It thus becomes possible to infer facts about field extensions easily from a knowledge of their Galois groups. One example of the power of this method is that it made possible proving facts that had remained mysterious for hundreds of years – for example, the unsolvability by radicals of general polynomial equations of degree 5 or more, and the impossibility of certain geometric constructions by straightedge and compass alone (trisecting angles, for example).
Galois theory is an absolutely indispensible tool in algebraic number theory. It will come up again and again. We will mention other results in the theory when they are needed.
What was pointed out in the discussion of Diophantine equations is that the characteristic feature isn't the form of the equation, but rather the type of solution which is sought. In normal mathematical conversation, a Diophantine equation is, in form, just a polynomial equation f(x,y,z,...)=0 in several variables, usually with integer coefficients. (Equations with rational coefficients can be multiplied by some integer to leave only integer coefficients.)
As the history sketched at the beginning of this page shows, simple equations of this form, with one variable, were studied from the time of Al-Khwarizmi in the 9th century onward. The notation and concepts used evolved closer to what we use today, until the 16th century, when the problem was mostly solved for equations of one variable of fourth degree or less, as long as "solutions" were allowed to be what (in modern terminology) are called algebraic numbers that are members of some finite extension of the field Q of rational numbers.
It is a much harder problem if one is required to find solutions to such equations which are actually rational numbers or even integers. Yet, curiously and perversely, this is exactly what Diophantos of Alexandria in the 3rd century CE, and his earlier predecessors and later successors, set out to do in solving a Diophantine equation. In form such equations are polynomials, yet the problem is made harder by adding additional requirements on the solution. Although adding additional conditions to a problem doesn't make the problem harder in all cases (sometimes just the opposite), it certainly does in this case.
The reason for this lies in the nature of the mathematics itself. Mostly this is because it becomes more difficult to determine whether a solution even exists. Note that for solving equations of the fifth or higher degree, it was not even realized until the 19th century that solutions weren't possible if it was required that they consist only of closed expressions involving arithmentic operations and radicals. Adding the further condition that solutions consist only of integers or rationals makes matters even more difficult, since one needs to determine conditions under which solutions even exist, depending on the degree of the equation, number of variables, nature of the coefficients, etc.
Yet this is precisely the swamp into which Diophantos and his predecessors and successors unwittingly stumbled. Of course, it's understandable why this happened. The problems that Diophantos et al. wanted to solve weren't formulated in modern notation. They weren't formulated in "algebraic" notation at all, which evolved only very gradually. Instead, they arose as "word problems". And they called for solutions in integers, or at worst rational fractions, because more sophisticated concepts of number hardly even existed at the time. Indeed, even rational numbers were cumbersome to work with or think about (to say nothing of negative numbers), while "irrational" numbers were still considered to be something of a scandal.
As a result of the work of Abel and Galois and others early in the 19th century, the problems of solving equations which were actually tractable finally became fairly well understood. So, at last, the time was right to attack the older problems, which were in fact much harder -- and therefore more interesting.
The discussion which follows is about the machinery which was invented in order to get a better grip on these harder problems. The problems are by no means fully conquered even today -- there remain many quite difficult open questions which have their roots in these seemingly simple problems about equations.
So let's call the concept we are looking for algebraic integers, and try to figure out how this concept should be defined. Without attempting to psych out what went on in the minds of the mathematicians of the 19th century who pulled this off, let's just try something obvious. An integer may be defined as a solution of a monic polynomial equation of degree one with coefficients in Z, i. e. an equation like x-a=0 for a∈Z.
Yes, this is a circular definition, but it can be generalized to a definition for equations of higher degree that not only isn't circular, but is in fact exactly what we want. So define an algebraic integer as any solution of a monic polynomial equation f(x)=0 whose coefficients are ordinary integers in Z, that is f(x)∈Z[x]. (Clearly, the assumption of a monic polynomial is very important now, since we are no longer at liberty to divide through by the leading coefficient without messing up the assumption on the other coefficients.)
What we have done is simply to limit consideration to polynomials whose coefficients lie in a ring, namely Z, instead of polynomials with coefficients in a field (Q). Moreover, Z and Q are closely related. It turns out that many (but hardly all) other rings like Z can be extended to fields like Q in exactly the same way -- Q is the field of fractions corresponding to the ring Z. Although this construction is familiar and obvious, there are subtleties about it which will motivate us to look at some interesting features of the theory of rings in general.
Let's look at how the construction of fractions actually works. What it really comes down to is finding a solution to the simplest possible Diophantine equation: ax-b=0 for a,b∈Z. Doing this is something like "solving" the equation x2+1=0. Essentially, one just gives a name to the solution (x=i in the latter case) and shows that the introduction of this symbol with the property that i2=-1 does not bring along any inconsistency. One way to do this is to construct some sort of mathematical object which actually has the desired property. In the case of x2+1=0, the construction can be done in a very general way by forming the quotient ring R[x]/(x2+1), noting that it is a field which contains R, and has an element which satisfies the given relation. (We'll explain this more fully a bit later.)
Now, as regards the solution of ax-b=0, we can give the name "a/b" to the solution. This a/b is constructed as follows: we start from the set of all ordered pairs of integers for which the second element isn't 0: {(a,b)∈Z×Z | b≠0}. We want this to be a field, so there must be rules of addition and multiplication for the pairs. Multiplication is easily defined as (a,b)×(c,d)=(ac,bd). Addition is a little trickier: we define (a,b)+(c,d)=(ad+bc,bd). (That being the way one actually adds a/b and c/d.) It's easy to check that the given set with these rules satisfies the axioms for a field. In particular, the multiplicative inverse of (a,b) is (b,a) (assuming a≠0), and the additive inverse of (a,b) is (a,-b). (One subtlety is that we have to regard (a,b) as identical to (a′,b′) in case ab′=a′b, or equivalently, to limit the original set of ordered pairs to only those where the elements have no integer divisor in common, the first element of the pair is a positive integer, and to maintain that condition by always removing common factors after addition or multiplication.)
This construction can actually be done for many, but not all, other commutative rings besides the integers Z. There is a certain property Z has which is not true of all rings. Namely, for a,b∈Z the product ab=0 if and only if at least one of a and b is 0. Although that is clearly true in Z, there are reasonable examples of rings where it fails. We'll give such an example very soon. The property that ab≠0 whenever both factors are not 0 is so important that a commutative ring with a multiplicative identity having this property is given a special name: a domain (or sometimes, integral domain). A ring clearly needs this property to perform the construction explained in the preceding paragraph.
But that property is sufficient. The indicated construction can be done for any integral domain A, and the field that results is called the field of fractions of the domain A. Any field is also a ring, and a domain A is naturally a subring of its field of fractions by means of the correspondence a→(a,1) for all a∈A. (This is why one requires a domain A to contain a mutiplicative identity element.) A field is a domain, because if we have ab=0, and one of the factors is not 0, then multiplying by its inverse would imply that the other factor is 0. (Multiplication is always commutative in a field.)
Another standard example of a ring is a ring A[x] of polynomials over an integral domain A: A[x] consists of all finite sums of the form ∑0≤k<∞ akxk, where all ak∈A. The "x" is just a formal symbol. The rules of addition and multiplication in A[x] follow from assuming ax=xa for a∈A and applying the distributive law as necessary. Since A[x] is a domain, a field of fractions can be constructed. It consists of rational functions, which are just quotients of polynomials f(x)/g(x), where g(x)≠0.
There is one more example, which is the main motivation for considering rings at all in this subject. If F is a field which is a finite extension of Q (that is, has finite degree over Q), F is called a number field or algebraic number field, because all of its members are algebraic numbers as defined previously (roots of some polynomial f(x)∈Q[x]). Let O (or OF or O(F/Q) when we want to be explicit) be the set of all elements of F that are algebraic integers as defined at the beginning of this section, namely roots of monic polynomials f(x) with integer coefficients, i. e. monic f(x)∈Z[x]. Then in fact O is a ring, called the ring of integers of the extension F⊇Q. Indeed, O is a domain. Addition and multiplication in O come from the corresponding operations in the field F. The only tricky part is showing that the sum and product of algebraic integers also satisfy monic polynomial in Z[x]. Trivially Z is a subring of any such O. And as you would expect, F turns out to be the field of fractions of the domain O.
For all these reasons, rings of integers are the natural generalization of the rational integers Z, which is a subring of any ring of algebraic integers. It is fair to say that the main concern of algebraic number theory is determining properties of such rings OF for algebraic number fields F. This is important, because there are properties of Z has which general rings of integers do not have.
One of the motivations for this concept of ideals is that it makes possible the definition of another very important concept: quotient rings. If as above R is a ring and I is an ideal, the quotient ring, denoted by R/I, is defined as the set of distinct cosets of the form r+I for all r∈R, where r+I is defined for any r∈ R as the set {r+a | a∈I}. Not all cosets are distinct as r ranges over R. Two cosets r+I and r′+I are the same exactly when r-r′∈I. This is because we can write r+I = r′+(r-r′)+I = r′+I. (Because r+I = I if r∈I.)
Importantly, we can define a ring structure on the set of all cosets. Addition is simple: (r+I)+(r′+I) = (r+r′)+I. Multiplication is a little trickier: (r+I)(r′+I) = rr′+I, but this only works since it doesn't depend on the choice of representative of each coset. The problem is we can have r+I=r′′+I even though r≠r′&prime, if r-r′′∈I. But in that case, (r-r′′)r′∈I by the second axiom for ideals, so all is OK. Thus multiplication of cosets is well-defined and unambiguous.
Under these definitions of addition and multiplication R/I is a (commutative) ring with a multiplicative identity. The additive identity element is I, and the multiplicative identity is the coset 1+I.
The ideal structure of the rational integers Z provides some important examples. Let n∈Z. Then obviously the set nZ = {nm | m∈Z} is an ideal, often written simply as (n). It is just all integral multiples of n. The quotient ring Z/nZ=Z/(n) is known as the ring of integers modulo n, and it has n elements. It is familiar from elementary number theory, where one writes "equations" such as a≡b (mod n) just in case a-b is divisible by n, i. e. is a multiple of n, i. e. a-b∈(n). The study of such congruences, which is done all the time in number theory, is really just the study of the ring Z/nZ.
A very important special case is when n is a prime p. Then it is a fact that Z/pZ is a field – a finite field of p elements, sometimes denoted by Fp. However, if n is not prime, then Z/nZ isn't even an integral domain, because it has divisors of zero, i. e. nonzero elements whose product is 0. For instance, if n=st for s,t∈Z, but s,t≠±1, then as ideals (s)≠0 and (t)≠0, where 0=(0)=(n). Yet (s)(t)=0. We can in fact characterize prime numbers p as elements of Z such that Z/pZ is a field.
Another important concept is that of a prime element of a ring. A little bit of care is required to define "prime" in a general ring, but essentially a prime element is one that has no factors other than itself and units. As far as divisibility and factors are concerned, units are essentially irrelevant, since they are invertible.
One of the most important properties that the integers have as a ring is unique factorization. That is, for any n∈Z, there is a unique way (apart from order and unit factors) to write n as a product of primes.
This fact can be proven using the order properties of Z, i. e. for every pair of distinct positive integers a, b, exactly one of a<b, a=b, or a>b is true. To begin with, this implies that for any pair of positive a,b∈Z, we can write a=qb+r with 0≤q and 0≤r<b. Reason: you can subtract b from a only a nonnegative but finite number of times (q) before the result is negative. This is because every number in the sequence a, a-b, a-2b, ... is strictly less than its predecessor, and if a is finite, there are only a finite number of distinct positive integers less than a. r is simply the last quantity before you have a negative number, and so 0≤r<b. The numbers q and r are uniquely determined by this procedure, and in fact there is a simple algorithm to find them, as we'll see in a moment.
For any positive integers a,b∈Z, we can define the greatest common divisor of the pair as the largest (positive) integer which divides both, written gcd(a,b), or simply (a,b). It may be, of course, that (a,b)=1, in which case we say a and b are relatively prime. As a matter of notation, if one number m divides another n, so that n=mq for some q∈Z, we write m|n. If this is not the case, then we write m∤n. (a,b) can be defined by the conditions that (a,b)|a, (a,b)|b, and if both c|a and c|b, then c|(a,b).
The Greek mathematician Euclid, known best for his geometry, was interested in number theory also. In addition to proving that there are infinitely many primes, he also gave a simple algrorithm for computing the greatest common divisior of two integers without explicitly factoring them – since factoring can be a relatively difficult process for large numbers. The algorithm is called, of course, the Euclidean algorithm.
To apply it, assume (without loss of generality) that a>b and write a=q1b+r1. Here, q1>0 and 0≤r1<b. Provided r1≠0 we can repeat the procedure and write b=q2r1+r2. We can repeat this procedure as long as the remainder rk isn't 0. If rk is the last nonzero remainder, then one notes that (a,b)|rk, because in fact (a,b) divides all such remainders in the process. But we also have rk-1=qk+1rk, hence rk|rk-1 and from rk-2=qkrk-1+rk, we find rk|rk-2 too. If we proceed back all the way we find rk|b and rk|a, hence rk|(a,b). Therefore rk=(a,b). In other words, (a,b) is the last nonzero remainder in this process.
But even nicer things are true. Go back to a=q1b+r1, so that r1=a-q1b. Similarly, r2= b-q2r1= b-q2(a-q1b)= Ma+Nb for some integers M and N (not necessarily positive). Proceding inductively, we have that (a,b)=Ma+Nb for some M,N∈Z. What this says is that a certain Diophantine equation can be solved for unknowns M and N if a, b (and hence (a,b)) are given. Note that if (a,b)>1, the equation d=Ma+Nb could not be solved if 1≤d<(a,b), because a solution would imply (a,b)|d.
We need one more fact about prime numbers. Suppose p is prime, and p|mn for some m,n∈Z. So by definition, mn=pq for some q∈Z. We claim that p must divide either m or n (perhaps both). For suppose that we don't have p|m, hence (p,m) can't be p. But p is prime, and (p,m)|p, so we must have (p,m)=1. Hence it is possible to write 1=Mp+Nm. Therefore n=n(Mp+Nm)=Mnp+Nmn=Mnp+Npq= p(Mn+Nq). In other words, p|n. This property possessed by primes in Z is not shared by "primes" in other rings of algebraic integers, as we shall soon see.
We now have all the facts we need to prove unique factorization in Z. The proof is done by supposing factorization isn't unique, and showing this leads to a contradiction. So suppose factorization isn't unique, and for some n there are two different factorizations of n (apart from units ±1). There cannot be any prime which occurs in one factorization but not the other, by the result of the preceding paragraph. Hence the same prime factors occur, but for at least one prime p we have n=Apr=Bps with 0<r<s, and (A,p)=(B,p)=1. Dividing through by pr reduces to the case where a prime occurs in one factorization but not the other, which is impossible. The contradiction proves the desired result.
It may seem "obvious" that factorization is unique, because we are so familiar with the fact this is true in Z that it is taken for granted. It may therefore be rather surprising that in many (in fact most) rings of algebraic integers, factorization is not unique. Unique factorization is actually a very special and rare occurrence, and a great deal of algebraic number theory is concerned with either trying to compensate for this "problem", or else trying to describe, in some sense, just how badly factorization fails to be unique.
Let's look at an example where unique factorization fails. First, we need to introduce a concept that makes it easy to prove some results about factorization (and has many more applications as well). Suppose we have an algebraic number α∈F and F⊇Q is a Galois extension. (Such an extension always exists: it is a splitting field of an irreducible polynomial f(x) such that f(α)=0, but we don't necessarily assume F is the smallest such extension.) Let G=G(F/Q) be the Galois group.
To review a concept, which has been introduced before, we define the norm of α with respect to the extension F⊇Q to be the product of all numbers σ(α) as σ ranges over elements of G. (More generally, the norm works also for Galois extensions of base fields that contain Q.) Symbolically, the norm is:
NF/Q(α) = ∏σ∈G σ(α)Note that "the" norm depends on the specific extension F/Q, and so the extension is indicated in the subscript.
For instance, let F=Q(√-5). F/Q is Galois, because it is the splitting field of the irreducible polynomial f(x) = x2+5 = 0. Any α∈F can be written as a+b√-5 for a,b∈Q. An element σ∈G can be specified by how it acts on a typical such element. Of course, since [F:Q]=2, G has only two elements: 1 (the identity) and σ, so σ is determined by how it acts on √-5. σ(√-5) has to be a root of f(x)=0 different from √-5, so it must be -√-5. It follows that σ(a+b√-5)=a-b√-5 for a,b∈Q, because σ is a field automorphism of F that leaves all elements of Q fixed. This should remind you of complex conjugation, because that is in fact the nontrivial automorphism of the group G(Q(i)/Q).
In the simple case at hand, we can give a simple formula for the norm:
NF/Q(a+b√-5) = (a+b√-5)(a-b√-5) = a2 + 5b2(In the field Q(i), the norm of a complex number is the square of the modulus, i. e. |a+bi|2 = a2 + b2, so norm in the sense used here is closely related to the complex "norm".)
The norm symbol has some fairly obvious properties. From the way it is defined as a product, the norm is a group homomorphism from the multiplicative group of F to the multiplicative group of Q. It is also a homomorphism of the multiplicative semigroups of the rings of integers OF and Z, which means that the value of the norm of an algebraic integer is always an integer in the base field (in this case the integers Z of Q). This is because each σ∈G is a field automorphism which satisfies σ(αβ)=σ(α)σ(β) for all α,β∈F. In other words,
NF/Q(αβ) = NF/Q(α) NF/Q(β)Furthermore, NF/Q(ε)=±1 if and only if ε is an invertible element of OF, i. e. a unit. (Since ±1 are the units of Z.)
Let's first determine the ring of integers OF. Let α=a+b√-5 be a general element of F, with a,b∈Q. If in fact α is an algebraic integer, then so is its conjugate α*=a-b√-5. Further, the sum α+α*=2a is in Q, and is also an algebraic integer, since the algebraic integers of an extension form a ring. But the only algebraic integers in Q are in fact in Z, so 2a∈Z. Similarly, 2b=(α-α*)/√-5 is an algebraic integer in Q, hence an element of Z, so 2b∈Z. The norm of both α and α* is a2+5b2, which is in Z. Multiplying the last expression by 4 shows (2a)2+(2b)25∈4Z. Since 5≡1 (mod 4), (2a)2+(2b)2≡0 (mod 4). This is impossible unless both 2a and 2b are even integers (just check the separate cases). Hence both a and b are in Z. The conclusion is that OF = {a+b√-5 | a,b∈Z}.
A slight elaboration of this argument shows that in any quadratic field Q(√d) where d∈Z is square-free, algebraic integers have the form OF = {a+b√d | a,b∈Z} unless d≡1 (mod 4), in which case OF = {(a+b√d)/2 | a,b∈Z, a≡b (mod 2)}. In other words, the naive guess that algebraic integers of Q(√d) just have the form a+b√d for a,b∈Z isn't entirely correct, but it is wrong only for d≡1 (mod 4), and then only by a little bit.
At this point, there is one delicate issue of nomenclature we must deal with. You will recall that a prime p∈Z is customarily defined as a (nonzero, nonunit) number which has no divisors other than units (±1) and ±p. We also proved that if p has this property, and if p divides a product mn, then either p divides m or p divides n (or maybe it divides both). In Z we can use this property to define p as a prime, since if the property is true of p then the more familiar condition that p has no nontrivial divisors is also true. This is because if p has this property, then the only divisors of p can be ±1 and ±p. (This follows from order properties of Z, because all divisors of a number n, except for ±n, have absolute values less than |n|.)
So these two properties of a nonzero p∈Z are equivalent. However, as we are about to see, the properties are not equivalent in other rings of integers. Nevertheless, we will find it convenient to use a generalization of the definition that p is prime if and only if p|mn implies p|m or p|m. So we will need a new term for the property of nonzero p that it is not a unit and not divisible by any other number except a unit times p. For this property we will use the term irreducible (like an irreducible polynomial). (And when this is the case, we say that p has only "trivial" factors, hence a nonunit p is defined to be irreducible if and only if all its factors are trivial, or equivalently if and only if it has no nontrivial factors.) We will make a similar distinction of "prime" and "irreducible" for integers α of other rings of integers.
Finally we can get back to factorization in F=Q(√-5). Observe that 21=3⋅7=(1+2√-5)(1-2√-5). We claim, first, that both 3 and 7 are irreducible in OF. Consider 3 first. If α=a+b√-5 were a nontrivial integral divisor of 3 – i. e. neither α nor 3/α is a unit – then we would have NF/Q(α) = a2+5b2 divides NF/Q(3) = 9. (Note, by the way, that for this extension, the norm is always nonnegative.) So NF/Q(α) must be either 3 or 9, since α isn't a unit. Obviously the equation a2+5b2=3 has no solutions for a,b∈Z. So NF/Q(α) isn't 3, hence it must be 9. Then NF/Q(3/α)=1, and 3/α is a unit, contrary to assumption. So 3 is irreducible. 7 is also irreducible, by a similar argument. The same kind of argument shows that 1±2√-5 must be irreducible, since both conjugates have norm 21, and any non-unit α that divided either would have a norm equal to 3 or 7, which we just observed is impossible. And we cannot have 1±2√-5 dividing either 3 or 7 (or vice versa), since 21∤9 and 21∤49.
What we've just shown is that 21 has two factorizations into irreducible numbers of OQ(√-5), and the factorizations are not equivalent, since the irreducible numbers in one factorization aren't unit multiples of either irreducible number in the other factorization. This shows that factorization of elements of the ring OQ(√-5) into irreducible numbers isn't unique.
This example shows that a number which has no non-trivial factors (e. g. 3 or 7) can divide a product (e. g. 21) of two other numbers (e. g. 1±2√-5) without dividing either one of the factors of the product. So an irreducible number is not "prime" in the sense that if it divides a product, it must divide at least one of the factors. This latter property is actually more useful in practice, so we want to use the term "prime" for it. Therefore, a distinction is made in a general ring of integers: a (nonzero, nonunit) number which has no non-trivial factors is said to be irreducible. (Equivalently, if α=βγ then either β or γ must be a unit.) On the other hand, the term prime is reserved for (nonzero, nonunit) numbers α which have the property α|βγ implies α|β or α|γ.
Now, in any ring of integers of an algebraic number field, a prime integer (in the new sense) must also be irreducible. This is because if α is not irreducible, then by definition we can write α=βγ, where neither factor is a unit. But if α is prime it must divide one of its factors. Say α divides β. Then β=αδ. Hence 1=δγ. That is, γ is a unit, contrary to assumption, and so α has only trivial factors, so it's irreducible. Thus the set of prime integers is a subset of the set of irreducible integers.
However, in the example we just examined where F=Q(√-5), where we do not have unique factorization, then some irreducible numbers are not prime. E. g. 1+2√-5 is irreducible (as we showed), but it is not prime, because it divides 21, but does not divide either 3 or 7. Therefore it's possible for the set of prime integers to be a proper subset of the set of irreducible integers, i. e. a strictly smaller subset.
This raises some interesting questions. Recall that the cases we are interested in are the rings of algebraic numbers. By definition, these are the rings of integers A=OF of a finite extension F/Q.
In any such A, it is always true that we can write any element as a product of a finite number of irreducible integers. The reason is that for any (nonzero, nonunit) α∈A, if α isn't irreducible, we can write α=βγ, where neither factor is 0 or a unit. Since F/Q is a finite extension, we can always compute norms, and we have NF/Q(α) = NF/Q(β)NF/Q(γ). In some extensions a norm can be negative, but we can also stick in the absolute values of each term, and since no term is ±1, each factor on the right hand side is an an integer in Z that is strictly smaller in absolute value than |NF/Q(α)|. Since all numbers here are finite, this process can't continue indefinitely. So not only do we get a finite product of irreducible integers, but we in fact get a finite product of finite powers of distinct irreducible integers. However, as the example above showed, this factorization need not be unique.
On the other hand, we can also say that if some algebraic integer α can be expressed as a product of powers of distinct prime integers, then (up to order and unit factors), the expression is unique as to which primes occur in the factorization and the powers of each that occur. To prove this, note first that any prime π which appears in one factorization into powers of primes must appear in the other. Because since π is in one factorization, it divides α, and because it is prime, it must divide at least one factor in any other factorization into powers of distinct primes. That factor must then be a power of π, since π can't divide a power of a different prime (using the fact that all primes are irreducible). Furthermore, if π occurs at all in some factorization of α, it must occur to eactly the same power in each factorization. Otherwise if the smaller power has exponent n, then we could cancel πn from both factorizations. That would leave the integer α/πn with distinct prime power factorizations, one containing π and the other not, which we just ruled out.
The problem here is that we do not know that every integer α∈A actually has even one factorization into distinct primes. Consequently, if there can be irreducible integers of A that aren't prime, so the set of prime integers is a proper subset of the set of irreducible integers, we cannot be sure that there is the kind of unique factorization theorem for integers of A that we have for Z, regardless of whether we specify "primes" or "irreducibles". Factorizations into irreducibles can't be guaranteed to be unique, while factorizations into only powers of primes might not even exist.
However, if the set of primes is the same as the set of irreducibles, then factorizations of any integer of A into irreducibles, and hence primes, are guaranteed to exist, And furthermore, the factorizations must also be unique.
What about converses? Suppose we can guarantee that factorizations of any α∈A into primes must exist. Does that imply prime = irreducible? Yes, for the following simple reason. We have already shown that if a factorization into primes exists, it must be unique. So suppose α is irreducible. If a factorization into primes must exist, then α=πβ for some prime π. But because α is irreducible, β has to be a unit. Any prime times a unit is still a prime, so α itself is a prime, and the set of irreducible elements contains only primes.
Or suppose we can guarantee that factorizations of any α∈A into irreducibles are unique. Does that imply prime = irreducible? Again the answer is yes. For suppose α is irreducible and that for some β and γ we have α|βγ. Since α|βγ there is a δ such that αδ = βγ. Write the right and left hand sides as a product of powers of distinct irreducible numbers, so that (ignoring possible factors which are units) αδ1⋅⋅⋅δm = β1⋅⋅⋅βn (except that if α is among the δi, combine the terms). Then by the assumed uniqueness αk = βj for some j and some power k≥1, and both sides are powers of the same irreducible number α. That number must have been a divisor of either β or γ (or both). In any case, this means α is prime.
What we have now proven is this: If A=OF is the ring of integers of a finite extension F/Q, the following conditions are equivalent:
It turns out that there are certain types of rings in which all irreducible elements are prime, so the two concepts are equivalent in such rings. Z is one example of such a ring, but it is not the only one. Certainly, if we could guarantee that the integers of some extension of Q always had some factorization into primes (in the special sense used here), then as we showed, the factorizations must be unique. In order to investigate this issue further, we need help from the theory of ideals of rings of integers. By placing certain conditions on the types of ideals that the ring can have, we will be able to guarantee that any irreducible integer is prime, so that factorizations of any integers into irreducibles (which always exist) are also factorizations into primes, and therefore that they are unique.
One important type of ring that has this property is called a principal ideal domain, which means that every ideal consists of elements that are multiples of a single element (that isn't a unit) by some element of the ring. This is in fact the case with Z, where all ideals are of the form nZ for some n. (The ideal is the full ring Z itself if n=±1.) But there are other rings of integers that are also principal ideal domaings, and a large part of algebraic number theory is about identifying which rings have this property. We'll look into this in much more detail, but first we need to explain further why we care about unique factorization.
The first noteworthy case of this seems to have been Leonhard Euler (1707-83). Number theory was one of a vast number of Euler's interests, and one of the problems he dealt with was what is now known as Fermat's Last Theorem: The equation xn+yn=zn has no nontrivial integer solutions for integers n>2. Pierre de Fermat (1601-65) himself publicly stated this result only in case n=3 or 4. Fermat stated the case for general n only in a private note in his copy of Diophantus' Arithmetica. No written proof of this by Fermat in even the simplest cases is known, but it can easily be proven if n=4 by a technique Fermat invented, called the "method of infinite descent". This method involves assuming a solution exists for some triple (x,y,z) and then showing that there must always be another non-trivial solution (x′,y′,z′) in which at least one of the numbers is strictly smaller than any number of the original solution. Since that's impossible, the contradiction shows there couldn't have been any solution to begin with.
Anyhow, Euler knew the proof when n=4, and set out to give a proof along the same lines when n=3. He had the brilliant and original idea to work with numbers of the form a+b√-3 for a,b∈Z, that is, numbers in Z[√-3]. This was a great idea because it provided an entirely new and powerful set of tools for dealing with questions about ordinary integers. Unfortunately, as prescient as Euler was, there were too many subtleties in this area to use the tools correctly from the start. One step of the argument involved reasoning that if for some c∈Z, c3 = a2+3b2 = (a+b√-3)(a-b√-3), and if the factors a+b√-3 and a-b√-3 are relatively prime, then each of the factors must themselves be cubes of numbers in Z[√-3]. One problem is that Z[√-3] isn't the full ring of integers of Q(√-3), since -3≡1 (mod 4). But even disregarding that, the conclusion depends on unique factorization of the numbers involved. Although it happens to be true that unique factorization holds in the integers of Q(√-3), Euler didn't seem to recognize the need to prove that.
This same lack of clarity about unique factorization in rings of algebraic integers seems to have persisted into the 1840s. In 1847 Gabriel Lamé (1795-1870) thought he had a proof of Fermat's theorem for arbitrary n. Lamé worked with numbers of the form Z[ζn] where ζn is a root of xn-1=0 -- which is called an nth root of unity. (Here we assume n is the smallest integer for which the chosen ζn satisfies the equation.) Z[ζn] is called the ring of cyclotomic integers (for a particular choice of n), and it is in fact the ring of integers of the field Q(ζn), the nth cyclotomic field -- of which we shall have much to say later on. By 1847 some astute mathematicians did recognize the need for proof of unique factorization, and they pointed it out to Lamé. He must have quickly appreciated the problem, since he didn't persist in developing his "proof".
The purported proof went something like this: Suppose there were a solution of xn+yn=zn for some n>2 and integers x, y, and z that are relatively prime. The equation could be rewritten as
xn = zn - yn = ∏1≤k≤n (z - ζnky)Since x∈Z but most of the factors on the right hand side aren't, there would be a clear violation of unique factorization. Unfortunately, such a violation can't be ruled out, so the proof doesn't work. (It does work for those n where Q(ζn) has unique factorization. It wasn't known until 1976 that there are only 29 distinct cyclotomic fields that do have unique factorization.)
Interestingly enough, at almost exactly the same time, Eduard Kummer (1810-93), working independently on questions involving cyclotomic fields, had not only understood the problem of (lack of) unique factorization, but had even started to develop a way around the problem -- what he called the method of "ideal numbers" or "divisors". Kummer had also found examples where unique factorization failed in Z[ζ37]. He wrote a letter to the mathematicians in Paris who were debating Lamé's work, and pretty much put an end to their deliberations. Although Kummer's work was not solely concerned with Fermat's Last Theorem, he made what were some of the most significant partial solutions to the problem, and in the process played a huge role in advancing algebraic number theory. His work also led to the theory of ideals as discussed here. Much of what Kummer tried to do was to find "ideal" numbers, of some sort, for which unique factorization could be proven, so that as above a contradiction would arise if Fermat's equation had a solution.
Because of the importance of unique factorization, an integral domain which has unique factorization is called a unique factorization domain, or UFD. A little more precisely, an integral domain R is a UFD if there is a set P of irreducible elements such that every nonzero element α of R can be written in a unique way as a finite product α=u∏1≤k≤n pkek, where u is a unit, n is a nonnegative integer, pk are distinct elements of P, and exponents ek are nonnegative integers (all of which depend on α).
The defining property of a UFD is pretty clear and understandable, but it is not expressed in the language of ideals. As we shall see, a great deal of what we want to know about algebraic numbers can be expressed in terms of ideals, so we'd like to know how unique factorization fits in. To this end, we have this centrally important concept: An integral domain R is called a principal ideal domain (PID for short) in case every ideal I⊆R is equal to an ideal of the form (α)=αR for some α∈R.
It is a fact, which is not difficult to prove, but not immediately obvious either, that every principal ideal domain is a unique factorization domain. So if we want to show that a given integral domain R has unique factorization, it is sufficient to show that R is a PID. Unfortunately, among all integral domains there are some which are UFDs but not PIDs. So R can be a UFD even if it is not a PID – being a PID is not a necessary condition. The class of UFDs contains the class of PIDs, but is strictly larger. For instance, if F is a field, the ring of polynomials in one variable F[x] is a PID (and a UFD). (The reason this is true will be mentioned in a moment.) However, the ring of polynomials in two variables F[x,y] is a UFD but not a PID.
Obviously, it would be convenient to have a simple criterion to determine whether a domain R is a PID (and hence a UFD). It turns out that the the process of division-with-remainder that can be performed in Z and in polynomial rings in one variable fills the bill. All that's needed is an integer valued function on the domain R with certain properties. For a∈R, let this function be written as |a| (since it is much like an absolute value). This function should have three properties:
Unfortunately, even among rings of integers of quadratic fields, only finitely many are Eucldean. If F=Q(√d) and OF is the ring of integers, it is known that for d<0 the ring is Euclidean only when d=-1, -2, -3, -7, or -11. If d>0, the number of rings which are Euclidean is larger, but still finite.
This may seem rather disappointing, but in fact the quadratic fields where OF is a PID are also quite scarce. If d<0, then in addition to the Euclidean cases, the only other values are d=-19, -43, -67, and -163. The proof that this is a complete list for d<0 is quite difficult and was not satisfactorily done until 1966.
The situation with d>0 is even more difficult. It is not actually known whether there are only finitely many d>0 such that OQ(√d) is a PID. Gauss himself conjectured that there are infinitely many, but this is still an important open question.
Returning to concepts, we recall that among all integral domains, the class of PIDs is strictly smaller than the class of UFDs. It turns out that there is a subclass of all integral domains in which the notions of UFD and PID are equivalent. In fact, this is an important class, because it includes all rings of algebraic integers. This class itself can be defined by a number of equivalent conditions. But to explain this, we need to discuss the group of fractional ideals of an integral domain.
In a situation like this, it's natural (for a mathematician anyhow) to wonder what conditions on R would make the set of its ideals into a full group -- that is, how the inverse of an ideal might be defined. It turns out that for integral domains the conditions are beautiful and everything one could hope for.
To get an idea of where to start, consider the principal ideal domain Z. For any nonzero n∈Z, the obvious thing to consider is (1/n) = {m/n | m∈Z}. That's sort of like an ideal, since it's a commutative group under addition, and m(1/n)⊆(1/n) for all m∈Z. Also, under the obvious definition, (n)(1/n) = Z (since the product contains 1). So (1/n) surely acts like the "inverse" of the ideal (n) of Z for any n.
Suppose R is any commutative ring with an identity and M is any set at all (not necessarily related directly to R) where one can define an operation of multiplication rm=mr for r∈R and m∈M. Suppose further that:
Given all that, if R is an integral domain, whatever a fractional ideal of R might be, it certainly should be an R-module. Indeed, we can formally define a fractional ideal of R as an R-module M such that:
Multiplication of fractional ideals is defined just like multiplication of ideals: M⋅M′= {∑1≤k≤n mkm′k | mk∈M, m′k∈M′, n∈Z, n>0}. With this definition, the set of fractional ideals of R is a monoid. The question is: what conditions on R will guarantee that fractional ideals form a group? This is not just a matter of idle curiosity, because it turns out that for rings with the right properties, one has unique factorization in the group of fractional ideals, and in the set of integral (i. e. ordinary) ideals as well. For rings of algebraic integers, which just happen to have the right properties, this unique factorization of ideals is almost as good, for many purposes, as having unique factorization of ring elements themselves.
We need just a few more concepts before we can state the necessary conditions. A proper ideal of R is an ideal I that is not equal to R, i. e. a proper subset. One writes I⊂R. A prime ideal P is a proper ideal such that for all a,b∈R, ab∈P only if either a∈P or b∈P (or both). In Z, for example, (6) isn't a prime ideal, since 2⋅3∈(6), but 2∉(6) and 3∉(6). A maximal ideal is a proper ideal P that is not properly contained in some other proper ideal P′. The integral domain R, with field of fractions F, is said to be integrally closed if every α∈F that is integral (i. e. an algebraic integer of F) over R is actually an element of R. For example, if R=Z, then F=Q, and to say α∈R is integral over F means f(α)=0 for some monic polynomial f(x) with coefficients in R, i. e. f(x)∈Z[x]. If α=a/b for a,b∈Z, then when you "clear fractions" in f(a/b)=0, you find b|a. This argument applies in any UFD, so in fact any UFD is integrally closed.
Lastly, we need a type of finiteness condition. An ideal I is said to be finitely generated if it has the form I = {∑x∈S axx | ax∈R for all x∈S}, where S⊆R is a finte set of generators. It is much like a principal ideal, except for having n=#(S) generators instead of 1. If all ideals of a ring are finitely generated, the ring is said to be Noetherian, after Emmy Noether (1882-1935). There are several equivalent characterizations of Noetherian rings. For instance, R is Noetherian if and only if every nonempty family of ideals of R has a maximal element (which contains all the other members of the family) with respect to inclusion.
Finally we can state the crucial result: If R is an integral domain, then the following are equivalent:
Any integral domain R that has one of these properties has all of them, and is called a Dedekind domain, after Richard Dedekind (1831-1916). It isn't hard to show that if F⊇Q is a finite field extension, then the ring OF of algebraic integers of F has the properties listed in the first item, and so OF is a Dedekind domain and has the other properties also.
As rings, Dedekind domains have some very nice properties. For example, if R is a Dedekind domain:
Dedekind, Emmy Noether, and a few others were the main developers of ideal theory in this form, and Dedekind was a leading figure in the theory of algebraic numbers in general. Ernst Kummer (1810-1893) somewhat earlier had a more primitive theory of "ideal numbers" which provided a kind of unique factorization of algebraic numbers, but Dedekind made the theory much simpler and more general.
We will shortly see some of the fruits of that effort.
If P is a prime ideal of OF, we have no good reason to expect that P⋅OE is prime in OE, and in fact it generally is not. For example, let's look at quadratic extensions of Q. So suppose F=Q, d is a square-free integer, positive or negative, and d ≠ 0 or 1. Let p∈Z be a positive prime. We can ask whether (the ideal corresponding to) p is still a prime ideal of the ring of integers of E=Q(√d). That is, we can ask what happens to the prime, principal ideal (p) in OQ(√d). (With principal ideals, we can usually get away without specifying what ring they are contained in.)
Consider the Diophantine equation ±p=a2-b2d. If we can solve it by finding a,b∈Z that make the equation true, then we can verify that we have a factorization of the ideal (p) in OQ(√d) expressed by the equation of principal ideals (p)=(a+b√d)(a-b√d). (Proof: p∈(a+b√d)(a-b√d) since ±p=a2-b2d, so (p)⊆(a+b√d)(a-b√d). But by the definition of a product of ideals, all members of (a+b√d)(a-b√d) are a product of an element of OQ(√d) and the number [a+b√d][a-b√d]=±p, and so (a+b√d)(a-b√d)⊆(p).)
So solutions of a certain Diophantine equation tell us about how an ideal (p) of Z factors in the integers of a quadratic extension. And in fact, if the equation can be solved, then the prime ideal (p) of Z is not also a prime ideal of OQ(√d). Is there a converse, that is, can we infer solutions of a Diophantine equation from how ideals factor in extensions? Unfortunately, the situation is more complicated. For instance, if OQ(√d) is not a PID, then it is possible for (p) to not be a prime ideal of OQ(√d), yet there may be no solutions of ±p=a2-b2d. One of the reasons that algebraic number theory is useful and interesting is that it helps get a handle on the complications, as we shall see.
For simplicity, suppose d≡3 (mod 4). Then we have remarked that the integers of Q(√d) are given by OQ(√d) = {a+b√d | a,b∈Z}. Suppose we could find some integer α∈Q(√d) such that the norm NQ(√d)/Qα = ±p. (For short, write Nα for the norm.) So if α=a+b√d with a,b∈Z, we have ±p=Nα=a2-b2d. When this happens, we can write ±p=(a+b√d)(a-b√d). Since Nα is a prime (in Z), α=