Polynomials in one variable: the combinatorial representation of equations

Polynomials with one variable are mathematical representations of the expressions used in polynomial equations. They allow algebraic methods to be applied to solving these equations.

1. Equations are “linguistic” objects

1.1 Polynomial equations and number systems

A polynomial equation is an equation of the form \(a_nx^n+a_{n-1}x^{n-1}+\ldots+a_1x+a_0=0\), where \(a_0,\ldots,a_n\) are classically numbers, with \(a_n\) non-zero. Informally, a polynomial is an expression \(a_nx^n+a_{n-1}x^{n-1}+\ldots+a_1x+a_0\), i.e. the left-hand side of such an equation. But we need to give them a mathematical definition, so that we can manipulate them as mathematical objects and apply elementary algebraic properties to them.

Example 1
For example, the equation \(x^2-x-1=0\), whose coefficients are \(-1,-1,1\), has as solutions the golden ratio \(\dfrac{1+\sqrt 5}{2}\) and the number \(\dfrac{1-\sqrt 5}{2}\). The equation \(x^2+1=0\) has no solution in the set \(\mathbb R\) of real numbers (see What is a real number?), but has two solutions in the set \(\mathbb C\) of complex numbers, the famous imaginary numbers \(i\) and \(-i\) (see What is a complex number?).

Classically, the ‘coefficients’ of such a polynomial equation are taken from a ‘system of numbers’: integers, rational numbers, real numbers, complex numbers. In fact, to write such an equation all you need is a set on which are defined an addition \( +\) and a multiplication \(\times\) with properties similar to those found in these number systems. Such a set is called a unitary commutative ring (see Ring theory).

1.2. Problem: study solutions in general

The degree of a polynomial equation is the natural number \(n\), i.e. the largest index of a non-zero coefficient in the equation, or the highest power of the variable \(x\) appearing in the equation. We know methods for solving equations of degree \(2\) (using the discriminant \(\Delta=b^2-4ac\), as we learn in high school), \(3\) and \(4\) and certain equations of degree \(\geq 5\), in certain number systems. For example, since Lagrange we know that an equation of the type \(x^3+px+q=0\), with \(p, q\in\mathbb C\), has three solutions, which can be obtained by extracting cubic roots and using the complex number \(j=e^{\frac{2i\pi}{3}}=-\dfrac 1 2 +i\dfrac{\sqrt 3}{2}\).

However, there exists no generic method for solving all equations in all rings and all degrees simultaneously. For a theoretical study of polynomial equations in general, we consider polynomials instead, i.e. the expressions \(a_nx^n+a_{n-1}x^{n-1}+\ldots+a_1x+a_0\) themselves, and all the values they take on each element of the system: the “roots” of a polynomial are those for which the expression equals $0$, in other words the solutions of the associated equation.

Example 2
The numbers \(i\) and \(-i\) are the roots of the polynomial \(x^2+1\) in the set \(\mathbb C\) (since \(i^2+1=(-i)^2+1=-1+1=0\)) whereas this polynomial has no roots in the set \(\mathbb R\) (since \(a^2\geq 0 >-1\) for any real number \(a\)).

2. Polynomials are mathematical objects

2.1. Representing expressions by sequences

In order to study polynomials systematically using mathematical methods, they must be mathematical objects. We have defined them as expressions, i.e. linguistic objects. We therefore need to transform them, or rather represent them as mathematical objects, which is possible thanks to the resources of elementary combinatorics.

Since a polynomial is, informally, an expression of the form \(a_nx^n+a_{n-1}x^{n-1}+\ldots+a_1x+a_0,\) we will simply represent such an expression, known as a “polynomial”, by the sequence \(a_0,a_1,\ldots,a_{n-1},a_n\) of its coefficients, in ascending order of indices. For convenience, it is represented as an infinite sequence of elements of $A$, by adding \(0\) after the last non-zero coefficient \(a_n\). A polynomial with coefficients in a ring \(A\) is therefore, strictly speaking, a sequence \(a_0,a_1,\ldots,a_n,\ldots\) of elements of \(A\) the terms of which are zero from a certain “rank”.

For example, the polynomial expression \(x+\pi x^3-e x^6\), whose coefficients are real numbers, is represented by the polynomial \(0,1,0,\pi,0,0,e,0,\ldots\), with coefficients in the ring \(\mathbb R\) of real numbers (the coefficients of the powers of orders \(0,2,4,5\) are zero).

2.2. Addition of polynomials

In a polynomial expression \(f(x)=a_nx^n+a_{n-1}x^{n-1}+ldots + a_1x+a_0\), the “indeterminate” \(x\) is a generic element which can be replaced by any value \( c\): we then obtain the value \(f(c)\) of the expression \(f\) in \(c\), which is the result of the calculation \(a_nc^n+a_{n-1}c^{n-1}+\ldots+a_1c+a_0\). In other words, the variable behaves, with respect to the operations \(+\) and \(\times\), like any element of the ring \(A\) where the coefficients of the expression are chosen.

Example 3
If $f(x)$ is the polynomial expression $2x^5+x^3-9x^2+4$, the value $f(1/3)$ of $f(x)$ in $x=1/3$ is $2\times (1/3)^5+(1/3)^3-9(1/3)^2+4=(2/243)+(1/27)-1+4=740/243$.

In particular, if \(g(x)=b_mx^m+b_{m-1}x^{m-1}+\ldots+b_1x+b_0\) is another polynomial expression, again replacing \(x\) with \(c\) gives the value \(g(c)=b_mc^m+b_{m-1}c^{m-1}+\ldots+b_1c+b_0\). For example, suppose that \( \(ngeq m\)): if we add \(f(c)\) and \(g(c)\), we get \((a_n+b_n)c^n+(a_{n-1}+b_{n-1})c^{n-1}+\ldots +(a_1+b_1)c+(a_0+b_0)\). In other words, to calculate the sum of the values \(f(c)\) and \(g(c)\), all we have to do is add the coefficients of the same index of the two polynomial expressions, to obtain a new expression \[h(x)=(a_n+b_n)x^n+(a_{n-1}+b_{n-1})x^{n-1}+\ldots +(a_1+b_1)x+(a_0+b_0)\] and calculate the value of \(h\) at \(c\)!

If we translate this into polynomial terms, it means that we can represent the addition of two polynomial expressions by adding up the polynomials, simply by adding up the coefficients that have the same place in the sequence. For example, the sum of the polynomial \(P=5,-2,0,4\) (which represents the expression \(4x^3-2x+5\)) and the polynomial \(Q=0,9,7\) (which represents the expression \(7x^2+9x\)), is the polynomial \(P+Q=5,7,7,4\), which represents the expression \(4x^3+7x^2+7x+5\).

2.3. The convolution product

In the same way, when we multiply the two previous polynomial expressions \(f(x)\) and \(g(x)\) by considering the variable \(x)\) as a generic value, we obtain a new polynomial expression \(u(x)\), and by replacing \(x\) by a given value \(c\), the result obtained, i.e. \(u(c)\), is the value of the product of \(f(c)\) and \(g(c)\). However, in this case it is not so easy, as with addition, to multiply coefficients of the same index together to obtain \(u(x)\): the multiplication is more complex, because the products contain “carryovers”, i.e. powers of the indeterminate which “raise” the rank of the coefficients.

Example 4
If \(f(x)=x^2+2x-3\) and \(g(x)=-5x+7\), their multiplication gives, by expanding, \((x^2+2x-3)(-5x+7)=-5x^3+7x^2-10x^2+14x+15x-21=-5x^3-3x^2+29x-21\).

It is possible to give a rigorous description of how to “combine” the coefficients of the two starting expressions to obtain those of their product. In fact, such a description amounts to describing the multiplication directly on the polynomials, and this is what we call a “convolution” product, to distinguish it from a “coefficient by coefficient” product.

Thus, if two polynomials are given as the sequences \(P=(a_0,a_1,\ldots,a_{n-1},a_n,0,\ldots)\) and \(Q=(b_0,b_1,\ldots,b_{m-1},b_m,0\ldots)\), we define their product \(P\times Q\) as the polynomial \((c_0,c_1,\ldots,c_{p-1},c_p,0,\ldots)\) whose coefficient \(c_k\) of index \(k\) is given by the following formula: \(c_k=\sum_{i+j=k} a_ib_j\). This formula means that to obtain \(c_k\), we add all the products of the form \(a_i\times b_j\) of a coefficient \(a_i\) of \(P\) and a coefficient \(b_j\) of \(Q\) for which the sum of the indices \(i+j\) is \(k\). This is a combinatorial translation of the intuitive multiplication of generic expressions, and the operations \(+\) and \(\times\) thus defined turn the set of these polynomials (with coefficients in a ring \(A\)) into a new ring (commutative, unitary) denoted \(A[X]\) (read “\(A\) bracket \(X\)”).

3. The indeterminate \(X\) and the search for the roots

3.1 Representation of the variable

In \(A[X]\), any element of \(A\) can be represented as the polynomial \((a,0,\ldots)\): we say that we have ” embedded ” \(A\) into \(A[X]\). This corresponds to the fact that we can represent, for example, any real number \(a\), as a “constant” polynomial expression whose value is \(a\). In a polynomial expression, the “variable”, \(x\) for example, is often called an “unknown”, meaning that its value is not known or fixed. Polynomial theory also allows this unknown variable to be represented as a particular polynomial, called the indeterminate, and denoted by \(X\). We must therefore distinguish between the unknown – which is a linguistic object – and the indeterminate – which is a mathematical object.

To represent the unknown \(x\) as a polynomial, it is sufficient to consider it as a polynomial expression, i.e. \(1.x + 0,\) in other words to represent it by the polynomial \(X=(0,1,0,\ldots)\). Using the convolution product rules, we can see that the powers of \(X\) are represented by sequences where \( 1\) only appears once, at the position corresponding to the power in question! For example, \(X^2=(0,0,1,0,\ldots)\) (the \(1\) appears in position \(2\), since we start at the index \(0\)…), and \(X^3=(0,0,0,1,0,\ldots)\), and so on. By convention, we have \(X^0=1\), i.e. the polynomial \((1,0,\ldots)\).

Thanks to this particular representation of the unknown \(x\) of the equations by the indeterminate \(X\) of the polynomials, it is then possible to find the original formulation of the polynomial expressions. Indeed, with the definitions of addition and convolution product of polynomials, a given polynomial \(P=(a_0,a_1,\ldots,a_{n-1},a_n,0, \ldots)\) can be written in the form \(a_0+a_1X+\ldots+a_{n-1}X^{n-1}+a_nX^n\), since we can add and multiply polynomials, and therefore in particular the elements of \(A\) and the powers of \(X\). In a way, we are back where we started – with polynomial expressions – but this time we are manipulating a mathematical object, to which we can now apply mathematical theory itself!

3.2. Roots of a polynomial and solutions of an equation

Since we’ve replaced polynomial expressions with polynomials, solving a polynomial equation now involves finding the ‘values’ for which a given polynomial \(P=a_0+a_1X+\ldots+a_{n-1}X^{n-1}+a_n X^n\) equals zero. To do this, we need to define rigorously what the value or evaluation is, which we note \(P(b)\), of such a polynomial \(P\) at an element \(b\) of the ring \(A\), where the coefficients \(a_0,a_1,\ldots,a_{n-1},a_n\) are taken. This is done by decreeing that \(P(b)=a_0+a_1\times b+a_2\times b^2+\ldots +a_{n-2}\times b^{n-2}+a_{n-1}\times b^{n-1}+a_n\times b^n\). In other words, we ‘replace’ the indeterminate \(X\), as with polynomial expressions, with the chosen \(b\) value. But this is now a mathematical procedure, and to be rigorous, this definition must be made by induction on the degree of the polynomial.

Example 5
As in Example 3, let $P=(-3/2)X^4+5X^2-X-7$, i.e. $P=(-7,-1,5,0,-3/2,0,\ldots)$: we have $P(-2)=(-3/2)(-2)^4+5(-2)^2-(-2)-7=(-3/2).16+5.4+2-7=-9$.

It might seem that we haven’t gained anything from polynomial expressions. However, since polynomials are mathematical objects, by defining the evaluation of a polynomial \(P\) at an element \(b\) of \(A\) one defines an application (a function) \(e\) from \(A[X]\) into \(A\), which is of course not possible with “expressions”… The advantage of this function is that it is a ‘ring homomorphism’ (it preserves addition and multiplication, which means that if \(Q\) is another polynomial, we have \((P+Q)(b)=P(b)+Q(b)\) and \((P.Q)(b)=P(b).Q(b)\)). We now have the resources of the algebraic theory of rings to study the polynomials of \(A[X]\) and their roots.

Note that, as is often the case when we formalise things in mathematics, defining polynomials as genuine mathematical objects comes at the price of changing our point of view. Looking for the solutions to a polynomial equation \(a_nx^n+a_{n-1}x^{n-1}+ldots +a_1x+a_0=0\) involves looking at the different values of the unknown \(x\) for which this equation, which is ‘fixed’, is verified. In contrast, working with the evaluation of polynomials involves fixing a value \(b\) of the indeterminate \(X\) and studying the values of all the polynomials \(P\) in \(b\), to identify those for which \(P(b)=0\).

0 Comments

Submit a Comment