All three of the matrix factorizations discussed in this section make use of *triangular* matrices, where all the elements either above or below the diagonal are zero. Systems of linear equations involving triangular matrices are easily and quickly solved using either *forward* or *back substitution*.

The Cholesky factorization expresses a symmetric matrix as the product of a triangular matrix and its transpose

*A* = *R*′*R*,

where *R* is an upper triangular matrix.

Not all symmetric matrices can be factored in this way; the matrices that have such a factorization are said to be positive definite. This implies that all the diagonal elements of *A* are positive and that the off-diagonal elements are “not too big.” The Pascal matrices provide an interesting example. Throughout this chapter, the example matrix `A`

has been the 3-by-3 Pascal matrix. Temporarily switch to the 6-by-6:

A = pascal(6) A = 1 1 1 1 1 1 1 2 3 4 5 6 1 3 6 10 15 21 1 4 10 20 35 56 1 5 15 35 70 126 1 6 21 56 126 252

The elements of `A`

are binomial coefficients. Each element is the sum of its north and west neighbors. The Cholesky factorization is

R = chol(A) R = 1 1 1 1 1 1 0 1 2 3 4 5 0 0 1 3 6 10 0 0 0 1 4 10 0 0 0 0 1 5 0 0 0 0 0 1

The elements are again binomial coefficients. The fact that `R'*R`

is equal to `A`

demonstrates an identity involving sums of products of binomial coefficients.

The Cholesky factorization also applies to complex matrices. Any complex matrix that has a Cholesky factorization satisfies

*A′* = *A*

The Cholesky factorization allows the linear system

*Ax* = *b*

to be replaced by

*R*′*R**x* = *b*.

Because the backslash operator recognizes triangular systems, this can be solved in the MATLAB^{®} environment quickly with

x = R\(R'\b)

If `A`

is *n*-by-*n*, the computational complexity of `chol(A)`

is O(*n*^{3}), but the complexity of the subsequent backslash solutions is only O(*n*^{2}).

LU factorization, or Gaussian elimination, expresses any square matrix *A* as the product of a permutation of a lower triangular matrix and an upper triangular matrix

*A* = *LU*,

where *L* is a permutation of a lower triangular matrix with ones on its diagonal and *U* is an upper triangular matrix.

The permutations are necessary for both theoretical and computational reasons. The matrix

$$\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right]$$

cannot be expressed as the product of triangular matrices without interchanging its two rows. Although the matrix

$$\left[\begin{array}{cc}\epsilon & 1\\ 1& 0\end{array}\right]$$

can be expressed as the product of triangular matrices, when *ε* is small, the elements in the factors are large and magnify errors, so even though the permutations are not strictly necessary, they are desirable. Partial pivoting ensures that the elements of *L* are bounded by one in magnitude and that the elements of *U* are not much larger than those of *A*.

For example:

[L,U] = lu(B) L = 1.0000 0 0 0.3750 0.5441 1.0000 0.5000 1.0000 0 U = 8.0000 1.0000 6.0000 0 8.5000 -1.0000 0 0 5.2941

The LU factorization of `A`

allows the linear system

A*x = b

to be solved quickly with

x = U\(L\b)

Determinants and inverses are computed from the LU factorization using

det(A) = det(L)*det(U)

and

inv(A) = inv(U)*inv(L)

You can also compute the determinants using `det(A) = prod(diag(U))`

, though the signs of the determinants might be reversed.

An *orthogonal* matrix, or a matrix with orthonormal columns, is a real matrix whose columns all have unit length and are perpendicular to each other. If *Q* is orthogonal, then

*Q*′*Q* = 1.

The simplest orthogonal matrices are two-dimensional coordinate rotations:

$$\left[\begin{array}{cc}\mathrm{cos}(\theta )& \mathrm{sin}(\theta )\\ -\mathrm{sin}(\theta )& \mathrm{cos}(\theta )\end{array}\right].$$

For complex matrices, the corresponding term is *unitary*. Orthogonal and unitary matrices are desirable for numerical computation because they preserve length, preserve angles, and do not magnify errors.

The orthogonal, or QR, factorization expresses any rectangular matrix as the product of an orthogonal or unitary matrix and an upper triangular matrix. A column permutation might also be involved:

*A* = *QR*

or

*AP* = *QR*,

where *Q* is orthogonal or unitary, *R* is upper triangular, and *P* is a permutation.

There are four variants of the QR factorization—full or economy size, and with or without column permutation.

Overdetermined linear systems involve a rectangular matrix with more rows than columns, that is *m*-by-*n* with *m* > *n*. The full-size QR factorization produces a square, *m*-by-*m* orthogonal *Q* and a rectangular *m*-by-*n* upper triangular *R*:

C=gallery('uniformdata',[5 4], 0); [Q,R] = qr(C) Q = 0.6191 0.1406 -0.1899 -0.5058 0.5522 0.1506 0.4084 0.5034 0.5974 0.4475 0.3954 -0.5564 0.6869 -0.1478 -0.2008 0.3167 0.6676 0.1351 -0.1729 -0.6370 0.5808 -0.2410 -0.4695 0.5792 -0.2207 R = 1.5346 1.0663 1.2010 1.4036 0 0.7245 0.3474 -0.0126 0 0 0.9320 0.6596 0 0 0 0.6648 0 0 0 0

In many cases, the last *m – n* columns of *Q* are not needed because they are multiplied by the zeros in the bottom portion of *R*. So the economy-size QR factorization produces a rectangular, *m*-by-*n* *Q* with orthonormal columns and a square *n*-by-*n* upper triangular *R*. For the 5-by-4 example, this is not much of a saving, but for larger, highly rectangular matrices, the savings in both time and memory can be quite important:

[Q,R] = qr(C,0) Q = 0.6191 0.1406 -0.1899 -0.5058 0.1506 0.4084 0.5034 0.5974 0.3954 -0.5564 0.6869 -0.1478 0.3167 0.6676 0.1351 -0.1729 0.5808 -0.2410 -0.4695 0.5792 R = 1.5346 1.0663 1.2010 1.4036 0 0.7245 0.3474 -0.0126 0 0 0.9320 0.6596 0 0 0 0.6648

In contrast to the LU factorization, the QR factorization does not require any pivoting or permutations. But an optional column permutation, triggered by the presence of a third output argument, is useful for detecting singularity or rank deficiency. At each step of the factorization, the column of the remaining unfactored matrix with largest norm is used as the basis for that step. This ensures that the diagonal elements of *R* occur in decreasing order and that any linear dependence among the columns is almost certainly be revealed by examining these elements. For the small example given here, the second column of `C`

has a larger norm than the first, so the two columns are exchanged:

[Q,R,P] = qr(C) Q = -0.3522 0.8398 -0.4131 -0.7044 -0.5285 -0.4739 -0.6163 0.1241 0.7777 R = -11.3578 -8.2762 0 7.2460 0 0 P = 0 1 1 0

When the economy-size and column permutations are combined, the third output argument is a permutation vector, rather than a permutation matrix:

[Q,R,p] = qr(C,0) Q = -0.3522 0.8398 -0.7044 -0.5285 -0.6163 0.1241 R = -11.3578 -8.2762 0 7.2460 p = 2 1

The QR factorization transforms an overdetermined linear system into an equivalent triangular system. The expression

norm(A*x - b)

equals

norm(Q*R*x - b)

Multiplication by orthogonal matrices preserves the Euclidean norm, so this expression is also equal to

norm(R*x - y)

where `y = Q'*b`

. Since the last *m*-*n* rows of *R* are zero, this expression breaks into two pieces:

norm(R(1:n,1:n)*x - y(1:n))

and

norm(y(n+1:m))

When `A`

has full rank, it is possible to solve for `x`

so that the first of these expressions is zero. Then the second expression gives the norm of the residual. When `A`

does not have full rank, the triangular structure of `R`

makes it possible to find a basic solution to the least-squares problem.

MATLAB software supports multithreaded computation for a number of linear algebra and element-wise numerical functions. These functions automatically execute on multiple threads. For a function or expression to execute faster on multiple CPUs, a number of conditions must be true:

The function performs operations that easily partition into sections that execute concurrently. These sections must be able to execute with little communication between processes. They should require few sequential operations.

The data size is large enough so that any advantages of concurrent execution outweigh the time required to partition the data and manage separate execution threads. For example, most functions speed up only when the array contains several thousand elements or more.

The operation is not memory-bound; processing time is not dominated by memory access time. As a general rule, complicated functions speed up more than simple functions.

`lu`

and `qr`

show significant increase in speed on large double-precision arrays (on order of 10,000 elements).