# Shortly on projectors


## Projectors and properties

Definitions: Consider C^n and a mapping P : C^n → C^n. P is called a projector if P^2 = P (i.e., P is idempotent). If P is a projector, then I − P is also a projector:

(I − P)^2 = I − 2P + P^2 = I − P.

N(P) = {x ∈ C^n : Px = 0} (null space (kernel) of P)
R(P) = {Px : x ∈ C^n} (range of P)

A subspace S is called invariant under a square matrix A whenever AS ⊆ S.

TDB − NLA


## Properties

P1: N(P) ∩ R(P) = {0}. Indeed, if x ∈ R(P), then x = Py for some y, so Px = P^2 y = Py = x, i.e., x = Px. If in addition x ∈ N(P), then Px = 0, hence x = Px = 0.

P2: N(P) = R(I − P). If x ∈ N(P), then Px = 0 and x = Ix − Px = (I − P)x, so x ∈ R(I − P). Conversely, if y ∈ R(I − P), then y = (I − P)y (I − P is itself a projector), hence Py = 0.

P3: C^n = R(P) ⊕ N(P).

P4: Given two subspaces K and L of the same dimension m, the following two conditions are mathematically equivalent:
(i) no nonzero vector in K is orthogonal to L;
(ii) ∀x ∈ C^n there exists a unique vector y such that y ∈ K and x − y ∈ L^⊥.

Proof (i) ⇒ (ii): K ∩ L^⊥ = {0} ⇒ C^n = K ⊕ L^⊥ ⇒ every x ∈ C^n decomposes as x = y + z, where y ∈ K and z ∈ L^⊥. Thus z = x − y, which gives (ii).


## Properties (cont.)

P5: Orthogonal and oblique projectors. P is orthogonal if N(P) = R(P)^⊥; otherwise P is oblique. Thus, if P is orthogonal onto K, then Px ∈ K and (I − P)x ⊥ K. Equivalently, ((I − P)x, y) = 0 for all y ∈ K.

(Figure: a vector x, its projection Px in the subspace K, and x − Px orthogonal to K.)

P6: If P is orthogonal, then ‖P‖ = 1.
Proof: Write x = Px + (I − P)x = y + z. Then (y, z) = 0:

(Px, (I − P)x) = (Px, x) − (Px, Px) = (Px, x) − (Px, x) = 0,

since (Px, Px) = (P^2 x, x) = (Px, x) (an orthogonal projector is Hermitian). Hence

‖x‖_2^2 = ‖Px‖_2^2 + ‖(I − P)x‖_2^2 ⇒ ‖x‖_2^2 ≥ ‖Px‖_2^2 ⇒ ‖Px‖_2^2 / ‖x‖_2^2 ≤ 1, ∀x ∈ C^n.

However, for x̃ ∈ R(P) there holds ‖P x̃‖_2^2 / ‖x̃‖_2^2 = 1. Thus, ‖P‖ = 1.

P7: Any orthogonal projector has only two eigenvalues, 0 and 1. Any vector from R(P) is an eigenvector for λ = 1; any vector from N(P) is an eigenvector for λ = 0.
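Properties P1–P7 are easy to check numerically. Below is a small NumPy sketch (ours, not part of the slides): we build an orthogonal projector P = Q Qᵀ onto a random subspace K and verify idempotency, ‖P‖ = 1 (P6) and the eigenvalues 0 and 1 (P7).

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthogonal projector onto a random 3-dimensional subspace K of R^8:
# P = Q Q^T, where the columns of Q form an orthonormal basis of K.
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
P = Q @ Q.T

# P is idempotent: P^2 = P.
assert np.allclose(P @ P, P)

# P6: the spectral norm of an orthogonal projector equals 1.
assert np.isclose(np.linalg.norm(P, 2), 1.0)

# P7: the eigenvalues are only 0 and 1 (three 1's, since dim K = 3).
eigvals = np.sort(np.linalg.eigvalsh(P))
assert np.allclose(eigvals[:5], 0.0) and np.allclose(eigvals[5:], 1.0)

# (I - P)x is orthogonal to every vector Q c in K.
x = rng.standard_normal(8)
assert abs((x - P @ x) @ (Q @ rng.standard_normal(3))) < 1e-12
```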


Theorem 1. Let P be orthogonal onto K. Then for any vector x ∈ C^n there holds

min_{y ∈ K} ‖x − y‖_2 = ‖x − Px‖_2.  (1)

Proof. For any y ∈ K we have Px − y ∈ K, Px ∈ K and (I − P)x ⊥ K, so

‖x − y‖_2^2 = ‖(x − Px) + (Px − y)‖_2^2 = ‖x − Px‖_2^2 + ‖Px − y‖_2^2 + 2(x − Px, Px − y) = ‖x − Px‖_2^2 + ‖Px − y‖_2^2.

Therefore ‖x − y‖_2^2 ≥ ‖x − Px‖_2^2 for all y ∈ K, and the minimum is reached for y = Px.

Corollary 1. Let K ⊂ C^n and x ∈ C^n be given. Then min_y ‖x − y‖_2 = ‖x − y∗‖_2 is equivalent to: y∗ ∈ K and x − y∗ ⊥ K.

## Iterative solution methods

➾ Steepest descent
➾ ORTHOMIN
➾ Minimal residual method (MINRES)
➾ Generalized minimal residual method (GMRES)
➾ Lanczos method
➾ Arnoldi method
➾ Orthogonal residual method (ORTHORES)


## Iterative solution methods (cont.)

➾ Full orthogonalization method (FOM)
➾ Incomplete orthogonalization method (IOM)
➾ SYMMLQ
➾ Biconjugate gradient method (BiCG)
➾ BiCGStab
➾ Conjugate gradients squared (CGS)
➾ Minimal residual method (MR)
➾ Quasi-minimal residual method (QMR)
➾ ···

## Projection-based iterative methods


## General framework – projection methods

We want to solve b − Ax = 0, b, x ∈ R^n, A ∈ R^{n×n}. Instead, choose two subspaces L ⊂ R^n and K ⊂ R^n and

∗ find δ ∈ K such that r0 − Aδ ⊥ L, i.e.,
∗ find x̃ = x^(0) + δ, δ ∈ K, such that b − Ax̃ ⊥ L.

Notations: x̃ = x0 + δ (δ – the correction), r0 = b − Ax0 (r0 – the residual).
K – the search space; L – the subspace of constraints; ∗ – the basic projection step.

The framework is known as the Petrov–Galerkin conditions. There are two major classes of projection methods:
orthogonal – if K ≡ L,
oblique – if K ≠ L.


## Matrix formulation

Choose a basis in K and in L: V = {v1, v2, ..., vm} and W = {w1, w2, ..., wm}. Then x̃ = x0 + δ = x0 + V y for some y ∈ R^m.

The orthogonality condition can be written as

(∗∗) W^T (r0 − A V y) = 0,

which is exactly the Petrov–Galerkin condition. From (∗∗) we get

W^T r0 = W^T A V y ⇒ y = (W^T A V)^{-1} W^T r0 ⇒ x̃ = x0 + V (W^T A V)^{-1} W^T r0.

In practice m < n, even m ≪ n; for instance, m = 1. The matrix W^T A V will be small and, hopefully, with a nice structure. Note that W^T A V must be invertible.
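The basic projection step x̃ = x0 + V (WᵀAV)⁻¹ Wᵀ r0 can be sketched in a few lines of NumPy (function and variable names are ours, not from the slides):

```python
import numpy as np

def projection_step(A, b, x0, V, W):
    """One basic projection step: find x = x0 + V y with W^T (r0 - A V y) = 0."""
    r0 = b - A @ x0
    y = np.linalg.solve(W.T @ A @ V, W.T @ r0)   # y = (W^T A V)^{-1} W^T r0
    return x0 + V @ y

# Sanity check: with K = L = R^n (V = W = I), one step solves the system exactly.
rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # s.p.d., so W^T A V is invertible
b = rng.standard_normal(n)
x = projection_step(A, b, np.zeros(n), np.eye(n), np.eye(n))
assert np.allclose(A @ x, b)
```

With smaller V and W (m ≪ n) the same function performs one step of a projection method.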


## A prototype projection-based iterative method

Given x^(0); x = x^(0)
Until convergence do:
    Choose K and L
    Choose basis V in K and W in L
    Compute r = b − Ax
    y = (W^T A V)^{-1} W^T r
    x = x + V y

Degrees of freedom: m, K, L, V, W. Clearly, if K ≡ L, then V = W.

Plan: (1) consider two important cases, L = K and L = AK; (2) make a special choice of K.


## Property 1

Theorem 2. Let A be square and L = AK. Then a vector x̃ is an oblique projection onto K orthogonally to AK with a starting vector x0 if and only if x̃ minimizes the 2-norm of the residual over x0 + K, i.e.,

‖b − Ax̃‖_2 = min_{x ∈ x0 + K} ‖b − Ax‖_2.  (2)

Thus, the residual decreases monotonically. Methods of this class are referred to as minimal residual methods: CR, GCG, GMRES, ORTHOMIN.

(Figure: the vector b, the subspaces K and AK, a generic Ax with residual b − Ax, and Ax̃ – the orthogonal projection of b onto AK, so that b − Ax̃ ⊥ AK.)


## Property 2

Theorem 3. Let A be symmetric positive definite, i.e., it defines a scalar product (A·, ·) and a norm ‖·‖_A. Let L = K, i.e., b − Ax̃ ⊥ K. Then a vector x̃ is an orthogonal projection onto K with a starting vector x^0 if and only if it minimizes the A-norm of the error e = x∗ − x over x0 + K, i.e.,

‖x∗ − x̃‖_A = min_{x ∈ x0 + K} ‖x∗ − x‖_A.  (3)

The error decreases monotonically in the A-norm. Methods of this class are called error-projection methods.

## Example: m = 1

Consider two vectors d and e. Let K = span{d} and L = span{e}. Then x̃ = x0 + αd (δ = αd) and the orthogonality condition reads:

r0 − Aδ ⊥ e ⇒ (r0 − Aδ, e) = 0 ⇒ α(Ad, e) = (r0, e) ⇒ α = (r0, e) / (Ad, e).

If d = e, we obtain the Steepest Descent method (minimization along a line). If we minimize over a plane – ORTHOMIN.
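For d = e = r the m = 1 example gives one step with α = (r, r)/(Ar, r); repeating it yields the Steepest Descent method. A minimal NumPy sketch (assuming A is s.p.d.; names are ours):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, maxit=10_000):
    """Repeated one-dimensional projection with K = L = span{r}:
    alpha = (r, r) / (A r, r), then x <- x + alpha * r."""
    x = x0.copy()
    for _ in range(maxit):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        x += (r @ r) / (A @ r @ r) * r    # alpha = (r, r) / (A r, r)
    return x

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)    # s.p.d., so (Ad, d) > 0 for d != 0
b = rng.standard_normal(6)
x = steepest_descent(A, b, np.zeros(6))
assert np.allclose(A @ x, b, atol=1e-8)
```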


## Choice of K: Krylov subspaces

K = K_m(A, v) = span{v, Av, A^2 v, ..., A^{m−1} v}

Krylov subspace methods:
L = K = K_m(A, r0) and A spd ⇒ CG
L = AK = AK_m(A, r0) ⇒ GMRES

A question to answer: why are Krylov subspaces of interest?

How to construct a basis for K?


## Arnoldi's method for general matrices

Consider K_m(A, v) = span{v, Av, A^2 v, ..., A^{m−1} v}, generated by some matrix A and a vector v.

1. Choose a vector v1 such that ‖v1‖ = 1
2. For j = 1, 2, ..., m
3.     For i = 1, 2, ..., j
4.         h_ij = (Avj, vi)
5.     End
6.     wj = Avj − Σ_{i=1}^{j} h_ij vi
7.     h_{j+1,j} = ‖wj‖
8.     If h_{j+1,j} = 0, stop
9.     v_{j+1} = wj / h_{j+1,j}
10. End

## The result of Arnoldi's process

V^m = {v1, v2, ..., vm} is an orthonormal basis in K_m(A, v), and

A V^m = V^m H^m + w^{m+1} (e_m)^T.

(Schematically: A (n×n) · V^m (n×m) = V^m (n×m) · H^m (m×m) + w^{m+1} (n×1) · (e_m)^T (1×m).)
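The Arnoldi process above can be sketched as follows (a modified Gram–Schmidt variant in NumPy; names are ours). It returns V^{m+1} and the extended Hessenberg matrix, so the relation A V^m = V^m H^m + w^{m+1} (e_m)^T can be checked directly:

```python
import numpy as np

def arnoldi(A, v, m):
    """Arnoldi process: V (n x (m+1)) has orthonormal columns spanning
    K_{m+1}(A, v); H ((m+1) x m) is upper Hessenberg with A V_m = V_{m+1} H."""
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):          # orthogonalize against v_1, ..., v_{j+1}
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0:            # happy breakdown: invariant subspace found
            return V[:, : j + 1], H[: j + 2, : j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 7))
V, H = arnoldi(A, rng.standard_normal(7), 4)
assert np.allclose(V.T @ V, np.eye(5))    # orthonormal basis
assert np.allclose(A @ V[:, :4], V @ H)   # A V^m = V^{m+1} Hbar^m
```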

## Arnoldi's process – example

H^3 =
[ (Av1, v1)   (Av2, v1)   (Av3, v1) ]
[ ‖w1‖        (Av2, v2)   (Av3, v2) ]
[ 0           ‖w2‖        (Av3, v3) ]

Since w^{m+1} ⊥ {v1, v2, ..., vm}, it follows that (V^m)^T A V^m = H^m. H^m is an upper-Hessenberg matrix.

## Arnoldi's method for symmetric matrices

Let now A be a real symmetric matrix. Then the Arnoldi method reduces to the Lanczos method. Recall: H^m = (V^m)^T A V^m. If A is symmetric, then H^m must be symmetric too, i.e., H^m is tridiagonal:

H^m =
[ γ1  β2               ]
[ β2  γ2  β3           ]
[         ...          ]
[         βm  γm       ]

Thus, the vectors vi satisfy a three-term recursion:

β_{i+1} v_{i+1} = A vi − γi vi − βi v_{i−1}.

## Lanczos algorithm to solve symmetric linear systems

Given:    x^(0)
Compute:  r^(0) = b − Ax^(0), β = ‖r^(0)‖, v1 = r^(0)/β
Set:      β1 = 0 and v0 = 0
For j = 1 : m
    wj = Avj − βj v_{j−1}
    γj = (wj, vj)
    wj = wj − γj vj
    β_{j+1} = ‖wj‖_2; if β_{j+1} = 0, go out of the loop
    v_{j+1} = wj / β_{j+1}
End
Set:      Tm = tridiag{βj, γj, β_{j+1}}
Compute:  ym = Tm^{-1} (β e1), x^m = x^0 + V^m ym

## Direct Lanczos: the factorization of Tm

The coefficients in the direct Lanczos algorithm correspond to the following factorization of Tm:

Tm =
[ γ1  β2          ]   [ 1            ]   [ η1  β2          ]
[ β2  γ2  β3      ] = [ λ2  1        ] · [     η2  β3      ]
[         ...     ]   [       ...    ]   [          ...    ]
[         βm  γm  ]   [      λm  1   ]   [              ηm ]

where η1 = γ1, and for i = 2, ..., m: λi = βi / η_{i−1}, ηi = γi − λi β_{i−1}.
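The Lanczos solver above, in a compact NumPy sketch (names are ours; no reorthogonalization is performed, so for large m the orthogonality of the v's may deteriorate in floating point):

```python
import numpy as np

def lanczos_solve(A, b, x0, m):
    """Lanczos: build V^m and tridiagonal T_m, then x^m = x^0 + V^m T_m^{-1} (beta e_1)."""
    r0 = b - A @ x0
    beta0 = np.linalg.norm(r0)
    n = len(b)
    V = np.zeros((n, m))
    gamma = np.zeros(m)          # diagonal of T_m
    beta = np.zeros(m + 1)       # off-diagonal entries (beta[1..m-1] used)
    V[:, 0] = r0 / beta0
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j] * V[:, j - 1]
        gamma[j] = w @ V[:, j]
        w -= gamma[j] * V[:, j]
        beta[j + 1] = np.linalg.norm(w)
        if j + 1 < m:
            if beta[j + 1] == 0:     # breakdown: K is already invariant
                m = j + 1
                break
            V[:, j + 1] = w / beta[j + 1]
    T = np.diag(gamma[:m]) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    y = np.linalg.solve(T, beta0 * np.eye(m)[:, 0])   # y = T_m^{-1} (beta e_1)
    return x0 + V[:, :m] @ y

rng = np.random.default_rng(4)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)          # symmetric positive definite
b = rng.standard_normal(8)
x = lanczos_solve(A, b, np.zeros(8), 8)   # m = n: exact in exact arithmetic
assert np.allclose(A @ x, b, atol=1e-6)
```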

## Direct Lanczos

Instead of factorizing at the end, Gaussian factorization without pivoting can be performed while constructing T. Recall x^m = x^0 + V^m L^{-T} L^{-1} (β e1), and let G = V^m L^{-T} and z = L^{-1} (β e1).

Compute:  r^(0) = b − Ax^(0), ξ1 = β = ‖r^(0)‖, v1 = (1/β) r^(0)
Set:      λ1 = 1, g0 = 0, β1 = 0 and v0 = 0
For j = 1, 2, ... until convergence
    w = Avj − βj v_{j−1}
    γj = (w, vj)
    if j > 1: λj = βj / η_{j−1}, ξj = −λj ξ_{j−1}
    ηj = γj − λj βj
    gj = (ηj)^{-1} (vj − βj g_{j−1})
    xj = x_{j−1} + ξj gj; stop if convergence is reached
    w = w − γj vj
    β_{j+1} = ‖w‖; v_{j+1} = w / β_{j+1}
End

## Projection-type methods continued

Recall: the Arnoldi/Lanczos process.
➾ The Conjugate Gradient method – derivation, properties and convergence
➾ The GMRES method – derivation, properties and convergence
➾ The Generalized Conjugate Gradient method – derivation, properties and convergence


Memory demands of the Arnoldi process: we keep all vectors v^(k) and Av^(k), k = 1, ..., m.


## The result of Arnoldi's process, cont.

V^m = {v^(1), v^(2), ..., v^(m)} is an orthonormal basis in K_m(A, v), and

A V^m = V^m H^m + w^{m+1} (e_m)^T = V^{m+1} H̄^m,

where H̄^m is the (m+1)×m matrix obtained by appending the row h_{m+1,m} (e_m)^T to H^m.

(Schematically: A (n×n) · V^m (n×m) = V^{m+1} (n×(m+1)) · H̄^m ((m+1)×m).)

## Arnoldi's process – example, cont.

The extended matrix H̄^(3) appends one more row to H^(3):

H̄^(3) =
[ (Av^(1), v^(1))   (Av^(2), v^(1))   (Av^(3), v^(1)) ]
[ ‖w^(1)‖           (Av^(2), v^(2))   (Av^(3), v^(2)) ]
[ 0                 ‖w^(2)‖           (Av^(3), v^(3)) ]
[ 0                 0                 ‖w^(3)‖         ]

## Arnoldi vs Lanczos

Arnoldi (general A):

1. Given w^(0), set β = ‖w^(0)‖, v^(1) = w^(0)/β (so that ‖v^(1)‖ = 1)
2. For k = 1, 2, ..., m
3.     For i = 1, 2, ..., k: h_ik = (Av^(k), v^(i)); End
4.     w^(k) = Av^(k) − Σ_{i=1}^{k} h_ik v^(i)
5.     h_{k+1,k} = ‖w^(k)‖
6.     If h_{k+1,k} = 0, stop
7.     v^(k+1) = w^(k) / h_{k+1,k}
8. End

Lanczos (A symmetric; only a three-term recurrence is needed):

For k = 1, 2, ..., m
    w^(k) = Av^(k) − βk v^(k−1)
    γk = (w^(k), v^(k))
    w^(k) = w^(k) − γk v^(k)
    β_{k+1} = ‖w^(k)‖; if β_{k+1} = 0, stop
    v^(k+1) = w^(k) / β_{k+1}
End

## Lanczos algorithm to solve symmetric linear systems

Given:    x^(0)
Compute:  r^(0) = b − Ax^(0), β = ‖r^(0)‖, v^(1) = r^(0)/β
Set:      β1 = 0 and v^(0) = 0
Run the Lanczos loop above (going out of the loop if β_{k+1} = 0), then
Set:      Tm = tridiag{βk, γk, β_{k+1}}
Compute:  ym = Tm^{-1} (β e^(1)), x^m = x^0 + V^m ym

## Lanczos algorithm to solve symmetric linear systems, cont.

This leads to a three-term CG. To solve, factor first Tm = L L^T and then

x^m = x^(0) + V^m L^{-T} L^{-1} (β e^(1)).

Why do the final steps ym = Tm^{-1} (β e^(1)), x^m = x^0 + V^m ym produce the projection iterate?

## The CG method

Recall: y = (W^T A V)^{-1} W^T r^(0). Here W = V and H_m ≡ T_m, so W^T A V = (V^m)^T A V^m = T_m, and W^T r^(0) = β V^T v^(1) = β e^(1), due to the orthogonality of the columns of V. This explains the final steps above.

## Observations regarding CG: (1)

Relation 1: The residuals are orthogonal to each other.

Proof: We have r^(m) = b − Ax^(m). Recall that y^(m) = Tm^{-1} (β e1) and x^(m) = x^(0) + V^m y^(m). Then

b − Ax^(m) = b − Ax^(0) − A V^m y^(m)
           = β v^(1) − (V^m T_m y^(m) + h_{m+1,m} (e_m^T y^(m)) v^(m+1))
           = β v^(1) − V^m (β e1) − h_{m+1,m} (e_m^T y^(m)) v^(m+1)
           = −h_{m+1,m} (e_m^T y^(m)) v^(m+1) = const · v^(m+1),

since A V^m = V^m T_m + h_{m+1,m} v^(m+1) (e_m)^T, T_m y^(m) = β e1 and V^m (β e1) = β v^(1).

Thus, r^(m) is collinear with v^(m+1). Since the vectors v^(j) are orthogonal to each other, the residuals are also mutually orthogonal, i.e., (r^(k), r^(m)) = 0 for k ≠ m.

## Observations regarding CG: (2)

Denote G = V^m L^{-T}, G = {g1, g2, ..., gm}.

Relation 2: The vectors gj are A-conjugate, i.e., (Agi, gj) = 0 for i ≠ j.

Proof: (V^m)^T A V^m = T_m = L L^T. Then

G^T A G = L^{-1} (V^m)^T A V^m L^{-T} = L^{-1} (L L^T) L^{-T}.

G^T A G is symmetric, while the right-hand side is a product of triangular factors; a matrix that is simultaneously symmetric and triangular must be diagonal. Hence G^T A G is diagonal, i.e., (Agi, gj) = 0 for i ≠ j.

## Derivation of the CG method

(i) x^(k+1) = x^(k) + τk g^(k). Then

b − Ax^(k+1) = b − Ax^(k) − τk A g^(k), i.e., r^(k+1) = r^(k) − τk A g^(k), so A g^(k) = (1/τk)(r^(k) − r^(k+1)).

Imposing (r^(k+1), r^(k)) = 0 gives 0 = (r^(k), r^(k)) − τk (A g^(k), r^(k)), hence

τk = (r^(k), r^(k)) / (A g^(k), r^(k)).

## Derivation of the CG method, cont.

(ii) g^(k+1) = r^(k+1) + βk g^(k). Why is this so? From the algorithm we have that g^(k+1) = c1 v^(k+1) + c2 g^(k) for some constants c1, c2; we get (ii) after a proper scaling. Requiring A-conjugacy, 0 = (A g^(k), g^(k+1)) = (A g^(k), r^(k+1)) + βk (A g^(k), g^(k)), so

βk = −(r^(k+1), A g^(k)) / (g^(k), A g^(k)) = −(r^(k+1), (1/τk)(r^(k) − r^(k+1))) / (g^(k), A g^(k)) = (r^(k+1), r^(k+1)) / (r^(k), r^(k)),

using τk = (r^(k), r^(k)) / (A g^(k), g^(k)). The latter expression for τk follows from (i), since g^(k) = r^(k) + β_{k−1} g^(k−1) and (A g^(k), g^(k−1)) = 0 give (A g^(k), r^(k)) = (A g^(k), g^(k)).

Rewrite the CG algorithm using the above relations:

Initialize: r^(0) = b − Ax^(0), g^(0) = r^(0)
For k = 0, 1, ..., until convergence
    τk = (r^(k), r^(k)) / (A g^(k), g^(k))
    x^(k+1) = x^(k) + τk g^(k)
    r^(k+1) = r^(k) − τk A g^(k)
    βk = (r^(k+1), r^(k+1)) / (r^(k), r^(k))
    g^(k+1) = r^(k+1) + βk g^(k)
End

r^(k) – iteratively computed residuals; g^(k) – search directions.
Note: the coefficients βk are different from those in the Lanczos method.

## CG: computer implementation

x = x0
r = A*x - b
delta0 = (r,r)
g = -r
Repeat:
    h = A*g
    tau = delta0/(g,h)
    x = x + tau*g
    r = r + tau*h
    delta1 = (r,r)
    if delta1 <= eps, stop
    beta = delta1/delta0
    g = -r + beta*g
    delta0 = delta1

Note the final update delta0 = delta1, needed before the next iteration.

## Optimality properties of the CG method

Opt1: Mutually A-conjugate search directions: (g^(k+1), A g^(j)) = 0, j = 0, ..., k.

Opt2: There holds r^(k+1) ⊥ K_{k+1}(A, r^(0)), i.e., (r^(k+1), r^(j)) = 0, j = 0, ..., k.

Opt3: Optimization property: ‖e^(k)‖_A is the smallest possible at any step, since CG minimizes the functional f(x) = 1/2 (x, Ax) − (x, b).

Opt4: (e^(k+1), A g^(j)) = (g^(k+1), A g^(j)) = (r^(k+1), r^(j)) = 0, j = 0, ..., k.
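The computer implementation above translates almost line for line into NumPy (a sketch, with names of our choosing; as in the pseudocode, eps bounds the squared residual norm):

```python
import numpy as np

def cg(A, b, x0, eps=1e-12, maxit=1000):
    """CG following the pseudocode above (r = A x - b, g = -r initially)."""
    x = x0.copy()
    r = A @ x - b
    delta0 = r @ r
    g = -r
    for _ in range(maxit):
        h = A @ g
        tau = delta0 / (g @ h)
        x += tau * g
        r += tau * h
        delta1 = r @ r
        if delta1 <= eps:
            break
        g = -r + (delta1 / delta0) * g
        delta0 = delta1          # the update noted above
    return x

rng = np.random.default_rng(5)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20 * np.eye(20)    # symmetric positive definite
b = rng.standard_normal(20)
x = cg(A, b, np.zeros(20))
assert np.allclose(A @ x, b, atol=1e-4)
```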

## Optimality properties of the CG method, cont.

Opt5: Finite termination property: there are no breakdowns of the CG algorithm.

Reasoning: if g^(k) = 0, then τk is not defined. The vectors g^(k) are computed from the formula g^(k) = r^(k) + βk g^(k−1). Then

0 = (r^(k), g^(k)) = (r^(k), r^(k)) + βk (r^(k), g^(k−1)) = (r^(k), r^(k)),

since (r^(k), g^(k−1)) = 0, hence r^(k) = 0, i.e., the solution is already found. As long as x^(k) ≠ x_exact, then r^(k) ≠ 0 and g^(k+1) ≠ 0. However, we can generate at most n mutually orthogonal vectors in R^n; thus, CG has a finite termination property.

## Connection to the matrix Tm

The general form of the m-dimensional Lanczos tridiagonal matrix Tm in terms of the CG coefficients:

Tm =
[ 1/τ0        √β0/τ0                                            ]
[ √β0/τ0      1/τ1 + β0/τ0     √β1/τ1                           ]
[                  ...                                          ]
[             √β_{m−2}/τ_{m−2}     1/τ_{m−1} + β_{m−2}/τ_{m−2}  ]

## Convergence analysis of the CG method

Theorem: In exact arithmetic, CG has the property that x_exact = x^(m) for some m ≤ n, where n is the order of A.

Let S = {λi, si}_{i=1}^{n} be the system of eigensolutions of A and let r^(0) = Σ_{i=1}^{n} ξi si. Then g^(k) = p_{k−1}(A) r^(0), where p_{k−1}(t) is some polynomial of degree k − 1.

Note: e_k = x_exact − x^(k), thus A e_k = b − Ax^(k) = r^(k), i.e., e_k = A^{-1} r^(k), and therefore

‖e_k‖_A = ‖r^(k)‖_{A^{-1}}.  (∗∗)

CG is such that ‖e_k‖_A = min_{y ∈ x^(0) + K} ‖x_exact − y‖_A. From (∗∗) we obtain

‖r^(k)‖_{A^{-1}} = min_{r ∈ r^(0) + AK} ‖r‖_{A^{-1}}.

## Convergence of the CG method (cont.)

Let Π^1_k = {Pk of degree k, Pk(0) = 1} and

K̃ = {r ∈ R^n : r = Pk(A) r^(0), Pk ∈ Π^1_k}.

Clearly K̃ ⊂ span{r^(0), A r^(0), ..., A^k r^(0)} and r^(0) ∈ K̃. Then

‖r^(k)‖_{A^{-1}} = min_{r ∈ K̃} ‖r‖_{A^{-1}} = min_{Pk ∈ Π^1_k} ‖Pk(A) r^(0)‖_{A^{-1}} = min_{Pk ∈ Π^1_k} [(r^(0))^T A^{-1} (Pk(A))^2 r^(0)]^{1/2}.

Recall: (Pk(A))^T A^{-1} Pk(A) = A^{-1} (Pk(A))^2.

## Rate of convergence of the CG method

Theorem: Let A be symmetric and positive definite. Suppose that for some set S containing all eigenvalues of A, for some polynomial P̃(λ) ∈ Π^1_k and some constant M there holds max_{λ ∈ S} |P̃(λ)| ≤ M. Then

‖x_exact − x^(k)‖_A ≤ M ‖x_exact − x^(0)‖_A.

Proof: Let S = {λi, si}_{i=1}^{n} be the system of eigensolutions of A, λ1 ≤ ... ≤ λn, (si, sj) = δij, and r^(0) = Σ_{i=1}^{n} ξi si, ξi = (si, r^(0)). Then

(r^(0))^T A^{-1} (Pk(A))^2 r^(0) = Σ_{i=1}^{n} ξi^2 λi^{-1} Pk(λi)^2 ≤ M^2 Σ_{i=1}^{n} ξi^2 λi^{-1} = M^2 ‖r^(0)‖^2_{A^{-1}},

⇒ ‖r^(k)‖_{A^{-1}} ≤ M ‖r^(0)‖_{A^{-1}}.

## Rate of convergence (cont.)

To quantify M, we seek a polynomial P̃k ∈ Π^1_k such that

M = max_{λ ∈ IS} |P̃k(λ)|

is small. In this way, the convergence estimate is replaced by a polynomial approximation problem, which is well known. For an s.p.d. matrix A and IS = [λ1, λn], find a polynomial P̃k ∈ Π^1_k such that

max_{λ ∈ IS} |P̃k(λ)| = min_{Pk ∈ Π^1_k} max_{λ ∈ IS} |Pk(λ)|.

The solution of the latter problem is given by the polynomial

P̃k(λ) = Tk((λn + λ1 − 2λ)/(λn − λ1)) / Tk((λn + λ1)/(λn − λ1)),

where Tk is the Chebyshev polynomial of degree k; for |z| ≥ 1 it can be written as Tk(z) = (1/2)(w^k + w^{−k}) with w = z + √(z^2 − 1). Moreover,

min_{Pk ∈ Π^1_k} max_{λ ∈ IS} |Pk(λ)| = max_{λ ∈ IS} |P̃k(λ)| = 1 / Tk((λn + λ1)/(λn − λ1)).

## Rate of convergence (cont.)

Thus, we obtain the following estimate (with κ = κ(A) = λn/λ1):

‖e^k‖_A ≤ [Tk((λn + λ1)/(λn − λ1))]^{-1} ‖e^0‖_A = [Tk((κ + 1)/(κ − 1))]^{-1} ‖e^0‖_A.

Since for any z > 1

Tk((z + 1)/(z − 1)) = (1/2) [((√z + 1)/(√z − 1))^k + ((√z − 1)/(√z + 1))^k] ≥ (1/2) ((√z + 1)/(√z − 1))^k,

we get

‖e^k‖_A ≤ 2 ((√κ(A) − 1)/(√κ(A) + 1))^k ‖e^0‖_A.

Seek now the smallest k such that ‖e^k‖_A ≤ ε ‖e^0‖_A. It suffices that

((√κ + 1)/(√κ − 1))^k > 2/ε ⇔ k ln((√κ + 1)/(√κ − 1)) > ln(2/ε)
⇒ k > ln(2/ε) / ln((√κ + 1)/(√κ − 1)) = ln(2/ε) / ln((1 + (√κ)^{-1})/(1 − (√κ)^{-1})).

Note: ln((1 + ϵ)/(1 − ϵ)) > 2ϵ for small ϵ > 0. We are therefore on the safe side if

k > (1/2) √κ ln(2/ε) > ln(2/ε) / ln((1 + (√κ)^{-1})/(1 − (√κ)^{-1})).
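The safe bound k > (1/2) √κ ln(2/ε) is easy to evaluate; a small Python sketch (names are ours) compares it with the sharper count obtained directly from 2 ((√κ − 1)/(√κ + 1))^k ≤ ε:

```python
import math

def cg_iter_bound(kappa, eps):
    """Smallest k with 2 * ((sqrt(kappa)-1)/(sqrt(kappa)+1))**k <= eps,
    together with the simpler safe bound k >= 0.5 * sqrt(kappa) * ln(2/eps)."""
    s = math.sqrt(kappa)
    k_exact = math.ceil(math.log(2 / eps) / math.log((s + 1) / (s - 1)))
    k_safe = math.ceil(0.5 * s * math.log(2 / eps))
    return k_exact, k_safe

k_exact, k_safe = cg_iter_bound(kappa=1.0e4, eps=1.0e-6)
assert k_safe >= k_exact                                   # the safe bound never undershoots
assert 2 * ((100 - 1) / (100 + 1)) ** k_exact <= 1.0e-6    # here sqrt(kappa) = 100
```

For κ = 10^4 both bounds land in the mid-700s: the number of iterations grows like √κ, which is the practical motivation for preconditioning.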

## Alternative view-point

Let f(x) be a scalar function of x and restrict x to the form x = x̄ + τd. We pose the problem to minimize f over such x. Since x̄ + τd describes a line, d is called a search direction and the process is called a line search.

Consider the special function f∗(x) = (x∗ − x, A(x∗ − x)). The minimum of f∗(x) coincides with the minimum of f(x) = (1/2) f∗(x) + C, where C is a constant; for instance, we can take C = −(1/2)(b, x∗) + c0. Then

f(x) = (1/2) f∗(x) − (1/2)(b, x∗) + c0
     = (1/2)(x∗ − x, A(x∗ − x)) − (1/2)(b, x∗) + c0
     = (1/2)(x∗, Ax∗) − (x, Ax∗) + (1/2)(x, Ax) − (1/2)(b, x∗) + c0
     = (1/2)(x, Ax) − (x, b) + c0 ≡ F(x),

using Ax∗ = b (so that (x, Ax∗) = (x, b) and (1/2)(x∗, Ax∗) = (1/2)(b, x∗)).

Thus, the minimizer of f∗(x) and that of F(x) coincide, provided that x∗ is the exact solution of Ax = b.

We decide to solve the minimization problem for F(x) = (1/2)(x, Ax) − (x, b) + c0 iteratively, locally per iteration, performing a line search: we seek x^(k+1) = x^(k) + τk d^k such that F is minimized. How to choose τk and d^k?

## Alternative view-point, cont.

Theorem 1: Let F(x) ∈ C^1(R^n) and let ∇F be the gradient of F at some point x. If (∇F, d) < 0, then d is a descent direction for F at x.

Proof: A descent direction satisfies F(x + τd) ≤ F(x) for 0 ≤ τ ≤ τ0. Now

F(x + τd) = F(x) + τ(∇F, d) + o(τ),

with (∇F, d) < 0. Thus, τ can be chosen small enough so that τ(∇F, d) + o(τ) < 0.

Theorem 2: Among all search directions d at some point x, F descends most rapidly for d = −∇F.

Proof: We want to minimize the directional derivative of F at x over all possible search directions. The (first) directional derivative in direction y at x is defined as

dF/dy = Σ_{i=1}^{n} (∂F/∂xi) yi = (∇F, y).

Let y be arbitrary with ‖y‖ = 1. Then

|(∇F, y)| ≤ ‖∇F‖ ‖y‖ = ‖∇F‖,

so (∇F, y) ≥ −‖∇F‖ for all such y. For the special choice y = −∇F/‖∇F‖ we obtain (∇F, −∇F/‖∇F‖) = −‖∇F‖, i.e., the bound is attained.