Optimization Methods for Circuit Design

Technische Universität München, Department of Electrical Engineering and Information Technology, Institute for Electronic Design Automation

Optimization Methods for Circuit Design Compendium H. Graeb

Version 2.8 (WS 12/13): Michael Zwerger
Version 2.0 - 2.7 (WS 08/09 - SS 12): Michael Eick
Version 1.0 - 1.2 (SS 07 - SS 08): Husni Habal

Presentation follows: H. Graeb, Analog Design Centering and Sizing, Springer, 2007. R. Fletcher, Practical Methods of Optimization, John Wiley & Sons, 2nd Edition, 2000.

Status: October 12, 2012

Copyright 2008 - 2012. Optimization Methods for Circuit Design, Compendium, H. Graeb. Technische Universität München, Institute for Electronic Design Automation, Arcisstr. 21, 80333 Munich, Germany. [email protected], Phone: +49-89-289-23679. All rights reserved.

Contents

1 Introduction
  1.1 Parameters, performance, simulation
  1.2 Performance specification
  1.3 Minimum, minimization
  1.4 Unconstrained optimization
  1.5 Constrained optimization
  1.6 Classification of optimization problems
  1.7 Classification of constrained optimization problems
  1.8 Structure of an iterative optimization process
      1.8.1 ... without constraints
      1.8.2 ... with constraints
      1.8.3 Trust-region approach

2 Optimality conditions
  2.1 Optimality conditions - unconstrained optimization
      2.1.1 Necessary first-order condition for a local minimum of an unconstrained optimization problem
      2.1.2 Necessary second-order condition for a local minimum of an unconstrained optimization problem
      2.1.3 Sufficient and necessary conditions for the second-order derivative $\nabla^2 f(x^\star)$ to be positive definite
  2.2 Optimality conditions - constrained optimization
      2.2.1 Feasible descent direction r
      2.2.2 Necessary first-order conditions for a local minimum of a constrained optimization problem
      2.2.3 Necessary second-order condition for a local minimum of a constrained optimization problem
      2.2.4 Sensitivity of the optimum with regard to a change in an active constraint

3 Unconstrained optimization
  3.1 Univariate unconstrained optimization, line search
      3.1.1 Wolfe-Powell conditions
      3.1.2 Backtracking line search
      3.1.3 Bracketing
      3.1.4 Sectioning
      3.1.5 Golden Sectioning
      3.1.6 Line search by quadratic model
      3.1.7 Unimodal function
  3.2 Multivariate unconstrained optimization without derivatives
      3.2.1 Coordinate search
      3.2.2 Polytope method (Nelder-Mead simplex method)
  3.3 Multivariate unconstrained optimization with derivatives
      3.3.1 Steepest descent
      3.3.2 Newton approach
      3.3.3 Quasi-Newton approach
      3.3.4 Levenberg-Marquardt approach (Newton direction plus trust region)
      3.3.5 Least-squares (plus trust-region) approach
      3.3.6 Conjugate-gradient (CG) approach

4 Constrained optimization - problem formulations
  4.1 Quadratic Programming (QP)
      4.1.1 QP - linear equality constraints
      4.1.2 QP - inequality constraints
      4.1.3 Example
  4.2 Sequential Quadratic Programming (SQP), Lagrange-Newton
      4.2.1 SQP - equality constraints
      4.2.2 Penalty function

5 Statistical parameter tolerances
  5.1 Univariate Gaussian distribution (normal distribution)
  5.2 Multivariate normal distribution
  5.3 Transformation of statistical distributions
      5.3.1 Example

6 Expectation values and their estimators
  6.1 Expectation values
      6.1.1 Definitions
      6.1.2 Linear transformation of expectation value
      6.1.3 Linear transformation of variance
      6.1.4 Translation law of variances
      6.1.5 Normalizing a random variable
      6.1.6 Linear transformation of a normal distribution
  6.2 Estimation of expectation values
      6.2.1 Expectation value estimator
      6.2.2 Variance estimator
      6.2.3 Variance of the expectation value estimator
      6.2.4 Linear transformation of estimated expectation value
      6.2.5 Linear transformation of estimated variance
      6.2.6 Translation law of estimated variance

7 Worst-case analysis
  7.1 Task
  7.2 Typical tolerance regions
  7.3 Classical worst-case analysis
      7.3.1 Task
      7.3.2 Linear performance model
      7.3.3 Performance type f↑ "good", specification type f > f_L
      7.3.4 Performance type f↓ "good", specification type f < f_U
  7.4 Realistic worst-case analysis
      7.4.1 Task
      7.4.2 Performance type f↑ "good", specification type f ≥ f_L
      7.4.3 Performance type f↓ "good", specification type f ≤ f_U
  7.5 General worst-case analysis
      7.5.1 Task
      7.5.2 Performance type f↑ "good", specification type f ≥ f_L
      7.5.3 Performance type f↓ "good", specification type f ≤ f_U
      7.5.4 General worst-case analysis with tolerance box
  7.6 Input/output of worst-case analysis
  7.7 Summary of discussed worst-case analysis problems

8 Yield analysis
  8.1 Task
      8.1.1 Acceptance function
      8.1.2 Parametric yield
  8.2 Statistical yield analysis/Monte-Carlo analysis
      8.2.1 Variance of yield estimator
      8.2.2 Estimated variance of yield estimator
      8.2.3 Importance sampling
  8.3 Geometric yield analysis for linearized performance feature ("realistic geometric yield analysis")
      8.3.1 Yield partition
      8.3.2 Defining worst-case distance β_WL as difference from nominal performance to specification bound as multiple of standard deviation σ_f of linearized performance feature (β_WL-sigma design)
      8.3.3 Yield partition as a function of worst-case distance β_WL
      8.3.4 Worst-case distance β_WL defines tolerance region
      8.3.5 Specification type f ≤ f_U
  8.4 Geometric yield analysis for nonlinear performance feature ("general geometric yield analysis")
      8.4.1 Problem formulation
      8.4.2 Advantages of geometric yield analysis
      8.4.3 Lagrange function and first-order optimality conditions of problem (336)
      8.4.4 Lagrange function of problem (337)
      8.4.5 Second-order optimality condition of problem (336)
      8.4.6 Worst-case distance
      8.4.7 Remarks
      8.4.8 Overall yield
      8.4.9 Consideration of range parameters

9 Yield optimization/design centering/nominal design
  9.1 Optimization objectives
  9.2 Derivatives of optimization objectives
  9.3 Problem formulations of analog optimization
  9.4 Analysis, synthesis
  9.5 Sizing
  9.6 Nominal design, tolerance design
  9.7 Optimization without/with constraints

10 Sizing rules for analog circuit optimization
  10.1 Single (NMOS) transistor
      10.1.1 Sizing rules for a single transistor that acts as a voltage-controlled current source (VCCS)
  10.2 Transistor pair: current mirror (NMOS)
      10.2.1 Sizing rules for current mirror

A Matrix and vector notations
  A.1 Vector
  A.2 Matrix
  A.3 Addition
  A.4 Multiplication
  A.5 Special cases
  A.6 Determinant of a quadratic matrix
  A.7 Inverse of a quadratic non-singular matrix
  A.8 Some properties

B Abbreviated notations of derivatives using the nabla symbol

C Norms

D Pseudo-inverse, singular value decomposition (SVD)
  D.1 Moore-Penrose conditions
  D.2 Singular value decomposition

E Linear equation system, rectangular system matrix with full rank
  E.1 Underdetermined system of equations
  E.2 Overdetermined system of equations
  E.3 Determined system of equations

F Partial derivatives of linear, quadratic terms in matrix/vector notation

G Probability space

H Convexity
  H.1 Convex set $K \subseteq \mathbb{R}^n$
  H.2 Convex function

1 Introduction

1.1 Parameters, performance, simulation

design parameters: $x_d \in \mathbb{R}^{n_{xd}}$, e.g., transistor widths, capacitances
statistical parameters: $x_s \in \mathbb{R}^{n_{xs}}$, e.g., oxide thickness, threshold voltage
range parameters: $x_r \in \mathbb{R}^{n_{xr}}$, e.g., operational parameters: supply voltage, temperature
(circuit) parameters: $x = [x_d^T \; x_s^T \; x_r^T]^T$
performance feature: $f_i$, e.g., gain, bandwidth, slew rate, phase margin, delay, power
(circuit) performance: $f = [\cdots f_i \cdots]^T \in \mathbb{R}^{n_f}$
(circuit) simulation: $x \mapsto f(x)$, e.g., SPICE; abstraction from the physical level!

A design parameter and a statistical parameter may refer to the same physical parameter. E.g., an actual CMOS transistor width is the sum of a design parameter Wk and a statistical parameter ∆W . Wk is the specific width of transistor Tk while ∆W is a width reduction that varies globally and equally for all the transistors on a die. A design parameter and a statistical parameter may be identical.

1.2 Performance specification

performance specification feature (upper or lower limit on a performance):

$f_i \ge f_{L,i}$ or $f_i \le f_{U,i}$   (1)

number of performance specification features:

$n_f \le n_{PSF} \le 2\,n_f$   (2)

performance specification:

$\left.\begin{array}{c} f_{L,1} \le f_1(x) \le f_{U,1} \\ \vdots \\ f_{L,n_f} \le f_{n_f}(x) \le f_{U,n_f} \end{array}\right\} \iff f_L \le f(x) \le f_U$   (3)

[Figure 1: plots omitted] Figure 1. Smooth function (a), i.e. continuous and differentiable at least several times on a closed region of the domain, with a strong local minimum, a weak local minimum, and the global minimum marked. Non-smooth continuous function (b).

1.3 Minimum, minimization

without loss of generality: optimum $\equiv$ minimum, because

$\max f \equiv -\min(-f)$   (4)

note the two readings: min $\equiv$ minimum, i.e., a result, or minimize, i.e., a process!   (5)

$\min f(x) \equiv f(x) \to \min$   (6)

$\min f(x) \longrightarrow x^\star, \; f(x^\star) = f^\star$   (7)

1.4 Unconstrained optimization

$f^\star = \min f(x) \equiv \min_x f \equiv \min_x f(x) \equiv \min\{f(x)\}$
$x^\star = \operatorname{argmin} f(x) \equiv \operatorname{argmin}_x f \equiv \operatorname{argmin}_x f(x) \equiv \operatorname{argmin}\{f(x)\}$   (8)

1.5 Constrained optimization

$\min f(x)$ s.t. $c_i(x) = 0, \; i \in E$; $\; c_i(x) \ge 0, \; i \in I$   (9)

E: set of equality constraints
I: set of inequality constraints

Alternative formulations:

$\min_x f$ s.t. $x \in \Omega$   (10)
$\min_{x \in \Omega} f$   (11)
$\min \{ f(x) \mid x \in \Omega \}$   (12)

where $\Omega = \{ x \mid c_i(x) = 0, \; i \in E; \; c_i(x) \ge 0, \; i \in I \}$

The Lagrange function combines objective function and constraints in a single expression:

$L(x, \lambda) = f(x) - \sum_{i \in E \cup I} \lambda_i \, c_i(x)$   (13)

$\lambda_i$: Lagrange multiplier associated with constraint i

1.6 Classification of optimization problems

deterministic, stochastic: The iterative search process is deterministic or random.

continuous, discrete: Optimization variables can take an infinite number of values, e.g., the set of real numbers, or a finite set of values or states.

local, global: The objective value at a local optimal point is better than the objective values of all other points in its vicinity. The objective value at a global optimal point is better than the objective value of any other point.

scalar, vector: In a vector optimization problem, multiple objective functions are optimized simultaneously (multiple-criteria optimization, MCO). Usually, objectives have to be traded off against each other. A Pareto-optimal point is characterized in that one objective can only be improved at the cost of another. Pareto optimization determines the set of all Pareto-optimal points. Scalar optimization refers to a single objective. A vector optimization problem is scalarized by combining the multiple objectives into a single overall objective, e.g., by a weighted sum, least squares, or min/max.

constrained, unconstrained: Besides the objective function that has to be optimized, constraints on the optimization variables may be given as inequalities or equalities.

with or without derivatives: The optimization process may be based on gradients (first derivative), on gradients and Hessians (second derivative), or it may not require any derivatives of the objective/constraint functions.

1.7 Classification of constrained optimization problems

objective function   constraint functions                 problem class
linear               linear                               linear programming
quadratic            linear                               quadratic programming
nonlinear            nonlinear                            nonlinear programming
convex               linear equality constraints,         convex programming
                     concave inequality constraints       (local ≡ global minimum)

1.8 Structure of an iterative optimization process

1.8.1 ... without constraints

Taylor series of a function f about the iteration point $x^{(\kappa)}$:

$f(x) = f(x^{(\kappa)}) + \nabla f(x^{(\kappa)})^T \cdot (x - x^{(\kappa)}) + \frac{1}{2} (x - x^{(\kappa)})^T \cdot \nabla^2 f(x^{(\kappa)}) \cdot (x - x^{(\kappa)}) + \ldots$   (14)
$\quad\;\; = f^{(\kappa)} + g^{(\kappa)T} \cdot (x - x^{(\kappa)}) + \frac{1}{2} (x - x^{(\kappa)})^T \cdot H^{(\kappa)} \cdot (x - x^{(\kappa)}) + \ldots$   (15)

$f^{(\kappa)}$: value of f at the point $x^{(\kappa)}$
$g^{(\kappa)}$: gradient (first derivative, direction of steepest ascent) at the point $x^{(\kappa)}$
$H^{(\kappa)}$: Hessian matrix (second derivative) at the point $x^{(\kappa)}$

Taylor series along a search direction r starting from the point $x^{(\kappa)}$:

$x(r) = x^{(\kappa)} + r$   (16)
$f(x^{(\kappa)} + r) \equiv f(r) = f^{(\kappa)} + g^{(\kappa)T} \cdot r + \frac{1}{2} r^T \cdot H^{(\kappa)} \cdot r + \ldots$   (17)

Taylor series in the step length $\alpha$ along a search direction $r^{(\kappa)}$ starting from $x^{(\kappa)}$:

$x(\alpha) = x^{(\kappa)} + \alpha \cdot r^{(\kappa)}$   (18)
$f(x^{(\kappa)} + \alpha \cdot r^{(\kappa)}) \equiv f(\alpha) = f^{(\kappa)} + g^{(\kappa)T} \cdot r^{(\kappa)} \cdot \alpha + \frac{1}{2} r^{(\kappa)T} \cdot H^{(\kappa)} \cdot r^{(\kappa)} \cdot \alpha^2 + \ldots$   (19)
$\quad\;\; = f^{(\kappa)} + \nabla f(\alpha{=}0) \cdot \alpha + \frac{1}{2} \nabla^2 f(\alpha{=}0) \cdot \alpha^2 + \ldots$   (20)

$\nabla f(\alpha{=}0)$: slope of f along the direction $r^{(\kappa)}$
$\nabla^2 f(\alpha{=}0)$: curvature of f along $r^{(\kappa)}$

Basic iteration:

  repeat
    determine the search direction $r^{(\kappa)}$
    determine the step length $\alpha^{(\kappa)}$ (line search)
    $x^{(\kappa+1)} = x^{(\kappa)} + \alpha^{(\kappa)} \cdot r^{(\kappa)}$
    $\kappa := \kappa + 1$
  until termination criteria are fulfilled

Steepest-descent approach: take the direction of steepest descent as search direction, i.e., $r^{(\kappa)} = -g^{(\kappa)}$.


Figure 2. Visual illustration of the steepest-descent approach for Rosenbrock’s function f (x1 , x2 ) = 100(x2 − x21 )2 + (1 − x1 )2 . A backtracking line search is applied (see Sec. 3.1.2, page 19) with an initial x(0) = [−1.0, 0.8]T and α(0) = 1, α := c3 · α. The search terminates when the Armijo condition is satisfied with c1 = 0.7, c3 = 0.6.
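A minimal Python sketch of this loop applied to the setup of Fig. 2 (Rosenbrock's function, $x^{(0)} = [-1.0, 0.8]^T$, $c_1 = 0.7$, $c_3 = 0.6$ are taken from the caption; the analytic gradient, the tolerance `tol`, and the iteration cap are assumptions, not from the text):

```python
import numpy as np

def f(x):    # Rosenbrock's function from Fig. 2
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x): # its analytic gradient
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def steepest_descent(x, c1=0.7, c3=0.6, tol=1e-6, max_iter=10000):
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # termination criterion
            break
        r = -g                               # steepest-descent direction
        alpha = 1.0
        # backtracking: shrink alpha until the Armijo condition (58) holds
        while f(x + alpha * r) > f(x) + alpha * c1 * (g @ r):
            alpha *= c3
        x = x + alpha * r
    return x

print(steepest_descent(np.array([-1.0, 0.8])))  # creeps toward x* = [1, 1]
```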


1.8.2 ... with constraints

• Constraint functions and objective function are combined into an unconstrained optimization problem in each iteration step:
  - Lagrange formulation
  - penalty function
  - Sequential Quadratic Programming (SQP)
• Projection onto the active constraints, i.e. into the subspace of an unconstrained optimization problem, in each iteration step:
  - active-set methods

1.8.3 Trust-region approach

model of the objective function: $f(x) \approx m(x^{(\kappa)} + r)$   (21)

$\min_r m(x^{(\kappa)} + r)$ s.t. $r \in$ trust region, e.g., $\|r\| < \Delta$   (22)

search direction and step length are computed simultaneously; the trust region accounts for the accuracy of the model

2 Optimality conditions

2.1 Optimality conditions - unconstrained optimization

Taylor series of the objective function around the optimum point $x^\star$:

$f(x) = \underbrace{f(x^\star)}_{f^\star} + \underbrace{\nabla f(x^\star)^T}_{g^{\star T}} \cdot (x - x^\star) + \frac{1}{2} (x - x^\star)^T \cdot \underbrace{\nabla^2 f(x^\star)}_{H^\star} \cdot (x - x^\star) + \ldots$   (23)

$f^\star$: value of the function at the optimum $x^\star$
$g^\star$: gradient at the optimum $x^\star$
$H^\star$: Hessian matrix at the optimum $x^\star$

For $x = x^\star + r$ close to the optimum:

$f(r) = f^\star + g^{\star T} \cdot r + \frac{1}{2} r^T \cdot H^\star \cdot r + \ldots$   (24)

$x^\star$ is optimal $\iff$ there is no descent direction r such that $f(r) < f^\star$.

[Figure 3: sketch omitted; it shows the gradient direction $\nabla f(x^{(\kappa)})$, the steepest-descent direction, and the level sets $f(x) \gtrless f(x^{(\kappa)})$] Figure 3. Descent directions from $x^{(\kappa)}$: shaded area.

2.1.1 Necessary first-order condition for a local minimum of an unconstrained optimization problem

$\forall_{r \ne 0} \;\; g^{\star T} \cdot r \ge 0$   (25)

$g^\star = \nabla f(x^\star) = 0$   (26)

$x^\star$: stationary point
descent direction r: $\nabla f(x^{(\kappa)})^T \cdot r < 0$; steepest-descent direction: $r = -\nabla f(x^{(\kappa)})$

[Figure 4: plots omitted] Figure 4. Quadratic functions: (a) minimum at $x^\star$, (b) maximum at $x^\star$, (c) saddle point at $x^\star$, (d) positive semidefinite with multiple minima along a trench.

2.1.2 Necessary second-order condition for a local minimum of an unconstrained optimization problem

$\forall_{r \ne 0} \;\; r^T \cdot \nabla^2 f(x^\star) \cdot r \ge 0 \iff \nabla^2 f(x^\star)$ is positive semidefinite $\iff$ f has non-negative curvature   (27)

sufficient:

$\forall_{r \ne 0} \;\; r^T \cdot \nabla^2 f(x^\star) \cdot r > 0 \iff \nabla^2 f(x^\star)$ is positive definite $\iff$ f has positive curvature   (28)

[Figure 5: plots omitted] Figure 5. Contour plots of quadratic functions that are (a), (b) positive or negative definite, (c) indefinite (saddle point), (d) positive or negative semidefinite.

2.1.3 Sufficient and necessary conditions for the second-order derivative $\nabla^2 f(x^\star)$ to be positive definite

• all eigenvalues are > 0
• a Cholesky decomposition exists: $\nabla^2 f(x^\star) = L \cdot L^T$ with $l_{ii} > 0$, or $\nabla^2 f(x^\star) = L \cdot D \cdot L^T$ with $l_{ii} = 1$ and $d_{ii} > 0$   (29)
• all pivot elements during Gaussian elimination without pivoting are > 0
• all principal minors are > 0

[Figure 6: sketch omitted; it shows the level sets $c_i = \mathrm{const}$ and $f = \mathrm{const}$ through $x^{(\kappa)}$ together with the gradients $\nabla f(x^{(\kappa)})$ and $\nabla c_i(x^{(\kappa)})$] Figure 6. Dark shaded area: unconstrained directions according to (35); light shaded area: descent directions according to (34); overlap: unconstrained descent directions. When no direction satisfies both (34) and (35), the intersection is empty and the current point is a local minimum of the function.

2.2 Optimality conditions - constrained optimization

2.2.1 Feasible descent direction r

descent direction: $\nabla f(x^{(\kappa)})^T \cdot r < 0$   (30)
feasible direction: $c_i(x^{(\kappa)} + r) \simeq c_i(x^{(\kappa)}) + \nabla c_i(x^{(\kappa)})^T \cdot r \ge 0$   (31)

Inactive constraint: i is inactive $\iff c_i(x^{(\kappa)}) > 0$; then each r with $\|r\| < \epsilon$ satisfies (31), e.g.,

$r = -\dfrac{c_i(x^{(\kappa)})}{\|\nabla c_i(x^{(\kappa)})\| \cdot \|\nabla f(x^{(\kappa)})\|} \cdot \nabla f(x^{(\kappa)})$   (32)

(32) in (31) gives:

$c_i(x^{(\kappa)}) \cdot \left[ 1 - \dfrac{\nabla c_i(x^{(\kappa)})^T \cdot \nabla f(x^{(\kappa)})}{\|\nabla c_i(x^{(\kappa)})\| \cdot \|\nabla f(x^{(\kappa)})\|} \right] \ge 0$   (33)

where $-1 \le \dfrac{\nabla c_i(x^{(\kappa)})^T \cdot \nabla f(x^{(\kappa)})}{\|\nabla c_i(x^{(\kappa)})\| \cdot \|\nabla f(x^{(\kappa)})\|} \le 1$

Active constraint (Fig. 6): i is active $\iff c_i(x^{(\kappa)}) = 0$; then (30) and (31) become:

$\nabla f(x^{(\kappa)})^T \cdot r < 0$   (34)
$\nabla c_i(x^{(\kappa)})^T \cdot r \ge 0$   (35)

no feasible descent direction exists, i.e., no vector r satisfies both (34) and (35), at $x^\star$:

$\nabla f(x^\star) = \lambda_i^\star \cdot \nabla c_i(x^\star)$ with $\lambda_i^\star \ge 0$   (36)

no statement about the sign of $\lambda_i^\star$ in case of an equality constraint ($c_i = 0 \iff c_i \le 0 \wedge c_i \ge 0$)

2.2.2 Necessary first-order conditions for a local minimum of a constrained optimization problem

notation: $x^\star$, $f^\star = f(x^\star)$, $\lambda^\star$, $L^\star = L(x^\star, \lambda^\star)$

Karush-Kuhn-Tucker (KKT) conditions:

$\nabla L(x^\star) = 0$   (37)
$c_i(x^\star) = 0, \quad i \in E$   (38)
$c_i(x^\star) \ge 0, \quad i \in I$   (39)
$\lambda_i^\star \ge 0, \quad i \in I$   (40)
$\lambda_i^\star \cdot c_i(x^\star) = 0, \quad i \in E \cup I$   (41)

(37) is analogous to (26). (13) and (37) give:

$\nabla f(x^\star) - \sum_{i \in A(x^\star)} \lambda_i^\star \cdot \nabla c_i(x^\star) = 0$   (42)

$A(x^\star) = E \cup \{ i \in I \mid c_i(x^\star) = 0 \}$   (43)

$A(x^\star)$ is the set of active constraints at $x^\star$.

(41) is called the complementarity condition: either the Lagrange multiplier is 0 (inactive constraint) or the constraint value $c_i(x^\star)$ is 0 (active constraint).

from (41) and (13): $L^\star = f^\star$   (44)

2.2.3 Necessary second-order condition for a local minimum of a constrained optimization problem

$f(x^\star + r) = L(x^\star + r, \lambda^\star)$   (45)
$= L(x^\star, \lambda^\star) + r^T \cdot \underbrace{\nabla L(x^\star)}_{0} + \frac{1}{2} r^T \cdot \nabla^2 L(x^\star) \cdot r + \cdots$   (46)
$= f^\star + \frac{1}{2} r^T \cdot \big[ \nabla^2 f(x^\star) - \sum_{i \in A(x^\star)} \lambda_i^\star \cdot \nabla^2 c_i(x^\star) \big] \cdot r + \cdots$   (47)

for each feasible stationary direction r at $x^\star$, i.e.,

$F_r = \left\{ r \;\middle|\; r \ne 0, \;\; \nabla c_i(x^\star)^T \cdot r \ge 0 \text{ for } i \in A(x^\star) \setminus A_+^\star, \;\; \nabla c_i(x^\star)^T \cdot r = 0 \text{ for } i \in A_+^\star \right\}$, with $A_+^\star = \{ j \in A(x^\star) \mid j \in E \vee \lambda_j^\star > 0 \}$   (48)

necessary: $\forall_{r \in F_r} \;\; r^T \cdot \nabla^2 L(x^\star) \cdot r \ge 0$   (49)
sufficient: $\forall_{r \in F_r} \;\; r^T \cdot \nabla^2 L(x^\star) \cdot r > 0$   (50)

2.2.4 Sensitivity of the optimum with regard to a change in an active constraint

perturbation of an active constraint at $x^\star$ by $\Delta_i \ge 0$:

$c_i(x) \ge 0 \;\to\; c_i(x) \ge \Delta_i$   (51)

$L(x, \lambda, \Delta) = f(x) - \sum_i \lambda_i \cdot (c_i(x) - \Delta_i)$   (52)

$\nabla f^\star(\Delta_i) = \nabla L^\star(\Delta_i) = \left[ \dfrac{\partial L}{\partial x^T} \cdot \dfrac{\partial x}{\partial \Delta_i} + \dfrac{\partial L}{\partial \lambda^T} \cdot \dfrac{\partial \lambda}{\partial \Delta_i} + \dfrac{\partial L}{\partial \Delta_i} \right]_{x^\star, \lambda^\star}$; the first two terms vanish at the optimum because $\nabla L(x^\star) = 0$ and $\nabla L(\lambda^\star) = 0$, hence

$\nabla f^\star(\Delta_i) = \left. \dfrac{\partial L}{\partial \Delta_i} \right|_{x^\star, \lambda^\star} = \lambda_i^\star$   (53)

Lagrange multiplier: sensitivity of the optimum to a change in an active constraint close to $x^\star$

3 Unconstrained optimization

3.1 Univariate unconstrained optimization, line search

single optimization parameter, e.g., the step length $\alpha$:

$f(\alpha) \equiv f(x^{(\kappa)} + \alpha \cdot r) = f^{(\kappa)} + \underbrace{\nabla f(x^{(\kappa)})^T \cdot r}_{\nabla f(\alpha=0)} \cdot \alpha + \frac{1}{2} \underbrace{r^T \cdot \nabla^2 f(x^{(\kappa)}) \cdot r}_{\nabla^2 f(\alpha=0)} \cdot \alpha^2 + \cdots$   (54)

error vector: $\epsilon^{(\kappa)} = x^{(\kappa)} - x^\star$   (55)
global convergence: $\lim_{\kappa \to \infty} \epsilon^{(\kappa)} = 0$   (56)
convergence rate: $\|\epsilon^{(\kappa+1)}\| = L \cdot \|\epsilon^{(\kappa)}\|^p$   (57)
p = 1: linear convergence, L < 1; p = 2: quadratic convergence

exact line search is expensive; therefore only find an $\alpha$ that
• obtains a sufficient reduction in f
• is a big enough step length

3.1.1 Wolfe-Powell conditions

[Figure 7: sketch omitted; it shows $f(\alpha)$ along the search direction, the slopes $\nabla f(\alpha{=}0)$, $c_1 \cdot \nabla f(\alpha{=}0)$, $c_2 \cdot \nabla f(\alpha{=}0)$, the minimizer $\alpha_{opt}$, and the ranges of $\alpha$ where (58), (59), (60) are satisfied]

• Armijo condition: sufficient objective reduction

$f(\alpha) \le f(0) + \alpha \cdot c_1 \cdot \underbrace{\nabla f(x^{(\kappa)})^T \cdot r}_{\nabla f(\alpha=0)}$   (58)

• curvature condition: sufficient gradient increase

$\underbrace{\nabla f(x^{(\kappa)} + \alpha \cdot r)^T \cdot r}_{\nabla f(\alpha)} \ge c_2 \cdot \underbrace{\nabla f(x^{(\kappa)})^T \cdot r}_{\nabla f(\alpha=0)}$   (59)

• strong curvature condition: step close to the optimum

$|\nabla f(\alpha)| \le c_2 \cdot |\nabla f(\alpha = 0)|$   (60)

$0 < c_1 < c_2 < 1$   (61)
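The three conditions transcribe directly into predicates (a sketch; the callables `f` and `grad` for objective and gradient, and the default values $c_1 = 10^{-4}$, $c_2 = 0.9$, are common choices assumed here, not prescribed by the text):

```python
import numpy as np

def wolfe_conditions(f, grad, x, r, alpha, c1=1e-4, c2=0.9):
    """Check the Wolfe-Powell conditions (58)-(60) for step length alpha."""
    slope0 = grad(x) @ r                      # ∇f(α = 0)
    slope  = grad(x + alpha * r) @ r          # ∇f(α)
    armijo    = f(x + alpha * r) <= f(x) + alpha * c1 * slope0   # (58)
    curvature = slope >= c2 * slope0                             # (59)
    strong    = abs(slope) <= c2 * abs(slope0)                   # (60)
    return armijo, curvature, strong
```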

3.1.2 Backtracking line search

choose $0 < c_3 < 1$; determine a step length $\alpha_t$ that is big enough, but not too big:

  WHILE $\alpha_t$ violates (58)
    $\alpha_t := c_3 \cdot \alpha_t$

3.1.3 Bracketing

finding an interval $[\alpha_{lo}, \alpha_{hi}]$ that contains the minimum
/* take $\alpha_{lo}$ as the previous $\alpha_{hi}$ and find a new, larger $\alpha_{hi}$ until the minimum has been passed; if necessary exchange $\alpha_{lo}$, $\alpha_{hi}$ such that $f(\alpha_{lo}) < f(\alpha_{hi})$ */

  $\alpha_{hi} := 0$
  REPEAT
    $\alpha_{lo} := \alpha_{hi}$
    determine $\alpha_{hi} \in [\alpha_{lo}, \alpha_{max}]$
    IF $\alpha_{hi}$ violates (58) OR $f(\alpha_{hi}) \ge f(\alpha_{lo})$
      /* minimum has been significantly passed, because the Armijo condition is violated or because the new objective value is larger than that at the lower border of the bracketing interval (Fig. 8) */
      GOTO sectioning
    IF $\alpha_{hi}$ satisfies (60)
      /* strong curvature condition satisfied and objective value smaller than at $\alpha_{lo}$, i.e., step length found (Fig. 9) */
      $\alpha^\star = \alpha_{hi}$; STOP
    IF $\nabla f(\alpha_{hi}) \ge 0$
      /* $\alpha_{hi}$ has a lower objective value than $\alpha_{lo}$ and has passed the minimum, as the objective gradient has switched sign (Fig. 10) */
      exchange $\alpha_{hi}$ and $\alpha_{lo}$
      GOTO sectioning

[Figures 8-10: sketches of the three cases omitted; each marks $\alpha_{lo}$ and the region where $\alpha_{hi}$ can lie]

3.1.4 Sectioning

sectioning starts with an interval $[\alpha_{lo}, \alpha_{hi}]$ with the following properties:
• the minimum is included
• $f(\alpha_{lo}) < f(\alpha_{hi})$
• $\alpha_{lo} \gtrless \alpha_{hi}$ ($\alpha_{lo}$ need not lie left of $\alpha_{hi}$)
• $\nabla f(\alpha_{lo}) \cdot (\alpha_{hi} - \alpha_{lo}) < 0$

  REPEAT
    determine $\alpha_t \in [\alpha_{lo}, \alpha_{hi}]$
    IF $\alpha_t$ violates (58) OR $f(\alpha_t) \ge f(\alpha_{lo})$
      /* new $\alpha_{hi}$ found, $\alpha_{lo}$ remains (Fig. 11) */
      $\alpha_{hi} := \alpha_t$
    ELSE /* i.e., $f(\alpha_t) < f(\alpha_{lo})$, new $\alpha_{lo}$ found */
      IF $\alpha_t$ satisfies (60)
        /* step length found */
        $\alpha^\star = \alpha_t$; STOP
      IF $\nabla f(\alpha_t) \cdot (\alpha_{hi} - \alpha_{lo}) \ge 0$
        /* $\alpha_t$ is on the same side of the minimum as $\alpha_{hi}$; then $\alpha_{lo}$ must become the new $\alpha_{hi}$ (Fig. 12) */
        $\alpha_{hi} := \alpha_{lo}$
      $\alpha_{lo} := \alpha_t$  /* Fig. 13 */

[Figures 11-13: sketches omitted; they show the new interval after each of the three update cases, for both orderings of $\alpha_{lo}$ and $\alpha_{hi}$]

3.1.5 Golden Sectioning

golden section ratio:

$\dfrac{1 - \tau}{\tau} = \dfrac{\tau}{1} \iff \tau^2 + \tau - 1 = 0: \;\; \tau = \dfrac{-1 + \sqrt{5}}{2} \approx 0.618$   (62)

  $\alpha_1 := L + (1 - \tau) \cdot |R - L|$  /* left inner point */
  $\alpha_2 := L + \tau \cdot |R - L|$  /* right inner point */
  REPEAT
    IF $f(\alpha_1) < f(\alpha_2)$
      /* $\alpha^\star \in [L, \alpha_2]$, $\alpha_1$ is the new right inner point */
      $R := \alpha_2$; $\alpha_2 := \alpha_1$
      $\alpha_1 := L + (1 - \tau) \cdot |R - L|$
    ELSE
      /* $\alpha^\star \in [\alpha_1, R]$, $\alpha_2$ is the new left inner point */
      $L := \alpha_1$; $\alpha_1 := \alpha_2$
      $\alpha_2 := L + \tau \cdot |R - L|$
  UNTIL $\Delta < \Delta_{min}$

[Figure 14: sketch omitted; two sectioning steps with the inner points dividing the interval $\Delta$ into portions $(1 - \tau) \cdot \Delta$ and $\tau \cdot \Delta$]

accuracy after the $\kappa$-th step, interval size: $\Delta^{(\kappa)} = \tau^\kappa \cdot \Delta^{(0)}$   (63)

required number of iterations $\kappa$ to reduce the interval from $\Delta^{(0)}$ to $\Delta^{(\kappa)}$:

$\kappa = \dfrac{1}{-\log \tau} \cdot \log \dfrac{\Delta^{(0)}}{\Delta^{(\kappa)}} \approx 4.78 \cdot \log \dfrac{\Delta^{(0)}}{\Delta^{(\kappa)}}$   (64)
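The loop above in code (a sketch; the interval is assumed to contain the minimum of a unimodal f, cf. Sec. 3.1.7, and `delta_min` plays the role of $\Delta_{min}$):

```python
import math

def golden_section(f, L, R, delta_min=1e-6):
    """Golden sectioning of Sec. 3.1.5: keep two inner points at the golden
    ratio tau so that one function value can be reused in every iteration."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0        # root of tau^2 + tau - 1 = 0, (62)
    a1 = L + (1.0 - tau) * (R - L)            # left inner point
    a2 = L + tau * (R - L)                    # right inner point
    f1, f2 = f(a1), f(a2)
    while R - L > delta_min:
        if f1 < f2:                           # minimum in [L, a2]
            R, a2, f2 = a2, a1, f1            # a1 becomes the new right inner point
            a1 = L + (1.0 - tau) * (R - L)
            f1 = f(a1)
        else:                                 # minimum in [a1, R]
            L, a1, f1 = a1, a2, f2            # a2 becomes the new left inner point
            a2 = L + tau * (R - L)
            f2 = f(a2)
    return 0.5 * (L + R)

print(golden_section(lambda a: (a - 1.0)**2, 0.0, 3.0))   # -> approx. 1.0
```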

3.1.6 Line search by quadratic model

[Figure 15: sketch omitted; a quadratic model $m(\alpha)$ through the sampling points $L^{(\kappa)}$, $E^{(\kappa)}$, $R^{(\kappa)}$ of $f$, with the model minimizer at $\alpha_t$]

quadratic model by, e.g.,
• 3 sampling points
• first- and second-order derivatives (univariate Newton approach)

$m(\alpha) = f_0 + g \cdot \alpha + \frac{1}{2} \cdot h \cdot \alpha^2$   (65)

first-order condition: $\min m(\alpha) \to \nabla m(\alpha) = g + h \cdot \alpha_t \stackrel{!}{=} 0 \to \alpha_t = -g/h$   (66)

second-order condition: $h > 0$   (67)

3.1.7 Unimodal function

f is unimodal over the interval [L, R]: there exists exactly one value $\alpha_{opt} \in [L, R]$ for which:

$\forall_{L < \alpha_1 < \alpha_2 < R}: \;\; \alpha_2 < \alpha_{opt} \to f(\alpha_1) > f(\alpha_2)$  (Fig. 16(a))   (68)

and

$\forall_{L < \alpha_1 < \alpha_2 < R}: \;\; \alpha_1 > \alpha_{opt} \to f(\alpha_1) < f(\alpha_2)$  (Fig. 16(b))   (69)

[Figure 16: sketch omitted; the two cases (a), (b) with $\alpha_1$, $\alpha_2$ on either side of $\alpha_{opt}$]
[Figure 17: sketch omitted; a univariate unimodal function (a) and a univariate monotone function (b)]

Remarks:
minimization of a univariate unimodal function: interval reduction with $f(L) > f(\alpha_t) < f(R)$, $\alpha_t \in [L, R]$; three sampling points.
root finding for a univariate monotone function: interval reduction with $\mathrm{sign}(f(L)) = -\mathrm{sign}(f(R))$; two sampling points:

$\Delta^{(\kappa)} = \left(\frac{1}{2}\right)^\kappa \cdot \Delta^{(0)}$   (70)

$\kappa = \dfrac{1}{\log 2} \cdot \log \dfrac{\Delta^{(0)}}{\Delta^{(\kappa)}} \approx 3.32 \cdot \log \dfrac{\Delta^{(0)}}{\Delta^{(\kappa)}}$   (71)

3.2 Multivariate unconstrained optimization without derivatives

3.2.1 Coordinate search

optimize one parameter at a time, alternating through the coordinates; $e_j = [0 \ldots 0 \; 1 \; 0 \ldots 0]^T$ with the 1 at the j-th position

  REPEAT
    FOR $j = 1, \ldots, n_x$
      $\alpha_t := \operatorname{argmin}_\alpha f(x + \alpha \cdot e_j)$
      $x := x + \alpha_t \cdot e_j$
  UNTIL convergence or maximum allowed iterations reached

good for "loosely coupled", "uncorrelated" parameters, e.g., $\nabla^2 f(x)$ a diagonal matrix.

[Figure 18: sketch omitted] Figure 18. Coordinate search with exact line search for a quadratic objective function with diagonal Hessian matrix; here x* is reached in the first run of the REPEAT loop.
[Figure 19: sketch omitted] Figure 19. Coordinate search with exact line search for a general quadratic objective function (a). Steepest-descent method for a quadratic objective function (b).
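A compact sketch of the loop above (scipy's scalar minimizer stands in for the exact line search, and a fixed sweep count stands in for the convergence test; both are assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_search(f, x, sweeps=20):
    """Coordinate search of Sec. 3.2.1: line search along one coordinate
    direction e_j at a time, cycling through all coordinates."""
    x = np.asarray(x, dtype=float)
    for _ in range(sweeps):
        for j in range(x.size):
            e = np.zeros_like(x)
            e[j] = 1.0
            alpha = minimize_scalar(lambda a: f(x + a * e)).x   # argmin along e_j
            x = x + alpha * e
    return x
```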

3.2.2 Polytope method (Nelder-Mead simplex method)

[Figure 20: sketch omitted; a 2-simplex with vertices x1, x2, x3 on the level sets f = const]

The $n_x$-simplex is the convex hull of a set of parameter-vector vertices $x_i$, $i = 1, \ldots, n_x + 1$. The vertices are ordered such that $f_i = f(x_i)$, $f_1 \le f_2 \le \cdots \le f_{n_x+1}$.

Cases of an iteration step (after computing the reflected point $x_r$ with value $f_r$):
• $f_r < f_1$: expansion; if $f_e < f_r$ then $x_{n_x+1} = x_e$, else $x_{n_x+1} = x_r$
• $f_1 \le f_r < f_{n_x}$: insert $x_r$, delete $x_{n_x+1}$ (accept the reflection)
• $f_{n_x} \le f_r < f_{n_x+1}$: outer contraction; if $f_c \le f_r$ then $x_{n_x+1} = x_c$, else reduction to $x_1, v_2, \ldots, v_{n_x+1}$
• $f_{n_x+1} \le f_r$: inner contraction; if $f_{cc} < f_{n_x+1}$ then $x_{n_x+1} = x_{cc}$, else reduction
afterwards: reorder; a code sketch follows at the end of this subsection.

Reflection [Figure 21 omitted]:

$x_0 = \frac{1}{n_x} \sum_{i=1}^{n_x} x_i$   (72)
$x_r = x_0 + \rho \cdot (x_0 - x_{n_x+1})$   (73)

$\rho$: reflection coefficient, $\rho > 0$ (default: $\rho = 1$)

Expansion [Figure 22 omitted]:

$x_e = x_0 + \chi \cdot (x_r - x_0)$   (74)

$\chi$: expansion coefficient, $\chi > 1$, $\chi > \rho$ (default: $\chi = 2$)

Outer contraction [Figure 23 omitted]:

$x_c = x_0 + \gamma \cdot (x_r - x_0)$   (75)

$\gamma$: contraction coefficient, $0 < \gamma < 1$ (default: $\gamma = \frac{1}{2}$)

Inner contraction [Figure 24 omitted]:

$x_{cc} = x_0 - \gamma \cdot (x_0 - x_{n_x+1})$   (76)

Reduction (shrink) [Figure 25 omitted]:

$v_i = x_1 + \sigma \cdot (x_i - x_1), \quad i = 2, \ldots, n_x + 1$   (77)

$\sigma$: reduction coefficient, $0 < \sigma < 1$ (default: $\sigma = \frac{1}{2}$)
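One iteration of the case distinction above, with the default coefficients (a sketch; representing the vertex set as a numpy array and re-evaluating f for the reordering are implementation choices, not from the text):

```python
import numpy as np

def polytope_step(f, X, rho=1.0, chi=2.0, gamma=0.5, sigma=0.5):
    """One Nelder-Mead iteration of Sec. 3.2.2.
    X: (n+1) x n array of vertices; returns the updated, reordered vertices."""
    X = X[np.argsort([f(x) for x in X])]      # order so f(x1) <= ... <= f(x_{n+1})
    x0 = X[:-1].mean(axis=0)                  # centroid of the n best vertices (72)
    xr = x0 + rho * (x0 - X[-1])              # reflection (73)
    fr, f1, fn, fn1 = f(xr), f(X[0]), f(X[-2]), f(X[-1])
    if fr < f1:                               # expansion (74)
        xe = x0 + chi * (xr - x0)
        X[-1] = xe if f(xe) < fr else xr
    elif fr < fn:                             # accept the reflection
        X[-1] = xr
    elif fr < fn1:                            # outer contraction (75)
        xc = x0 + gamma * (xr - x0)
        if f(xc) <= fr:
            X[-1] = xc
        else:                                 # reduction (77)
            X[1:] = X[0] + sigma * (X[1:] - X[0])
    else:                                     # inner contraction (76)
        xcc = x0 - gamma * (x0 - X[-1])
        if f(xcc) < fn1:
            X[-1] = xcc
        else:                                 # reduction (77)
            X[1:] = X[0] + sigma * (X[1:] - X[0])
    return X[np.argsort([f(x) for x in X])]   # reorder
```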

3.3 Multivariate unconstrained optimization with derivatives

3.3.1 Steepest descent

(see Sec. 1.8.1) search direction: the negative of the gradient at the current point in the parameter space (i.e., a linear model of the objective function)

3.3.2 Newton approach

quadratic model of the objective function:

$f(x^{(\kappa)} + r) \approx m(x^{(\kappa)} + r) = f^{(\kappa)} + g^{(\kappa)T} \cdot r + \frac{1}{2} r^T \cdot H^{(\kappa)} \cdot r$   (78)
$g^{(\kappa)} = \nabla f(x^{(\kappa)}), \quad H^{(\kappa)} = \nabla^2 f(x^{(\kappa)})$   (79)

minimize m(r).

First-order optimality condition:

$\nabla m(r) = 0 \;\to\; H^{(\kappa)} \cdot r = -g^{(\kappa)} \;\to\; r^{(\kappa)}$   (80)

$r^{(\kappa)}$: search direction for the line search; $r^{(\kappa)}$ is obtained through the solution of a system of linear equations.

Second-order optimality condition: $\nabla^2 m(r)$ positive (semi)definite; if not:
• steepest descent: $r^{(\kappa)} = -g^{(\kappa)}$
• switch the signs of negative eigenvalues
• Levenberg-Marquardt approach: $(H^{(\kappa)} + \lambda \cdot I) \cdot r = -g^{(\kappa)}$ ($\lambda \to \infty$: steepest-descent approach)
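In code, the Newton direction (80) is one linear solve; a Cholesky attempt doubles as the positive-definiteness test of Sec. 2.1.3, with the steepest-descent fallback from the list above (a minimal sketch):

```python
import numpy as np

def newton_direction(g, H):
    """Search direction of Sec. 3.3.2: solve H r = -g (80); if H is not
    positive definite (Cholesky fails), fall back to steepest descent."""
    try:
        np.linalg.cholesky(H)            # raises LinAlgError unless H is pos. def.
        return np.linalg.solve(H, -g)    # linear solve, no matrix inversion
    except np.linalg.LinAlgError:
        return -g                        # steepest-descent fallback
```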

3.3.3 Quasi-Newton approach

second derivative not available → successive approximation $B \approx H$ from gradients in the course of the optimization process, e.g., $B^{(0)} = I$:

$\nabla f(x^{(\kappa+1)}) = \nabla f(x^{(\kappa)} + \alpha^{(\kappa)} \cdot r^{(\kappa)}) = g^{(\kappa+1)}$   (81)

approximate $g^{(\kappa+1)}$ from the quadratic model (78):

$g^{(\kappa+1)} \approx g^{(\kappa)} + H^{(\kappa)} \cdot \alpha^{(\kappa)} \cdot r^{(\kappa)}$   (82)
$\alpha^{(\kappa)} \cdot r^{(\kappa)} = x^{(\kappa+1)} - x^{(\kappa)}$   (83)

Quasi-Newton condition for the approximation of B:

$\underbrace{g^{(\kappa+1)} - g^{(\kappa)}}_{y^{(\kappa)}} = B^{(\kappa+1)} \cdot \underbrace{(x^{(\kappa+1)} - x^{(\kappa)})}_{s^{(\kappa)}}$   (84)

3.3.3.1 Symmetric rank-1 update (SR1)

approach: $B^{(\kappa+1)} = B^{(\kappa)} + u \cdot v^T$   (85)

substituting (85) in (84):

$(B^{(\kappa)} + u \cdot v^T) \cdot s^{(\kappa)} = y^{(\kappa)} \;\Rightarrow\; u \cdot v^T \cdot s^{(\kappa)} = y^{(\kappa)} - B^{(\kappa)} \cdot s^{(\kappa)} \;\Rightarrow\; u = \dfrac{y^{(\kappa)} - B^{(\kappa)} \cdot s^{(\kappa)}}{v^T \cdot s^{(\kappa)}}$   (86)

substituting (86) in (85):

$B^{(\kappa+1)} = B^{(\kappa)} + \dfrac{(y^{(\kappa)} - B^{(\kappa)} \cdot s^{(\kappa)}) \cdot v^T}{v^T \cdot s^{(\kappa)}}$   (87)

because of the symmetry of B:

$B^{(\kappa+1)} = B^{(\kappa)} + \dfrac{(y^{(\kappa)} - B^{(\kappa)} \cdot s^{(\kappa)}) \cdot (y^{(\kappa)} - B^{(\kappa)} \cdot s^{(\kappa)})^T}{(y^{(\kappa)} - B^{(\kappa)} \cdot s^{(\kappa)})^T \cdot s^{(\kappa)}}$   (88)

if $y^{(\kappa)} - B^{(\kappa)} \cdot s^{(\kappa)} = 0$, then $B^{(\kappa+1)} = B^{(\kappa)}$. SR1 does not guarantee positive definiteness of B.
alternatives: Davidon-Fletcher-Powell (DFP), Broyden-Fletcher-Goldfarb-Shanno (BFGS); approximation of $B^{-1}$ instead of B, or of a decomposition of the system matrix of the linear equation system (80) used to compute the search direction $r^{(\kappa)}$.
If the $s^{(\kappa)}$, $\kappa = 1, \ldots, n_x$, are linearly independent and f is quadratic with Hessian H, quasi-Newton with SR1 terminates after not more than $n_x + 1$ steps with $B^{(\kappa+1)} = H$.
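The update (88), with the skip rule for a vanishing $y^{(\kappa)} - B^{(\kappa)} s^{(\kappa)}$ (a sketch; the relative tolerance `tol` is a common safeguard assumed here, not from the text):

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-1 update (88) of the Hessian approximation B.
    s = x^{k+1} - x^k, y = g^{k+1} - g^k, cf. the quasi-Newton condition (84)."""
    v = y - B @ s
    denom = v @ s
    if abs(denom) < tol * np.linalg.norm(v) * np.linalg.norm(s):
        return B                              # skip: y - B s (nearly) vanishes
    return B + np.outer(v, v) / denom         # rank-1, symmetric correction
```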

3.3.4 Levenberg-Marquardt approach (Newton direction plus trust region)

$\min m(r)$ s.t. $\underbrace{\|r\|^2}_{r^T \cdot r} \le \Delta$   (89)

constraint: trust region of the model m(r), e.g., concerning positive definiteness of $B^{(\kappa)}$, $H^{(\kappa)}$

corresponding Lagrange function:

$L(r, \lambda) = f^{(\kappa)} + g^{(\kappa)T} \cdot r + \frac{1}{2} r^T \cdot H^{(\kappa)} \cdot r - \lambda \cdot (\Delta - r^T \cdot r)$   (90)

stationary point $\nabla L(r) = 0$:

$(H^{(\kappa)} + 2 \lambda \cdot I) \cdot r = -g^{(\kappa)}$   (91)

compare the Newton approach for indefinite $H^{(\kappa)}$

3.3.5 Least-squares (plus trust-region) approach

$\min_r \|\bar{f}(r) - f_{target}\|^2$ s.t. $\|r\|^2 \le \Delta$   (92)

linear model:

$f(r) \approx \bar{f}^{(\kappa)}(r) = \underbrace{f(x^{(\kappa)})}_{f_0^{(\kappa)}} + \underbrace{\nabla f(x^{(\kappa)})^T}_{S^{(\kappa)}} \cdot r$   (93)

$\Delta$: trust region of the linear model

least-squares difference of the linearized objective value to the target value (index $(\kappa)$ omitted), with $\epsilon_0 = f_0 - f_{target}$:

$\|\epsilon(r)\|^2 = \|\bar{f}(r) - f_{target}\|^2 = \epsilon^T(r) \cdot \epsilon(r)$   (94)
$= (\epsilon_0 + S \cdot r)^T \cdot (\epsilon_0 + S \cdot r)$   (95)
$= \epsilon_0^T \cdot \epsilon_0 + 2 \cdot r^T \cdot S^T \cdot \epsilon_0 + r^T \cdot S^T \cdot S \cdot r$   (96)

$S^T \cdot S$: part of the second derivative of $\|\epsilon(r)\|^2$

$\min_r \|\epsilon(r)\|^2$ s.t. $\|r\|^2 \le \Delta$   (97)

corresponding Lagrange function:

$L(r, \lambda) = \epsilon_0^T \cdot \epsilon_0 + 2 \cdot r^T \cdot S^T \cdot \epsilon_0 + r^T \cdot S^T \cdot S \cdot r - \lambda \cdot (\Delta - r^T \cdot r)$   (98)

stationary point $\nabla L(r) = 0$:

$2 \cdot S^T \cdot \epsilon_0 + 2 \cdot S^T \cdot S \cdot r + 2 \cdot \lambda \cdot r = 0$   (99)

$(S^T \cdot S + \lambda \cdot I) \cdot r = -S^T \cdot \epsilon_0$   (100)

$\lambda = 0$: Gauss-Newton method.
(100) yields $r^{(\kappa)}$ for a given $\lambda$; determine the Pareto front $\|\epsilon^{(\kappa)}(r^{(\kappa)})\|$ vs. $\|r^{(\kappa)}\|$ for $0 \le \lambda < \infty$ (Figs. 26, 27); select an $r^{(\kappa)}$ with small step length and small error (iterative process): $x^{(\kappa+1)} = x^{(\kappa)} + r^{(\kappa)}$

[Figure 26: plot omitted] Figure 26. Characteristic boundary curve of error $\|\epsilon^{(\kappa)}(r^{(\kappa)})\|$ over step length $\|r^{(\kappa)}\|$, running from $\lambda \to \infty$ (small step) to $\lambda = 0$ (small error); typical sharp bend for ill-conditioned problems. A small step length is wanted, e.g., due to the limited trust region of the linear model.
[Figure 27: plot omitted] Figure 27. Characteristic boundary curve in a two-dimensional parameter space: $r^{(\kappa)}(\Delta)$ lies on the circle $\|r\| = \Delta$ between the $\lambda = 0$ and $\lambda \to \infty$ solutions, on the level sets $\|\epsilon^{(\kappa)}(r)\| = \mathrm{const}$.

3.3.6 Conjugate-gradient (CG) approach

3.3.6.1 Optimization problem with a quadratic objective function

$f_q(x) = b^T \cdot x + \frac{1}{2} x^T \cdot H \cdot x$   (101)

$\min f_q(x)$   (102)

$n_x$ large, H sparse.

The first-order optimality condition leads to a linear equation system for the stationary point $x^\star$:

$g(x) = \nabla f_q(x) = b + H \cdot x = 0$   (103)

$H \cdot x^\star = -b$   (104)

$x^\star$ is computed by solving the linear equation system, not by matrix inversion.

second-order condition: $\nabla^2 f_q(x^\star) = H$ positive definite   (105)

3.3.6.2 Eigenvalue decomposition of a symmetric positive definite matrix H

$H = U \cdot D^{\frac{1}{2}} \cdot D^{\frac{1}{2}} \cdot U^T$   (106)

$D^{\frac{1}{2}}$: diagonal matrix of the square roots of the eigenvalues
U: columns are orthonormal eigenvectors, i.e., $U^{-1} = U^T$, $U \cdot U^T = I$   (107)

substituting (106) in (101):

$f_q(x) = b^T \cdot x + \frac{1}{2} (D^{\frac{1}{2}} \cdot U^T \cdot x)^T \cdot (D^{\frac{1}{2}} \cdot U^T \cdot x)$   (108)

coordinate transformation:

$x' = D^{\frac{1}{2}} \cdot U^T \cdot x, \quad x = U \cdot D^{-\frac{1}{2}} \cdot x'$   (109)

substituting (109) in (108):

$f_q'(x') = b'^T \cdot x' + \frac{1}{2} x'^T \cdot x'$   (110)

$b' = D^{-\frac{1}{2}} \cdot U^T \cdot b$   (111)

the Hessian of the transformed quadratic function (110) is the unity matrix: $H' = I$

gradient of $f_q'$:

$g' = \nabla f_q'(x') \stackrel{(110)}{=} b' + x'$   (112)

transformation of the gradient from (103), (106), (109), and (111):

$\nabla f(x) = b + H \cdot x = U \cdot D^{\frac{1}{2}} \cdot b' + U \cdot D^{\frac{1}{2}} \cdot x'$   (113)

from (112):

$g = \nabla f(x) = U \cdot D^{\frac{1}{2}} \cdot \nabla f_q'(x') = U \cdot D^{\frac{1}{2}} \cdot g'$   (114)

$\min f_q(x) \equiv \min f_q'(x')$   (115)

solution in transformed coordinates ($g' \stackrel{!}{=} 0$):

$x'^\star = -b'$   (116)

3.3.6.3 Conjugate directions

$\forall_{i \ne j} \;\; r^{(i)T} \cdot H \cdot r^{(j)} = 0$  ("H-orthogonal")   (117)

from (106):

$\iff \forall_{i \ne j} \;\; (D^{\frac{1}{2}} \cdot U^T \cdot r^{(i)})^T \cdot (D^{\frac{1}{2}} \cdot U^T \cdot r^{(j)}) = 0$
$\iff \forall_{i \ne j} \;\; r'^{(i)T} \cdot r'^{(j)} = 0$   (118)

transformed conjugate directions are orthogonal:

$r'^{(i)} = D^{\frac{1}{2}} \cdot U^T \cdot r^{(i)}$   (119)

3.3.6.4 Step length

substitute $x = x^{(\kappa)} + \alpha \cdot r^{(\kappa)}$ in (101):

$f_q(\alpha) = b^T \cdot (x^{(\kappa)} + \alpha \cdot r^{(\kappa)}) + \frac{1}{2} (x^{(\kappa)} + \alpha \cdot r^{(\kappa)})^T \cdot H \cdot (x^{(\kappa)} + \alpha \cdot r^{(\kappa)})$   (120)

$\nabla f_q(\alpha) = b^T \cdot r^{(\kappa)} + (x^{(\kappa)} + \alpha \cdot r^{(\kappa)})^T \cdot H \cdot r^{(\kappa)}$   (121)
$\stackrel{(103)}{=} \underbrace{\nabla f_q(x^{(\kappa)})^T}_{g^{(\kappa)T}} \cdot r^{(\kappa)} + \alpha \cdot r^{(\kappa)T} \cdot H \cdot r^{(\kappa)}$   (122)

$\nabla f_q(\alpha) = 0$:

$\alpha^{(\kappa)} = \dfrac{-g^{(\kappa)T} \cdot r^{(\kappa)}}{r^{(\kappa)T} \cdot H \cdot r^{(\kappa)}}$   (123)

3.3.6.5 New iterative solution

$x^{(\kappa+1)} = x^{(\kappa)} + \underbrace{\alpha^{(\kappa)}}_{(123)} \cdot \underbrace{r^{(\kappa)}}_{(117)} = x^{(0)} + \sum_{i=0}^{\kappa} \alpha^{(i)} \cdot r^{(i)}$   (124)

$r^{(0)} \stackrel{(103)}{=} -g^{(0)}$   (125)

new search direction: combination of the actual gradient and the previous search direction:

$r^{(\kappa+1)} = -g^{(\kappa+1)} + \beta^{(\kappa+1)} \cdot r^{(\kappa)}$   (126)

substituting (126) into (117):

$-g^{(\kappa+1)T} \cdot H \cdot r^{(\kappa)} + \beta^{(\kappa+1)} \cdot r^{(\kappa)T} \cdot H \cdot r^{(\kappa)} = 0$   (127)

$\beta^{(\kappa+1)} = \dfrac{g^{(\kappa+1)T} \cdot H \cdot r^{(\kappa)}}{r^{(\kappa)T} \cdot H \cdot r^{(\kappa)}}$   (128)

[Figure 28: sketch omitted; two CG steps in the original coordinates: $r^{(0)} = -g^{(0)}$ from $x^{(0)}$ to $x^{(1)}$, then $r^{(1)}$ with $r^{(0)T} \cdot H \cdot r^{(1)} = 0$ (note: $r^{(1)} \ne -g^{(1)}$) reaching $x^\star$]
[Figure 29: sketch omitted; the same two steps in the transformed coordinates, where $r'^{(0)T} \cdot r'^{(1)} = 0$, i.e., the search directions are orthogonal]

3.3.6.6 Some properties

$g^{(\kappa+1)} = b + H \cdot x^{(\kappa+1)} = b + H \cdot x^{(\kappa)} + \alpha^{(\kappa)} \cdot H \cdot r^{(\kappa)} = g^{(\kappa)} + \alpha^{(\kappa)} \cdot H \cdot r^{(\kappa)}$   (129)

$g^{(\kappa+1)T} \cdot r^{(\kappa)} \stackrel{(129)}{=} g^{(\kappa)T} \cdot r^{(\kappa)} + \alpha^{(\kappa)} \cdot r^{(\kappa)T} \cdot H \cdot r^{(\kappa)} \stackrel{(123)}{=} g^{(\kappa)T} \cdot r^{(\kappa)} - g^{(\kappa)T} \cdot r^{(\kappa)} = 0$   (130)

i.e., the actual gradient is orthogonal to the previous search direction.

$-g^{(\kappa)T} \cdot r^{(\kappa)} \stackrel{(126)}{=} -g^{(\kappa)T} \cdot (-g^{(\kappa)} + \beta^{(\kappa)} \cdot r^{(\kappa-1)}) = g^{(\kappa)T} \cdot g^{(\kappa)} - \beta^{(\kappa)} \cdot \underbrace{g^{(\kappa)T} \cdot r^{(\kappa-1)}}_{0 \text{ by } (130)} = g^{(\kappa)T} \cdot g^{(\kappa)}$   (131)

i.e., the conjugate direction is a descent direction.

$g^{(\kappa+1)T} \cdot g^{(\kappa)} \stackrel{(129)}{=} g^{(\kappa)T} \cdot g^{(\kappa)} + \alpha^{(\kappa)} \cdot r^{(\kappa)T} \cdot H \cdot g^{(\kappa)}$
$\stackrel{(131),(126)}{=} -g^{(\kappa)T} \cdot r^{(\kappa)} + \alpha^{(\kappa)} \cdot r^{(\kappa)T} \cdot H \cdot (-r^{(\kappa)} + \beta^{(\kappa)} \cdot r^{(\kappa-1)})$
$\stackrel{(117)}{=} -g^{(\kappa)T} \cdot r^{(\kappa)} - \alpha^{(\kappa)} \cdot r^{(\kappa)T} \cdot H \cdot r^{(\kappa)}$
$\stackrel{(123)}{=} -g^{(\kappa)T} \cdot r^{(\kappa)} + g^{(\kappa)T} \cdot r^{(\kappa)} = 0$   (132)

i.e., the actual gradient is orthogonal to the previous gradient.

substituting (126) into (130): $g^{(\kappa+1)T} \cdot (-g^{(\kappa)} + \beta^{(\kappa)} \cdot r^{(\kappa-1)}) = 0$, which with (132) gives

$g^{(\kappa+1)T} \cdot r^{(\kappa-1)} = 0$   (133)

i.e., the actual gradient is orthogonal to all previous search directions.

substituting (129) into (132): $g^{(\kappa+1)T} \cdot (g^{(\kappa-1)} + \alpha^{(\kappa-1)} \cdot H \cdot r^{(\kappa-1)}) = 0$
$\stackrel{(126)}{\iff} g^{(\kappa+1)T} \cdot g^{(\kappa-1)} + \alpha^{(\kappa-1)} \cdot (-r^{(\kappa+1)} + \beta^{(\kappa+1)} \cdot r^{(\kappa)})^T \cdot H \cdot r^{(\kappa-1)} = 0$
$\stackrel{(117)}{\iff} g^{(\kappa+1)T} \cdot g^{(\kappa-1)} = 0$   (134)

i.e., the actual gradient is orthogonal to all previous gradients.

3.3.6.7 Simplified computation of the CG step length

substituting (131) into (123):

$\alpha^{(\kappa)} = \dfrac{g^{(\kappa)T} \cdot g^{(\kappa)}}{r^{(\kappa)T} \cdot H \cdot r^{(\kappa)}}$   (135)

3.3.6.8 Simplified computation of the CG search direction

substituting $H \cdot r^{(\kappa)} = (g^{(\kappa+1)} - g^{(\kappa)}) / \alpha^{(\kappa)}$ from (129) into (128):

$\beta^{(\kappa+1)} = \dfrac{g^{(\kappa+1)T} \cdot (g^{(\kappa+1)} - g^{(\kappa)})}{\alpha^{(\kappa)} \cdot r^{(\kappa)T} \cdot H \cdot r^{(\kappa)}}$   (136)

from (135) and (132):

$\beta^{(\kappa+1)} = \dfrac{g^{(\kappa+1)T} \cdot g^{(\kappa+1)}}{g^{(\kappa)T} \cdot g^{(\kappa)}}$   (137)

3.3.6.9 CG algorithm

solves the optimization problem (102), i.e., the linear equation system (103), in at most $n_x$ steps:

  $g^{(0)} = b + H \cdot x^{(0)}$
  $r^{(0)} = -g^{(0)}$
  $\kappa = 0$
  while $r^{(\kappa)} \ne 0$
    step length: $\alpha^{(\kappa)} = \dfrac{g^{(\kappa)T} \cdot g^{(\kappa)}}{r^{(\kappa)T} \cdot H \cdot r^{(\kappa)}}$  ⇒ line search in nonlinear programming
    new point: $x^{(\kappa+1)} = x^{(\kappa)} + \alpha^{(\kappa)} \cdot r^{(\kappa)}$
    new gradient: $g^{(\kappa+1)} = g^{(\kappa)} + \alpha^{(\kappa)} \cdot H \cdot r^{(\kappa)}$  ⇒ gradient computation in nonlinear programming
    Fletcher-Reeves: $\beta^{(\kappa+1)} = \dfrac{g^{(\kappa+1)T} \cdot g^{(\kappa+1)}}{g^{(\kappa)T} \cdot g^{(\kappa)}}$
    new search direction: $r^{(\kappa+1)} = -g^{(\kappa+1)} + \beta^{(\kappa+1)} \cdot r^{(\kappa)}$
    $\kappa := \kappa + 1$
  end while

alternative, Polak-Ribière:

$\beta^{(\kappa+1)} = \dfrac{g^{(\kappa+1)T} \cdot (g^{(\kappa+1)} - g^{(\kappa)})}{g^{(\kappa)T} \cdot g^{(\kappa)}}$   (138)
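The algorithm transcribes almost line by line for the quadratic/linear case (a sketch; the stopping tolerance replaces the exact test $r^{(\kappa)} \ne 0$ and is an assumption):

```python
import numpy as np

def conjugate_gradient(H, b, x=None, tol=1e-10):
    """CG algorithm of Sec. 3.3.6.9 for min f_q(x), i.e. H x = -b (103),
    with the Fletcher-Reeves update (137)."""
    x = np.zeros_like(b, dtype=float) if x is None else x
    g = b + H @ x                         # gradient (103)
    r = -g                                # first search direction (125)
    for _ in range(len(b)):               # at most n_x steps for quadratic f_q
        if np.linalg.norm(r) < tol:
            break
        Hr = H @ r
        alpha = (g @ g) / (r @ Hr)        # step length (135)
        x = x + alpha * r                 # new point
        g_new = g + alpha * Hr            # new gradient (129)
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves (137)
        r = -g_new + beta * r             # new search direction (126)
        g = g_new
    return x
```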

4 Constrained optimization - problem formulations

4.1 Quadratic Programming (QP)

4.1.1 QP - linear equality constraints

$\min f_q(x)$ s.t. $A \cdot x = c$, with $f_q(x) = b^T \cdot x + \frac{1}{2} x^T \cdot H \cdot x$, H symmetric, positive definite   (139)

$A \in \mathbb{R}^{n_c \times n_x}$, $n_c \le n_x$, $\mathrm{rank}(A) = n_c$; $A \cdot x = c$ is an underdetermined linear equation system with full rank (Appendix E).   (140)

4.1.1.1 Transformation of (139) into an unconstrained optimization problem by coordinate transformation

approach:

$x = \underbrace{[\,Y \;\; Z\,]}_{\text{non-singular}} \cdot \begin{bmatrix} c \\ y \end{bmatrix} = \underbrace{Y \cdot c}_{\text{feasible point of (140)}} + \underbrace{Z \cdot y}_{\text{degrees of freedom in (140)}}$   (141)

$A \cdot x = \underbrace{A \cdot Y}_{(143)} \cdot c + \underbrace{A \cdot Z}_{(144)} \cdot y = c$   (142)

$A \cdot Y = I$   (143)
$A \cdot Z = 0$   (144)

$\phi(y) = f_q(x(y))$   (145)
$= b^T \cdot (Y \cdot c + Z \cdot y) + \frac{1}{2} (Y \cdot c + Z \cdot y)^T \cdot H \cdot (Y \cdot c + Z \cdot y)$   (146)
$= (b + \frac{1}{2} H \cdot Y \cdot c)^T \cdot Y \cdot c + (b + H \cdot Y \cdot c)^T \cdot Z \cdot y + \frac{1}{2} y^T \cdot Z^T \cdot H \cdot Z \cdot y$   (147)

$\min f_q(x)$ s.t. $A \cdot x = c \;\equiv\; \min \phi(y)$   (148)

$y^\star$: solution of the unconstrained optimization problem (148); $Z \cdot y^\star$ lies in the kernel of A.

Stationary point of optimization problem (148), $\nabla \phi(y) = 0$:

$\underbrace{Z^T \cdot H \cdot Z}_{\text{reduced Hessian}} \cdot \, y = -\underbrace{Z^T \cdot (b + H \cdot Y \cdot c)}_{\text{reduced gradient}} \;\to\; y^\star \;\to\; x^\star = Y \cdot c + Z \cdot y^\star$   (149)

Computation of Y, Z by QR decomposition:

from (143) and (483): $R^T \cdot Q^T \cdot Y = I \;\to\; Y = Q \cdot R^{-T}$   (150)
from (144) and (483): $R^T \cdot Q^T \cdot Z = 0 \;\to\; Z = Q_\perp$   (151)

[Figure 30: sketch omitted; it shows the plane $A \cdot x = c$ ("feasible space") in $\mathbb{R}^3$, the vector $Y \cdot c$ orthogonal to it, and the basis vectors $z_1$, $z_2$ spanning it] Figure 30. $Y \cdot c$: "minimum length solution", i.e., orthogonal to the plane $A \cdot x = c$. $z_i$, $i = 1, \ldots, n_x - n_c$: orthonormal basis of the kernel (null-space) of A, i.e., $A \cdot (Z \cdot y) = 0$.
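The null-space recipe (141)-(151), together with the multiplier formula (154) of the next subsection, in a compact sketch (numpy's QR factorization is used to obtain Y and Z; the function name and argument shapes are assumptions):

```python
import numpy as np

def qp_equality(H, b, A, c):
    """Null-space method of Sec. 4.1.1 for
    min b^T x + 0.5 x^T H x  s.t.  A x = c."""
    nc = A.shape[0]
    Q, R = np.linalg.qr(A.T, mode='complete')   # A^T = Q [R; 0]
    Q1, Z = Q[:, :nc], Q[:, nc:]                # Z = Q_perp spans the kernel of A (151)
    Y = Q1 @ np.linalg.inv(R[:nc]).T            # Y = Q R^{-T}  (150)
    Yc = Y @ c                                  # particular solution of A x = c
    # reduced system (149): (Z^T H Z) y = -Z^T (b + H Y c)
    y = np.linalg.solve(Z.T @ H @ Z, -Z.T @ (b + H @ Yc))
    x = Yc + Z @ y
    lam = Y.T @ (b + H @ x)                     # Lagrange multipliers (154)
    return x, lam
```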

4.1.1.3 Computation of the Lagrange factor $\lambda^\star$ (required for QP with inequality constraints)

Lagrange function of problem (139):

$L(x, \lambda) = f_q(x) - \lambda^T \cdot (A \cdot x - c)$   (152)

$\nabla L(x) = 0$: $A^T \cdot \lambda = \nabla f_q(x^\star)$ (overdetermined) $\to \lambda^\star$   (153)

with (143): $\lambda^\star = Y^T \cdot \nabla f_q(x^\star) = Y^T \cdot (b + H \cdot x^\star)$   (154)

or, in case of a QR decomposition:

$A^T \cdot \lambda = b + H \cdot x^\star$   (155)
$Q \cdot R \cdot \lambda = b + H \cdot x^\star$   (156)
from (483): $R \cdot \lambda = Q^T \cdot (b + H \cdot x^\star)$ → backward substitution → $\lambda^\star$   (157)

4.1.1.4 Computation of $Y \cdot c$ in case of a QR decomposition

$A \cdot x = c \stackrel{(483)}{\iff} R^T \cdot \underbrace{Q^T \cdot x}_{u} = c$; forward substitution in $R^T \cdot u = c$ yields u, then $Y \cdot c = Q \cdot R^{-T} \cdot c = Q \cdot u$ (insert u)   (158)

4.1.1.5 Transformation of (139) into an unconstrained optimization problem by Lagrange multipliers

first-order optimality conditions of (139) according to the Lagrange function (152):

$\nabla L(x) = 0$: $\;\; b + H \cdot x - A^T \cdot \lambda = 0, \quad A \cdot x - c = 0$   (159)

$\iff \underbrace{\begin{bmatrix} H & -A^T \\ -A & 0 \end{bmatrix}}_{\text{Lagrange matrix } L} \cdot \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -b \\ -c \end{bmatrix} \;\to\; x^\star, \lambda^\star$   (160)

L: symmetric; positive definite iff $Z^T \cdot H \cdot Z$ is positive definite. E.g.,

$L^{-1} = \begin{bmatrix} K & -T \\ -T^T & U \end{bmatrix}$   (161)
$K = Z \cdot (Z^T \cdot H \cdot Z)^{-1} \cdot Z^T$   (162)
$T = Y - K \cdot H \cdot Y$   (163)
$U = Y^T \cdot H \cdot K \cdot H \cdot Y - Y^T \cdot H \cdot Y$   (164)

factorization of $L^{-1}$ by factorization of $Z^T \cdot H \cdot Z$ and computation of Y, Z

4.1.2 QP - inequality constraints

$\min f_q(x)$ s.t. $a_i^T \cdot x = c_i, \; i \in E$; $\; a_j^T \cdot x \ge c_j, \; j \in I$   (165)

$f_q(x) = b^T \cdot x + \frac{1}{2} x^T \cdot H \cdot x$, H: symmetric, positive definite; $A \in \mathbb{R}^{n_c \times n_x}$, $n_c \le n_x$, $\mathrm{rank}(A) = n_c$, with rows $a_i^T$, $i \in E$, and $a_j^T$, $j \in I$.

active-set method:

  compute an initial solution $x^{(0)}$ that satisfies all constraints, e.g., by linear programming
  $\kappa := 0$
  repeat
    /* problem formulation at the current iterate $x^{(\kappa)} + \delta$ for the active constraints $A^{(\kappa)}$ (active-set method) */
    $\min f_q(\delta)$ s.t. $a_i^T \cdot \delta = 0, \; i \in A^{(\kappa)} \;\to\; \delta^{(\kappa)}$
    case I: $\delta^{(\kappa)} = 0$  /* i.e., $x^{(\kappa)}$ optimal for the current $A^{(\kappa)}$ */
      compute $\lambda^{(\kappa)}$, e.g., by (154) or (157)
      case Ia: $\forall_{i \in A^{(\kappa)} \cap I} \; \lambda_i^{(\kappa)} \ge 0$  /* first-order optimality conditions satisfied */
        $x^\star = x^{(\kappa)}$; stop
      case Ib: $\exists_{i \in A^{(\kappa)} \cap I} \; \lambda_i^{(\kappa)} < 0$  /* a constraint becomes inactive, further improvement of f possible */
        $q = \operatorname{argmin}_{i \in A^{(\kappa)} \cap I} \lambda_i^{(\kappa)}$  /* most sensitive constraint to become inactive */
        $A^{(\kappa+1)} = A^{(\kappa)} \setminus \{q\}$; $x^{(\kappa+1)} = x^{(\kappa)}$
    case II: $\delta^{(\kappa)} \ne 0$  /* f can be reduced for the current $A^{(\kappa)}$; step-length computation: if no new constraint becomes active then $\alpha^{(\kappa)} = 1$ (case IIa); else find the inequality constraint that becomes active first and the corresponding $\alpha^{(\kappa)} < 1$ (case IIb) */
      $\alpha^{(\kappa)} = \min\left(1, \; \min_{i \notin A^{(\kappa)}, \; a_i^T \cdot \delta^{(\kappa)} < 0} \dfrac{c_i - a_i^T \cdot x^{(\kappa)}}{a_i^T \cdot \delta^{(\kappa)}}\right)$
        /* numerator: safety margin to the bound; denominator: safety margin consumed at $\alpha^{(\kappa)} = 1$; candidates: inactive constraints that approach their lower bound */
      $q' = \operatorname{argmin}(\text{same expression})$
      case IIa: $\alpha^{(\kappa)} = 1$: $A^{(\kappa+1)} = A^{(\kappa)}$
      case IIb: $\alpha^{(\kappa)} < 1$: $A^{(\kappa+1)} = A^{(\kappa)} \cup \{q'\}$
      $x^{(\kappa+1)} = x^{(\kappa)} + \alpha^{(\kappa)} \cdot \delta^{(\kappa)}$
    $\kappa := \kappa + 1$

4.1.3 Example

[Figure 31: sketch omitted; it shows the iterates $x^{(1)} = x^{(2)} \to x^{(3)} = x^{(4)} \to x^{(5)} \to x^{(6)}$ with steps $\delta^{(2)}$, $\delta^{(4)}$, $\delta^{(5)}$ on the level sets $f_q = \mathrm{const}$, with inequality constraints (A), (B), (C)]

$\kappa$   $A^{(\kappa)}$   case
(1)   {A, B}   Ib
(2)   {B}      IIa
(3)   {B}      Ib
(4)   {}       IIb
(5)   {C}      IIa
(6)   {C}      Ia

Concerning $q = \operatorname{argmin}_{i \in A^{(1)} \cap I} \lambda_i^{(\kappa)}$ for $\kappa = 1$:

[Figure 32: sketch omitted; the gradients $\nabla A$, $\nabla B$ and $\nabla f_q$ at $x^{(1)}$] Figure 32. $\nabla f_q = \lambda_A \cdot \nabla A + \lambda_B \cdot \nabla B \;\to\; \lambda_A < 0, \; \lambda_B = 0$.

4.2 Sequential Quadratic Programming (SQP), Lagrange-Newton

4.2.1 SQP - equality constraints

$\min f(x)$ s.t. $c(x) = 0$, with $c \in \mathbb{R}^{n_c}$, $n_c \le n_x$   (166)

4.2.1.1 Lagrange function of (166)

$L(x, \lambda) = f(x) - \lambda^T \cdot c(x) = f(x) - \sum_i \lambda_i \cdot c_i(x)$   (167)

$\nabla L(x, \lambda) = \begin{bmatrix} \nabla L(x) \\ \nabla L(\lambda) \end{bmatrix} = \begin{bmatrix} \nabla f(x) - \sum_i \lambda_i \cdot \nabla c_i(x) \\ -c(x) \end{bmatrix} = \begin{bmatrix} g(x) - \overbrace{[\cdots \nabla c_i(x) \cdots]}^{A^T(x)} \cdot \lambda \\ -c(x) \end{bmatrix}$   (168)

$\nabla^2 L(x, \lambda) = \begin{bmatrix} \nabla^2 L(x) & \frac{\partial^2 L}{\partial x \, \partial \lambda^T} \\ \frac{\partial^2 L}{\partial \lambda \, \partial x^T} & \nabla^2 L(\lambda) \end{bmatrix} = \begin{bmatrix} \nabla^2 f(x) - \sum_i \lambda_i \cdot \nabla^2 c_i(x) & -A^T(x) \\ -A(x) & 0 \end{bmatrix}$   (169)

4.2.1.2 Newton approach to (167) in the $\kappa$-th iteration step

$\nabla L(\underbrace{x^{(\kappa)} + \delta^{(\kappa)}}_{x^{(\kappa+1)}}, \underbrace{\lambda^{(\kappa)} + \Delta\lambda^{(\kappa)}}_{\lambda^{(\kappa+1)}}) \approx \underbrace{\nabla L(x^{(\kappa)}, \lambda^{(\kappa)})}_{\nabla L^{(\kappa)}} + \underbrace{\nabla^2 L(x^{(\kappa)}, \lambda^{(\kappa)})}_{\nabla^2 L^{(\kappa)}} \cdot \begin{bmatrix} \delta^{(\kappa)} \\ \Delta\lambda^{(\kappa)} \end{bmatrix} \stackrel{!}{=} 0$   (170)

with (168) and (169):

$\begin{bmatrix} W^{(\kappa)} & -A^{(\kappa)T} \\ -A^{(\kappa)} & 0 \end{bmatrix} \cdot \begin{bmatrix} \delta^{(\kappa)} \\ \Delta\lambda^{(\kappa)} \end{bmatrix} = \begin{bmatrix} -g^{(\kappa)} + A^{(\kappa)T} \cdot \lambda^{(\kappa)} \\ c^{(\kappa)} \end{bmatrix}$, with $W^{(\kappa)} = \nabla^2 f(x^{(\kappa)}) - \sum_i \lambda_i^{(\kappa)} \cdot \nabla^2 c_i(x^{(\kappa)})$   (171)

$\begin{bmatrix} W^{(\kappa)} & -A^{(\kappa)T} \\ -A^{(\kappa)} & 0 \end{bmatrix} \cdot \begin{bmatrix} \delta^{(\kappa)} \\ \lambda^{(\kappa+1)} \end{bmatrix} = \begin{bmatrix} -g^{(\kappa)} \\ c^{(\kappa)} \end{bmatrix}$   (172)

(172) is the first-order optimality condition of the Lagrange function

$L(\delta, \lambda) = g^{(\kappa)T} \cdot \delta + \frac{1}{2} \delta^T \cdot W^{(\kappa)} \cdot \delta - \lambda^T \cdot (A^{(\kappa)} \cdot \delta + c^{(\kappa)})$   (173)

which belongs to the QP problem

$\min_\delta \; g^{(\kappa)T} \cdot \delta + \frac{1}{2} \delta^T \cdot W^{(\kappa)} \cdot \delta$ s.t. $A^{(\kappa)} \cdot \delta = -c^{(\kappa)}$   (174)

QP problem; $W^{(\kappa)}$ according to (171) includes the quadratic parts of objective and constraints; linearized equality constraints from (166).

• Quasi-Newton approach possible
• inequality constraints, $A^{(\kappa)} \cdot \delta \ge -c_I^{(\kappa)}$, treated as in QP

4.2.2 Penalty function

transformation into the penalty function of an unconstrained optimization problem, with penalty parameter $\mu > 0$:

quadratic: $P_{quad}(x, \mu) = f(x) + \dfrac{1}{2\mu} c^T(x) \cdot c(x)$   (175)

logarithmic: $P_{log}(x, \mu) = f(x) - \mu \cdot \sum_i \log c_i(x)$   (176)

$l_1$ exact: $P_{l_1}(x, \mu) = \mu \cdot f(x) + \sum_{i \in E} |c_i(x)| + \sum_{i \in I} \max(-c_i(x), 0)$   (177)
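The three penalty terms translate directly into code (a sketch; the callables `f`, `c`, `cE`, `cI` returning objective value and constraint residual vectors are assumed conventions, not from the text):

```python
import numpy as np

def p_quad(f, c, x, mu):
    """Quadratic penalty (175) for equality constraints c(x) = 0."""
    cx = c(x)
    return f(x) + (cx @ cx) / (2.0 * mu)

def p_log(f, c, x, mu):
    """Logarithmic barrier (176) for inequality constraints c(x) > 0."""
    return f(x) - mu * np.sum(np.log(c(x)))

def p_l1(f, cE, cI, x, mu):
    """l1 exact penalty (177); cE/cI return equality/inequality residuals."""
    return mu * f(x) + np.sum(np.abs(cE(x))) + np.sum(np.maximum(-cI(x), 0.0))
```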

5 Statistical parameter tolerances

modeling of manufacturing variations through a multivariate continuous distribution function of the statistical parameters $x_s$

cumulative distribution function (cdf):

$\mathrm{cdf}(x_s) = \int_{-\infty}^{x_{s,1}} \cdots \int_{-\infty}^{x_{s,n_{xs}}} \mathrm{pdf}(t) \cdot dt$, with $dt = dt_1 \cdot dt_2 \cdot \ldots \cdot dt_{n_{xs}}$   (178)

(discrete: cumulative relative frequencies)

probability density function (pdf):

$\mathrm{pdf}(x_s) = \dfrac{\partial^{n_{xs}} \, \mathrm{cdf}(x_s)}{\partial x_{s,1} \cdots \partial x_{s,n_{xs}}}$   (179)

(discrete: relative frequencies)

$x_{s,i}$ denotes the random number value of a random variable $X_{s,i}$

5.1 Univariate Gaussian distribution (normal distribution)

$x_s \sim N(x_{s,0}, \sigma^2)$   (180)

$x_{s,0}$: mean value
$\sigma^2$: variance
$\sigma$: standard deviation

probability density function of the univariate normal distribution:

$\mathrm{pdf_N}(x_s, x_{s,0}, \sigma^2) = \dfrac{1}{\sqrt{2\pi} \cdot \sigma} \cdot e^{-\frac{1}{2} \left( \frac{x_s - x_{s,0}}{\sigma} \right)^2}$   (181)

[Figure 33: plots omitted] Figure 33. Probability density function, pdf, and corresponding cdf of a univariate Gaussian distribution over $x_s - x_{s,0}$. The area of the shaded region under the pdf is the value of the cdf as shown.

Standard normal distribution table:

$x_s - x_{s,0}$:              $-3\sigma$   $-2\sigma$   $-\sigma$   $0$     $\sigma$    $2\sigma$   $3\sigma$   $4\sigma$
$\mathrm{cdf}(x_s - x_{s,0})$: 0.1%       2.2%        15.8%      50%    84.1%      97.7%      99.8%      99.99%

5.2 Multivariate normal distribution

$x_s \sim N(x_{s,0}, C)$   (182)

$x_{s,0}$: vector of mean values of the statistical parameters $x_s$
C: covariance matrix of the statistical parameters $x_s$; symmetric, positive definite

probability density function of the multivariate normal distribution:

$\mathrm{pdf_N}(x_s, x_{s,0}, C) = \dfrac{1}{\sqrt{2\pi}^{\,n_{xs}} \cdot \sqrt{\det(C)}} \cdot e^{-\frac{1}{2} \beta^2(x_s, x_{s,0}, C)}$   (183)

$\beta^2(x_s, x_{s,0}, C) = (x_s - x_{s,0})^T \cdot C^{-1} \cdot (x_s - x_{s,0})$   (184)

$C = \Sigma \cdot R \cdot \Sigma$   (185)

$\Sigma = \begin{bmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_{n_{xs}} \end{bmatrix}, \quad R = \begin{bmatrix} 1 & \rho_{1,2} & \cdots & \rho_{1,n_{xs}} \\ \rho_{1,2} & 1 & \cdots & \rho_{2,n_{xs}} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1,n_{xs}} & \rho_{2,n_{xs}} & \cdots & 1 \end{bmatrix}$   (186)

$C = \begin{bmatrix} \sigma_1^2 & \sigma_1 \rho_{1,2} \sigma_2 & \cdots & \sigma_1 \rho_{1,n_{xs}} \sigma_{n_{xs}} \\ \sigma_1 \rho_{1,2} \sigma_2 & \sigma_2^2 & \cdots & \sigma_2 \rho_{2,n_{xs}} \sigma_{n_{xs}} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1 \rho_{1,n_{xs}} \sigma_{n_{xs}} & \cdots & \cdots & \sigma_{n_{xs}}^2 \end{bmatrix}$   (187)

R: correlation matrix of the statistical parameters
$\sigma_k$: standard deviation of component $x_{s,k}$, $\sigma_k > 0$
$\sigma_k^2$: variance of component $x_{s,k}$
$\sigma_k \rho_{k,l} \sigma_l$: covariance of components $x_{s,k}$, $x_{s,l}$
$\rho_{k,l}$: correlation coefficient of components $x_{s,k}$ and $x_{s,l}$, $-1 < \rho_{k,l} < 1$
$\rho_{k,l} = 0$: uncorrelated, and also independent if jointly normal
$|\rho_{k,l}| = 1$: strongly correlated components

[Figure 34: plot omitted] Figure 34. Level sets $\beta^2(x_s) = \mathrm{const}$ of a two-dimensional normal pdf around $x_{s,0}$.
[Figure 35: plots omitted] Figure 35. Level sets of a two-dimensional normal pdf with general covariance matrix C (a), with uncorrelated components, R = I (b), and with uncorrelated components of equal spread, $C = \sigma^2 \cdot I$ (c).
[Figure 36: plot omitted] Figure 36. Level set with $\beta^2 = a^2$ of a two-dimensional normal pdf for different values of the correlation coefficient $\rho$; for $\rho = 0$ the ellipse fits a box of side lengths $2 a \sigma_1$ by $2 a \sigma_2$.
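Drawing samples from $N(x_{s,0}, C)$ is not spelled out in this section, but a standard construction (an assumption here, justified by the linear-transformation rule (206) of Sec. 6.1.3) maps uncorrelated standard-normal vectors through a Cholesky factor $C = L \cdot L^T$:

```python
import numpy as np

def sample_mvn(xs0, C, n, rng=None):
    """Draw n samples from N(xs0, C): x = xs0 + L z with z ~ N(0, I)
    gives V{x} = L I L^T = C, cf. (206)."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(C)                 # C symmetric positive definite
    z = rng.standard_normal((n, len(xs0)))    # uncorrelated unit-variance samples
    return np.asarray(xs0) + z @ L.T
```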

5.3 Transformation of statistical distributions

$y \in \mathbb{R}^{n_y}$, $z \in \mathbb{R}^{n_z}$, $n_y = n_z$; $z = z(y)$, $y = y(z)$ such that the mapping from y to z is smooth and bijective (precisely $z = \phi(y)$, $y = \phi^{-1}(z)$)

$\mathrm{cdf}_y(y) = \int_{-\infty}^{y} \cdots \int \mathrm{pdf}_y(y') \cdot dy' = \int_{-\infty}^{z(y)} \cdots \int \mathrm{pdf}_y(y(z')) \cdot \left| \det\left( \dfrac{\partial y}{\partial z^T} \right) \right| \cdot dz' = \int_{-\infty}^{z} \cdots \int \mathrm{pdf}_z(z') \cdot dz' = \mathrm{cdf}_z(z)$   (188)

$\int_{-\infty}^{y} \cdots \int \mathrm{pdf}_y(y') \cdot dy' = \int_{-\infty}^{z} \cdots \int \mathrm{pdf}_z(z') \cdot dz'$   (189)

$\mathrm{pdf}_z(z) = \mathrm{pdf}_y(y(z)) \cdot \left| \det\left( \dfrac{\partial y}{\partial z^T} \right) \right|$   (190)

univariate case:

$\mathrm{pdf}_z(z) = \mathrm{pdf}_y(y(z)) \cdot \left| \dfrac{\partial y}{\partial z} \right|$   (191)

In the simple univariate case, the function $\mathrm{pdf}_z$ has a domain that is a scaled version of the domain of $\mathrm{pdf}_y$; $\left| \frac{\partial y}{\partial z} \right|$ determines the scaling factor. In higher-order cases, the random variable space is scaled and rotated, with the Jacobian matrix $\frac{\partial y}{\partial z^T}$ determining the scaling and rotation.

[Figure 37: plots omitted] Figure 37. A univariate pdf of the random number y is transformed to a new pdf of the new random number $z = z(y)$. According to (188) the shaded areas as well as the hatched areas under the two curves are equal.

5.3.1 Example

Given: probability density function $\mathrm{pdf}_U(z)$, here a uniform distribution:

$\mathrm{pdf}_U(z) = \begin{cases} 1 & \text{for } 0 < z < 1 \\ 0 & \text{otherwise} \end{cases}$   (192)

probability density function $\mathrm{pdf}_y(y)$, $y \in \mathbb{R}$; random number z.
Find: random number y.

from (188):

$\int_{-\infty}^{z} \underbrace{\mathrm{pdf}_z(z')}_{1 \text{ for } 0 \le z' \le 1 \text{ from (192)}} \cdot dz' = \int_{-\infty}^{y} \mathrm{pdf}_y(y') \cdot dy'$   (193)

hence

$z = \int_{-\infty}^{y} \mathrm{pdf}_y(y') \cdot dy' = \mathrm{cdf}_y(y)$   (194)

$y = \mathrm{cdf}_y^{-1}(z)$   (195)

This example details a method to generate sample values of a random variable y with an arbitrary pdf $\mathrm{pdf}_y$ if sample values are available from a uniform distribution $\mathrm{pdf}_z$:
• insert $\mathrm{pdf}_y(y)$ in (194)
• compute $\mathrm{cdf}_y$ by integration
• compute the inverse $\mathrm{cdf}_y^{-1}$
• create a uniform random number z and insert it into (195) to get a sample value y distributed according to $\mathrm{pdf}_y(y)$
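The recipe in code (a sketch; the exponential distribution used for illustration is a hypothetical example, not from the text):

```python
import numpy as np

def inverse_transform_samples(inv_cdf, n, rng=None):
    """Sampling recipe of Sec. 5.3.1: uniform z in (0,1) pushed through the
    inverse cdf (195) yields samples distributed according to pdf_y."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.uniform(0.0, 1.0, n)
    return inv_cdf(z)

# hypothetical example: pdf_y(y) = exp(-y) for y >= 0, so cdf_y(y) = 1 - exp(-y)
# and cdf_y^{-1}(z) = -log(1 - z)
samples = inverse_transform_samples(lambda z: -np.log(1.0 - z), 10000)
```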

6 Expectation values and their estimators

6.1 Expectation values

6.1.1 Definitions

h(z): function of a random vector z with probability density function pdf(z)

Expectation value:
E{h(z)} = ∫⋯∫_(−∞)^(+∞) h(z)·pdf(z)·dz                                        (196)

Moment of order κ:
m^(κ) = E{z^κ}                                                                (197)

Mean value (first-order moment):
m^(1) = m = E{z}                                                              (198)
m = E{z} = [E{z1}, …, E{znz}]^T                                               (199)

Central moment of order κ:
c^(κ) = E{(z − m)^κ},  c^(1) = 0                                              (200)

Variance (second-order central moment):
c^(2) = E{(z − m)²} = σ² = V{z},  σ: standard deviation                       (201)

Covariance:
cov{zi, zj} = E{(zi − mi)·(zj − mj)}                                          (202)

Variance/covariance matrix:
C = V{z} = E{(z − m)·(z − m)^T}
  = [ V{z1}        cov{z1,z2}   ⋯  cov{z1,znz}  ]
    [ cov{z2,z1}   V{z2}        ⋯  cov{z2,znz}  ]
    [ ⋮            ⋮            ⋱  ⋮            ]
    [ cov{znz,z1}  cov{znz,z2}  ⋯  V{znz}       ]                             (203)

V{h(z)} = E{(h(z) − E{h(z)})·(h(z) − E{h(z)})^T}                              (204)


6.1.2 Linear transformation of expectation value

E{A·h(z) + b} = A·E{h(z)} + b                                                 (205)

special cases:
E{c} = c, c is a constant
E{c·h(z)} = c·E{h(z)}
E{h1(z) + h2(z)} = E{h1(z)} + E{h2(z)}

6.1.3 Linear transformation of variance

V{A·h(z) + b} = A·V{h(z)}·A^T                                                 (206)

special cases:
V{a^T·h(z) + b} = a^T·V{h(z)}·a
V{a·h(z) + b} = a²·V{h(z)}

Gaussian error propagation:
V{a^T·z + b} = a^T·C·a = Σ_(i,j) ai·aj·σi·ρi,j·σj = Σ_i ai²·σi²
(the last step for ρi,j = 0, i ≠ j)
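A minimal Python sketch (with assumed example values for C and a) that checks the Gaussian error propagation formula V{a^T·z + b} = a^T·C·a against a Monte-Carlo estimate:

import numpy as np

rng = np.random.default_rng(0)
C = np.array([[1.0, 0.3],
              [0.3, 2.0]])       # assumed covariance matrix
a = np.array([0.5, -1.5])        # assumed coefficient vector
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=100_000)

analytic = a @ C @ a             # a^T · C · a, special case of eq. (206)
empirical = np.var(z @ a)        # sample variance of a^T · z
print(analytic, empirical)       # the two values should agree closely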

6.1.4 Translation law of variances

V{h(z)} = E{(h(z) − a)·(h(z) − a)^T} − (E{h(z)} − a)·(E{h(z)} − a)^T          (207)

special cases:
V{h(z)} = E{(h(z) − a)²} − (E{h(z)} − a)²
V{h(z)} = E{h(z)·h^T(z)} − E{h(z)}·E{h^T(z)}
V{h(z)} = E{h²(z)} − (E{h(z)})²

6.1.5 Normalizing a random variable

z′ = (z − mz)/σz = (z − E{z})/√V{z}                                           (208)

E{z′} = (E{z} − mz)/σz = 0                                                    (209)

V{z′} = E{(z′ − 0)²} = E{(z − mz)²}/σz² = 1                                   (210)

6.1.6 Linear transformation of a normal distribution

x ∼ N(x0, C)
f(x) = fa + g^T·(x − xa)                                                      (211)

mean value μf of f:
μf = E{f} = E{fa + g^T·(x − xa)} = fa + g^T·(E{x} − xa)
μf = fa + g^T·(x0 − xa)                                                       (212)

variance σf² of f:
σf² = E{(f − μf)²} = E{(g^T·(x − x0))²}
    = E{g^T·(x − x0)·(x − x0)^T·g}
    = g^T·E{(x − x0)·(x − x0)^T}·g
σf² = g^T·C·g                                                                 (213)

6.2 Estimation of expectation values

6.2.1 Expectation value estimator

m̂h = Ê{h(x)} = 1/nMC · Σ_(μ=1)^(nMC) h(x^(μ))                                 (214)

x^(μ) ∼ D(pdf(x)), μ = 1, …, nMC: sample of the population with nMC sample elements, i.e., sample size nMC

sample elements x^(μ), μ = 1, …, nMC, that are independently and identically distributed, i.e.:

E{h(x^(μ))} = E{h(x)} = mh                                                    (215)

V{h(x^(μ))} = V{h(x)},  cov{h(x^(μ)), h(x^(ν))} = 0, μ ≠ ν                    (216)

φ̂(x) = φ̂(x^(1), …, x^(nMC)): estimator function of φ(x)                       (217)

6.2.2 Variance estimator

V̂{h(x)} = 1/(nMC − 1) · Σ_(μ=1)^(nMC) (h(x^(μ)) − m̂h)·(h(x^(μ)) − m̂h)^T      (218)

x^(μ) ∼ D(pdf(x)), μ = 1, …, nMC

variance estimator with known mean mh:

V̂{h(x)} = 1/nMC · Σ_(μ=1)^(nMC) (h(x^(μ)) − mh)·(h(x^(μ)) − mh)^T             (219)

estimator bias:
bφ̂ = E{φ̂(x) − φ(x)}                                                           (220)

unbiased estimator:
E{φ̂(x)} = φ(x) ⟺ bφ̂ = 0                                                       (221)

consistent estimator:
lim_(nMC→∞) P(|φ̂ − φ| < ε) = 1                                                (222)

strongly consistent: ε → 0                                                    (223)

variance of an estimator (quality):
Qφ̂ = E{(φ̂ − φ)·(φ̂ − φ)^T} = V{φ̂} + bφ̂·bφ̂^T                                   (224)

bφ̂ = 0:  Qφ̂ = V{φ̂}                                                            (225)

6.2.3 Variance of the expectation value estimator

Qm̂h = V{m̂h} = V{Ê{h(x)}} = V{1/nMC · Σ_(μ=1)^(nMC) h(x^(μ))}

with (206), h^(μ) = h(x^(μ)) and I<nh×nh> the identity matrix of the size nh of h^(μ):

Qm̂h = 1/nMC² · V{Σ_(μ=1)^(nMC) I<nh×nh>·h^(μ)}
    = 1/nMC² · [I<nh×nh> I<nh×nh> ⋯ I<nh×nh>] · V{[h^(1); h^(2); …; h^(nMC)]} · [I<nh×nh>; I<nh×nh>; …; I<nh×nh>]

because of (216), the covariance matrix of the stacked vector [h^(1); …; h^(nMC)] is block-diagonal with identical diagonal blocks V{h}; hence:

Qm̂h = 1/nMC² · nMC · V{h}

Qm̂h = V{m̂h} = 1/nMC · V{h}                                                    (226)

replacing Qm̂h by Q̂m̂h, V{h} by V̂{h}, and (206) by (229) yields the variance estimator of the expectation value estimator:

Q̂m̂h = V̂{m̂h} = 1/nMC · V̂{h}                                                    (227)

the standard deviation of the mean estimator decreases with 1/√nMC, e.g., 100 times more sample elements are needed for a 10 times smaller standard deviation in the expectation value estimator
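A minimal Python sketch of the 1/√nMC law in eqs. (226), (227): the mean estimator is computed many times for growing sample sizes, and its observed spread shrinks with the square root of nMC:

import numpy as np

rng = np.random.default_rng(0)
for n_mc in (10, 100, 1000, 10000):
    # repeat the mean estimation 2000 times to observe the estimator spread
    means = rng.normal(0.0, 1.0, size=(2000, n_mc)).mean(axis=1)
    print(n_mc, means.std())     # ≈ 1/sqrt(n_mc) for unit-variance h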

6.2.4 Linear transformation of estimated expectation value

Ê{A·h(z) + b} = A·Ê{h(z)} + b                                                 (228)

6.2.5 Linear transformation of estimated variance

V̂{A·h(z) + b} = A·V̂{h(z)}·A^T                                                 (229)

6.2.6 Translation law of estimated variance

V̂{h(z)} = nMC/(nMC − 1) · [Ê{h(z)·h^T(z)} − Ê{h(z)}·Ê{h^T(z)}]                (230)

7 Worst-case analysis

7.1 Task

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

Given: tolerance region T of parameters
Find: worst-case performance value fW that the circuit takes over T, and the corresponding worst-case parameter vectors xW

optimization    specification          "good"   "bad"   worst-case performance
max f           lower bound: f ≥ fL    f ↑      f ↓     fWL = f(xWL)
min f           upper bound: f ≤ fU    f ↓      f ↑     fWU = f(xWU)

7.2 Typical tolerance regions

box:       TB = {x | xL ≤ x ≤ xU}
ellipsoid: TE = {x | β²(x) = (x − x0)^T·C⁻¹·(x − x0) ≤ βW²},
           C is symmetric, positive definite

Figure 38. Tolerance box, tolerance ellipsoid.


7.3 Classical worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.3.1 Task

Given: hyper-box tolerance region
TB = {x | xL ≤ x ≤ xU}                                                        (231)

linear performance model
f̄(x) = fa + g^T·(x − xa)                                                      (232)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f̄(xWL/U)

Figure 39. Classical worst-case analysis with tolerance box and linear performance model.


7.3.2 Linear performance model

sensitivity analysis:
fa = f(xa)                                                                    (233)
g = ∇f(xa)                                                                    (234)

forward finite-difference approximation:
fa = f(xa)                                                                    (235)

gi ≈ (f(xa + Δxi·ei) − f(xa)) / Δxi                                           (236)

ei = [0 … 0 1 0 … 0]^T, with the 1 at the i-th position                       (237)

Figure 40. Linear performance model based on the gradient (a), and based on the forward finite-difference approximation of the gradient (b).
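A minimal Python sketch of the forward finite-difference approximation (235) – (237); the performance evaluation f, in practice a circuit simulation, is replaced here by an assumed analytic toy function:

import numpy as np

def finite_difference_gradient(f, xa, dx=1e-6):
    # approximate g = grad f(xa), one parameter perturbation at a time
    fa = f(xa)
    g = np.zeros_like(xa)
    for i in range(len(xa)):
        e_i = np.zeros_like(xa)
        e_i[i] = 1.0                          # unit vector e_i, eq. (237)
        g[i] = (f(xa + dx * e_i) - fa) / dx   # eq. (236)
    return g

f = lambda x: x[0]**2 + 3.0 * x[1]            # assumed toy performance
print(finite_difference_gradient(f, np.array([1.0, 2.0])))   # ≈ [2.0, 3.0]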


7.3.3 Performance type f ↑ "good", specification type f ≥ fL

min f̄(x) ≡ min g^T·x  s.t. x ≥ xL, x ≤ xU   →  xWL, fWL = f̄(xWL)              (238)

specific linear programming problem with analytical solution

corresponding Lagrange function:
L(x, λL, λU) = g^T·x − λL^T·(x − xL) − λU^T·(xU − x)                          (239)

first-order optimality conditions:
∇L(x) = 0:  g − λL* + λU* = 0                                                 (240)

λL/U* ≥ 0,  xU − xWL ≥ 0,  xWL − xL ≥ 0
λL*^T·(xWL − xL) = 0, i.e., λL,j*·(xWL,j − xL,j) = 0, j = 1, …, nx
λU*^T·(xU − xWL) = 0, i.e., λU,j*·(xU,j − xWL,j) = 0, j = 1, …, nx            (241)

(using: ∀i ai·bi = 0 ⟹ a^T·b = Σi ai·bi = 0, and
 a^T·b = 0, ai ≥ 0, bi ≥ 0 ⟹ ∀i ai·bi = 0)

the second-order optimality condition holds because ∇²L(x) = 0

either constraint xL,j or constraint xU,j is active, never both; therefore, from (240) and (241):

either:  gj = λL,j* > 0                                                       (242)
or:      gj = −λU,j* < 0                                                      (243)

component j of the worst-case parameter vector xWL:

         { xL,j,       gj > 0
xWL,j =  { xU,j,       gj < 0
         { undefined,  gj = 0                                                 (244)

worst-case performance value:
fWL = fa + g^T·(xWL − xa) = fa + Σj gj·(xWL,j − xa,j)                         (245)

7.3.4 Performance type f ↓ "good", specification type f ≤ fU

min −f̄(x) ≡ min −g^T·x  s.t. x ≥ xL, x ≤ xU   →  xWU, fWU = f̄(xWU)            (246)

component j of the worst-case parameter vector xWU:

         { xL,j,       gj < 0
xWU,j =  { xU,j,       gj > 0
         { undefined,  gj = 0                                                 (247)

worst-case performance value:
fWU = fa + g^T·(xWU − xa) = fa + Σj gj·(xWU,j − xa,j)                         (248)

7.4 Realistic worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.4.1 Task

Given: ellipsoid tolerance region
TE = {x | (x − x0)^T·C⁻¹·(x − x0) ≤ βW²}                                      (249)

linear performance model
f̄(x) = fa + g^T·(x − xa)                                                      (250)
     = f0 + g^T·(x − x0)  with  f0 = fa + g^T·(x0 − xa)                       (251)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f̄(xWL/U)

Figure 41. Realistic worst-case analysis with tolerance ellipsoid and linear performance model.


7.4.2 Performance type f ↑ "good", specification type f ≥ fL

min f̄(x) ≡ min g^T·x  s.t. (x − x0)^T·C⁻¹·(x − x0) ≤ βW²   →  xWL, fWL = f̄(xWL)   (252)

specific nonlinear programming problem (linear objective function, quadratic constraint function) with analytical solution

corresponding Lagrange function:
L(x, λ) = g^T·x − λ·(βW² − (x − x0)^T·C⁻¹·(x − x0))                           (253)

first-order optimality conditions:
∇L = 0:  g + 2·λWL·C⁻¹·(xWL − x0) = 0                                         (254)

due to the linear function f̄, the solution is on the border of TE, i.e., the constraint is active:
(xWL − x0)^T·C⁻¹·(xWL − x0) = βW²                                             (255)
λWL > 0                                                                       (256)

the second-order optimality condition holds because ∇²L(x) = 2·λWL·C⁻¹, C⁻¹ is positive definite, and because of (256)

(254) gives:
xWL − x0 = −1/(2·λWL) · C·g                                                   (257)

substituting (257) into (255) gives:
1/(4·λWL²) · g^T·C·g = βW²                                                    (258)

inserting λWL from (258) into (257) to eliminate λWL gives the worst-case parameter vector in terms of the performance gradient and the tolerance region constants:

xWL − x0 = −βW/√(g^T·C·g) · C·g = −βW/σf · C·g                                (259)

substituting (259) in (251) gives the corresponding worst-case performance value:

fWL = f̄(xWL) = f0 + g^T·(xWL − x0) = f0 − βW·√(g^T·C·g) = f0 − βW·σf          (260)

βW-sigma design, worst-case distance βW

7.4.3 Performance type f ↓ "good", specification type f ≤ fU

(252) becomes:
min −f̄(x) ≡ min −g^T·x  s.t. (x − x0)^T·C⁻¹·(x − x0) ≤ βW²   →  xWU, fWU = f̄(xWU)   (261)

(259) becomes:
xWU − x0 = +βW/√(g^T·C·g) · C·g = +βW/σf · C·g                                (262)

(260) becomes:
fWU = f̄(xWU) = f0 + g^T·(xWU − x0) = f0 + βW·√(g^T·C·g) = f0 + βW·σf          (263)
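A minimal Python sketch of the closed-form realistic worst-case analysis (259) – (263), with assumed toy numbers:

import numpy as np

def realistic_worst_case(f0, g, x0, C, beta_w):
    sigma_f = np.sqrt(g @ C @ g)              # performance spread, eq. (213)
    xWL = x0 - (beta_w / sigma_f) * (C @ g)   # eq. (259)
    xWU = x0 + (beta_w / sigma_f) * (C @ g)   # eq. (262)
    fWL = f0 - beta_w * sigma_f               # eq. (260)
    fWU = f0 + beta_w * sigma_f               # eq. (263)
    return xWL, fWL, xWU, fWU

C = np.array([[1.0, 0.3], [0.3, 2.0]])
print(realistic_worst_case(5.0, np.array([2.0, -1.0]), np.zeros(2), C, beta_w=3.0))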

7.5 General worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.5.1 Task

Given: ellipsoid tolerance region
TE = {x | (x − x0)^T·C⁻¹·(x − x0) ≤ βW²}                                      (264)

general smooth performance f(x)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f(xWL/U)

Figure 42. General worst-case analysis with tolerance ellipsoid and nonlinear performance function.


7.5.2 Performance type f ↑ "good", specification type f ≥ fL

min_x f(x)  s.t. (x − x0)^T·C⁻¹·(x − x0) ≤ βW²                                (265)

specific nonlinear programming problem (non-linear objective function, one quadratic inequality constraint) with numerical solution, e.g., by Sequential Quadratic Programming (SQP), which yields ∇f(xWL); assumption: unique solution on the border of TE

linearization of the objective function f at the worst-case point xWL, i.e., after the solution of (265):

f̄^(WL)(x) = fWL + ∇f(xWL)^T·(x − xWL)                                         (266)

substituting (266) in (265) gives:

min f̄^(WL)(x) ≡ min ∇f(xWL)^T·x  s.t. (x − x0)^T·C⁻¹·(x − x0) ≤ βW²           (267)

structure identical to the realistic worst-case analysis (252), with g replaced by ∇f(xWL)

worst-case parameter vector:

xWL − x0 = −βW/√(∇f(xWL)^T·C·∇f(xWL)) · C·∇f(xWL) = −βW/σf^(WL) · C·∇f(xWL)   (268)

worst-case performance value:

f̄^(WL)(x0) = f̄0^(WL) = fWL + ∇f(xWL)^T·(x0 − xWL)

fWL = f̄0^(WL) + ∇f(xWL)^T·(xWL − x0)                                          (269)

substituting (268) in (269) gives the corresponding worst-case performance value:

fWL = f̄0^(WL) − βW·√(∇f(xWL)^T·C·∇f(xWL)) = f̄0^(WL) − βW·σf^(WL)             (270)

βW-sigma design, worst-case distance βW

7.5.3 Performance type f ↓ "good", specification type f ≤ fU

(265) becomes:
min −f(x)  s.t. (x − x0)^T·C⁻¹·(x − x0) ≤ βW²                                 (271)

(266) becomes:
f̄^(WU)(x) = fWU + ∇f(xWU)^T·(x − xWU)                                         (272)

(267) becomes:
min −∇f(xWU)^T·x  s.t. (x − x0)^T·C⁻¹·(x − x0) ≤ βW²                          (273)

worst-case parameter vector:
xWU − x0 = +βW/√(∇f(xWU)^T·C·∇f(xWU)) · C·∇f(xWU) = +βW/σf^(WU) · C·∇f(xWU)   (274)

worst-case performance value:
fWU = f̄0^(WU) + βW·√(∇f(xWU)^T·C·∇f(xWU)) = f̄0^(WU) + βW·σf^(WU)             (275)

7.5.4 General worst-case analysis with tolerance box

lower worst-case:
min f(x)  s.t. x ≥ xL, x ≤ xU                                                 (276)

upper worst-case:
min −f(x)  s.t. x ≥ xL, x ≤ xU                                                (277)
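A minimal Python sketch of the general worst-case analysis (265); an SQP-type solver is assumed (scipy's SLSQP serves as a stand-in here), and an analytic toy performance replaces the circuit simulation:

import numpy as np
from scipy.optimize import minimize

def general_worst_case_lower(f, x0, C, beta_w):
    Cinv = np.linalg.inv(C)
    # constraint (x-x0)^T C^{-1} (x-x0) <= beta_w^2, written as fun(x) >= 0
    cons = {"type": "ineq",
            "fun": lambda x: beta_w**2 - (x - x0) @ Cinv @ (x - x0)}
    res = minimize(f, x0, method="SLSQP", constraints=[cons])
    return res.x, res.fun                     # xWL and fWL

f = lambda x: x[0]**2 + np.sin(x[1]) + 3.0    # assumed toy performance
C = np.array([[1.0, 0.3], [0.3, 2.0]])
xWL, fWL = general_worst_case_lower(f, np.zeros(2), C, beta_w=3.0)
print(xWL, fWL)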

7.6 Input/output of worst-case analysis

classical worst-case analysis
  input:  tolerance box TB(xL, xU); nominal point x̄ with performances f̄ and Jacobian J from one sensitivity analysis
  output: for each fi, worst-case parameter vectors x(i)WL,C, x(i)WU,C; worst-case performance values f̄(i)WL,C, f̄(i)WU,C (linear model) and f(i)WL,C, f(i)WU,C (simulated)

realistic worst-case analysis
  input:  tolerance ellipsoid TE(C, x0, βW), "βW-sigma design"; x̄, f̄ and J from one sensitivity analysis
  output: for each fi, worst-case parameter vectors x(i)WL,R, x(i)WU,R; worst-case performance values f̄(i)WL,R, f̄(i)WU,R and f(i)WL,R, f(i)WU,R

general worst-case analysis
  input:  tolerance ellipsoid and/or tolerance box; iterates x(κ), f(κ) and Jacobians J(κ) from nit sensitivity analyses (one per iteration)
  output: for each fi, worst-case parameter vectors x(i)WL,G, x(i)WU,G; worst-case performance values f(i)WL,G, f(i)WU,G

sensitivity analysis: starting from parameters x with simulated performances f, each parameter is perturbed once, x + Δxi·ei, and re-simulated (nx simulations), giving Δfi = f(x + Δxi·ei) − f; this yields the Jacobian (sensitivity) matrix

J = ∂f/∂x^T = [∂fi/∂xj]<nf×nx> ≈ [⋯ (1/Δxi)·Δfi ⋯] = [∇f1^T; ∇f2^T; …; ∇fnf^T]

parameters x = [xd; xs; xr]; each simulation requires the netlist, the technology and the simulation bench and outputs the performances f

7.7 Summary of discussed worst-case analysis problems

• For each performance feature fi there exists a worst-case parameter vector xWL,i and/or xWU,i, respectively, and a corresponding worst-case performance value fWL,i and/or fWU,i.

• xWL,i and xWU,i are unique in the classical and realistic worst-case analysis. Several worst-case parameter vectors may exist in the general worst-case analysis.

worst-case analysis   feasible region   objective function   good for
classical             hyper-box         linear               uniform distribution,
general               hyper-box         non-linear           unknown distribution,
                                                             discrete circuits,
                                                             range parameters
realistic             ellipsoid         linear               normal distribution,
general               ellipsoid         non-linear           IC transistor parameters

• Worst-case analysis requires design technology (circuit, performance features) and process technology (statistical parameters, parameter distribution).


8 Yield analysis

8.1 Task

Given:
• statistical parameters with normal distribution, where necessary obtained through transformation
• performance specification

Find: percentage/proportion of circuits that fulfill the specifications

statistical parameter distribution (manufacturing process):

pdf(xs) = pdfN(xs) = 1/(√(2π)^nxs · √det(C)) · e^(−β²(xs)/2)                  (278)

β²(xs) = (xs − xs,0)^T·C⁻¹·(xs − xs,0)                                        (279)

performance acceptance region, performance specification (customer):

Af = {f | fL ≤ f ≤ fU}                                                        (280)

the solution requires either: the non-normal distribution pdff(f) of the performances,
or: the non-linear parameter acceptance region As = {xs | f(xs) ∈ Af} (dashed lines in Fig. 43)

Figure 43. Parameter acceptance region As (dashed) with level sets β = const around xs,0 in the (xs,1, xs,2)-plane, and performance acceptance region Af with bounds fL,1, fU,1, fL,2, fU,2 around f(xs,0) in the (f1, f2)-plane.

8.1.1 Acceptance function

        { 1, f(xs) ∈ Af       { 1, xs ∈ As    "circuit functions"
δ(xs) = {                   = {
        { 0, f(xs) ∉ Af       { 0, xs ∉ As    "circuit malfunctions"          (281)

8.1.2 Parametric yield

Y = ∫⋯∫_(As) pdf(xs)·dxs                                                      (282)
  = ∫⋯∫_(−∞)^(∞) δ(xs)·pdf(xs)·dxs                                            (283)
  = E{δ(xs)}                                                                  (284)

yield: expected value of the acceptance function

8.2 Statistical yield analysis/Monte-Carlo analysis

• sample of statistical parameter vectors according to the given distribution:
  xs^(μ) ∼ N(xs,0, C), μ = 1, …, nMC                                          (285)

• (numerical) circuit simulation of each sample element (simulation of the stochastic manufacturing process on circuit level):
  xs^(μ) ↦ f^(μ) = f(xs^(μ)), μ = 1, …, nMC                                   (286)

• evaluation of the acceptance function (simulation of a production test):
  δ^(μ) = δ(xs^(μ)) = { 1, f^(μ) ∈ Af;  0, f^(μ) ∉ Af }, μ = 1, …, nMC        (287)

• statistical yield estimation (a code sketch of these steps follows below):
  Ŷ = Ê{δ(xs)} = 1/nMC · Σ_(μ=1)^(nMC) δ^(μ)                                  (288)
    = number of functioning circuits / sample size                            (289)
    = n+/nMC = #{"+"} / (#{"+"} + #{"−"})  (Fig. 44)                          (290)

Figure 44. Monte-Carlo sample in the statistical parameter space: sample elements inside and outside the acceptance region As, with a level set β = const.
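A minimal Python sketch of the four Monte-Carlo steps (285) – (288); the circuit simulation is replaced by an assumed analytic performance model with a single assumed specification f ≤ fU = 2:

import numpy as np

rng = np.random.default_rng(0)
xs0 = np.zeros(2)
C = np.array([[1.0, 0.3], [0.3, 2.0]])
n_mc = 10000

xs = rng.multivariate_normal(xs0, C, size=n_mc)   # sampling, eq. (285)
f = xs[:, 0] + 0.5 * xs[:, 1]**2                  # "simulation", eq. (286)
delta = (f <= 2.0).astype(float)                  # acceptance, eq. (287)
print("estimated yield:", delta.mean())           # eq. (288)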

8.2.1 Variance of yield estimator

V{Ŷ} = V{Ê{δ(xs)}} = 1/nMC · V{δ(xs)}   (using (288) and (226))               (291)

V{δ(xs)} = E{δ²(xs)} − (E{δ(xs)})² = Y − Y² = Y·(1 − Y)                       (292)

σŶ² = V{Ŷ} = Y·(1 − Y)/nMC

8.2.2 Estimated variance of yield estimator

V̂{Ŷ} = V̂{Ê{δ(xs)}} = 1/nMC · V̂{δ(xs)}   (using (288) and (227))               (293)
     = nMC/(nMC − 1) · 1/nMC · [Ê{δ²(xs)} − (Ê{δ(xs)})²]   (using (230))      (294)
     = Ŷ·(1 − Ŷ)/(nMC − 1) = σ̂Ŷ²                                              (295)

Ŷ is binomially distributed: probability that n+ of nMC circuits are functioning;
nMC → ∞: Ŷ is normally distributed (central limit theorem);
in practice: n+ > 4, nMC − n+ > 4 and nMC > 10

Figure 45. Estimated variance σ̂Ŷ² over Ŷ: zero at Ŷ = 0% and Ŷ = 100%, maximum 0.25/nMC at Ŷ = 50%.

e.g., Ŷ = 85%:

nMC    10      50     100    500    1000
σ̂Ŷ     11.9%   5.1%   3.6%   1.6%   1.1%

Z



t2 1 √ e− 2 dt 2π {z }

−kζ

|

(296)

confidence level

e.g., nM C = 1000, Yˆ = 85% → σ ˆYˆ = 1.1%; kζ = 3: P (Y ∈ [81.7%, 88.3%]) = 99.7% • given: yield estimator Yˆ , confidence interval Y ∈ [Yˆ − ∆Y, Yˆ + ∆Y ], confidence level ζ% find: nM C

    ζ = cdfN Yˆ + kζ · σ ˆYˆ − cdfN Yˆ − kζ · σ ˆYˆ → kζ

(297)

∆Y = kζ · σ ˆYˆ → σ ˆYˆ   Yˆ · 1 − Yˆ σ ˆY2ˆ = nM C − 1   Yˆ · 1 − Yˆ · (kζ )2 nM C = 1 + ∆Y 2

(298) (299)

(300)

number nM C for various confidence intervals and confidence levels: ζ

90%

95%

99%

99.9%



1.645

1.960

2.576

3.291

85% ± 10%

36

50

86

140

85% ± 5%

140

197

340

554

85% ± 1%

3,452

4,900

8,462

13,811

Yˆ ± ∆Y

99.99% ± 0.01%

66,352
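A minimal Python sketch of eq. (300): the sample size required for a given confidence interval Ŷ ± ΔY at two-sided confidence level ζ (normal approximation; scipy's quantile function provides kζ):

import math
from scipy import stats

def n_mc_for_confidence(y_hat, d_y, zeta):
    k = stats.norm.ppf(0.5 + zeta / 2.0)   # two-sided quantile k_zeta
    return math.ceil(1.0 + y_hat * (1.0 - y_hat) * k**2 / d_y**2)

print(n_mc_for_confidence(0.85, 0.05, 0.95))   # 197, matching the table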

• given: claim Ŷ > Ymin, significance level α
  find: nMC

test definition: null hypothesis H0: Y < Ymin; H0 is rejected if all circuits are functioning, n+ = nMC
assuming H0 holds, the probability of falsely rejecting H0 is (binomial distribution):

P("rejection") = P("n+ = nMC") = Y^nMC < Ymin^nMC                             (301)

required: Ymin^nMC < α                                                        (302)

nMC > log α / log Ymin                                                        (303)

number nMC for a minimum yield and significance level:

Ymin      α = 5%    α = 1%
95%       60        92
99%       300       460
99.9%     3,000     4,600
99.99%    30,000    46,000

8.2.3 Importance sampling

Y = ∫⋯∫_(−∞)^(∞) δ(xs)·pdf(xs)·dxs                                            (304)
  = ∫⋯∫_(−∞)^(∞) δ(xs) · pdf(xs)/pdfIS(xs) · pdfIS(xs)·dxs                    (305)
  = E_pdfIS{δ(xs)·pdf(xs)/pdfIS(xs)} = E_pdfIS{δ(xs)·w(xs)}                   (306)

the sample is created according to a separate, specific distribution pdfIS:

E_pdfIS{w(xs)} = ∫⋯∫_(−∞)^(∞) pdf(xs)/pdfIS(xs) · pdfIS(xs)·dxs = 1           (307)

goal: reduction of the estimator variance:

V_pdf{Ê{δ(xs)}} = V{δ(xs)}/nMC  >(required)  V_pdfIS{Ê{δ(xs)·w(xs)}} = V_pdfIS{δ(xs)·w(xs)}/nIS   (308)

with

V_pdfIS{δ(xs)·w(xs)} = ∫⋯∫_(−∞)^(∞) δ(xs) · pdf(xs)/pdfIS(xs) · pdf(xs)·dxs − Y²   (309)

Eq. (308) is satisfied for nMC = nIS if:

∀_(xs | δ(xs)=1):  pdfIS(xs) > pdf(xs)                                        (310)
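A minimal Python sketch of the importance-sampling identity (304) – (306) in one dimension: sampling from a widened, assumed distribution pdfIS and weighting each point by w = pdf/pdfIS leaves the yield estimate unbiased:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pdf = stats.norm(0.0, 1.0)            # true parameter distribution
pdf_is = stats.norm(0.0, 2.0)         # assumed wider sampling distribution

x = pdf_is.rvs(size=10000, random_state=rng)
w = pdf.pdf(x) / pdf_is.pdf(x)        # importance weights w(xs), eq. (306)
delta = (x <= 2.0).astype(float)      # acceptance function for spec f <= 2
print("IS estimate:", np.mean(delta * w), " exact:", pdf.cdf(2.0))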

8.3 Geometric yield analysis for linearized performance feature ("realistic geometric yield analysis")

yield in the case that As is a half-space of R^nxs defined by a hyperplane

linear model for one single performance feature, index i left out:

f̄ = fL + g^T·(xs − xs,WL), e.g., after a solution according to (336) or (337),   (311)

fL = f(xs,WL),  g = ∇f(xs,WL)                                                 (312)

specification type: f ≥ fL                                                    (313)

Figure 46. Realistic geometric yield analysis: level hyperplane f̄ = fL with gradient g, acceptance half-space A′s,L (f̄ ≥ fL), level sets β²(xs) = (xs − xs,0)^T·C⁻¹·(xs − xs,0) = const around xs,0, and the touching point xs,a with ∇β²(xs,a) on the level set β²(xs) = βWL².

8.3.1 Yield partition

YL = ∫⋯∫_(xs ∈ A′s,L) pdfN(xs)·dxs = ∫_(f̄ ≥ fL) pdff̄(f̄)·df̄                   (314)

the linearized performance feature is normally distributed, cf. (212), (213):

f̄ ∼ N(f̄0, σf̄²) with f̄0 = f̄(xs,0) = fL + g^T·(xs,0 − xs,WL), σf̄² = g^T·C·g    (315)

yield written in terms of the pdf of the linearized performance feature:

YL = ∫_(fL)^(∞) 1/(√(2π)·σf̄) · e^(−½·((f̄ − f̄0)/σf̄)²) · df̄   (Fig. 47)        (316)

Figure 47. Yield partition YL as the area under pdff̄ to the right of fL.

8.3.2 Defining worst-case distance βWL as difference from nominal performance to specification bound as multiple of standard deviation σf̄ of linearized performance feature (βWL-sigma design)

f̄0 − fL = g^T·(xs,0 − xs,WL)                                                  (317)

          { +βWL·σf̄,  f̄0 > fL  "circuit functions"
f̄0 − fL = {
          { −βWL·σf̄,  f̄0 < fL  "circuit malfunctions"                         (318)

variable substitution:

t = (f̄ − f̄0)/σf̄,  df̄/dt = σf̄                                                 (319)

tL = (fL − f̄0)/σf̄ = { −βWL, f̄0 > fL;  +βWL, f̄0 < fL }                        (320)

substituting t′ = −t in YL = ∫_(tL)^(∞) 1/√(2π)·e^(−t²/2)·dt gives:

8.3.3 Yield partition as a function of worst-case distance βWL

YL = ∫_(−∞)^(±βWL) 1/√(2π) · e^(−t′²/2) · dt′   ( f̄0 > fL: +βWL;  f̄0 < fL: −βWL )   (321)

standard normal distribution, statistical tables, exact within the given digits, no estimation
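A minimal Python sketch of eq. (321): the yield partition is the standard normal cdf evaluated at the signed worst-case distance:

from scipy import stats

def yield_partition(beta_wl, nominally_fulfilled=True):
    sign = 1.0 if nominally_fulfilled else -1.0
    return stats.norm.cdf(sign * beta_wl)

print(yield_partition(3.0))   # ≈ 0.99865 for a 3-sigma design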

8.3.4 Worst-case distance βWL defines tolerance region

f̄(xs,a) = fL  ⟺(311)  g^T·(xs,a − xs,WL) = 0                                  (322)

f̄(xs) = fL + g^T·(xs − xs,WL) − g^T·(xs,a − xs,WL) = fL + g^T·(xs − xs,a)     (323)

i.e., xs,WL in (311) can be replaced with any point of the level set f̄ = fL because of (322)

from visual inspection of Fig. 46:

∇β²(xs,a) = 2·C⁻¹·(xs,a − xs,0) = { −λa·g, f̄0 > fL;  +λa·g, f̄0 < fL },  λa > 0   (324)

⟺ (xs,a − xs,0) = ∓ λa/2 · C·g                                                (325)

⟹ g^T·(xs,a − xs,0) = ∓ λa/2 · σf̄²  =(323)= fL − f̄0  =(318)= ∓βWL·σf̄         (326)

λa/2 = βWL/σf̄                                                                 (327)

substituting (327) in (325):

(xs,a − xs,0) = ∓ βWL/σf̄ · C·g                                                (328)

(xs,a − xs,0)^T·C⁻¹·(xs,a − xs,0) = βWL² · 1/σf̄² · g^T·C·g = βWL²             (329), (330)

βWL² is the level parameter ("radius") of the ellipsoid that touches the level hyperplane f̄ = fL

8.3.5 Specification type f ≤ fU

(318) becomes:
          { +βWU·σf̄,  f̄0 < fU  "circuit functions"
fU − f̄0 = {
          { −βWU·σf̄,  f̄0 > fU  "circuit malfunctions"                         (331)

(321) becomes:
YU = ∫_(−∞)^(±βWU) 1/√(2π) · e^(−t′²/2) · dt′   ( f̄0 < fU: +βWU;  f̄0 > fU: −βWU )   (332)

(323) becomes:
f̄(xs) = fU + g^T·(xs − xs,a)                                                  (333)

8.4 Geometric yield analysis for nonlinear performance feature ("general geometric yield analysis")

8.4.1 Problem formulation

Figure 48. General geometric yield analysis: nonlinear level set f = fL with worst-case point xs,WL and gradient ∇f(xs,WL), exact acceptance region As,L and linearized half-space A′s,L with level hyperplane f̄^(WL) = fL, level sets β²(xs) = (xs − xs,0)^T·C⁻¹·(xs − xs,0) = const around xs,0.

lower bound, nominally fulfilled / upper bound, nominally violated, f(xs,0) > fL/U:
max_xs pdfN(xs)  s.t. f(xs) ≤ fL/U                                            (334)

lower bound, nominally violated / upper bound, nominally fulfilled, f(xs,0) < fL/U:
max_xs pdfN(xs)  s.t. f(xs) ≥ fL/U                                            (335)

i.e., the statistical parameter vector with the highest probability density "on the other side" of the acceptance region border; equivalently:

f(xs,0) > fL/U:  min_xs β²(xs)  s.t. f(xs) ≤ fL/U                             (336)

f(xs,0) < fL/U:  min_xs β²(xs)  s.t. f(xs) ≥ fL/U                             (337)

i.e., the statistical parameter vector with the smallest distance (weighted according to the pdf) to the acceptance region border

specific form of a nonlinear programming problem: quadratic objective function, one nonlinear inequality constraint; iterative solution with SQP yields:

worst-case parameter set:  xs,WL/U
worst-case distance:       βWL/U = β(xs,WL/U)
performance gradient:      ∇f(xs,WL/U)

8.4.2 Advantages of geometric yield analysis

• YL/U = ∫⋯∫_(As,L/U) pdf(xs)·dxs; the larger the pdf value, the larger the error in Y if the border of As is approximated inaccurately; A′s is exact at the point xs,WL/U with the highest pdf value, and A′s differs from As the more, the smaller the pdf value

• YL/U ↑ → accuracy of ȲL/U ↑

• systematic error, depends on the curvature of the performance f in xs,WL/U

• duality principle in minimum-norm problems: the minimum distance between a point and a convex set equals the maximum distance between the point and any separating hyperplane

  case 1: Y(A′s) is the greatest lower bound of Y(As) concerning any tangent hyperplane (Fig. 49)

  case 2: Y(A′s) is the least upper bound of Y(As) concerning any tangent hyperplane (Fig. 50)

• in practice, error |ȲL/U − YL/U| ≈ 1% … 2%

Figure 49. Case 1.

Figure 50. Case 2.

8.4.3 Lagrange function and first-order optimality conditions of problem (336)

L(xs, λ) = β²(xs) − λ·(fL − f(xs))                                            (338)

∇L(xs) = 0:  2·C⁻¹·(xs,WL − xs,0) + λWL·∇f(xs,WL) = 0                         (339)

λWL·(fL − f(xs,WL)) = 0                                                       (340)

βWL² = (xs,WL − xs,0)^T·C⁻¹·(xs,WL − xs,0)                                    (341)

assumption: λWL = 0, i.e., the constraint is inactive; then from (339): xs,WL = xs,0 and f(xs,WL) = f(xs,0) > fL (from (336)), which contradicts the constraint f(xs,WL) ≤ fL; therefore:

λWL > 0                                                                       (342)
f(xs,WL) = fL                                                                 (343)

from (339):
xs,WL − xs,0 = − λWL/2 · C·∇f(xs,WL)                                          (344)

substituting (344) in (341):
(λWL/2)² · ∇f(xs,WL)^T·C·∇f(xs,WL) = βWL²                                     (345)

λWL/2 = βWL / √(∇f(xs,WL)^T·C·∇f(xs,WL))                                      (346)

substituting (346) in (344):
xs,WL − xs,0 = −βWL / √(∇f(xs,WL)^T·C·∇f(xs,WL)) · C·∇f(xs,WL)                (347)

(347) corresponds to (268) of the general worst-case analysis

8.4.4 Lagrange function of problem (337)

L(xs, λ) = β²(xs) − λ·(f(xs) − fL)                                            (348)

(347) becomes:
xs,WL − xs,0 = +βWL / √(∇f(xs,WL)^T·C·∇f(xs,WL)) · C·∇f(xs,WL)                (349)

8.4.5 Second-order optimality condition of problem (336)

∇²L(xs,WL) = 2·C⁻¹ + λWL·∇²f(xs,WL)                                           (350)

holds if the curvature of β² is stronger than the curvature of f

8.4.6 Worst-case distance

linearization of f in xs,WL/U:

f̄^(WL/U)(xs) = fL/U + ∇f(xs,WL/U)^T·(xs − xs,WL/U)                            (351)

f̄^(WL/U)(xs,0) − fL/U = ∇f(xs,WL/U)^T·(xs,0 − xs,WL/U)                        (352)

substituting (347) in (352):

lower bound, nominally fulfilled / upper bound, nominally violated:

βWL/U = ∇f(xs,WL/U)^T·(xs,0 − xs,WL/U) / √(∇f(xs,WL/U)^T·C·∇f(xs,WL/U))
      = (f̄^(WL/U)(xs,0) − fL/U) / σf̄^(WL/U)
        ("performance safety margin" over "performance variability")          (353)

lower bound, nominally violated / upper bound, nominally fulfilled:

βWL/U = ∇f(xs,WL/U)^T·(xs,WL/U − xs,0) / √(∇f(xs,WL/U)^T·C·∇f(xs,WL/U))
      = (fL/U − f̄^(WL/U)(xs,0)) / σf̄^(WL/U)                                   (354)

8.4.7 Remarks

• worst-case distance: yield approximation for each individual performance bound

• "realistic" geometric yield analysis: linearization in the nominal parameter vector xs,0

• geometric yield analysis and general worst-case analysis are related by an exchange of objective function and constraint:

geometric yield analysis:     given fL/U,i:  min_xs β²(xs)  s.t. fi(xs) "on the other side" of fL/U,i   →  βWL/U,i / YL/U,i

general worst-case analysis:  given βW / YW:  min/max_xs fi(xs)  s.t. β²(xs) ≤ βW²   →  fWL/U,i

8.4.8 Overall yield

upper bound:  Y ≤ min_i ȲL/U,i
lower bound:  Y ≥ 1 − Σ_i (1 − ȲL/U,i)

Monte-Carlo analysis with A′s = ∩_i A′s,L/U,i (Fig. 51): no circuit simulation required

Figure 51. Exact acceptance region As and the intersection A′s of the linearized half-spaces of all performance specifications.

8.4.9 Consideration of range parameters

parameter acceptance regions (Ā denotes the complement of A):

As,L,i = {xs | ∀ xr ∈ Tr:  fi(xs, xr) ≥ fL,i}                                 (355)
As,U,i = {xs | ∀ xr ∈ Tr:  fi(xs, xr) ≤ fU,i}                                 (356)
Ās,L,i = {xs | ∃ xr ∈ Tr:  fi(xs, xr) < fL,i}                                 (357)
Ās,U,i = {xs | ∃ xr ∈ Tr:  fi(xs, xr) > fU,i}                                 (358)

As,L,i ∩ Ās,L,i = ∅  and  As,L,i ∪ Ās,L,i = R^nxs                             (359)

geometric yield analysis, problem formulation:

xs,0 ∈ As,L,i:  max pdfN(xs)  s.t. xs ∈ Ās,L,i                                (360)
xs,0 ∈ Ās,L,i:  max pdfN(xs)  s.t. xs ∈ As,L,i                                (361)

from (355) – (358):

xs ∈ As,L,i  ⟺  min_(xr∈Tr) fi(xs, xr) ≥ fL,i   "smallest value still inside"     (362)
xs ∈ As,U,i  ⟺  max_(xr∈Tr) fi(xs, xr) ≤ fU,i   "largest value still inside"      (363)
xs ∈ Ās,L,i  ⟺  min_(xr∈Tr) fi(xs, xr) < fL,i   "smallest value already outside"  (364)
xs ∈ Ās,U,i  ⟺  max_(xr∈Tr) fi(xs, xr) > fU,i   "largest value already outside"   (365)

(360), (361) with (355) – (358):

xs,0 ∈ As,L:  min_xs β²(xs)  s.t. min_(xr∈Tr) f(xs, xr) ≤ fL   "nominal inside"    (366)
xs,0 ∈ Ās,L:  min_xs β²(xs)  s.t. min_(xr∈Tr) f(xs, xr) ≥ fL   "nominal outside"   (367)
xs,0 ∈ As,U:  min_xs β²(xs)  s.t. max_(xr∈Tr) f(xs, xr) ≥ fU   "nominal inside"    (368)
xs,0 ∈ Ās,U:  min_xs β²(xs)  s.t. max_(xr∈Tr) f(xs, xr) ≤ fU   "nominal outside"   (369)

9 Yield optimization/design centering/nominal design

9.1 Optimization objectives

Yield: Y(xd)

Worst-case distances:

βWL/U,i = |f̄^(WL/U,i)(xs,0) − fL/U,i| / √(∇fi(xs,WL/U,i)^T·C·∇fi(xs,WL/U,i))  (370)

        = |∇fi(xs,WL/U,i)^T·(xs,0 − xs,WL/U,i) + ∇fi(xd,0)^T·(xd − xd,0)| / σf̄^(WL/U,i)   (371)

i = 1, …, nPSF                                                                (372)

Performance features: fi, i = 1, …, nf
Worst-case performance features: fWL/U,i, i = 1, …, nf

Performance distances:

αWL/U,i = |fi(xd,0) − fL/U,i| / √(∇fi(xd,0)^T·Axd²·∇fi(xd,0))
          ("performance safety margin" over "sensitivity")                    (373)

with
f̄i(xd) = fi(xd,0) + ∇fi(xd,0)^T·(xd − xd,0)                                   (374)

Axd = diag(axd,1, …, axd,nxd): scaling of individual design parameters        (375)

9.2 Derivatives of optimization objectives

Statistical yield derivatives

first derivative:
∇Y(xs,0) = Y·C⁻¹·(xs,0,δ − xs,0)                                              (376)

xs,0,δ = E_pdfδ{xs},  pdfδ(xs) = 1/Y · δ(xs)·pdf(xs)                          (377)

xs,0,δ: "center of gravity" of the truncated normal distribution pdfδ

second derivative:
∇²Y(xs,0) = Y·C⁻¹·[Cδ + (xs,0,δ − xs,0)·(xs,0,δ − xs,0)^T − C]·C⁻¹            (378)

Cδ = E_pdfδ{(xs − xs,0,δ)·(xs − xs,0,δ)^T}                                    (379)

difficult: ∇Y(xd)

Worst-case distance derivatives

first derivatives:
∇βWL/U,i(xs,0) = ±1/σf̄^(WL/U,i) · ∇fi(xs,WL/U,i)                              (380)

∇βWL/U,i(xd^(κ)) = ±1/σf̄^(WL/U,i) · ∇fi(xd^(κ))                               (381)

Performance distance derivative

first derivative:
∇αWL/U,i(xd,0) = ±1/√(∇f(xd,0)^T·Axd²·∇f(xd,0)) · ∇f(xd,0)                    (382)

9.3 Problem formulations of analog optimization

Design centering/yield optimization

max_xd Y(xd)  s.t. c(xd) ≥ 0                                                  (383)

geometric:
max_xd ±βWL/U,i(xd), i = 1, …, nPSF  s.t. c(xd) ≥ 0
("nominal inside": +, "nominal outside": −)                                   (384)
(multicriteria optimization problem, MCO)

Nominal design

min_xd ±fi(xd), i = 1, …, nf  s.t. c(xd) ≥ 0                                  (385)

max_xd ±αWL/U,i(xd), i = 1, …, nPSF  s.t. c(xd) ≥ 0
("nominal inside": +, "nominal outside": −)                                   (386)
(multicriteria optimization problem, MCO)

Scalarization of MCO:  min oi(xd), i = 1, …, no  →  min t(xd)                 (387)
(a code sketch follows at the end of this section)

weighted sum:           tl1(x) = Σ_(i=1)^(no) wi·oi(x)                        (388)
weighted least squares: tl2(x) = Σ_i wi·(oi(x) − oi,target)²                  (389)
weighted min/max:       tl∞(x) = max_i wi·oi(x)                               (390)
exponential function:   texp(x) = Σ_i e^(−wi·oi(x))                           (391)

with wi > 0, i = 1, …, no, Σ_(i=1)^(no) wi = 1

Inscribing the largest ellipsoid in the linearized acceptance region by linear programming in iteration step κ:

max_(tβ,xd) tβ  s.t. βw,i^(κ) + ∇βw,i(xd^(κ))^T·(xd − xd^(κ)) ≥ tβ, i = 1, …, nPSF
                     [c(xd)]^(κ) ≥ 0                                          (392)
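A minimal Python sketch of the weighted-sum scalarization (388), with assumed toy objectives and weights; scipy's BFGS serves as the unconstrained solver:

import numpy as np
from scipy.optimize import minimize

objectives = [lambda x: (x[0] - 1.0)**2,
              lambda x: (x[1] + 2.0)**2,
              lambda x: x[0]**2 + x[1]**2]
w = np.array([0.5, 0.3, 0.2])                        # weights, sum to 1

t = lambda x: sum(wi * oi(x) for wi, oi in zip(w, objectives))   # eq. (388)
res = minimize(t, x0=np.zeros(2), method="BFGS")
print(res.x, res.fun)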

9.4 Analysis, synthesis

design/sizing couples analysis and synthesis: analysis (numerical simulation) evaluates the optimization objective, which may be deterministic or stochastic; synthesis (numerical optimization, deterministic or stochastic) updates the design based on it.

9.5 Sizing

analog optimization, circuit sizing, splits into:

nominal design
  •) without statistical distribution
  •) without parameter ranges
  •) statistical and range parameters at some fixed point

tolerance design
  •) with statistical parameter distribution
  •) with tolerance regions of range parameters

9.6 Nominal design, tolerance design

nominal design
  analysis:  simulation (objective: performance values); sensitivity analysis (objective: gradient)
  synthesis: nominal optimization

tolerance design
  analysis:  worst-case analysis (objective: worst-case performance values and gradients); yield analysis (objective: yield or worst-case distances and gradients)
  synthesis: worst-case optimization; yield optimization

9.7 Optimization without/with constraints

Optimization without constraints splits into:

search direction
  – Newton direction
  – Quasi-Newton direction
  – coordinate direction
  – gradient
  – least squares: Levenberg–Marquardt, Gauss–Newton

line search
  1. bracketing: start interval
  2. sectioning: interval refinement
     – Newton step
     – Golden section
     – backtracking

• Polytope (Nelder-Mead) method
• Trust region

Optimization with constraints:
nonlinear problem → sequence of Quadratic Programming (QP) problems with inequality constraints → active-set QP with equality constraints:
  – Newton direction
  – step-length limitation: new constraint
  – active set update

10 Sizing rules for analog circuit optimization

10.1 Single (NMOS) transistor

Figure 52. NMOS transistor with gate G, drain D, source S, voltages vGS, vDS and current iDS; output characteristics iDS(vDS) for increasing vGS, with triode and saturation regions.

drain-source current (simple model):

       { 0,                                                vGS − Vth ≤ 0
iDS =  { μ·Cox·(W/L)·(vGS − Vth − vDS/2)·vDS·(1 + λ·vDS),  0 ≤ vDS ≤ vGS − Vth
       { ½·μ·Cox·(W/L)·(vGS − Vth)²·(1 + λ·vDS),           vGS − Vth ≤ vDS    (393)

W, L [m]       : transistor width and length
μn [m²/(V·s)]  : electron mobility in silicon
Cox [F/cm²]    : oxide capacitance per area
Vth [V]        : threshold voltage
λ [1/V]        : channel length modulation factor                             (394)

saturation region, no channel length modulation:

iDS = K·(W/L)·(vGS − Vth)²  with  K = ½·μn·Cox                                (395)

derivatives:
∇iDS(K) = iDS/K
∇iDS(W) = iDS/W
∇iDS(L) = −iDS/L
∇iDS(Vth) = −2/(vGS − Vth)·iDS                                                (396)

variances (area law: Lakshmikumar et al., IEEE Journal of Solid-State Circuits, SC-21, Dec. 1986):
(σK/K)² = AK/(W·L)                                                            (397)
(σVth)² = AVth/(W·L)                                                          (398)

iDS variance, assumptions: K, W, L, Vth variations statistically independent, saturation region, no channel length modulation, linear transformation of variances:

σiDS² ≈ Σ_(x ∈ {K,W,L,Vth}) [∇iDS(x)]²·σx²                                    (399)

σiDS²/iDS² ≈ AK/(W·L) + σW²/W² + σL²/L² + 4/(vGS − Vth)² · AVth/(W·L)         (400)

W ↑, L ↑, W·L ↑ → σiDS ↓ : larger transistor geometries reduce the iDS variance caused by manufacturing variations in K, W, L, Vth

10.1.1 Sizing rules for a single transistor that acts as a voltage-controlled current source (VCCS)

rule                                  type              origin
(1) vDS ≥ 0                           electrical (DC),  transistor
(2) vGS − Vth ≥ 0                     function          in saturation
(3) vDS − (vGS − Vth) ≥ Vsatmin*
(4) W ≥ Wmin*                         geometry          manufacturing
(5) L ≥ Lmin*
(6) W·L ≥ Amin*                       robustness        variations affect iDS

* technology-specific value
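A minimal Python sketch that checks the sizing rules (1) – (6) for a candidate operating point and geometry; all threshold values are assumed, technology-specific placeholders:

def check_vccs_sizing_rules(vds, vgs, vth, w, l,
                            v_sat_min=0.1, w_min=1e-7,
                            l_min=1e-7, a_min=1e-13):
    return {
        "(1) vDS >= 0":           vds >= 0.0,
        "(2) vGS - Vth >= 0":     vgs - vth >= 0.0,
        "(3) saturation margin":  vds - (vgs - vth) >= v_sat_min,
        "(4) minimum width":      w >= w_min,
        "(5) minimum length":     l >= l_min,
        "(6) minimum gate area":  w * l >= a_min,
    }

print(check_vccs_sizing_rules(vds=0.6, vgs=0.7, vth=0.4, w=2e-6, l=0.5e-6))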

10.2 Transistor pair: current mirror (NMOS)

Figure 53. NMOS current mirror: transistors T1 and T2 share the gate-source voltage vGS; input current i1 through T1, output current i2 through T2.

function: I2 = x·I1
assumptions: K1 = K2; λ1 = λ2 = 0; Vth1 = Vth2; L1 = L2; T1, T2 in saturation

x = I2/I1 = (W2/W1) · (vGS − Vth2)²/(vGS − Vth1)² · (1 + λ2·vDS2)/(1 + λ1·vDS1)   (401)

10.2.1 Sizing rules for current mirror

in addition to sec. 10.1.1, (1) – (6):

rule                               type         origin
(7) L1 = L2                        geometric,   current
(8) x = W2/W1                      function     ratio
(9) |vDS1 − vDS2| ≤ ΔVDSmax*       electrical,  influence of channel
                                   function     length modulation
(10) vGS − Vth1,2 ≥ VGSmin*        electrical,  influence of local variations
                                   robustness   in threshold voltages

* technology-specific value

Appendix A  Matrix and vector notations

A.1 Vector

b = [b1, b2, …, bm]^T = [⋯ bi ⋯]^T, bi ∈ R, b ∈ R^m, b<m×1>                   (402)
b: column vector

b^T = [⋯ bi ⋯]: transposed vector, a row vector                               (403)

A.2 Matrix

A = [aij], aij ∈ R, A ∈ R^(m×n), A<m×n>, with m rows and n columns            (404)
i: row index, j: column index

column notation: A = [a1 a2 ⋯ an]                                             (405)
row notation:    A = [ā1^T; ā2^T; …; ām^T]                                    (406)
transposed matrix: A^T = [aji] = [ā1 ā2 ⋯ ām]                                 (407)

A.3 Addition

A<m×n> + B<m×n> = C<m×n>                                                      (408)
cij = aij + bij                                                               (409)

A.4 Multiplication

A<m×n> · B<n×p> = C<m×p>                                                      (410)
cij = Σ_(k=1)^(n) aik·bkj                                                     (411)
cij = āi^T · bj  (i-th row of A times j-th column of B)                       (412)

A.5 Special cases

a<m×1> = A<m×1>                                                               (413)
a^T<1×m> = A<1×m>                                                             (414)
a = A<1×1> = a<1> = a^T<1>                                                    (415)
a^T·b = c : scalar product                                                    (416)
a·b^T = C<m×m> : dyadic product                                               (417)
A·b = c                                                                       (418)
b^T·A = c^T                                                                   (419)
b^T·A·c = d                                                                   (420)
identity matrix: I = diag(1, …, 1)                                            (421)
diagonal matrix: diag(a) = diag(a1, …, am)                                    (422)

A.6 Determinant of a quadratic matrix

det(A<m×m>) = |A| = Σ_(j=1)^(m) aij·αij for any i ∈ {1, …, m}
            = Σ_(i=1)^(m) aij·αij for any j ∈ {1, …, m}                       (423)

adjugate matrix:
adj(A) = [⋯ αij ⋯]^T                                                          (424)

cofactor αij: (−1)^(i+j) times the minor determinant of A with row i and column j deleted

A.7 Inverse of a quadratic non-singular matrix

A⁻¹·A = A·A⁻¹ = I                                                             (425)
A⁻¹ = adj(A)/det(A)                                                           (426)

A.8 Some properties

(A·B)·C = A·(B·C)                                                             (427)
A·(B + C) = A·B + A·C                                                         (428)
(A·B)^T = B^T·A^T                                                             (429)
(A^T)^T = A                                                                   (430)
(A + B)^T = A^T + B^T                                                         (431)
A symmetric ⟺ A = A^T                                                         (432)
A positive (semi)definite ⟺ ∀ x ≠ 0: x^T·A·x > (≥) 0                          (433)
A negative (semi)definite ⟺ ∀ x ≠ 0: x^T·A·x < (≤) 0                          (434)
I^T = I                                                                       (435)
A·I = I·A = A                                                                 (436)
A·adj(A) = det(A)·I = adj(A)·A                                                (437)
det(A^T) = det(A)                                                             (438)
|a1 ⋯ b·aj ⋯ am| = b·det(A)                                                   (439)
det(b·A<m×m>) = b^m·det(A)                                                    (440)
|a1 ⋯ aj^(1) ⋯ am| + |a1 ⋯ aj^(2) ⋯ am| = |a1 ⋯ aj^(1) + aj^(2) ⋯ am|         (441)
|a2 a1 a3 ⋯ am| = −det(A)                                                     (442)
aj = ak, j ≠ k (A rank deficient): det(A) = 0                                 (443)
|a1 ⋯ aj + b·ak ⋯ am| = det(A)                                                (444)
det(A·B) = det(A)·det(B)                                                      (445)
(A·B)⁻¹ = B⁻¹·A⁻¹                                                             (446)
(A^T)⁻¹ = (A⁻¹)^T                                                             (447)
(A⁻¹)⁻¹ = A                                                                   (448)
det(A⁻¹) = 1/det(A)                                                           (449)
A⁻¹ = A^T ⟺ A orthogonal                                                      (450)

Appendix B  Abbreviated notations of derivatives using the nabla symbol

Gradient vector: vector of first partial derivatives

∇f(xs) = ∂f/∂xs = [∂f/∂xs,1, ∂f/∂xs,2, …, ∂f/∂xs,nxs]^T,
evaluated at x = [xd; xs; xr]                                                 (451)

∇f(xs*) = ∂f/∂xs, evaluated at x = [xd; xs*; xr]                              (452)

Hessian matrix: matrix of second partial derivatives

∇²f(xs) = ∂²f/(∂xs·∂xs^T) = [∂²f/(∂xs,i·∂xs,j)]<nxs×nxs>,
evaluated at x = [xd; xs; xr]                                                 (453), (454)

∇²f(xs*) = ∂²f/(∂xs·∂xs^T), evaluated at x = [xd; xs*; xr]                    (455)

∇²f(xd*, xs*)<(nxd+nxs)×(nxd+nxs)>
= [ ∂²f/(∂xd·∂xd^T)   ∂²f/(∂xd·∂xs^T) ]
  [ ∂²f/(∂xs·∂xd^T)   ∂²f/(∂xs·∂xs^T) ]
= [ ∇²f(xd)           ∂²f/(∂xd·∂xs^T) ]
  [ ∂²f/(∂xs·∂xd^T)   ∇²f(xs)         ],
evaluated at x = [xd*; xs*; xr]                                               (456)

Jacobian matrix

∇f^T(xs*) = ∂f/∂xs^T = [∂fi/∂xs,j]<nf×nxs>
= [ ∂f1/∂xs,1    ∂f1/∂xs,2    ⋯  ∂f1/∂xs,nxs  ]
  [ ∂f2/∂xs,1    ∂f2/∂xs,2    ⋯  ∂f2/∂xs,nxs  ]
  [ ⋮            ⋮            ⋱  ⋮            ]
  [ ∂fnf/∂xs,1   ∂fnf/∂xs,2   ⋯  ∂fnf/∂xs,nxs ],
evaluated at x = [xd; xs*; xr]                                                (457), (458)

Appendix C  Norms

Quadratic model

f(x) = f(x0) + g^T·(x − x0) + ½·(x − x0)^T·H·(x − x0)                         (459)
     = f0 + Σ_(i=1)^(nx) gi·(xi − x0,i)
          + ½·Σ_(i=1)^(nx) Σ_(j=1)^(nx) hij·(xi − x0,i)·(xj − x0,j)           (460)
H: symmetric

nx = 2:
f(x1, x2) = f0 + g1·(x1 − x0,1) + g2·(x2 − x0,2)
          + ½·h11·(x1 − x0,1)² + h12·(x1 − x0,1)·(x2 − x0,2) + ½·h22·(x2 − x0,2)²   (461)

Vector norms

l1-norm:  ‖x‖1 = Σ_(i=1)^(nx) |xi|                                            (462)
l2-norm:  ‖x‖2 = √(Σ_(i=1)^(nx) xi²) = √(x^T·x)                               (463)
l∞-norm:  ‖x‖∞ = max_i |xi|                                                   (464)
lp-norm:  ‖x‖p = (Σ_(i=1)^(nx) |xi|^p)^(1/p)                                  (465), (466)

Matrix norms

max norm:       ‖A‖M = nx·max_(i,j) |aij|                                     (467)
row norm:       ‖A‖Z = max_i Σ_j |aij|                                        (468)
column norm:    ‖A‖S = max_j Σ_i |aij|                                        (469)
Euclidean norm: ‖A‖E = √(Σ_i Σ_j |aij|²)                                      (470)
spectral norm:  ‖A‖λ = max_j √(λj(A^T·A)) = √(λmax(A^T·A))                    (471)

Appendix D  Pseudo-inverse, singular value decomposition (SVD)

D.1 Moore-Penrose conditions

For every matrix A<m×n>, there always exists a unique matrix A+<n×m> that satisfies the following conditions:

A·A+·A = A        (A·A+ maps each column of A to itself)                      (472)
A+·A·A+ = A+      (A+·A maps each column of A+ to itself)                     (473)
(A·A+)^T = A·A+   (A·A+ is symmetric)                                         (474)
(A+·A)^T = A+·A   (A+·A is symmetric)                                         (475)

D.2 Singular value decomposition

Every matrix A<m×n> with rank(A) = r ≤ min(m, n) has a singular value decomposition

A = V·Â·U^T                                                                   (476)

V: matrix of the m left singular vectors (eigenvectors of A·A^T)
U: matrix of the n right singular vectors (eigenvectors of A^T·A)
U, V orthogonal: U⁻¹ = U^T, V⁻¹ = V^T

Â = [D 0; 0 0],  r = rank(A)                                                  (477)

D = diag(d1, …, dr), d1, …, dr: singular values                               (478)

A+ = U·Â+·V^T                                                                 (479)

Â+ = [D⁻¹ 0; 0 0],  D⁻¹ = diag(1/d1, …, 1/dr)

singular values: positive roots of the eigenvalues of A^T·A

columns of V:              basis for R^m
columns 1, …, r of V:      basis for the column space of A
columns (r+1), …, m of V:  basis for the kernel/null-space of A^T
columns of U:              basis for R^n
columns 1, …, r of U:      basis for the row space of A
columns (r+1), …, n of U:  basis for the kernel/null-space of A

Â+·Â = Â·Â+ = [I 0; 0 0]                                                      (480)

A·A+ = V·[I 0; 0 0]·V^T,   A+·A = U·[I 0; 0 0]·U^T                            (481)
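A minimal Python sketch of the SVD-based pseudo-inverse (476) – (479), cross-checked against numpy's built-in pinv (note that numpy's svd returns the factors in the order V, d, U^T of the notation used here):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # rank 2, m < n
V, d, Ut = np.linalg.svd(A)            # A = V·diag(d)·U^T, eq. (476)

# A+ = U·[diag(1/d) 0; 0 0]·V^T on the nonzero singular values, eq. (479)
D_plus = np.zeros((A.shape[1], A.shape[0]))
D_plus[:len(d), :len(d)] = np.diag(1.0 / d)
A_plus = Ut.T @ D_plus @ V.T

print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
print(np.allclose(A @ A_plus @ A, A))           # Moore-Penrose (472)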

Appendix E  Linear equation system, rectangular system matrix with full rank

A<m×n>·x = c,  rank(A) = min(m, n)                                            (482)

E.1 m < n, rank(A) = m, underdetermined system of equations

A·A^T is invertible:
A·x = c  with  x = A^T·w′:  A·A^T·w′ = c
w′ = (A·A^T)⁻¹·c
x = A^T·(A·A^T)⁻¹·c : minimum-length solution,  A+ = A^T·(A·A^T)⁻¹

Solving (482) by SVD:
A = V·[D 0]·U^T
V·[D 0]·(U^T·x) = c  with  w″ = U^T·x = [w″a; w″b]
D·w″a = V^T·c  →  w″a (element-wise division)
x = U·[w″a; 0]

Solving (482) by QR-decomposition:
A^T = [Q Q⊥]·[R; 0], Q and Q⊥ orthogonal, i.e., Q^T·Q⊥ = 0                    (483)
A+ = A^T·(A·A^T)⁻¹ = Q·R^(−T)
R^T·(Q^T·x) = c  with  w‴ = Q^T·x:  forward substitution  →  w‴
x = Q·w‴

E.2 m > n, rank(A) = n, overdetermined system of equations

A^T·A is invertible:
A·x = c  ⟹  A^T·A·x = A^T·c
x = (A^T·A)⁻¹·A^T·c : least-squares solution,  A+ = (A^T·A)⁻¹·A^T

Solving (482) by SVD:
A = V·[D; 0]·U^T
with w″ = U^T·x and V^T·c = [va; vb]:
D·w″ = va  →  w″ (element-wise division)
x = U·w″

Solving (482) by QR-decomposition:
A = [Q Q⊥]·[R; 0] = Q·R, Q and Q⊥ orthogonal
A+ = (A^T·A)⁻¹·A^T = R⁻¹·Q^T
Q·R·x = c ⟺ R·x = Q^T·c:  backward substitution  →  x
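A minimal Python sketch of the overdetermined case: the least-squares solution is computed stably via the QR route R·x = Q^T·c and cross-checked against numpy's lstsq:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])             # m = 3 > n = 2, full column rank
c = np.array([1.0, 2.0, 2.5])

Q, R = np.linalg.qr(A)                 # reduced QR: A = Q·R
x = np.linalg.solve(R, Q.T @ c)        # backward-substitution step
print(x)
print(np.allclose(x, np.linalg.lstsq(A, c, rcond=None)[0]))   # True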

E.3 m = n, rank(A) = m = n, determined system of equations

A is invertible:  A+ = A⁻¹

Solving (482) by SVD:
A = V·D·U^T
D·(U^T·x) = V^T·c  with  w″ = U^T·x  →  w″ (element-wise division)
x = U·w″

Solving (482) by QR-decomposition:
A = Q·R,  A+ = R⁻¹·Q^T
Q·R·x = c ⟺ R·x = Q^T·c:  backward substitution  →  x

Solving (482) by LU-decomposition:
A = L·U, L: lower left triangular matrix, U: upper right triangular matrix
L·(U·x) = c  with  w⁗ = U·x
L·w⁗ = c:  forward substitution  →  w⁗
U·x = w⁗:  backward substitution  →  x

A orthogonal: A^T·A = A·A^T = I,  A+ = A⁻¹ = A^T
A diagonal:   A+ii = 1/Aii  (if A is rank deficient, i.e., ∃i: Aii = 0, then A+ii = 0)

Appendix F  Partial derivatives of linear, quadratic terms in matrix/vector notation

term                     ∂/∂x                 ∂/∂x^T
a^T·x = x^T·a            a                    a^T
x^T·A·x                  A·x + A^T·x          x^T·A^T + x^T·A
  A = A^T (symmetric):   2·A·x                2·x^T·A^T
x^T·x                    2·x                  2·x^T

with A = [a1 a2 ⋯ an] (columns) = [ā1^T; ā2^T; …; ām^T] (rows):

A·x = a1·x1 + a2·x2 + … + an·xn = [ā1^T·x; ā2^T·x; …; ām^T·x]

x^T·A^T = (A·x)^T = x1·a1^T + x2·a2^T + … + xn·an^T = [ā1^T·x, ā2^T·x, ⋯, ām^T·x]

Layout conventions (484) – (487): the derivative operators ∂/∂(·) distribute over the entries of their operands; differentiating a column vector by a scalar yields the column of entry-wise derivatives, a row vector by a scalar yields a row, and so on. These conventions produce the gradient, Jacobian and Hessian layouts of Appendix B.
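A minimal Python sketch that verifies the table above numerically, e.g., ∂(x^T·A·x)/∂x = A·x + A^T·x, via central finite differences on assumed random data:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)
f = lambda x: x @ A @ x                # quadratic form x^T·A·x

eps = 1e-6
num_grad = np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps)
                     for e in np.eye(3)])
print(np.allclose(num_grad, (A + A.T) @ x, atol=1e-5))   # True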

Appendix G  Probability space

event B: e.g., the performance value f is less than or equal to fU, i.e., B = {f | f ≤ fU}
probability P: e.g., P(B) = P({f | f ≤ fU}) = cdff(fU) = ∫_(−∞)^(fU) pdff(f)·df
event B ⊂ Ω: subset of the sample set Ω of possible outcomes of a probabilistic experiment
elementary event: event with one element (if event sets are countable)
event set B: system of subsets of the sample set, B ∈ B ("σ-algebra")

definitions:
Ω ∈ B                                                                         (488)
B ∈ B ⟹ B̄ ∈ B                                                                 (489)
Bi ∈ B, i ∈ N ⟹ ∪i Bi ∈ B                                                     (490)

Kolmogorov axioms:
P(B) ≥ 0 for all B ∈ B                                                        (491)
P(Ω) = 1   "Ω: certain event"                                                 (492)
Bi ∩ Bj = {}, Bi, Bj ∈ B ⟹ P(Bi ∪ Bj) = P(Bi) + P(Bj)                         (493)

corollaries:
P({}) = 0   "{}: impossible event"                                            (494)
Bi ⊆ Bj ⟹ P(Bi) ≤ P(Bj)                                                       (495)
P(B̄) = 1 − P(B)                                                               (496)
0 ≤ P(B) ≤ 1                                                                  (497)
∫⋯∫_(−∞)^(+∞) pdf(t)·dt = 1                                                   (498)

Appendix H  Convexity

H.1 Convex set K ⊆ R^n

∀ p0, p1 ∈ K, a ∈ [0, 1]:  pa = (1 − a)·p0 + a·p1 ∈ K                         (499)

Figure 54. (a) convex set, (b) non-convex set.

H.2 Convex function

∇²f is positive definite

f(pa) ≤ (1 − a)·f(p0) + a·f(p1)
∀ p0, p1 ∈ K, a ∈ [0, 1], pa = (1 − a)·p0 + a·p1                              (500)

Figure 55. Convex function.

c(p) is a concave function ⟺ {p | c(p) ≥ c0} is a convex set
convex optimization problem: local minima are global minima
strict convexity: unique global solution