Intensive Course in Econometrics Slides

Rolf Tschernig & Harry Haupt
University of Regensburg
University of Bielefeld

March 2009¹

¹ We are greatly indebted to Kathrin Kagerer, Joachim Schnurbus, and Roland Weigand, who helped us enormously to improve and correct this course material. Of course, the usual disclaimer applies. © These slides may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.

Contents

1 Introduction: What is Econometrics?
  1.1 A Trade Example: What Determines Trade Flows?
  1.2 Economic Models and the Need for Econometrics
  1.3 Causality and Experiments
  1.4 Types of Economic Data

2 The Simple Regression Model
  2.1 The Population Regression Model
  2.2 The Sample Regression Model
  2.3 The OLS Estimator
  2.4 Best Linear Prediction, Correlation, and Causality
  2.5 Algebraic Properties of the OLS Estimator
  2.6 Parameter Interpretation and Functional Form
  2.7 Statistical Properties: Expected Value and Variance
  2.8 Estimation of the Error Variance

3 Multiple Regression Analysis: Estimation
  3.1 Motivation: The Trade Example Continued
  3.2 The Multiple Regression Model of the Population
  3.3 The OLS Estimator: Derivation and Algebraic Properties
  3.4 The OLS Estimator: Statistical Properties
  3.5 Model Specification I: Model Selection Criteria

4 Multiple Regression Analysis: Hypothesis Testing
  4.1 Basics of Statistical Tests
  4.2 Probability Distribution of the OLS Estimator
  4.3 The t Test in the Multiple Regression Model
  4.4 Empirical Analysis of a Simplified Gravity Equation
  4.5 Confidence Intervals
  4.6 Testing a Single Linear Combination of Parameters
  4.7 The F Test
  4.8 Reporting Regression Results

5 Multiple Regression Analysis: Asymptotics
  5.1 Large Sample Distribution of the Mean Estimator
  5.2 Large Sample Inference for the OLS Estimator

6 Multiple Regression Analysis: Interpretation
  6.1 Level and Log Models
  6.2 Data Scaling
  6.3 Dealing with Nonlinear or Transformed Regressors
  6.4 Regressors with Qualitative Data

7 Multiple Regression Analysis: Prediction
  7.1 Prediction and Prediction Error
  7.2 Statistical Properties of Linear Predictions

8 Multiple Regression Analysis: Heteroskedasticity
  8.1 Consequences of Heteroskedasticity for OLS
  8.2 Heteroskedasticity-Robust Inference after OLS
  8.3 The General Least Squares (GLS) Estimator
  8.4 Feasible Generalized Least Squares (FGLS)

9 Multiple Regression Analysis: Model Diagnostics
  9.1 The RESET Test
  9.2 Heteroskedasticity Tests
  9.3 Model Specification II: Useful Tests

10 Appendix
  10.1 A Condensed Introduction to Probability
  10.2 Important Rules of Matrix Algebra
  10.3 Rules for Matrix Differentiation
  10.4 Data for Estimating Gravity Equations

Organisation

Contact
Prof. Dr. Rolf Tschernig
Building RW(L), 5th floor, room 514
Universitätsstr. 31, 93040 Regensburg, Germany
Tel. (+49) 941/943 2737, Fax (+49) 941/943 4917
Email: [email protected]
http://www.wiwi.uni-regensburg.de/tschernig/

Schedule and Location
Date of Course: February 16 to February 27, 2009
Osteuropa-Institut and University of Regensburg
Lectures: every morning 8.30 - 10.00, W 113 (Tschernig)
          every morning 10.30 - 12.00, W 113 (Tschernig)
Exercises: every afternoon 14.00 - 17.00, W 113 or PC-room (Kagerer, Schnurbus, Weigand)
Exam: no exam in this course

Required Text
Wooldridge, J.M. (2009). Introductory Econometrics: A Modern Approach, 4th ed., Thomson South-Western.

Additional Reading
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed., Pearson Addison-Wesley.
plus what will be announced during the course.

1 Introduction: What is Econometrics?

1.1 A Trade Example: What Determines Trade Flows?

Goal/Research Question: Identify the factors that influence imports to Kazakhstan and quantify their impact.


• Three basic questions that have to be answered during the analysis:
  1. Which (economic) relationships could be / are “known” to be relevant for this question?
  2. Which data can be useful for checking the possibly relevant economic conjectures/theories?
  3. How to decide about which economic conjecture to reject or to follow?
• Let’s have a first look at some data of interest: the imports (in current US dollars) to Kazakhstan from 55 originating countries in 2004.


[Figure: bar chart of imports to Kazakhstan in 2004 in current US dollars, by country of origin (ALB through YUG)]

The original data are from the UN Commodity Trade Statistics Database (UN COMTRADE).

• See section 10.4 in the Appendix for detailed data descriptions.

Data are provided in the EViews file Kazakhstan imports 2004.wf1. We thank Richard Frensch, Osteuropa-Institut, Regensburg, Germany, who provided all data throughout this course for analyzing trade flows.

• A first attempt to answer the three basic questions:
  1. Ignore for the moment all existing economic theory and simply hypothesize that observed imports depend somehow on the GDP of the exporting country.
  2. Collect GDP data for the countries of origin, e.g. from the International Monetary Fund (IMF) World Economic Outlook Database.
  3. Plot the data, e.g. by using a scatter plot (a sketch follows below). Can you decide whether there is a relationship between trade flows and the GDP of exporting countries?
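A minimal Python sketch of step 3. The course data ship as an EViews workfile; here we assume a hypothetical CSV export with assumed column names gdp and imports, purely for illustration:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical CSV export of the EViews workfile; file and column names are assumed.
    data = pd.read_csv("kazakhstan_imports_2004.csv")

    plt.scatter(data["gdp"], data["imports"])
    plt.xlabel("GDP of exporting country")
    plt.ylabel("Imports to Kazakhstan (current US dollars)")
    plt.title("Imports to Kazakhstan in 2004 vs. GDP of exporting country")
    plt.show()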


A scatter plot

[Figure: scatter plot of TRADE_0_D_O (imports) against WEO_GDPCR_O (GDP of the exporting country)]

Some questions:
• What do you see?
• Is there a relationship?
• If so, how to quantify it?
• Is there a causal relationship - what determines what?
• By how much do the imports from Germany change if the GDP in Germany changes by 1%?
• Are there other relevant factors determining imports, e.g. distance?
• Is it possible to forecast future trade flows?


• What have you done? You tried to simplify reality by building some kind of (economic) model.
• An (economic) model
  – has to reduce the complexity of reality such that it is useful for answering the question of interest;
  – is a collection of cleverly chosen assumptions from which implications can be inferred (using logic); example: the Heckscher-Ohlin model;
  – should be as simple as possible and as complex as necessary;
  – cannot be refuted or “validated” without empirical data of some kind.

• Let us consider a simple formal model for the relationship between imports and GDP of the originating countries:

    importsi = β0 + β1 gdpi,   i = 1, . . . , 55.

  – Does this make sense?
  – How to determine the values of the so-called parameters β0 and β1?
  – Fit a straight line through the cloud!

[Figure: scatter plot of TRADE_0_D_O against WEO_GDPCR_O]

[Figure: scatter plot TRADE_0_D_O vs. WEO_GDPCR_O with fitted regression line]

More questions:
– How to fit a line through the cloud of points?
– Which properties does the fitted line have?
– What to do with other relevant factors that are currently neglected in the analysis?
– Which criteria to choose for identifying a potential relationship?

Further questions:
– Is the potential relationship really linear? Compare it to the green points of a nonlinear relationship.
– And: how much may results change with a different sample, e.g. for 2003?

[Figure: scatter plot of TRADE_0_D_O against WEO_GDPCR_O with linear (TRADE_0_D_F_LEV) and nonlinear (TRADE_0_D_F_LEV2) fitted values]

1.2 Economic Models and the Need for Econometrics

• Standard problems of economic models:
  – The conjectured economic model is likely to neglect some factors.
  – Numerical answers to the questions posed depend in general on the choice of a data set. A different data set leads to different numerical results.
  ⇒ Numerical answers always have some uncertainty.

• Econometrics
  – offers solutions for dealing with unobserved factors in economic models,
  – provides “both a numerical answer to the question and a measure how precise the answer is” (Stock & Watson 2007, p. 7),
  – as will be seen later, provides tools that allow one to refute economic hypotheses using statistical techniques by confronting theory with data, and to quantify the probability that such decisions are wrong,
  – as will be seen later as well, allows one to quantify the risks of forecasts, decisions, and even of its own analysis.

• Therefore: Econometrics can also be useful for providing answers to questions like:
  – How reliable are predicted growth rates or returns?
  – How likely is it that the value realized in the future will be close to the predicted value? In other words, how precise are the predictions?
• Main tool: the multiple regression model. It allows one to quantify the effect of a change in one variable on another variable, holding other things constant (ceteris paribus analysis).

• Steps of an econometric analysis:
  1. Careful formulation of the question/problem/task of interest.
  2. Specification of an economic model.
  3. Careful selection of a class of econometric models.
  4. Collecting data.
  5. Selection and estimation of an econometric model.
  6. Diagnostics of correct model specification.
  7. Usage of the model.
  Note that there exists a large variety of econometric models, and model choice depends very much on the research question, the underlying economic theory, the availability of data, and the structure of the problem.

• Goals of this course: providing you with basic econometric tools such that you can
  – successfully carry out simple empirical econometric analyses and provide quantitative answers to quantitative questions,
  – recognize ill-conducted econometric studies and their consequences,
  – recognize when to ask for the help of an expert econometrician,
  – attend courses in advanced econometrics / empirical economics,
  – study more advanced econometric techniques.

Some Definitions of Econometrics
– “... discover empirical relation between economic variables, provide forecast of various economic quantities of interest ...” (first issue of volume 1, Econometrica, 1933).
– “The science of model building consists of a set of quantitative tools which are used to construct and then test mathematical representations of the real world. The development and use of these tools are subsumed under the subject heading of econometrics” (Pindyck & Rubinfeld 1998).

– “At a broad level, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data. Econometric methods are used in many branches of economics, including finance, labor economics, macroeconomics, microeconomics, marketing, and economic policy. Econometric methods are also commonly used in other social sciences, including political science and sociology” (Stock & Watson 2007, p. 3).
  So, some may also say: “Alchemy or Science?”, “Economictricks”, “Econo-mystiques”.
– “Econometrics is based upon the development of statistical methods for estimating economic relationships, testing economic theories, and evaluating and implementing government and business policy” (Wooldridge 2009, p. 1).

• Summary of tasks for econometric methods
  – In brief: econometrics can be useful whenever you encounter (economic) data and want to make sense of them.
  – In detail:
    ∗ Providing a formal framework for falsifying postulated economic relationships by confronting economic theory with economic data using statistical methods: economic hypotheses are formulated and statistically tested on the basis of adequately (and repeatedly) collected data, such that test results may falsify the postulated hypotheses.
    ∗ Analyzing the effects of policy measures.
    ∗ Forecasting.

1.3 Causality and Experiments

• Common understanding: “causality means that a specific action” (touching a hot stove) “leads to a specific, measurable consequence” (getting burned) (Stock & Watson 2007, p. 8).
• How to identify causality? Observe an action and its consequence repeatedly!
• Thus, in science one aims at repeating an action and its consequences under identical conditions. How to generate repetitions of actions?

• Randomized controlled experiments:
  – there is a control group that receives no treatment (e.g. fertilizer) and a treatment group that receives treatment, and
  – treatment is assigned randomly in order to eliminate any possible systematic relationship between the treatment and other possible influences.
• Causal effect: A “causal effect is defined to be an effect on an outcome of a given action or treatment, as measured in an ideal randomized controlled experiment” (Stock & Watson 2007, p. 9).
• In economics randomized controlled experiments are very often difficult or impossible to conduct. Then a randomized controlled experiment provides a theoretical benchmark, and econometric analysis aims at mimicking as closely as possible the conditions of a randomized controlled experiment using actual data.
• Note that knowledge of causal effects is not necessary for forecasting.
• Warning: in general, multiple regression models do not allow conclusions about causality!

1.4 Types of Economic Data

1. Cross-Sectional Data
• are collected across several units at a single point or period of time.
• Units: “economic agents”, e.g. individuals, households, investors, firms, economic sectors, cities, countries.
• In general: the order of observations has no meaning.
• It is popular to use index i.
• Optimal: the data are a random sample of the underlying population, see Section 2.1 for details.
• Cross-sectional data allow one to explain differences between individual units.

• Example: the sample of countries that export to Kazakhstan in 2004 from Section 1.1.

2. Time Series Data
• are sampled across differing points/periods of time.
• It is popular to use index t.
• The sampling frequency is important:
  – variable versus fixed;
  – fixed: annually, quarterly, monthly, weekly, daily, intradaily;
  – variable: ticker data, duration data (e.g. unemployment spells).
• Time series data allow the analysis of dynamic effects.
• Univariate versus multivariate time series data.

• Example: Trade flow from Germany to Kazakhstan and GDP in Germany (in current US dollars), 1990 - 2007, T = 18.

[Figure: time series plots of TRADE_0_D_O and WDI_GDPUSDCR_O, 1990-2006]

3. Panel Data
• are a collection of cross-sectional data for at least two different points/periods of time.
• Individual units remain identical in each cross-sectional sample (except if units vanish).

• Use of a double index it, where i = 1, . . . , N and t = 1, . . . , T.
• Typical problem: missing values - for some units and periods there are no data.
• Example: growth rates of imports from 55 different countries to Kazakhstan from 1991 to 2008, where all 55 countries were chosen for the 1990 sample and kept fixed for all subsequent years (T = 18, N = 55).

4. Pooled Cross Sections
• also a collection of cross-sectional data, however, allowing for changing units across time.
• Example: in 1995 the countries of origin are Germany, France, Russia, and in 1996 the countries of origin are Poland, US, Italy.

In this course: focus on the analysis of cross-sectional data and specific types of time series data:
• simple regression model → Chapter 2,
• multiple regression model → Chapters 3 to 9.
• Time series analysis requires advanced econometric techniques that are beyond the scope of this course (given the time constraints).

Recall the arithmetic quality of data:
• quantitative variables,
• qualitative or categorical variables.

Reading: Sections 1.1-1.3 in Wooldridge (2009).

2 The Simple Regression Model

Distinguish between the
• population regression model and the
• sample regression model.

2.1 The Population Regression Model

• In general: y and x are two variables that describe properties of the population under consideration, for which one wants “to explain y in terms of x” or “to study how y varies with changes in x” or “to predict y for given values of x”.
  Example: By how much does the hourly wage change for an additional year of schooling, keeping all other influences fixed?
• If we knew everything, then the relationship between y and x could formally be expressed as

    y = m(x, z1, . . . , zs),   (2.1)

  where z1, . . . , zs denote s additional variables that, in addition to years of schooling x, influence the hourly wage y.

• For practical applications it is possible
  – that relationship (2.1) is too complicated to be useful,
  – that there does not exist an exact relationship, or
  – that there exists an exact relationship for which, however, not all s influential variables z1, . . . , zs can be observed, or
  – that one has no idea about the structure of the function m(·).
• Our solution:
  – build a useful model, cf. Section 1.1,
  – which focuses on a relationship that holds on “average”.
  What do we mean by “average”?

• Crucial building blocks for our model:
  – Consider the variable y as random. You may think of y as the value of the variable for a unit chosen at random from all units in the population. Furthermore, in the case of discrete values of the random variable y, a probability is assigned to each value of y. (If the random variable y is continuous, a density value is assigned.) In other words: apply probability theory. See Appendices B and C in Wooldridge (2009).
  Examples:
  ∗ The population consists of all apartments in Almaty. The variable y denotes the rent of a single apartment randomly chosen from all apartments in Almaty.

  ∗ The population consists of all possible values of imports to Kazakhstan from a specific country and period.
  ∗ For a die the population consists of all numbers that are written on its sides, although in this case statisticians prefer to talk about a sample space.
  – In terms of probability theory, the “average” of a variable y is given by the expected value of this variable. In the case of discrete y one has

      E[y] = Σj yj Prob(y = yj),

    where the sum runs over all different yj in the population.
  – Sometimes one may only look at a subset of the population, namely all y that have the same value for another variable x.

  Example: one only considers the rents of all apartments in Almaty of size x = 75 m².
  – If the “average” is conditioned on specific values of another variable x, then one considers the conditional expected value of y for a given x: E[y|x]. For discrete random variables y one has

      E[y|x] = Σj yj Prob(y = yj | x),

    where the sum runs over all different yj in the population.
  (See Appendix 10.1 for a brief introduction to probability theory and corresponding definitions for continuous random variables.)
  Example continued: the conditional expectation E[y|x = 75] corresponds to the average rent of all apartments in Almaty of size x = 75 m².

  – Note that the variable x can be random, too. Then the conditional expectation E[y|x] is a function of the (random) variable x, E[y|x] = g(x), and therefore a random variable itself.
  – From the identity

      y = E[y|x] + (y − E[y|x])   (2.2)

    one defines the error term or disturbance term as u ≡ y − E[y|x], so that one obtains a simple regression model of the population:

      y = E[y|x] + u.   (2.3)

• Interpretation:
  – The random variable y varies randomly around the conditional expectation E[y|x]: y = E[y|x] + u.
  – The conditional expectation E[y|x] is called the systematic part of the regression.
  – The error term u is called the unsystematic part of the regression.
• So instead of trying the impossible, namely specifying m(x, . . .) given by (2.1), one focuses the analysis on the “average” E[y|x].

• How to determine the conditional expectation?
  – This step requires assumptions!
  – To keep things simple we make Assumption (A), given by

      E[y|x] = β0 + β1x.   (2.4)

  – Discussion of Assumption (A):
    ∗ It restricts the flexibility of g(x) = E[y|x] such that g(x) = β0 + β1x has to be linear in x. So if E[y|x] = δ0 + δ1 log x, Assumption (A) is wrong.
    ∗ It can be fulfilled if there are other variables influencing y linearly. For example, consider E[y|x, z] = δ0 + δ1x + δ2z.

      Then, by the law of iterated expectations, one obtains

        E[y|x] = δ0 + δ1x + δ2E[z|x].

      If E[z|x] is linear in x, say E[z|x] = α0 + α1x, one obtains

        E[y|x] = δ0 + δ1x + δ2(α0 + α1x) = γ0 + γ1x   (2.5)

      with γ0 = δ0 + δ2α0 and γ1 = δ1 + δ2α1. Note, however, that in this case E[y|x, z] ≠ E[y|x] in general. Then model choice depends on the goal of the analysis: the smaller model can sometimes be preferable for prediction; the larger model is needed if controlling for z is important ⇔ controlled random experiments, see Section 1.3.
    ∗ In general, Assumption (A) is violated if (2.5) does not hold, e.g. if E[z|x] is nonlinear in x. Then the linear population model is called misspecified. More on that in Section 3.4.

• Properties of the error term u: From Assumption (A) it follows that
  1. E[u|x] = 0,
  2. E[u] = 0,
  3. Cov(x, u) = 0.

• An alternative set of assumptions: The above result E[u|x] = 0, together with the identity (2.3), allows one to rewrite Assumption (A) in terms of the following two assumptions:
  1. Assumption SLR.1 (Linearity in the Parameters):

       y = β0 + β1x + u,   (2.6)

  2. Assumption SLR.4 (Zero Conditional Mean):

       E[u|x] = 0.

• Linear Population Regression Model: The simple linear population regression model is given by equation (2.6),

    y = β0 + β1x + u,

  and is obtained by specifying the conditional expectation in the regression model (2.3) by a linear function (linear in the parameters). The parameters β0 and β1 are called the intercept parameter and the slope parameter, respectively.

• Some terminology for regressions:

  y                    x
  Dependent variable   Independent variable
  Explained variable   Explanatory variable
  Response variable    Control variable
  Predicted variable   Predictor variable
  Regressand           Regressor
                       Covariate

• A simple example: a game of dice
  Let the random numbers x and u denote the throws of two fair dice with faces x, u ∈ {−2.5, −1.5, −0.5, 0.5, 1.5, 2.5}. Based on both throws, the random number y denotes the following sum:

    y = 2 + 3x + u,   with β0 = 2 and β1 = 3.

  This completely describes the population regression model.
  – Derive the systematic relationship between y and x holding x fixed.
  – Interpret the systematic relationship.
  – How can you obtain the values of the parameters β0 = 2 and β1 = 3 if those values are unknown?
  Next section: How can you determine/estimate β0 and β1?
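A quick way to see what this population model implies is to simulate it. The following sketch (Python with numpy; illustrative, not part of the original slides) draws many throws and checks that the conditional mean E[y|x] is close to 2 + 3x for every face:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    faces = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])

    # Simulate the population model y = 2 + 3x + u with two independent fair dice.
    x = rng.choice(faces, size=1_000_000)
    u = rng.choice(faces, size=1_000_000)
    y = 2 + 3 * x + u

    # E[u|x] = 0, so the average of y for each face should be close to 2 + 3x.
    for face in faces:
        print(face, y[x == face].mean())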


2.2 The Sample Regression Model

Estimators and Estimates
• In practice one has to estimate the unknown parameters β0 and β1 of the population regression model using a sample of observations.
• The sample has to be representative and has to be collected/drawn from the population.
• A sample of the random variables x and y of size n is given by {(xi, yi) : i = 1, . . . , n}.
• Now we require an estimator that allows us, given the sample observations {(xi, yi) : i = 1, . . . , n}, to compute estimates for the unknown parameters β0 and β1 of the population.

• Note:
  – If we want to construct an estimator for the unknown parameters, we have not yet observed a sample. An estimator is a function that contains the sample values as arguments.
  – Once we have an estimator and observe a sample, we can compute estimates (= numerical values) for the unknown quantities.
• For estimating the unknown parameters there exist many different estimators that differ with respect to their statistical properties (statistical quality)!
  Example: Two different estimators for estimating the mean: (1/n) Σ_{i=1}^n yi and (1/2)(y1 + yn).

• If you denote estimators of the parameters β0 and β1 in the population regression model y = β0 + β1x + u by β̃0 and β̃1, then the sample regression model is given by

    yi = β̃0 + β̃1xi + ũi,   i = 1, . . . , n.

  It consists of
  – the sample regression function or regression line ỹi = β̃0 + β̃1xi,
  – the fitted values ỹi, and
  – the residuals ũi = yi − ỹi, i = 1, . . . , n.

With which method can we estimate?

2.3 The Ordinary Least Squares (OLS) Estimator

• The ordinary least squares estimator is frequently abbreviated as the OLS estimator. The OLS estimator goes back to C.F. Gauss (1777-1855).
• It is derived by choosing the values β̃0 and β̃1 such that the sum of squared residuals (SSR)

    Σ_{i=1}^n ũi² = Σ_{i=1}^n (yi − β̃0 − β̃1xi)²

  is minimized.

• One computes the first partial derivatives with respect to β̃0 and β̃1 and sets them equal to zero:

    Σ_{i=1}^n (yi − β̂0 − β̂1xi) = 0,   (2.7)

    Σ_{i=1}^n xi(yi − β̂0 − β̂1xi) = 0.   (2.8)

  The equations (2.7) and (2.8) are called the normal equations.

  From (2.7) one obtains

    β̂0 = n⁻¹ Σ_{i=1}^n yi − β̂1 n⁻¹ Σ_{i=1}^n xi,

    β̂0 = ȳ − β̂1x̄,   (2.9)

  where z̄ = n⁻¹ Σ_{i=1}^n zi denotes the estimated mean of zi, i = 1, . . . , n. Inserting (2.9) into the normal equation (2.8) delivers

    Σ_{i=1}^n xi (yi − (ȳ − β̂1x̄) − β̂1xi) = 0.

  Moving terms leads to

    Σ_{i=1}^n xi(yi − ȳ) = β̂1 Σ_{i=1}^n xi(xi − x̄).

  Note that

    Σ_{i=1}^n xi(yi − ȳ) = Σ_{i=1}^n (xi − x̄)(yi − ȳ),
    Σ_{i=1}^n xi(xi − x̄) = Σ_{i=1}^n (xi − x̄)²,

  such that

    β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)².   (2.10)

• Terminology:
  – The sample functions (2.9) and (2.10),

      β̂0 = ȳ − β̂1x̄,
      β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²,

    are called the ordinary least squares (OLS) estimators for β0 and β1.
  – For a given sample, the quantities β̂0 and β̂1 are called the OLS estimates for β0 and β1.

  – The OLS sample regression function or OLS regression line for the simple regression model is given by

      ŷi = β̂0 + β̂1xi   (2.11)

    with residuals ûi = yi − ŷi.
  – The OLS sample regression model is denoted by

      yi = β̂0 + β̂1xi + ûi.   (2.12)
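A minimal numpy sketch of the estimators (2.9) and (2.10), checked on simulated data (the data-generating values below are assumptions chosen for illustration only):

    import numpy as np

    def ols_simple(x, y):
        # OLS estimates from equations (2.9) and (2.10)
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b0 = y.mean() - b1 * x.mean()
        return b0, b1

    rng = np.random.default_rng(seed=2)
    x = rng.normal(size=100)
    y = 2 + 3 * x + rng.normal(size=100)   # assumed population parameters

    b0_hat, b1_hat = ols_simple(x, y)
    print(b0_hat, b1_hat)                  # close to 2 and 3
    u_hat = y - (b0_hat + b1_hat * x)      # residuals of the sample regression model (2.12)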


Note:
  – The OLS estimator β̂1 only exists if the sample observations xi, i = 1, . . . , n, exhibit variation.
    Assumption SLR.3 (Sample Variation in the Explanatory Variable): In the sample, the outcomes of the independent variable xi, i = 1, 2, . . . , n, are not all the same.
  – The derivation of the OLS estimator only requires Assumption SLR.3, but not the population Assumptions SLR.1 and SLR.4.
  – In order to investigate the statistical properties of the OLS estimator one needs further assumptions, see Sections 2.7, 3.4, 4.2.
  – One can also derive the OLS estimator from the assumptions about the population, see below.

• The OLS Estimator as a Moment Estimator:
  – Note that from Assumption SLR.4, E[u|x] = 0, one obtains two conditions on moments: E[u] = 0 and Cov(x, u) = 0. Inserting Assumption SLR.1, u = y − β0 − β1x, defines moment conditions for the model parameters:

      E[y − β0 − β1x] = 0,   (2.13)
      E[x(y − β0 − β1x)] = 0.   (2.14)

  – How to estimate the moment conditions using sample functions?
  – Assumption SLR.2 (Random Sampling): The sample of size n is obtained by random sampling, that is, the pairs (xi, yi) and (xj, yj), i ≠ j, i, j = 1, . . . , n, are pairwise identically and independently distributed following the population model.

  – An important result in statistics, see Section 5.1, says: If Assumption SLR.2 holds, then the expected value can be well estimated by the sample average. (Assumption SLR.2 can be weakened, see e.g. Chapter 11 in Wooldridge (2009).)
  – If one replaces the expected values in (2.13) and (2.14) by their sample averages, one obtains

      n⁻¹ Σ_{i=1}^n (yi − β̂0 − β̂1xi) = 0,   (2.15)

      n⁻¹ Σ_{i=1}^n xi(yi − β̂0 − β̂1xi) = 0.   (2.16)

    Multiplying (2.15) and (2.16) by n, one obtains the normal equations (2.7) and (2.8).

The Trade Example Continued

Question: Do imports to Kazakhstan increase if the exporting country experiences an increase in GDP?

[Figure: scatter plot of TRADE_0_D_O against WEO_GDPCR_O (from Section 1.1)]

The OLS regression line is given by

    importŝi = 53441198 + 2.16 · 10⁻⁵ gdpi,   i = 1, . . . , 55,

and the sample regression model by

    importsi = 53441198 + 2.16 · 10⁻⁵ gdpi + ûi,   i = 1, . . . , 55.

[Figure: scatter plot TRADE_0_D_O vs. WEO_GDPCR_O with fitted OLS regression line]

====================================================================
Dependent Variable: TRADE_0_D_O
Method: Least Squares
Date: 02/09/09   Time: 17:12
Sample: 1 55
Included observations: 52
====================================================================
Variable            Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                   53441198      25852541     2.067155      0.0439
WDI_GDPUSDCR_O      2.16E-05      1.37E-05     1.572746      0.1221
====================================================================
R-squared           0.047139   Mean dependent var     67801339
Adjusted R-squared  0.028081   S.D. dependent var     1.77E+08
S.E. of regression  1.74E+08   Akaike info criterion  40.82943
Sum squared resid   1.52E+18   Schwarz criterion      40.90448
Log likelihood     -1059.565   F-statistic            2.473530
Durbin-Watson stat  2.297310   Prob(F-statistic)      0.122085
====================================================================

• For a data description see Appendix 10.4:

  importsi (from country i)        TRADE_0_D_O
  gdpi (in exporting country i)    WDI_GDPUSDCR_O

• Potential interpretation of the estimated slope parameter:

    β̂1 = ∆imports / ∆gdp

  indicates by how many US dollars imports to Kazakhstan increase if GDP in an exporting country increases by 1 US dollar.
• Does this interpretation really make sense? Aren’t there other important influencing factors missing? What about using economic theory as well?
• What about the quality of the estimates?

Example: Wage Regression

Question: How does education influence the hourly wage of an employee?

• Data (Source: Example 2.4 in Wooldridge (2009)): Sample of U.S. employees with n = 526 observations. Available data are:
  – wage: wage per hour in dollars and
  – educ: years of schooling of each employee.
• The OLS regression line is given by

    waĝei = −0.90 + 0.54 educi,   i = 1, . . . , 526.

  The sample regression model is

    wagei = −0.90 + 0.54 educi + ûi,   i = 1, . . . , 526.

====================================================================
Dependent Variable: WAGE
Method: Least Squares
Sample: 1 526
Included observations: 526
====================================================================
Variable            Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                  -0.904852      0.684968    -1.321013      0.1871
EDUC                0.541359      0.053248    10.16675       0.0000
====================================================================
R-squared           0.164758   Mean dependent var     5.896103
Adjusted R-squared  0.163164   S.D. dependent var     3.693086
S.E. of regression  3.378390   Akaike info criterion  5.276470
Sum squared resid   5980.682   Schwarz criterion      5.292688
Log likelihood     -1385.712   F-statistic            103.3627
Durbin-Watson stat  1.823686   Prob(F-statistic)      0.000000
====================================================================

• Interpretation of the estimated slope parameter:

    β̂1 = ∆wage / ∆educ

  indicates by how much the hourly wage changes if the years of schooling increase by one year:
  – An additional year in school or university increases the hourly wage by 54 cents.
  – But: Somebody without any education earns an hourly wage of -90 cents? Does this interpretation make sense?
• Is it always sensible to interpret the slope coefficient? Watch out for spurious causality, see the next section.
• Are these estimates reliable or good in some sense? What do we mean by “good” in econometrics and statistics? To get more insight, study
  – the statistical properties of the OLS estimator and the OLS estimates, see Section 2.7, and
  – check the choice of the functional form for the conditional expectation E[y|x], see Section 2.6.

2.4 Best Linear Prediction, Correlation, and Causality

Best Linear Prediction

• What does the OLS estimator estimate if Assumptions SLR.1 and SLR.4 (alias Assumption (A)) are not valid in the population from which the sample is drawn?
• Note that SSR(γ0, γ1)/n = Σ_{i=1}^n (yi − γ0 − γ1xi)² / n is a sample average and thus estimates the expected value

    E[(y − γ0 − γ1x)²]   (2.17)

  if Assumption SLR.2 (or some weaker form) holds. (For the existence of (2.17) it is required that 0 < Var(x) < ∞ and Var(y) < ∞.) Equation (2.17) is called the mean squared error of the linear predictor γ0 + γ1x.

• Mimicking the minimization of SSR(γ0, γ1), the theoretically best fit of a linear predictor γ0 + γ1x to y is obtained by minimizing its mean squared error (2.17) with respect to γ0 and γ1. This leads (try to derive it) to

    γ0* = E[y] − γ1* E[x],   (2.18)

    γ1* = Cov(x, y) / Var(x) = Corr(x, y) √(Var(y) / Var(x)),   (2.19)

  with

    Corr(x, y) = Cov(x, y) / √(Var(x) Var(y)),   −1 ≤ Corr(x, y) ≤ 1,

  denoting the correlation, which measures the linear dependence between two variables in a population, here x and y. The expression

    γ0* + γ1* x   (2.20)

  is called the best linear predictor of y, where “best” is defined by minimal mean squared error.
• Now observe that for the simple regression model

    y = γ0* + γ1* x + ε

  one has Cov(x, ε) = 0, a weaker form of SLR.4, since

    Cov(x, y) = (Cov(x, y) / Var(x)) Var(x) + Cov(x, ε).

  This indicates that one can show that under Assumptions SLR.2 and SLR.3 the OLS estimator estimates the parameters γ0* and γ1* of the best linear predictor. Observe also that the OLS estimator (2.10) for the slope coefficient consists of the sample averages of the moments defining γ1*:

    γ̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)².

• Rewriting γ̂1 as

    β̂1 = Ĉorr(x, y) √( Σ_{i=1}^n (yi − ȳ)² / Σ_{i=1}^n (xi − x̄)² ),

  using the empirical correlation coefficient

    Ĉorr(x, y) = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / √( Σ_{i=1}^n (xi − x̄)² Σ_{i=1}^n (yi − ȳ)² ),

  shows that the estimated slope coefficient is non-zero if there is sample correlation between x and y.
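A quick numeric check of this rewriting (a numpy sketch on simulated data with assumed parameter values; the two expressions should agree up to rounding):

    import numpy as np

    rng = np.random.default_rng(seed=3)
    x = rng.normal(size=200)
    y = 1 + 0.5 * x + rng.normal(size=200)

    # OLS slope from equation (2.10)
    b1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

    # Same slope via the empirical correlation coefficient
    corr = np.corrcoef(x, y)[0, 1]
    b1_corr = corr * np.sqrt(np.sum((y - y.mean()) ** 2) / np.sum((x - x.mean()) ** 2))

    print(b1_ols, b1_corr)   # identical up to rounding error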


Causality

• Recall Section 1.3.
• Be aware that the slope coefficient of the best linear predictor, γ1*, and its OLS estimate γ̂1 cannot automatically be interpreted in terms of a causal relationship, since estimating the best linear predictor
  – only captures correlation but not direction,
  – may not estimate the model of interest, e.g. if Assumptions SLR.1 and SLR.4 are violated and β1 ≠ γ1*,
  – may produce garbage if Ĉorr(x, y) estimates a spurious correlation (Corr(x, y) = 0 and Assumption SLR.2 (or its weaker versions) is violated),
  – or relevant control variables are missing in the simple regression model, such that the results cannot represent the results of a fictive randomized controlled experiment, see Chapter 3 onwards.
  Therefore, before any causal interpretation takes place one has to use specification and diagnostic techniques for regression models. Frequently one needs economic theory to identify causal relationships.

2.5 Algebraic Properties of the OLS Estimator

Basic properties:
• Σ_{i=1}^n ûi = 0,   because of normal equation (2.7),
• Σ_{i=1}^n xi ûi = 0,   because of normal equation (2.8).
• The point (x̄, ȳ) lies on the regression line.

Can you provide some intuition for these properties?

• Total sum of squares (SST):

    SST ≡ Σ_{i=1}^n (yi − ȳ)²

• Explained sum of squares (SSE):

    SSE ≡ Σ_{i=1}^n (ŷi − ȳ)²

• Sum of squared residuals (SSR):

    SSR ≡ Σ_{i=1}^n ûi²

• The decomposition SST = SSE + SSR holds if the regression model contains an intercept β0.

• Coefficient of Determination R² (or R-squared):

    R² = SSE / SST.

  – Interpretation: the share of the variation of yi that is explained by the variation of xi.
  – If the regression model contains an intercept term β0, then

      R² = SSE / SST = 1 − SSR / SST

    due to the decomposition SST = SSE + SSR, and therefore 0 ≤ R² ≤ 1.
  – Later we will see: Choosing regressors with R² is in general misleading.
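The decomposition and both ways of computing R² can be verified numerically; a small sketch on simulated data (parameter values are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(seed=4)
    x = rng.normal(size=100)
    y = 1 + 2 * x + rng.normal(size=100)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    u_hat = y - y_hat

    SST = np.sum((y - y.mean()) ** 2)
    SSE = np.sum((y_hat - y.mean()) ** 2)
    SSR = np.sum(u_hat ** 2)

    print(SST, SSE + SSR)              # equal: the model contains an intercept
    print(SSE / SST, 1 - SSR / SST)    # two equivalent computations of R-squared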


Reading:
• Sections 1.4 and 2.1-2.3 in Wooldridge (2009), and Appendix 10.1 if needed.
• Sections 2.4 and 2.5 in Wooldridge (2009).

2.6 Parameter Interpretation, Functional Form, and Data Transformation

• The term linear in “simple linear regression models” does not imply that the relationship between the explained and the explanatory variable is linear. Instead it refers to the fact that the parameters β0 and β1 enter the model linearly.
• Examples of regression models that are linear in their parameters:

    yi = β0 + β1 xi + ui,
    yi = β0 + β1 ln xi + ui,
    ln yi = β0 + β1 ln xi + ui,
    ln yi = β0 + β1 xi + ui,
    yi = β0 + β1 xi² + ui.

The Natural Logarithm in Econometrics

Frequently variables are transformed by taking the natural logarithm ln. Then the interpretation of the slope coefficient has to be adjusted accordingly.

Taylor approximation of the logarithmic function: ln(1 + z) ≈ z if z is close to 0. Using this approximation one can derive a popular approximation of growth rates or returns (∆xt)/xt−1:

    (∆xt)/xt−1 ≡ (xt − xt−1)/xt−1 ≈ ln(1 + (xt − xt−1)/xt−1) = ln(xt) − ln(xt−1),

which approximates well if the relative change ∆xt/xt−1 is close to 0. One obtains percentages by multiplying by 100:

    100 ∆ln(xt) ≈ %∆xt = 100 (xt − xt−1)/xt−1.

Thus, the percentage change for small ∆xt/xt−1 can be well approximated by 100[ln(xt) − ln(xt−1)], as the numeric check below illustrates.
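A one-minute numeric check of the quality of this approximation (Python sketch):

    import numpy as np

    # Compare the exact percentage change with 100 * (ln x_t - ln x_{t-1}).
    for pct in [0.5, 1.0, 5.0, 20.0]:
        rel = pct / 100                      # relative change (x_t - x_{t-1}) / x_{t-1}
        log_approx = 100 * np.log(1 + rel)   # = 100 * (ln x_t - ln x_{t-1})
        print(pct, round(log_approx, 3))     # close for small changes, off for 20%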

• Examples of models that are nonlinear in the parameters (β0, β1, γ, λ, π, δ):

    yi = β0 + β1 xi^γ + ui,
    yi = β0 + β1 (ln xi)^γ + ui,
    yi = β0 + β1 xi + (γ + δxi) / (1 + exp(λ(xi − π))) + ui.

• The last example allows for smooth switching between two linear regimes. The possibilities for formulating nonlinear regression models are huge. However, their estimation requires more advanced methods, such as nonlinear least squares, that are beyond the scope of this course.

• Note, however, that linear regression models allow for a wide range of nonlinear relationships between the dependent and independent variables, some of which were listed at the beginning of this section.

Economic Interpretation of OLS Parameters

• Consider the ratio of relative changes of two non-stochastic variables y and x:

    (%change of y) / (%change of x) = %∆y / %∆x = (∆y/y) / (∆x/x).

  If ∆y → 0 and ∆x → 0, then it can be shown that ∆y/∆x → dy/dx.
• If this result is applied to the ratio above, one obtains the elasticity

    η(x) = (dy/dx) · (x/y).

• Interpretation: If the relative change of x is 0.01, then the relative change of y is given by 0.01 η(x). In other words: If x changes by 1%, then y changes by η(x)%.
• If y and x are random variables, then the elasticity is defined with respect to the conditional expectation of y given x:

    η(x) = (dE[y|x]/dx) · (x/E[y|x]).

  This can be derived from

    ((E[y|x1 = x0 + ∆x] − E[y|x0]) / E[y|x0]) / (∆x/x0) = ((E[y|x1 = x0 + ∆x] − E[y|x0]) / ∆x) · (x0 / E[y|x0])

  and letting ∆x → 0.

Different Models and Interpretations of β1

For each model it is assumed that SLR.1 and SLR.4 hold.

• Models that are linear with respect to their variables (level-level models):

    y = β0 + β1x + u.

  It holds that

    dE[y|x]/dx = β1

  and thus

    ∆E[y|x] = β1 ∆x.

  In words: The slope coefficient denotes the absolute change in the conditional expectation of the dependent variable y for a one-unit change in the independent variable x.

1 dE[y|x] = β1 dx x

β1 β1 100∆ ln x ≈ %∆x. ∆E[y|x] ≈ β1∆ ln x = 100 100 In words: The conditional expectation of y changes by β1/100 units if x changes by 1%.

79

Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig

• Log-level models

ln y = β0 + β1x + u

or y = eln y = eβ0+β1x+u = eβ0+β1xeu. Thus E[y|x] = eβ0+β1xE[eu|x]. If E[eu|x] is constant, then dE[y|x] xE[eu|x] = β E[y|x]. = β1 e|β0+β1{z 1 } dx E[y|x]

One obtains the approximation ∆E[y|x] ≈ β1∆x, or %∆E[y|x] ≈ 100β1∆x E[y|x] In words: The conditional expectation of y changes by 100 β1% if x changes by one unit. 80

Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig

• Log-log models are frequently called loglinear models or constant-elasticity models and are very popular in empirical work ln y = β0 + β1 ln x + u. Similar to above one can show that dE[y|x] E[y|x] = β1 , and thus β1 = η(x) dx x if E[eu|x] is constant. In these models the slope coefficient is interpreted as the elasticity between the level variables y and x.

81

Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig

The Trade Example Continued

====================================================================
Dependent Variable: LOG(TRADE_0_D_O)
Method: Least Squares
Date: 02/08/09   Time: 12:29
Sample: 1 55
Included observations: 52
====================================================================
Variable              Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                    -3.460963      3.280047    -1.055157      0.2964
LOG(WDI_GDPUSDCR_O)   0.770127      0.129507     5.946600      0.0000
====================================================================
R-squared           0.414260   Mean dependent var     15.97292
Adjusted R-squared  0.402545   S.D. dependent var     2.613094
S.E. of regression  2.019797   Akaike info criterion  4.281574
Sum squared resid   203.9791   Schwarz criterion      4.356622
Log likelihood     -109.3209   F-statistic            2.65E-07
====================================================================

Note the very different interpretation of the estimated slope coefficient β̂1:
– Level-level model (Section 2.3): an increase in GDP in the exporting country by 1 billion US dollars corresponds to an increase of imports to Kazakhstan by 0.216 million US dollars.
– Log-log model: a 1% increase of GDP in the exporting country corresponds to an increase of imports by 0.77%.

But wait before you take these numbers seriously.

2.7 Statistical Properties of the OLS Estimator: Expected Value and Variance

• Some preparatory transformations (all sums are indexed by i = 1, . . . , n):

    β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ_{j=1}^n (xj − x̄)²
       = Σ (xi − x̄) yi / Σ_{j=1}^n (xj − x̄)²
       = Σ wi yi,   where wi = (xi − x̄) / Σ_{j=1}^n (xj − x̄)²,

  and where it can be shown that (try it):

    Σ wi = 0,   Σ wi xi = 1,   Σ wi² = 1 / Σ_{j=1}^n (xj − x̄)².

• Unbiasedness of the OLS estimator: If Assumptions SLR.1 to SLR.4 hold, then

    E[β̂0] = β0,   E[β̂1] = β1.

  Interpretation: If one keeps repeatedly drawing new samples and estimating the regression parameters, then the average of all obtained OLS parameter estimates roughly corresponds to the population parameters. The property of unbiasedness is a property of the sampling distribution of the OLS estimators for β0 and β1. It does not imply that the population parameters are perfectly estimated for a specific sample.

  Proof for β̂1 (clarify where each SLR assumption is needed):
  1. E[β̂1 | x1, . . . , xn] can be manipulated as follows:

       E[β̂1 | x1, . . . , xn] = E[Σ wi yi | x1, . . . , xn]
         = E[Σ wi (β0 + β1xi + ui) | x1, . . . , xn]
         = Σ E[wi (β0 + β1xi + ui) | x1, . . . , xn]
         = β0 Σ wi + β1 Σ wi xi + Σ E[wi ui | x1, . . . , xn]
         = β1 + Σ wi E[ui | x1, . . . , xn]
         = β1 + Σ wi E[ui | xi] = β1.

  2. From E[β̂1] = E[E[β̂1 | x1, . . . , xn]] one obtains unbiasedness: E[β̂1] = β1.

• Variance of the OLS estimator

  In order to determine the variance of the OLS estimators β̂0 and β̂1 we need another assumption,
  Assumption SLR.5 (Homoskedasticity): Var(u|x) = σ².

• Variances of the parameter estimators conditional on the sample observations: If Assumptions SLR.1 to SLR.5 hold, then

    Var(β̂1 | x1, . . . , xn) = σ² · 1 / Σ_{i=1}^n (xi − x̄)²,

    Var(β̂0 | x1, . . . , xn) = σ² · (n⁻¹ Σ_{i=1}^n xi²) / Σ_{i=1}^n (xi − x̄)².

  Proof (for the conditional variance of β̂1):

    Var(β̂1 | x1, . . . , xn) = Var(Σ wi ui | x1, . . . , xn)
      = Σ Var(wi ui | x1, . . . , xn)
      = Σ wi² Var(ui | x1, . . . , xn)
      = Σ wi² Var(ui | xi)
      = Σ wi² σ²
      = σ² Σ wi²
      = σ² · 1 / Σ (xi − x̄)².
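Both unbiasedness and this variance formula can be illustrated by a Monte Carlo experiment in which the regressors are held fixed and new errors are drawn repeatedly (a sketch with assumed parameter values):

    import numpy as np

    rng = np.random.default_rng(seed=6)
    n, sigma, beta0, beta1 = 50, 2.0, 1.0, 0.5    # assumed population values
    x = rng.uniform(0, 10, size=n)                # regressors held fixed across samples
    var_theory = sigma ** 2 / np.sum((x - x.mean()) ** 2)

    b1_draws = []
    for _ in range(20_000):
        u = rng.normal(0, sigma, size=n)          # SLR.4 and SLR.5 hold by construction
        y = beta0 + beta1 * x + u
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b1_draws.append(b1)

    b1_draws = np.array(b1_draws)
    print(b1_draws.mean(), beta1)         # unbiasedness: average close to 0.5
    print(b1_draws.var(), var_theory)     # close to sigma^2 / sum (xi - xbar)^2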


• Covariance between the intercept and the slope estimator:

    Cov(β̂0, β̂1 | x1, . . . , xn) = −σ² x̄ / Σ_{i=1}^n (xi − x̄)².

  Proof: Cov(β̂0, β̂1 | x1, . . . , xn) can be manipulated as follows:

    = Cov(ȳ − β̂1x̄, β̂1 | x1, . . . , xn)
    = Cov(ū, β̂1 | x1, . . . , xn) − Cov(β̂1x̄, β̂1 | x1, . . . , xn)   (the first term is 0, see below)
    = −x̄ Cov(β̂1, β̂1 | x1, . . . , xn)
    = −x̄ Var(β̂1 | x1, . . . , xn)
    = −σ² x̄ / Σ (xi − x̄)².

  Finally,

    Cov(ū, β̂1 | x1, . . . , xn) = Cov(ū, Σ wi ui | x1, . . . , xn)
      = Cov(ū, w1u1 | x1, . . . , xn) + · · · + Cov(ū, wnun | x1, . . . , xn)
      = w1 Cov(ū, u1 | x1, . . . , xn) + · · · + wn Cov(ū, un | x1, . . . , xn)
      = Σ wi Cov(ū, ui | x1, . . . , xn)
      = Cov(ū, u1 | x1, . . . , xn) Σ wi = 0.

2.8 Estimation of the Error Variance

• One estimator for the error variance σ² is given by

    σ̃² = (1/n) Σ_{i=1}^n ûi²,

  where the ûi’s denote the residuals of the OLS estimator.
  Disadvantage: The estimator σ̃² does not take into account that 2 restrictions were imposed in obtaining the OLS residuals, namely Σ ûi = 0 and Σ ûi xi = 0. This leads to biased estimates: E[σ̃² | x1, . . . , xn] ≠ σ².
• Unbiased estimator for the error variance:

    σ̂² = (1/(n−2)) Σ_{i=1}^n ûi².
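The bias of σ̃² and the unbiasedness of σ̂² can be illustrated by simulation (a sketch; n is kept small so that the factor (n − 2)/n is clearly visible):

    import numpy as np

    rng = np.random.default_rng(seed=7)
    n, sigma = 10, 1.0                    # E[sigma_tilde^2] = sigma^2 * (n - 2) / n = 0.8
    x = rng.uniform(0, 1, size=n)         # regressors held fixed across replications

    tilde, hat = [], []
    for _ in range(50_000):
        u = rng.normal(0, sigma, size=n)
        y = 1 + 2 * x + u
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b0 = y.mean() - b1 * x.mean()
        u_hat = y - b0 - b1 * x
        tilde.append(np.sum(u_hat ** 2) / n)        # biased estimator
        hat.append(np.sum(u_hat ** 2) / (n - 2))    # unbiased estimator

    print(np.mean(tilde), np.mean(hat))   # roughly 0.8 versus 1.0 = sigma^2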


• If Assumptions SLR.1 to SLR.5 hold, then

    E[σ̂² | x1, . . . , xn] = σ².

• Standard error of the regression, standard error of the estimate, or root mean squared error:

    σ̂ = √σ̂².

• In the formulas for the variances of and covariance between the parameter estimators β̂0 and β̂1, the variance estimator σ̂² can be used for estimating the unknown error variance σ². Example:

    V̂ar(β̂1 | x1, . . . , xn) = σ̂² / Σ (xi − x̄)².

  Denote the standard deviation as

    sd(β̂1 | x1, . . . , xn) = √Var(β̂1 | x1, . . . , xn);

  then

    ŝd(β̂1 | x1, . . . , xn) = σ̂ / (Σ (xi − x̄)²)^(1/2)

  is frequently called the standard error of β̂1 and is reported in the output of software packages.

Reading: Sections 2.4 and 2.5 in Wooldridge (2009) and Appendix 10.1 if needed.

3 Multiple Regression Analysis: Estimation

3.1 Motivation for Multiple Regression: The Trade Example Continued

• In Section 2.6 two simple linear regression models for explaining imports to Kazakhstan were estimated (and interpreted): a level-level model and a log-log model.
• It is hardly credible that imports to Kazakhstan only depend on the GDP of the exporting country. What about, for example, distance,

borders, and other factors causing trading costs?
• Such quantities have been found to be relevant in the empirical literature on gravity equations for explaining intra- and international trade. In general, bi-directional trade flows are considered. Here we consider only one-directional trade flows, namely exports to Kazakhstan in 2004. Such a simplified gravity equation reads as

    ln(importsi) = β0 + β1 ln(gdpi) + β2 ln(distancei) + ui.   (3.1)

  Standard gravity equations are based on bilateral imports and exports over a number of years and thus require panel data techniques that are beyond the scope of this course. However, in Section 4.4 we will consider cross-section data on both imports and exports for 2004. For a brief introduction to gravity equations see e.g. Fratianni (2007). A recent theoretical underpinning of gravity equations was provided by Anderson & Wincoop (2003).

• If relevant variables are neglected, Assumptions SLR.1 and/or SLR.4 could be violated, and in this case the interpretation of causal effects can be highly misleading, see Section 3.4. To avoid this trap, the multiple regression model can be useful.
• To get an idea about the change in the elasticity parameter due to a second independent variable, like e.g. distance, inspect the following OLS estimate of the simple import equation (3.1):

====================================================================
Dependent Variable: LOG(TRADE_0_D_O)
Method: Least Squares
Date: 02/13/09   Time: 14:31
Sample: 1 55
Included observations: 52
====================================================================
Variable              Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                     4.800950      3.497341     1.372743      0.1761
LOG(WDI_GDPUSDCR_O)   1.088546      0.137001     7.945508      0.0000
LOG(CEPII_DIST)      -1.970804      0.480555    -4.101103      0.0002
====================================================================
R-squared           0.563937   Mean dependent var     15.97292
Adjusted R-squared  0.546138   S.D. dependent var     2.613094
S.E. of regression  1.760423   Akaike info criterion  4.024946
Sum squared resid   151.8554   Schwarz criterion      4.137518
Log likelihood     -101.6486   F-statistic            31.68448
Durbin-Watson stat  2.117895   Prob(F-statistic)      0.000000
====================================================================

Instead of an estimated elasticity of 0.77, see Section 2.6, one obtains a value of 1.09. Furthermore, the R2 increases from 0.41 to 0.56, indicating a much better statistical fit. Finally, a 1% increase in distance reduces imports by almost 2%. Is this model then better? Or is it (also) misspecified? To answer these questions we have to study the linear multiple regression model first.


3.2 The Multiple Regression Model of the Population
• Assumptions: The Assumptions SLR.1 and SLR.4 of the simple linear regression model have to be adapted accordingly for the multiple linear regression model (MLR) for the population (see Section 3.3 in Wooldridge (2009)):
– MLR.1 (Linearity in the Parameters)
The multiple regression model allows for more than one, say k, explanatory variables,

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u, \quad (3.2)$$

and the model is linear in its parameters. Example: the import equation (3.1).


– MLR.4 (Zero Conditional Mean)
E[u|x₁,...,x_k] = 0 for all x.
Observe that all explanatory variables of the multiple regression (3.2) must be included in the conditioning set. Sometimes the conditioning set is called the information set.
• Remarks:
– To see the need for MLR.4, take the conditional expectation of y in (3.2) given all k regressors:

$$E[y|x_1, x_2,\dots,x_k] = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + E[u|x_1, x_2,\dots,x_k].$$

If E[u|x₁, x₂,...,x_k] ≠ 0 for some x, then the systematic part β₀ + β₁x₁ + ⋯ + β_k x_k does not model the conditional expectation E[y|x₁,...,x_k] correctly.


– If MLR.1 and MLR.4 are fulfilled, then equation (3.2),
y = β₀ + β₁x₁ + β₂x₂ + ⋯ + β_k x_k + u,
is also called the linear multiple regression model for the population. Frequently it is also called the true model (even if any model may be far from the truth). Alternatively, one may think of equation (3.2) as the data generating mechanism (although, strictly speaking, a data generating mechanism also requires specification of the probability distributions of all regressors and the error).
• To guarantee nice properties of the OLS estimator and the sample regression model, we adapt SLR.2 and SLR.3 accordingly:
– MLR.2 (Random Sampling)
The sample of size n is obtained by random sampling, that is, the observations {(x_{i1},...,x_{ik}, y_i) : i = 1,...,n} are pairwise


independently and identically distributed following the population model in MLR.1.
– MLR.3 (No Perfect Collinearity)
(more on MLR.3 in Section 3.3)
• Interpretation:
– If Assumptions MLR.1 and MLR.4 are correct and the population regression model allows for a causal interpretation, then the multiple regression model is a great tool for ceteris paribus analysis: it allows one to hold the values of all explanatory variables fixed except one and to check how the conditional expectation of the explained variable changes. This resembles changing one control variable in a randomized control experiment. Let x_j be the control variable of interest.


– Taking conditional expectations of the multiple regression (3.2) and applying Assumption MLR.4 delivers

$$E[y|x_1,\dots,x_j,\dots,x_k] = \beta_0 + \beta_1 x_1 + \cdots + \beta_j x_j + \cdots + \beta_k x_k.$$

– Consider a change in x_j from x_j to x_j + Δx_j:

$$E[y|x_1,\dots,x_j+\Delta x_j,\dots,x_k] = \beta_0 + \beta_1 x_1 + \cdots + \beta_j (x_j+\Delta x_j) + \cdots + \beta_k x_k.$$

∗ Ceteris-paribus effect: In (3.2) the absolute change due to a change of x_j by Δx_j is given by

$$\Delta E[y|x_1,\dots,x_j,\dots,x_k] \equiv E[y|x_1,\dots,x_{j-1}, x_j+\Delta x_j, x_{j+1},\dots,x_k] - E[y|x_1,\dots,x_{j-1}, x_j, x_{j+1},\dots,x_k] = \beta_j \Delta x_j,$$


where β_j corresponds to the first partial derivative

$$\frac{\partial E[y|x_1,\dots,x_{j-1},x_j,x_{j+1},\dots,x_k]}{\partial x_j} = \beta_j.$$

The parameter β_j gives the partial effect of changing x_j on the conditional expectation of y while all other regressors are held constant.
∗ Total effect: Of course, one can also consider simultaneous changes in the regressors, for example Δx₁ and Δx_k. For this case one obtains

$$\Delta E[y|x_1,\dots,x_k] = \beta_1 \Delta x_1 + \beta_k \Delta x_k.$$

– Note that the specific interpretation of β_j depends on how the variables enter, e.g. as log variables. In a ceteris paribus analysis the results of Section 2.6 remain valid.


Trade Example Continued
• Considering the log-log model (3.1),

$$\ln(imports_i) = \beta_0 + \beta_1 \ln(gdp_i) + \beta_2 \ln(distance_i) + u_i,$$

a 1% increase in distance leads to a β₂% change in imports, keeping GDP fixed. In other words, one can separate the effect of distance on imports from the effect of economic size. From the output table in Section 3.1 one obtains that a 1% increase in distance decreases imports by about 2%.
• Keep in mind that determining distances between countries is a complicated matter, and results may change with the choice of the method for computing distances. Our data are from CEPII, see also Appendix 10.4.
• There may still be missing variables, see also Section 4.4.


Wage Example Continued
• In Section 2.3 it was assumed that the hourly wage is determined by
wage = β₀ + β₁ educ + u.
Instead of a level-level model one may also consider a log-level model:

$$\ln(wage) = \beta_0 + \beta_1\, educ + u. \quad (3.3)$$

However, since we expect that experience also matters for hourly wages, we want to include experience as well. We obtain

$$\ln(wage) = \beta_0 + \beta_1\, educ + \beta_2\, exper + v. \quad (3.4)$$

What about the expected log wage given the variables educ and exper?

$$E[\ln(wage)|educ, exper] = \beta_0 + \beta_1\, educ + \beta_2\, exper + E[v|educ, exper]$$
$$E[\ln(wage)|educ, exper] = \beta_0 + \beta_1\, educ + \beta_2\, exper,$$


where the second equation only holds if MLR.4 holds, that is, if E[v|educ, exper] = 0.
• Note that if, instead of (3.4), one investigates the simple linear log-level model (3.3), although the population model contains exper, one obtains

$$E[\ln(wage)|educ] = \beta_0 + \beta_1\, educ + \beta_2\, E[exper|educ] + E[v|educ],$$

indicating misspecification of the simple model, since it ignores the influence of exper via β₂. Thus, the smaller model suffers from misspecification if E[ln(wage)|educ] ≠ E[ln(wage)|educ, exper] for some values of educ or exper.


• Empirical results: See Example 2.10 in Wooldridge (2009), file: wage1.wf1, output from EViews 6:
– Simple log-level model

Dependent Variable: LOG(WAGE)
Method: Least Squares, Date: 02/17/09  Time: 15:59
Sample: 1 526, Included observations: 526
====================================================================
                 Coefficient  Std. Error  t-Statistic  Prob.
====================================================================
C                   0.583773    0.097336     5.997510  0.0000
EDUC                0.082744    0.007567    10.93534   0.0000
====================================================================
R-squared           0.185806   Mean dependent var      1.623268
Adjusted R-squared  0.184253   S.D. dependent var      0.531538
S.E. of regression  0.480079   Akaike info criterion   1.374061
Sum squared resid   120.7691   Schwarz criterion       1.390279
Log likelihood     -359.3781   Hannan-Quinn criter.    1.380411
F-statistic         119.5816   Durbin-Watson stat      1.801328
Prob(F-statistic)   0.000000
====================================================================


$$\ln(wage_i) = 0.5838 + 0.0827\, educ_i + \hat u_i, \qquad i = 1,\dots,526, \qquad R^2 = 0.1858.$$

If SLR.1 to SLR.4 are valid, then each additional year of schooling is estimated to increase hourly wages by 8.3% on average. The sample regression model explains about 18.6% of the variation of the dependent variable ln(wage).


– Multivariate log-level model:

Dependent Variable: LOG(WAGE), Method: Least Squares
Date: 02/17/09  Time: 15:48
Sample: 1 526, Included observations: 526
====================================================================
                 Coefficient  Std. Error  t-Statistic  Prob.
====================================================================
C                   0.216854    0.108595     1.996909  0.0464
EDUC                0.097936    0.007622    12.84839   0.0000
EXPER               0.010347    0.001555     6.653393  0.0000
====================================================================
R-squared           0.249343   Mean dependent var      1.623268
Adjusted R-squared  0.246473   S.D. dependent var      0.531538
S.E. of regression  0.461407   Akaike info criterion   1.296614
Sum squared resid   111.3447   Schwarz criterion       1.320940
Log likelihood     -338.0094   Hannan-Quinn criter.    1.306139
F-statistic         86.86167   Durbin-Watson stat      1.789452
Prob(F-statistic)   0.000000
====================================================================

$$\ln(wage_i) = 0.2169 + 0.0979\, educ_i + 0.0103\, exper_i + \hat u_i, \qquad i = 1,\dots,526, \qquad R^2 = 0.2493.$$


∗ Ceteris-paribus interpretation: If MLR.1 to MLR.4 are correct, then the expected increase in hourly wages due to an additional year of schooling is about 9.8% and thus slightly larger than obtained from the simple regression model. An additional year of experience corresponds to an increase in expected hourly wages of about 1%.
∗ Model fit: The model explains 24.9% of the variation of the dependent variable. Does this imply that the multivariate model is better than the simple regression model with an R² of 18.6%? Be careful with your answer and wait until we investigate model selection criteria.
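For readers working outside EViews, a sketch of both regressions in Python, assuming the wage1 data are available as a CSV file with columns wage, educ, and exper (the file name is hypothetical):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("wage1.csv")

simple = smf.ols("np.log(wage) ~ educ", data=df).fit()
multiple = smf.ols("np.log(wage) ~ educ + exper", data=df).fit()

print(simple.params, simple.rsquared)       # approx. 0.0827 on educ, R^2 = 0.186
print(multiple.params, multiple.rsquared)   # approx. 0.0979 and 0.0103, R^2 = 0.249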


3.3 The OLS Estimator: Derivation and Algebraic Properties
• For an arbitrary estimator the sample regression model for a sample (y_i, x_{i1},...,x_{ik}), i = 1,...,n, is given by

$$y_i = \tilde\beta_0 + \tilde\beta_1 x_{i1} + \tilde\beta_2 x_{i2} + \cdots + \tilde\beta_k x_{ik} + \tilde u_i, \qquad i = 1,\dots,n.$$

• Recall the idea of the OLS estimator: choose β̃₀,...,β̃_k such that the sum of squared residuals (SSR),

$$SSR(\tilde\beta_0,\dots,\tilde\beta_k) = \sum_{i=1}^n \tilde u_i^2 = \sum_{i=1}^n \left(y_i - \tilde\beta_0 - \tilde\beta_1 x_{i1} - \cdots - \tilde\beta_k x_{ik}\right)^2,$$

is minimized. Taking first partial derivatives of SSR(β̃₀,...,β̃_k) with respect to all k + 1 parameters and setting them to zero yields


the first order conditions of a minimum:

$$\sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}\right) = 0 \quad (3.5a)$$
$$\sum_{i=1}^n x_{i1}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}\right) = 0 \quad (3.5b)$$
$$\vdots$$
$$\sum_{i=1}^n x_{ik}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}\right) = 0 \quad (3.5c)$$

This system of normal equations contains k + 1 unknown parameters and k + 1 equations. Under some further conditions (see below) it has a unique solution. Solving this set of equations becomes cumbersome if k is large. This can be circumvented if the normal equations are written in matrix notation.


• The Multiple Regression Model in Matrix Form
Using matrix notation the multiple regression model can be rewritten as (Wooldridge 2009, Appendix E)

$$y = X\beta + u,$$

where

$$\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{y} = \underbrace{\begin{pmatrix} x_{10} & x_{11} & x_{12} & \cdots & x_{1k} \\ x_{20} & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{n0} & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}}_{X} \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}}_{\beta} + \underbrace{\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}}_{u} \quad (3.6)$$

The matrix X is called the regressor matrix and has n rows and k + 1 columns. The column vectors y and u have n rows each; the column vector β has k + 1 rows.


• Derivation: The OLS Estimator in Matrix Notation
– One possibility to derive the OLS estimator in matrix notation is to rewrite the normal equations (3.5) in matrix notation. We do this explicitly for the j-th equation,

$$\sum_{i=1}^n x_{ij}\left(y_i - \hat\beta_0 x_{i0} - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}\right) = 0,$$

which is manipulated to

$$\sum_{i=1}^n \left(x_{ij} y_i - \hat\beta_0 x_{ij}x_{i0} - \hat\beta_1 x_{ij}x_{i1} - \cdots - \hat\beta_k x_{ij}x_{ik}\right) = 0$$

and further to

$$\sum_{i=1}^n \left(\hat\beta_0 x_{ij}x_{i0} + \hat\beta_1 x_{ij}x_{i1} + \cdots + \hat\beta_k x_{ij}x_{ik}\right) = \sum_{i=1}^n x_{ij} y_i.$$


By factoring out we have

$$\left(\sum_{i=1}^n x_{ij}x_{i0}\right)\hat\beta_0 + \left(\sum_{i=1}^n x_{ij}x_{i1}\right)\hat\beta_1 + \cdots + \left(\sum_{i=1}^n x_{ij}x_{ik}\right)\hat\beta_k = \sum_{i=1}^n x_{ij} y_i.$$

Similarly, rearranging all other equations and collecting all k + 1 equations in a vector delivers

$$\begin{pmatrix} \left(\sum_{i=1}^n x_{i0}x_{i0}\right)\hat\beta_0 + \left(\sum_{i=1}^n x_{i0}x_{i1}\right)\hat\beta_1 + \cdots + \left(\sum_{i=1}^n x_{i0}x_{ik}\right)\hat\beta_k \\ \vdots \\ \left(\sum_{i=1}^n x_{ik}x_{i0}\right)\hat\beta_0 + \left(\sum_{i=1}^n x_{ik}x_{i1}\right)\hat\beta_1 + \cdots + \left(\sum_{i=1}^n x_{ik}x_{ik}\right)\hat\beta_k \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n x_{i0}y_i \\ \vdots \\ \sum_{i=1}^n x_{ik}y_i \end{pmatrix}.$$


Applying the rules for matrix multiplication yields

$$\underbrace{\begin{pmatrix} \sum_{i=1}^n x_{i0}x_{i0} & \sum_{i=1}^n x_{i0}x_{i1} & \cdots & \sum_{i=1}^n x_{i0}x_{ik} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^n x_{ik}x_{i0} & \sum_{i=1}^n x_{ik}x_{i1} & \cdots & \sum_{i=1}^n x_{ik}x_{ik} \end{pmatrix}}_{X'X} \underbrace{\begin{pmatrix} \hat\beta_0 \\ \vdots \\ \hat\beta_k \end{pmatrix}}_{\hat\beta} = \underbrace{\begin{pmatrix} \sum_{i=1}^n x_{i0}y_i \\ \vdots \\ \sum_{i=1}^n x_{ik}y_i \end{pmatrix}}_{X'y}$$

as well as the normal equations in matrix notation:

$$(X'X)\hat\beta = X'y. \quad (3.7)$$


– Note: The matrix X′X has k + 1 columns and rows, so it is a square matrix. The inverse (X′X)⁻¹ exists if all columns (and rows) are linearly independent. This can be shown to be the case if all columns of X are linearly independent. This is exactly what the next assumption states.
Assumption MLR.3 (No Perfect Collinearity): In the sample none of the regressors can be expressed as an exact linear combination of one or more of the other regressors.
Is this a restrictive assumption?


– Finally, multiply the normal equations (3.7) by (X′X)⁻¹ from the left to obtain the OLS estimator in matrix notation:

$$\hat\beta = (X'X)^{-1}X'y. \quad (3.8)$$

This is the compact notation for

$$\begin{pmatrix} \hat\beta_0 \\ \vdots \\ \hat\beta_k \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n x_{i0}x_{i0} & \cdots & \sum_{i=1}^n x_{i0}x_{ik} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^n x_{ik}x_{i0} & \cdots & \sum_{i=1}^n x_{ik}x_{ik} \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^n x_{i0}y_i \\ \vdots \\ \sum_{i=1}^n x_{ik}y_i \end{pmatrix}.$$
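A minimal numerical sketch of (3.8) on simulated data; numerically it is preferable to solve the normal equations (3.7) rather than to invert X′X explicitly:

import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # first column: constant
beta = np.array([1.0, 0.5, -2.0, 0.3])
y = X @ beta + rng.normal(size=n)

# Solve (X'X) beta_hat = X'y for beta_hat.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)    # close to the true parameter vector beta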


Algebraic Properties of the OLS Estimator
• X′û = 0, that is, Σᵢ₌₁ⁿ x_{ij} û_i = 0 for j = 0,...,k.
Proof: Plugging y = Xβ̂ + û into the normal equations yields (X′X)β̂ = (X′X)β̂ + X′û and hence X′û = 0.
• If x_{i0} = 1, i = 1,...,n, it follows that Σᵢ₌₁ⁿ û_i = 0.
• For the special case k = 1, the algebraic properties of the simple linear regression model follow immediately.
• The point (ȳ, x̄₁,...,x̄_k) is always located on the regression hyperplane if there is a constant in the model.
• The definitions for SST, SSE, and SSR are as in the simple regression.
• If a constant term is included in the model, we can decompose SST = SSE + SSR.


• The Coefficient of Determination: R² is defined as in the SLR case as

$$R^2 = \frac{SSE}{SST}$$

or, if there is an intercept in the model,

$$R^2 = 1 - \frac{SSR}{SST}.$$

It can be shown that R² is the squared empirical coefficient of correlation between the observed y_i's and the fitted ŷ_i's, namely

$$R^2 = \frac{\left(\sum_{i=1}^n (y_i - \bar y)\left(\hat y_i - \bar{\hat y}\right)\right)^2}{\sum_{i=1}^n (y_i - \bar y)^2 \sum_{i=1}^n \left(\hat y_i - \bar{\hat y}\right)^2} = \left[\widehat{Corr}(y, \hat y)\right]^2.$$

Note that [Corr̂(y, ŷ)]² can be used even when R² is not useful.


• Adjusted R²: If we rewrite R² by expanding the SSR/SST term by n,

$$R^2 = 1 - \frac{SSR/n}{SST/n},$$

we can interpret SSR/n and SST/n as estimators for σ² and σ_y², respectively. They are biased estimators, however. Using unbiased estimators instead, one obtains the "adjusted" R²:

$$\bar R^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - \frac{n-1}{n-k-1}\cdot\frac{SSR}{SST}$$
$$= 1 - \frac{n-1}{n-k-1}\left(1 - R^2\right) = \frac{-k}{n-k-1} + \frac{n-1}{n-k-1}\cdot R^2.$$

Properties of R̄² (see Section 6.3 in Wooldridge (2009)):
– R̄² can increase or fall when including an additional regressor.
– R̄² always increases if an additional regressor reduces the unbiased estimate of the error variance.
Attention: Analogously to R², one may not compare the R̄² of regression models with different y.
• The quantities R² and R̄² are both called goodness-of-fit measures.
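A small helper illustrating both goodness-of-fit measures (a sketch, not from the course materials; the function name is ours):

import numpy as np

def r2_stats(y, y_hat, k):
    """R^2 and adjusted R^2 for a model with an intercept and k slope parameters."""
    n = len(y)
    ssr = np.sum((y - y_hat) ** 2)            # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)         # total sum of squares
    r2 = 1.0 - ssr / sst
    r2_adj = 1.0 - (n - 1) / (n - k - 1) * ssr / sst
    return r2, r2_adj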


3.4 The OLS Estimator: Statistical Properties
Assumptions (Recap):
• MLR.1 (Linearity in the Parameters)
• MLR.2 (Random Sampling)
• MLR.3 (No Perfect Collinearity)
• MLR.4 (Zero Conditional Mean)


3.4.1 The Unbiasedness of Parameter Estimates
• Let MLR.1 through MLR.4 hold. Then we have E[β̂] = β.
Proof:

$$\hat\beta = (X'X)^{-1}X'y \qquad \text{(the inverse exists by MLR.3)}$$
$$= (X'X)^{-1}X'(X\beta + u) \qquad \text{(by MLR.1)}$$
$$= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u = \beta + (X'X)^{-1}X'u.$$

Taking the conditional expectation,

$$E[\hat\beta|X] = \beta + E[(X'X)^{-1}X'u|X] = \beta + (X'X)^{-1}X'E[u|X] = \beta \qquad \text{(by MLR.2 and MLR.4)}.$$


The last equality holds because

$$E[u|X] = \begin{pmatrix} E[u_1|X] \\ E[u_2|X] \\ \vdots \\ E[u_n|X] \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$

where the latter follows from

$$E[u_i|X] = E[u_i|x_{11},\dots,x_{1k},\dots,x_{nk}] = E[u_i|x_{i1},\dots,x_{ik}] \qquad \text{(by MLR.2)}$$
$$= 0 \qquad \text{(by MLR.4)}$$

for i = 1,...,n.


• The Danger of Omitted Variable Bias
We partition the k + 1 regressors into an (n × k) matrix X_A and an (n × 1) vector x_a. This yields

$$y = X_A\beta_A + x_a\beta_a + u. \quad (3.9)$$

In the following it is assumed that the population regression model has the same structure as (3.9).
Trade Example Continued (from Section 3.2): Assume that in the population imports depend on gdp, distance, and whether the trading countries share to some extent the same language:

$$\ln(imports_i) = \beta_0 + \beta_1 \ln(gdp_i) + \beta_2 \ln(distance_i) + \beta_3 \ln(language_i) + u_i, \quad (3.10)$$

so that X_A includes the constant, gdp_i, and distance_i, and x_a


includes language_i, for each i = 1,...,n.
Imagine now that you are only interested in the values of β_A (the parameters for the constant, gdp, and distance), and that the regressor vector x_a has to be omitted because you have, for instance, no data. What effect does the omission of the variable x_a have on the estimation of β̂_A if, for example, the model

$$y = X_A\beta_A + w \quad (3.11)$$

is considered? Model (3.11) is frequently called the smaller model. Stated differently, which estimation properties does the OLS estimator for β_A have on the basis of the smaller model (3.11)?


Derivation:
– Denote the OLS estimator for β_A from the small model by β̃_A. Following the proof of unbiasedness for the small model but replacing y with the true population model (3.9) delivers

$$\tilde\beta_A = (X_A'X_A)^{-1}X_A'y = (X_A'X_A)^{-1}X_A'(X_A\beta_A + x_a\beta_a + u) = \beta_A + (X_A'X_A)^{-1}X_A'x_a\beta_a + (X_A'X_A)^{-1}X_A'u.$$

– By the law of iterated expectations, E[u|X_A] = E[E[u|X_A, x_a]|X_A], and therefore E[u|X_A] = E[0|X_A] = 0 by validity of MLR.4 for the population model (3.9).


– Compute the conditional expectation of β̃_A. Treating the (unobserved) x_a in the same way as X_A, one obtains

$$E\left[\tilde\beta_A|X_A, x_a\right] = \beta_A + (X_A'X_A)^{-1}X_A'x_a\beta_a.$$

Therefore the estimator β̃_A is unbiased only if

$$(X_A'X_A)^{-1}X_A'x_a\beta_a = 0. \quad (3.12)$$

Take a closer look at the term on the left-hand side of (3.12). One observes that

$$\tilde\delta = (X_A'X_A)^{-1}X_A'x_a$$

is the OLS estimator of δ in a regression of x_a on X_A:

$$x_a = X_A\delta + \varepsilon.$$


Condition (3.12) holds (and there is no bias) if
∗ δ̃ = 0, so that x_a is uncorrelated with X_A in the sample, or
∗ β_a = 0 holds and the smaller model is the population model.
If neither of these conditions holds, then β̃_A is biased:

$$E[\tilde\beta_A|X_A, x_a] = \beta_A + \tilde\delta\beta_a.$$

This means that the OLS estimator β̃_A is in general biased for every parameter in the smaller model. Since these biases are caused by using a regression model that misses a variable that is relevant in the population model, this kind of bias is called omitted variable bias, and the smaller model is said to be misspecified. (See Appendix 3A.4 in Wooldridge (2009).)
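The omitted variable bias can be made visible in a small Monte Carlo sketch (simulated data; all parameter values are illustrative). The smaller model regresses y on x_A only, although x_a is relevant and correlated with x_A:

import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 2000
beta_A, beta_a, delta = 1.0, 0.8, 0.6        # true effects; delta links x_a to x_A

slopes_small = []
for _ in range(reps):
    x_A = rng.normal(size=n)
    x_a = delta * x_A + rng.normal(size=n)   # x_a = delta * x_A + eps
    y = beta_A * x_A + beta_a * x_a + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x_A])   # smaller model omits x_a
    slopes_small.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

print(np.mean(slopes_small))   # approx. beta_A + delta * beta_a = 1.48, not 1.0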


– One may also ask about the unconditional bias. Applying the LIE delivers

$$E\left[\tilde\beta_A|X_A\right] = \beta_A + E\left[\tilde\delta|X_A\right]\beta_a, \qquad E\left[\tilde\beta_A\right] = \beta_A + E\left[\tilde\delta\right]\beta_a.$$

Interpretation: The second expression delivers the expected value of the OLS estimator if one keeps drawing new samples for y and X_A. Thus, in repeated sampling there is only bias if there is correlation in the population between the variables in X_A and x_a, since otherwise E[δ̃] = 0, cf. Section 2.4.


• Wage Example Continued (from Section 3.2):
– If the observed regressor educ is correlated with the unobserved variable ability, then the regressor x_a = ability is missing in the regression, and the OLS estimators, e.g. for the effect of an additional year of schooling, are biased.
– Interpretation of the various information sets for computing the expectation of β̂_educ:
∗ First consider

$$E[\hat\beta_{educ}|educ, exper, ability] = \beta_{educ} + \tilde\delta\beta_{ability},$$

where

$$ability = \begin{pmatrix} 1 & educ & exper \end{pmatrix}\delta + \varepsilon.$$

Then the conditional expectation above indicates the average of β̂_educ computed over many different samples, where each


sample of workers is drawn in the following way: you always guarantee that each sample has the same number of workers with e.g. 10 years of schooling, 15 years of experience, and 150 units of ability, and the same number of workers with 11 years of schooling, etc., so that for each combination of characteristics there is the same number of workers, although the workers are not identical.
∗ Next consider

$$E[\hat\beta_{educ}|educ, exper] = \beta_{educ} + E[\tilde\delta|educ, exper]\,\beta_{ability}.$$

When drawing a new sample you only guarantee that the numbers of workers with a specific number of years of schooling and experience stay the same. In contrast to above, you do not control ability.


∗ Finally consider

$$E[\hat\beta_{educ}] = \beta_{educ} + E[\tilde\delta]\,\beta_{ability}.$$

Here you simply draw new samples where everything is allowed to vary. If you had, say, 50 workers with 10 years of schooling in one sample, you may have 73 workers with 10 years of schooling in another sample. This possibility is excluded in the two previous cases.


• Effect of omitted variables on the conditional mean:
– General terminology:
∗ If E[y|x_A, x_a] ≠ E[y|x_A], then the smaller model omitting x_a is misspecified and estimation will suffer from omitted variable bias.
∗ If E[y|x_A, x_a] = E[y|x_A], then the variable x_a in the larger model is redundant and should be eliminated from the regression.
∗ Trade Example Continued: Assume that the population regression model only contains the variables gdp and distance. Then a simple regression model with gdp is misspecified, and a multiple regression model with gdp, distance, and language contains the redundant variable language.


– It can happen that for a misspecified model Assumptions MLR.1 to MLR.4 are fulfilled. To see this, consider only one variable in X_A:

$$E[y|x_A, x_a] = \beta_0 + \beta_A x_A + \beta_a x_a.$$

Then, by the law of iterated expectations, one obtains

$$E[y|x_A] = \beta_0 + \beta_A x_A + \beta_a E[x_a|x_A].$$

If, in addition, E[x_a|x_A] is linear in x_A,

$$x_a = \alpha_0 + \alpha_1 x_A + \varepsilon, \qquad E[\varepsilon|x_A] = 0,$$

one obtains

$$E[y|x_A] = \beta_0 + \beta_A x_A + \beta_a(\alpha_0 + \alpha_1 x_A) = \gamma_0 + \gamma_1 x_A,$$

with γ₀ = β₀ + β_a α₀ and γ₁ = β_A + β_a α₁ being the parameters of the best linear predictor, see Section 2.4.


– Note that in this case SLR.1 and SLR.4 are fulfilled for the smaller model although it is not the population model. However, E[y|x_A, x_a] ≠ E[y|x_A] if β_a ≠ 0 and α₁ ≠ 0.
– Thus, model choice matters, see Section 3.5. If controlling for x_a is important (controlled random experiments, see Section 1.3), then the smaller model is not of much use if the differences between the expected values are large for some values of the regressors. If one needs a model for prediction, the smaller model may be preferable, since it exhibits a smaller estimation variance, see Sections 3.4.3 and 3.5.
Reading: Section 3.3 in Wooldridge (2009).


3.4.2 The Variance of Parameter Estimates
• Assumption MLR.5 (Homoskedasticity):

$$Var(u_i|x_{i1},\dots,x_{ik}) = \sigma^2, \qquad i = 1,\dots,n.$$

• Assumptions MLR.1 to MLR.5 are frequently called the Gauss-Markov assumptions.
• Note that by the random sampling assumption MLR.2 one has

$$Cov(u_i, u_j|x_{i1},\dots,x_{ik}, x_{j1},\dots,x_{jk}) = 0 \quad \text{for all } i \neq j,\ 1 \le i, j \le n,$$
$$Cov(u_i, u_j) = 0 \quad \text{for all } i \neq j,\ 1 \le i, j \le n,$$

where for the latter equation the LIE was used. Because of MLR.2 one may also write

$$Var(u_i|x_{i1},\dots,x_{ik}) = Var(u_i|X), \qquad Cov(u_i, u_j|X) = 0, \quad i \neq j.$$


One writes all n variances and all covariances in a matrix:

$$Var(u|X) \equiv \begin{pmatrix} Var(u_1|X) & Cov(u_1, u_2|X) & \cdots & Cov(u_1, u_n|X) \\ Cov(u_2, u_1|X) & Var(u_2|X) & \cdots & Cov(u_2, u_n|X) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(u_n, u_1|X) & Cov(u_n, u_2|X) & \cdots & Var(u_n|X) \end{pmatrix} \quad (3.13)$$

$$= \sigma^2 \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix},$$

or, in short (MLR.2 and MLR.5 together),

$$Var(u|X) = \sigma^2 I. \quad (3.14)$$


• Variance of the OLS Estimator
Under the Gauss-Markov assumptions MLR.1 to MLR.5 we have

$$Var(\hat\beta_j|X) = \frac{\sigma^2}{SST_j\left(1 - R_j^2\right)}, \qquad x_j \text{ not constant,} \quad (3.15)$$

where SST_j is the total sample variation (total sum of squares) of the j-th regressor,

$$SST_j = \sum_{i=1}^n (x_{ij} - \bar x_j)^2,$$

and the coefficient of determination R_j² is taken from a regression of the j-th regressor on all other regressors,

$$x_{ij} = \delta_0 x_{i0} + \cdots + \delta_{j-1} x_{i,j-1} + \delta_{j+1} x_{i,j+1} + \cdots + \delta_k x_{ik} + v_i, \qquad i = 1,\dots,n. \quad (3.16)$$

(See Appendix 3A.5 in Wooldridge (2009) for the proof.)


Interpretation of the variance of the OLS estimator:
– The larger the error variance σ², the larger is the variance of β̂_j. Note: This is a property of the population, so this variance component cannot be influenced by the sample size. (In analogy to the simple regression model.)
– The larger the total sample variation SST_j of the j-th regressor x_j, the smaller is the variance Var(β̂_j|X). Note: The total sample variation can always be increased by increasing the sample size, since adding another observation increases SST_j.
– If SST_j = 0, Assumption MLR.3 fails to hold.


– The larger the coefficient of determination R_j² from regression (3.16), the larger is the variance of β̂_j.
– The larger R_j², the better the variation in x_j can be explained by variation in the other regressors, because in this case there is a high degree of linear dependence between x_j and the other explanatory variables. Then only a small part of the sample variation in x_j is specific to the j-th regressor (precisely the error variation in (3.16)); the other part of the variation can be explained equally well by the estimated linear combination of all other regressors. The estimator cannot attribute this common variation well to either the variable x_j or the linear combination of the remaining variables, and thus it suffers from a larger estimation variance.


– Special cases:
∗ R_j² = 0: Then x_j and all other explanatory variables are empirically uncorrelated, and the parameter estimator β̂_j is unaffected by all other regressors.
∗ R_j² = 1: Then MLR.3 fails to hold.
∗ R_j² near 1: This situation is called multi- or near collinearity. In this case Var(β̂_j|X) is very large.
– But: The multicollinearity problem is reduced in larger samples, because SST_j rises and hence the variance decreases for a given value of R_j². Multicollinearity is therefore always a problem of small sample sizes, too.
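The quantity R_j² and the resulting inflation of the variance can be computed directly; a sketch (the function name is ours; X is assumed to contain the constant column, and j should index a non-constant regressor):

import numpy as np

def variance_inflation(X, j):
    """R_j^2 from regressing column j of X on the remaining columns,
    and the variance inflation factor 1/(1 - R_j^2)."""
    y = X[:, j]
    Z = np.delete(X, j, axis=1)
    coef, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    r2_j = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return r2_j, 1.0 / (1.0 - r2_j)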


• Estimation of the error variance σ²
– Unbiased estimation of the error variance σ²:

$$\hat\sigma^2 = \frac{\hat u'\hat u}{n - (k+1)}.$$

– Properties of the OLS estimator (continued): Call sd(β̂_j|X) = √Var(β̂_j|X) the standard deviation; then

$$\widehat{sd}(\hat\beta_j|X) = \frac{\hat\sigma}{\left(SST_j\left(1 - R_j^2\right)\right)^{1/2}}$$

is the standard error of β̂_j.


• Variance-covariance matrix of the OLS estimator:
Basics: The covariance between the estimators of the j-th and the l-th parameter is written as

$$Cov(\hat\beta_j, \hat\beta_l|X) = E[(\hat\beta_j - \beta_j)(\hat\beta_l - \beta_l)|X], \qquad j, l = 0, 1, \dots, k,$$

where unbiasedness of the estimators is assumed. Collect all variances and covariances in a ((k+1) × (k+1)) matrix:

$$Var(\hat\beta|X) \equiv \begin{pmatrix} Cov(\hat\beta_0, \hat\beta_0|X) & Cov(\hat\beta_0, \hat\beta_1|X) & \cdots & Cov(\hat\beta_0, \hat\beta_k|X) \\ Cov(\hat\beta_1, \hat\beta_0|X) & Cov(\hat\beta_1, \hat\beta_1|X) & \cdots & Cov(\hat\beta_1, \hat\beta_k|X) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(\hat\beta_k, \hat\beta_0|X) & Cov(\hat\beta_k, \hat\beta_1|X) & \cdots & Cov(\hat\beta_k, \hat\beta_k|X) \end{pmatrix}.$$


Rewriting yields

$$Var(\hat\beta|X) = \begin{pmatrix} E[(\hat\beta_0-\beta_0)(\hat\beta_0-\beta_0)|X] & \cdots & E[(\hat\beta_0-\beta_0)(\hat\beta_k-\beta_k)|X] \\ \vdots & \ddots & \vdots \\ E[(\hat\beta_k-\beta_k)(\hat\beta_0-\beta_0)|X] & \cdots & E[(\hat\beta_k-\beta_k)(\hat\beta_k-\beta_k)|X] \end{pmatrix} = E\left[(\hat\beta-\beta)(\hat\beta-\beta)'\,|X\right].$$

Next it will be shown that

$$Var(\hat\beta|X) = E\left[(\hat\beta-\beta)(\hat\beta-\beta)'|X\right] = \sigma^2 (X'X)^{-1}.$$


Proof: Remember that correct model specification implies

$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u,$$

hence β̂ − β = (X′X)⁻¹X′u, which inserted into Var(β̂|X) yields

$$E\left[(\hat\beta-\beta)(\hat\beta-\beta)'|X\right] = E\left[\left((X'X)^{-1}X'u\right)\left((X'X)^{-1}X'u\right)'\,\Big|X\right]$$
$$= E\left[(X'X)^{-1}X'uu'X(X'X)^{-1}|X\right]$$
$$= (X'X)^{-1}X'\underbrace{E[uu'|X]}_{\sigma^2 I_n}X(X'X)^{-1}$$
$$= \sigma^2(X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$

From the definition of Var(β̂|X) above it can be seen that the diagonal elements are the variances Var(β̂_j|X), j = 0,...,k.
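A sketch of the estimated covariance matrix σ̂²(X′X)⁻¹ in Python (X is assumed to already contain the column of ones):

import numpy as np

def ols_covariance(X, y):
    """OLS estimate, sigma^2_hat * (X'X)^{-1}, and the standard errors."""
    n, kp1 = X.shape                                 # kp1 = k + 1 parameters
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - kp1)           # unbiased error variance
    cov = sigma2_hat * np.linalg.inv(X.T @ X)
    return beta_hat, cov, np.sqrt(np.diag(cov))      # last entry: standard errors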


• Efficiency of OLS
Note: The OLS estimator is a linear estimator with respect to the dependent variable because, for given X, it holds that

$$\hat\beta_j = \sum_{i=1}^n \left(\frac{\hat v_i}{\sum_{i=1}^n \hat v_i^2}\right) y_i,$$

where v̂_i are the residuals from regression (3.16). (For a derivation without matrix algebra see Appendix 3A.2 in Wooldridge (2009).) Further, OLS is unbiased, so E[β̂_j] = β_j.
Gauss-Markov Theorem: Under Assumptions MLR.1 through MLR.5 the OLS estimator is the best linear unbiased estimator (BLUE). "Best" means that the OLS estimator, which is unbiased since E[β̂_j] = β_j, has minimal variance among all linear unbiased estimators.


3.4.3 Trade-off between Bias and Multicollinearity
• Example: Let the population model be

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u.$$

– For a given sample let R₁² be close to 1. Then β₁ is estimated with a large variance, by (3.15).
– A possible solution? Leave out the regressor x₂ and estimate the simple regression. But then, as already shown, the estimator of β₁ is biased. Hence: if the correlation between x₁ and x₂ is near 1 or −1, then, for given sample size, one faces a trade-off between variance and bias.
– What we observe is a kind of statistical uncertainty relation: the sample does not provide sufficient information to precisely


answer the formulated question.
– The only good solution: increasing the sample size.
– Alternative solution: combining highly correlated variables.
• Variance of parameter estimates in misspecified models: Again, there are different ways in which incorrect regression models might be chosen (cf. Section 3.4.1):
– Too many variables: Parameters are estimated for variables that do not play a role in the "true" data generating mechanism (redundant variables).
– Too few variables: One or more variables are missing which are relevant in the population regression model (omitted variables).


– Wrong variables: A combination of both.
Effect on the variance of the parameter estimators:
– Case 1 (redundant variables): Consider the population model y = Xβ + u. Assume that instead the following sample specification is chosen:

$$y = X\beta + z\alpha + w,$$

where the vector z contains all sample observations of the variable z. The variance of the parameter estimator β̂_j is

$$Var(\hat\beta_j|X) = \frac{\sigma^2}{SST_j\left(1 - R_{j,X,z}^2\right)},$$

where now R²_{j,X,z} is the coefficient of determination of a regression of x_j on all other variables in X and on z. It is easily seen that R²_{j,X,z} ≥ R_j², because fewer variables are included in the


regression yielding the second R². Therefore: including additional variables in a regression model increases the estimation variance or leaves it unchanged.
– Case 2 (omitted variables): The converse of Case 1 holds: if a variable is omitted, it can be shown that the estimation variance is smaller than when using the true model.
– Case 3 (redundant and omitted variables): Should really be avoided. Correct model specification is crucial!


3.5 Model Specification I: Model Selection Criteria
• Goal of model selection:
– In principle: find the population model.
– In practice: find the "best" model for the purpose of the analysis.
– More specifically: under the assumption that the population model is a multiple linear regression model, find all regressors that are included in the regression and their appropriate transformations (log or level or ...). Avoid omitting variables and including irrelevant variables.


• Brief theory of model selection:
– There are two issues: a) the variable (model) choice, b) the estimation variance.
– Consider a): Choose a goal function to evaluate different models. A popular goal function is the mean squared error (MSE). For fixed parameters it is defined as

$$MSE = E\left[(y - \beta_0 x_0 - \beta_1 x_1 - \cdots - \beta_k x_k)^2\right], \quad (3.17)$$

see also equation (2.17) for the simple regression case. Choose the model for which the MSE is minimal.


Important cases:
∗ If x₀,...,x_k include all relevant variables, the population model is a multiple linear regression, and the MSE is minimized with respect to the parameters, then

$$MSE = E[u^2] = \sigma^2.$$

∗ If relevant variables are missing, it can be shown that the MSE decomposes into variance and squared bias. For simplicity, omit all variables except x₁ and fit the simple linear regression y = γ₀ + γ₁x₁ + v. Then

$$MSE_1 = E\left[\{(y - E[y|x_1,\dots,x_k]) + (E[y|x_1,\dots,x_k] - E[y|x_1])\}^2\right] = \sigma^2 + \left(E[y|x_1,\dots,x_k] - E[y|x_1]\right)^2.$$


Since the squared bias term is positive, (E[y|x₁,...,x_k] − E[y|x₁])² > 0, one clearly has MSE < MSE₁.
– Consider a) and b): If the parameters have to be estimated, a further term enters the mean squared error, namely the variances and covariances of the parameter estimators. One has

MSE = variance of the population error + (bias of the chosen model)² + estimation variance,

where the estimation variance in general increases with the number of variables. It can thus happen that, to minimize the MSE, it is optimal to choose a model that omits variables. A typical case is prediction.
– Therefore, a reliable method for estimating the MSE is needed.


• What does not work:
– Selecting the model with the smallest standard error of the regression σ̂ does not work.
∗ Why? It is always possible to select a model for which every residual is zero, that is, û_i = 0 for all i = 1,...,n. Then σ̂ = 0 as well, although the error variance σ² > 0 in the true model.
∗ How? Simply take k + 1 = n regressors into the sample regression model which fulfil MLR.3 and solve the normal equations (3.5). Then you obtain a perfect fit, since you have a linear equation system with n equations and n unknown parameters.
∗ Note that you can add any regressors that fulfil MLR.3, even if they have nothing to do with the population regression model.
∗ Note also that the SSR remains constant or decreases if, for a given


sample of size n, a further regressor variable is added, since the linear equation system obtains more flexibility to fit the sample observations. Therefore σ̃² = SSR/n remains constant or decreases as well.
∗ For the variance estimator σ̂² = SSR/(n − k − 1) there are opposing effects: a decrease in SSR may be offset by the decrease in n − k − 1.
In sum, the standard error of the regression tends to decrease when additional regressors are added, so it is not suited for selecting those variables that are part of the population model.
– Selecting the model with the largest R² does not work either. Why?


– Although the adjusted R² may fall or increase when adding another regressor, it screws up for k + 1 = n, since in this case R̄² = 1 as well.
• Solution: Use model selection criteria
– Basic idea:

$$\text{Selection criterion} = \ln\frac{\hat u'\hat u}{n} + (k+1)\cdot \text{penalty function}(n)$$

∗ First term: a variance estimator for ln(σ²) of the chosen model. Note that the estimated variance σ̃² = û′û/n is reduced by every additionally included independent variable.
∗ Second term: a penalty term punishing the number of parameters to avoid models that include redundant variables.


Because the true error variance is typically underestimated by σ̃², the penalty term penalizes the inclusion of additional regressors. The penalty term increases with k, and the penalty function must be chosen such that it decreases with n, so that a large number of parameters matters less in large samples. Why?
∗ This implies a trade-off: regressors are included in the model if the penalty is smaller than the decrease in the estimated MSE. When choosing a criterion, one determines how this trade-off is shaped.
∗ Rule: Choose among all considered candidate models the specification for which the criterion is minimal.


– Popular model selection criteria:
∗ the Akaike criterion (AIC),

$$AIC = \ln\frac{\hat u'\hat u}{n} + (k+1)\frac{2}{n}, \quad (3.18)$$

∗ the Hannan-Quinn criterion (HQ),

$$HQ = \ln\frac{\hat u'\hat u}{n} + (k+1)\frac{2\ln(\ln n)}{n}, \quad (3.19)$$

∗ the Schwarz / Bayesian information criterion (SC/BIC),

$$SC = \ln\frac{\hat u'\hat u}{n} + (k+1)\frac{\ln n}{n}. \quad (3.20)$$

It is advised to always check all criteria, although the researcher decides which to use. In nice cases, all criteria deliver the same result. Note that for standard sample sizes SC punishes additional parameters more than HQ, and HQ more than AIC.
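A sketch of the three criteria exactly as in (3.18)-(3.20); note that software packages such as EViews report likelihood-based versions that differ by an additive constant, so only comparisons across models estimated on the same sample are meaningful:

import numpy as np

def selection_criteria(ssr, n, k):
    """AIC, HQ, and SC as in (3.18)-(3.20); k is the number of slope parameters."""
    logvar = np.log(ssr / n)                  # ln(u_hat'u_hat / n)
    aic = logvar + (k + 1) * 2.0 / n
    hq = logvar + (k + 1) * 2.0 * np.log(np.log(n)) / n
    sc = logvar + (k + 1) * np.log(n) / n
    return aic, hq, sc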


• Trade Example Continued:
– Model 1
LOG(TRADE_0_D_O) = -3.4610 + 0.7701*LOG(WDI_GDPUSDCR_O)
AIC = 4.282, HQ = 4.310, SC = 4.357
– Model 2
LOG(TRADE_0_D_O) = 4.8009 + 1.0885*LOG(WDI_GDPUSDCR_O) - 1.9708*LOG(CEPII_DIST)
AIC = 4.025, HQ = 4.068, SC = 4.138
– Model 3
LOG(TRADE_0_D_O) = -9.5789 + 1.3566*LOG(WDI_GDPUSDCR_O) - 1.1442*LOG(CEPII_DIST) + 3.1265*CEPII_COMCOL_REV
AIC = 3.694, HQ = 3.751, SC = 3.844
– Model 4
Y_LOG = -13.0268 + 1.3176*LOG(WDI_GDPUSDCR_O) - 0.6249*LOG(CEPII_DIST) + 3.3512*CEPII_COMCOL_REV + 2.1096*CEPII_COMLANG_OFF
AIC = 3.679, HQ = 3.751, SC = 3.867
– Comparing all four models, SC selects Model 3 with regressors gdp, distance, and common colonizer, while AIC selects Model 4 with the additional regressor common official language. See Appendix 10.4 for more details on the variables. One can nicely see that SC punishes additional variables more than AIC. Statistical tests may provide further information on which model to choose, see Sections 4.3 onwards.


4 Multiple Regression Analysis: Hypothesis Testing and Confidence Intervals

4.1 Basics of Statistical Tests
Foundations of statistical hypothesis testing
• In general: Statistical hypothesis tests allow statistically sound and unambiguous answers to yes-or-no questions:
– Do men and women earn equal income in Kazakhstan?


– Do certain policy measures lead to a decrease in unemployment in 2010?
– Are imports to Kazakhstan influenced by the GDP of the exporting countries?


• Elements of a statistical test:
1. Two disjoint hypotheses about the value(s) of (a) parameter(s) θ in a population. That means that one of the two competing hypotheses has to hold in the population:
– Null hypothesis H₀
– Alternative hypothesis H₁
2. A test statistic T that is a function of some or all sample values (X, y). We will denote it as t(X, y).
3. A decision rule, stating for which values of t(X, y) the null hypothesis H₀ is rejected and for which values the null is not rejected.


More precisely: Partition the domain of the test statistic T into two disjoint regions:
– Rejection region (critical region) C: If the test statistic t(X, y) is located in the critical region, H₀ is rejected: reject H₀ if t(X, y) ∈ C.
– Non-rejection region: If the test statistic t(X, y) falls into the non-rejection region, H₀ is not rejected: do not reject H₀ if t(X, y) ∉ C.
– Critical value c: the boundary between the rejection and the non-rejection region.


• Properties of a test:
– Type I error (α error): The type I error measures the probability (evaluated before the sample is taken) of rejecting H₀ although H₀ is correct in the population:
α = P(reject H₀ | H₀ is true) = P(T ∈ C | H₀ is true).
The type I error probability is frequently called the significance level or size of a test.
– Type II error (β error): The type II error gives the probability of not rejecting H₀ although it is wrong:
β = P(not reject H₀ | H₁ is true).


– Size of a test: The significance level (size) α has to be fixed by the researcher before the test is carried out.
– Power of a test: The power of a test gives the probability of rejecting a wrong null hypothesis:
power = π = P(reject H₀ | H₁ is true), that is, π = 1 − P(not reject H₀ | H₁ is true) = 1 − β.
To calculate C for a given α, one has to know the probability distribution of the test statistic under H₀.


Deriving Tests about the Sample Mean:
1. Consider two disjoint hypotheses about the mean µ of a population. (For example, the mean µ of hourly wages in the US in 1976.)
a) Null hypothesis H₀: µ = µ₀ (in our example: the mean hourly wage is 6 US-$, thus H₀: µ = 6)
b) Alternative hypothesis H₁: µ ≠ µ₀ (in the example: the mean hourly wage is not 6 US-$, thus H₁: µ ≠ 6)


2. Test statistic:
a) Choice of an estimator for the unknown mean µ, e.g. the OLS estimator of a regression of hourly wages w on a constant: compute the sample mean

$$\hat\mu = \frac{1}{n}\sum_{i=1}^n w_i$$

out of a sample w₁,...,w_n with n observations.
b) Obtain the probability distribution of the estimator: For simplicity, assume that individual wages w_i are jointly normally distributed with expected value µ and variance σ_w², that is, w_i ∼ N(µ, σ_w²).


From the properties of jointly normally distributed random variables it follows that

$$\hat\mu \sim N\left(\mu, \sigma_{\hat\mu}^2\right), \qquad \text{where } \sigma_{\hat\mu}^2 = Var(\hat\mu) = Var\left(n^{-1}\textstyle\sum w_i\right) = n^{-1}\sigma_w^2.$$

c) In order to obtain a test statistic t(w₁,...,w_n), all unknown parameters have to be removed from the distribution. In this simple case this can be achieved by standardizing µ̂:

$$t(w_1,\dots,w_n) = \frac{\hat\mu - \mu}{\sigma_{\hat\mu}} \sim N(0,1).$$

d) The test statistic t(w₁,...,w_n) can be calculated if we know µ and σ_µ̂. Assume for the moment that σ_µ̂ is known.


Which value does µ take under H₀? H₀: µ = µ₀. Under H₀ we can compute the test statistic for a given sample as

$$t(w_1,\dots,w_n) = \frac{\hat\mu - \mu_0}{\sigma_{\hat\mu}} \sim N(0,1).$$

3. Decision rule: When should we reject H₀, and in which case shouldn't we? (Now the significance level α has to be chosen!) If the deviation of µ̂ from the null hypothesis value µ₀ is large enough, one would reject H₀.


[Figure: density f(t) of the test statistic under H₀; the rejection regions lie in both tails beyond the critical values −c and c, each with probability of error α/2, and the non-rejection region lies in between.]

Intuition: If t is very large (or very small), then a) the estimated mean µ̂ is far from µ₀ (under H₀) and/or b) the standard deviation σ_µ̂ of the estimator is small relative to µ̂ − µ₀.


• When is |t| large enough (to reject H₀)?
• Note: Under H₀ it holds that

$$t(w_1,\dots,w_n) = \frac{\hat\mu - \mu_0}{\sigma_{\hat\mu}} \sim N(0,1),$$

and hence for given α the rejection region C can be determined (see figure).
• Formally: P(T < −c|H₀) + P(T > c|H₀) = α, or, in this case, due to the symmetry of the normal distribution,

$$P(T < -c|H_0) = \frac{\alpha}{2} \quad \text{and} \quad P(T > c|H_0) = \frac{\alpha}{2}.$$

The values of −c and c are tabulated: they are the α/2 and 1 − α/2 quantiles of the standard normal distribution.


• Under H₁ it holds that

$$\frac{\hat\mu - \mu}{\sigma_{\hat\mu}} \sim N(0,1).$$

Expanding yields

$$\frac{\hat\mu - \mu + \mu_0 - \mu_0}{\sigma_{\hat\mu}} = \underbrace{\frac{\hat\mu - \mu_0}{\sigma_{\hat\mu}}}_{t(w_1,\dots,w_n)} + \frac{\mu_0 - \mu}{\sigma_{\hat\mu}} = t(w_1,\dots,w_n) - \frac{\mu - \mu_0}{\sigma_{\hat\mu}},$$

and therefore we have under H₁

$$t(w_1,\dots,w_n) = \frac{\hat\mu - \mu_0}{\sigma_{\hat\mu}} \sim N\left(\frac{\mu - \mu_0}{\sigma_{\hat\mu}},\ 1\right),$$

since X ∼ N(m, 1) is equivalent to X − m ∼ N(0, 1).
• Conclusion: If H₁ is true, then the density of t(w₁,...,w_n) is shifted by (µ − µ₀)/σ_µ̂.


• In the figure exhibiting the density under H₁ (for a specific value of µ ≠ µ₀) the power can be seen as the sum of the two shaded areas, because π = P(t < −c|H₁) + P(t > c|H₁): the power is the sum of the probabilities of rejection.

[Figure: density of the test statistic under H₁, centered at (µ − µ₀)/σ_µ̂; the shaded rejection probabilities lie beyond the critical values −c and c, the non-rejection region of H₀ in between.]

• For a given σ_µ̂, the power of the test increases with the distance between the null hypothesis value µ₀ and the true value µ.
• Recall that if H₀ is true, then (µ − µ₀)/σ_µ̂ = 0 holds and one


obtains the distribution under H₀.
• It can further be seen that the type II error, given as 1 − π = 1 − (1 − β) = β, does not equal zero!
4. There remains one problem: In real-world applications we do not know the standard deviation of the mean estimator, σ_µ̂ = σ_w/√n.
Remedy: Estimate it by

$$\hat\sigma_{\hat\mu} = \frac{\hat\sigma_w}{\sqrt n}.$$

Then one has the popular t statistic

$$t(w_1,\dots,w_n) = \frac{\hat\mu - \mu_0}{\hat\sigma_{\hat\mu}};$$

however, watch out!


The test statistic is no longer normally distributed but follows a t distribution with n − 1 degrees of freedom (short: t_{n−1}). Therefore

$$t(w_1,\dots,w_n) = \frac{\hat\mu - \mu_0}{\hat\sigma_{\hat\mu}} \sim t_{n-1}.$$

To obtain the critical values,

$$P(T < -c|H_0) = \frac{\alpha}{2} \quad \text{and} \quad P(T > c|H_0) = \frac{\alpha}{2},$$

the tables of the t distribution have to be consulted (see Appendix G, Table G.2 in Wooldridge (2009)).
Wage Example Continued: Hourly wages w_i, i = 1,...,526, of US employees:
1. Hypotheses:
a) Null hypothesis: H₀: µ = 6
b) Alternative hypothesis: H₁: µ ≠ 6


2. Estimation and calculation of the t statistic in EViews:

Dependent Variable: WAGE, Method: Least Squares
Sample: 1 526, Included observations: 526
====================================================================
Variable        Coefficient  Std. Error  t-Statistic  Prob.
====================================================================
C                  5.896103    0.161026    36.61580   0.0000
====================================================================
R-squared           0.000000   Mean dependent var     5.896103
Adjusted R-squared  0.000000   S.D. dependent var     3.693086
S.E. of regression  3.693086   Akaike info criterion  5.452701
Sum squared resid   7160.414   Schwarz criterion      5.460810
Log likelihood     -1433.060   Durbin-Watson stat     1.817647
====================================================================

Thus µ̂ = 5.896103 and σ̂_µ̂ = 0.161026, so

$$t(w_1,\dots,w_{526}) = \frac{5.896103 - 6}{0.161026} = -0.64521878.$$


3. Determination of critical values: Suppose a significance level of α = 5%. Then the critical value c = t_{525,0.05} can be obtained from the table for the t distribution with n − 1 = 525 degrees of freedom: c = t_{525,0.05} = 1.96.
4. Test decision: Do not reject H₀: µ = 6, since −c = −1.96 < t = −0.645 < c = 1.96, and therefore t ∉ C (the test statistic is not contained in the rejection region).
5. However: Do hourly wages w_i really follow a normal distribution, as assumed? Examine the histogram of the sample observations w_i (see the descriptive statistics option in EViews):


Result: [Histogram of WAGE, Sample 1 526, 526 observations; figure omitted.]

Descriptive statistics: Mean 5.896103, Median 4.650000, Maximum 24.98000, Minimum 0.530000, Std. Dev. 3.693086, Skewness 2.007325, Kurtosis 7.970083; Jarque-Bera 894.6195 (Probability 0.000000).

• The normality condition for our test does not seem to be fulfilled. The test result could be misleading!
• There are also tests that work without the normality assumption, see Section 5.1.
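The steps of this example can be reproduced from the reported output (a sketch using scipy; the numbers are taken from the EViews table above):

from scipy import stats

n, mu_hat, se = 526, 5.896103, 0.161026
t_stat = (mu_hat - 6.0) / se                      # H0: mu = 6

c = stats.t.ppf(1 - 0.05 / 2, df=n - 1)           # two-sided 5% critical value
print(t_stat, c, abs(t_stat) > c)                 # -0.645, approx. 1.96, False: do not reject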


One- and two-sided hypothesis tests
• Two-sided tests: H₀: θ = θ₀ versus H₁: θ ≠ θ₀
• One-sided tests
– Tests with left-sided alternative hypothesis: H₀: θ ≥ θ₀ versus H₁: θ < θ₀
Notice: Often, also in Wooldridge (2009), you can read H₀: θ = θ₀ versus H₁: θ < θ₀. This notation, however, is somewhat imprecise, since either H₀ or H₁ has to be true. This is not made clear by the latter notation.


H₀: θ ≥ θ₀ versus H₁: θ < θ₀

[Figure: density of the test statistic under H₀ with critical value c; the rejection region with probability of error α lies in the left tail, the non-rejection region to its right.]

∗ Decision rule: t < c ⇒ reject H₀.
∗ You do not need a rejection region on the right-hand side, since all θ > θ₀ are elements of H₀ and thus fall into the non-rejection region.


∗ The critical value is obtained on the basis of the density for θ = θ₀, since then, for a given critical value c, the shaded area is larger than for any θ > θ₀, and one prefers a test for which the maximum of the type I error is controlled.


Wage Example Continued: (In the following we ignore that wages are not normally distributed.)
∗ The null hypothesis states that mean hourly wages are US-$ 6 or more (H₁ says they are less than US-$ 6):
H₀: µ ≥ 6 versus H₁: µ < 6
∗ Calculation of the test statistic: as in the two-sided case, because again µ₀ is the boundary between the null and the alternative hypothesis:

$$t(w_1,\dots,w_{526}) = \frac{5.896103 - 6}{0.161026} = -0.64521878.$$


∗ Calculation of the critical value: For α = 0.05 the critical value (note: one-sided test) from the t distribution with 525 degrees of freedom (df) is c = −1.645.
∗ Decision: Since t = −0.64521878 > c = −1.645, the null hypothesis is not rejected.


– Test with right-sided alternative: H₀: θ ≤ θ₀ versus H₁: θ > θ₀

[Figure: density of the test statistic under H₀ with critical value c; the rejection region with probability of error α lies in the right tail, the non-rejection region to its left.]

As with left-sided alternatives, but reversed.
• Why do we carry out one-sided tests? Consider the following issue: provide statistical evidence that the mean wage is above $ 5.60.
– Since by using statistical tests we can never confirm but only


reject a hypothesis, we have to choose the alternative hypothesis such that it reflects our conjecture. Here, this is a mean wage larger than $ 5.60. Rejecting the null hypothesis then provides statistical evidence for the alternative hypothesis. However, there are exceptions to this rule, see e.g. Sections 4.6 and 4.7.
– We thus have to test whether the mean wage is statistically significantly larger than $ 5.60. We therefore need a test with a one-sided alternative. Our pair of hypotheses is
H₀: µ ≤ 5.60 versus H₁: µ > 5.60.
– For α = P(T > c|H₀) = 0.05 the critical value is c = 1.645.
– Decision:

$$t = \frac{5.896103 - 5.60}{0.161026} = 1.8388521 > c = 1.645$$


⇒ Reject H₀ (for size 5%); that is, the data confirm that the mean wage is statistically significantly above $ 5.60.
– If, on the contrary, we want to examine whether mean wages deviate from $ 5.60 in any direction, the pair of hypotheses is H₀: µ = 5.60 versus H₁: µ ≠ 5.60. Given the chosen significance level α = 0.05, the critical values are −1.96 and 1.96, respectively, and hence −1.96 < 1.84 < 1.96. Thus, the null hypothesis cannot be rejected.
– It is therefore easier to reject if one has knowledge about the location of the alternative, because then the region of rejection can be made smaller and it is "easier" to reject the null hypothesis if it is false.


p-values
• For every test statistic one can calculate the largest significance level for which, given a sample of observations, the computed test statistic would just not have led to a rejection of the null. This probability is called the p-value (probability value). In the case of a one-sided test with right-sided alternative one has (Wooldridge 2009, Appendix C.6, p. 776)
P(T ≤ t(y)|H₀) ≡ 1 − p.
• Since P(T > t(y)|H₀) = 1 − P(T ≤ t(y)|H₀), one also has P(T > t(y)|H₀) = p, and thus it is common to say that the p-value is the smallest significance level at which the null can be rejected. Cf. Section 4.2, p. 133 in Wooldridge (2009).


• The decision rule of a test can also be stated in terms of p-values: reject H₀ if the p-value is smaller than the significance level α.
– Left-sided test: p = P(T < t(X, y))
– Right-sided test: p = P(T > t(X, y))
– Two-sided test: p = P(T < −|t(X, y)|) + P(T > |t(X, y)|)


• Software packages (e.g. EViews) often give p-values for H0 : θ = 0 versus H1 : θ ≠ 0.
Reading: Appendix C.6 in Wooldridge (2009).


4.2 Probability Distribution of the OLS Estimator
For the multiple regression model y = Xβ + u we assume MLR.1 to MLR.5, as we did in Sections 3.2 and 3.4.
• Recall from Section 3.4.1 that under MLR.1 the OLS estimator
$$\hat{\boldsymbol\beta} = (X'X)^{-1}X'y$$
can be written as
$$\hat{\boldsymbol\beta} = \boldsymbol\beta + \underbrace{(X'X)^{-1}X'}_{W}\, u. \qquad (4.1)$$


• In order to derive the probability distribution of a test statistic one needs the probability distribution of the underlying estimators since the former is a function of the latter. Furthermore, the probability distribution of the OLS estimator is necessary to construct interval estimators, see Section 4.5. Conditioning on the regressor matrix X, it follows from (4.1) that the probability distribution of the OLS estimator only depends on the error vector u. Similarly to the case of testing the mean we make the assumption that the relevant random variables are normally distributed.


• Assumption MLR.6 (Normality of Errors): Conditionally on the regressor matrix X, the sample errors are stochastically independent and identically normally distributed:
$$u_i\,|\,x_{i1},\dots,x_{ik} \sim \text{i.i.d. } N(0,\sigma^2), \quad i = 1,\dots,n.$$

Jointly with MLR.2 this can equivalently be written as: u is multivariate normal with mean zero and variance-covariance matrix σ²I,
$$u\,|\,X \sim N(0, \sigma^2 I).$$

• Of course, one could assume for the errors u any other probability distribution. However, assuming normally distributed errors has two advantages:
1. The probability distribution of the OLS estimator and derived test statistics can easily be derived, see the remaining sections.


2. Under certain conditions the resulting probability distribution for the OLS estimator holds even if the errors are not normally distributed. Then it is called the asymptotic distribution, see Chapter 5.
See Appendices B and D in Wooldridge (2009) for rules and properties of normally distributed random variables and vectors.


• Properties of the multivariate normal distribution:
– If Z ∼ N (µ, σ²), then aZ + b ∼ N (aµ + b, a²σ²).
– If the random variables Z and V are jointly normally distributed, then Z and V are stochastically independent if and only if Cov(Z, V ) = 0.
– Every linear combination of a vector of identically and independently normally distributed random variables z ∼ N (µ, σ²I) is also normally distributed. Let
$$w = \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix}, \qquad z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}.$$
Then
$$w'z = \sum_{j=1}^n w_j z_j \sim N\left(w'\mu,\; \sigma^2\, w'w\right).$$


More generally, it holds for z = (z₁, . . . , zₙ)′ ∼ N (µ, σ²I) and
$$W = \begin{pmatrix} w_{01} & w_{02} & \cdots & w_{0n} \\ w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & & \vdots \\ w_{k1} & w_{k2} & \cdots & w_{kn} \end{pmatrix}$$
that
$$Wz = \begin{pmatrix} \sum_{j=1}^n w_{0j} z_j \\ \vdots \\ \sum_{j=1}^n w_{kj} z_j \end{pmatrix} \sim N\left(W\mu,\; \sigma^2\, WW'\right). \qquad (4.2)$$
• The property (4.2) for linear combinations of normally distributed random variables is very helpful for us since the OLS estimator (4.1) is just such a linear combination.


Thus, one obtains
$$\hat{\boldsymbol\beta} - \boldsymbol\beta = Wu \sim N\left(0,\; \sigma^2\, WW'\right).$$
Since WW′ = (X′X)⁻¹X′X(X′X)⁻¹ = (X′X)⁻¹, one obtains
$$\hat{\boldsymbol\beta} \sim N\left(\boldsymbol\beta,\; \sigma^2 (X'X)^{-1}\right).$$
Similarly one can show that
$$\hat\beta_j \sim N\left(\beta_j,\; \sigma^2_{\hat\beta_j}\right) \qquad (4.3)$$
with $\sigma^2_{\hat\beta_j} = \frac{\sigma^2}{SST_j (1 - R_j^2)}$ (see (3.15) in Section 3.4).
• Note that (4.3) generalizes the example of Section 4.1 for testing hypotheses on the mean: if X is a column vector of ones, then β̂₀ = µ̂.
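Result (4.3) can also be illustrated by simulation. A Python sketch follows; the design (n, β, σ) is purely illustrative and not from the slides. It draws repeated samples with fixed regressors and normal errors and compares the simulated standard deviation of β̂₁ with the theoretical value from σ²(X′X)⁻¹:

# Sampling distribution of the OLS slope under MLR.1-MLR.6 (illustrative DGP).
import numpy as np

rng = np.random.default_rng(42)
n, beta, sigma, reps = 100, np.array([1.0, 0.5]), 2.0, 5000

X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed regressors
XtX_inv = np.linalg.inv(X.T @ X)
sd_theory = sigma * np.sqrt(XtX_inv[1, 1])             # theoretical sd of beta1_hat

b1_draws = np.empty(reps)
for r in range(reps):
    y = X @ beta + rng.normal(0.0, sigma, size=n)      # normal errors (MLR.6)
    b1_draws[r] = (XtX_inv @ X.T @ y)[1]               # OLS estimate of beta1

print(b1_draws.std(ddof=1), sd_theory)                 # the two should be close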


4.3 The t Test in the Multiple Regression Model
• Derivation of the test statistic and its distribution
– From (4.3), $\hat\beta_j \sim N\left(\beta_j,\; \sigma^2_{\hat\beta_j}\right)$.
– Standardizing leads to
$$\frac{\hat\beta_j - \beta_j}{\sigma_{\hat\beta_j}} \sim N(0, 1).$$
For estimated σ² (no proof) the test statistic follows a t distribution with n − k − 1 degrees of freedom; estimating the k + 1 regression parameters implies k + 1 restrictions from the normal equations:
$$t(X, y) = \frac{\hat\beta_j - \beta_j}{\hat\sigma_{\hat\beta_j}} \sim t_{n-k-1}.$$


• Critical region and decision rule
– Two-sided test
∗ Hypotheses: H0 : βj = βj0 versus H1 : βj ≠ βj0. For a given significance level one obtains the critical values from the table of the t distribution such that P (T < −c|H0) = α/2 and P (T > c|H0) = α/2, or equivalently 2 · P (T > c|H0) = α.
∗ Decision rule:
· Reject H0 if |t(X, y)| > c, otherwise do not reject H0.
· Alternatively: Calculate the p-value p = P (|T| > |t(X, y)| |H0) = 2 · P (T > |t(X, y)| |H0) and reject H0 if p < α, otherwise do not reject H0.


– One-sided test with left-sided alternative
∗ Hypotheses: H0 : βj ≥ βj0 versus H1 : βj < βj0. For a given significance level one obtains the critical value from the table of the t distribution such that P (T < c|H0) = α.
∗ Decision rule:
· Reject H0 if t(X, y) < c, otherwise do not reject H0.
· Alternatively: Calculate the p-value p = P (T < t(X, y)|H0) and reject H0 if p < α, otherwise do not reject H0.


– One-sided test with right-sided alternative
∗ Hypotheses: H0 : βj ≤ βj0 versus H1 : βj > βj0. For a given significance level one obtains the critical value from the table of the t distribution such that P (T > c|H0) = α.
∗ Decision rule:
· Reject H0 if t(X, y) > c, otherwise do not reject H0.
· Alternatively: Calculate the p-value p = P (T > t(X, y)|H0) and reject H0 if p < α, otherwise do not reject H0.
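The three decision rules can be bundled into a single helper; a sketch in Python (the function and argument names are ours, not from the slides):

# t test on a single coefficient: H0: beta_j = beta_j0 against a
# two-sided, left-sided, or right-sided alternative.
from scipy import stats

def t_test(beta_hat, se, beta0, df, alpha=0.05, alternative="two-sided"):
    t = (beta_hat - beta0) / se
    if alternative == "two-sided":
        c = stats.t.ppf(1 - alpha / 2, df)
        reject, p = abs(t) > c, 2 * stats.t.sf(abs(t), df)
    elif alternative == "left":
        c = stats.t.ppf(alpha, df)
        reject, p = t < c, stats.t.cdf(t, df)
    else:  # "right"
        c = stats.t.ppf(1 - alpha, df)
        reject, p = t > c, stats.t.sf(t, df)
    return t, c, p, reject

For example, t_test(0.5, 0.2, 0, 30) tests H0 : βj = 0 against a two-sided alternative at the 5% level.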


• Economic versus statistical significance
– For a given (statistical) significance level α, the power of a test increases with increasing sample size since σ_β̂j in the denominator of the test statistic decreases with sample size.
– Not being able to reject a null hypothesis may thus simply be caused by a too small sample size (if the null hypothesis is wrong in the population).
– On the other hand, if a variable has only weak influence in the population, its parameter will be significantly different from zero if the sample size is large enough. Thus, even if βj xj only has a small economic impact on the dependent variable, the variable is statistically significant.
– Be careful: In order to avoid estimation bias due to too small models, significant variables must be kept in the model, see Section 3.4.1.
• Choice of significance level
– Two reasons for decreasing the significance level α with increasing sample size n:
∗ Larger sample sizes make tests more powerful. Thus, one can decide whether the benefits of a larger sample size are used only to reduce the Type II error β = 1 − π or whether one also wants to decrease the Type I error. In case of standard significance testing, the Type I error represents the probability of including a variable in the model although it is irrelevant in the population model. Thus, it makes sense to reduce this probability as well.


∗ In general one selects relevant variables from a large number of possibly relevant variables. Since a significance level α applies to each single test, one erroneously includes on average about αK redundant variables, where K denotes the total number of variables considered. Since frequently K is allowed to increase with sample size n, the significance level α should fall in order to keep αK from increasing.
– If one uses the Hannan-Quinn (HQ) (3.19) or the Schwarz (SC) (3.20) model selection criterion, then the implied significance level decreases with sample size. This is not the case for the AIC criterion (3.18).


• Insignificance, multicollinearity, and sample size
– Recall: The test statistic t(X, y) is small if
∗ the deviation between the true value and the null hypothesis is small, for example between βj and βj0,
∗ or the estimated standard error σ̂_β̂j of β̂j is large.
The latter can also be caused by multicollinearity in X. Thus: a high degree of multicollinearity makes it more unlikely to reject the null hypothesis (since |t(X, y)| is small on average).
– For this reason one may keep insignificant variables in the regression. However, the corresponding parameter estimates then have to be interpreted with care.
Reading: Appendices C.5, E.3 in Wooldridge (2009) if needed.


4.4 Example of an Empirical Analysis I: A Simplified Gravity Equation
Trade Example Continued (from Section 3.5): Compare the steps of an econometric analysis, see Section 1.2.
1. Question of interest: Quantify the impact of changes of gdp in the exporting country on imports to Kazakhstan.
2. Economic model: Under idealized assumptions including complete specialization in production and identical consumption preferences among countries, no trading costs, and focusing exclusively on imports, economic theory implies (see Section II, equation (5) in Fratianni (2007))
$$imports_i = A \; gdp_i \; distance_i^{\beta_2}, \qquad \beta_2 < 0.$$


This implies a unit elasticity (elasticity of 1) of gdp on imports. This means that a 1% change in gdp in the exporting country increases imports by 1% as well. This hypothesis can be statistically tested.
3. Econometric model: The simplest econometric model is obtained by taking logs of the economic model and adding an error term. This delivers
ln(importsᵢ) = β₀ + β₁ ln(gdpᵢ) + β₂ ln(distanceᵢ) + uᵢ.
4. Collecting data: see Appendix 10.4.
5. Selection and estimation of an econometric model: In practice, there may be further variables influencing imports. Thus, further control variables have to be added. Based on the Schwarz criterion the model selection exercise in Section 3.5 suggested to add the control variable common colonizer since 1945 (Model 3),
ln(importsᵢ) = β₀ + β₁ ln(gdpᵢ) + β₂ ln(distanceᵢ) + β₃ comcolᵢ + uᵢ.

====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Variable                Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                        -9.578911     4.273583    -2.241424    0.0297
LOG(WDI_GDPUSDCR_O)       1.356566     0.128793    10.53290     0.0000
LOG(CEPII_DIST)          -1.144269     0.441293    -2.592991    0.0126
CEPII_COMCOL_REV          3.126534     0.674896     4.632616    0.0000
====================================================================
R-squared           0.698665   Mean dependent var      15.97292
Adjusted R-squared  0.679832   S.D. dependent var       2.613094
S.E. of regression  1.478578   Akaike info criterion    3.693842
Sum squared resid   104.9372   Schwarz criterion        3.843937
Log likelihood     -92.03988   Durbin-Watson stat       2.044512
====================================================================


6. Model diagnostics: Check possible violation of MLR.5 (Homoskedasticity) by plotting the residuals against the fitted values and possible violation of MLR.6 (Normal errors) by plotting a histogram of the residuals:

[Figure: scatter plot of the residuals RES_OLS_IMP_KAZ against the fitted values FIT_OLS_IMP_KAZ, and histogram of the residuals (52 observations) with descriptive statistics — Mean -2.95e-15, Median -0.103912, Maximum 2.909987, Minimum -3.625761, Std. Dev. 1.434431, Skewness -0.295179, Kurtosis 3.017182, Jarque-Bera 0.755772 (Probability 0.685309).]

The scatter plot does not indicate a violation of MLR.5. Why? Inspecting the box right of the histogram shows that the estimated kurtosis is close to 3, which is the theoretical value implied by the standard normal distribution. Thus, we may continue to use this model.
7. Usage of the model: Conduct tests:
A two-sided test
• Now we can formulate the pair of statistical hypotheses: H0: The elasticity of imports with respect to gdp is 1, versus H1: The elasticity is unequal to 1, that is,
H0 : β1 = 1 versus H1 : β1 ≠ 1.
• Compute the t statistic from the relevant line of the output
Variable               Coefficient   Std. Error   t-Statistic   Prob.
LOG(WDI_GDPUSDCR_O)     1.356566     0.128793     10.53290     0.0000
$$t(X, y) = \frac{\hat\beta_1 - \beta_1^0}{\hat\sigma_{\hat\beta_1}} = \frac{1.356566 - 1}{0.128793} = 2.76852$$


• Choose a significance level, e.g. α = 0.05. Compute critical values: The degrees of freedom are n − k − 1 = 52 − 3 − 1 = 48. One may obtain an approximate critical value from Table G.2 in Wooldridge (2009) or a precise critical value e.g. from
– EViews using scalar crit = @qtdist(1-alpha/2,n-k-1) in the command window, or
– Excel using c = TINV(alpha;n-k-1) = 2.0106. (Note that the Excel function already assumes a two-sided test.)
• Since

t(X, y) = 2.76852 > 2.0106 = c

one rejects the null hypothesis.
• p-values can be computed in EViews using
scalar pval = 2*(1-@ctdist(t,n-k-1)) = 0.0080.
Thus, one can reject H0 even at the 1% significance level. The p-value means that we would observe a t statistic of at least 2.76852 in absolute value in only about 8 out of 1000 samples drawn.
One-sided test
• Now we can formulate the pair of statistical hypotheses with respect to the sign of β2, that is, the impact of distance on imports. Since we want to provide evidence for β2 < 0, we put this into H1:
H0 : β2 ≥ 0 versus H1 : β2 < 0.
• Compute the t statistic from the relevant line of the output
Variable           Coefficient   Std. Error   t-Statistic   Prob.
LOG(CEPII_DIST)     -1.144269    0.441293     -2.592991    0.0126
$$t(X, y) = \frac{\hat\beta_2 - \beta_2^0}{\hat\sigma_{\hat\beta_2}} = \frac{-1.144269 - 0}{0.441293} = -2.592991$$

• Choosing again α = 0.05, we compute the critical value using the EViews function scalar crit = @qtdist(alpha,n-k-1) = −1.6772.
• Since

t(X, y) = −2.592991 < −1.6772 = c

one rejects the null hypothesis. Thus, log distance has a statistically significant negative impact on imports at the given significance level.
• The corresponding p-value using EViews is scalar pval = @ctdist(t,n-k-1) = 0.0063. Thus, distance has a negative impact even at the 1% significance level.
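Both EViews computations can be replicated with the t distribution in Python; a sketch using the numbers from the output above:

# Replicating the two gravity-equation tests (df = n - k - 1 = 48).
from scipy import stats

df = 48

# Two-sided test of H0: beta_1 = 1
t1 = (1.356566 - 1) / 0.128793            # 2.76852
c1 = stats.t.ppf(1 - 0.05 / 2, df)        # 2.0106
p1 = 2 * stats.t.sf(abs(t1), df)          # about 0.0080

# One-sided test of H0: beta_2 >= 0 vs H1: beta_2 < 0
t2 = (-1.144269 - 0) / 0.441293           # -2.592991
c2 = stats.t.ppf(0.05, df)                # -1.6772
p2 = stats.t.cdf(t2, df)                  # about 0.0063

print(t1, c1, p1)
print(t2, c2, p2)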


Interpretation of β3: In principle, this parameter can be interpreted like in a log-level model, see Section 2.6. However, since β̂3 is very different from 0, and because the regressor is a dummy variable, one should use the exact formula to compute the relative change, see Section 6.3: the precise value is $e^{\hat\beta_3} - 1 = 21.8$. Thus, imports from countries with colonial ties are about 22 times larger than imports from other countries, keeping everything else fixed! These very likely reflect the trading patterns of the former Soviet Union.
Note that we already considered other model specifications in Section 3.5. It might be interesting to check whether these test results are robust when other model specifications are used, such as Model 2 or Model 4.


4.5 Confidence Intervals
• How large is the probability that the estimated parameter value corresponds to the true value?
• A parameter estimator — to be more precise, a point estimator — does not allow any conclusions about how "close" the estimate is to the true value of the population.
• Following the position of Sir Karl Popper, who advocated critical rationalism in the philosophy of science, point estimates are not very useful since they cannot be falsified. An empirical hypothesis is only scientific if it is falsifiable.
• Example: Assume we predicted a price index on the basis of an econometric model and obtained a predicted value of 5.12. The realized value, however, turns out to be 5.24. → Then we made a wrong prediction since it did not realize exactly. This "error" can only have three reasons:
– The random error of the population regression model.
– The estimation error of the sample regression model.
– The regression model is not correct or (more realistically) it is a bad approximation. At least one of our assumptions is not justified.
Problem: From a subjective point of view one can have different opinions about these "explanations":
– One believes that the deviation is due to the random error.
– Another claims that the model is wrong.


Solution: One should specify objective criteria such that one can make a scientific decision. These criteria should be determined before any predicted value realizes. Then one cannot escape a potential falsification of a hypothesis afterwards. This makes a hypothesis scientific in the sense of Popper.
• Let's be more precise: How large is the probability that the estimated value β̂j corresponds exactly to the true value βj if, as was shown in Section 4.3,
$$\hat\beta_j \sim N\left(\beta_j,\; \sigma^2_{\hat\beta_j}\right)$$
and $(\hat\beta_j - \beta_j)/\sigma_{\hat\beta_j} \sim N(0, 1)$, or, if $\sigma_{\hat\beta_j}$ is estimated,
$$\frac{\hat\beta_j - \beta_j}{\hat\sigma_{\hat\beta_j}} \sim t_{n-k-1}\,?$$
• Alternative question: How large is the probability that the true value βj lies in the interval
$$[\hat\beta_j - c \cdot \hat\sigma_{\hat\beta_j},\; \hat\beta_j + c \cdot \hat\sigma_{\hat\beta_j}]$$
where c is given? Note that the endpoints of the interval are random prior to obtaining a sample: its location is random through β̂j and its length is random through σ̂_β̂j.
This interval is the most well known example of an interval estimator.


• Answer for given σ_β̂j: How large is the probability that the true value βj is contained in the interval [β̂j − c·σ_β̂j, β̂j + c·σ_β̂j], where the value c is chosen by you?
– It is 2Φ(c) − 1 since
$$P\left(\hat\beta_j - c\sigma_{\hat\beta_j} \le \beta_j \le \hat\beta_j + c\sigma_{\hat\beta_j}\right) = P\left(-c\sigma_{\hat\beta_j} \le \beta_j - \hat\beta_j \le c\sigma_{\hat\beta_j}\right)$$
$$= P\left(-c \le \frac{\beta_j - \hat\beta_j}{\sigma_{\hat\beta_j}} \le c\right) = P\left(-c \le \frac{\hat\beta_j - \beta_j}{\sigma_{\hat\beta_j}} \le c\right)$$
$$= \Phi(c) - \Phi(-c) = \Phi(c) - (1 - \Phi(c)) = 2\Phi(c) - 1.$$
– Example: For c = 1.96 one obtains Φ(1.96) − Φ(−1.96) = 0.975 − 0.025 = 0.95:


The true value βj will lie with 95% probability within the interval β̂j ± c·σ_β̂j. One also relates this probability to α by writing 0.95 = 1 − α. Thus one has α = 0.05.
• Answer for estimated σ_β̂j: The true value βj lies in the interval β̂j ± c·σ̂_β̂j with probability 1 − α. Note, however, that for computing the probability one has to use the t_{n−k−1} distribution since
$$P\left(\hat\beta_j - c\hat\sigma_{\hat\beta_j} \le \beta_j \le \hat\beta_j + c\hat\sigma_{\hat\beta_j}\right) = P\left(-c \le \frac{\hat\beta_j - \beta_j}{\hat\sigma_{\hat\beta_j}} \le c\right).$$

• The interval
$$[\hat\beta_j - c \cdot \hat\sigma_{\hat\beta_j},\; \hat\beta_j + c \cdot \hat\sigma_{\hat\beta_j}]$$
is called confidence interval. One says that the confidence interval contains the true value with a probability of confidence of (1 − α)100%. The value (1 − α) is also called confidence level or coverage probability of the confidence interval.
• In practice one determines the confidence level 1 − α and then computes the value c using the appropriate distribution: either N(0, 1) or t_{n−k−1}.
• Note:
– The constant c corresponds to the (upper) critical value of a two-sided test with significance level α.
– Since the confidence interval is a random interval, its location and length are in general different for each sample.
– The larger (1 − α), i.e. the smaller α, the larger is the confidence interval. In other words: the more you want to be on the safe side, the larger the confidence interval becomes. Why?


– A two-sided t test and a confidence interval contain the same amount of information. The null hypothesis of a two-sided t test is rejected if and only if the value of the null hypothesis lies outside the confidence interval. Draw a graph to make this clear.
– If we keep drawing new samples from a population, how many confidence intervals do not contain the true value on average?
• Trade Example Continued (from Section 4.4):
– Compute a 95% confidence interval for the elasticity β_gdp of imports with respect to gdp.
– From Section 4.4 it can be justified that MLR.1 to MLR.6 hold and imports are normally distributed.
– Since σ_β̂gdp has to be estimated, one has to use the t distribution with n − k − 1 = 48 degrees of freedom. For a confidence level of 0.95 one obtains α = 0.05 and thus c = 2.0106 (e.g. by using EViews scalar crit = @qtdist(1-0.05/2,52-3-1)).
– The relevant line of output was, see Section 4.4:
Variable              Coeff.      Std.Err.   t-Stat.    Prob.
LOG(WDI_GDPUSDCR_O)   1.356566    0.128793   10.53290   0.0000
– Therefore the 95% confidence interval is given by
$$[\hat\beta_j - c \cdot \hat\sigma_{\hat\beta_j},\; \hat\beta_j + c \cdot \hat\sigma_{\hat\beta_j}]$$
$$[1.3566 - 2.0106 \cdot 0.1288,\; 1.3566 + 2.0106 \cdot 0.1288] = [1.0976,\; 1.6156].$$
– The elasticity of imports with respect to gdp falls with 95% probability within the range between 1.0976 and 1.6156. Note that 1 is not included in the confidence interval. This reflects the test result in Section 4.4 of rejecting H0 : β_gdp = 1.
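The same interval in Python (a sketch; numbers taken from the output above):

# 95% confidence interval for the gdp elasticity (df = 48).
from scipy import stats

beta_hat, se, df, alpha = 1.356566, 0.128793, 48, 0.05
c = stats.t.ppf(1 - alpha / 2, df)               # 2.0106
lower, upper = beta_hat - c * se, beta_hat + c * se
print(lower, upper)                              # about [1.0976, 1.6156]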


4.6 Testing a Single Linear Combination of Parameters
• Example: Cobb-Douglas production function
log Y = β0 + β1 log K + β2 log L + u,
where Y denotes output, and K and L denote the production factors capital and labor, respectively. Note that β1 and β2 are elasticities here. If the restriction β1 + β2 = 1 holds true, the production function has constant returns to scale, i.e., a 1% increase of labor and capital leads to a 1% increase of output on average.


For an empirical test of constant returns to scale, we employ the following pair of hypotheses:
H0 : β1 + β2 = 1 versus H1 : β1 + β2 ≠ 1.
• How to construct the test statistic:
1. First, define auxiliary parameters θ and θ0, where θ = β1 + β2 and θ0 = 1, or, equivalently,
H0 : θ = θ0 versus H1 : θ ≠ θ0.
2. Second, solve θ for one of the parameters βi, here β1,
β1 = θ − β2,
insert it into the initial regression equation, and reformulate the latter to
log Y = β0 + (θ − β2) log K + β2 log L + u
$$\log Y = \beta_0 + \theta \log K + \beta_2 \underbrace{(\log L - \log K)}_{\text{new variable}} + u. \qquad (4.4)$$
Then estimate (4.4) and obtain the test statistic
$$t_\theta = \frac{\hat\theta - \theta_0}{\hat\sigma_{\hat\theta}},$$
which can be directly calculated from the estimation of (4.4).


Example: In a classical marketing model we regress (the natural logarithm of) sales (S) of a consumer good on (the natural logarithm of) this good's price (P) as well as on (the natural logarithms of) cross prices (PK1, PK2) of competing goods. The following regression output is calculated from the data:

Dependent Variable: log(S), Method: Least Squares, Obs: 6917
Variable      Coeff.      Std.Err.    t-Stat.      Prob.
C             4.407786    0.079559    55.40268     0.0000
LOG(P)       -3.955281    0.068095   -58.08499     0.0000
LOG(P_K1)     0.710274    0.073912     9.609683    0.0000
LOG(P_K2)     1.154163    0.079815    14.46046     0.0000
R-squared          0.332264   Mean dependent var   2.824244
Adj. R-squared     0.331974   S.D. dependent var   1.250189
S.E. regression    1.021815   Akaike info crit.    2.881617
Sum squared resid  7217.910   Schwarz criterion    2.885573
F-statistic        1146.631   Prob(F-statistic)    0.000000


We wish to test the following statement: the cross price elasticities are identical, keeping everything else fixed (ceteris paribus) (though the competing goods come from different market segments).
• The initial hypotheses are given by
H0 : βK1 = βK2 versus H1 : βK1 ≠ βK2.
We reformulate them by re-parametrization according to θ = βK1 − βK2, θ0 = 0:
H0 : θ = 0 versus H1 : θ ≠ 0.
• Thus, due to βK1 = θ + βK2, the initial regression model
ln(S) = β1 + β2 ln(P) + βK1 ln(PK1) + βK2 ln(PK2) + u
can be rewritten as
ln(S) = β1 + β2 ln(P) + θ ln(PK1) + βK2(ln(PK2) + ln(PK1)) + u.


• Given the estimates of the last regression

Dependent Variable: log(S), Method: Least Squares, Obs: 6917
Variable                Coeff.      Std.Err.    t-Stat.      Prob.
C                       4.407786    0.079559    55.40268     0.0000
LOG(P)                 -3.955281    0.068095   -58.08499     0.0000
LOG(P_K1)              -0.443889    0.112543    -3.944165    0.0001
LOG(P_K1)+LOG(P_K2)     1.154163    0.079815    14.46046     0.0000
R-squared          0.332264   Mean dependent var   2.824244
Adj. R-squared     0.331974   S.D. dependent var   1.250189
S.E. regression    1.021815   Akaike info crit.    2.881617
Sum squared resid  7217.910   Schwarz criterion    2.885573
F-statistic        1146.631   Prob(F-statistic)    0.000000

calculate the t statistic as
$$t = \frac{-0.443889 - 0}{0.112543} \approx -3.94.$$

For a given significance level of α = 0.05, the critical values are -1.96 and 1.96. Thus, we have to reject H0.
Reading: Sections 4.3-4.4 in Wooldridge (2009).
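The re-parametrization trick is easy to check on artificial data. The following Python sketch (the data generating process is entirely made up for illustration) estimates the transformed regression and reads the t statistic for θ = βK1 − βK2 directly off the coefficient of ln(PK1):

# Testing H0: beta_K1 = beta_K2 via theta = beta_K1 - beta_K2 (illustrative DGP).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
lp, lpk1, lpk2 = (rng.normal(size=n) for _ in range(3))
ls = 4.4 - 4.0 * lp + 0.7 * lpk1 + 1.15 * lpk2 + rng.normal(0, 1, n)

# Transformed regression: ln(S) on ln(P), ln(PK1), ln(PK1)+ln(PK2);
# the coefficient on ln(PK1) is theta = beta_K1 - beta_K2 (true value -0.45).
X = np.column_stack([np.ones(n), lp, lpk1, lpk1 + lpk2])
b, res, *_ = np.linalg.lstsq(X, ls, rcond=None)
df = n - X.shape[1]
sigma2 = res[0] / df
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X).diagonal())

t_theta = b[2] / se[2]
p = 2 * stats.t.sf(abs(t_theta), df)
print(b[2], t_theta, p)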


4.7 Jointly Testing Several Linear Combinations of Parameters: The F Test
Some examples of possible restrictions within the MLR framework:
1. H0 : β1 = 3
2. H0 : β2 = βk
3. H0 : β1 = 1, βk = 0
4. H0 : β1 = β3, β2 = β4
5. H0 : βj = 0, j = 1, . . . , k
6. H0 : βj + 2βl = 1, βk = 2
We can already check cases 1 and 2 by applying t tests. For all other cases we need the F test.


4.7.1 Testing of Several Exclusion Restrictions
Trade Example Continued (from Section 4.5): Consider Model 4 in Section 3.5:
====================================================================
Dependent Variable: LOG_IMP, Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Variable                Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                       -13.02677     4.726426    -2.756157    0.0083
LOG(WDI_GDPUSDCR_O)       1.317565    0.129079    10.20740     0.0000
LOG(CEPII_DIST)          -0.624909    0.542325    -1.152279    0.2550
CEPII_COMCOL_REV          3.351230    0.678912     4.936177    0.0000
CEPII_COMLANG_OFF         2.109579    1.319296     1.599018    0.1165
====================================================================
R-squared           0.714213   Mean dependent var      15.97292
Adjusted R-squared  0.689890   S.D. dependent var       2.613094
S.E. of regression  1.455167   Akaike info criterion    3.679330
Sum squared resid   99.52303   Schwarz criterion        3.866950
Log likelihood     -90.66258   F-statistic             29.36447
Durbin-Watson stat  2.195759   Prob(F-statistic)        0.000000
====================================================================


Are the control variables common colonizer since 1945 (CEPII_COMCOL_REV) and common official language (CEPII_COMLANG_OFF) really needed in the specification of Model 4? To put it more precisely, are the parameters of the two variables mentioned jointly significantly different from zero?
H0 : β_comcol_rev = 0 and β_comlang_off = 0 versus
H1 : β_comcol_rev ≠ 0 and/or β_comlang_off ≠ 0


How can one jointly test several hypotheses?
• Note that the SSR decreases (or stays constant) with each additional regressor.
⇒ Idea: Compare the SSR of a model on which the null hypotheses are imposed (restricted model) with the SSR of another model that does not impose the joint restrictions (unrestricted model).
• The estimation under H0 is easy: simply exclude all regressors from the regression whose parameters are set to zero under H0 and re-estimate the restricted model. In case of Model 4 for the trade example, the OLS estimates for the restricted model (which corresponds to Model 2 in Section 3.5) are:


====================================================================
Dependent Variable: LOG_IMP, Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Variable                Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                         4.800950    3.497341     1.372743    0.1761
LOG(WDI_GDPUSDCR_O)       1.088546    0.137001     7.945508    0.0000
LOG(CEPII_DIST)          -1.970804    0.480555    -4.101103    0.0002
====================================================================
R-squared           0.563937   Mean dependent var      15.97292
Adjusted R-squared  0.546138   S.D. dependent var       2.613094
S.E. of regression  1.760423   Akaike info criterion    4.024946
Sum squared resid   151.8554   Schwarz criterion        4.137518
Log likelihood    -101.6486   F-statistic             31.68448
Durbin-Watson stat  2.117895   Prob(F-statistic)        0.000000
====================================================================


Results:
– The R² of the unrestricted model is 0.7142 while the R² of the restricted model is (only) 0.5640.
– Correspondingly, the standard error of regression σ̂ increases from 1.4551 to 1.7604.
– Are these changes large? It looks like it, but what does "large" really mean here?
– Note that all three model selection criteria, AIC, HQ, and SC, "prefer" the unrestricted model, see Section 3.5. Will this finding be confirmed by the test?


• In order to be able to use a statistic (a function that can be computed from sample values) as a test statistic, one has to know its probability distribution under the null hypothesis H0. One can show (→ advanced econometrics course or Section 4.4 in Davidson & MacKinnon (2004)) that the following test statistic follows an F distribution:
$$F = \frac{(SSR_{H_0} - SSR_{H_1})/q}{SSR_{H_1}/(n - k - 1)} \sim F_{q,\, n-k-1}.$$

Therefore this test is called F test and the test statistic is abbreviated as F statistic.

• Note that the F distribution has two different degrees of freedom: q degrees of freedom for the random variable in the numerator, and n − k − 1 degrees of freedom for the random variable in the denominator. The value q is the number of restrictions that are jointly tested.
• Details of the F statistic:
– Its minimum is 0 since SSR_H0 ≥ SSR_H1 and SSR_H1 > 0. (Therefore the F statistic cannot be normally distributed!)
– There is no upper bound.
• When should the joint null hypothesis be rejected?
– The larger the absolute difference between the SSRs of the restricted and the unrestricted model, SSR_H0 − SSR_H1, the more likely is a violation of the exclusion restrictions.
– However, be aware that absolute differences do not say much. Why?


– It makes much more sense to consider the relative difference between the SSRs. This is exactly what the F statistic does: it scales the difference in SSRs by the SSR of the unrestricted model. If the relative difference is large, then the joint null hypothesis is likely to be violated.
– On the other hand, if the relative difference is small, then it is likely that the excluded variables do not have any relevant impact in the unrestricted model since they can be neglected without any noticeable effect.
• Decision rule: Reject H0 if the test statistic is larger than the critical value:
Reject H0 if F > c.
Thus, the critical region is (c, ∞).


Calculation of the critical region: For a given significance level α, the critical value c is implicitly defined by the probability P (F > c|H0) = α. The corresponding value of c for given α can be found in tables of the F distribution, e.g. Table G.3 in Appendix G in Wooldridge (2009), or be computed in EViews or Excel. (For the latter one has FINV(0.05;q;n-k-1) for alpha = 0.05.)


Trade Example Continued (from the beginning of this section):
• The joint null hypothesis contains two exclusion restrictions, thus the degrees of freedom for the numerator are two, q = 2. The degrees of freedom for the denominator correspond to the degrees of freedom of Model 4, n − k − 1 = 52 − 4 − 1 = 47. Choosing a significance level of α = 0.05, we check Table G.3 in Appendix G in Wooldridge (2009) for the appropriate critical value. Listed values are F2,40 = 3.23 and F2,60 = 3.15. While the former implies a true significance level smaller than 0.05, the latter implies one above 0.05. If one is interested in an exact critical value, one can obtain it from Excel as FINV(0.05;2;47) = 3.1951.
• Collecting the SSRs from the regression outputs for Model 4 and Model 2 at the beginning of the section, the F statistic can be


computed as
$$F = \frac{(151.855 - 99.523)/2}{99.523/47} = 12.3570.$$
Since F = 12.3570 > 3.1951 = c, reject H0 at a significance level of 5%.
• Check that the same decision holds for a significance level of 1%. The two variables common colonizer since 1945 (CEPII_COMCOL_REV) and common official language (CEPII_COMLANG_OFF) are jointly statistically significant at the 1% significance level, and thus at least one of the two variables has an impact on imports at the 5% as well as at the 1% significance level.
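The same F test in Python (a sketch; SSRs taken from the two regression outputs above):

# F test of the two exclusion restrictions from the SSRs.
from scipy import stats

ssr_r, ssr_ur, q, df = 151.855, 99.523, 2, 47

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)   # 12.357
c05 = stats.f.ppf(0.95, q, df)               # 3.1951
c01 = stats.f.ppf(0.99, q, df)               # about 5.1
p = stats.f.sf(F, q, df)                     # about 4.87e-05

print(F, c05, c01, p)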


Calculation of p-values for F statistics:
• In empirical work one is frequently interested in the largest significance level for which — given the observed test statistic — the null hypothesis just cannot be rejected. As explained in Section 4.1, this information is provided by the p-value. Alternatively, it is the smallest significance level at which the null can be rejected.
Given the significance level that was chosen prior to any calculations, the null hypothesis is rejected if the p-value is smaller than the given significance level α.


• Trade Example Continued: The p-value can be computed in Excel as =FVERT(K10;2;47) = 4.87099E-05 = 4.87099 · 10⁻⁵ (FVERT is the German-locale name of FDIST). The p-value can also be calculated in EViews, see below. Thus, there is extremely strong statistical evidence against the null hypothesis.
Direct Calculation of the F statistic in EViews:
• In the Equation window one clicks on View and then on Representations, where one can read how the parameter numbering relates to the regressor variables.
• One again clicks on View and then on Coefficient Tests and further on Wald-Coefficient Restrictions .... Then one enters all restrictions into the opened EViews window Wald Test. For the test of the joint significance of the control variables in the


trade example one types C(4)=0,C(5)=0 and confirms:

Wald Test: Equation: EQ_LN_LN_COL_R_LANG_OFF
====================================================================
Test Statistic   Value      df        Probability
====================================================================
F-statistic      12.35704   (2, 47)   0.0000
Chi-square       24.71407   2         0.0000
====================================================================
Null Hypothesis Summary:
====================================================================
Normalized Restriction (= 0)   Value      Std. Err.
====================================================================
C(4)                           3.351230   0.678912
C(5)                           2.109579   1.319296
====================================================================
Restrictions are linear in coefficients.

Note that the Chi-square test statistic is a variant of the F statistic which is useful in large samples. Here, it will not be further discussed.

Note that the Chi-square test statistic is kind of a variant of the F statistic which is useful in large samples. Here, it will not be further discussed. 247

Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig

Remarks:
• One can, of course, test the simple null hypothesis with a two-sided alternative, H0 : βj = 0 versus H1 : βj ≠ 0, by means of an F test. It holds that the square of a random variable X that follows a t distribution with n − k − 1 degrees of freedom corresponds to a random variable that follows an F distribution with (1, n − k − 1) degrees of freedom:
$$X \sim t_{n-k-1} \implies X^2 \sim F_{1,\, n-k-1}.$$
Therefore, a two-sided t test and an F test lead to exactly the same result for the pair of hypotheses above.


• It may happen that each regressor tested by itself is not statistically significant but if they are jointly tested they are statistically significant (at the same significance level). This is a sign of multicollinearity between the regressors considered. Then, the given sample size is only sufficient for providing statistical significance jointly for both regressors. However, it is not sufficient for providing statistical evidence for each regressor separately. In such cases you may check the covariance between the parameter estimates that are included in the test (in EViews View → Covariance Matrix). • It may also happen that one variable is statistically significant but if jointly tested with other variables it becomes insignificant. This can happen if the other variables that are included in the joint hypothesis are redundant in the population regression. In this case, the power of a single hypothesis test is weakened by the other irrelevant variables.


• Thus, there is no general rule on whether to prefer joint or single test results.
• Trade Example Continued (from the middle of this section): Comparing four different model specifications using model selection criteria, see Section 3.5, HQ and AIC favor Model 4. Inspecting its parameter estimates at the beginning of this section, one finds two parameters to be statistically insignificant even at the 10% level: β_distance and β_comlang_off. Why, then, was this Model 4 found to be best by HQ and AIC?
Answer: The parameter estimators for β_distance and β_comlang_off might be highly correlated so that only a joint impact is significant. One reason could be that a lot of the variation of distance can be explained by common official language, among other things. In this case, it can be expected that if both parameters are jointly tested by means of an F test, they are statistically significant. Test the pair of hypotheses:
H0 : β_distance = 0 and β_comlang_off = 0
H1 : β_distance ≠ 0 and/or β_comlang_off ≠ 0
The EViews output is:

Wald Test: Equation: EQ_LN_LN_COL_LANG
Test Statistic   Value      df        Probability
F-statistic      4.749269   (2, 47)   0.0132
Chi-square       9.498538   2         0.0087

Thus, reject H0 at the 5% significance level. The previous conjecture is statistically confirmed.


The effect of multicollinearity can nicely be seen in the confidence ellipse:

[Figure: confidence ellipse for C(3) (horizontal axis, range −2.5 to 1.0) and C(5) (vertical axis, range −2 to 6).]

Note that C(3) and C(5) correspond to β̂_distance and β̂_comlang_off, respectively. The ellipse is a generalization of confidence intervals to two dimensions. Thus, all points outside the ellipse are joint null hypotheses that are rejected. Note that the origin also lies outside, while zero is included in each one-dimensional confidence interval. (Get the graph in EViews via View → Coefficient Tests → Confidence Ellipse.) More on confidence ellipses, e.g. in Davidson & MacKinnon (2004).


• R² version of the F statistic: If a regression model contains a constant, then the decomposition SSR = SST(1 − R²) holds. Inserting each SSR into the F statistic delivers
$$F = \frac{(R^2_{H_1} - R^2_{H_0})/q}{(1 - R^2_{H_1})/(n - (k + 1))} \sim F_{q,\, n-k-1}.$$

Note: – SST is canceled if the dependent variable y is the same under H0 and H1 as, for example, in case of exclusion restrictions. However, this is not always true if general linear restrictions are tested. – There can be slight differences between both versions of the F statistic due to rounding errors.
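A quick numerical check that both versions coincide for the exclusion test above, using the reported R² values of Model 4 (unrestricted) and Model 2 (restricted); a sketch in Python:

# R-squared version of the F statistic for the exclusion test above.
r2_ur, r2_r, q, n, k = 0.714213, 0.563937, 2, 52, 4

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - (k + 1)))
print(F)   # about 12.36, matching the SSR version up to rounding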


Overall F Test
Standard software packages (such as EViews) include in their OLS output for the multiple regression model y = β0 + β1x1 + . . . + βk xk + u the F statistic and its p-value for the pair of hypotheses: "None of the (non-constant) regressors has impact on the dependent variable and thus the corresponding parameters are all zero."
H0 : β1 = · · · = βk = 0 (and y = β0 + u)
H1 : βj ≠ 0 for at least one j = 1, . . . , k.
If H0 is not rejected, this possibly indicates that
- all regressors are possibly badly/wrongly chosen,
- or at least a substantial number of regressors has no impact on y,
- or too many regressors were considered for the given sample size n.
This test is a first rough check of the validity of the model.


4.7.2 Testing of Several General Linear Restrictions
• Generalization of the F test for exclusion restrictions.
• Works equivalently by computing the relative change in the SSRs.
• The R² version cannot be used in this case!

Examples of possible pairs of hypotheses:
H0 : β2 = β3 = 1 versus H1 : β2 ≠ 1 and/or β3 ≠ 1,
H0 : β1 = 1, βj = 2βl versus H1 : β1 ≠ 1 and/or βj ≠ 2βl.
Trade Example Continued (from the previous subsection):
• One may conjecture that due to the multicollinearity between the estimates for distance and common official language the impact of distance might be underestimated in absolute value while the impact of language is zero. Thus, consider the pair of hypotheses:
H0 : β_distance = −1 and β_comlang_off = 0
H1 : β_distance ≠ −1 and/or β_comlang_off ≠ 0
In order to compute the SSR under H0, impose these restrictions on the regression as
log(imports) − (−1) log(distance) = β0 + β_gdp log(gdp) + β_comcol cepii_comcol_rev + u


The EViews output is:

Dependent Variable: LOG_IMP+LOG(CEPII_DIST)
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Variable                Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                       -10.49513     3.196796    -3.283015    0.0019
LOG(WDI_GDPUSDCR_O)       1.344714    0.122454    10.98137     0.0000
CEPII_COMCOL_REV          3.215739    0.611625     5.257696    0.0000
====================================================================
R-squared           0.720026   Mean dependent var      24.24218
Adjusted R-squared  0.708599   S.D. dependent var       2.713964
S.E. of regression  1.465041   Akaike info criterion    3.657604
Sum squared resid   105.1709   Schwarz criterion        3.770176
Log likelihood     -92.09771   F-statistic             63.00827
Durbin-Watson stat  2.053004   Prob(F-statistic)        0.000000
====================================================================


This allows us to compute the F statistic
$$F = \frac{(SSR_{H_0} - SSR_{H_1})/q}{SSR_{H_1}/(n - k - 1)} = \frac{(105.1709 - 99.5230)/2}{99.5230/47} = 1.333618 < c = 3.195.$$
Alternatively, one can conduct this general F test directly in EViews via the Wald test option C(3)=-1,C(5)=0:

Wald Test: Equation: EQ_LN_LN_COL_LANG
====================================================================
Test Statistic   Value      df        Probability
====================================================================
F-statistic      1.333603   (2, 47)   0.2733
Chi-square       2.667205   2         0.2635

→ The claim that "common official language has no impact while distance has an elasticity of −1" cannot be rejected at any reasonable significance level since the p-value is about 27%.


4.8 Reporting Regression Results In general, empirical researchers investigate a number of different specifications of regression functions. In order to make visible how robust the conclusions are with respect to model choice it is good practice to report the results of the most important specifications so that each reader can evaluate the findings in her own manner. This is most easily achieved by summarizing the relevant results in a table, see the example below.


For each specification a minimum set of results should be reported:
• OLS parameter estimates β̂j of the regression parameters βj, j = 0, 1, . . . , k (plus variable names),
• Standard errors of the β̂j, σ̂_β̂j,
• Number of observations n,
• R² and adjusted R²,
• Standard error of regression or estimated variance of the regression error σ̂².
If possible, one should also report
• Model selection criteria such as AIC, HQ or SC,
• Sum of squared residuals (SSR). Based on the SSRs one can easily compute F tests.


Trade Example Continued:
Dependent Variable: ln(imports to Kazakhstan)

Independent Variables/Model          (1)       (2)       (3)       (4)
constant                           -3.461     4.801    -9.579   -13.027
                                   (3.280)   (3.497)   (4.274)   (4.726)
ln(gdp)                             0.770     1.089     1.357     1.318
                                   (0.130)   (0.137)   (0.129)   (0.129)
ln(distance)                                 -1.971    -1.144    -0.625
                                             (0.481)   (0.441)   (0.542)
common "colonizer" since 1945                           3.127     3.351
                                                       (0.675)   (0.679)
common official language                                          2.110
                                                                 (1.319)
Number of observations                 52        52        52        52
R²                                  0.414     0.564     0.699     0.714
Standard error of regression        2.020     1.760     1.479     1.455
Sum of squared residuals          203.980   151.855   104.937    99.523
AIC                                4.2816    4.0249    3.6938    3.6793
HQ                                 4.3103    4.0681    3.7514    3.7513
SC                                 4.3566    4.1375    3.8439    3.8670

Reading: Sections 4.5-4.6 in Wooldridge (2009).

5 Multiple Regression Analysis: Asymptotics

The assumption of a normal (or Gaussian) distribution MLR.6 is frequently violated in empirical practice. How can we then proceed to calculate test statistics or confidence intervals?
5.1 Large Sample Distribution of the Mean Estimator
• Example: Testing the mean of hourly wages: the empirical distribution is steep at the left and skewed to the right (as is typical for prices and wages, which are not generated additively).

[Figure: histogram of WAGE (526 observations) — Mean 5.896103, Median 4.650000, Maximum 24.98000, Minimum 0.530000, Std. Dev. 3.693086, Skewness 2.007325, Kurtosis 7.970083, Jarque-Bera 894.6195 (Probability 0.000000).]

• Examples of random variables with right-skewed distribution:

– A χ²(m) distributed random variable X is defined as the sum of m squared i.i.d. standard normal random variables:
$$X = \sum_{j=1}^m u_j^2, \qquad u_j \sim \text{i.i.d. } N(0, 1).$$


(Details on the χ² distribution can be found in Appendix B in Wooldridge (2009).)

[Figure: density of the χ²(1) distribution.]

Moments of a χ²(1) distributed random variable:
$$E[X] = E\left[u^2\right] = Var(u) + E[u]^2 = 1,$$
$$Var(X) = E\left[X^2\right] - E[X]^2 = E[u^4] - 1^2 = 2,$$
$$\frac{X - 1}{\sqrt{2}} = \frac{u^2 - 1}{\sqrt{2}} \sim (0, 1).$$


Note that for a standard normal random variable we have E[u⁴] = 3.
– Linear functions of a χ²(1) distributed random variable, e.g.
$$y_i = \nu + \sigma_y \frac{u_i^2 - 1}{\sqrt{2}}, \qquad u_i \sim \text{i.i.d. } N(0, 1). \qquad (5.1)$$
Moments:
$$E[y_i] = \nu, \qquad Var(y_i) = Var\left(\sigma_y \frac{u_i^2 - 1}{\sqrt{2}}\right) = \sigma_y^2\, Var\left(\frac{u_i^2}{\sqrt{2}}\right) = \sigma_y^2.$$


• Expectation and variance of the mean estimator
$$\hat\mu_n = \frac{1}{n} \sum_{i=1}^n y_i:$$
$$E[\hat\mu_n] = \frac{1}{n} \sum_{i=1}^n E[y_i] = \nu,$$
$$Var(\hat\mu_n) = \frac{1}{n^2} \sum_{i=1}^n Var(y_i) = \frac{Var(y_i)}{n} = \frac{\sigma_y^2}{n}, \qquad sd(\hat\mu_n) = \frac{\sigma_y}{\sqrt{n}}.$$
In this example the estimator is unbiased and the variance decreases at rate n as the sample size increases.


• Consistency of an estimator θ̂n: For every ǫ > 0 and δ > 0 there exists an N such that
$$P\left(|\hat\theta_n - \theta| < \epsilon\right) > 1 - \delta \quad \text{for all } n > N.$$
Alternatively:
– $\lim_{n\to\infty} P\left(|\hat\theta_n - \theta| < \epsilon\right) = 1$,
– $\text{plim}\, \hat\theta_n = \theta$,
– $\hat\theta_n \stackrel{p}{\longrightarrow} \theta$.
The "plim" notation stands for probability limit. This concept of convergence is usually denoted as convergence in probability or (weak) consistency. Some notes on calculation rules for the "plim" are given in Appendix C.3 in Wooldridge (2009).


A consistent estimator θ̂n has the properties
– $\lim_{n\to\infty} E[\hat\theta_n] = \theta$ and
– $\lim_{n\to\infty} Var(\hat\theta_n) = 0$.
If one of these conditions fails to hold, the estimator is called inconsistent.
In general:
• Weak law of large numbers (WLLN): For yi ∼ i.i.d. with −∞ < E[yi] = µ < ∞, the mean estimator $\hat\mu_n = \frac{1}{n}\sum_{i=1}^n y_i$ is weakly consistent, that is,
$$\hat\mu_n \stackrel{p}{\longrightarrow} \mu.$$
• Then we can consistently estimate the variance of i.i.d. random variables $w_i \sim \text{i.i.d.}(\mu_w, \sigma_w^2)$ with $\tilde\sigma^2 = \frac{1}{n}\sum_{i=1}^n (w_i - \mu_w)^2$. Why?


• But how can we derive the asymptotic probability distribution of the mean estimator µ̂n?
• Monte Carlo Simulation (MC): The EViews program mcarlo1 est mu.prg allows us to iteratively draw R = 1000 samples of size n with elements {y1, . . . , yn}, where yi ∼ i.i.d.(ν, σy²) with ν = 3 and σy² = 1 and yi is generated from (5.1). One frequently calls (5.1) the data generating process (DGP). For every sample {y1^r, y2^r, . . . , yn^r} generated in this way, where r = 1, . . . , 1000, the mean estimator $\hat\mu^r = \frac{1}{n}\sum_{i=1}^n y_i^r$ is calculated and stored. After all iterations, a histogram is calculated based on the R estimates µ̂¹, µ̂², . . . , µ̂ᴿ.
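The EViews program itself is not reproduced here, but the experiment is easy to mimic; a minimal Python sketch with the same DGP (5.1) and ν = 3, σy = 1:

# Monte Carlo for the mean estimator under the chi-square based DGP (5.1):
# y_i = nu + sigma_y * (u_i^2 - 1) / sqrt(2), u_i ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(1)
nu, sigma_y, R = 3.0, 1.0, 1000

for n in (10, 30, 50, 100, 500, 1000):
    u = rng.normal(size=(R, n))
    y = nu + sigma_y * (u**2 - 1) / np.sqrt(2)
    mu_hat = y.mean(axis=1)                    # R mean estimates
    print(n, mu_hat.mean().round(4),           # close to nu = 3
          mu_hat.std(ddof=1).round(4),         # close to sigma_y / sqrt(n)
          (sigma_y / np.sqrt(n)).round(4))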


First, the results for the simulated moments:

Elements in   average of        std. deviation of   true std. deviation
sample n      estimated means   estimated means     of MC DGP
10            2.993908          0.298423            0.3162278
30            3.003103          0.182773            0.1825742
50            3.001263          0.142403            0.1414214
100           3.001637          0.098750            0.1
500           2.999619          0.046355            0.04472136
1000          3.000503          0.031865            0.03162278

– The true moments are accurately estimated,
– and we can observe how the LLN works.

[Figure: histograms of the R = 1000 mean estimates µ̂ for n = 10, 30, 50, 100, 500, and 1000. The skewness of the simulated distribution falls from 0.62 (n = 10) to 0.17 (n = 1000) and the kurtosis approaches 3; the Jarque-Bera test rejects normality for small n (Probability 0.000000 for n = 10) but no longer at the 5% level for n = 1000 (Probability 0.083256).]

• Results for the simulated distributions:
– Right-skewness decreases with increasing sample size n.
– A test for normality (Jarque-Bera test): the null hypothesis of a normal distribution cannot be rejected for large n.
Theoretical explanation of this phenomenon: a central limit theorem holds under certain (rather weak) conditions; it is one of the most important tools in statistics!
• Central limit theorem (CLT): For $y_i \sim \text{i.i.d.}(\mu, \sigma^2)$ with $0 < \sigma^2 < \infty$, $\hat\mu_n = \frac{1}{n}\sum_{i=1}^n y_i$ is asymptotically normally distributed:
$$\sqrt{n}\,(\hat\mu_n - \mu) \stackrel{d}{\longrightarrow} N(0, \sigma^2).$$


– Interpretation: the larger the number of sample elements n, the more precise is the approximation of the exact distribution of µ̂n (see the MC results) by an exactly specified normal distribution. Hence the label large sample distribution.
– But how good is the asymptotic approximation for a given sample size n?
∗ The CLT is not informative on this question, though we may get an answer by conducting MC simulations for certain cases or by using rather involved finite sample statistics.
∗ Experience: as the distribution of the yi approaches the normal distribution, smaller and smaller n suffice for a very good approximation. In some cases even n = 30 is enough. Do some experiments using mcarlo1 est mu.prg, and gradually increase the degrees of freedom r of the χ² distribution (and observe that the skewness decreases in this process)!
– Alternative notations (Φ(z) is the cumulative distribution function of the standard normal distribution):
$$\sqrt{n}\,\left(\frac{\hat\mu_n - \mu}{\sigma}\right) \stackrel{d}{\longrightarrow} N(0, 1) \qquad (5.2)$$
$$P\left(\sqrt{n}\,\frac{\hat\mu_n - \mu}{\sigma} \le z\right) \longrightarrow \Phi(z) \qquad (5.3)$$
$$\frac{\hat\mu_n - \mu}{\sigma/\sqrt{n}} \overset{approx}{\sim} N(0, 1) \qquad (5.4)$$
$$\hat\mu_n \overset{approx}{\sim} N\left(\mu,\; \frac{\sigma^2}{n}\right) \qquad (5.5)$$
Notation: the mean estimator is asymptotically normally distributed.


• In large samples the standardized mean estimator is approximated by a standard normal distribution. Then, due to (5.4),
$$w_i \sim \text{i.i.d. } N(\mu, \sigma^2): \quad t(w_1, \dots, w_n) = \frac{\hat\mu - \mu}{\sigma_{\hat\mu}} \sim N(0, 1)$$
$$w_i \sim \text{i.i.d.}(\mu, \sigma^2): \quad t(w_1, \dots, w_n) = \frac{\hat\mu - \mu}{\sigma_{\hat\mu}} \overset{approx}{\sim} N(0, 1)$$
and it can be shown that
$$w_i \sim \text{i.i.d. } N(\mu, \sigma^2): \quad t(w_1, \dots, w_n) = \frac{\hat\mu - \mu}{\hat\sigma_{\hat\mu}} \sim t_{n-k-1}$$
$$w_i \sim \text{i.i.d.}(\mu, \sigma^2): \quad t(w_1, \dots, w_n) = \frac{\hat\mu - \mu}{\hat\sigma_{\hat\mu}} \overset{approx}{\sim} N(0, 1)$$
and we get the following (very convenient) result: the (small sample) theory of t tests and confidence intervals for the mean estimator of i.i.d. variables holds approximately in large (enough) samples.


• Hence the test results in our empirical exercise are still approximately valid! • How about this concept of validity in a regression context?


5.2 Large Sample Inference for the OLS Estimator
• The OLS estimator
$$\hat{\boldsymbol\beta} = \boldsymbol\beta + \left(X'X\right)^{-1} X'u = \boldsymbol\beta + Wu$$
depends on X or W. Hence, for the OLS estimator to be consistent and asymptotically normal, certain conditions must hold for the regressor variables as n → ∞. One of these conditions is that for all j, l = 0, 1, . . . , k we have $\text{plim}\, \frac{1}{n}\sum_{i=1}^n x_{ij} x_{il} = E[x_j x_l] = a_{jl}$, or
$$\frac{1}{n}\, X'X \stackrel{p}{\longrightarrow} A. \qquad (5.6)$$


• Asymptotic normality of the OLS estimator: all necessary conditions for asymptotic normality are fulfilled if the standard assumptions MLR.1-MLR.5 hold true. Then (see a sketch of the proof in Appendix E.4 in Wooldridge (2009)):
$$\sqrt{n}\left(\hat\beta - \beta\right) \xrightarrow{d} N\!\left(0, \sigma^2 A^{-1}\right). \quad (5.7)$$


For the (asymptotic) distributions of the t statistics we get:
$$t(X,y) = \frac{\hat\beta_j - \beta_j}{\sigma_{\hat\beta_j}} \sim N(0,1) \quad \text{under MLR.1-MLR.6},$$
$$t(X,y) = \frac{\hat\beta_j - \beta_j}{\sigma_{\hat\beta_j}} \overset{approx}{\sim} N(0,1) \quad \text{under MLR.1-MLR.5},$$
and it can be shown that
$$t(X,y) = \frac{\hat\beta_j - \beta_j}{\hat\sigma/\sqrt{SST_j(1-R_j^2)}} \sim t_{n-k-1} \quad \text{under MLR.1-MLR.6},$$
$$t(X,y) = \frac{\hat\beta_j - \beta_j}{\hat\sigma/\sqrt{SST_j(1-R_j^2)}} \overset{approx}{\sim} N(0,1) \quad \text{under MLR.1-MLR.5}.$$


A frequent observation from many Monte Carlo simulations and from empirical practice is that
– for small n one proceeds as in the case of normally distributed errors and uses the critical values of the t distribution:
$$t(X,y) = \frac{\hat\beta_j - \beta_j}{\hat\sigma/\sqrt{SST_j(1-R_j^2)}} \overset{approx}{\sim} t_{n-k-1} \quad \text{under MLR.1-MLR.5},$$
– and analogously for the F statistic the critical values are determined from the F distribution.
– Note again: the critical values are valid only approximately, not exactly. Analogously, the p-values (calculated in EViews) are valid only approximately!


• Conclusion:
– For the calculation of test statistics and confidence intervals (exception: prediction intervals) we proceed as hitherto. However, all statistical results hold only as an approximation.
– If the assumption of homoskedasticity is violated, even these asymptotic results do not hold, and models for heteroskedastic errors are required (with stronger assumptions for the LLN and CLT), see Chapter 8.

Reading: Chapter 5 and Appendix C.3 in Wooldridge (2009).


6 Multiple Regression Analysis: Interpretation

6.1 Level and Log Models

Recall Section 2.6 on level-level, level-log, log-level, and log-log models. All the results remain valid in the multiple regression model in a ceteris-paribus analysis.


6.2 Data Scaling

• Scaling the dependent variable:
– Initial model: $y = X\beta + u$.
– Variable transformation: $y_i^* = a \cdot y_i$ with scale factor a.
→ New, transformed regression equation:
$$\underbrace{ay}_{y^*} = X\underbrace{a\beta}_{\beta^*} + \underbrace{au}_{u^*}, \qquad y^* = X\beta^* + u^*. \quad (6.1)$$


– OLS estimator for $\beta^*$ in (6.1):
$$\hat\beta^* = (X'X)^{-1}X'y^* = a\,(X'X)^{-1}X'y = a\hat\beta.$$
– Error variance:
$$Var(u^*) = Var(au) = a^2\,Var(u) = a^2\sigma^2 I.$$
– Variance-covariance matrix:
$$Var(\hat\beta^*) = \sigma^{*2}(X'X)^{-1} = a^2\sigma^2(X'X)^{-1} = a^2\,Var(\hat\beta).$$
– t statistic:
$$t^* = \frac{\hat\beta_j^* - 0}{\sigma_{\hat\beta_j^*}} = \frac{a\hat\beta_j}{a\sigma_{\hat\beta_j}} = t.$$
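A quick way to see this in EViews is to rescale the dependent variable and re-estimate. The following sketch uses the trade workfile series names from this course, with a = 10^-6 (imports measured in millions of dollars); the equation names are illustrative.

' rescale the dependent variable and compare the two estimations
genr imp_mio = trade_0_d_o*1e-6
equation eq_orig.ls trade_0_d_o c wdi_gdpusdcr_o
equation eq_scaled.ls imp_mio c wdi_gdpusdcr_o
' all coefficients and standard errors of eq_scaled equal 1e-6 times those of
' eq_orig, while the t statistics (and R-squared) are identical
show eq_orig
show eq_scaled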


• Scaling explanatory variables:

– Variable transformation: $X^* = X \cdot a$. New regression equation:
$$y = Xa \cdot a^{-1}\beta + u = X^*\beta^* + u. \quad (6.2)$$

– OLS estimator for $\beta^*$ in (6.2):
$$\hat\beta^* = (X^{*\prime}X^*)^{-1}X^{*\prime}y = (a^2 X'X)^{-1}X'ay = a^{-1}(X'X)^{-1}X'y = a^{-1}\hat\beta.$$

– Result: The magnitude of $\hat\beta_j$ alone is no indicator of the relevance of the impact of the j-th regressor. One always has to take the scale of the variable into account.
– Example: In Section 2.3 a simple level-level model was estimated for imports on gdp. The parameter estimate $\hat\beta_{gdp} = 2.16 \cdot 10^{-5}$ appears very small. However, taking into account that gdp is measured in dollars, this estimate is not small. Simply rescale gdp to millions of dollars with $a = 10^{-6}$ and you obtain $\hat\beta^*_{gdp} = 10^6 \cdot 2.16 \cdot 10^{-5} = 21.6$.
• Scaling of variables in logarithmic form just alters the constant $\beta_0$ since $\ln y^* = \ln ay = \ln a + \ln y$.
• Standardized Coefficients: We just saw that it is not possible to deduce the relevance of explanatory variables from the magnitude of the corresponding coefficients. This is possible, however, if the regression is suitably standardized.


Derivation: First, consider the following sample regression model
$$y_i = \hat\beta_0 + x_{i1}\hat\beta_1 + \dots + x_{ik}\hat\beta_k + \hat u_i, \quad (6.3)$$
and its representation after taking means over all n observations,
$$\bar y = \hat\beta_0 + \bar x_1\hat\beta_1 + \dots + \bar x_k\hat\beta_k. \quad (6.4)$$
Then we calculate the difference between (6.3) and (6.4):
$$(y_i - \bar y) = (x_{i1} - \bar x_1)\hat\beta_1 + \dots + (x_{ik} - \bar x_k)\hat\beta_k + \hat u_i. \quad (6.5)$$
Finally, we divide equation (6.5) by the estimated standard deviation of y, say $\hat\sigma_y$, and expand every term on the right-hand side by the estimated standard deviation of the corresponding explanatory variable, say $\hat\sigma_{x_j}$, $j = 1, \dots, k$:
$$\frac{y_i - \bar y}{\hat\sigma_y} = \frac{x_{i1} - \bar x_1}{\hat\sigma_{x_1}} \cdot \frac{\hat\sigma_{x_1}}{\hat\sigma_y}\hat\beta_1 + \dots + \frac{x_{ik} - \bar x_k}{\hat\sigma_{x_k}} \cdot \frac{\hat\sigma_{x_k}}{\hat\sigma_y}\hat\beta_k + \frac{\hat u_i}{\hat\sigma_y}.$$


Simple algebra gives
$$\underbrace{\frac{y_i - \bar y}{\hat\sigma_y}}_{z_{i,y}} = \underbrace{\frac{x_{i1} - \bar x_1}{\hat\sigma_{x_1}}}_{z_{i,x_1}}\,\underbrace{\frac{\hat\sigma_{x_1}}{\hat\sigma_y}\hat\beta_1}_{\hat b_1} + \dots + \underbrace{\frac{x_{ik} - \bar x_k}{\hat\sigma_{x_k}}}_{z_{i,x_k}}\,\underbrace{\frac{\hat\sigma_{x_k}}{\hat\sigma_y}\hat\beta_k}_{\hat b_k} + \underbrace{\frac{\hat u_i}{\hat\sigma_y}}_{\xi_i}.$$
In the literature the transformed variables $z_{i,y}$ and $z_{i,x_1}, \dots, z_{i,x_k}$ are usually denoted as z-scores. In compact notation we get
$$z_{i,y} = z_{i,x_1}\hat b_1 + \dots + z_{i,x_k}\hat b_k + \xi_i,$$
where the $\hat b_j$ are denoted as standardized coefficients (or simply beta coefficients). The magnitudes of the standardized coefficients can be compared to each other. Hence, the explanatory variable with the largest standardized coefficient $\hat b_j$ (in absolute value) has the relatively largest impact on the dependent variable.


Interpretation: a one standard deviation increase in xj changes y by ˆbj standard deviations. Standardized coefficients can be calculated in SPSS (see Example 6.1 in Wooldridge (2009)).
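In EViews the beta coefficients can also be obtained directly by regressing z-scores on z-scores. A minimal sketch with illustrative names (y, x1, x2) follows; @mean and @stdev compute the sample mean and standard deviation.

' z-scores of the dependent variable and the regressors
genr z_y = (y - @mean(y))/@stdev(y)
genr z_x1 = (x1 - @mean(x1))/@stdev(x1)
genr z_x2 = (x2 - @mean(x2))/@stdev(x2)
' regression through the origin (z-scores have mean zero); the estimated
' coefficients are the standardized (beta) coefficients b_1 and b_2
equation eq_std.ls z_y z_x1 z_x2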


6.3 Dealing with Nonlinear or Transformed Regressors

• Further details on logarithmic variables: Consider the following log-level regression model
$$\ln y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u, \quad (6.6)$$
where $x_2$ is a dummy variable (it is either equal to 0 or 1).
– How can we determine the exact impact of $x_2$, that is, how should we interpret $\beta_2$? From (6.6) follows
$$y = e^{\ln y} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + u} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \cdot e^u$$
and for the conditional expectation
$$E[y|x_1, x_2] = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \cdot E[e^u|x_1, x_2]. \quad (6.7)$$


Inserting the two possible values of $x_2$ into (6.7) delivers
$$E[y|x_1, x_2 = 0] = e^{\beta_0 + \beta_1 x_1} \cdot E[e^u|x_1, x_2],$$
$$E[y|x_1, x_2 = 1] = e^{\beta_0 + \beta_1 x_1} \cdot E[e^u|x_1, x_2] \cdot e^{\beta_2} = E[y|x_1, x_2 = 0] \cdot e^{\beta_2}.$$
– Thus, if $E[e^u|x_1, x_2]$ is constant (with respect to $x_2$), the relative mean change of the dependent variable with respect to a unit change in $x_2$ is equal to
$$\frac{\Delta E[y|x_1, x_2]}{E[y|x_1, x_2 = 0]} = \frac{E[y|x_1, x_2 = 1] - E[y|x_1, x_2 = 0]}{E[y|x_1, x_2 = 0]} = \frac{E[y|x_1, x_2 = 0]\,e^{\beta_2} - E[y|x_1, x_2 = 0]}{E[y|x_1, x_2 = 0]} = e^{\beta_2} - 1.$$


This implies
$$\%\Delta E[y|x_1, x_2] = 100\left(e^{\beta_2} - 1\right).$$
– In the general case of k regressors:
$$\%\Delta E[y|x_1, x_2, \dots, x_k] = 100\left(e^{\beta_j \Delta x_j} - 1\right). \quad (6.8)$$

Obviously (6.8) represents the exact partial effect, whereas the interpretation as an approximate semi-elasticity may be rather crude in some cases. – Trade Example Continued (from Section 4.8 and specifically from Section 4.4): For Model 3 we obtained the sample regression LOG(TRADE_0_D_O) = -9.5789 + 1.3566*LOG(WDI_GDPUSDCR_O) - 1.1443*LOG(CEPII_DIST) + 3.1265*CEPII_COMCOL_REV + RESIDUAL

Recall that CEPII_COMCOL_REV denotes a dummy variable.


∗ The approximate interpretation of $\hat\beta_{comcol}$ is that a one-unit change changes imports by $100\hat\beta_{comcol} = 313\%$.
∗ The exact partial effect is $100\left(e^{\hat\beta_{comcol}} - 1\right) = 2179.5\%$, almost 7 times as large!
∗ Of course, the difference between the approximate and the exact effect is smaller if $\hat\beta$ is closer to zero.
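Both numbers are quickly reproduced in EViews. The sketch below assumes Model 3 was stored as an equation named model3 with the dummy as its fourth coefficient (names illustrative).

' approximate semi-elasticity versus exact partial effect of the dummy
scalar approx_eff = 100*model3.@coefs(4)                ' about 313
scalar exact_eff  = 100*(@exp(model3.@coefs(4)) - 1)    ' about 2179.5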


• Models with quadratic regressors:
– For example, consider the multiple regression
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_2^2 + u.$$
The marginal effect of a change in $x_2$ on the conditional expectation of y is equal to
$$\frac{\partial E[y|x_1, x_2]}{\partial x_2} = \beta_2 + 2\beta_3 x_2.$$
Therefore a change of $\Delta x_2$ in $x_2$ changes ceteris paribus the dependent variable y on average by $(\beta_2 + 2\beta_3 x_2)\Delta x_2$. Clearly, this effect depends on the level of $x_2$ (and an interpretation of $\beta_2$ alone does not make any sense!).


– In some empirical applications regressors enter both in quadratic and in logarithmic form in order to approximate a nonlinear regression function. Example: we can approximate non-constant elasticities using the model
$$\ln y = \beta_0 + \beta_1 x_1 + \beta_2 \ln x_2 + \beta_3 (\ln x_2)^2 + u.$$
Then the elasticity of y with respect to $x_2$ equals $\beta_2 + 2\beta_3 \ln x_2$ and is constant if and only if $\beta_3 = 0$.


– Trade Example Continued: So far we only considered multiple regression models that are log-log or log-level in the original variables. Now consider a further specification for modeling imports where a log regressor also enters squared. Model 5:
$$\ln(imports) = \beta_0 + \beta_1 \ln(gdp) + \beta_2 (\ln(gdp))^2 + \beta_3 \ln(distance) + \beta_4\,com\_colonizer + \beta_5\,com\_language + u.$$
Using the previous result, the elasticity of imports with respect to gdp is
$$\beta_1 + 2\beta_2 \ln(gdp). \quad (6.9)$$


Estimation of Model 5 delivers:
====================================================================
Dependent Variable: LOG(TRADE_0_D_O)
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Variable                   Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                          -85.82647     27.89947     -3.076276     0.0035
LOG(WDI_GDPUSDCR_O)          7.087930     2.186465      3.241731     0.0022
(LOG(WDI_GDPUSDCR_O))^2     -0.111982     0.042366     -2.643219     0.0112
LOG(CEPII_DIST)             -0.761431     0.513375     -1.483187     0.1448
CEPII_COMCOL_REV             3.911289     0.673603      5.806523     0.0000
CEPII_COMLANG_OFF            2.312292     1.244898      1.857414     0.0697
====================================================================
R-squared            0.751895   Mean dependent var     15.97292
Adjusted R-squared   0.724927   S.D. dependent var      2.613094
S.E. of regression   1.370499   Akaike info criterion   3.576394
Sum squared resid    86.40031   Schwarz criterion       3.801537
Log likelihood      -86.98624   Hannan-Quinn criter.    3.662709
F-statistic          27.88113   Durbin-Watson stat      2.094393
Prob(F-statistic)    0.000000
====================================================================


Comparing the AIC, HQ, and SC of Model 5 with those of Models 1 to 4 (see Section 4.4), one finds that Model 5 exhibits the lowest values throughout. In addition, the (approximate) p-value of $\hat\beta_2$ is 0.011, so the quadratic term is statistically significant at the 5% significance level. This also provides evidence for a nonlinear elasticity. Inserting the parameter estimates into (6.9) delivers
$$\hat\eta(gdp) = 7.088 - 0.224\,\ln(gdp).$$

One may plot the elasticity η(gdp) versus gdp for each observed value of gdp. In EViews this can be done by a little program:

' Model 5
equation model5.ls log(trade_0_d_o) c log(wdi_gdpusdcr_o) (log(wdi_gdpusdcr_o))^2 log(cepii_dist) cepii_comcol_rev cepii_comlang_off
' generate elasticities for gdp
genr elast_gdp = model5.@coefs(2) + 2*model5.@coefs(3)*log(wdi_gdpusdcr_o)
' create group
group group_elast_gdp wdi_gdpusdcr_o elast_gdp
' make scatter plot
group_elast_gdp.scat


[Figure: scatter plot of ELAST_GDP (vertical axis, from 0.0 to 2.4) against WDI_GDPUSDCR_O (horizontal axis, from 0 to about 1.2E+13).]

The import elasticity with respect to gdp is much larger for small economies in terms of gdp than for large economies.
Warning: Nonlinearities are sometimes due to missing variables. Can you think of any control variables left out that should be included in Model 5?


• Interactions: Example:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_2 x_1 + u.$$
The marginal effect of a change in $x_2$ is given by
$$\Delta E[y|x_1, x_2] = (\beta_2 + \beta_3 x_1)\Delta x_2.$$
Hence, in this case the marginal effect also depends on the level of $x_1$!
• Selection between non-nested models of the same dependent variable: Definition: Non-nested means that one model cannot be represented as a special case of the other model. Example: the choice between the models $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + u$ and $y = \beta_0 + \beta_1 \ln x_1 + u$ can be based on SC or AIC (or $\bar R^2$). Why?


6.4 Regressors with Qualitative Data

Dummy variables or binary variables: A binary variable takes exactly two different values and thus describes two qualitatively different states. Examples: female vs. male, employed vs. unemployed, etc.
• In general these values are coded as 0 and 1. This allows for a very easy and straightforward interpretation. Example:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{k-1} x_{k-1} + \delta D + u,$$
where D equals 0 or 1.


• Interpretation (well known by now):
$$E[y|x_1, \dots, x_{k-1}, D = 1] - E[y|x_1, \dots, x_{k-1}, D = 0] = \beta_0 + \beta_1 x_1 + \dots + \beta_{k-1} x_{k-1} + \delta - (\beta_0 + \beta_1 x_1 + \dots + \beta_{k-1} x_{k-1}) = \delta$$
The coefficient of a dummy variable is equal to an intercept shift of size δ in the case D = 1. All slope parameters $\beta_i$, $i = 1, \dots, k-1$, remain unchanged.
• Wage Example Continued:
– Question of interest: Do females earn significantly less than males?
– Data: a sample of n = 526 U.S. workers obtained in 1976. (Source: Examples 2.4 and 7.1 in Wooldridge (2009)).


∗ wage: wage in dollars per hour,
∗ educ: years of schooling of each worker,
∗ exper: years of professional experience,
∗ tenure: years of employment in current firm,
∗ female: dummy = 1 if female, dummy = 0 otherwise.

====================================================================
Dependent Variable: ln(WAGE), Method: Least Squares, Sample: 1 526
====================================================================
Variable      Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C              0.416691     0.098928      4.212066     0.0000
FEMALE        -0.296511     0.035805     -8.281169     0.0000
EDUC           0.080197     0.006757     11.86823      0.0000
EXPER          0.029432     0.004975      5.915866     0.0000
EXPER^2       -0.000583     0.000107     -5.430528     0.0000
TENURE         0.031714     0.006845      4.633035     0.0000
TENURE^2      -0.000585     0.000235     -2.493365     0.0130
====================================================================
R-squared            0.440769   Adjusted R-squared   0.434304
Mean dependent var   1.623268   S.D. dependent var   0.531538
S.E. of regression   0.399785   Sum squared resid    82.95065
Akaike info crit.    1.017438   Schwarz criterion    1.074200
F-statistic          68.17659   Prob(F-statistic)    0.000000
====================================================================


– Note: In order to be able to interpret the coefficients of dummy variables one has to know the reference group. The reference group is given by the group for which the dummy equals zero.
– Prediction: How much does a woman earn with 12 years of schooling, 10 years of experience, and 1 year of tenure? (Of course, you can insert any other numbers here.)
$$E[\ln(wage)|female = 1, educ = 12, exper = 10, tenure = 1] = 0.4167 - 0.2965 \cdot 1 + 0.0802 \cdot 12 + 0.0294 \cdot 10 - 0.0006 \cdot 10^2 + 0.0317 \cdot 1 - 0.0006 \cdot 1^2 = 1.35$$
Thus, the expected hourly wage is approximately $\exp(1.35) = 3.86$ US dollars.
– We already know that in the case of a log-level model the expected value of y given the regressors $x_1, x_2$ is given by
$$E[y|x_1, x_2] = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \cdot E[e^u|x_1, x_2].$$

The true value of $E[e^u|x_1, x_2]$ depends on the probability distribution of u. It holds that: if u is normally distributed with variance $\sigma^2$, then
$$E[e^u|x_1, x_2] = e^{E[u|x_1, x_2] + \sigma^2/2}.$$
The precise prediction is therefore
$$E[y|x_1, x_2] = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + E[u|x_1, x_2] + \sigma^2/2}.$$


The exact prediction of the desired hourly wage is
$$E[wage|female = 1, educ = 12, exper = 10, tenure = 1] = \exp(0.4167 - 0.2965 \cdot 1 + 0.0802 \cdot 12 + 0.0294 \cdot 10 - 0.0006 \cdot 10^2 + 0.0317 \cdot 1 - 0.0006 \cdot 1^2 + 0.3998^2/2) = 4.18.$$
Thus, the precise value of the mean hourly wage for the specified person is about 4.18 dollars, and thus about 30 cents larger than the approximate value.
– The parameter δ corresponds to the difference between the log incomes of female and male workers, keeping everything else constant (e.g. years of schooling, experience, etc.). Question: How large is the exact wage difference? Answer: $100(e^{0.2965} - 1) = 34.51\%$.
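The exact prediction can be computed directly from a stored equation object. In the sketch below, eq_wage is an illustrative name for the wage regression above, and @se returns the standard error of the regression (an estimate of σ).

' fitted value of ln(wage) for the specified person
scalar lwage_hat = eq_wage.@coefs(1) + eq_wage.@coefs(2)*1 + eq_wage.@coefs(3)*12 + eq_wage.@coefs(4)*10 + eq_wage.@coefs(5)*10^2 + eq_wage.@coefs(6)*1 + eq_wage.@coefs(7)*1^2
' exact prediction under normal errors: exp(x'b + sigma^2/2)
scalar wage_exact = @exp(lwage_hat + eq_wage.@se^2/2)    ' about 4.18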


Note that the ceteris paribus analysis is much more informative than the comparison of the unconditional means of male and female wages. Assuming normal errors one has
$$\frac{E[wage_f] - E[wage_m]}{E[wage_m]} = \frac{e^{E[\ln(wage_f)] + \sigma_f^2/2} - e^{E[\ln(wage_m)] + \sigma_m^2/2}}{e^{E[\ln(wage_m)] + \sigma_m^2/2}}.$$
Inserting estimates one obtains
$$\frac{e^{1.416 + 0.44^2/2} - e^{1.814 + 0.53^2/2}}{e^{1.814 + 0.53^2/2}} = -0.3570,$$
which, by the way, is very similar to inserting estimates for $(E[wage_f] - E[wage_m])/E[wage_m]$, leading to -0.3538. Females earn 36% less than males if one does not control for other effects.


Several subgroups
• Example: A worker is female or male, and married or unmarried, giving 4 subgroups:
1. female and not married
2. female and married
3. male and not married
4. male and married
How to proceed:
– Choose one subgroup to be the reference group, for example: female and not married.
– Define dummy variables for the other subgroups. For example, in EViews with the command "generate series" (genr):


∗ FEMMARR = FEMALE*MARRIED
∗ MALESING = (1-FEMALE)*(1-MARRIED)
∗ MALEMARR = (1-FEMALE)*MARRIED

====================================================================
Dependent Variable: ln(WAGE), Method: Least Squares, Sample: 1 526
====================================================================
Variable      Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C              0.211028     0.096644      2.183548     0.0294
FEMMARR       -0.087917     0.052348     -1.679475     0.0937
MALESING       0.110350     0.055742      1.979658     0.0483
MALEMARR       0.323026     0.050114      6.445759     0.0000
EDUC           0.078910     0.006694     11.78733      0.0000
EXPER          0.026801     0.005243      5.111835     0.0000
EXPER^2       -0.000535     0.000110     -4.847105     0.0000
TENURE         0.029088     0.006762      4.301613     0.0000
TENURE^2      -0.000533     0.000231     -2.305552     0.0215
====================================================================
R-squared            0.460877   Adjusted R-squared   0.452535
Mean dependent var   1.623268   S.D. dependent var   0.531538
S.E. of regression   0.393290   Sum squared resid    79.96799
Akaike info crit.    0.988423   Schwarz criterion    1.061403
F-statistic          55.24559   Prob(F-statistic)    0.000000
====================================================================


Examples for Interpretation:
– Married women earn about 8.8% less than unmarried women. However, this effect is only significant at the 10% significance level (for a two-sided test).
– The wage difference between married men and married women is about 32.3 − (−8.8) = 41.1%. A t test cannot be applied directly. (Solution: choose a new reference group with one of the two subgroups as the reference group.)
Remarks:
– Using dummies for all subgroups is not recommended since then differences with respect to the reference group cannot be tested directly.
– If you use dummies for all subgroups you cannot include a constant. Otherwise MLR.3 is violated. Why?


• Using ordinal information in regression
Example: Ranking of universities. The quality difference between ranks 1 and 2 and between ranks 11 and 12, respectively, may be dramatically different. Hence, ranks should not be used as regressors. Instead, we have to assign a dummy variable $D_j$ to all but one (the "reference category") of the universities, inducing several new parameters which have to be estimated.
Note: Then the coefficient of a dummy variable $D_j$ denotes the intercept shift between university j and the reference university.
Sometimes there are too many ranks and hence too many parameters to be estimated. Then it proves useful to group the data, e.g., ranks 1-10, 11-20, etc.


Interactions and Dummy Variables
• Interactions between dummy variables:
– May be used to define subgroups (e.g., married males).
– Note that a useful interpretation and comparison of subgroup effects crucially depends on a correct setup of the dummies. For example, let us include the dummies male and married and their interaction in a wage equation:
$$y = \beta_0 + \delta_1\,male + \delta_2\,married + \delta_3\,male \cdot married + \dots$$
Then a comparison between male-married and male-single is given by
$$E[y|male = 1, married = 1] - E[y|male = 1, married = 0] = \beta_0 + \delta_1 + \delta_2 + \delta_3 + \dots - (\beta_0 + \delta_1 + \dots) = \delta_2 + \delta_3$$


• Interactions between dummies and quantitative variables:
– Allows different slope parameters for different groups:
$$y = \beta_0 + \beta_1 D + \beta_2 x_1 + \beta_3 (x_1 \cdot D) + u.$$
Note: here $\beta_1$ denotes the difference between both groups only for the case $x_1 = 0$. If $x_1 \neq 0$, then this difference is equal to
$$E[y|D = 1, x_1] - E[y|D = 0, x_1] = \beta_0 + \beta_1 \cdot 1 + \beta_2 x_1 + \beta_3 (x_1 \cdot 1) - (\beta_0 + \beta_2 x_1) = \beta_1 + \beta_3 x_1.$$
Even if $\beta_1$ is negative, the total effect may be positive!


– Wage Example Continued:
====================================================================
Dependent Variable: ln(WAGE), Method: Least Squares, Sample: 1 526
====================================================================
Variable        Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                0.388806     0.118687      3.275892     0.0011
FEMALE          -0.226789     0.167539     -1.353643     0.1764
EDUC             0.082369     0.008470      9.724919     0.0000
EXPER            0.029337     0.004984      5.885973     0.0000
EXPER^2         -0.000580     0.000108     -5.397767     0.0000
TENURE           0.031897     0.006864      4.646956     0.0000
TENURE^2        -0.000590     0.000235     -2.508901     0.0124
FEMALE*EDUC     -0.005565     0.013062     -0.426013     0.6703
====================================================================
R-squared            0.440964   Adjusted R-squared   0.433410
Mean dependent var   1.623268   S.D. dependent var   0.531538
S.E. of regression   0.400100   Sum squared resid    82.92160
Akaike info crit.    1.020890   Schwarz criterion    1.085761
F-statistic          58.37084   Prob(F-statistic)    0.000000
====================================================================

Are returns to schooling sensitive to gender?
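The question can be answered with the t test on the interaction term (p-value 0.6703 above), or equivalently with a Wald test on the stored equation; a sketch with the illustrative equation name eq_interact:

' wage equation with a gender-specific slope for education
equation eq_interact.ls log(wage) c female educ exper exper^2 tenure tenure^2 female*educ
' H0: returns to schooling are the same for men and women (coefficient 8 is zero)
eq_interact.wald c(8)=0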


• Testing for differences between groups
– Can be done with F tests.
– Chow Test: allows one to test whether there is a difference between groups in the sense that there may be group-specific intercepts and/or (at least one) group-specific slope parameter. Illustration:
$$y = \beta_0 + \beta_1 D + \beta_2 x_1 + \beta_3 (x_1 \cdot D) + \beta_4 x_2 + \beta_5 (x_2 \cdot D) + u. \quad (6.10)$$
Pair of hypotheses:
H0: $\beta_1 = \beta_3 = \beta_5 = 0$ vs. H1: $\beta_1 \neq 0$ and/or $\beta_3 \neq 0$ and/or $\beta_5 \neq 0$.


Application of F tests (a minimal EViews sketch follows the reading note below):
∗ Estimate the regression equation for each group l,
$$y = \beta_0^l + \beta_2^l x_1 + \beta_4^l x_2 + u, \quad l = 1, 2,$$
and calculate $SSR_1$ and $SSR_2$.
∗ Then estimate this regression for both groups together and calculate SSR.
∗ Compute the F statistic
$$F = \frac{SSR - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k+1)}{k+1},$$
where the degrees of freedom of the F distribution are equal to $k + 1$ and $n - 2(k+1)$.
Reading: Chapter 6 (without Section 6.4) and Chapter 7 (without Sections 7.5 and 7.6) in Wooldridge (2009).
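A sketch of the Chow computation in EViews, with illustrative names (y, x1, x2; d is the group dummy, and k = 2 regressors here):

' pooled regression and the two group regressions
equation eq_pool.ls y c x1 x2
smpl @all if d=0
equation eq_g1.ls y c x1 x2
smpl @all if d=1
equation eq_g2.ls y c x1 x2
smpl @all
' Chow F statistic from the three sums of squared residuals
scalar ssr12 = eq_g1.@ssr + eq_g2.@ssr
scalar f_chow = ((eq_pool.@ssr - ssr12)/ssr12)*(eq_pool.@regobs - 2*(2+1))/(2+1)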

7 Multiple Regression Analysis: Prediction

7.1 Prediction and Prediction Error

• Consider the multiple regression model $y = X\beta + u$, i.e.
$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i, \quad 1 \le i \le n.$$
• We search for a predictor $\hat y_0$ for $y_0$ given $x_{01}, \dots, x_{0k}$.
• Define the prediction error $y_0 - \hat y_0$.


• We assume that MLR.1 to MLR.5 hold for the prediction sample $(x_0, y_0)$. Then
$$y_0 = \beta_0 + \beta_1 x_{01} + \dots + \beta_k x_{0k} + u_0 \quad (7.1)$$
and $E[u_0|x_{01}, \dots, x_{0k}] = 0$, so that
$$E[y_0|x_{01}, \dots, x_{0k}] = \beta_0 + \beta_1 x_{01} + \dots + \beta_k x_{0k} = x_0'\beta,$$
where $x_0' = (1, x_{01}, \dots, x_{0k})$. MLR.4 guarantees that for known parameters the predictions are unbiased. Then the prediction is, loosely speaking, correct on average (if averaged over many samples). It can be shown that the conditional expectation is optimal in the sense of minimizing the mean squared prediction error.


• In practice, the true regression coefficients $\beta_j$, $j = 0, \dots, k$, are unknown. Inserting the OLS estimators $\hat\beta_j$ gives
$$\hat y_0 = \widehat E[y_0|x_{01}, \dots, x_{0k}] = \hat\beta_0 + \hat\beta_1 x_{01} + \dots + \hat\beta_k x_{0k}.$$
Using compact notation the prediction rule is:
$$\hat y_0 = x_0'\hat\beta \quad (7.2)$$
• This prediction rule only makes sense if $(y_0, x_0')$ belongs to the population as well. Otherwise the population regression model is not valid for $(y_0, x_0')$ and the prediction based on the estimated version is possibly strongly misleading.


• General decomposition of the prediction error:
$$\hat u_0 = y_0 - \hat y_0 = \underbrace{(y_0 - E[y_0|x_0])}_{\text{unavoidable error } v_0} + \underbrace{(E[y_0|x_0] - x_0'\beta)}_{\text{possible specification error}} + \underbrace{(x_0'\beta - x_0'\hat\beta)}_{\text{estimation error}} \quad (7.3)$$
– If MLR.1 and MLR.4 are correct for the population and if the prediction sample also belongs to the population, then the specification error is zero. Then $v_0 = u_0$ in (7.1).
– If the estimator is consistent, $\text{plim}\,\hat\beta = \beta$, then the estimation error becomes negligible in large samples.


– Using the OLS estimator, the estimation error is
$$x_0'\beta - x_0'\hat\beta = x_0'(\beta - \hat\beta) = x_0'\beta - x_0'\left((X'X)^{-1}X'y\right) = x_0'\beta - x_0'\left(\beta + (X'X)^{-1}X'u\right) = -x_0'(X'X)^{-1}X'u. \quad (7.4)$$
Thus, the estimation error only depends on the estimation sample.
– The OLS prediction error under MLR.1 to MLR.5 is given by (using (7.3) and (7.4)):
$$\hat u_0 = u_0 + x_0'(\beta - \hat\beta) = u_0 - x_0'(X'X)^{-1}X'u. \quad (7.5)$$


• Variance of the prediction error:
– Extension of Assumption MLR.2 (Random Sampling): $u_0$ and u are uncorrelated.
– Conditional variance of (7.5) given X and $x_0$:
$$Var(\hat u_0|X, x_0) = Var(u_0|X, x_0) + Var\left(x_0'(\beta - \hat\beta)\,\big|\,X, x_0\right) = \sigma^2 + x_0'\,Var(\hat\beta - \beta|X)\,x_0 = \sigma^2 + x_0'\sigma^2(X'X)^{-1}x_0$$
or
$$Var(\hat u_0|X, x_0) = \sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right). \quad (7.6)$$
– Relevant in practice: estimated variance of the prediction error
$$\widehat{Var}(\hat u_0|X, x_0) = \hat\sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right).$$


• Prediction interval: Given an a priori chosen confidence probability 1 − α, a prediction interval for the multiple regression model is given by
$$\left[\hat y_0 - t_{n-k-1}\sqrt{\widehat{Var}(\hat u_0|X, x_0)}\,,\ \hat y_0 + t_{n-k-1}\sqrt{\widehat{Var}(\hat u_0|X, x_0)}\right].$$
Notes:
– Derivation and structure are analogous to the case of confidence intervals for the parameter estimates.
– In contrast to confidence intervals, prediction intervals are valid, even in large samples, only if the prediction errors are normally distributed. This is because there is no averaging of the true prediction error $u_0$, as occurs for $\hat\beta - \beta = Wu$ due to the central limit theorem.
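In EViews such an interval can be obtained from the forecast standard errors of a stored equation. A sketch with illustrative names (equation eq_y, prediction observations 53 to 55); @qtdist gives the t quantile:

smpl 53 55                          ' prediction sample
eq_y.forecast y_hat y_se            ' static forecast and its forecast standard error
' 95% prediction interval: y_hat +/- t(n-k-1) quantile times the standard error
genr pi_lo = y_hat - @qtdist(0.975, eq_y.@regobs - eq_y.@ncoef)*y_se
genr pi_hi = y_hat + @qtdist(0.975, eq_y.@regobs - eq_y.@ncoef)*y_se
smpl @all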


7.2 Statistical Properties of Linear Predictions

Clearly the prediction rule is linear (in y) since
$$\hat y_0 = x_0'\hat\beta = x_0'(X'X)^{-1}X'y.$$
Gauss-Markov property of linear prediction: if $\hat\beta$ is the BLU estimator for β, then $\hat y_0 = x_0'\hat\beta$ is the BLU prediction rule. Among all linear prediction rules with a mean prediction error of zero it exhibits the smallest prediction error variance.
Reading: Section 6.4 in Wooldridge (2009).


8 Multiple Regression Analysis: Heteroskedasticity

• In this chapter Assumptions MLR.1 through MLR.4 continue to hold.
• If MLR.5 fails to hold such that
$$Var(u_i|x_{i1}, \dots, x_{ik}) = \sigma_i^2 \neq \sigma^2, \quad i = 1, \dots, n,$$
the errors of the regression model exhibit heteroskedasticity. More precisely (instead of MLR.5) we have


– Assumption GLS.5: Heteroskedasticity
$$Var(u_i|x_{i1}, \dots, x_{ik}) = \sigma_i^2(x_{i1}, \dots, x_{ik}) = \sigma^2 h(x_{i1}, \dots, x_{ik}) = \sigma^2 h_i, \quad i = 1, \dots, n.$$

The error variance of the i-th sample observation, $\sigma_i^2$, is a function h(·) of the regressors.
• Examples:
– The variance of net rents depends on the size of the flat.
– The variance of consumption expenditures depends on the level of income.
– The variance of log hourly wages depends on years of education.


• The covariance matrix of the errors of the regression is given by:
$$Var(u|X) = E[uu'|X] = \begin{pmatrix} \sigma^2 h_1 & 0 & \cdots & 0 \\ 0 & \sigma^2 h_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 h_n \end{pmatrix} = \sigma^2 \underbrace{\begin{pmatrix} h_1 & 0 & \cdots & 0 \\ 0 & h_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_n \end{pmatrix}}_{\Psi}.$$
Thus, we have
$$y = X\beta + u, \quad Var(u|X) = \sigma^2\Psi, \quad (8.1)$$
which will be referred to as the original model in matrix notation.


• When estimating models with heteroskedastic errors three cases have to be distinguished:
1. Function h(·) is known, see Section 8.3.
2. Function h(·) is only partially known, see Section 8.4.
3. Function h(·) is completely unknown, see Section 8.2.

8.1 Consequences of Heteroskedasticity for OLS

• The OLS estimator is unbiased and consistent.
• Variance of the OLS estimator in the presence of heteroskedastic errors (compare Section 3.4.2):


From $\hat\beta - \beta = (X'X)^{-1}X'u$ it can be derived that
$$Var(\hat\beta|X) = E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\,\big|\,X\right] = E\left[(X'X)^{-1}X'uu'X(X'X)^{-1}\,\big|\,X\right] = (X'X)^{-1}X'\underbrace{E[uu'|X]}_{\sigma^2\Psi}X(X'X)^{-1} = (X'X)^{-1}X'\sigma^2\Psi X(X'X)^{-1}. \quad (8.2)$$
• Note that with homoskedastic errors one has Ψ = I. Then (8.2) yields the usual OLS covariance matrix, namely $\sigma^2(X'X)^{-1}$.
• If heteroskedasticity is present, using the usual covariance matrix $\sigma^2(X'X)^{-1}$ is misleading and leads to faulty inference.
• The problem with using (8.2) directly is that Ψ is unknown. The next section introduces an appropriate estimator.


• Even if Ψ is known, OLS is not the best linear unbiased estimator, and thus not efficient. One has to use the GLS estimator instead, see Section 8.3.

8.2 Heteroskedasticity-Robust Inference after OLS

• Derivation of heteroskedasticity-robust standard errors
Let $x_i' = (1, x_{i1}, \dots, x_{ik})$. Note that the middle term in the variance-covariance matrix (8.2), with dimension $(k+1) \times (k+1)$, can be written as
$$X'\sigma^2\Psi X = \sum_{i=1}^n \sigma^2 h_i x_i x_i'.$$
Because $E[u_i^2|X] = \sigma^2 h_i$, one can estimate $\sigma^2 h_i$ by the "one observation average" $u_i^2$. Of course this is not a good estimator, but for the present purpose it does well enough. Since $u_i$ is not known, one takes the residual $\hat u_i$. Hence one can estimate the covariance matrix (8.2) of the OLS estimator in the presence of heteroskedasticity by
$$\widehat{Var}(\hat\beta|X) = (X'X)^{-1}\left(\sum_{i=1}^n \hat u_i^2 x_i x_i'\right)(X'X)^{-1}. \quad (8.3)$$

• Comments:

– Standard errors obtained from (8.3) are called heteroskedasticity-robust standard errors, or also White standard errors, named after Halbert White, an econometrician at the University of California in San Diego.


– For a single $\hat\beta_j$, heteroskedasticity-robust standard errors can be smaller or larger than the usual OLS standard errors.
– If heteroskedasticity-robust standard errors are used, it can be shown that the OLS estimator $\hat\beta$ no longer has a known finite sample distribution. However, it is asymptotically normally distributed. Thus, critical values and p-values remain approximately valid if (8.3) is used.
– The OLS estimator with White standard errors is unbiased and consistent since MLR.1 to MLR.4 are unaffected by heteroskedasticity.
– However, the OLS estimator is not efficient. Efficient estimators will be presented in the next sections.


8.3 The General Least Squares (GLS) Estimator

• Original model (8.1):
$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i, \quad Var(u_i|x_{i1}, \dots, x_{ik}) = \sigma^2 h(x_{i1}, \dots, x_{ik}) = \sigma^2 h_i. \quad (8.4)$$
• Basic idea: Weighted estimation of (8.4): transformation of the initial model to a model that satisfies all assumptions, including MLR.5. This is achieved by a kind of standardization of the regression error $u_i$. This amounts to dividing $u_i$, and thus the whole regression equation (8.4), by the square root of $h_i$:
$$\underbrace{\frac{y_i}{\sqrt{h_i}}}_{y_i^*} = \beta_0\underbrace{\frac{1}{\sqrt{h_i}}}_{x_{i0}^*} + \beta_1\underbrace{\frac{x_{i1}}{\sqrt{h_i}}}_{x_{i1}^*} + \dots + \beta_k\underbrace{\frac{x_{ik}}{\sqrt{h_i}}}_{x_{ik}^*} + \underbrace{\frac{u_i}{\sqrt{h_i}}}_{u_i^*}.$$


The resulting model is
$$y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \dots + \beta_k x_{ik}^* + u_i^*. \quad (8.5)$$
Note: For the transformed error $u_i^*$ we have
$$Var(u_i^*|x_{i1}, \dots, x_{ik}) = Var\!\left(\frac{u_i}{\sqrt{h_i}}\,\Big|\,x_{i1}, \dots, x_{ik}\right) = E\!\left[\frac{u_i^2}{h_i}\,\Big|\,x_{i1}, \dots, x_{ik}\right] = \frac{1}{h_i}E[u_i^2|x_{i1}, \dots, x_{ik}] = \frac{1}{h_i}\sigma^2 h_i = \sigma^2.$$
Result: We have transformed the original regression (8.4) in such a way that the homoskedasticity assumption MLR.5 holds for the resulting regression model (8.5).


• Therefore the OLS estimator based on the transformed model (8.5) has all desirable properties: it is BLU (best linear unbiased).
• The OLS estimator of the transformed model (8.5) is based on the minimization of a weighted sum of squared residuals,
$$\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik})^2 / h_i.$$
Therefore, it is called a weighted least squares (WLS) procedure. Note that in its current form it requires that h(·) is known.
• The transformed model does not contain a constant term if $\sqrt{h_i}$ is not identical to one of the regressors in model (8.4).

• Next we derive the transformed model in matrix notation.


• Explicit statement of $y^*$, $X^*$, and $u^*$ in matrix notation:
$$y^* = \underbrace{\begin{pmatrix} h_1^{-1/2} & 0 & \cdots & 0 \\ 0 & h_2^{-1/2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_n^{-1/2} \end{pmatrix}}_{P} \underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{y}, \qquad X^* = P \cdot \underbrace{\begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}}_{X}, \qquad u^* = P \cdot \underbrace{\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}}_{u}.$$


• For the transformation matrix P it holds that $P'P = \Psi^{-1}$ and hence $E[uu'|X] = \sigma^2\Psi = \sigma^2(P'P)^{-1}$.
• Therefore, the transformed model (8.5) in matrix notation is given by $Py = PX\beta + Pu$, or
$$y^* = X^*\beta + u^*, \quad E[u^*(u^*)'|X^*] = \sigma^2 I. \quad (8.6)$$
• Obviously (8.6) is obtained by multiplying the original model (8.1), $y = X\beta + u$, by the transformation matrix P from the left.
• What is the explicit formula for the OLS estimator in terms of the transformed model (8.6) and the original model (8.1)?


GLS (generalized least squares) estimator
• OLS estimation of (8.6) yields
$$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = \left((PX)'PX\right)^{-1}(PX)'Py = \left(X'P'PX\right)^{-1}X'P'Py$$
and therefore
$$\hat\beta_{GLS} = \left(X'\Psi^{-1}X\right)^{-1}X'\Psi^{-1}y. \quad (8.7)$$
$\hat\beta_{GLS}$ in (8.7) is called the GLS estimator. In the case of heteroskedasticity Ψ is a diagonal matrix and each of the n observations is weighted by $1/\sqrt{h_i}$.


• Properties for known h(·): Under MLR.1 to MLR.4 and GLS.5 the GLS estimator $\hat\beta_{GLS}$
– is unbiased and consistent,
– is BLUE (best linear unbiased), and thus efficient,
– has variance-covariance matrix $Var(\hat\beta_{GLS}|X) = \sigma^2\left(X'\Psi^{-1}X\right)^{-1}$,
– is unbiased and consistent even if Ψ is misspecified, since Ψ is a function of X and not of u, and thus
$$E[\hat\beta_{GLS} - \beta|X] = \left(X'\Psi^{-1}X\right)^{-1}X'\Psi^{-1}E[u|X] = 0.$$
As a consequence, OLS is inefficient since OLS and GLS are both linear estimators: OLS variances are larger than or equal to those of the GLS estimator. This can be shown using matrix algebra.
• Analogously to MLR.6 in Section 4.2 above, we assume


– Assumption GLS.6: Normal Distribution
$$u_i|x_i \sim N(0, \sigma^2 h_i), \quad i = 1, \dots, n,$$
which, together with MLR.2 (Random Sampling), implies the multivariate normal distribution
$$u|X \sim N\left(0, \sigma^2\Psi\right).$$
Note that GLS.6 implies that $u_i$ given $x_i$ is independently but not identically distributed, since the variance changes with i.

All test statistics based on the transformed model (8.6), and appropriately modified for the original model (8.1), exhibit the exact distributions of Chapter 4 (normal, t, F).
• Frequent problem in practice: $h_i$ is not known. In this case the feasible GLS estimator has to be used → Case 2.


8.4 Feasible Generalized Least Squares (FGLS)

• In general, the variance function $h_i$ is not known and has to be estimated. Frequently neither the relevant factors nor the functional relationship are known.
• Hence, one needs a specification that flexibly captures a large range of possibilities, e.g.
$$h_i = h(x_{i1}, \dots, x_{ik}) = \exp(\delta_1 x_{i1} + \dots + \delta_k x_{ik})$$
and thus
$$Var(u_i|x_{i1}, x_{i2}, \dots, x_{ik}) = \sigma^2 h_i = \sigma^2 \exp(\delta_1 x_{i1} + \dots + \delta_k x_{ik}).$$
Remark: On p. 282, Wooldridge (2009) additionally includes the factor $\exp(\delta_0)$ in $h_i$. As this factor is constant, it can also be captured by $\sigma^2$.


• How can one estimate the unknown parameters $\delta_1, \dots, \delta_k$? Standardizing $u_i$ delivers $v_i = u_i/(\sigma\sqrt{h_i})$ with $E[v_i|X] = 0$ and $Var(v_i|X) = 1$. Therefore $u_i = \sigma\sqrt{h_i}v_i$ and
$$u_i^2 = \sigma^2 h_i v_i^2, \quad i = 1, \dots, n.$$
Taking logarithms leads to
$$\ln u_i^2 = \ln\sigma^2 + \ln h_i + \ln v_i^2 = \ln\sigma^2 + \ln\exp(\delta_1 x_{i1} + \dots + \delta_k x_{ik}) + \ln v_i^2 = \underbrace{\ln\sigma^2 + E[\ln v_i^2]}_{\alpha_0} + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + \underbrace{\ln v_i^2 - E[\ln v_i^2]}_{e_i},$$
$$\ln u_i^2 = \alpha_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + e_i. \quad (8.8)$$
For the regression equation (8.8) the assumptions MLR.1-MLR.4 are satisfied. Hence, the OLS estimator for the $\delta_j$ is unbiased and consistent.


In practice, the $u_i^2$'s in the variance regression (8.8) are replaced by the squared OLS residuals $\hat u_i^2$'s from the sample regression $y = X\hat\beta + \hat u$ of (8.1). The resulting $\hat\delta_j$'s are used to get the fitted values $\hat h_i$'s, which are inserted into the GLS estimator (8.7) in Step II.
• Outline of the FGLS method:
Step I
a) Regress y on X and compute the residual vector $\hat u$ by OLS estimation of the original specification (8.1).
b) Calculate $\ln\hat u_i^2$, $i = 1, \dots, n$, to be used as regressand in the variance regression (8.8).
c) Estimate the variance regression (8.8) by OLS.
d) Compute $\hat h_i = \exp\left(\hat\delta_1 x_{i1} + \dots + \hat\delta_k x_{ik}\right)$, $i = 1, \dots, n$.


Step II
The FGLS estimator $\hat\beta_{FGLS}$ is obtained analogously to the GLS procedure. The original regression (8.1) is multiplied from the left with the matrix
$$\hat P = \begin{pmatrix} \hat h_1^{-1/2} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \hat h_n^{-1/2} \end{pmatrix}.$$
This delivers a variant of the transformed regression
$$y^\# = X^\#\beta + u^\#. \quad (8.9)$$
Hence, OLS estimation of (8.9) leads to the FGLS estimator
$$\hat\beta_{FGLS} = \left(X'\hat\Psi^{-1}X\right)^{-1}X'\hat\Psi^{-1}y, \quad (8.10)$$
with $\hat\Psi^{-1} = \hat P'\hat P$.


• Estimation properties of the FGLS estimators:
– They are consistent, that is, they converge in probability to the true parameters for $n \to \infty$: $\text{plim}\,\hat\beta_{FGLS} = \beta$.
– The FGLS estimator is asymptotically efficient: for a correctly specified $h_i$ and a sufficiently large sample, the FGLS estimator is preferable to the OLS estimator as the former has a lower estimation variance. (This is plausible, as FGLS also uses information on the functional form of the heteroskedasticity while OLS with heteroskedasticity-robust standard errors does not.)
– If the variance function $h_i$ is misspecified, then the FGLS estimator is inefficient.
– Be aware that there may be considerable differences between the

– The FGLS estimator is asymptotically efficient: For a correctly specified hi and a sufficiently large sample, the FGLS estimator is preferable to the OLS estimator as the former one has a lower estimation-variance. (This is plausible, as FGLS also uses information on the functional form of the heteroskedasticity while OLS with heteroskedasticity-robust standard errors does not.) – If the variance function hi is misspecified, then the FGLS estimator is inefficient. – Be aware that there may be considerable differences between the 345

Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig

FGLS estimates and the OLS estimates. • Comparing OLS with heteroskedasticity-robust standard errors and FGLS – If you know something about the variance function hi, then FGLS is preferable. If you have no idea about it, then OLS with heteroskedasticity-robust standard errors may be better. – It is always a good idea to run an OLS regression also with heteroskedasticity-robust standard errors in order to see whether the significance of parameters depends on the presence of heteroskedasticity. – Since any estimator taking into account heteroskedasticity should be avoided if there is no heteroskedasticity, one should test for the presence of heteroskedasticity, see Section 9.2. 346

Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig

• Trade Example Continued – Consider Model 5 of Section 6.3 and compare OLS estimates, FGLS estimates, and OLS estimates with heteroskedasticity-robust standard errors. – EViews program to run OLS, FGLS with both steps, and OLS with White standard errors, and scatter plots of residuals against fitted values for both estimators. ’EViews program for FGLS estimation, Chapter 8 Heteroskedasticity ’RT, 2009_02_26 ’requires workfile IMPORTS_KAZAKHSTAN_2004 ’ define variables ’ define log of dependent variable genr log_imp = log(trade_0_d_o) ’ define group of regressors (right hand side variables) ’ Model 5 group rhs_model5_base c log(wdi_gdpusdcr_o) (log(wdi_gdpusdcr_o))^2 log( cepii_dist) cepii_comcol_rev cepii_comlang_off



' potential regressors to add
group rhs_model5_add log(cepii_area_o) log(weo_pop_o) cepii_comlang_ethno_rev cepii_col45_rev cepii_contig
' rhs_model5_add (can be added)
group rhs_model5 rhs_model5_base

' Step I: OLS regression
' ols regression
equation eq_ols_model5.ls log_imp rhs_model5
' compute residuals
eq_ols_model5.makeresids res_ols_model5
' compute fitted values
eq_ols_model5.fit fit_ols_model5
' plot residuals versus fitted log imports in order to check for heteroskedasticity
group group_ols_res fit_ols_model5 res_ols_model5
group_ols_res.scat
' Step II: FGLS regression
' square residuals and take logs
genr ln_u_hat_sq = log(res_ols_model5^2)
' estimate variance equation



equation eq_h_model5.ls ln_u_hat_sq rhs_model5
' predicted squared residuals
eq_h_model5.fit ln_u_hat_sq_hat
' compute exponential of fitted values of variance regression
genr h_hat = exp(ln_u_hat_sq_hat)
' estimate FGLS using w=h_hat^(-1/2)
equation eq_fgls_model5.ls(w=h_hat^(-1/2)) log_imp rhs_model5
' compute fitted values based on FGLS
eq_fgls_model5.fit fit_fgls_model5
' compute residuals based on FGLS
eq_fgls_model5.makeresids res_fgls_model5
' standardize residuals with weight function
series res_fgls_model5_star = res_fgls_model5*h_hat^(-1/2)
' plot residuals versus fitted
group group_fgls_res fit_fgls_model5 res_fgls_model5_star
group_fgls_res.scat
' OLS regression with heteroskedasticity-robust standard errors
equation eq_white_model5.ls(h) log_imp rhs_model5



– OLS output with usual standard errors:
====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55; Included observations: 52
====================================================================
Variable                   Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                          -85.82647     27.89947     -3.076276     0.0035
LOG(WDI_GDPUSDCR_O)          7.087930     2.186465      3.241731     0.0022
(LOG(WDI_GDPUSDCR_O))^2     -0.111982     0.042366     -2.643219     0.0112
LOG(CEPII_DIST)             -0.761431     0.513375     -1.483187     0.1448
CEPII_COMCOL_REV             3.911289     0.673603      5.806523     0.0000
CEPII_COMLANG_OFF            2.312292     1.244898      1.857414     0.0697
====================================================================
R-squared            0.751895   Mean dependent var     15.97292
Adjusted R-squared   0.724927   S.D. dependent var      2.613094
S.E. of regression   1.370499   Akaike info criterion   3.576394
Sum squared resid    86.40031   Schwarz criterion       3.801537
Log likelihood      -86.98624   Hannan-Quinn criter.    3.662709
F-statistic          27.88113   Durbin-Watson stat      2.094393
Prob(F-statistic)    0.000000
====================================================================



– FGLS, Step I: estimate variance regression (8.8):
====================================================================
Dependent Variable: LN_U_HAT_SQ
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Variable                   Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                           10.63404     44.84623      0.237122     0.8136
LOG(WDI_GDPUSDCR_O)         -0.848029     3.514572     -0.241290     0.8104
(LOG(WDI_GDPUSDCR_O))^2      0.004631     0.068099      0.068007     0.9461
LOG(CEPII_DIST)              0.863174     0.825210      1.046006     0.3010
CEPII_COMCOL_REV            -0.946104     1.082764     -0.873786     0.3868
CEPII_COMLANG_OFF            1.187134     2.001077      0.593248     0.5559
====================================================================
R-squared            0.177909   Mean dependent var     -0.847962
Adjusted R-squared   0.088551   S.D. dependent var      2.307505
S.E. of regression   2.202971   Akaike info criterion   4.525657
Sum squared resid    223.2417   Schwarz criterion       4.750801
Log likelihood      -111.6671   Hannan-Quinn criter.    4.611972
F-statistic          1.990976   Durbin-Watson stat      1.711027
Prob(F-statistic)    0.097787
====================================================================



– FGLS, Step II: estimate (8.10):
====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55, Included observations: 52
Weighting series: H_HAT^(-1/2)
====================================================================
Variable                   Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                          -88.19314     22.59938     -3.902459     0.0003
LOG(WDI_GDPUSDCR_O)          7.425860     1.675365      4.432383     0.0001
(LOG(WDI_GDPUSDCR_O))^2     -0.117133     0.030951     -3.784479     0.0004
LOG(CEPII_DIST)             -1.107230     0.347434     -3.186879     0.0026
CEPII_COMCOL_REV             3.704807     0.678132      5.463250     0.0000
CEPII_COMLANG_OFF            2.012321     0.989031      2.034640     0.0477
====================================================================
Weighted Statistics
====================================================================
R-squared            0.780089   Mean dependent var     16.88216
Adjusted R-squared   0.756186   S.D. dependent var     10.32249
S.E. of regression   1.034215   Akaike info criterion   3.013329
Sum squared resid    49.20159   Schwarz criterion       3.238472
Log likelihood      -72.34654   Hannan-Quinn criter.    3.099643



F-statistic          32.63518   Durbin-Watson stat      2.577604
Prob(F-statistic)    0.000000
====================================================================
Unweighted Statistics
====================================================================
R-squared            0.746834   Mean dependent var     15.97292
Adjusted R-squared   0.719316   S.D. dependent var      2.613094
S.E. of regression   1.384409   Sum squared resid      88.16300
Durbin-Watson stat   2.055925
====================================================================



– OLS with heteroskedasticity-robust standard errors:
====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55, Included observations: 52
White Heteroskedasticity-Consistent Standard Errors & Covariance
====================================================================
Variable                   Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                          -85.82647     28.84426     -2.975513     0.0046
LOG(WDI_GDPUSDCR_O)          7.087930     2.230037      3.178391     0.0026
(LOG(WDI_GDPUSDCR_O))^2     -0.111982     0.041483     -2.699469     0.0097
LOG(CEPII_DIST)             -0.761431     0.456362     -1.668480     0.1020
CEPII_COMCOL_REV             3.911289     0.777904      5.027986     0.0000
CEPII_COMLANG_OFF            2.312292     0.789404      2.929161     0.0053
====================================================================
R-squared            0.751895   Mean dependent var     15.97292
Adjusted R-squared   0.724927   S.D. dependent var      2.613094
S.E. of regression   1.370499   Akaike info criterion   3.576394
Sum squared resid    86.40031   Schwarz criterion       3.801537
Log likelihood      -86.98624   Hannan-Quinn criter.    3.662709
F-statistic          27.88113   Durbin-Watson stat      2.094393
Prob(F-statistic)    0.000000
====================================================================



– Diagnostic plots: (standardized) residuals against fitted values
[Figure: two scatter plots. Left panel: RES_OLS_MODEL5 (vertical axis, from -4 to 3) against FIT_OLS_MODEL5 (horizontal axis, from 10 to 22). Right panel: RES_FGLS_MODEL5_STAR (vertical axis, from -4 to 4) against FIT_FGLS_MODEL5 (horizontal axis, from 8 to 22).]


– Output table for Model 4 and Model 5 using various estimators (compare Section 4.8):

Dependent Variable: ln(imports to Kazakhstan)

Independent Variables/Model        (4)-OLS     (5)-OLS     (5)-FGLS
--------------------------------------------------------------------
constant                           -13.027     -85.827     -88.193
                                   (4.726)     (27.900)    (22.599)
                                   [4.778]     [28.844]
ln(gdp)                             1.318       7.088       7.426
                                   (0.129)     (2.187)     (1.675)
                                   [0.158]     [2.230]
(ln(gdp))^2                                    -0.112      -0.117
                                               (0.042)     (0.031)
                                               [0.042]
ln(distance)                       -0.625      -0.761      -1.107
                                   (0.542)     (0.513)     (0.347)
                                   [0.438]     [0.456]
common "colonizer" since 1945       3.351       3.911       3.705
                                   (0.679)     (0.674)     (0.678)
                                   [0.782]     [0.778]
common official language            2.110       2.312       2.012
                                   (1.319)     (1.245)     (0.989)
                                   [1.020]     [0.789]
--------------------------------------------------------------------
Number of observations              52          52          52
R^2                                 0.714       0.752       0.747
Standard error of regression        1.455       1.371       1.384
Sum of squared residuals            99.523      86.40       88.16
AIC                                 3.6793      3.576
HQ                                  3.7513      3.663
SC                                  3.8670      3.802
--------------------------------------------------------------------
Notes: OLS or FGLS standard errors in parentheses, White standard errors in brackets.


– Results and Interpretation:
∗ The OLS and FGLS parameter estimates for ln(distance) and common official language are quite different: the FGLS estimates imply less impact of language and more impact of distance. In addition, log(distance) is statistically significant in the FGLS estimation, while it is insignificant in the OLS estimation with heteroskedasticity-robust standard errors. This makes the FGLS estimates more plausible.
∗ When heteroskedasticity is taken into account, no parameter is insignificant at the 5% significance level based on FGLS, and only log(distance) is insignificant based on heteroskedasticity-robust OLS standard errors.
∗ Comparing the usual OLS standard errors and the White standard errors shows that the multicollinearity problem for the parameter estimates of distance and language decreases.
∗ Inspecting the scatter plots of OLS and standardized FGLS residuals against fitted values does not automatically suggest heteroskedasticity. Thus heteroskedasticity tests are useful, see Section 9.2.
• Cigarette Example (Wooldridge 2009, Example 8.7), with EViews outputs/commands:
Step I
1. OLS estimation:
====================================================================
Dependent Variable: CIGS, Method: Least Squares, Sample: 1 807
====================================================================
Variable      Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C             -3.639855     24.07866     -0.151165     0.8799
LINCOME        0.880268      0.727783     1.209520     0.2268
LCIGPRIC      -0.750855      5.773343    -0.130056     0.8966
EDUC          -0.501498      0.167077    -3.001596     0.0028
AGE            0.770694      0.160122     4.813155     0.0000
AGE^2         -0.009023      0.001743    -5.176494     0.0000
RESTAURN      -2.825085      1.111794    -2.541016     0.0112
====================================================================
R-squared            0.052737   Adjusted R-squared   0.045632
Mean dependent var   8.686493   S.D. dependent var   13.72152
S.E. of regression   13.40479   Sum squared resid    143750.7
Akaike info crit.    8.037737   Schwarz criterion    8.078448
Log likelihood      -3236.227   Durbin-Watson stat   2.012825
F-statistic          7.423062   Prob(F-statistic)    0.000000
====================================================================

2. Save the residuals using genr u_hat = resid.
3. Take the logarithm of the squared residuals using genr ln_u_sq = log(u_hat^2).


4. Estimation of the variance regression (8.8) with OLS yields:
====================================================================
Dependent Variable: LN_U_SQ, Method: Least Squares, Sample: 1 807
====================================================================
Variable      Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C             -1.920704     2.563034     -0.749387     0.4538
LINCOME        0.291541     0.077468      3.763351     0.0002
LCIGPRIC       0.195421     0.614539      0.317996     0.7506
EDUC          -0.079704     0.017784     -4.481656     0.0000
AGE            0.204005     0.017044     11.96927      0.0000
AGE^2         -0.002392     0.000186    -12.89313      0.0000
RESTAURN      -0.627012     0.118344     -5.298212     0.0000
====================================================================
R-squared            0.247362   Adjusted R-squared   0.241717
Mean dependent var   4.207485   S.D. dependent var   1.638575
S.E. of regression   1.426862   Sum squared resid    1628.749
Akaike info crit.    3.557469   Schwarz criterion    3.598179
Log likelihood      -1428.439   Durbin-Watson stat   2.024587
F-statistic          43.82126   Prob(F-statistic)    0.000000
====================================================================

– Save the $\hat h_i$, $i = 1, \dots, n$, using genr h_hat = exp(ln_u_sq - resid).


Step II
Weighted LS estimate (see Options) with weights h_hat^(-1/2):
====================================================================
Dependent Variable: CIGS, Method: Least Squares, Sample: 1 807
Weighting series: WEIGHTS
====================================================================
Variable      Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C              5.635433     17.80314      0.316541     0.7517
LINCOME        1.295240      0.437012     2.963855     0.0031
LCIGPRIC      -2.940305      4.460145    -0.659240     0.5099
EDUC          -0.463446      0.120159    -3.856953     0.0001
AGE            0.481948      0.096808     4.978378     0.0000
AGE^2         -0.005627      0.000939    -5.989707     0.0000
RESTAURN      -3.461064      0.795505    -4.350776     0.0000
====================================================================
Weighted Statistics
====================================================================
R-squared            0.002751   Adjusted R-squared  -0.004728
Mean dependent var   7.158227   S.D. dependent var   11.66855
S.E. of regression   11.69611   Sum squared resid    109439.1
Akaike info crit.    7.765025   Schwarz criterion    7.805736
Log likelihood      -3126.188   Durbin-Watson stat   2.049719
F-statistic          17.05549   Prob(F-statistic)    0.000000
====================================================================
Unweighted Statistics
====================================================================
R-squared            0.045739   Adjusted R-squared   0.038582
Mean dependent var   8.686493   S.D. dependent var   13.72152
S.E. of regression   13.45421   Sum squared resid    144812.7
====================================================================
(Remark: The Unweighted Statistics are based on the residuals $y - X\hat\beta_{FGLS}$; see the EViews help file.)
– Compare with the OLS estimator based on White standard errors.

9 Multiple Regression Analysis: Model Diagnostics

9.1 The RESET Test

RESET test (regression specification error test). Idea and implementation:
• If the original model

y = x0β0 + . . . + xk βk + u = x′β + u

363

Intensive Course in Econometrics — Section 9.1 — UR March 2009 — R. Tschernig

satisfies assumption MLR.4 E[u|x0, . . . , xk ] = 0, it holds that E[y|x0, . . . , xk ] = x0β0 + . . . + xk βk + u = x′β. • Then, any further term added to the model should not be significant. Thus, any nonlinear function of the independent variables should be insignificant. • Thus, the null hypothesis of the RESET test is formulated such that one can test the significance of nonlinear functions of the fitˆ that are added to the model. Note that the ted values yˆ = x′β fitted values are a linear function of the regressors of the original specification.

• In practice it has turned out that for implementing the RESET test it is sufficient to include quadratic and cubic terms of ŷ only:
y = x'β + α ŷ² + γ ŷ³ + ε.
The pair of hypotheses is
H0 : α = 0, γ = 0 (linear model is correctly specified)
H1 : α ≠ 0 and/or γ ≠ 0.
The null hypothesis is tested using an F test with 2 degrees of freedom in the numerator and n − k − 3 in the denominator.
• Be aware that the null hypothesis may also be rejected because relevant regressor variables have been omitted.
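A minimal sketch of this procedure in Python (assuming a fitted statsmodels OLS result ols, its regressor matrix X including a constant, and the regressand y; all names are placeholders):

import numpy as np
import statsmodels.api as sm

yhat = ols.fittedvalues
X_aug = np.column_stack([X, yhat**2, yhat**3])   # add quadratic and cubic fitted values
unrestricted = sm.OLS(y, X_aug).fit()

# F test of H0: alpha = gamma = 0 (2 numerator degrees of freedom)
f_stat, p_value, df_diff = unrestricted.compare_f_test(ols)
print(f_stat, p_value)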


9.2 Heteroskedasticity Tests
• As already noted, it does not make sense to use the FGLS estimator "automatically". If the errors are homoskedastic, the OLS estimator with OLS standard errors should be used.
• Thus, one should test whether there is statistical evidence for heteroskedasticity.
• In the following, two different tests for heteroskedasticity are discussed: the Breusch-Pagan test and the White test. For both, the null hypothesis is "homoskedastic errors".
• Both tests are implemented in EViews 6.0; however, for the White test without cross terms the level variables are included only in earlier EViews versions.

It is assumed that for the multiple linear regression
y = β_0 + x_1 β_1 + . . . + x_k β_k + u
assumptions MLR.1 to MLR.4 hold. The pair of hypotheses to be tested is
H0 : Var(u_i|x_i) = σ² (homoskedasticity),
H1 : Var(u_i|x_i) = σ_i² ≠ σ² (heteroskedasticity).
The general idea underlying heteroskedasticity tests is that under the null hypothesis no regressor should have any explanatory power for Var(u_i|x_i). If the null hypothesis is not true, Var(u_i|x_i) can be a (nearly arbitrary) function of the regressors x_j (1 ≤ j ≤ k).
Note: The Breusch-Pagan test and the White test differ with respect to the specification of their alternative hypothesis.

Breusch-Pagan Test
• Idea: Consider the regression
u_i² = δ_0 + δ_1 x_i1 + · · · + δ_k x_ik + v_i,   i = 1, . . . , n.   (9.1)
Under assumptions MLR.1 to MLR.4 the OLS estimator for the δ_j's is unbiased. The pair of hypotheses is:
H0 : δ_1 = δ_2 = · · · = δ_k = 0 versus H1 : δ_1 ≠ 0 and/or δ_2 ≠ 0 and/or . . .,
since under H0 it holds that E[u_i²|X] = δ_0.

• Differences from the previous applications of the F test:
– The squared errors u_i² are by no means normally distributed since they are squared quantities and thus cannot take negative values. Hence, the v_i cannot be normally distributed and the F distribution of the F statistic does not hold exactly in finite samples. However, the central limit theorem (CLT) works here as well, see Section 5.2, and the F statistic follows approximately an F distribution in large samples.
– The errors u_i are unknown. They can be replaced by the OLS residuals û_i. In doing so, the F test remains asymptotically valid (the proof is formally sophisticated).

• The R² version of the test statistic can be used. Note that for a regression including only a constant it holds that R² = 0 since SSR = SST (there are no regressors that show any variation). Denoting the coefficient of determination of the OLS estimation of (9.1) by R²_û², the statistic is
F = [R²_û² / k] / [(1 − R²_û²)/(n − k − 1)].
The F statistic for testing the joint significance of all regressors is generally reported by the appropriate software.
• H0 is rejected if F exceeds the critical value for a chosen significance level (or, equivalently, if the p-value is smaller than the significance level).
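As an illustration, a sketch of the Breusch-Pagan regression in Python (again assuming a fitted OLS result ols with regressor matrix X that includes a constant):

import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Regression (9.1): squared OLS residuals on the original regressors
bp_reg = sm.OLS(ols.resid ** 2, X).fit()
print(bp_reg.fvalue, bp_reg.f_pvalue)   # F statistic for joint significance of all slopes

# The same test is available as a built-in (LM and F versions)
lm, lm_pval, f, f_pval = het_breuschpagan(ols.resid, X)
print(f, f_pval)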


• Cigarette Example Continued (from Section 8.4):

Dependent Variable: U_HAT_SQ, Method: Least Squares, Sample: 1 807
====================================================================
Variable      Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C             -636.3031     652.4946     -0.975185     0.3298
LINCOME        24.63849     19.72180      1.249302     0.2119
LCIGPRIC       60.97656     156.4487      0.389754     0.6968
EDUC          -2.384226     4.527535     -0.526606     0.5986
AGE            19.41748     4.339068      4.475034     0.0000
AGE^2         -0.214790     0.047234     -4.547398     0.0000
RESTAURN      -71.18138     30.12789     -2.362641     0.0184
====================================================================
R-squared            0.039973   Mean dependent var    178.1297
Adjusted R-squared   0.032773   S.D. dependent var    369.3519
S.E. of regression   363.2491   Sum squared resid     1.06E+08
F-statistic          5.551687   Prob(F-statistic)     0.000012
Akaike info crit.    14.63669   Schwarz criterion     14.67740
Log likelihood      -5898.905   Durbin-Watson stat    1.937302
====================================================================

The F statistic for the above H0 is 5.55 and the corresponding p-value is smaller than 1%. The null hypothesis of homoskedastic errors thus is rejected at a level of 1%.

• Note:
– If one conjectures that the heteroskedasticity is caused by specific variables that have not been included previously, they can be included in regression (9.1).
– If H0 is not rejected, this does not automatically mean that the u_i's are homoskedastic. If the specification (9.1) does not contain all relevant variables causing heteroskedasticity, then it may happen that all δ_j, j = 1, . . . , k, are jointly insignificant.
– A variant of the Breusch-Pagan test is a test for multiplicative heteroskedasticity, i.e. the variance is of the form σ_i² = σ² · h(x_i'β). If, for example, the case h(·) = exp(·) is assumed, the test equation
ln(û_i²) = ln(σ²) + x_i'β + v
results.

White Test
• Background: For deriving the asymptotic distribution of the OLS estimator the assumption of homoskedastic errors MLR.5 is not necessary. It is enough that the squared errors u_i² are uncorrelated with all regressors and with the squares and cross products of the latter. This can easily be tested using the following regression, where the errors are already replaced by the residuals:
û_i² = δ_0 + δ_1 x_i1 + · · · + δ_k x_ik
     + δ_{k+1} x_i1² + · · · + δ_{J1} x_ik²
     + δ_{J1+1} x_i1 x_i2 + · · · + δ_{J2} x_i,k−1 x_ik + v_i,   i = 1, . . . , n.   (9.2)

• The pair of hypotheses is:
H0 : δ_j = 0 for all j = 1, 2, . . . , J2   vs.   H1 : δ_j ≠ 0 for at least one j.
Again, an F test can be used whose distribution is approximated by the F distribution (asymptotic distribution).
• With many regressors, it is tedious to implement the F test for (9.2) manually. However, most software packages provide the White test.
• When implementing the White test, a large number of parameters has to be estimated if the original model exhibits a large k. This is hardly possible in small samples. Then one includes only the squares x_ij² in the regression and neglects all cross products.
• Note: If the null hypothesis is rejected, this may also be due to a violation of MLR.1 or MLR.4. Then the original regression is misspecified!
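A sketch of the White test in Python; to my understanding, statsmodels' het_white builds the auxiliary regression (9.2) with levels, squares, and cross products internally (again, ols and X as above, with X containing a constant):

from statsmodels.stats.diagnostic import het_white

lm, lm_pval, f, f_pval = het_white(ols.resid, X)
print(f, f_pval)   # F version of the White test

For the squares-only variant described above, one would instead regress the squared residuals on X and its squared columns manually, as in the Breusch-Pagan sketch.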


• Cigarette Example Continued:

White Heteroskedasticity Test:
====================================================================
F-statistic      2.159257   Probability   0.000905
Obs*R-squared    52.17244   Probability   0.001140
====================================================================
Test Equation: Dependent Variable: RESID^2, Method: Least Squares, Sample: 1 807
====================================================================
Variable              Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                      29374.75     20559.14      1.428793     0.1535
LINCOME               -1049.627     963.4360     -1.089462     0.2763
LINCOME^2             -3.941187     17.07122     -0.230867     0.8175
LINCOME*LCIGPRIC       329.8888     239.2417      1.378893     0.1683
LINCOME*EDUC          -9.591844     8.047067     -1.191968     0.2336
LINCOME*AGE           -3.354564     6.682195     -0.502015     0.6158
LINCOME*(AGE^2)        0.026704     0.073025      0.365689     0.7147
LINCOME*RESTAURN      -59.88701     49.69040     -1.205203     0.2285
LCIGPRIC              -10340.67     9754.559     -1.060086     0.2894
LCIGPRIC^2             668.5282     1204.316      0.555110     0.5790
LCIGPRIC*EDUC          32.91400     59.06252      0.557274     0.5775
LCIGPRIC*AGE           62.88178     55.29011      1.137306     0.2558
LCIGPRIC*(AGE^2)      -0.622372     0.594730     -1.046479     0.2957
LCIGPRIC*RESTAURN      862.1558     720.6219      1.196405     0.2319
EDUC                  -117.4717     251.2852     -0.467484     0.6403
EDUC^2                -0.290344     1.287605     -0.225492     0.8217
EDUC*AGE               3.617047     1.724659      2.097253     0.0363
EDUC*(AGE^2)          -0.035558     0.017664     -2.012988     0.0445
EDUC*RESTAURN         -2.896491     10.65709     -0.271790     0.7859
AGE                   -264.1467     235.7624     -1.120394     0.2629
AGE^2                  3.468605     3.194651      1.085754     0.2779
AGE*(AGE^2)           -0.019111     0.028655     -0.666935     0.5050
AGE*RESTAURN          -4.933195     10.84029     -0.455080     0.6492
(AGE^2)^2              0.000118     0.000146      0.807552     0.4196
(AGE^2)*RESTAURN       0.038446     0.120459      0.319160     0.7497
RESTAURN              -2868.188     2986.776     -0.960296     0.3372
====================================================================
R-squared            0.064650   Mean dependent var    178.1297
Adjusted R-squared   0.034709   S.D. dependent var    369.3519
S.E. of regression   362.8853   Sum squared resid     1.03E+08
F-statistic          2.159257   Prob(F-statistic)     0.000905
Akaike info crit.    14.65774   Schwarz criterion     14.80895
Log likelihood      -5888.398   Durbin-Watson stat    1.933288
====================================================================

Result: With the White test H0 is also rejected.

Trade Example Continued (from Section 8.4):
• Breusch-Pagan test for heteroskedasticity using OLS residuals

====================================================================
Heteroskedasticity Test: Breusch-Pagan-Godfrey
====================================================================
F-statistic            4.139262   Prob. F(5,46)          0.0035
Obs*R-squared          16.13595   Prob. Chi-Square(5)    0.0065
Scaled explained SS    11.61688   Prob. Chi-Square(5)    0.0404
====================================================================
Test Equation: Dependent Variable: RESID^2
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
                          Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                          59.29291     40.51240      1.463575     0.1501
LOG(WDI_GDPUSDCR_O)       -4.634815     3.174932     -1.459815     0.1511
(LOG(WDI_GDPUSDCR_O))^2    0.075281     0.061518      1.223720     0.2273
LOG(CEPII_DIST)            1.382417     0.745464      1.854439     0.0701
CEPII_COMCOL_REV          -1.642264     0.978128     -1.678986     0.0999
CEPII_COMLANG_OFF          0.394264     1.807698      0.218103     0.8283
====================================================================
R-squared            0.310307   Mean dependent var       1.661544
Adjusted R-squared   0.235340   S.D. dependent var       2.275813
S.E. of regression   1.990081   Akaike info criterion    4.322395
Sum squared resid    182.1794   Schwarz criterion        4.547538
Log likelihood      -106.3823   Hannan-Quinn criter.     4.408709
F-statistic          4.139262   Durbin-Watson stat       2.285868
Prob(F-statistic)    0.003459
====================================================================

• White test (without cross terms) for heteroskedasticity using OLS residuals

====================================================================
Heteroskedasticity Test: White
====================================================================
F-statistic            4.085786   Prob. F(5,46)          0.0037
Obs*R-squared          15.99159   Prob. Chi-Square(5)    0.0069
Scaled explained SS    11.51295   Prob. Chi-Square(5)    0.0421
====================================================================
Test Equation: Dependent Variable: RESID^2
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
                                Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                                18.75859     10.91313      1.718900     0.0924
(LOG(WDI_GDPUSDCR_O))^2         -0.055924     0.031777     -1.759911     0.0851
((LOG(WDI_GDPUSDCR_O))^2)^2      3.05E-05     2.34E-05      1.307676     0.1975
(LOG(CEPII_DIST))^2              0.090498     0.049960      1.811423     0.0766
CEPII_COMCOL_REV^2              -1.582928     0.992436     -1.594992     0.1176
CEPII_COMLANG_OFF^2              0.291977     1.766624      0.165274     0.8695
====================================================================
R-squared            0.307531   Mean dependent var       1.661544
Adjusted R-squared   0.232262   S.D. dependent var       2.275813
S.E. of regression   1.994082   Akaike info criterion    4.326412
Sum squared resid    182.9127   Schwarz criterion        4.551555
Log likelihood      -106.4867   Hannan-Quinn criter.     4.412726
F-statistic          4.085786   Durbin-Watson stat       2.285412
Prob(F-statistic)    0.003749
====================================================================

• Breusch-Pagan test for heteroskedasticity using standardized FGLS residuals

====================================================================
Heteroskedasticity Test: Breusch-Pagan-Godfrey
====================================================================
F-statistic            0.683750   Prob. F(5,46)          0.6381
Obs*R-squared          3.597320   Prob. Chi-Square(5)    0.6087
Scaled explained SS    1.922069   Prob. Chi-Square(5)    0.8598
====================================================================
Test Equation: Dependent Variable: WGT_RESID^2
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
                              Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                             -0.323375     1.044251     -0.309672     0.7582
LOG(WDI_GDPUSDCR_O)*WGT        0.122435     0.229469      0.533560     0.5962
(LOG(WDI_GDPUSDCR_O))^2*WGT   -0.010813     0.007708     -1.402801     0.1674
LOG(CEPII_DIST)*WGT            0.678927     0.586156      1.158269     0.2527
CEPII_COMCOL_REV*WGT          -0.781193     0.641851     -1.217095     0.2298
CEPII_COMLANG_OFF*WGT          0.299789     1.228329      0.244062     0.8083
====================================================================
R-squared            0.069179   Mean dependent var       0.946185
Adjusted R-squared  -0.031997   S.D. dependent var       1.116472
S.E. of regression   1.134193   Akaike info criterion    3.197887
Sum squared resid    59.17414   Schwarz criterion        3.423031
Log likelihood      -77.14507   Hannan-Quinn criter.     3.284202
F-statistic          0.683750   Durbin-Watson stat       1.973137
Prob(F-statistic)    0.638076
====================================================================

• White test (without cross terms) for heteroskedasticity using FGLS residuals

====================================================================
Heteroskedasticity Test: White
====================================================================
F-statistic            0.538645   Prob. F(6,45)          0.7760
Obs*R-squared          3.484361   Prob. Chi-Square(6)    0.7460
Scaled explained SS    1.861714   Prob. Chi-Square(6)    0.9320
====================================================================
Test Equation: Dependent Variable: WGT_RESID^2
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
                                       Coefficient   Std. Error   t-Statistic   Prob.
====================================================================
C                                       0.568776     0.520032      1.093733     0.2799
WGT^2                                   6.158506     9.676452      0.636443     0.5277
(LOG(WDI_GDPUSDCR_O))^2*WGT^2          -0.014641     0.023638     -0.619379     0.5388
((LOG(WDI_GDPUSDCR_O))^2)^2*WGT^2       6.39E-06     1.41E-05      0.454559     0.6516
(LOG(CEPII_DIST))^2*WGT^2               0.021214     0.022901      0.926322     0.3592
CEPII_COMCOL_REV^2*WGT^2               -0.941910     1.019790     -0.923632     0.3606
CEPII_COMLANG_OFF^2*WGT^2              -0.421654     1.220702     -0.345420     0.7314
====================================================================
R-squared            0.067007   Mean dependent var       0.946185
Adjusted R-squared  -0.057392   S.D. dependent var       1.116472
S.E. of regression   1.148063   Akaike info criterion    3.238680
Sum squared resid    59.31224   Schwarz criterion        3.501347
Log likelihood      -77.20568   Hannan-Quinn criter.     3.339380
F-statistic          0.538645   Durbin-Watson stat       1.955059
Prob(F-statistic)    0.775963
====================================================================

Results:
– Note that the specification of the White test without cross terms follows EViews 6.0 and does not include level terms (in contrast to (9.2)).
– Both the Breusch-Pagan and the White test reject the null hypothesis of homoskedastic errors for the OLS residuals at the 1% significance level. Thus, using OLS with heteroskedasticity-robust standard errors or FGLS in Section 8.4 was justified.

– Both the Breusch-Pagan and the White test do not reject the null hypothesis of homoskedastic standardized errors in the FGLS framework. Both p-values are above 60%. Thus, the variance regression in Section 8.4 does not seem to be misspecified.
– In sum, among all models and estimation procedures considered, the FGLS estimates of Model 5 seem to be the most reliable ones.
Reading: Chapter 8 in Wooldridge (2009) (without Section 8.5 concerning linear probability models).

9.3 Model Specification II: Useful Tests
9.3.1 Comparing Models with Identical Regressand
Starting point: two non-nested models
(M1) y = x_0 β_0 + . . . + x_k β_k + u = x'β + u,
(M2) y = z_0 γ_0 + . . . + z_m γ_m + v = z'γ + v,
where k = m does not have to hold.
Decision between (M1) and (M2): using
• information criteria (AIC, SC, HQ, . . .),
• encompassing test,
• non-nested F test,
• J test.

All three tests can be based on the encompassing principle.
Encompassing Principle
Let two non-nested models be given:
(M1) y = x'β + u,   (M2) y = z'γ + v.
For clarifying the non-nested relationship between (M1) and (M2), define
x' = (w', x_B'),   β = (β_A', β_B')',
z' = (w', z_B'),   γ = (γ_A', γ_B')',
such that w contains all common regressors:
(M1) y = w'β_A + x_B'β_B + u,
(M2) y = w'γ_A + z_B'γ_B + v.

Idea of the encompassing principle:
• If (M1) is correctly specified, it must be able to explain the results of an estimation of (M2) (and vice versa).
• If not, (M1) has to be rejected (and vice versa).
Derivation: Consider the "artificial nesting model" (ANM)
y = w'a + x_B'b_x + z_B'b_z + ε,   E[ε|w, x_B, z_B] = 0.
Different settings:
• (ANM) is the correctly specified model, so that (M1) and (M2) are misspecified. Model (M2) is estimated.
• (M1) is the correctly specified model. Model (M2) is estimated.
• (M2) is the correctly specified model. Model (M1) is estimated.

In general an omitted variable bias results for all cases. Details:
• (ANM) is the correctly specified model, so that (M1) and (M2) are misspecified. Model (M2) is estimated. ⇒ x_B omitted.
E[y|w, z_B] = E[w'a + x_B'b_x + z_B'b_z + ε | w, z_B]
           = E[w'a|w, z_B] + E[x_B'b_x|w, z_B] + E[z_B'b_z|w, z_B] + E[ε|w, z_B]
           = w'a + E[x_B'|w, z_B]b_x + z_B'b_z + E[ε|w, z_B].
For simplicity it is assumed that x_B is scalar. Then it holds that
x_B = w'q + z_B'p + ν,   E[x_B|w, z_B] = w'q + z_B'p.
It also holds that
E[E[ε|w, x_B, z_B]|w, z_B] = E[ε|w, z_B].

Since (ANM) is correct, it holds that E[ε|w, x_B, z_B] = 0 and thus E[0] = 0 = E[ε|w, z_B]. When estimating (M2) instead of (ANM), one gets
E[y|w, z_B] = w'a + [w'q + z_B'p]b_x + z_B'b_z
           = w'[a + q b_x] + z_B'[b_z + p b_x],   (9.3)
where γ_A = a + q b_x and γ_B = b_z + p b_x.
Note that the biases q b_x and p b_x are caused by omitting the variable x_B. These effects bias the direct impact of w via a and of z_B via b_z on y.

• (M1) is the correctly specified model. Model (M2) is estimated. Then b_z = 0 and from (9.3) the following restriction results: p b_x = γ_B. Now it can be seen that knowing the correctly specified model (M1) is enough for deriving model (M2), i.e. for predicting γ_B or the expectation of the OLS estimator. In other words: Since (M2) is "smaller" than (M1) with respect to the relevant variables, the behavior of (M2) can be predicted with the help of (M1) when an unbiased estimator is used for the latter. Then one says "(M1) encompasses (M2)". (Knowing (M1) is not enough here if (ANM) is the correct model, b_z ≠ 0.)
• (M2) is the correctly specified model. Model (M1) is estimated. This case can be derived just as the above case.

Thus, for the null hypothesis "(M1) encompasses (M2)" two equivalent hypotheses can be tested:
• H0 : p b_x − γ_B = 0 - more complicated, no details here. (This version is often termed encompassing test and sometimes has advantages in more general models.)
• H0 : b_z = 0 in (ANM) - easy: with the help of a non-nested F test.
Proceeding for more than two alternatives: selection procedure
• Based on this same principle, the remaining model competes with further alternative models as long as it is not rejected.
• Problem of this principle: it can happen that both null hypotheses have to be rejected.

Non-nested F test
Idea and implementation:
• Hypotheses: "H0: model (M1) is correct" versus "H1: model (M1) is incorrect".
• Again, partition z' = (w', z_B'), where the k_A regressors from w are contained in x but the k_B regressors from z_B are not.
• Formulate the artificial nesting model (ANM)
y = x'β + z_B'b_z + ε.
• Based on this ANM, test H0 : b_z = 0 using an F test with k_B degrees of freedom in the numerator and n − m − k_B in the denominator.

• For the test of (M2) vs. (M1) proceed analogously with partition x' = (w', x_B') . . .
J test (Davidson-MacKinnon test)
Idea and implementation:
• For the J test the ANM is formulated such that both (M1) and (M2) are nested in the ANM:
y = (1 − λ)x'β + λz'γ + ε.
For the case λ = 0 model (M1) results, for λ = 1 model (M2).
• Problem: λ, β and γ are not identified in the above approach.
• Solution: replace γ by the OLS estimator γ̂ from (M2).
I.e. test H0 : λ = 0 with the test equation y = x'β* + λŷ_M2 + η, where β* = (1 − λ)β and ŷ_M2 = z'γ̂ is the fitted value from the OLS estimation of (M2).

• For testing whether (M2) is valid, proceed analogously . . .
• Interpretation of the logic of the test: For testing model (M1) it is enlarged by the fitted values of model (M2); these (i.e. the part of y explained by the regressors in (M2)) are tested for their significance in the test equation.
• Advantages of the J test compared to the non-nested F test:
– only one single restriction has to be tested,
– higher power if k_B or respectively m_B is very large,
– in case of k_B = 1 or respectively m_B = 1 the tests are equivalent.
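A minimal sketch of the J test in Python, assuming regressor matrices X for (M1) and Z for (M2), both including a constant, and the regressand y (all placeholder names):

import numpy as np
import statsmodels.api as sm

# Fitted values of the competing model (M2)
yhat_m2 = sm.OLS(y, Z).fit().fittedvalues

# Test (M1): add yhat_m2 to the regressors of (M1) and t-test its coefficient
jtest = sm.OLS(y, np.column_stack([X, yhat_m2])).fit()
print(jtest.tvalues[-1], jtest.pvalues[-1])   # H0: lambda = 0, i.e. (M1) is not rejected

For testing (M2), exchange the roles of X and Z.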


9.3.2 Comparing Models with Differing Regressand
Idea and implementation (of the P test): example linear model versus log-log alternative
• Step 1: Run an OLS estimation for both models.
• Step 2: Compute the corresponding fitted values: ŷ_lin from the linear model and ln(y)ˆ_log, the fitted values of ln(y) from the log-log model.
• Step 3a: Test the linear approach against the log-log alternative using the ANM
y = Σ_j x_j β_j,lin + δ_lin [ln(ŷ_lin) − ln(y)ˆ_log] + u
by a t test with the null hypothesis
H0 : δ_lin = 0 (linear model is correct).
• Step 3b: Test the log-log approach against the linear alternative using the ANM
ln(y) = Σ_j ln(x_j) β_j,log + δ_log [ŷ_lin − exp(ln(y)ˆ_log)] + v
by a t test with the null hypothesis
H0 : δ_log = 0 (log-log model is correct).
Problem: it is possible that both hypotheses are rejected (i.e. another functional form is relevant) or that neither can be rejected (i.e. a problem of lacking power or something else).
Note: in this case a comparison using the information criteria is not possible.
Reading: Chapter 9 in Wooldridge (2009).


10 Appendix

10.1 A Condensed Introduction to Probability
Preliminary statement: The following pages are not meant as a deterrent, but as a supplement to the presentations found in introductory textbooks for econometrics. This supplement is intended to explain the intuition underlying the large number of definitions and concepts in probability theory. Nevertheless it is not possible to completely avoid formulas, and it may take some time for the ideas to settle.

• Sample space, outcome space: The set Ω contains all possible outcomes of a random experiment. This set can contain (countably) finitely or infinitely many outcomes.
Examples:
– Urn with 4 balls of different color: Ω = {yellow, red, blue, green}
– Monthly income of a household in the future: Ω = [0, ∞)
Remark:
– If there is a finite number of outcomes, they are often denoted as ω_i. For S outcomes, Ω appears as Ω = {ω_1, ω_2, . . . , ω_S}.
– If there is an infinite number of outcomes, each one is often denoted as ω.

• Event: Every set of possible outcomes = every subset of the set Ω, including Ω itself.
Examples:
– Urn example: possible events are for example {yellow, red} or {red, blue, green}.
– Household income: possible events are all possible subintervals and combinations of them, e.g. (0, 5000], [1000, 1001), (400, ∞), 4000, and so on.
Remark: Using the general point of view with the ω's, one has
– for the case of S outcomes: {ω_1, ω_2}, {ω_S}, {ω_3, . . . , ω_S}, and so on.
– for the case of infinitely many outcomes located inside an interval Ω = (−∞, ∞): (a_1, b_1], [a_2, b_2), (0, ∞), and so on, where the lower bound always has to be lower than or equal to the upper bound (a_i ≤ b_i).
• Random variable: A random variable is a function that assigns a real number X(ω) to each outcome ω ∈ Ω.
Urn example: X(ω_1) = 0, X(ω_2) = 3, X(ω_3) = 17, X(ω_4) = 20.
• Density function
– Preliminary statement: As we have already seen, it gets complicated if Ω contains infinitely many outcomes. Consider for example Ω = [0, 4]. If one wants to compute the probability for the number π to appear, this probability is equal to zero. If it were not equal to zero, we would have the problem that the sum of the probabilities of all (infinitely many) numbers could not be equal to 1. What to do?
– A way out is the following trick: Consider the probability for the outcome of the random variable X being located in the interval [0, x], with x < 4. This probability can be written as P(X ≤ x). Now determine how the probability changes when the size of this interval [0, x] is extended by h: the change is P(X ≤ x + h) − P(X ≤ x). Relating this change in probability to the interval length, one gets
[P(X ≤ x + h) − P(X ≤ x)] / h.
For a decreasing interval length h that approaches zero, one obtains the following limit:
lim_{h→0} [P(X ≤ x + h) − P(X ≤ x)] / h = f(x).
This limit is called the probability density function, or shortly the density function, that belongs to the probability function P.
– How to interpret a density function? By using the sloppy formulation
[P(X ≤ x + h) − P(X ≤ x)] / h ≈ f(x)
and rewriting it as
P(X ≤ x + h) − P(X ≤ x) ≈ f(x) · h,
one can see that f(x) determines the rate of change of the probability that X falls into the interval [0, x] if the interval length is extended by h. Hence, the density function is a rate.
– As the density function is a derivative, we get conversely for our example
∫_0^x f(u)du = P(X ≤ x) = F(x).
Here, F(x) = P(X ≤ x) is called the probability distribution function. Certainly, in this example we get
∫_0^4 f(u)du = P(X ≤ 4) = 1.
In general, the integral of the density function over the full support of the random variable yields a value of 1. Consider for example X(ω) ∈ R:
∫_{−∞}^{∞} f(u)du = P(X ≤ ∞) = 1.
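As a numerical illustration of these relations (my own example, not part of the original slides), take the uniform density f(u) = 1/4 on Ω = [0, 4] and check both that the density integrates to 1 and that F(π) = π/4:

import numpy as np
from scipy.integrate import quad

f = lambda u: 0.25 if 0 <= u <= 4 else 0.0   # uniform density on [0, 4]

total, _ = quad(f, 0, 4)       # integral over the full support: 1.0
F_pi, _ = quad(f, 0, np.pi)    # F(pi) = P(X <= pi) = pi/4 ~ 0.7854
print(total, F_pi)

Note that P(X = π) itself is zero, exactly as argued above.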


• Conditional probability function
Let's begin with an example: Let the random variable X ∈ [0, ∞) be the payoff in a lottery. The probability function (distribution function) P(X ≤ x) = F(x) is the probability for a maximum payoff of x. Additionally, we know that there are two machines (machine A and machine B) that determine the payoff.
Question: What is the probability for a maximum payoff of x if machine A is used? In other words, what is the probability of interest if the condition "machine A is used" is applied? Hence, the probability under consideration is also called conditional probability and written as P(X ≤ x|A).
Accordingly one writes P(X ≤ x|B) if the condition "machine B is used" is applied.
Question: What is the relationship between the unconditional probability P(X ≤ x) and the conditional probabilities P(X ≤ x|A) and P(X ≤ x|B)?
To answer this question one has to clarify what the probabilities of using machine A or B are. Denoting these probabilities by P(A) and P(B), we have:
P(X ≤ x) = P(X ≤ x|A)P(A) + P(X ≤ x|B)P(B)
F(x) = F(x|A)P(A) + F(x|B)P(B)
In this example there are two outcomes. The corresponding relationship can be extended to n discrete outcomes Ω = {A_1, A_2, . . . , A_n}:
F(x) = F(x|A_1)P(A_1) + F(x|A_2)P(A_2) + · · · + F(x|A_n)P(A_n)   (10.1)
Until now we defined the conditions in terms of events and not in terms of random variables. An example for the latter would be if the payoff is determined by only one machine, but where the mode of operation of this machine is conditioned upon the payoff's magnitude Z. In this case, the conditional distribution function is F(x|Z = z), with Z = z meaning that the random variable Z takes exactly the value z. For relating the unconditional and conditional probability we have to replace the sum by an integral, and the probability of the conditioning event by the corresponding density function, as Z can take infinitely many values. For our example we obtain:
F(x) = ∫_0^∞ F(x|Z = z)f(z)dz = ∫_0^∞ F(x|z)f(z)dz
or generally
F(x) = ∫ F(x|Z = z)f(z)dz = ∫ F(x|z)f(z)dz.   (10.2)
Another important property: If the random variables X and Z are stochastically independent, we have F(x|z) = F(x).
• Conditional density function
The conditional density function can be heuristically derived from the conditional distribution function in the same way as for the case of the unconditional density function: one simply replaces the unconditional probabilities by conditional probabilities. The conditional density function arises from
lim_{h→0} [P(X ≤ x + h|A) − P(X ≤ x|A)] / h = f(x|A).
For finitely many conditions, equation (10.1) becomes
f(x) = f(x|A_1)P(A_1) + f(x|A_2)P(A_2) + · · · + f(x|A_n)P(A_n).
The relationship (10.2) turns into
f(x) = ∫ f(x|Z = z)f(z)dz = ∫ f(x|z)f(z)dz.   (10.3)

• Expectation
Consider again the payoff example. Question: Which payoff would you expect "on average"?
Answer: ∫_0^∞ x f(x)dx. For a payoff paid out in n different discrete amounts, one would expect Σ_{i=1}^n x_i P(X = x_i) on average. Each possible payoff is multiplied by its probability of occurrence and summed up. It is not surprising that the result is denoted as the expectation. In general the expectation is defined as
E[X] = ∫ x f(x)dx   (continuous X),
E[X] = Σ_i x_i P(X = x_i)   (discrete X).

• Rules for the expectation (see, e.g., Appendix B in Wooldridge (2009)):
1. For each constant c it holds that E[c] = c.
2. For all constants a and b and all random variables X and Y it holds that E[aX + bY] = aE[X] + bE[Y].
3. If the random variables X and Y are independent, it holds that E[YX] = E[Y]E[X].
• Conditional expectation
So far we did not care which machine was used to create the payoff. If we are interested in the expected payoff of using machine A, we have to calculate the conditional expectation
E[X|A] = ∫_0^∞ x f(x|A)dx.
This is easily achieved by replacing the unconditional density f(x) by the conditional density f(x|A) and stating the condition in the notation of the expectation accordingly. Analogously the expected payoff for machine B is determined as
E[X|B] = ∫_0^∞ x f(x|B)dx.
In general one has for discrete conditioning events
E[X|A] = ∫ x f(x|A)dx   (continuous X),
E[X|A] = Σ_i x_i P(X = x_i|A)   (discrete X),
and for continuous conditions
E[X|Z = z] = ∫ x f(x|Z = z)dx   (continuous X),
E[X|Z = z] = Σ_i x_i P(X = x_i|Z = z)   (discrete X).

Remark: Frequently the short versions are used, as in Wooldridge (2009):
E[X|z] = ∫ x f(x|z)dx   (continuous X),
E[X|z] = Σ_i x_i P(X = x_i|z)   (discrete X).
In accordance with the relationship between unconditional and conditional probabilities, there is a similar relationship for unconditional and conditional expectations:
E[X] = E[E[X|Z]],
which is denoted as the law of iterated expectations (LIE).
Sketch of proof:
E[X] = ∫ x f(x)dx
     = ∫ x [∫ f(x|z)f(z)dz] dx   (insert (10.3))
     = ∫∫ x f(x|z)f(z)dz dx
     = ∫ [∫ x f(x|z)dx] f(z)dz   (interchange dx and dz)
     = ∫ E[X|z] f(z)dz
     = E[E[X|Z]].
In our example with 2 machines, the law of iterated expectations yields
E[X] = E[X|A]P(A) + E[X|B]P(B).
This example also shows that the conditional expectations E[X|A] and E[X|B] are random variables. If they are weighted by the corresponding probabilities of occurrence P(A) and P(B), they yield E[X]. Suppose that, prior to the lottery, you only know both conditional expectations but not which machine is used. Then the expected payoff is equal to E[X] and both conditional expectations are considered as random variables. After knowing which machine is used, the corresponding conditional expectation is the outcome of the random variable. This is a general property of conditional expectations.
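This decomposition is easy to verify by simulation. A sketch with assumed values P(A) = 0.3, E[X|A] = 2 and E[X|B] = 5 for exponentially distributed payoffs (the distributional choices are illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
use_a = rng.random(n) < 0.3                    # machine A with P(A) = 0.3
x = np.where(use_a,
             rng.exponential(2.0, n),          # payoff from machine A: E[X|A] = 2
             rng.exponential(5.0, n))          # payoff from machine B: E[X|B] = 5

print(x.mean())                                # close to E[X]
print(0.3 * 2.0 + 0.7 * 5.0)                   # E[X|A]P(A) + E[X|B]P(B) = 4.1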


• Rules for conditional expectations (see, e.g., Appendix B in Wooldridge (2009)):
1. For each function c(·) it holds that E[c(X)|X] = c(X).
2. For all functions a(·) and b(·) it holds that E[a(X)Y + b(X)|X] = a(X)E[Y|X] + b(X).
3. If the random variables X and Y are independent, it holds that E[Y|X] = E[Y].
4. Law of iterated expectations (LIE): E[E[Y|X]] = E[Y].
5. E[Y|X] = E[E[Y|X, Z]|X].
6. If it holds that E[Y|X] = E[Y], then it also holds that Cov(X, Y) = 0.
7. If E[Y²] < ∞ and E[g(X)²] < ∞ for an arbitrary function g(·), then the following inequalities hold:
E{[Y − E[Y|X]]²|X} ≤ E{[Y − g(X)]²|X},
E{[Y − E[Y|X]]²} ≤ E{[Y − g(X)]²}.

10.2 Important Rules of Matrix Algebra
Matrix addition

A = ( a_11  a_12  ...  a_1K )        C = ( c_11  c_12  ...  c_1K )
    ( a_21  a_22  ...  a_2K )            ( c_21  c_22  ...  c_2K )
    (  ...   ...        ... )            (  ...   ...        ... )
    ( a_T1  a_T2  ...  a_TK )            ( c_T1  c_T2  ...  c_TK )

If A and C are of the same dimension,

A + C = ( a_11 + c_11   a_12 + c_12   ...   a_1K + c_1K )
        ( a_21 + c_21   a_22 + c_22   ...   a_2K + c_2K )
        (      ...           ...                 ...    )
        ( a_T1 + c_T1   a_T2 + c_T2   ...   a_TK + c_TK )

Matrix multiplication

A = ( a_11  a_12  ...  a_1K )        B = ( b_11  b_12  ...  b_1L )
    ( a_21  a_22  ...  a_2K )            ( b_21  b_22  ...  b_2L )
    (  ...   ...        ... )            (  ...   ...        ... )
    ( a_T1  a_T2  ...  a_TK )            ( b_K1  b_K2  ...  b_KL )

If the number of columns of A equals the number of rows of B, then the product C = AB is defined, and the following equality holds for every element of C:

c_ij = ( a_i1 ... a_iK ) ( b_1j )  = a_i1 b_1j + · · · + a_iK b_Kj = Σ_{l=1}^{K} a_il b_lj.
                         (  ...  )
                         ( b_Kj )

Caution: In general it holds that AB ≠ BA.

Transpose of a matrix
Given the (2 × 3) matrix (i.e. 2 rows, 3 columns)

A = ( a_11  a_12  a_13 )
    ( a_21  a_22  a_23 ),

the transpose of A is the (3 × 2) matrix

A' = ( a_11  a_21 )
     ( a_12  a_22 )
     ( a_13  a_23 ).

It holds that (AB)' = B'A'.

Inverse of a matrix
Let A be the (K × K) matrix

A = ( a_11  a_12  ...  a_1K )
    ( a_21  a_22  ...  a_2K )
    (  ...   ...        ... )
    ( a_K1  a_K2  ...  a_KK ),

then the inverse of A is A^(−1), defined by

AA^(−1) = A^(−1)A = I_K = ( 1  0  ...  0 )
                          ( 0  1       0 )
                          (      ...     )
                          ( 0  0  ...  1 ),

with I_K the identity matrix of dimension (K × K).
The matrix A is invertible if its rows (respectively columns) are linearly independent. In other words: no row (column) can be written as a linear combination of the other rows (columns). Technically, this is satisfied whenever the determinant of A is nonzero. A noninvertible matrix is frequently called singular.
The calculation of an inverse is better left to a computer. Only for matrices with 2 or 3 columns/rows is the calculation of moderate complexity, so that a manual calculation can be useful.

Special case of a (2 × 2) matrix: For a square (2 × 2) matrix

B = ( b_11  b_12 )
    ( b_21  b_22 )

the determinant is computed as det(B) = b_11 b_22 − b_21 b_12 and the inverse as

B^(−1) = (1/det(B)) (  b_22  −b_12 ) = (1/(b_11 b_22 − b_21 b_12)) (  b_22  −b_12 ).
                    ( −b_21   b_11 )                               ( −b_21   b_11 )

Example:

C = ( 0   2 ),     det(C) = 0 · (−1) − 1 · 2 = −2,
    ( 1  −1 )

C^(−1) = (1/(−2)) ( −1  −2 ) = ( 1/2  1 ).
                  ( −1   0 )   ( 1/2  0 )

Check:

C C^(−1) = ( 0   2 ) ( 1/2  1 ) = ( 1  0 ).
           ( 1  −1 ) ( 1/2  0 )   ( 0  1 )
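Accordingly, in practice one leaves such computations to a numerical routine; a short check of the example above, e.g. in Python:

import numpy as np

C = np.array([[0.0, 2.0],
              [1.0, -1.0]])

print(np.linalg.det(C))        # -2.0
print(np.linalg.inv(C))        # [[0.5, 1.0], [0.5, 0.0]]
print(C @ np.linalg.inv(C))    # identity matrix (up to rounding)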

Reading: As a supplement on matrix algebra and its use in the multiple linear regression framework, see Appendices D and E.1 in Wooldridge (2009).


10.3 Rules for Matrix Differentiation
• Let c and w be (T × 1) vectors,

c = ( c_1 )        w = ( w_1 )
    ( c_2 )            ( w_2 )
    ( ... )            ( ... )
    ( c_T ),           ( w_T ),

and z = c'w = c_1 w_1 + c_2 w_2 + · · · + c_T w_T. Then

∂z/∂w = c.

• Let A be a (T × T) matrix,

A = ( a_11  a_12  ...  a_1T )
    ( a_21  a_22  ...  a_2T )
    (  ...   ...        ... )
    ( a_T1  a_T2  ...  a_TT ),

and z = w'Aw. Then

∂z/∂w = (A' + A)w.
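Both rules can be cross-checked numerically with finite differences; the following sketch verifies the second rule for a randomly drawn A and w:

import numpy as np

rng = np.random.default_rng(1)
T = 4
A = rng.normal(size=(T, T))
w = rng.normal(size=T)

z = lambda v: v @ A @ v                   # z = w'Aw
analytic = (A.T + A) @ w                  # the rule: dz/dw = (A' + A)w

eps = 1e-6                                # central finite differences
numeric = np.array([(z(w + eps * e) - z(w - eps * e)) / (2 * eps)
                    for e in np.eye(T)])
print(np.allclose(analytic, numeric))     # True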

10.4 Data for Estimating Gravity Equations
Legend for data in gravity_data_kaz.wf1
• Countries and country codes

 1  ALB  Albania                  17  GBR  United Kingdom   33  NLD  Netherlands
 2  ARM  Armenia                  18  GEO  Georgia          34  NOR  Norway
 3  AUT  Austria                  19  GER  Germany          35  POL  Poland
 4  AZE  Azerbaijan               20  GRC  Greece           36  PRT  Portugal
 5  BEL  Belgium and Luxembourg   21  HRV  Croatia          37  ROM  Romania
 6  BGR  Bulgaria                 22  HUN  Hungary          38  RUS  Russia
 7  BLR  Belarus                  23  IRL  Ireland          39  SVK  Slovakia
 8  CAN  Canada                   24  ISL  Iceland          40  SVN  Slovenia
 9  CHE  Switzerland              25  ITA  Italy            41  SWE  Sweden
10  CYP  Cyprus                   26  KAZ  Kazakhstan       42  TKM  Turkmenistan
11  CZE  Czech Republic           27  KGZ  Kyrgyzstan       43  TUR  Turkey
12  DNK  Denmark                  28  LTU  Lithuania        44  UKR  Ukraine
13  ESP  Spain                    29  LVA  Latvia           45  USA  United States
14  EST  Estonia                  30  MDA  Moldova          46  YUG  Serbia and Montenegro
15  FIN  Finland                  31  MKD  Macedonia
16  FRA  France                   32  MLT  Malta

Notes: Table is based on Table 1 in "Explanatory notes on gravity data.wf1".

Countries that feature only as origin countries:
BIH  Bosnia and Herzegovina    CHN  China        KOR  South Korea
TJK  Tajikistan                HKG  Hong Kong    TWN  Taiwan
UZB  Uzbekistan                JPN  Japan        THA  Thailand

• Endogenous variable:
– TRADE_0_D_O: Imports of country d from country o (i.e., exports of country o to country d) in current US dollars
– Commodity classifications: Trade flows are based on aggregating disaggregate trade flows according to the Standard International Trade Classification, Revision 3 (SITC, Rev. 3) at the lowest aggregation levels (4- or 5-digit). Source: UN COMTRADE
– Without fuels and lubricants (i.e., specifically without petrol and natural gas products). Cut-off value for underlying disaggregated trade flows (at SITC Rev. 3 5-digit level) is 500 US dollars.
• Explanatory variables:

Origin country:
WDI_GDPUSDCR_O           Origin country GDP data; in current US dollars                               World Bank - World Development Indicators
WDI_GDPPCUSDCR_O         Origin country GDP per capita data; in current US dollars                    World Bank - World Development Indicators
WEO_GDPCR_O              Destination and origin country GDP data; in current US dollars              IMF - World Economic Outlook database
WEO_GDPPCCR_O            Destination and origin country GDP per capita data; in current US dollars   IMF - World Economic Outlook database
WEO_POP_O                Origin country population data                                               IMF - World Economic Outlook database
CEPII_AREA_O             area of origin country in km^2                                               CEPII
CEPII_COL45              dummy; d and o country have had a colonial relationship after 1945           CEPII
CEPII_COL45_REV          dummy; revised by "expert knowledge"                                         CEPII, revised
CEPII_COLONY             dummy; d and o country have ever had a colonial link                         CEPII
CEPII_COMCOL             dummy; d and o country share a common colonizer since 1945                   CEPII
CEPII_COMCOL_REV         dummy; revised by "expert knowledge"                                         CEPII, revised
CEPII_COMLANG_ETHNO      dummy; d and o country share a language spoken by at least 9% of each population   CEPII
CEPII_COMLANG_ETHNO_REV  dummy; revised by "expert knowledge"                                         CEPII, revised
CEPII_COMLANG_OFF        dummy; d and o country share common official language                        CEPII
CEPII_CONTIG             dummy; d and o country are contiguous (neighboring countries)                CEPII
CEPII_DISINT_O           internal distance in origin country, 0.67 * sqrt(area/pi)                    CEPII
CEPII_DIST               geodesic distance between d and o country                                    CEPII
CEPII_DISTCAP            distance between d and o country based on capitals                           CEPII
CEPII_DISTW              weighted distances, see CEPII for details                                    CEPII
CEPII_DISTWCES           weighted distances, see CEPII for details                                    CEPII
CEPII_LAT_O              latitude of the city                                                         CEPII
CEPII_LON_O              longitude of the city                                                        CEPII
CEPII_SMCTRY_REV         dummy; d and o country were/are the same country                             CEPII, revised
ISO_O                    ISO codes in three characters of origin country                              CEPII
EBRD_TFES_O              EBRD measure of foreign trade and payments liberalisation of o country       EBRD

Destination country:
WDI_GDPUSDCR_D     Destination country GDP data; in current US dollars                          World Bank - World Development Indicators
WDI_GDPPCUSDCR_D   Destination country GDP per capita data; in current US dollars               World Bank - World Development Indicators
WEO_GDPCR_D        Destination and origin country GDP data; in current US dollars               IMF - World Economic Outlook database
WEO_GDPPCCR_D      Destination and origin country GDP per capita data; in current US dollars    IMF - World Economic Outlook database
WEO_POP_D          Destination country population data                                          IMF - World Economic Outlook database

Notes: The EBRD measures reform on a scale between 1 and 4+ (= 4.33); 1 represents no or little progress; 2 indicates important progress; 3 is substantial progress; 4 indicates comprehensive progress, while 4+ indicates that countries have reached the standards and performance norms of advanced industrial countries, i.e., of OECD countries. By construction, this variable is ordered qualitative rather than cardinal.

• Thanks to Richard Frensch, Osteuropa-Institut, for providing the data set.

• EViews commands to extract selected data from the main workfile:
– to select observations of countries that export to Kazakhstan: in the workfile: Proc → Copy/Extract from Current Page → By Value to New Page or Workfile; in Sample - observations to copy: @all if (iso_d="KAZ"). Objects to copy: select. Page Destination: select.
– to select observations for one period, e.g. 2004: as above, but in Sample - observations to copy: 2004 2004
– to select observations for trade flows from Germany to Kazakhstan for all periods: as above, but in Sample - observations to copy: @all if (iso_d="KAZ") and (iso_o="GER")
• Websites: CEPII

Bibliography

Anderson, J. E. & van Wincoop, E. (2003), 'Gravity with gravitas: A solution to the border puzzle', The American Economic Review 93, 170–192.
Davidson, R. & MacKinnon, J. G. (2004), Econometric Theory and Methods, Oxford University Press, Oxford.
Fratianni, M. (2007), The gravity equation in international trade, Technical report, Dipartimento di Economia, Università Politecnica delle Marche.
Pindyck, R. S. & Rubinfeld, D. L. (1998), Econometric Models and Economic Forecasts, Irwin McGraw-Hill.
Stock, J. H. & Watson, M. W. (2007), Introduction to Econometrics, Pearson, Boston, Mass.
Wooldridge, J. M. (2009), Introductory Econometrics: A Modern Approach, 4th edn, Thomson South-Western.