Order Statistics

Order Statistics Third Edition

WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: David J. Balding, Noel A. C. Cressie, Nicholas I. Fisher, Iain M. Johnstone, J. B. Kadane, Louise M. Ryan, David W. Scott, Adrian F. M. Smith, Jozef L. Teugels
Editors Emeriti: Vic Barnett, J. Stuart Hunter, David G. Kendall
A complete list of the titles in this series appears at the end of this volume.


H. A. DAVID
Iowa State University, Department of Statistics, Ames, IA

H. N. NAGARAJA
The Ohio State University, Department of Statistics, Columbus, OH

WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: [email protected] Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002. Wiley also publishes its books in a variety of electronic formats. 
Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:
David, H. A. (Herbert Aron), 1925-
Order statistics -- 3rd ed. / H.A. David, H.N. Nagaraja.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-38926-9 (cloth)
1. Order statistics. I. Nagaraja, H. N. (Haikady Navada), 1954-. II. Title.
QA278.7.D38 2003
519.5--dc21 2003050174

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

To Ruth—HAD To my mother, Susheela—HNN


Contents

PREFACE

1 INTRODUCTION
1.1 The subject of order statistics
1.2 The scope and limits of this book
1.3 Notation
1.4 Exercises

2 BASIC DISTRIBUTION THEORY
2.1 Distribution of a single order statistic
2.2 Joint distribution of two or more order statistics
2.3 Distribution of the range and of other systematic statistics
2.4 Order statistics for a discrete parent
2.5 Conditional distributions, order statistics as a Markov chain, and independence results
2.6 Related statistics
2.7 Exercises

3 EXPECTED VALUES AND MOMENTS
3.1 Basic formulae
3.2 Special continuous distributions
3.3 The discrete case
3.4 Recurrence relations
3.5 Exercises

4 BOUNDS AND APPROXIMATIONS FOR MOMENTS OF ORDER STATISTICS
4.1 Introduction
4.2 Distribution-free bounds for the moments of order statistics and of the range
4.3 Bounds and approximations by orthogonal inverse expansion
4.4 Stochastic orderings
4.5 Bounds for the expected values of order statistics in terms of quantiles of the parent distribution
4.6 Approximations to moments in terms of the quantile function and its derivatives
4.7 Exercises

5 THE NON-IID CASE
5.1 Introduction
5.2 Order statistics for independent nonidentically distributed variates
5.3 Order statistics for dependent variates
5.4 Inequalities and recurrence relations: non-IID cases
5.5 Bounds for linear functions of order statistics and for their expected values
5.6 Exercises

6 FURTHER DISTRIBUTION THEORY
6.1 Introduction
6.2 Studentization
6.3 Statistics expressible as maxima
6.4 Random division of an interval
6.5 Linear functions of order statistics
6.6 Moving order statistics
6.7 Characterizations
6.8 Concomitants of order statistics
6.9 Exercises

7 ORDER STATISTICS IN NONPARAMETRIC INFERENCE
7.1 Distribution-free confidence intervals for quantiles
7.2 Distribution-free tolerance intervals
7.3 Distribution-free prediction intervals
7.4 Exercises

8 ORDER STATISTICS IN PARAMETRIC INFERENCE
8.1 Introduction and basic results
8.2 Information in order statistics
8.3 Bootstrap estimation of quantiles and of moments of order statistics
8.4 Least-squares estimation of location and scale parameters by order statistics
8.5 Estimation of location and scale parameters for censored data
8.6 Life testing, with special emphasis on the exponential distribution
8.7 Prediction of order statistics
8.8 Robust estimation
8.9 Exercises

9 SHORT-CUT PROCEDURES
9.1 Introduction
9.2 Quick measures of location
9.3 Range and mean range as measures of dispersion
9.4 Other quick measures of dispersion
9.5 Quick estimates in bivariate samples
9.6 The studentized range
9.7 Quick tests
9.8 Ranked-set sampling
9.9 O-statistics and L-moments in data summarization
9.10 Probability plotting and tests of goodness of fit
9.11 Statistical quality control
9.12 Exercises

10 ASYMPTOTIC THEORY
10.1 Introduction
10.2 Representations for the central sample quantiles
10.3 Asymptotic joint distribution of central quantiles
10.4 Optimal choice of order statistics in large samples
10.5 The asymptotic distribution of the extreme
10.6 The asymptotic joint distribution of extremes
10.7 Extreme-value theory for dependent sequences
10.8 Asymptotic properties of intermediate order statistics
10.9 Asymptotic results for multivariate samples
10.10 Exercises

11 ASYMPTOTIC RESULTS FOR FUNCTIONS OF ORDER STATISTICS
11.1 Introduction
11.2 Asymptotic distribution of the range, midrange, and spacings
11.3 Limit distribution of the trimmed mean
11.4 Asymptotic normality of linear functions of order statistics
11.5 Optimal asymptotic estimation by order statistics
11.6 Estimators of tail index and extreme quantiles
11.7 Asymptotic theory of concomitants of order statistics
11.8 Exercises

APPENDIX: GUIDE TO TABLES AND ALGORITHMS

REFERENCES

INDEX

Preface to Third Edition

Since the publication in 1981 of the second edition of this book both theory and applications of order statistics have greatly expanded. In this edition Chapters 2-9 deal with finite-sample theory, with division into distribution theory (Chapters 2-6) and statistical inference (Chapters 7-9). Asymptotic theory is treated in Chapters 10 and 11, representing a doubling in coverage.

In the spirit of previous editions we present in detail an up-to-date account of what we regard as the basic results of the subject of order statistics. Many special topics are also taken up, but for these we may merely provide an introduction if other more extensive accounts exist. The number of references has increased from 1000 in the second edition to around 1500, and this in spite of the elimination of a good many references cited earlier. Even so, we had to omit a larger proportion of relevant references than before, giving some preference to papers not previously mentioned in review articles.

In addition to an increased emphasis on asymptotic theory and on order statistics in other than random samples (Chapter 5), the following sections are entirely or largely new: 2.6. Related statistics; 4.4. Stochastic orderings; 6.6. Moving order statistics; 6.7. Characterizations; 7.3. Distribution-free prediction intervals; 8.2. Information in order statistics; 8.3. Bootstrap estimation; 9.6. Studentized range; 9.8. Ranked-set sampling; and 9.9. O-statistics and L-moments in data summarization. Section 6.6 includes a major application to median and order-statistic filters and Section 9.6 to bioequivalence testing.


Order Statistics continues to be both textbook and guide to the research literature. The reader interested in a particular section is urged at least to skim the exercises for that section and where relevant to look at the corresponding appendix section for related statistical tables and algorithms.

We are grateful to Stephen Quigley, Editor, Wiley Series in Probability and Statistics, for encouraging us to prepare this edition. Encouragement was also provided by N. Balakrishnan, to whom we owe a special debt for his careful reading of most of the book. This has resulted in many corrections and clarifications as well as several suggestions. Chapter 10 benefited from a careful perusal by Barry Arnold. D. Dharmappa in Bangalore prepared a preliminary version of the manuscript in LaTeX with speed and accuracy. The typing of references and index was ably done by Jeanette LaGrange of the Iowa State Statistics Department. We also acknowledge with appreciation the general support of our respective departments.

H. A. DAVID, Ames, Iowa
H. N. NAGARAJA, Columbus, Ohio
January 2003


Preface to Second Edition

In the ten years since the first edition of this book there has been much activity relevant to the study of order statistics. This is reflected by an appreciable increase in the size of this volume. Nevertheless it has been possible to retain the outlook and the essential structure of the earlier account.

The principal changes are as follows. Sections have been added on order statistics for independent nonidentically distributed variates, on linear functions of order statistics (in finite samples), on concomitants of order statistics, and on testing for outliers from a regression model. In view of major developments the section on robust estimation has been greatly expanded. Important progress in the asymptotic theory has resulted in the complete rewriting, with the help of Malay Ghosh, of the sections on the asymptotic joint distribution of quantiles and on the asymptotic distribution of linear functions of order statistics.

Many other changes and additions have also been made. Thus the number of references has risen from 700 to 1000, in spite of some deletions of entries in the first edition. Many possible references were deemed either insufficiently central to our presentation or adequately covered in other books. By way of comparison it may be noted that the first (and so far only) published volume of Harter's (1978b) annotated bibliography on order statistics contains 937 entries covering the work prior to 1950.

I am indebted to P. G. Hall, P. C. Joshi, Gordon Simons, and especially Richard Savage for pointing out errors in the first edition. The present treatment of asymptotic theory has benefitted from contributions by Ishay Weissman as well as Malay Ghosh. All the new material in this book has been read critically and constructively by H. N. Nagaraja. It is a pleasure to thank also Janice Peters for cheerfully given extensive secretarial help. In addition, I am grateful to the U.S. Army Research Office for longstanding support.

H. A. DAVID
Ames, Iowa
July 1980


Preface

Order statistics make their appearance in many areas of statistical theory and practice. Recent years have seen a particularly rapid growth, as attested by the references at the end of this book. There is a growing recognition that the large body of theory, techniques, and applications involving order statistics deserves study on its own, rather than as a mere appendage to other fields, such as nonparametric methods. Some may decry this increased specialization, and indeed it is entirely appropriate that the most basic aspects of the subject be incorporated in general textbooks and courses, both theoretical and applied. On the other hand, there has been a clear trend in many universities toward the establishment of courses of lectures dealing more extensively with order statistics. I first gave a short course in 1955 at the University of Melbourne and have since then periodically offered longer courses at the Virginia Polytechnic Institute and especially at the University of North Carolina, where much of the present material has been tried out.

In this book an attempt is made to present the subject of order statistics in a manner combining features of a textbook and of a guide through the research literature. The writing is at an intermediate level, presupposing on the reader's part the usual basic background in statistical theory and applications. Some portions of the book are, however, quite elementary, whereas others, particularly in Chapters 4 and 9, are rather more advanced. Exercises supplement the text and, in the manner of M. G. Kendall's books, usually lead the reader to the original sources.

A special word is needed to explain the relation of this book to the only other existing general account, also prepared in the Department of Biostatistics, University of North Carolina, namely, the multiauthored Contributions to Order Statistics, edited by A. E. Sarhan and B. G. Greenberg, which appeared in this Wiley series in 1962. The present monograph is not meant to replace that earlier one, which is almost twice as long. In particular, the extensive set of tables in Contributions will long retain their usefulness. The present work contains only a few tables needed to clarify the text but provides, as an appendix, an annotated guide to the massive output of tables scattered over numerous journals and books; such tables are essential for the ready use of many of the methods described. Contributions was not designed as a textbook and is, of course, no longer quite up to date. However, on a number of topics well developed by 1962 more extensive coverage will be found there than here. Duplication of all but the most fundamental material has been kept to a minimum.

In other respects also the size of this book has been kept down by deferring wherever feasible to available specialized monographs. Thus plans for the treatment of the role of order statistics in simultaneous inference have largely been abandoned in view of R. G. Miller's very readable account in 1966.


The large number of references may strike some readers as too much of a good thing. Nevertheless the list is far from complete and is confined to direct, if often brief, citations. For articles dealing with such central topics as distribution theory and estimation I have aimed at reasonable completeness, after elimination of superseded work. Elsewhere the coverage is less comprehensive, especially where reference to more specialized bibliographies is possible. In adopting this procedure I have been aided by knowledge of H. L. Harter's plans for the publication of an extensive annotated bibliography of articles on order statistics.

It is a pleasure to acknowledge my long-standing indebtedness to H. O. Hartley, who first introduced me to the subject of order statistics with his characteristic enthusiasm and insight. I am also grateful to E. S. Pearson for his encouragement over the years. In writing this book I have had the warm support of B. G. Greenberg. My special thanks go to P. C. Joshi, who carefully read the entire manuscript and made many suggestions. Helpful comments were also provided by R. A. Bradley, J. L. Gastwirth, and P. K. Sen. Expert typing assistance and secretarial help were rendered by Mrs. Delores Gold and Mrs. Jean Scovil. The writing was supported throughout by the Army Research Office, Durham, North Carolina.

H. A. DAVID
Chapel Hill, North Carolina
December 1969


1 Introduction

1.1 THE SUBJECT OF ORDER STATISTICS

If the random variables $X_1, \ldots, X_n$ are arranged in order of magnitude and then written as

$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)},$$

we call $X_{(i)}$ the $i$th order statistic ($i = 1, \ldots, n$). In much of this book the (unordered) $X_i$ are assumed to be statistically independent and identically distributed. Even then the $X_{(i)}$ are necessarily dependent because of the inequality relations among them. At times we shall relax the assumptions and consider nonidentically distributed $X_i$ as well as various patterns of dependence.

The subject of order statistics deals with the properties and applications of these ordered random variables and of functions involving them. Examples are the extremes $X_{(n)}$ and $X_{(1)}$, the range $W = X_{(n)} - X_{(1)}$, the extreme deviate (from the sample mean) $X_{(n)} - \bar{X}$, and, for a random sample from a normal $N(\mu, \sigma^2)$ distribution, the studentized range $W/S_\nu$, where $S_\nu$ is a root-mean-square estimator of $\sigma$ based on $\nu$ degrees of freedom. All these statistics have important applications. The extremes arise, for example, in the statistical study of floods and droughts, in problems of breaking strength and fatigue failure, and in auction theory (Krishna, 2002). The range is well known to provide a quick estimator of $\sigma$ and has found particularly wide acceptance in the field of quality control. The extreme deviate is a basic tool in the detection of outliers, large values of $(X_{(n)} - \bar{X})/\sigma$ indicating the presence of an excessively large observation. The studentized range is of key importance in the ranking of "treatment" means in analysis of variance situations. More recently this statistic has been found useful in bioequivalence testing and for related tests of interval hypotheses (Section 9.6).

Another major example is the median $M = X_{((n+1)/2)}$ or $\frac{1}{2}(X_{(n/2)} + X_{(n/2+1)})$ according as $n$ is odd or even. Long known to be a robust estimator of location, the median for $n$ odd has come to be widely used as a smoother in a time series $X_1, X_2, \ldots$; that is, one plots or records the moving median $M_j = \operatorname{med}(X_j, X_{j+1}, \ldots, X_{j+n-1})$, $j = 1, 2, \ldots$. Under the term median filter this statistic has been much used and studied in signal and image processing.

With the help of the Gauss-Markov theorem of least squares it is possible to use linear functions of order statistics quite systematically for the estimation of parameters of location and/or scale. This application is particularly useful when some of the observations in the sample have been "censored," since in that case standard methods of estimation tend to become laborious or otherwise unsatisfactory. Life tests provide an ideal illustration of the advantages of order statistics in censored data. Since such experiments may take a long time to complete, it is often desirable to stop after failure of the first $r$ out of $n$ (similar) items under test. The observations are the $r$ times to failure, which here, unlike in most situations, arrive already ordered for us by the method of experimentation; from them we can estimate the necessary parameters, such as the true mean life.

Other occurrences arise in the study of the reliability of systems. A system of $n$ components is called a $k$-out-of-$n$ system if it functions if and only if (iff) at least $k$ components function. For components with independent lifetime distributions $F_1, \ldots, F_n$ the time to failure of the system is seen to be the $(n - k + 1)$th order statistic from the set of underlying heterogeneous distributions $F_1, \ldots, F_n$.
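The k-out-of-n statement can be checked on a small numerical case. The sketch below is illustrative only: the function names `k_out_of_n_failure_time` and `failure_time_by_replay` are hypothetical and do not come from the text. It assumes the component lifetimes are already observed, picks out the (n - k + 1)th smallest lifetime, and cross-checks the result by replaying the failures in time order.

```python
import random


def k_out_of_n_failure_time(lifetimes, k):
    """Failure time of a k-out-of-n system (hypothetical helper).

    The system works while at least k components work, so it fails at the
    (n - k + 1)th smallest component lifetime.
    """
    n = len(lifetimes)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # sorted(...)[n - k] is the (n - k + 1)th order statistic (0-based index).
    return sorted(lifetimes)[n - k]


def failure_time_by_replay(lifetimes, k):
    """Cross-check: replay component failures in time order."""
    n = len(lifetimes)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    working = n
    for t in sorted(lifetimes):
        working -= 1          # one more component has failed at time t
        if working < k:       # fewer than k components remain: system fails
            return t


if __name__ == "__main__":
    rng = random.Random(0)
    # Independent but nonidentical exponential lifetimes (rates are arbitrary).
    lifetimes = [rng.expovariate(rate) for rate in (0.5, 1.0, 1.5, 2.0, 2.5)]
    for k in range(1, len(lifetimes) + 1):
        print(k, k_out_of_n_failure_time(lifetimes, k))
```

With k = n the function returns the smallest lifetime and with k = 1 the largest.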
The special cases $k = n$ and $k = 1$ correspond respectively to series and parallel systems.

Computers have provided a major impetus for the study of order statistics. One reason is that they have made it feasible to look at the same data in many different ways, thus calling for a body of versatile, often rather informal techniques commonly referred to as data analysis (cf. Tukey, 1962; Mosteller and Tukey, 1977). Are the data really in accord with (a) the assumed distribution and (b) the assumed model? Clues to (a) may be obtained from a plot of the ordered observations against some simple function of their ranks, preferably on probability paper appropriate for the distribution assumed. A straight-line fit in such a probability plot indicates that all is more or less well, whereas serious departures from a straight line may reveal the presence of outliers or other failures in the distributional assumptions. Similarly, in answer to (b), one can in simple cases usefully plot the ordered residuals from the fitted model.

Somewhat in the same spirit is the search for statistics and tests that, although not optimal under ideal (say normal-theory) conditions, perform well under a variety of circumstances likely to occur in practice. An elementary example of these robust methods is the use, in samples from symmetrical populations, of the trimmed mean, which is the average of the observations remaining after the most extreme $k$ ($k/n < \frac{1}{2}$) at each end have been removed. Loss of efficiency in the normal case may, for suitable choice of $k$, be compensated by lack of sensitivity to outliers or to other departures from an assumed distribution.

Finally, we may point to a rather narrower but truly space-age application. In large samples (e.g., of particle counts taken on a spacecraft) there are interesting possibilities for data compression (Eisenberger and Posner, 1965), since the sample may be replaced (on the spacecraft's computer) by enough order statistics to allow (on the ground) both satisfactory estimation of parameters and a test of the assumed underlying distributional form. The availability of such large data sets, a common feature, for example, in environmental and financial studies, where central or extreme order statistics are of main interest, necessitates the development of asymptotic theory for order statistics and related functions.
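Two of the robust tools mentioned in this section, the moving median and the trimmed mean, are short enough to write out. The following is a minimal sketch and not an optimized filter: the function names are hypothetical, and the moving median simply re-sorts each window, which is adequate for illustration but not for long signals.

```python
from statistics import median


def moving_median(series, n):
    """Moving median med(x_j, ..., x_{j+n-1}) over a window of odd length n."""
    if n < 1 or n % 2 == 0:
        raise ValueError("window length n must be a positive odd integer")
    return [median(series[j:j + n]) for j in range(len(series) - n + 1)]


def trimmed_mean(sample, k):
    """Average of the observations left after dropping the k smallest and k largest."""
    n = len(sample)
    if k < 0 or 2 * k >= n:
        raise ValueError("need 0 <= k < n/2")
    kept = sorted(sample)[k:n - k]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    noisy = [1, 100, 2, 3, 4]          # one gross outlier at position 2
    print(moving_median(noisy, 3))     # the outlier does not survive the filter
    print(trimmed_mean(noisy, 1))      # mean of the middle three observations
    print(sum(noisy) / len(noisy))     # ordinary mean, pulled up by the outlier
```

In this toy series the single value 100 dominates the ordinary mean but affects neither the trimmed mean with k = 1 nor any window median.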

1.2 THE SCOPE AND LIMITS OF THIS BOOK

Although we will be concerned with all of the topics sketched in the preceding section, and with many others as well, the field of order statistics impinges on so many different areas of statistics that some limitations in coverage have to be imposed. To start with, unlike Wilks (1948), we use "order statistics" in the narrower sense now widely accepted: we will not deal with rank-order statistics, as exemplified by the Wilcoxon two-sample statistic, although these also require an ordering of the observations. The point of difference is that rank-order statistics involve the ranks of the ordered observations only, not their actual values, and consequently lead to nonparametric or distribution-free methods, at any rate for continuous random variables. On the other hand, the great majority of procedures based on order statistics depend on the form of the underlying population. The theory of order statistics is, however, useful in many nonparametric problems and also in an assessment of the nonnull properties of rank tests, for example, by the power function.

Other restrictions in this book have more of an ad hoc character. Order statistics play an important supporting role in multiple comparisons and multiple decision procedures such as the ranking of treatment means. In view of the useful books by Gupta and Panchapakesan (1979), Hsu (1996), and especially Hochberg and Tamhane (1987), there seems little point in developing here the inference aspects of the subject, although the needed order-statistics theory is either given explicitly or obtainable by only minor extensions. In the same spirit we have eliminated the chapter in previous editions on tests for outliers, in view of the excellent extensive account by Barnett and Lewis (1994). However, we have expanded a section on robust estimation in which robustness against outliers is a major issue.

A vast literature has developed on the analysis of data subject to various kinds of censoring. We treat this in some detail, but confine ourselves largely to normal and exponential data. These two cases are the most important and also bring out the statistical issues involved.

Much more could be said about asymptotic methods than we do in Chapters 10 and 11. In fact, the asymptotic theory of extremes and of related statistics has been developed at length in a book by Galambos (1978, 1987), and a detailed compilation of theoretical results emphasizing financial applications is given by Embrechts, Klüppelberg, and Mikosch (1997); on the more applied side Gumbel's (1958) account continues to be valuable. The asymptotic theory of central order statistics and of linear functions of order statistics has also been a very active research area in more recent years and is well covered in Shorack and Wellner (1986). We have thought it best to confine ourselves to a detailed treatment of some of the most important results and to a summary of other developments.

The effective application of order-statistics techniques requires a great many tables. Inclusion even of only the most useful would greatly increase the bulk of this book. We therefore limit ourselves to a few tables needed for illustration; for the rest, we refer the reader to general books of tables, such as the two volumes by Pearson and Hartley (1970, 1972) and the collection of tables in Sarhan and Greenberg (1962). Extensive tables of many functions of interest have been prepared by Harter (1970a, b) in two large volumes devoted entirely to order statistics. Harter and Balakrishnan (1996, 1997) have extended these. Many references to tables in original papers are given throughout our text, and a guided commentary to available tables and algorithms is provided in the Appendix.

Related Books

Many books on various aspects of order statistics have been written in the last 20 years and are cited in the text as needed. Those of a general nature include the somewhat more elementary and shorter account by Arnold, Balakrishnan, and Nagaraja (1992) and two large multi-authored volumes on theory and applications, edited by Balakrishnan and Rao (1998a, b). Emphasizing asymptotic theory are books by Leadbetter, Lindgren, and Rootzen (1983), Galambos (1987), Resnick (1987), Reiss (1989), and Embrechts et al. (1997).

1.3 NOTATION

Although this section may serve for reference, the reader is urged to look through it before proceeding further. As far as is feasible, random variables, or variates for short, will be designated by uppercase letters, and their realizations (observations) by the corresponding lowercase letters. By order statistics we will mean either ordered variates or ordered observations. Thus:

$X_1, \ldots, X_n$    unordered variates
$x_1, \ldots, x_n$    unordered observations
$X_{(1)} \le \cdots \le X_{(n)}$    ordered variates (order statistics)
$x_{(1)} \le \cdots \le x_{(n)}$    ordered observations (order statistics)
$X_{1:n} \le \cdots \le X_{n:n}$    ordered variates, extensive form

When the sample size $n$ needs to be emphasized, we use the extensive form of notation, switching rather freely from the extensive to the brief form.

$F(x) = \Pr\{X \le x\}$    cumulative distribution function (cdf) of $X$
$F_n(x)$    empirical distribution function
$f(x)$    probability density function (pdf) for a continuous variate; probability function (pf) for a discrete variate
$u = F(x)$    probability integral transformation
$F^{-1}(u) = \inf\{x : F(x) \ge u\} = Q(u)$ (sometimes $x(u)$)    inverse cdf or quantile function; $F^{-1}(0)$ = lower bound of $F$
$F_{(r)}(x), F_{r:n}(x) = \Pr\{X_{(r)} \le x\}$    cdf of $X_{(r)}, X_{r:n}$, $r = 1, \ldots, n$
$f_{(r)}(x), f_{r:n}(x)$    pdf or pf of $X_{(r)}, X_{r:n}$
$F_{(r)(s)}(x, y)$    joint cdf of $X_{(r)}$ and $X_{(s)}$, $1 \le r < s \le n$
$M = X_{((n+1)/2)}$ ($n$ odd), $M = \frac{1}{2}(X_{(n/2)} + X_{(n/2+1)})$ ($n$ even)    sample median
$W = X_{(n)} - X_{(1)}$    (sample) range
$W_{(i)} = X_{(n-i+1)} - X_{(i)}$    $i$th quasi-range ($W_{(1)} = W$)
[...]    mean of $k$ ranges of $n$
$W_j$    $W$ for $j$th sample
$Y_{[r]}, Y_{[r:n]}$    concomitant of $X_{(r)}, X_{r:n}$
$\mu, \sigma^2$    mean, variance of $X$
$\mu_X, \mu_Y$    means of $X, Y$ (bivariate case)
$\sigma_X^2, \sigma_Y^2$    variance of $X, Y$
$\sigma_{XY}$    covariance of $X, Y$
$\rho = \sigma_{XY}/(\sigma_X \sigma_Y)$    correlation coefficient
$\mu_{r:n} = E(X_{r:n})$    mean of $X_{r:n}$
$\mu_{r:n}^{(k)} = E(X_{r:n}^k)$    $k$th raw moment of $X_{r:n}$
$\mu_{r,s:n} = E(X_{r:n} X_{s:n})$
$Q(u) = F^{-1}(u)$    inverse cdf, quantile function
$p_r = r/(n+1)$, $q_r = 1 - p_r$
$Q_r = Q(p_r)$, $f_r = f(Q_r)$
$Q_r' = dQ(p_r)/dp_r = 1/f_r$
$S_\nu$    estimator of $\sigma$ based on $\nu$ DF; for a $N(\mu, \sigma^2)$ distribution $\nu S_\nu^2/\sigma^2 = \chi_\nu^2$
$Q_{n,\nu} = W_n/S_\nu$    studentized range ($W_n$, $S_\nu$ independent)
$S = [\sum (X_i - \bar{X})^2/(n-1)]^{1/2}$    (internal) estimator of $\sigma$
[...]    pooled estimator of $\sigma$
$S_j$    $S$ for $j$th sample
$F \in D(G)$    $F$ is in the domain of maximal attraction of the extreme-value cdf $G$; (10.5.2) holds
$B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt$, $a > 0$, $b > 0$    beta function
$I_p(a, b) = \int_0^p t^{a-1}(1-t)^{b-1}\,dt \,/\, B(a, b)$    incomplete beta function    (1.3.1)
$\beta(a, b)$    beta variate $X$ having cdf $\Pr\{X \le x\} = I_x(a, b)$
$b(p, n)$    binomial variate
$\chi_\nu^2$    chi-squared variate with $\nu$ DF
$\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$, $-\infty < x < \infty$    standard normal pdf
$\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt$    standard normal cdf
$N(\mu, \sigma^2)$    normal variate, mean $\mu$, variance $\sigma^2$
$N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$    multinormal variate, mean vector $\boldsymbol{\mu}$, covariance matrix $\boldsymbol{\Sigma}$
$n^{(k)} = n(n-1)\cdots(n-k+1)$, $k = 1, \ldots, n$
$[x]$    integral part of $x$, but $\mu_{[k]} = E(X^{(k)})$ and $Y_{[r]}$ = concomitant of $X_{(r)}$

rv    random variable
pdf    probability density function
cdf    cumulative distribution function
iid    independent, identically distributed
inid    independent, nonidentically distributed
c.f.    characteristic function
a.s.    almost surely
ld    limit distribution
DF    degrees of freedom
ML    maximum likelihood
LS    least squares
BLUE    best linear unbiased estimator
UMVU    uniformly minimum variance unbiased
UMP    uniformly most powerful
H1, H2    Harter (1970a, b), Range, Order Statistics ... 1, 2
HB1, HB2    Harter and Balakrishnan (1997, 1996), Range, Order Statistics ... 1, 2
BR1, BR2    Balakrishnan and Rao (1998a, b), Order Statistics: Theory, Applications ... 1, 2
PH1, PH2    Pearson and Hartley (1970, 1972), Biometrika Tables 1, 2
SG    Sarhan and Greenberg (1962), Contributions to Order Statistics
Ex.    Exercise ("example" is written in full)
D    decimal (e.g., to 3D = to 3 decimal places)
S    significant (e.g., to 4S = to 4 significant figures)
A3.2    appendix listing of tables relating to Section 3.2
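Two items in the notation list, the empirical distribution function and the quantile function defined as an infimum, can be made concrete for a finite sample. This is a minimal sketch with hypothetical function names (`ecdf`, `empirical_quantile`). It takes the parent cdf to be the empirical one and reads the infimum with a weak inequality, so the quantile always equals one of the ordered observations.

```python
def ecdf(sample, x):
    """Empirical distribution function: proportion of observations <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


def empirical_quantile(sample, u):
    """Q(u) = inf{x : F_n(x) >= u} for 0 < u <= 1.

    For the empirical cdf the infimum is attained at the order statistic
    x_(r), where r is the smallest rank with r/n >= u.
    """
    if not 0 < u <= 1:
        raise ValueError("u must lie in (0, 1]")
    ordered = sorted(sample)
    n = len(ordered)
    for r, x in enumerate(ordered, start=1):
        if r / n >= u:
            return x


if __name__ == "__main__":
    data = [2.0, 5.0, 1.0, 4.0, 3.0]
    print(ecdf(data, 3.0))                  # three of five observations are <= 3
    print(empirical_quantile(data, 0.6))    # smallest x whose ecdf reaches 0.6
    print(empirical_quantile(data, 0.61))   # just above 3/5, so the next order statistic
```

The loop mirrors the definition directly, which avoids rounding issues that can arise when computing a rank as a ceiling of u times n.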

1.4 EXERCISES

1.1. For real numbers $x_1, \ldots, x_n$ determine all $c$ values for which $\sum_{i=1}^{n} |x_i - c|$ is the smallest.

1.2. Let $x_{r:n}(x_1, \ldots, x_n)$ denote the $r$th smallest among the real numbers $x_1, \ldots, x_n$.

(a) Show that

$$x_{1:2}(x_1, x_2) = \min(x_1, x_2) = \tfrac{1}{2}(x_1 + x_2) - \tfrac{1}{2}|x_1 - x_2|,$$

$$x_{2:2}(x_1, x_2) = \max(x_1, x_2) = \tfrac{1}{2}(x_1 + x_2) + \tfrac{1}{2}|x_1 - x_2|.$$

(b) Show also that

$$x_{1:n}(x_1, \ldots, x_n) = x_{1:2}\bigl(x_{1:n-1}(x_1, \ldots, x_{n-1}),\; x_{1:n-1}(x_2, \ldots, x_n)\bigr),$$

$$x_{n:n}(x_1, \ldots, x_n) = x_{2:2}\bigl(x_{n-1:n-1}(x_1, \ldots, x_{n-1}),\; x_{n-1:n-1}(x_2, \ldots, x_n)\bigr),$$

and that for $r = 2, 3, \ldots, n - 1$

$$x_{r:n}(x_1, \ldots, x_n) = x_{1:n}\bigl(x_{r:n-1}(x_1, x_2, \ldots, x_{n-1}),\; \ldots,\; x_{r:n-1}(x_n, x_1, \ldots, x_{n-2})\bigr).$$

(Meyer, 1969)
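Before attempting a proof of Ex. 1.2 it can help to confirm the identities on a few numbers. The sketch below is a numerical check only, not a solution. It reads $x_{r:n}$ as the $r$th value in ascending order, consistent with $x_{1:2} = \min$ in part (a), and takes the outer operation in the last identity of (b) to be the minimum over all $n$ cyclic leave-one-out subsets. All function names are hypothetical.

```python
import random


def order_stat(r, xs):
    """x_{r:n}: the r-th value of xs in ascending order (r is 1-based)."""
    return sorted(xs)[r - 1]


def min_via_abs(a, b):
    """min(a, b) written as (a + b)/2 - |a - b|/2."""
    return 0.5 * (a + b) - 0.5 * abs(a - b)


def max_via_abs(a, b):
    """max(a, b) written as (a + b)/2 + |a - b|/2."""
    return 0.5 * (a + b) + 0.5 * abs(a - b)


def order_stat_leave_one_out(r, xs):
    """x_{r:n} as the minimum of x_{r:n-1} over the n cyclic subsets of size n - 1.

    Intended for 2 <= r <= n - 1, as in part (b).
    """
    n = len(xs)
    subsets = [[xs[(start + j) % n] for j in range(n - 1)] for start in range(n)]
    return min(order_stat(r, s) for s in subsets)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.randint(-20, 20) for _ in range(7)]
    print(xs)
    for r in range(2, len(xs)):
        print(r, order_stat(r, xs), order_stat_leave_one_out(r, xs))
```

The leave-one-out identity works because, for r at most n - 1, at least one subset omits an element ranked above r, and no subset can make the r-th value smaller.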


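1.3. For the real numbers $x_1, \ldots, x_n$ let $\max^{(\nu)}(x_1, \ldots, x_n)$ denote the $\nu$th largest ($\nu = 1, \ldots, n$). Also define

$$x_{n,\nu} = \max{}^{(\nu)}(x_1,\; x_1 + x_2,\; \ldots,\; x_1 + \cdots + x_n)$$

and

$$x^*_{n-1,\nu} = \max{}^{(\nu)}(x_2,\; x_2 + x_3,\; \ldots,\; x_2 + \cdots + x_n).$$

Show that

(a) $x_{n,\nu} = x_1 + \max^{(\nu)}\bigl(0,\; x_2,\; \ldots,\; \sum_{i=2}^{n} x_i\bigr)$,

(b) $x_{n,\nu} = x_1 + \max^{(\nu)}\bigl(0,\; x^*_{n-1,1},\; \ldots,\; x^*_{n-1,n-1}\bigr)$,

(c) $x_{n,\nu} = \max^{(2)}\bigl(x_1,\; x_1 + x^*_{n-1,\nu-1},\; x_1 + x^*_{n-1,\nu}\bigr)$, $\nu = 2, \ldots, n - 1$; $n \ge 3$.

(Pollaczek, 1975)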

2 Basic Distribution Theory

2.1

DISTRIBUTION OF A SINGLE ORDER STATISTIC

We suppose that $X_1, \ldots, X_n$ are $n$ independent variates, each with cumulative distribution function (cdf) $F(x)$. Let $F_{(r)}(x)$ $(r = 1, \ldots, n)$ denote the cdf of the $r$th order statistic $X_{(r)}$. Then the cdf of the largest order statistic $X_{(n)}$ is given by
\[ F_{(n)}(x) = \Pr\{X_{(n)} \le x\} = \Pr\{\text{all } X_i \le x\} = F^n(x). \tag{2.1.1} \]

Likewise we have
\[ F_{(1)}(x) = \Pr\{X_{(1)} \le x\} = 1 - \Pr\{X_{(1)} > x\} = 1 - \Pr\{\text{all } X_i > x\} = 1 - [1 - F(x)]^n. \tag{2.1.2} \]
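The two special cases (2.1.1) and (2.1.2) are easy to check by simulation. The sketch below assumes a uniform(0, 1) parent, so that $F(x) = x$; the function names and the choice of parent are illustrative assumptions, not part of the text.

```python
import random


def cdf_max(F_x, n):
    """(2.1.1): Pr{X_(n) <= x} = F(x)^n, with F_x = F(x)."""
    return F_x ** n


def cdf_min(F_x, n):
    """(2.1.2): Pr{X_(1) <= x} = 1 - [1 - F(x)]^n, with F_x = F(x)."""
    return 1.0 - (1.0 - F_x) ** n


def simulate_extremes(n, x, trials=100_000, seed=0):
    """Estimate Pr{max <= x} and Pr{min <= x} for uniform(0, 1) samples."""
    rng = random.Random(seed)
    hits_max = 0
    hits_min = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        hits_max += max(sample) <= x
        hits_min += min(sample) <= x
    return hits_max / trials, hits_min / trials


est_max, est_min = simulate_extremes(n=3, x=0.5)
print(est_max, cdf_max(0.5, 3))
print(est_min, cdf_min(0.5, 3))
```

For any other continuous parent the same check applies after replacing `x` by `F(x)` in the closed forms.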

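These are important special cases of the general result for $F_{(r)}(x)$:
\[ F_{(r)}(x) = \Pr\{X_{(r)} \le x\} = \Pr\{\text{at least } r \text{ of the } X_i \text{ are less than or equal to } x\} = \sum_{i=r}^{n} \binom{n}{i} F^i(x)[1 - F(x)]^{n-i}, \tag{2.1.3} \]
since the term in the summand is the binomial probability that exactly $i$ of $X_1, \ldots, X_n$ are less than or equal to $x$.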
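The binomial-sum form of the cdf translates directly into code. A minimal sketch, in which the function name and the argument `p` (standing for $F(x)$) are illustrative choices:

```python
from math import comb


def cdf_order_stat(p, r, n):
    """Pr{X_(r) <= x} for a sample of size n, where p = F(x).

    Sums the binomial probabilities that exactly i of the n
    observations are <= x, for i = r, ..., n.
    """
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(r, n + 1))


# r = n and r = 1 reduce to the cdfs of the largest and smallest values.
print(cdf_order_stat(0.3, 5, 5), 0.3**5)
print(cdf_order_stat(0.3, 1, 5), 1.0 - 0.7**5)
```

Because only $F(x)$ enters, the same function serves continuous and discrete parents alike.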


An alternative form of (2.1.3) is
\[ F_{(r)}(x) = F^r(x) \sum_{j=0}^{n-r} \binom{r+j-1}{r-1} [1 - F(x)]^j, \]
where the RHS is the sum of the probabilities that exactly $r$ of $X_1, \ldots, X_{r+j}$, including $X_{r+j}$, are less than or equal to $x$. This negative binomial version of (2.1.3) may also be obtained by repeated application of (b) in Ex. 2.1.6. We write (2.1.3) as
\[ F_{(r)}(x) = E_{F(x)}(n, r) \tag{2.1.4} \]

and note that the $E$ function has been tabled extensively (e.g., Harvard Computation Laboratory, 1955, where the notation $E(n, r, F(x))$ is used). Alternatively, from the well-known relation between binomial sums and the incomplete beta function we have
\[ F_{(r)}(x) = I_{F(x)}(r, n - r + 1), \tag{2.1.5} \]

where $I_p(a, b)$ is defined by (1.3.1). Thus $F_{(r)}(x)$ can also be evaluated from tables of $I_p(a, b)$ (K. Pearson, 1934). Percentage points of $X_{(r)}$ may be obtained by inverse interpolation in the above tables or more directly from Table 16 of Biometrika Tables (Pearson and Hartley, 1970), which gives percentage points of the incomplete beta function.

Example 2.1. Find the upper 5% point of $X_{(4)}$ in samples of 5 from a standard normal parent. We require $x$ such that
\[ I_{F(x)}(4, 2) = 0.95 \quad \text{or} \quad I_{1-F(x)}(2, 4) = 0.05. \]
This gives $1 - F(x) = 0.07644$ and hence $x = 1.429$.

It should be noted that results (2.1.1)-(2.1.5) hold equally for continuous and discrete variates. We shall now assume that $X_i$ is continuous with probability density function (pdf) $f(x) = F'(x)$, but will return to the discrete case in Section 2.4. If $f_{(r)}(x)$ denotes the pdf of $X_{(r)}$ we have from (2.1.5)
\[ f_{(r)}(x) = \frac{1}{B(r, n - r + 1)} \frac{d}{dx} \int_0^{F(x)} t^{r-1} (1 - t)^{n-r} \, dt = \frac{1}{B(r, n - r + 1)} F^{r-1}(x) [1 - F(x)]^{n-r} f(x). \tag{2.1.6} \]
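Both Example 2.1 and the density (2.1.6) can be checked without tables. The sketch below is one possible approach, not the method of the text: it evaluates the cdf of $X_{(r)}$ as a binomial sum (equivalent to the incomplete beta form), solves for $F(x)$ by bisection instead of inverse interpolation, and uses Python's `statistics.NormalDist` for the normal parent. All function names are illustrative.

```python
from math import comb, factorial
from statistics import NormalDist


def cdf_order_stat(p, r, n):
    """Pr{X_(r) <= x} as a function of p = F(x) (binomial sum)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(r, n + 1))


def parent_prob_for(target, r, n, tol=1e-12):
    """Solve cdf_order_stat(p, r, n) = target for p by bisection.

    Bisection is valid because the cdf is increasing in p.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_order_stat(mid, r, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pdf_order_stat(x, r, n, F, f):
    """Density of X_(r); uses 1 / B(r, n-r+1) = n! / ((r-1)! (n-r)!)."""
    const = factorial(n) / (factorial(r - 1) * factorial(n - r))
    return const * F(x) ** (r - 1) * (1.0 - F(x)) ** (n - r) * f(x)


std_normal = NormalDist()  # standard normal parent, as in Example 2.1
p95 = parent_prob_for(0.95, r=4, n=5)
x95 = std_normal.inv_cdf(p95)
print(f"1 - F(x) = {1.0 - p95:.5f}, x = {x95:.3f}")
```

The printed values can be compared with the 0.07644 and 1.429 quoted in Example 2.1, and `pdf_order_stat` can be compared with a numerical derivative of `cdf_order_stat(F(x), r, n)`.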

In view of the importance of this result we will also derive it otherwise. The event $x < X_{(r)} \le x + \delta x$ may be realized as follows:

[Diagram: the real line divided at $x$ and $x + \delta x$, with $r - 1$ observations to the left of $x$, one in $(x, x + \delta x]$, and $n - r$ to the right of $x + \delta x$.]

$X_i \le x$ for $r - 1$ of the $X_i$, $x < X_i \le x + \delta x$ for one $X_i$, and $X_i > x + \delta x$ for the remaining $n - r$ of the $X_i$. The number of ways in which the $n$ observations can be so divided into three parcels is
\[ \frac{n!}{(r-1)! \, 1! \, (n-r)!} = \frac{1}{B(r, n - r + 1)}, \]
and each such way has probability
\[ F^{r-1}(x) [F(x + \delta x) - F(x)] [1 - F(x + \delta x)]^{n-r}. \]
Regarding $\delta x$ as small, we have therefore
\[ \Pr\{x < X_{(r)} \le x + \delta x\} = \frac{1}{B(r, n - r + 1)} F^{r-1}(x) f(x) \, \delta x \, [1 - F(x + \delta x)]^{n-r} + O(\delta x)^2, \]

where $O(\delta x)^2$ means terms of order $(\delta x)^2$ and includes the probability of realizations of $x < X_{(r)} \le x + \delta x$ in which more than one $X_i$ is in $(x, x + \delta x]$. Dividing both sides by $\delta x$ and letting $\delta x \to 0$, we again obtain (2.1.6). The distribution of $X_{(r)}$ when the sample size is itself a random variable, say $N$, is easily obtained by conditioning on $N = n$. For example, specific results when $N$ has a generalized negative binomial, a generalized Poisson, or a generalized logarithmic series distribution are given by Gupta and Gupta (1984). See also Exs. 2.1.8 and 2.1.9.
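Conditioning on $N = n$ can be illustrated for the largest observation, whose conditional cdf is $F^n(x)$. The sketch below mixes this over a distribution for $N$; the geometric distribution on $\{1, 2, \ldots\}$ is a hypothetical choice made only because the mixture then has a simple closed form, and it is not one of the distributions treated by Gupta and Gupta (1984).

```python
def cdf_max_random_n(p, pmf):
    """Pr{largest observation <= x} when the sample size N is random.

    p is F(x); pmf maps each sample size n >= 1 to Pr{N = n}.
    Conditioning on N = n gives F(x)^n, which is then averaged over N.
    """
    return sum(prob * p**n for n, prob in pmf.items())


def truncated_geometric_pmf(theta, n_max):
    """Hypothetical example: Pr{N = n} = theta * (1 - theta)^(n - 1)
    for n = 1, ..., n_max (the negligible tail beyond n_max is dropped)."""
    return {n: theta * (1.0 - theta) ** (n - 1) for n in range(1, n_max + 1)}


def cdf_max_geometric_closed_form(p, theta):
    """Sum of the geometric series: theta * p / (1 - (1 - theta) * p)."""
    return theta * p / (1.0 - (1.0 - theta) * p)


pmf = truncated_geometric_pmf(theta=0.3, n_max=200)
print(cdf_max_random_n(0.6, pmf), cdf_max_geometric_closed_form(0.6, 0.3))
```

For a general rank $r$ the same mixture applies to the conditional cdf of $X_{(r)}$, restricted to sample sizes $n \ge r$.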

2.2 JOINT DISTRIBUTION OF TWO OR MORE ORDER STATISTICS

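The joint density function of $X_{(r)}$ and $X_{(s)}$ $(1 \le r < s \le n)$ is denoted by $f_{(r)(s)}(x, y)$. An expression corresponding to (2.1.6) may be derived by noting that the compound event $x < X_{(r)} \le x + \delta x$, $y < X_{(s)} \le y + \delta y$ is realized by the configuration

[Diagram: the real line divided at $x$, $x + \delta x$, $y$, and $y + \delta y$, with $r - 1$ observations below $x$, one in $(x, x + \delta x]$, $s - r - 1$ between $x + \delta x$ and $y$, one in $(y, y + \delta y]$, and $n - s$ above $y + \delta y$.]

meaning that $r - 1$ of the observations are less than $x$, one is in $(x, x + \delta x]$, etc. It follows that for $x \le y$
\[ f_{(r)(s)}(x, y) = \frac{n!}{(r-1)! \, (s-r-1)! \, (n-s)!} F^{r-1}(x) f(x) [F(y) - F(x)]^{s-r-1} f(y) [1 - F(y)]^{n-s}. \tag{2.2.1} \]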
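The joint density of two order statistics can be coded directly from the five-parcel argument. A minimal sketch, assuming the parent cdf `F` and pdf `f` are supplied as callables; the function names and the uniform(0, 1) example parent are illustrative assumptions.

```python
from math import factorial


def joint_pdf(x, y, r, s, n, F, f):
    """Joint density of X_(r) and X_(s), 1 <= r < s <= n, at (x, y).

    The density is zero when x > y.
    """
    if not (1 <= r < s <= n):
        raise ValueError("need 1 <= r < s <= n")
    if x > y:
        return 0.0
    const = factorial(n) / (
        factorial(r - 1) * factorial(s - r - 1) * factorial(n - s)
    )
    return (
        const
        * F(x) ** (r - 1) * f(x)
        * (F(y) - F(x)) ** (s - r - 1) * f(y)
        * (1.0 - F(y)) ** (n - s)
    )


def uniform_cdf(t):
    """cdf of the uniform(0, 1) parent used as an example."""
    return min(max(t, 0.0), 1.0)


def uniform_pdf(t):
    """pdf of the uniform(0, 1) parent used as an example."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


# For n = 2 the joint density of (X_(1), X_(2)) is 2 f(x) f(y) on x <= y.
print(joint_pdf(0.2, 0.7, 1, 2, 2, uniform_cdf, uniform_pdf))
```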

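Generalizations are now clear. Thus the joint pdf of $X_{(n_1)}, \ldots, X_{(n_k)}$ $(1 \le n_1 < \cdots < n_k \le n;\ 1 \le k \le n)$ is for $x_1 \le \cdots \le x_k$,
\[ f_{(n_1) \cdots (n_k)}(x_1, \ldots, x_k) = \frac{n!}{(n_1 - 1)! \, (n_2 - n_1 - 1)! \cdots (n - n_k)!} F^{n_1 - 1}(x_1) f(x_1) [F(x_2) - F(x_1)]^{n_2 - n_1 - 1} f(x_2) \cdots [1 - F(x_k)]^{n - n_k} f(x_k). \tag{2.2.2} \]

If we define $x_0 = -\infty$, $x_{k+1} = +\infty$, $n_0 = 0$, $n_{k+1} = n + 1$, the RHS may be written as
\[ n! \left[ \prod_{j=1}^{k} f(x_j) \right] \prod_{j=0}^{k} \frac{[F(x_{j+1}) - F(x_j)]^{n_{j+1} - n_j - 1}}{(n_{j+1} - n_j - 1)!}. \tag{2.2.3} \]

In particular, the joint pdf of all $n$ order statistics becomes simply
\[ n! \, f(x_1) f(x_2) \cdots f(x_n), \qquad x_1 \le \cdots \le x_n. \]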

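This result is indeed directly obvious since there are $n!$ equally likely orderings of the $X_i$, and may be used as the starting point for the derivation of the joint distribution of $k$ order statistics $(k < n)$ in the continuous case. The joint cdf $F_{(r)(s)}(x, y)$ of $X_{(r)}$ and $X_{(s)}$ may be obtained by integration of (2.2.1) as well as by a direct argument valid also in the discrete case. We have for $x < y$
\[ F_{(r)(s)}(x, y) = \Pr\{\text{at least } r\ X_i \le x,\ \text{at least } s\ X_i \le y\} \]
\[ = \sum_{j=s}^{n} \sum_{i=r}^{j} \Pr\{\text{exactly } i\ X_i \le x,\ \text{exactly } j\ X_i \le y\} \]
\[ = \sum_{j=s}^{n} \sum_{i=r}^{j} \frac{n!}{i! \, (j-i)! \, (n-j)!} F^i(x) [F(y) - F(x)]^{j-i} [1 - F(y)]^{n-j}. \tag{2.2.4} \]
Also for $x \ge y$ the inequality $X_{(s)} \le y$ implies $X_{(r)} \le x$, so that $F_{(r)(s)}(x, y) = F_{(s)}(y)$.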
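The direct argument for the joint cdf amounts to a double sum over multinomial probabilities, which is short to code. In the sketch below `px` and `py` stand for $F(x)$ and $F(y)$; the function names are illustrative. The branch for `px >= py` uses the standard observation that $X_{(s)} \le y$ then already forces $X_{(r)} \le x$, so the joint cdf equals the cdf of $X_{(s)}$ at $y$.

```python
from math import comb, factorial


def cdf_order_stat(p, r, n):
    """Pr{X_(r) <= x} as a function of p = F(x) (binomial sum)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(r, n + 1))


def joint_cdf(px, py, r, s, n):
    """Pr{X_(r) <= x, X_(s) <= y} for r < s, with px = F(x), py = F(y)."""
    if px >= py:
        # Only the constraint on the s-th order statistic is binding.
        return cdf_order_stat(py, s, n)
    total = 0.0
    for j in range(s, n + 1):          # exactly j observations <= y
        for i in range(r, j + 1):      # exactly i of them also <= x
            coef = factorial(n) // (
                factorial(i) * factorial(j - i) * factorial(n - j)
            )
            total += coef * px**i * (py - px) ** (j - i) * (1.0 - py) ** (n - j)
    return total


# Letting F(y) = 1 recovers the marginal cdf of X_(r).
print(joint_cdf(0.4, 1.0, 2, 4, 5), cdf_order_stat(0.4, 2, 5))
```

Since only $F(x)$ and $F(y)$ appear, the function applies to discrete parents as well.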

We may remark here that a similar argument leads to the joint cdf of $X_{(r)}$ and $Y_{(s)}$ when these order statistics stem from $n$ independent observations on the couple $(X, Y)$; see Ex. 2.2.5. (Note that we no longer require $r < s$.) A rather different approach is given in Galambos (1975). The correlation of $X_{(n)}$ and $Y_{(n)}$ is studied for the bivariate normal distribution by Bofinger and Bofinger (1965) and for several other distributions by Bofinger (1970). For a general discussion of the ordering of multivariate data see Barnett (1976b).
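The product form (2.2.3) for the joint pdf of $k$ order statistics given earlier in this section lends itself to a loop over the $k + 1$ gaps between consecutive ranks. The sketch below follows the conventions $x_0 = -\infty$, $x_{k+1} = +\infty$, $n_0 = 0$, $n_{k+1} = n + 1$; the function name, argument layout, and uniform(0, 1) example are illustrative assumptions.

```python
from math import factorial


def joint_pdf_k(xs, ranks, n, F, f):
    """Joint density of X_(n_1), ..., X_(n_k) at x_1 <= ... <= x_k.

    xs    : the points x_1, ..., x_k
    ranks : the ranks n_1 < ... < n_k, each between 1 and n
    F, f  : cdf and pdf of the parent, as callables
    """
    k = len(xs)
    if len(ranks) != k or k == 0:
        raise ValueError("xs and ranks must be non-empty and of equal length")
    if ranks[0] < 1 or ranks[-1] > n or any(a >= b for a, b in zip(ranks, ranks[1:])):
        raise ValueError("need 1 <= n_1 < ... < n_k <= n")
    if any(a > b for a, b in zip(xs, xs[1:])):
        return 0.0  # outside the region x_1 <= ... <= x_k
    # Pad with F(x_0) = 0, F(x_{k+1}) = 1, n_0 = 0, n_{k+1} = n + 1.
    F_vals = [0.0] + [F(x) for x in xs] + [1.0]
    n_ext = [0] + list(ranks) + [n + 1]
    value = float(factorial(n))
    for x in xs:
        value *= f(x)
    for j in range(k + 1):
        gap = n_ext[j + 1] - n_ext[j] - 1  # observations strictly between ranks
        value *= (F_vals[j + 1] - F_vals[j]) ** gap / factorial(gap)
    return value


def unif_F(t):
    return min(max(t, 0.0), 1.0)


def unif_f(t):
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


# All n = 3 order statistics of a uniform(0, 1) sample: density n! = 6.
print(joint_pdf_k([0.1, 0.5, 0.9], [1, 2, 3], 3, unif_F, unif_f))
```

With `k = 2` the function reduces to the joint density of $X_{(r)}$ and $X_{(s)}$, and with `k = 1` to the density of a single order statistic.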

2.3 DISTRIBUTION OF THE RANGE AND OF OTHER SYSTEMATIC STATISTICS

From the joint pdf of $k$ order statistics we can by standard transformation methods derive the pdf of any well-behaved function of the order statistics. For example, to find the pdf of $W_{rs} = X_{(s)} - X_{(r)}$ we put $w_{rs} = y - x$ in (2.2.1) and note that the transformation from $x, y$ to $x, w_{rs}$ has Jacobian unity in modulus. Thus, writing $C_{rs}$ for the constant in (2.2.1), we have on integrating out over $x$
\[ f_{W_{rs}}(w_{rs}) = C_{rs} \int_{-\infty}^{\infty} F^{r-1}(x) f(x) [F(x + w_{rs}) - F(x)]^{s-r-1} f(x + w_{rs}) [1 - F(x + w_{rs})]^{n-s} \, dx. \tag{2.3.1} \]

Of special interest is the case $r = 1$, $s = n$, when $W_{rs}$ becomes the range $W$ and (2.3.1) reduces to
\[ f_W(w) = n(n-1) \int_{-\infty}^{\infty} f(x) [F(x + w) - F(x)]^{n-2} f(x + w) \, dx. \tag{2.3.2} \]

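The cdf of $W$ is somewhat simpler. On interchanging the order of integration we have
\[ F_W(w) = n \int_{-\infty}^{\infty} f(x) \int_0^{w} (n-1) f(x + w') [F(x + w') - F(x)]^{n-2} \, dw' \, dx \]
\[ = n \int_{-\infty}^{\infty} f(x) \Bigl[ [F(x + w') - F(x)]^{n-1} \Bigr]_{w'=0}^{w'=w} \, dx \]
\[ = n \int_{-\infty}^{\infty} f(x) [F(x + w) - F(x)]^{n-1} \, dx. \tag{2.3.3} \]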

This important result may also be obtained by noting that
\[ n f(x) \, dx \, [F(x + w) - F(x)]^{n-1} \]
is the probability given $x$ that one of the $X_i$ falls into the interval $(x, x + dx]$ and all of the $n - 1$ remaining $X_i$ fall into $(x, x + w)$.
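The integrals for the pdf and cdf of the range rarely have closed forms, but they are simple to evaluate numerically. The sketch below uses a plain midpoint rule and, as an assumed test case, a uniform(0, 1) parent, for which the integrals reduce to $n(n-1) w^{n-2}(1 - w)$ and $n w^{n-1} - (n-1) w^n$ on $0 \le w \le 1$. The function names, the integration limits `lo` and `hi`, and the step count are illustrative choices.

```python
def range_pdf(w, n, F, f, lo, hi, steps=20_000):
    """pdf of the range: n(n-1) * integral over x of
    f(x) [F(x+w) - F(x)]^(n-2) f(x+w), by the midpoint rule on [lo, hi]."""
    if w < 0:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f(x) * (F(x + w) - F(x)) ** (n - 2) * f(x + w)
    return n * (n - 1) * total * h


def range_cdf(w, n, F, f, lo, hi, steps=20_000):
    """cdf of the range: n * integral over x of
    f(x) [F(x+w) - F(x)]^(n-1), by the midpoint rule on [lo, hi]."""
    if w <= 0:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f(x) * (F(x + w) - F(x)) ** (n - 1)
    return n * total * h


def uniform_F(t):
    return min(max(t, 0.0), 1.0)


def uniform_f(t):
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


n, w = 5, 0.4
print(range_pdf(w, n, uniform_F, uniform_f, 0.0, 1.0), n * (n - 1) * w ** (n - 2) * (1 - w))
print(range_cdf(w, n, uniform_F, uniform_f, 0.0, 1.0), n * w ** (n - 1) - (n - 1) * w**n)
```

For a parent with unbounded support, `lo` and `hi` would have to be chosen wide enough that the neglected tails are negligible; a more careful quadrature rule may be preferable in that case.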