Advanced Linear Modeling

Buy Advanced Linear Modeling




Advanced Linear Modeling: Statistical Learning and Dependent Data

R code for Third Edition

Errata for Third Edition

Data Files for Third Edition

Data Files for Second Edition

Preface to Third Edition, Preface to Second Edition, Preface to First Edition, Table of Contents

Preface to Third Edition

This is the third edition of Advanced Linear Modeling (ALM). It is roughly 50 per cent longer than the previous edition. It discusses the extension of linear models into areas beyond those usually addressed in regression and analysis of variance. As in previous editions, its primary emphasis is on models in which the data display some form of dependence, and many of the changes from the previous edition were made to systematize this emphasis on dependent data. Nonetheless, it begins with topics in modern regression analysis related to nonparametric regression and penalized estimation (regularization). R code for the analyses in the book is available via the R code for Third Edition link above.

Mathematical background on differentiation and Kronecker products is contained in Appendix A. Some notation used throughout the book is also set out in Subsection 1.0.1.

This edition has been written in conjunction with a fifth edition of Christensen (2011), often hereafter referred to as PA. Some discussions that previously appeared in PA have been moved here. Obviously you cannot do advanced linear modeling without previously learning about linear modeling. I have tried to make this book readable to people who have studied linear model theory from sources other than PA, but I need to cite some source for basic results on linear models, so obviously I cite PA. In cases where I need to cite results for which the new version of PA is different from the previous edition(s), the citations are given as PA-V.

I have rearranged the topics from the previous edition of ALM so that the material related to independent data comes first followed by the material on dependent data. The chapter on response surfaces has been dropped but is available in a new volume downloadable from Topics in Experimental Design. Some familiarity with inner products is assumed, especially in Chapters 1 and 3. The required familiarity can be acquired from PA.

Chapter 1 expands the previous introduction to nonparametric regression. The discussion follows what is commonly known as the basis function approach, despite the fact that many of the techniques do not actually involve the use of basis functions per se. In fact, when dealing with spaces of functions, the very idea of a basis is subject to competing definitions. Tim Hanson pointed out to me the obvious fact that if a group of functions is linearly independent, it always forms a basis for the space that it spans, but I think that in nonparametric regression the idea is to approximate wider collections of functions than just these spanning sets. Chapter 1 now also includes a short introduction to models involving an entire function of predictor variables.
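To fix ideas, here is a minimal sketch of the basis function approach (illustrative only, not code from the book's R files): the unknown mean function is approximated by a linear combination of known functions of the predictor, so the fit reduces to an ordinary linear model.

set.seed(1)                                   # simulated data, purely hypothetical
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
Phi <- outer(x, 1:6, function(x, j) cos(pi * j * x))  # six cosine basis functions
fit <- lm(y ~ Phi)                            # ordinary least squares in the basis
plot(x, y)
lines(x, fitted(fit), lwd = 2)                # fitted nonparametric regression curve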

Chapter 2 is an expanded version of the discussion of penalized regression from Christensen (2011). A new Chapter 3 extends this by introducing reproducing kernel Hilbert spaces.
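For readers who have not seen penalized estimation, the canonical example is ridge regression; in generic notation (not necessarily the book's), for the linear model $Y = X\beta + e$ the ridge estimate minimizes a penalized least squares criterion,
$$\hat{\beta}_k = \operatorname*{arg\,min}_{\beta}\,\left[(Y - X\beta)'(Y - X\beta) + k\,\beta'\beta\right] = (X'X + kI)^{-1}X'Y, \qquad k \ge 0,$$
where the tuning parameter $k$ controls how strongly the coefficients are shrunk toward zero.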

Chapter 4 is new except for the last section. It gives results on an extremely general linear model for dependent or heteroscedastic data. It owes an obvious debt to Christensen (2011, Chapter 12). It contains several particularly useful exercises. In a standard course on linear model theory, the theory of estimation and testing for dependent data is typically introduced but not developed; see, for example, Christensen (2011, Sections 2.7 and 3.8). Section 4.1 of this book reviews, but does not re-prove, those results. The current book then applies those fundamental results to develop theory for a wide variety of practical models.
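As a point of reference, the fundamental estimation result in question is the standard generalized least squares result, stated here in generic notation rather than quoted from the book: for $Y = X\beta + e$ with $\mathrm{E}(e) = 0$ and $\mathrm{Cov}(e) = \sigma^2 V$, where $V$ is known and nonsingular and $X$ has full column rank,
$$\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}Y$$
is the best linear unbiased estimate of $\beta$; roughly speaking, the practical models then correspond to different structures imposed on $V$.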

I finally figured out how to present MINQUE as linear modeling without overwhelming the ideas in abstruse notation, so I have done that in Chapter 4. In a technical subsection I give in to the abstruse notation so as to derive the MINQUE equations. Previously I just referred the reader to Rao for the derivation.

Chapter 5 on mixed models originally appeared in PA. It has been shortened in places due to overlap with Chapter 4 but includes several new examples and exercises. It contains a new emphasis on linear covariance structures that leads not only to variance component models but also to the new Section 5.6, which examines a quite general longitudinal data model. The details of the recovery of interblock information for a balanced incomplete block design from PA no longer seem relevant, so they were relegated, along with the response surface material, to the volume on my website.

Chapters 6 and 7 introduce time series: first the frequency domain approach, which uses models from Chapter 1 but with random effects as in Chapter 5, and then the time domain approach, which can be viewed as applying ideas from the frequency domain.

Chapter 8 on spatial data is little changed from the previous edition. Mostly the references have been updated.

The former chapter on multivariate models has been split into three: Chapter 9 on general theory, with a new section relating multivariate models to spatial and time series models and a new discussion of multiple comparisons; Chapter 10 on applications to specific models; and Chapter 11 with an expanded discussion of generalized multivariate linear models (also known as generalized multivariate analysis of variance (GMANOVA) and growth curve models).

Chapters 12 and 14 are updated versions of the previous chapters on discriminant analysis and principal components. Chapter 13 is a new chapter on binary regression and discrimination. Its raison d'être is that it devotes considerable attention to support vector machines. Chapter 14 contains a new section on classical multidimensional scaling.

From time to time I mention the virtues of Bayesian approaches to problems discussed in the book. One place to look for more information is BIDA, i.e., Christensen et al. (2010).

Thanks to my son Fletcher who is always the first person I ask when I have doubts. Joe Cavanaugh and Mohammad Hattab have been particularly helpful as have Tim Hanson, Wes Johnson, and Ed Bedrick. Finally, my thanks to Al Nosedal-Sanchez, Curt Storlie, and Thomas Lee for letting me modify our joint paper into Chapter 3.

As I have mentioned elsewhere, the large number of references to my other works is as much about sloth as it is ego. In some sense, with the exception of BIDA, all of my books are variations on a theme.

Preface to Second Edition

This is the second edition of Linear Models for Multivariate, Time Series and Spatial Data. It has a new title to indicate that it contains much new material. The primary changes are the addition of two new chapters: one on nonparametric regression and one on response surface maximization. As before, the presentations focus on the linear model aspects of the subject. For example, in the nonparametric regression chapter there is very little about kernel regression estimation but quite a bit about series approximations, splines, and regression trees, all of which can be viewed as linear modeling.

The new edition also includes various smaller changes. Of particular note are a subsection in Chapter 1 on modeling longitudinal (repeated measures) data and a section in Chapter 6 on covariance structures for spatial lattice data. I would like to thank Dale Zimmerman for the suggestion of incorporating material on spatial lattices. Another change is that the subject index is now entirely alphabetical.

Preface to First Edition

This is a companion volume to Plane Answers to Complex Questions: The Theory of Linear Models. It consists of six additional chapters written in the same spirit as the last six chapters of the earlier book. Brief introductions are given to topics related to linear model theory. No attempt is made to give a comprehensive treatment of the topics. Such an effort would be futile. Each chapter is on a topic so broad that an in-depth discussion would require a book-length treatment.

People need to impose structure on the world in order to understand it. There is a limit to the number of unrelated facts that anyone can remember. If ideas can be put within a broad, sophisticatedly simple structure, not only are they easier to remember but often new insights become available. In fact, sophisticatedly simple models of the world may be the only ones that work. I have often heard Arnold Zellner say that, to the best of his knowledge, this is true in econometrics. The process of modeling is fundamental to understanding the world.

In Statistics, the most widely used models revolve around linear structures. Often the linear structure is exploited in ways that are peculiar to the subject matter. Certainly this is true of frequency domain time series and geostatistics. The purpose of this volume is to take three fundamental ideas from standard linear model theory and exploit their properties in examining multivariate, time series and spatial data. In decreasing order of importance to the presentation, the three ideas are: best linear prediction, projections, and Mahalanobis's distance. (Actually, Mahalanobis's distance is a fundamentally multivariate idea that has been appropriated for use in linear models.) Numerous references to results in Plane Answers are made. Nevertheless, I have tried to make this book as independent as possible. Typically, when a result from Plane Answers is needed, not only is the reference given but also the result itself. Of course, for proofs of these results the reader will have to refer to the original source.
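For orientation, the first and third of these ideas can each be stated in one line; these are the standard definitions, in generic notation rather than necessarily the book's. If $x$ has mean $\mu_x$ and nonsingular covariance matrix $\Sigma_{xx}$, and $y$ has mean $\mu_y$ with $\mathrm{Cov}(y, x) = \Sigma_{yx}$, then
$$\hat{\mathrm{E}}(y \mid x) = \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x), \qquad D^2(x) = (x - \mu_x)'\Sigma_{xx}^{-1}(x - \mu_x),$$
the first being the best linear predictor of $y$ from $x$ and the second Mahalanobis's squared distance from $x$ to $\mu_x$.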

I want to re-emphasize that this is a book about linear models. It is not traditional multivariate analysis, time series, or geostatistics. Multivariate linear models are viewed as linear models with a nondiagonal covariance matrix. Discriminant analysis is related to the Mahalanobis distance and multivariate analysis of variance. Principal components are best linear predictors. Frequency domain time series involves linear models with a peculiar design matrix. Time domain analysis involves models that are linear in the parameters but have random design matrices. Best linear predictors are used for forecasting time series; they are also fundamental to the estimation techniques used in time domain analysis. Spatial data analysis involves linear models in which the covariance matrix is modeled from the data; a primary objective in analyzing spatial data is making best linear unbiased predictions of future observables. While other approaches to these problems may yield different insights, there is value in having a unified approach to looking at these problems. Developing such a unified approach is the purpose of this book.

There are two well known models with linear structure that are conspicuous by their absence in my two volumes on linear models. One is Cox's (1972) proportional hazards model. The other is the generalized linear model of Nelder and Wedderburn (1972). The proportional hazards methodology is a fundamentally nonparametric technique for dealing with censored data having linear structure. The emphasis on nonparametrics and censored data would make its inclusion here awkward. The interested reader can see Kalbfleisch and Prentice (1980). Generalized linear models allow the extension of linear model ideas to many situations that involve independent nonnormally distributed observations. Beyond the presentation of basic linear model theory, these volumes focus on methods for analyzing correlated observations. While it is true that generalized linear models can be used for some types of correlated data, such applications do not flow from the essential theory. McCullagh and Nelder (1989) give a detailed exposition of generalized linear models and Christensen (1990) contains a short introduction.

Acknowledgements

I would like to thank MINITAB for providing me with a copy of release 6.1.1, BMDP for providing me with copies of their programs 4M, 1T, 2T, and 4V and Dick Lund for providing me with a copy of MSUSTAT. Nearly all of the computations were performed with one of these programs. Many were performed with more than one.

I would not have tackled this project but for Larry Blackwood and Bob Shumway. Together Larry and I reconfirmed, in my mind anyway, that multivariate analysis is just the same old stuff. Bob's book put an end to a specter that has long haunted me: a career full of half-hearted attempts at figuring out basic time series analysis.

At my request, Ed Bedrick, Bert Koopmans, Wes Johnson, Bob Shumway and Dale Zimmerman tried to turn me from the errors of my ways. I sincerely thank them for their valuable efforts. The reader must judge how successful they were with a recalcitrant subject. As always, I must thank my editors Steve Fienberg and Ingram Olkin for their suggestions. Jackie Damrau did an exceptional job in typing the first draft of the manuscript.

Finally, I have to recognize the contribution of Magic Johnson. I was so upset when the 1987-88 Lakers won a second consecutive NBA title that I began writing this book in order to block the mental anguish. I am reminded of Woody Allen's dilemma: is the importance of life more accurately reflected in watching The Sorrow and the Pity or in watching the Knicks? (In my case, the Jazz and the Celtics.) It's a tough call. Perhaps life is about actually making movies and doing statistics.

Table of Contents - Third Edition
