Advanced Linear Modeling

Buy Advanced Linear Modeling




Advanced Linear Modeling: Statistical Learning and Dependent Data

R code for Third Edition

Errata for Third Edition

Data Files for Third Edition

Data Files for Second Edition

Preface to Third Edition, Preface to Second Edition, Preface to First Edition, Table of Contents

Preface to Third Edition

This is the third edition of Advanced Linear Modeling (ALM). It is roughly 50 per cent longer than the previous edition. It discusses the extension of linear models into areas beyond those usually addressed in regression and analysis of variance. As in previous editions, its primary emphasis is on models in which the data display some form of dependence, and many of the changes from the previous edition were made to systematize this emphasis on dependent data. Nonetheless, it begins with topics in modern regression analysis related to nonparametric regression and penalized estimation (regularization). R code for the analyses in the book is available via the R code for Third Edition link above.

Mathematical background on differentiation and Kronecker products is contained in Appendix A. Some notation used throughout the book is also set out in Subsection 1.0.1.

This edition has been written in conjunction with a fifth edition of Christensen (2011), often hereafter referred to as PA. Some discussions that previously appeared in PA have been moved here. Obviously you cannot do advanced linear modeling without previously learning about linear modeling. I have tried to make this book readable to people who have studied linear model theory from sources other than PA, but I need to cite some source for basic results on linear models, so obviously I cite PA. In cases where I need to cite results for which the new version of PA is different from the previous edition(s), the citations are given as PA-V.

I have rearranged the topics from the previous edition of ALM so that the material related to independent data comes first followed by the material on dependent data. The chapter on response surfaces has been dropped but is available in a new volume downloadable from Topics in Experimental Design. Some familiarity with inner products is assumed, especially in Chapters 1 and 3. The required familiarity can be acquired from PA.

Chapter 1 expands the previous introduction to nonparametric regression. The discussion follows what is commonly known as the basis function approach, despite the fact that many of the techniques do not actually involve the use of basis functions per se. In fact, when dealing with spaces of functions, the very idea of a basis is subject to competing definitions. Tim Hanson pointed out to me the obvious fact that if a group of functions is linearly independent, it always forms a basis for the space that it spans, but I think that in nonparametric regression the idea is to approximate wider collections of functions than just these spanning sets. Chapter 1 now also includes a short introduction to models involving an entire function of predictor variables.
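To fix ideas, here is a minimal sketch of the basis function approach (illustrative only, not code from the book's R files): the unknown mean function is approximated by a linear combination of known functions of the predictor, so the fit reduces to an ordinary linear model.

set.seed(1)                                   # simulated data, purely hypothetical
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
Phi <- outer(x, 1:6, function(x, j) cos(pi * j * x))  # six cosine basis functions
fit <- lm(y ~ Phi)                            # ordinary least squares in the basis
plot(x, y)
lines(x, fitted(fit), lwd = 2)                # fitted nonparametric regression curve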

Chapter 2 is an expanded version of the discussion of penalized regression from Christensen (2011). A new Chapter 3 extends this by introducing reproducing kernel Hilbert spaces.
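For readers who have not seen penalized estimation, the canonical example is ridge regression; in generic notation (not necessarily the book's), for the linear model $Y = X\beta + e$ the ridge estimate minimizes a penalized least squares criterion,
$$\hat{\beta}_k = \operatorname*{arg\,min}_{\beta}\,\left[(Y - X\beta)'(Y - X\beta) + k\,\beta'\beta\right] = (X'X + kI)^{-1}X'Y, \qquad k \ge 0,$$
where the tuning parameter $k$ controls how strongly the coefficients are shrunk toward zero.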

Chapter 4 is new except for the last section. It gives results on an extremely general linear model for dependent or heteroscedastic data. It owes an obvious debt to Christensen (2011, Chapter 12). It contains several particularly useful exercises. In a standard course on linear model theory, the theory of estimation and testing for dependent data is typically introduced but not developed; see, for example, Christensen (2011, Sections 2.7 and 3.8). Section 4.1 of this book reviews, but does not re-prove, those results. The current book then applies those fundamental results to develop theory for a wide variety of practical models.
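As a point of reference, the fundamental estimation result in question is the standard generalized least squares result, stated here in generic notation rather than quoted from the book: for $Y = X\beta + e$ with $\mathrm{E}(e) = 0$ and $\mathrm{Cov}(e) = \sigma^2 V$, where $V$ is known and nonsingular and $X$ has full column rank,
$$\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}Y$$
is the best linear unbiased estimate of $\beta$; roughly speaking, the practical models then correspond to different structures imposed on $V$.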

I finally figured out how to present MINQUE as linear modeling without overwhelming the ideas in abstruse notation, so I have done that in Chapter 4. In a technical subsection I give in to the abstruse notation so as to derive the MINQUE equations. Previously I just referred the reader to Rao for the derivation.

Chapter 5 on mixed models originally appeared in PA. It has been shortened in places due to overlap with Chapter 4 but includes several new examples and exercises. It contains a new emphasis on linear covariance structures that leads not only to variance component models but also to the new Section 5.6, which examines a quite general longitudinal data model. The details of the recovery of interblock information for a balanced incomplete block design from PA no longer seem relevant, so they were relegated, along with the response surface material, to the volume on my website.

Chapters 6 and 7 introduce time series: first the frequency domain approach, which uses models from Chapter 1 but with random effects as in Chapter 5, and then the time domain approach, which can be viewed as applying ideas from the frequency domain.

Chapter 8 on spatial data is little changed from the previous edition. Mostly the references have been updated.

The former chapter on multivariate models has been split into three: Chapter 9 on general theory, with a new section relating multivariate models to spatial and time series models and a new discussion of multiple comparisons; Chapter 10 on applications to specific models; and Chapter 11 with an expanded discussion of generalized multivariate linear models (also known as generalized multivariate analysis of variance (GMANOVA) and growth curve models).

Chapters 12 and 14 are updated versions of the previous chapters on discriminant analysis and principal components. Chapter 13 is a new chapter on binary regression and discrimination. Its raison d'être is that it devotes considerable attention to support vector machines. Chapter 14 contains a new section on classical multidimensional scaling.

From time to time I mention the virtues of Bayesian approaches to problems discussed in the book. One place to look for more information is BIDA, i.e., Christensen et al. (2010).

Thanks to my son Fletcher who is always the first person I ask when I have doubts. Joe Cavanaugh and Mohammad Hattab have been particularly helpful as have Tim Hanson, Wes Johnson, and Ed Bedrick. Finally, my thanks to Al Nosedal-Sanchez, Curt Storlie, and Thomas Lee for letting me modify our joint paper into Chapter 3.

As I have mentioned elsewhere, the large number of references to my other works is as much about sloth as it is ego. In some sense, with the exception of BIDA, all of my books are variations on a theme.

Preface to Second Edition

This is the second edition of Linear Models for Multivariate, Time Series and Spatial Data. It has a new title to indicate that it contains much new material. The primary changes are the addition of two new chapters: one on nonparametric regression and one on response surface maximization. As before, the presentations focus on the linear model aspects of the subject. For example, in the nonparametric regression chapter there is very little about kernel regression estimation but quite a bit about series approximations, splines, and regression trees, all of which can be viewed as linear modeling.

The new edition also includes various smaller changes. Of particular note are a subsection in Chapter 1 on modeling longitudinal (repeated measures) data and a section in Chapter 6 on covariance structures for spatial lattice data. I would like to thank Dale Zimmerman for the suggestion of incorporating material on spatial lattices. Another change is that the subject index is now entirely alphabetical.

Preface to First Edition

This is a companion volume to Plane Answers to Complex Questions: The Theory of Linear Models. It consists of six additional chapters written in the same spirit as the last six chapters of the earlier book. Brief introductions are given to topics related to linear model theory. No attempt is made to give a comprehensive treatment of the topics. Such an effort would be futile. Each chapter is on a topic so broad that an in-depth discussion would require a book-length treatment.

People need to impose structure on the world in order to understand it. There is a limit to the number of unrelated facts that anyone can remember. If ideas can be put within a broad, sophisticatedly simple structure, not only are they easier to remember but often new insights become available. In fact, sophisticatedly simple models of the world may be the only ones that work. I have often heard Arnold Zellner say that, to the best of his knowledge, this is true in econometrics. The process of modeling is fundamental to understanding the world.

In Statistics, the most widely used models revolve around linear structures. Often the linear structure is exploited in ways that are peculiar to the subject matter. Certainly this is true of frequency domain time series and geostatistics. The purpose of this volume is to take three fundamental ideas from standard linear model theory and exploit their properties in examining multivariate, time series and spatial data. In decreasing order of importance to the presentation, the three ideas are: best linear prediction, projections, and Mahalanobis's distance. (Actually, Mahalanobis's distance is a fundamentally multivariate idea that has been appropriated for use in linear models.) Numerous references to results in Plane Answers are made. Nevertheless, I have tried to make this book as independent as possible. Typically, when a result from Plane Answers is needed, not only is the reference given but also the result itself. Of course, for proofs of these results the reader will have to refer to the original source.
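For orientation, the first and third of these ideas can each be stated in one line; these are the standard definitions, in generic notation rather than necessarily the book's. If $x$ has mean $\mu_x$ and nonsingular covariance matrix $\Sigma_{xx}$, and $y$ has mean $\mu_y$ with $\mathrm{Cov}(y, x) = \Sigma_{yx}$, then
$$\hat{\mathrm{E}}(y \mid x) = \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x), \qquad D^2(x) = (x - \mu_x)'\Sigma_{xx}^{-1}(x - \mu_x),$$
the first being the best linear predictor of $y$ from $x$ and the second Mahalanobis's squared distance from $x$ to $\mu_x$.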

I want to re-emphasize that this is a book about linear models. It is not traditional multivariate analysis, time series, or geostatistics. Multivariate linear models are viewed as linear models with a nondiagonal covariance matrix. Discriminant analysis is related to the Mahalanobis distance and multivariate analysis of variance. Principal components are best linear predictors. Frequency domain time series involves linear models with a peculiar design matrix. Time domain analysis involves models that are linear in the parameters but have random design matrices. Best linear predictors are used for forecasting time series; they are also fundamental to the estimation techniques used in time domain analysis. Spatial data analysis involves linear models in which the covariance matrix is modeled from the data; a primary objective in analyzing spatial data is making best linear unbiased predictions of future observables. While other approaches to these problems may yield different insights, there is value in having a unified approach to looking at these problems. Developing such a unified approach is the purpose of this book.

There are two well known models with linear structure that are conspicuous by their absence in my two volumes on linear models. One is Cox's (1972) proportional hazards model. The other is the generalized linear model of Nelder and Wedderburn (1972). The proportional hazards methodology is a fundamentally nonparametric technique for dealing with censored data having linear structure. The emphasis on nonparametrics and censored data would make its inclusion here awkward. The interested reader can see Kalbfleisch and Prentice (1980). Generalized linear models allow the extension of linear model ideas to many situations that involve independent nonnormally distributed observations. Beyond the presentation of basic linear model theory, these volumes focus on methods for analyzing correlated observations. While it is true that generalized linear models can be used for some types of correlated data, such applications do not flow from the essential theory. McCullagh and Nelder (1989) give a detailed exposition of generalized linear models and Christensen (1990) contains a short introduction.

Acknowledgements

I would like to thank MINITAB for providing me with a copy of release 6.1.1, BMDP for providing me with copies of their programs 4M, 1T, 2T, and 4V and Dick Lund for providing me with a copy of MSUSTAT. Nearly all of the computations were performed with one of these programs. Many were performed with more than one.

I would not have tackled this project but for Larry Blackwood and Bob Shumway. Together Larry and I reconfirmed, in my mind anyway, that multivariate analysis is just the same old stuff. Bob's book put an end to a specter that has long haunted me: a career full of half-hearted attempts at figuring out basic time series analysis.

At my request, Ed Bedrick, Bert Koopmans, Wes Johnson, Bob Shumway and Dale Zimmerman tried to turn me from the errors of my ways. I sincerely thank them for their valuable efforts. The reader must judge how successful they were with a recalcitrant subject. As always, I must thank my editors Steve Fienberg and Ingram Olkin for their suggestions. Jackie Damrau did an exceptional job in typing the first draft of the manuscript.

Finally, I have to recognize the contribution of Magic Johnson. I was so upset when the 1987-88 Lakers won a second consecutive NBA title that I began writing this book in order to block the mental anguish. I am reminded of Woody Allen's dilemma: is the importance of life more accurately reflected in watching The Sorrow and the Pity or in watching the Knicks? (In my case, the Jazz and the Celtics.) It's a tough call. Perhaps life is about actually making movies and doing statistics.

Table of Contents - Third Edition
