Many regression methods, including ridge regression, linear smoothers and smoothing splines, are not based on ordinary least-squares projections, but rather on regularized (generalized and/or penalized) least squares, and so degrees of freedom defined in terms of dimensionality are generally not useful for these procedures. However, these procedures are still linear in the observations, and the fitted values of the regression can be expressed in the form

ŷ = Hy,

where ŷ is the vector of fitted values at each of the original covariate values from the fitted model, y is the original vector of responses, and H is the hat matrix or, more generally, smoother matrix.

For statistical inference, sums-of-squares can still be formed: the model sum-of-squares is ŷ'ŷ; the residual sum-of-squares is (y − ŷ)'(y − ŷ). However, because H does not correspond to an ordinary least-squares fit (i.e. is not an orthogonal projection), these sums-of-squares no longer have (scaled, non-central) chi-squared distributions, and dimensionally defined degrees of freedom are not useful. The distribution is instead a generalized chi-squared distribution, and the theory associated with that distribution provides an alternative route to the answers provided by an effective degrees of freedom.

The effective degrees of freedom of the fit can be defined in various ways to implement goodness-of-fit tests, cross-validation, and other inferential procedures. Here one can distinguish between regression effective degrees of freedom and residual effective degrees of freedom.

Regarding the former, appropriate definitions can include the trace of the hat matrix, tr(H); the trace of the quadratic form of the hat matrix, tr(H'H); the form tr(2H − HH'); or the Satterthwaite approximation, tr(H'H)²/tr(H'HH'H). In the case of linear regression, the hat matrix H is X(X'X)⁻¹X', and all these definitions reduce to the usual degrees of freedom. That is, the regression (not residual) degrees of freedom in linear models are "the sum of the sensitivities of the fitted values with respect to the observed response values".

There are corresponding definitions of residual effective degrees of freedom (redf), with H replaced by I − H. For example, if the goal is to estimate error variance, the redf would be defined as tr((I − H)'(I − H)), and the unbiased estimate is (with r̂ = y − Hy)

σ̂² = r̂'r̂ / tr((I − H)'(I − H)).

Since tr((I − H)'(I − H)) = n − tr(2H − HH'), approximating this trace reduces the computational cost from O(n²) to only O(n). In general the numerator would be the objective function being minimized; e.g., if the hat matrix includes an observation covariance matrix, Σ, then r̂'r̂ becomes r̂'Σ⁻¹r̂.

Note that, unlike in the original case, non-integer degrees of freedom are allowed, though the value must usually still be constrained between 0 and n. Consider, as an example, the k-nearest-neighbour smoother, which is the average of the k nearest measured values to the given point. Then, at each of the n measured points, the weight of the original value in the linear combination that makes up the predicted value is just 1/k. Thus, the trace of the hat matrix is n/k.
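As an illustrative sketch (not part of the source text), the trace-based definitions above can be checked numerically with NumPy. The data, the ridge penalty lam, and the choices n = 50, p = 3, k = 5 are all arbitrary assumptions for the demonstration: tr(H) recovers the usual p degrees of freedom for ordinary least squares, gives a non-integer value for ridge regression, and equals n/k for the k-nearest-neighbour smoother.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3  # arbitrary sample size and number of covariates
X = rng.normal(size=(n, p))

# Ordinary least squares: H = X (X'X)^{-1} X' is an orthogonal projection,
# so tr(H) equals the usual p degrees of freedom.
H_ols = X @ np.linalg.solve(X.T @ X, X.T)

# Ridge regression: H = X (X'X + lam I)^{-1} X' is not a projection, and
# tr(H) gives a non-integer effective degrees of freedom between 0 and p.
lam = 1.0  # arbitrary penalty
H_ridge = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
edf = np.trace(H_ridge)

# k-nearest-neighbour smoother in one dimension: each row of H places
# weight 1/k on the k nearest points.  Every point is among its own k
# nearest neighbours, so each diagonal entry is 1/k and tr(H) = n/k.
k = 5
x = np.sort(rng.uniform(size=n))
H_knn = np.zeros((n, n))
for i in range(n):
    nearest = np.argsort(np.abs(x - x[i]))[:k]
    H_knn[i, nearest] = 1.0 / k

# Residual effective degrees of freedom, and the identity
# tr((I - H)'(I - H)) = n - tr(2H - HH').
I = np.eye(n)
redf = np.trace((I - H_ridge).T @ (I - H_ridge))
```

For the k-nearest-neighbour smoother this prints out as tr(H_knn) = n/k = 10 non-trivial "parameters" spent by the smooth, even though no explicit parametric model was fitted.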