Sums of Squares

1 One-Way Analysis of Variance

We reformulate the theory of elementary one-way analysis of variance in terms of factors.

We have one factor, F say, with k levels, and the corresponding linear space L_F, as we have seen. In the theory of linear normal models it is assumed that the observations y_1, …, y_n come from independent normally distributed random variables Y_1, …, Y_n which have the same variance but possibly different means: Y_i ~ N(μ_i, σ²). The model for the one-way analysis of variance is that the mean vector μ = (μ_1, …, μ_n) lies in the space L_F, or equivalently that μ = X_F α for some (unspecified) α ∈ ℝ^k.

Once we have established this model we can test hypotheses about it, for example the hypothesis of a uniform mean, i.e. that the factor has no effect on the observations. This is equivalent to saying that μ ∈ L_O, the space corresponding to the null factor.

So the design for this example is Δ = {I, F, O}. To test for a uniform mean we calculate the quantity

(ESS ⁄ (k − 1)) ⁄ (RSS ⁄ (n − k)),

where

ESS = || P_F y − P_O y ||²,    RSS = || y − P_F y ||²
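As a concrete illustration, the test quantity above can be computed directly from the two projections. The following sketch uses made-up data and a hypothetical three-level factor; the projection of y onto L_F is the vector of group means, and the projection onto L_O is the grand mean repeated:

```python
# Hypothetical one-way layout: factor with k = 3 levels, n = 9 observations.
y = [4.1, 3.9, 4.5, 5.2, 5.0, 5.6, 6.1, 6.3, 5.9]
F = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # F(i): level of observation i
n, k = len(y), len(set(F))

grand = sum(y) / n                 # projection onto L_O: grand mean everywhere
means = {f: sum(yi for yi, g in zip(y, F) if g == f) / F.count(f)
         for f in set(F)}          # projection onto L_F: group means

# ESS = || proj_F(y) - proj_O(y) ||^2 ; RSS = || y - proj_F(y) ||^2
ESS = sum((means[f] - grand) ** 2 for f in F)
RSS = sum((yi - means[f]) ** 2 for yi, f in zip(y, F))

F_stat = (ESS / (k - 1)) / (RSS / (n - k))
```

By Pythagoras, ESS + RSS equals the total sum of squares about the grand mean, which gives a quick sanity check on the computation.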

2 Table of Variances

Until now we have looked at factors simply in terms of their abstract qualities, as mappings between finite sets with associated linear spaces and projections, culminating in the orthogonal partition of ℝⁿ determined by the factors of an orthogonal design. So far we have not considered the observations that are categorized by the factors. We will now do so, by considering some important statistics on a set of observations. For the moment we make no assumptions about the nature of the variables (distribution, mean, etc.) and state no hypotheses about them.

Let y = (y_1, …, y_n) be a set of observations, and let Δ be an orthogonal design on {1, …, n}. For any factor F ∈ Δ there is a linear space V_F with an orthogonal projection Q_F (as discussed in Factorial Design).

We define the following quantities:

SSD_F = || Q_F y ||²
SS_F = || P_F y ||²

SSD_F is known as the sum of squares of the deviations, and SS_F as the sum of squares of the factor F. The quantity d_F = dim V_F is the degrees of freedom corresponding to SSD_F. In summary:

d_F = dim V_F
|F| = dim L_F

If Δ is an orthogonal design (i.e. the set of all factors under consideration) on the set of observations, then we can draw up a table containing the values of the SSD and the corresponding degrees of freedom for every factor in the design. We will call this the Table of Variances.
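To make the definitions concrete, here is a sketch for the one-way design Δ = {I, F, O} of Section 1. In that design V_F is the orthogonal complement of L_O inside L_F, so Q_F = P_F − P_O; the data and factor encoding below are hypothetical:

```python
import numpy as np

# Hypothetical data: n = 6 observations, factor F with k = 3 levels.
y = np.array([1.0, 1.5, 2.0, 2.5, 3.5, 4.0])
levels = np.array([0, 0, 1, 1, 2, 2])
n, k = len(y), 3

# X: indicator (design) matrix of F; L_F is its column space.
X = np.zeros((n, k))
X[np.arange(n), levels] = 1.0

# P_F: orthogonal projection onto L_F; P_O: projection onto the constants.
P_F = X @ np.linalg.inv(X.T @ X) @ X.T
P_O = np.full((n, n), 1.0 / n)
Q_F = P_F - P_O            # projection onto V_F for this design

SS_F = float((P_F @ y) @ (P_F @ y))    # || P_F y ||^2
SSD_F = float((Q_F @ y) @ (Q_F @ y))   # || Q_F y ||^2
d_F = round(float(np.trace(Q_F)))      # dim V_F = k - 1
```

Here the trace of the projection Q_F recovers the degrees of freedom d_F = k − 1 = 2, and SSD_F = SS_F − SS_O, since P_O y and (P_F − P_O)y are orthogonal.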

3 Deriving SSD Directly

For the time being we concentrate on deriving the table, without reference to its use in estimation and hypothesis testing.

From the theory of 2-sided ANOVA we know that:

SS_F = || P_F y ||² = ∑_{f ∈ F} S_f² ⁄ n_f

where

S_f = ∑_{i : F(i) = f} y_i,    n_f = #{ i | F(i) = f }
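The group-sums formula can be checked numerically against the projection form || P_F y ||², which replaces each observation by its group mean. A minimal sketch with made-up data:

```python
# Hypothetical observations and factor levels.
y = [5.0, 6.0, 7.0, 9.0, 10.0]
F = [0, 0, 1, 1, 1]   # F(i): level of observation i

# Collect the observations of each level f.
groups = {}
for yi, f in zip(y, F):
    groups.setdefault(f, []).append(yi)

# Group-sums form: SS_F = sum over f of S_f^2 / n_f.
SS_sums = sum(sum(g) ** 2 / len(g) for g in groups.values())

# Projection form: P_F y has the group mean in each coordinate.
means = {f: sum(g) / len(g) for f, g in groups.items()}
SS_proj = sum(means[f] ** 2 for f in F)   # || P_F y ||^2
```

Both expressions give the same value, since ∑_i mean_f² over a group of size n_f equals n_f (S_f ⁄ n_f)² = S_f² ⁄ n_f.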

We know that

Q_F = ∑_{G ≤ F} α_FG P_G

so

SSD_F = || Q_F y ||² = yᵀ Q_F y
      = ∑_{G ≤ F} α_FG yᵀ P_G y = ∑_{G ≤ F} α_FG || P_G y ||²
      = ∑_{G ≤ F} α_FG SS_G

Similarly,

d_F = tr(Q_F) = tr( ∑_{G ≤ F} α_FG P_G )
    = ∑_{G ≤ F} α_FG tr(P_G) = ∑_{G ≤ F} α_FG |G|

SSD_F = ∑_{G ≤ F} α_FG SS_G

d_F = ∑_{G ≤ F} α_FG |G|

4 Deriving SSD Recursively

The above formulae are explicit but not very useful for computation, since a priori we do not know the values of the α_FG. To find a more useful algorithm we use the previously derived formula for P_F in terms of the Q_G:

SS_F = || P_F y ||² = || ∑_{G ≤ F} Q_G y ||²
     = ∑_{G ≤ F} || Q_G y ||²    (since the spaces V_G are mutually orthogonal)
     = ∑_{G ≤ F} SSD_G

SS_F = ∑_{G ≤ F} SSD_G

SSD_F = SS_F − ∑_{G < F} SSD_G

Similarly

|F| = dim L_F = ∑_{G ≤ F} dim V_G = ∑_{G ≤ F} d_G

|F| = ∑_{G ≤ F} d_G

d_F = |F| − ∑_{G < F} d_G

With the above formulae it is possible to work recursively, starting from the coarsest factor, C say (which will be the null factor in a balanced design), since SSD_C = SS_C and d_C = |C|.
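The recursion can be turned into a short program. The sketch below computes the table of variances for the balanced one-way design Δ = {I, F, O}; the data, the factor encoding as level tuples, and the helper names are all hypothetical:

```python
# Hypothetical balanced one-way layout: n = 6 units, factor F with 3 levels.
y = [2.0, 2.4, 3.1, 2.9, 4.0, 4.4]
factors = {
    "O": (0, 0, 0, 0, 0, 0),   # null factor: every unit in one level
    "F": (0, 0, 1, 1, 2, 2),   # treatment factor
    "I": (0, 1, 2, 3, 4, 5),   # identity factor: one level per unit
}

def SS(levels):
    """SS via the group-sums formula: sum over levels f of S_f^2 / n_f."""
    total = 0.0
    for f in set(levels):
        group = [y[i] for i, g in enumerate(levels) if g == f]
        total += sum(group) ** 2 / len(group)
    return total

def coarser(G, F):
    """G <= F in the factor ordering: F refines G."""
    n = len(F)
    return all(G[i] == G[j] for i in range(n) for j in range(n) if F[i] == F[j])

def strictly_below(name):
    """Names of factors G in the design with G < F (coarser, not equal)."""
    F = factors[name]
    return [g for g, G in factors.items() if coarser(G, F) and not coarser(F, G)]

def SSD(name):
    """SSD_F = SS_F - sum of SSD_G over G < F; bottoms out at the coarsest factor."""
    return SS(factors[name]) - sum(SSD(g) for g in strictly_below(name))

def dof(name):
    """d_F = |F| - sum of d_G over G < F."""
    return len(set(factors[name])) - sum(dof(g) for g in strictly_below(name))

# Table of variances: one (SSD, degrees of freedom) row per factor.
table = {name: (SSD(name), dof(name)) for name in factors}
```

For the coarsest factor O the recursion terminates immediately with SSD_O = SS_O and d_O = |O| = 1, and the SSD rows sum to SS_I = ∑ y_i², reflecting the orthogonal partition of ℝⁿ.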