“Effect Size” is the new magic number that, it seems, will solve all of our education problems. Ideas with big effect sizes are apparently the new panaceas, and ideas with small effect sizes are panned. However, misuse and misunderstanding of this statistic is causing big problems.

The most commonly-used formula for effect size is:

$$d=\frac{\bar{x}_{1}-\bar{x}_{2}}{s}$$

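That’s simply the *observed change in average scores* divided by *the standard deviation of the scores* (a measure of how spread out the scores are): $\bar{x}_{1}$ is the average score after the intervention, $\bar{x}_{2}$ the average score before it, and $s$ the standard deviation. There are more nuanced and complicated versions of this formula, but they all build on the same principles.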
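As an illustration of one such refinement (a common choice, though not necessarily the one any particular study uses), $s$ is often taken to be the *pooled* standard deviation of the two sets of scores, where $n_{1}$ and $n_{2}$ are the numbers of scores in each set and $s_{1}$ and $s_{2}$ are their standard deviations:

$$s=\sqrt{\frac{(n_{1}-1)s_{1}^{2}+(n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}}$$

When both sets of scores have the same spread ($s_{1}=s_{2}$), this simply equals that common standard deviation.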

So, what effect size tells you is the number of standard deviations that the scores changed by. Let’s unpack that a bit:

- The more the group’s average score has increased, the larger the effect size.
- The more the group’s scores are spread out, the smaller the effect size.

So effect size is, essentially, a very simple measure, used to give a rough guide to how effective an intervention was. It’s a bit more subtle than just looking at the average change, because it recognises that a change in the scores of five students:

- from 10, 20, 30, 40, 50
- to 20, 30, 40, 50, 60

respectively is less dramatic than a change

- from 18, 19, 20, 21, 22
- to 28, 29, 30, 31, 32

…as in the latter group the scores were all clustered closely around 20 and moved to being clustered around 30, so that even the weakest student now outperforms the previously strongest student. However, you can see why both could be said to be equally ‘effective’: in each case every student gained 10 points. It’s all in the numbers.
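As a rough check on the two examples above, here is a minimal Python sketch that computes the effect size for each group. It assumes $s$ is the pooled sample standard deviation of the before and after scores (one common choice; other versions use only the “before” scores, which gives the same answer here because the spread doesn’t change). The function names `pooled_sd` and `effect_size` are illustrative, not from any particular package.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled sample standard deviation of two sets of scores."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))


def effect_size(before, after):
    """Change in average score divided by the pooled standard deviation."""
    return (mean(after) - mean(before)) / pooled_sd(before, after)


# The two five-student examples from the text
spread_before = [10, 20, 30, 40, 50]
spread_after = [20, 30, 40, 50, 60]
tight_before = [18, 19, 20, 21, 22]
tight_after = [28, 29, 30, 31, 32]

print(round(effect_size(spread_before, spread_after), 2))  # 0.63
print(round(effect_size(tight_before, tight_after), 2))    # 6.32
```

Both groups gain exactly 10 points on average, but the tightly clustered group’s standard deviation is ten times smaller, so its effect size is ten times larger.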

Now, John Hattie (author of the teacher effectiveness bible ‘Visible Learning’) has looked at hundreds of educational studies and concluded that the significant ones are those with an Effect Size of more than 0.4. This is a simplifying assumption, as otherwise it’s hard to figure out what to pay attention to. Essentially, he studied thousands of interventions, found that the ‘average’ Effect Size is 0.4, and said we should concentrate on those above this level instead of wasting time and effort on those below it.

However, despite the huge value of the interventions that Hattie describes, there’s a **big** problem with the way many people use the statistics. Every year, a class of students would be expected to improve its scores on a standardised test (assuming the students learn something during that year). When researchers looked at the average Effect Sizes of these year-on-year changes in the USA, across a battery of different standardised tests, they found the following:


Year | Effect Size |
---|---|
1 – 2 | 1.52 |
2 – 3 | 0.97 |
3 – 4 | 0.64 |
4 – 5 | 0.36 |
5 – 6 | 0.40 |
6 – 7 | 0.32 |
7 – 8 | 0.23 |
8 – 9 | 0.26 |
9 – 10 | 0.24 |
10 – 11 | 0.19 |
11 – 12 | 0.19 |
12 – 13 | 0.06 |

What’s going on here!?

There are, unfortunately, three possible explanations:

- School pupils make less improvement each successive year, and these numbers genuinely reflect a full year of average progress.
- As classes progress, their results get more and more spread out (i.e. the slowest-learning pupils’ scores fall further and further below the fastest-learning pupils’ scores), which makes the standard deviation much, much bigger and therefore makes Effect Sizes smaller.
- As classes progress, they have less progress left to make on these standardised tests. In the first year they go from knowing no answers to knowing a few, whereas in the last year they know nearly everything, so there aren’t many more answers they can improve on.

In reality it may well be a combination of these effects, but the important lesson is that, whereas an Effect Size of 0.4 equates to 100% of one year’s progress in years 5–6, it reflects just over 25% of a year’s progress in years 1–2 and over 600% of a year’s progress in years 12–13!
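Those percentages are just 0.4 divided by each year’s average Effect Size. A short Python sketch using the figures from the table makes the arithmetic explicit for every year; it assumes, as the table does, that each row represents exactly one year of average progress. The name `share_of_year` is illustrative.

```python
# Year-on-year average Effect Sizes from the table (one full year of progress each)
yearly_effect_sizes = {
    "1-2": 1.52, "2-3": 0.97, "3-4": 0.64, "4-5": 0.36,
    "5-6": 0.40, "6-7": 0.32, "7-8": 0.23, "8-9": 0.26,
    "9-10": 0.24, "10-11": 0.19, "11-12": 0.19, "12-13": 0.06,
}


def share_of_year(effect_size, yearly):
    """Express an effect size as a percentage of one year's average progress."""
    return 100 * effect_size / yearly


for years, yearly in yearly_effect_sizes.items():
    print(f"Years {years}: {share_of_year(0.4, yearly):.0f}% of a year's progress")
```

Running it prints roughly 26% for years 1–2, 100% for years 5–6 and 667% for years 12–13, matching the figures above.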

So, context is King here. Primary school studies may well end up with larger effect sizes than secondary school studies, but the results don’t necessarily transfer. After all, we don’t know whether an intervention with an Effect Size of 0.4 in year 5 (one whole year of extra progress) would create one whole year of extra progress in year 12 (i.e. an Effect Size of 0.06), or whether it would keep its Effect Size of 0.4 and create over six years of extra progress.

Indeed, just because a study was successful in one school, it doesn’t even follow that another school will have the same issues, staff and culture, or could expect the same Effect Size. All Effect Sizes do is give you one possible measure of the significance of the result, but they tell you very little about its reproducibility, especially in different contexts.

Be warned…

“All Effect Sizes do is give you one possible measure of the significance of the result”

No. Effect size is not a measure of significance (i.e. how likely it is that a result is not just chance or noise). Effect size (note the word *size*) is a measure of the magnitude of an effect.

So, for example, you could have a significant effect but one with a very small effect size; this would mean that the effect is likely to exist (significance) but may not be particularly important (magnitude). The importance is, of course, dependent on context.
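To see the distinction in numbers, here is a minimal Python sketch. It assumes two equal-sized groups sharing a common standard deviation, and it uses a normal approximation rather than a full t-test, so the p-values for small groups are only rough. The helper `approx_p_value` is hypothetical. A tiny effect measured on a very large sample comes out as highly significant, while a much bigger effect measured on a handful of pupils does not.

```python
from math import erfc, sqrt


def approx_p_value(d, n_per_group):
    """Approximate two-sided p-value for effect size d between two groups.

    With n pupils in each group and a shared standard deviation, the test
    statistic is z = d * sqrt(n / 2); the normal tail gives the p-value.
    """
    z = d * sqrt(n_per_group / 2)
    return erfc(abs(z) / sqrt(2))


# Tiny effect, huge sample: significant (p well below 0.05) but small
print(approx_p_value(0.05, 10_000))  # about 0.0004
# Ten times the effect, tiny sample: not significant
print(approx_p_value(0.5, 10))       # about 0.26
```

The effect size stays the same whatever the sample size, whereas the p-value shrinks as the sample grows, which is why the two answer different questions.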