Wednesday, 18 April 2012

Ad Hoc Analyst

Chris Grayling put out a press release yesterday. He also had journalists briefed about a speech he is due to make to the think-tank Policy Exchange. The Guardian's Patrick Wintour kindly obliged with a 'report' essentially saying everything Grayling wanted it to; yay for free publicity, and Wintour, being a D-minus hack, applied absolutely no balance or scrutiny. Grayling is using research from a DWP ad hoc analysis paper (the kind used to get around the quality control standards that all official statistics and research must pass) to argue that the work experience scheme introduced last year (the one where there was a lot of hoo-hah a while ago and some Trotskyists achieved precisely nothing beyond some publicity, whilst distracting from the hard work of real welfare campaigners) is effective and beneficial. Grayling's take on it is both stupid and dishonest.
The research took two groups of young people: one went on work placements and the other didn't, and afterwards 16% more of those who had gone on work placements were 'off work' than of the group that didn't. First alarm bell:
The results, which have been reviewed by independent experts...
Ah yes, so it's been audited and passed the quality controls of the UK Statistics Authority, so we have a rare example of figures being used by a DWP minister which have been properly...
... the National Institute of Economic and Social Research
But they aren't the auditors of official statistics and research. So Grayling is again trying to get around proper vetting, and this time pretending that his garbage has been properly checked. Yes, it's an ad hoc analysis paper.

The second concern is the numbers themselves and the methodology. The first sample group was of 3,490 people aged between 18 and 24. The press release fudges whether these are JSA claimants or just generally claiming out-of-work benefits; the focus seems to be on JSA. However, the sample is not randomised: it is of those who were first onto the work experience scheme when it started. OK, so how do you make sure your control sample is anything more than a fig leaf? You can't. This isn't a random sample of participants but the most motivated, able and work-ready group of people looking for work. No effort is made to distinguish their reasons for being the first work experience participants from their participation itself.

So we end up with a control group picked on the basis of what the researchers think would be an equivalent sample, except that it of course is not, because, for whatever reason, those people didn't take up the work experience when it first started. It's blatant cherry-picking. By the way, the control group just so happened to have 378,210 people in it. The dice just keep getting heavier on one side.
When some solid figures are presented to give an indication of how 'similar' the two groups are, I'm left wondering how much adding up has been done. Reading through the tables, I do the adding up myself.

The control group has 1% more disabled people, 1% more ethnic minorities, 1% more people with low qualifications and 1% more lone parents, yet the overall difference is predictably rounded to 0%, and there is 1% greater local authority unemployment for the control group sample. The mean number of weeks the sample group have been on JSA is 29, but for the control group it's 27, so the sample group have on average been looking for work longer; this is not controlled for in the research, nor are any of the other factors which add up to why the sample group was 16% more successful. The control group had on average been on benefits for a week less than the sample group.

With the hints the table gives me, there is every reason to think that the control group contains a substantial number of people who are not work-ready, have been out of work for less time but were in work for longer prior to becoming unemployed, and have been on plenty of DWP programmes that simply haven't helped. This research paper is comparing a handful of the best and most able with hundreds of thousands that include a significant number of disadvantaged people. No wonder Grayling loves it but needed to find an 'independent' reviewer other than the official ombudsman. The paper itself gives its own spin on the figures, insisting that the sample group are the disadvantaged ones.

Having failed to randomise the samples properly, the researchers instead opt to use a tool called Propensity Score Matching to address the overt selection bias. I don't know enough about this to assess how appropriate it is here, but the fact that they opted for it when randomisation would have been so much simpler and more accurate suggests the choice is deliberate. All the assumptions the paper details stretch reality and reinforce the notion that the sample group is disadvantaged relative to the control group. At this point I am convinced that had this DWP research paper been published through the proper channels, and not the ones ministers use to generate headlines or 'facts to fit the opinion', it would not have passed official quality control standards.
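
For what it's worth, here is a minimal sketch of how propensity score matching is typically done, not a reconstruction of what the DWP researchers actually did; the data and variable names below are invented for illustration. The idea is to model each person's probability of participating from observed characteristics, pair each participant with the non-participant whose predicted probability is closest, and compare outcomes across the matched pairs. Something like:

# Illustrative propensity score matching (1-to-1 nearest neighbour).
# Everything here is invented data; it is NOT the DWP's actual method or figures.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 5_000

# Made-up covariates: weeks on JSA, age, disabled flag, lone-parent flag.
X = np.column_stack([
    rng.normal(28, 10, n),
    rng.integers(18, 25, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
])
participated = rng.integers(0, 2, n).astype(bool)  # went on work experience
off_benefit = rng.integers(0, 2, n)                 # outcome: left benefits

# 1. Estimate the propensity score: P(participation | observed covariates).
ps = LogisticRegression(max_iter=1000).fit(X, participated).predict_proba(X)[:, 1]

# 2. Match each participant to the non-participant with the closest score.
nn = NearestNeighbors(n_neighbors=1).fit(ps[~participated].reshape(-1, 1))
_, idx = nn.kneighbors(ps[participated].reshape(-1, 1))
matched = np.flatnonzero(~participated)[idx.ravel()]

# 3. The 'impact' is the outcome gap across the matched pairs.
impact = off_benefit[participated].mean() - off_benefit[matched].mean()
print(f"estimated impact: {impact:+.3f}")

The obvious limitation is that matching can only balance the characteristics that sit in the administrative data; the motivation that got someone through the door first isn't in any table, which is exactly the problem with a non-randomised sample.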
Finally, I went looking for the figure Grayling was using from the paper. First, though, is the bit he's leaving out: despite having made all the assumptions in favour of the result they wanted, the researchers found:
In the first 8 weeks after starting, participants were more likely to be on benefit than non-participants. Since the period of WE placements is usually 2-8 weeks this is likely to reflect a ‘lock-in’ period when participants were engaged in WE, which reduced the time spent on job search activity. This effect is often seen in employment programmes.
So, 46% of work experience participants leave benefits compared with 40% of non-participants. No wonder they opt for the figure-twisting '16% more' rather than putting these figures side by side. So where do the six extra percentage points come from? Is it really the case that work experience produces an increase of six points? Well, I counted, remember: 1% more disabled, 1% more lone parents, 1% more ethnic minority and so on. Even with overlap, it doesn't take much to explain the difference, which randomisation would have controlled for far more adequately than the statistical tool that was used. Similarly sized sample and control groups wouldn't have hurt either.
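To make the arithmetic explicit (my own back-of-envelope using the two figures above, not anything from the paper): a six percentage point gap on a 40% base is roughly a 15% relative increase, which is presumably where the '16% more' headline comes from once unrounded figures are used, and which sounds rather more impressive than '46% versus 40%'.

# Back-of-envelope: how '46% vs 40%' turns into a '16% more' headline.
off_benefit_participants = 0.46   # figure quoted above for WE participants
off_benefit_controls = 0.40       # figure quoted above for non-participants

points_gap = off_benefit_participants - off_benefit_controls   # 6 percentage points
relative_gap = points_gap / off_benefit_controls               # ~15% relative increase

print(f"gap: {points_gap:.0%} points; relative increase: {relative_gap:.0%}")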

They do, however, have some weasel words to address the matter of the sample group being the first work experience participants, an essentially non-randomised sample:
There is no evidence of a decline in impacts so we would expect these impacts to persist for a much longer time. In particular, impacts from a smaller earlier cohort showed that the impacts persisted at similar levels until 30 weeks. However, it is really too early to speculate on how long the impact might continue in order to estimate the cost-effectiveness of the programme.
Well that doesn't appear to have stopped Chris Grayling. I have to wonder if the researchers knew what he was going to be doing with their work.
Some sensitivity tests were performed using different cohorts and sub-groups. We found that the impact estimates were largely insensitive to each of these alternative implementations. This provides increased confidence that the methodology was robust and that the findings are not biased by the definition of the chosen participant and non-participant samples.
Nevertheless, the analysis is complex and caution should be applied to the results, least of all because this is a first impact analysis, based on a small cohort of starts from the early months of WE.
To their credit the authors go on to list things that need to be considered and why care should be taken in interpreting their results. Not that Coalition ministers are given to reading those bits.

4 comments:

  1. That deserves more than "0 Comments", although I'm working from the position of having been at school with these types, and unfortunately feel that nothing short of capsicum in their Astroglide will get through to them.

    But thank you for taking the time nevertheless.

  2. I have at times undertaken research, and have come across PSM:

    "Over the past 25 years, evaluators of social programs have searched for nonexperimental methods that can substitute effectively for experimental ones. Recently, the spotlight has focused on one method, propensity score matching (PSM), as the suggested approach for evaluating employment and education programs. We present a case study of our experience using PSM, under seemingly ideal circumstances, for the evaluation of the State Partnership Initiative employment promotion program. Despite ideal conditions and the passing of statistical tests suggesting that the matching procedure had worked, we find that PSM produced incorrect impact estimates when compared with a randomized design. Based on this experience, we caution practitioners about the risks of implementing PSM-based designs."

    http://amstat.tandfonline.com/doi/abs/10.1198/000313008X332016

    It's rollocks, particularly applied to such a small sample and timescale.

  3. http://statistics.dwp.gov.uk/asd/index.php?page=adhoc_analysis

  4. You see Mason, this is why I love you.

    I read the study and just came away with a vague sense of "none of that seems quite right". I picked up the 40% v 46% somehow being spun into 16%.

    I'm never confident enough of the vague instinct to write about it though, which is what makes me so glad that you, Declan and Sam Barnett-Cormack are out there ;)
