

SLIDE 1

Improving the value of policy evaluation

Nicholas Mays

Professor of Health Policy, Department of Health Services Research and Policy

Treasury guest lecture, 29 January 2013

Improving health worldwide
www.lshtm.ac.uk

SLIDE 2

Outline

  • Focus of presentation
  • Meaning of ‘value’ of policy evaluation
  • Perceived problems with evaluation in public policy

  • Current proposals for increasing the ‘value’ of policy evaluation

– Supply
– Demand
– Interactive

  • Critique of proposals
  • Conclusions and questions for discussion
SLIDE 3

Scope


  • External evaluation (mainly)
  • Ex ante and ex post evaluation

  • Evaluation involving new data collection and secondary analysis
  • UK, NZ (mainly)
  • National level social policy
  • Not discussing technical quality of evaluations or dissemination and knowledge transfer

  • Partially drawing on recent experience of trying to improve the ‘value’ of policy evaluation through directing a government‐funded research unit in England (www.piru.ac.uk/)

SLIDE 4

‘Value’ of ‘evaluation’

  • ‘Evaluation’:

– ‘systematic assessment of the operation and/or the outcomes of a programme or policy’ (Weiss, 1998, p4)
  • Currently thinking of ‘value’ as:

– ‘better potential to use’

  • As much a result of policy processes, norms and expectations as a feature of evaluation itself

SLIDE 5

Perceived problems with policy evaluation

  • Lack of understanding of the potential benefits of evaluation

– even denigration of evaluation in some quarters as too late, irrelevant, unreliable, incomprehensible, embarrassing ...

  • Lack of commissioning of, consideration for the needs of, and use of, evaluations ex ante and ex post

  • Misuse or inappropriate use of evaluations

– lack of research/statistical skills

  • Sectors other than health lagging behind in sophistication, availability and use of evaluation

– all sectors should be aspiring to be ‘evidence‐informed’ if not ‘evidence‐based’ policy

  • Typically the concern of researchers rather than decision makers, though periodic waves of interest from the latter

– 1960s, late 1990s and currently in UK/England (associated with wider civil service critique and reform plans from Government)
– series of reports/initiatives in NZ (Review of the Centre, 2001; MfO, 2003; SPEaR; Better Public Services Advisory Group, 2011), though more about outcomes & targets across government
– current focus on evaluation justified by a context of financial stringency (Obama, UK Alliance for Useful Evidence, French Experimentation Fund for Youth)

SLIDE 6

One topical call for the ‘good use’ of evidence (and one of the more sophisticated!)

‘It is exceptionally rare for scientific evidence to mandate a single solution to a policy problem; rather it informs the range of solutions that might be feasible and predicts what the outcome of each is most likely to be. (p54) ... Ministers aren’t obliged to make every decision according to the evidence presented to them by scientists and nothing else. They should, however, ensure that they do take scientific advice on questions to which it is most pertinent ... Above all, politicians and civil servants should not be allowed to get away with laying claim to evidence‐based policy when decisions have actually been taken by other means.’ (p56)

Henderson M. (2012) The geek manifesto – why science matters.

SLIDE 7

Recent reports arguing for change abound

SLIDE 8

Supply side proposals (research community)

  • More ex ante evaluation

– More modelling and ex ante assessment (answering today’s problems, today)
– Greater willingness to use ‘best possible’ rather than ‘definitive’ evidence
– Greater willingness to engage with policy decision makers and advise

  • More agile ex post evaluation

– e.g. more use of large, long‐term observational administrative databases

  • More attention to setting out the ‘logic’ underlying policies

– to assist with evaluation design and explaining how and why policies might ‘work’
– working with policy makers, to help clarify the intervention, population, intended outcomes and processes, and data needed for improved evaluation
  • Change to reward and recognition systems in universities

– more weight to ‘impact’; less to grants, original research and peer reviewed outputs

SLIDE 9

Demand side proposals: largely ex ante

  • Raise the ‘absorptive capacity’ of policy organisations by giving them

– more permeable boundaries
– stronger external networks (e.g. with research community)
– stronger ‘organisational memory’
– incentives on individual policy officials (e.g. that they use evaluation in their work)
– access to ‘foresight’ offered by research community

  • More ‘open’ (transparent), rigorous policy development processes

– early engagement of policy advisers with researchers at ideas stage, pre‐policy
– ‘Red teaming’ (confidential process within government)
– citizens’ juries (public process outside government)
– a more level playing field between interests to allow a wider range of ‘voices’ and evidence to be considered
– clearer articulation of ‘facts’, ‘values’ and ‘interests’ upfront so it is clearer where evidence is and should be used
– requirement on government to explain its reasons for decisions, especially when contrary to scientific advice, and to represent the evidence accurately

SLIDE 10

Demand side proposals: ex ante and ex post

  • Variants on a ‘NICE’ beyond new drugs and devices

– e.g. across all health and social policy, to test and trial approaches to ‘what works’ (Civil Service Reform Plan, 2012)
– e.g. an ‘independent’ body linking supply and demand for evaluation (Puttick, 2012)
– e.g. a body separate from Government Departments to commission evaluations of their policies and assess the quality of Departments’ own evaluations (Hallsworth and Rutter, 2011, p24)
– e.g. a body or bodies to undertake independent ex ante & ex post assessments of policy plans (e.g. an extended OBR)

SLIDE 11

Demand side proposals: ex ante evaluation

  • Range of other controls on policy and accountability for use of evaluation

– Review of UK ‘Impact Assessments’ to produce closer integration with original Business Cases and reduce post hoc rationalisation

– Making a reality of plans for ‘Post‐Implementation Reviews’ in IAs
– Stronger incentives for policy agencies to use evidence routinely

  • e.g. block on introduction of proposals unless policy makers can show how the evidence relates to or supports their proposals (with rights of appeal if weak evidence)
– Select Committees to undertake an ‘evidence check’ on all submissions
– Publication of all Departmental Risk Assessments of major policies

SLIDE 12

Demand side proposals: ex post evaluation

  • Greater use of experimentation (RCTs) to inform policy choices (Haynes et al, 2012 – Behavioural Insights Team in Cabinet Office)

– Bold statements such as:

  • ‘RCTs are the best way of determining whether a policy is working ... we should and could use RCTs much more extensively in domestic public policy to test the effectiveness of new and existing interventions ...’ (p6)

  • Straightforward process summed up by the title ‘Test, Learn, Adapt’ (see the illustrative sketch below)
  • Also wider endorsement of policy ‘pilots’, ‘demonstrations’, ‘pathfinders’, ‘trailblazers’
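
In the spirit of ‘Test, Learn, Adapt’, the cycle reduces to: randomise, estimate the effect, then decide whether to adapt or roll out. The Python sketch below is a minimal illustration only; the scenario, sample size and effect sizes are hypothetical assumptions, not figures from Haynes et al (2012) or this lecture.

```python
# Illustrative sketch only (hypothetical numbers): the bare logic of a
# 'Test, Learn, Adapt' style two-arm policy pilot with a binary outcome.
import math
import random

random.seed(42)

def run_pilot(n_per_arm, baseline_rate, uplift):
    """Simulate a pilot: randomise units to control or treatment and record
    a binary outcome (e.g. whether a household responds to a letter)."""
    control = sum(random.random() < baseline_rate for _ in range(n_per_arm))
    treated = sum(random.random() < baseline_rate + uplift for _ in range(n_per_arm))
    return control / n_per_arm, treated / n_per_arm

def two_proportion_test(p1, p2, n1, n2):
    """Two-sided z-test for a difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Test: trial the new approach against current practice.
p_control, p_treated = run_pilot(n_per_arm=2000, baseline_rate=0.30, uplift=0.03)
z, p = two_proportion_test(p_control, p_treated, 2000, 2000)

# Learn: how big is the effect, and how statistically credible is it?
print(f"control {p_control:.3f}, treated {p_treated:.3f}, z = {z:.2f}, p = {p:.3f}")

# Adapt: a deliberately simplified decision rule; real decisions also weigh
# cost, context and the other limitations discussed later in the lecture.
if p < 0.05 and p_treated > p_control:
    print("Roll out the new approach and keep monitoring at scale")
else:
    print("Revise the intervention and re-test before wider rollout")
```

If the estimated effect is convincing, ‘adapt’ means rollout with continued monitoring; if not, the intervention is revised and tested again, which is the learning loop the title describes.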

SLIDE 13

Obama administration’s evidence‐based social policy initiatives

1. Identify important social problem
2. Identify ‘model’ responses shown by rigorous research (especially RCTs) to be effective
3. Obtain funds to scale up these programmes
4. Fund reputable organisations to replicate and scale up models
5. Continuously evaluate to assess ‘fidelity’ and effectiveness at scale

  • Topics: home visiting, teen pregnancy prevention, etc.
  • By budget 2014, all submissions will require evidence of effective testing and policy experimentation, including low‐cost rigorous evaluation using administrative data and use of waivers to legal provisions to enable rigorous testing of programmes

SLIDE 14

Demand side proposals: ex post evaluation

  • Policy makers to set out more clearly the underlying mechanisms and desired outcomes of policies before commissioning evaluation or implementing any policy, build in sensible variations to aid evaluation, and strive not to move the ‘goal posts’ excessively during evaluation

  • Steps to prevent policy makers making spurious claims to be undertaking evidence‐based policy making
– place scientific advice and evaluation at heart of government in NZ (Gluckman, 2011)
– more concerted effort to respect the ‘Principles of Scientific Advice to Government’ in UK

– e.g. ‘Government should respect and value the academic freedom, professional status and expertise of its independent scientific advisers.’
– e.g. ‘Scientific advisers should respect the democratic mandate of the Government to take decisions based on a wide range of factors and recognise that science is only part of the evidence that Government must consider in developing policy.’
http://www.bis.gov.uk/go-science/principles-of-scientific-advice-to-government

SLIDE 15

Joint, interactive activities

  • Careful analysis and discussion of whether, and, if so, what type of evaluation could conceivably be used in this field in future by policy makers to shape their decisions

– more and less formal ‘value of information analysis’ (a minimal sketch follows below)
– affects goals and design of the evaluation (e.g. Is A better than B? vs. Which approaches to A work best?)

  • Careful assessment of the ‘evaluability’ of a policy or initiative before commissioning an evaluation or embarking on evaluation
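
The ‘value of information’ idea can be made concrete with a rough calculation: at most, how much better off would the decision maker be if an evaluation resolved the uncertainty before a choice between policies had to be made? The Python sketch below is a minimal illustration under entirely assumed priors, population and monetary values; none of the figures come from the lecture.

```python
# Illustrative sketch only (assumed priors and values): a crude Monte Carlo
# 'expected value of perfect information' (EVPI) calculation for a choice
# between two policies, A and B.
import random
from statistics import mean

random.seed(1)
N_SIM = 20_000
POPULATION = 100_000          # people affected by the decision (assumed)
VALUE_PER_UNIT_EFFECT = 50.0  # £ net benefit per unit of effect per person (assumed)

def draw_effects():
    """One draw from the prior beliefs about each policy's per-person effect."""
    return {
        "A": random.gauss(0.10, 0.05),  # promising but highly uncertain (assumed)
        "B": random.gauss(0.08, 0.01),  # modest but well understood (assumed)
    }

draws = [draw_effects() for _ in range(N_SIM)]

# Decide now, under uncertainty: choose the policy with the best expected effect.
expected = {p: mean(d[p] for d in draws) for p in ("A", "B")}
value_decide_now = max(expected.values())

# With perfect information we would choose the best policy in every scenario.
value_with_perfect_info = mean(max(d.values()) for d in draws)

# EVPI: an upper bound on what an evaluation resolving the uncertainty is worth.
evpi_per_person = value_with_perfect_info - value_decide_now
print(f"EVPI ≈ £{evpi_per_person * VALUE_PER_UNIT_EFFECT * POPULATION:,.0f}")
```

If this upper bound is small relative to the cost of a rigorous evaluation, commissioning one is hard to justify; if it is large, both the case for evaluation and the question it should answer (Is A better than B?) become much clearer.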

SLIDE 16

Five questions to assess ‘evaluability’

  • Where Is a Particular Intervention Situated in the Evolution of an Overall Intervention Program?
  • How Will an Evaluative Study of This Intervention Affect Policy Decisions?
  • What Are the Plausible Sizes and Distribution of the Intervention’s Hypothesized Impacts?
  • How Will the Findings of an Evaluative Study Add Value to the Existing Scientific Evidence?

  • Is It Practicable to Evaluate the Intervention in the Time Available?

Ogilvie et al. Milbank Quarterly 2011: 89(2): 206–225

SLIDE 17

Limitations of proposals I

  • ‘Piety and naivety’ (Walker, 2012) in relation to policy making (demand side)

– requires assumptions of rational policy making processes run by fearless officials and politicians, with high tolerance of uncertainty, a public with high levels of trust, etc.
– e.g. rarely any sign of a willingness to make binding ex ante commitments to use findings of evaluation to take decisions to stop, modify or continue a policy
– e.g. rarely any consideration of the role of evaluators in helping make a policy suitable for evaluation

  • Thus offer limited counter‐weight to day to day pressures on Ministers and officials in adversarial systems
  • Could increase ‘strategic’ requests for infeasible evaluations for presentational reasons

SLIDE 18

Limitations of proposals II

  • It is unclear how researchers are informed or find out about pre‐policy activities and on what terms they participate in (confidential) policy processes

  • If researchers are too closely integrated, do they lose their ‘independence’ and thus their value?

  • Testing and criticism of policy ideas/proposals by outsiders could risk officials’ relationships with Ministers which, in turn, are crucial to ‘good’ policy making (Hallsworth & Rutter, 2011)

SLIDE 19

Limitations of (specific) proposals III

  • ‘NICE for social policy’

– NICE is unusual in that some of its ‘guidance’ is effectively mandatory, but only applies to a small number of new technologies and not ‘system’ policies
– Many aspects of policy cannot easily be reduced to a single set of unambiguously VFM interventions since context matters; multiple, conflicting objectives are the norm
– Potential risk of stifling innovation and creativity, especially in ‘difficult’ areas
– Possible that NICE‐type models might work in niche areas, though remember that NICE is often ignored

  • More use of RCTs (Haynes et al, 2012)

– Cannot be a complete solution (better for components of policies, i.e. ‘interventions’?)
– Many policies are highly unlikely to be placed under researcher control for a trial
– Most policies have multiple goals and relevant outcomes which RCTs struggle with
– The counter‐factual may be unclear (e.g. Responsibility Deal)
– RCTs tend to lack ability to explain how and why impacts came about, which can be crucial if others are to use the knowledge

  • e.g. WSD trial unable to explain effects observed or show which sub‐groups were responsible

– RCTs based on a wide range of settings needed for high external validity
– RCTs may have to be very large to identify policy (as against clinical) effects validly (see the sample size sketch below)
– Even results of RCTs can be highly contested
– Adapting policy in light of evidence is not a trivial task in a competitive political environment
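
To give a sense of scale behind the ‘very large’ point, the sketch below applies the standard normal-approximation sample size formula for comparing two proportions. The effect sizes are assumed for illustration only; they are not taken from the lecture.

```python
# Illustrative sketch only (assumed effect sizes): why RCTs of policies, which
# typically shift outcomes by a point or two, need far larger samples than
# trials of clinical interventions with larger effects.
import math
from statistics import NormalDist

def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect a difference between two
    proportions (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    core = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_power * math.sqrt(p_control * (1 - p_control)
                                  + p_treated * (1 - p_treated))) ** 2
    return math.ceil(core / (p_treated - p_control) ** 2)

# A 10 percentage point effect versus a 1 percentage point effect on a 30% baseline.
print(n_per_arm(0.30, 0.40))   # of the order of a few hundred per arm
print(n_per_arm(0.30, 0.31))   # of the order of tens of thousands per arm
```

The required sample rises nearly a hundredfold as the detectable effect shrinks from ten points to one, which is one reason small pilots rarely settle questions about modest policy effects.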

SLIDE 20

Limitations of proposals IV

  • More ex ante analysis

– risk of degenerating into Mark Henderson’s ‘imaginary’ evidence
– modelling can become hugely contested (e.g. NICE)

  • ‘Evaluability’ analysis

– tends to be neglected
– competitive tendering tends to work against this type of analysis
– has to contend with a ‘more is better’ ideology, especially from suppliers

SLIDE 21

Are there excessive expectations of what evaluation can do?

“Trillions of dollars are invested yearly in programmes to improve health, social welfare, education, and justice (which we will refer to generally as public programmes). Yet we know little about the effects of most of these attempts to improve peoples’ lives, and what we do know is often not used to inform decisions. We propose that governments and non‐governmental organisations (NGOs) address this failure responsibly by mandating more systematic and transparent use of research evidence to assess the likely effects of public programmes before they are launched, and the better use of well designed impact evaluations after they are launched”.

Oxman A, et al. The Lancet 2010; 375(9712): 427‐431

SLIDE 22

Limitations of proposals V

  • Contrary trends

– media spotlight on government and pressure to act
– downsizing and turbulence in government departments (reduced capacity)
– emphasis on ‘localism’ and out‐sourcing in finding and implementing solutions to problems

  • could lead to more formal experimentation and natural experiments, but also unhelpful variation since likely to increase number and variety of local decision makers

  • could also lead to such variation that ‘signal’ is lost in ‘noise’ and/or experiments are too small scale to generate useful knowledge (e.g. Sure Start programme involved 500 sites but almost as many variants and big variation in quality of delivery)

SLIDE 23

Conclusions and observations

  • The role of evaluation in public policy making seems to be ‘rediscovered’ periodically, sometimes (deliberately?) naively

  • As an evaluator, I cannot be against calls for more, better evaluations!
  • Many of the recent proposals are worthy; some may have effect, but there are many contrary pressures; and some seem naive

– some policy processes may be more amenable than others to being ‘evidence‐based’

  • Clearly policy makers seek to be informed (they are not irrational), but the dynamics of science (why, how?) and policy (what to do?) are different

– so there will always be limits on the use of evaluative evidence in policy making
– but we should not give up, rather work towards ‘intelligent’ policy making (Sanderson, 2009)

  • Interventions (policies) in a political setting cannot be separated from the Ministers and agencies implementing them

– when a policy is ‘tested’, the test includes a test of the judgement and reputation of the Minister and her advisers (and not as scientists!)

  • Much policy making involves a very different form of ‘experimentation’ from that assumed by e.g. proponents of more RCTs

– a process of continuous (informal) learning from experience and feedback in a particular context, involving repeated adaptations of ‘interventions’

SLIDE 24

Agency‐centred (conventional, linear, staged) model of policy design and implementation

Source: Eppel E, Turner D, Wolf A. Experimentation and learning in policy implementation: implications for public management. Institute for Policy Studies Working Paper 11/04. Wellington: VUW, 2011, Figure 1

SLIDE 25

‘Experimental’ (continuous action, feedback, iteration) model of policy design and implementation

Source: Eppel E, Turner D, Wolf A. Experimentation and learning in policy implementation: implications for public management. Institute for Policy Studies Working Paper 11/04. Wellington: VUW, 2011, Figure 2

SLIDE 26

Some outstanding questions

  • Do these recent proposals simply amount to a call for ‘good’ policy making?

– i.e. is there anything new here?
– are the appeals to more evaluation (evidence) helpful or necessary for ‘better’ policy making?

  • Are they simply another way of arguing for a more ‘rational’ (and thus politically more forgiving?) policy process?

  • If so, can they deal with sectional interests, value conflict, etc.?
  • Do they require fundamental shifts in the balance of power and authority to work?

– are such shifts at all realistic in current conditions?

  • If successful, what implications would they have for the independence and distinctive contribution of evaluators?

– can researchers be partners without becoming servants of policy?

SLIDE 27

References and further reading

1. Weiss CH. Evaluation: methods for studying programs and policies. 2nd ed. Upper Saddle River, NJ: Prentice Hall, 1998
2. Glasby J, ed. Evidence, policy and practice: critical perspectives in health and social care. Bristol: Policy Press, 2011
3. Hallsworth M, Rutter J. Making policy better: improving Whitehall’s core business. London: Institute for Government, 2011. http://www.instituteforgovernment.org.uk/publications/making-policy-better
4. Puttick R. Ten steps to transform the use of evidence. London: NESTA, 2011. http://www.nesta.org.uk/library/documents/TenStepsBlog.pdf
5. Puttick R. Why we need a ‘NICE for social policy’. London: NESTA, 2012. http://www.nesta.org.uk/library/documents/NICE.pdf
6. HM Government. Civil Service Reform Plan 2012. London: The Stationery Office. http://www.civilservice.gov.uk/reform
7. Haynes L, Service O, Goldacre B, Torgerson D. Test, learn, adapt: developing public policy with randomised controlled trials. London: Cabinet Office Behavioural Insights Team, 2012. http://www.cabinetoffice.gov.uk
8. BIS and Government Office for Science. Principles of Scientific Advice to Government. http://www.bis.gov.uk/go-science/principles-of-scientific-advice-to-government
9. Ogilvie et al. Assessing the evaluability of complex public health interventions: five questions for researchers, funders and policymakers. Milbank Quarterly 2011; 89(2): 206–225
10. Steventon A, et al. Effect of telehealth on use of secondary care and mortality: findings from the Whole System Demonstrator cluster randomised trial. BMJ 2012; 344: e3874. doi: 10.1136/bmj.e3874 (Published 21 June 2012)
11. Walker D. Would a version of the health standards body Nice work for social policy? Guardian Policy Hub, 11 June 2012. http://www.guardian.co.uk/public-leaders-network/2012/jun/11/nice-standards-for-pol
12. Sanderson I. Intelligent policy making for a complex world: pragmatism, evidence and learning. Political Studies 2009; 57: 699–719
13. Eppel E, Turner D, Wolf A. Experimentation and learning in policy implementation: implications for public management. Institute for Policy Studies Working Paper 11/04. Wellington: VUW, 2011
14. Stoker G, John P. Design experiments: engaging policy makers in the search for evidence about what works. Political Studies 2009; 57: 356–73