This blog post comes from a recent Issues Paper I wrote for the Melbourne School of Government, which also contains a countervailing view from my outstanding colleague at the University of Melbourne, Professor Jenny Lewis.
Around election time, politicians often state their support for ‘evidence-based policy’. But what does this really mean and how do we distinguish strong evidence from weak evidence?
The ultimate goal of evidence-based policymaking is better public policy and, through it, healthier and wealthier societies. Evidence-based policies should also give taxpayers more confidence that the Government is spending their hard-earned money wisely. Once we agree that these are the right goals to strive for, the question is: how do we get there? Setting lofty, long-term goals is no doubt important, but there is much we can do in the short term to evaluate how government policies are faring against our ‘healthy and wealthy’ agenda.
The idea underpinning evidence-based policy is pretty simple. Because the world is a complex place, it is not always straightforward to determine whether a policy has worked, so we need to invest effort in figuring out which policies ‘work’ and which don’t. The use of evidence, rather than reliance on ideology, should ensure that good policies survive and bad policies are killed off.
Despite its apparent simplicity, the evidence-based policy agenda is often misunderstood. Assuming we can agree on what we mean when we say a policy ‘works’, we still need a set of tools and techniques to tell us whether it does. The key challenge is to identify a counterfactual: what would have happened to participants had they not taken part in the program? The counterfactual cannot be observed directly, but there are ways in which it can be estimated.
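To make the counterfactual idea concrete, here is a minimal sketch of one common way to estimate it: a difference-in-differences comparison, which uses the change over time in a comparison group as a stand-in for what would have happened to participants without the program. This is only one of several possible techniques, and it rests on the assumption that both groups would have followed similar trends in the absence of the program. The function name and the numbers (weekly hours worked) are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Estimate a program effect as the change in the participant group
    minus the change in the comparison group over the same period."""
    treated_change = mean(treated_after) - mean(treated_before)
    # The comparison group's change approximates the counterfactual trend.
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical outcomes (weekly hours worked), invented for illustration only.
treated_before = [10.0, 12.0, 14.0]
treated_after = [16.0, 18.0, 20.0]
control_before = [11.0, 13.0, 15.0]
control_after = [13.0, 15.0, 17.0]

effect = difference_in_differences(
    treated_before, treated_after, control_before, control_after
)
print(f"Estimated program effect: {effect:.1f} hours per week")
```

In this made-up example, participants' hours rise by more than the comparison group's hours, and only that extra rise is attributed to the program. A simple before-and-after comparison of participants alone would overstate the effect by including the change that would have happened anyway.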
Public policymakers face a difficult task in trying to maximise the returns from their programs. Focusing on the economic (non-political) objectives of the program, it seems obvious that all policymakers want the best possible outcomes given the prevailing economic conditions. However, programs have both intended and unintended consequences, both of which need to be accounted for.
More generally, it often isn’t possible to predict the effects of a specific policy unless similar policies have been implemented elsewhere. But lessons can only be learnt if the outcomes of the program are carefully documented and analysed. Since it is difficult to establish the causal effects of policies, such analysis must be undertaken carefully. The consumers of this analysis are ultimately the central agencies that control the purse strings of the government departments: ‘high-quality’ evidence is needed to convince them of the merits of the program.
Doubtless, there are many factors that influence the success of a specific policy. Nevertheless, there is a range of methods and techniques that can be used to examine the effects of specific programs. These methods produce the ‘evidence base’ from which to build better policy. But because the evidence these methods produce is not all of the same quality, it is equally important to be able to rank the methods. That is, we need a hierarchy of evidence, in which the gold standard provides robust causal evidence of the effects of a specific program.
In recent years, it has become clearer that access to unit-record data is important for building the evidence base: the analyst must be able to see the characteristics of the individual unit of interest in order to draw strong inferences about the effects of policy. And with many new initiatives to link data and make it more accessible, there is every chance that technology will make it much cheaper (and safer in terms of privacy) to do so in the future.
Programs often have effects that ripple throughout the economy, not just on the target group of interest. For example, programs designed to affect mothers’ labour supply will probably influence mothers’ decisions about whether (and when) to return to (casual, part-time or full-time) work, but are also likely to have effects on the demand for childcare services, take-away meals and dry cleaning services. Studies that take these ripple effects into account ask whether a program ‘works’ in the broadest sense of the word.
Since these economy-wide effects are difficult to evaluate, evaluations often resort to the simpler (but still difficult) task of evaluating the effects of the program on the target group: that is, whether the program works in the ‘narrow sense’.