‘Roughly 3 billion US Dollars have been spent on the impact evaluations and systematic reviews’, according to a January 2020 report by the International Initiative for Impact Evaluation (3ie).
This number covers only the impact evaluations and systematic reviews included in 3ie’s global repositories since 2006. The field of evidence-informed policy has been dominated by the belief that more and better evaluations would lead to better policymaking.
Huge efforts and resources have gone into Randomised Controlled Trials (RCTs) and impact evaluations, and into setting up new evaluation units in governments and national evaluation bodies to oversee an increasingly complex evaluation system.
Evaluations are important to ensure accountability and, when well designed, they can support learning. Surely, too, they contribute to the body of global knowledge that eventually makes it into the narratives, trends and advice that inform policymakers.
But are evaluations being used?
I have long had my doubts. In this article, I report on two papers, from 2014 and 2025, that add weight to them.
Jean-Louis Arcand (2014) wrote about why impact evaluations seldom lead to evidence-based policymaking, and Michelle Rao (2025) discussed the lack of impact of evaluations on policy spending. I found Rao’s study really interesting. In Latin America, Conditional Cash Transfers (CCTs) have become a “must-have” and are subject to multiple evaluations. Do the evaluations make a difference?
No, according to Rao. She focused on 128 evaluations of CCT programmes in Latin America and the Caribbean between 2000 and 2015 and found a robust zero correlation between evaluation results and spending, regardless of whether individual studies or cumulative evidence is considered.
Arcand found that even highly negative evaluations are unlikely to lead to programme termination unless the prior belief in programme success is exceptionally weak and/or evaluators wield substantial influence. This suggests that if an idea is popular to start with, there is very little an evaluation can do to change its course.
Here’s a summary of the main findings:
- Zero relationship between evaluation outcomes and spending: A key finding across multiple analyses is a robust zero association between evaluation findings and subsequent policy spending. This holds regardless of how the evaluation results are summarised, whether by the magnitude of the treatment effects, their statistical significance, or how positively the results are framed. In other words, don’t bother to invest in fancy communications.
- Individual evaluations vs. cumulative evidence: The lack of association between evaluation outcomes and spending holds whether one looks at responses to individual evaluations or to the cumulative evidence base. Even when multiple evaluations are aggregated to account for heterogeneity and external validity, there is still no link between stronger aggregate evidence and increased spending. The strength of the evidence does not seem to predict when a country will start a new programme or end an existing one.
- Surprising findings: Policymakers do not appear to respond more to unexpected findings relative to the existing evidence base. More positive results compared to prior evidence do not lead to larger spending increases, and more negative results do not lead to larger decreases in spending. This is regardless of assumptions about how policymakers weigh evidence from other countries when forming their prior beliefs.
- Framing of research results: The way research results are framed, whether positively or negatively, does not significantly affect policy spending. More positively framed evaluations do not correspond with larger increases in spending.
- Credibility and generalisability do not matter: Factors such as the credibility of the study (e.g., use of randomised controlled trials or publication in top academic journals) and its generalisability (e.g., external validity to broader populations) do not seem to influence spending decisions. Our own research on credibility demonstrates that the robustness of research is not a critical factor when judging a source’s trustworthiness.
The only factors that seem to have a positive effect on use are the timeliness and relevance of the evaluation:
- Timeliness of evaluations: The most significant factor influencing the use of evaluation findings is the timeliness of the research. Evaluations that are made available more quickly after the year when the programme impact was measured are more likely to affect policy spending. Specifically, evaluations made available within four years are considered timely and are significantly predictive of spending. This suggests that the relevance of evaluation results decreases as time passes.
- Political alignment: The impact of timely evaluations is amplified when the evaluation results can be attributed to the political party in power. When the political party in power at the time of the evaluation is the same as the party in power when the results are published, there is a stronger association between the evaluation findings and subsequent changes in spending. This suggests that political considerations and incentives play a crucial role in the use of evidence. Timeliness and political alignment explain the very few cases of experiments scaled by the MineduLab.
- Actionability: The actionability of an evaluation, which includes both timeliness and relevance to the policymaker’s decisions, influences the use of the findings. The studies identify timeliness as a key dimension of actionability and also suggest that evaluations that measure outcomes closely aligned with policy objectives may be more actionable.
- Influence of evaluators: While not a primary focus of the empirical analysis, Jean-Louis Arcand models the policy-making process as a contest between different actors and suggests that the influence of frequentist (academic) evaluators can play a role. If the influence of frequentist evaluators is very low, as it tends to be in the real world, their work will likely have little effect on policy. According to this model, the probability of a program being cancelled is an increasing function of the influence of the Bayesian and frequentist evaluators and a decreasing function of the influence of anti-evaluation decision-makers.
What can explain this lack of influence?
The authors suggest the following potential reasons for the lack of impact:
- Policymakers may face constraints on using evidence, which would make them more responsive to evidence that is well aligned with their policy decisions and that comes with fewer constraints on its use.
- There is also the possibility of “impact buying”, where policymakers commission research to justify desired spending changes. The papers provide some evidence against this driving the findings in this context.
- Another potential reason is that policymakers may use evidence on comparative policies for relative spending decisions, but this seems unlikely in the context of conditional cash transfers, because of the lack of a comparable number of evaluations for other large-scale policies.
- The exogeneity of the evaluations is also discussed as a potential problem, as some evaluations are explicitly demanded by the governments implementing the programmes, and others aren’t. A civil servant in Peru once told me that international experts’ evaluations of their programmes were great but they could not use them. Peru’s results-based budgeting legislation meant that he had to respond to the evaluations commissioned by the government only. Other evaluations were nice to have, but he could not base his spending decisions on them.
Arcand also provides a model of the policymaking process as a competition between different types of actors – in which evaluators play a role:
- Anti-evaluation policymakers prefer to base their decisions solely on their prior beliefs without considering the results of any impact evaluations.
- Frequentist evaluators base their decisions solely on the results of an impact evaluation, recommending continuation when the results are positive and cancellation when they are negative.
- A Bayesian decision-maker attempts to combine their prior beliefs with the results of an impact evaluation in a statistically rational way to form their posterior beliefs.
In simple terms, even if an evaluation shows that a programme is not working, a strong initial belief in the programme among decision-makers, combined with a lack of influence by evaluators, means the programme will be difficult to cancel. According to the model, the more positive the evaluation results and the stronger the prior beliefs, the more difficult it is to cancel a programme. But even when the evaluation is negative, strong positive prior beliefs will make it difficult to cancel.
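To make the updating logic concrete, here is a minimal, purely illustrative sketch in Python. It is not taken from Arcand’s or Rao’s papers: the prior beliefs, the hypothetical error rates for a “negative” evaluation result, and the cancel-below-0.5 decision rule are all assumptions chosen only to show how a strong prior can swallow a negative finding.

```python
# Illustrative sketch only -- not the model in Arcand (2014) or Rao (2025).
# Assumptions: the decision-maker holds a prior belief that the programme works,
# an evaluation returns a binary "negative" result with the hypothetical error
# rates below, and the programme is cancelled only if the posterior drops below 0.5.

def posterior_after_negative(prior_works: float,
                             p_neg_given_works: float = 0.2,
                             p_neg_given_fails: float = 0.8) -> float:
    """Bayes' rule: P(programme works | negative evaluation result)."""
    p_negative = (p_neg_given_works * prior_works
                  + p_neg_given_fails * (1.0 - prior_works))
    return p_neg_given_works * prior_works / p_negative

for prior in (0.5, 0.7, 0.9):
    posterior = posterior_after_negative(prior)
    decision = "cancel" if posterior < 0.5 else "continue"
    print(f"prior belief {prior:.1f} -> posterior {posterior:.2f} -> {decision}")

# With a strong prior (0.9), one negative evaluation only brings the posterior
# down to about 0.69, so the programme survives; weaker priors flip the decision.
```

Under these toy numbers, only decision-makers who were already unsure end up cancelling, which is the intuition behind Arcand’s result.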
These studies underscore the importance of understanding the way in which narratives (or beliefs) are constructed, the political economy of policymaking and the relationship between research and policy. The limited impact of evaluations on policy spending is clearly down to prior beliefs or values; the political, economic, social and cultural constraints faced by policymakers; political considerations; and a lack of influence by academic evaluators who do not know, or do not care to know, about these constraints. We describe these factors as thorny issues for evidence-informed policymaking.
In summary, while programme evaluations provide valuable insights, they don’t consistently translate into changes in policy spending.
As a consequence, future investments in evaluations and evaluation bodies need to be weighed against their capacity to be timely and politically relevant, since these factors appear to play a crucial role, while the credibility and generalisability of an evaluation, or the way its results are framed, are not significant drivers of policy change.
Recommendations
Some simple recommendations may be formulated based on these findings:
- Time evaluations carefully—very carefully—to match the political and budget cycles of individual programmes, policies and governments.
- Align them—and the questions and methods—to the political agendas and interests of those in power; if possible, embed the evaluations in official policymaking processes.
- Aim results and recommendations at the right people and roles.
- Frame them to address prior and strongly held beliefs and values.
- Ensure the evaluators have local influence—a PhD from an Ivy League university is no guarantee.
But, above all, I think that funders should think hard about the potential impact of their investments before making them and invest more in research that independently challenges this questionable belief in the impact of evaluations – even if evaluations themselves are an intrinsically good thing to have.