

For practitioners and advocates across our sector, Vision Zero is a moral goal, one which rightly leads us to remove risk and create layers of protection 'systemically' within the Safe System. But as we deliver this change together and casualty numbers on our roads (hopefully) come down, there is a challenge for evaluators: less data. How can we tell what works with a scarcity of outcome data?
Nathan Harpham
Principal Consultant
Samuel Scott
Senior Consultant
The current burden of road traffic injury is huge: in 2023 nearly 30,000 people were killed or seriously injured on Britain's roads. So, we have a long way to go! But if we break this down by individual regions, shorter time periods or even individual sections of road, the numbers can be too small to see a meaningful impact when evaluating a scheme. Evaluation is crucial for operators and partners delivering road safety interventions, making sure we are achieving value for money and having a genuinely positive impact on safety outcomes. So, we need to consider how evaluation can continue to be effective as we progress towards Vision Zero.
Is it better to have 9 casualties than 10? This seems a straightforward question and, on the surface, has a very easy answer: yes, fewer casualties means less road traffic injury and trauma on our roads. However, the question for evaluators here is: does a reduction of one casualty actually mean our roads are intrinsically safer? And, if not, what other evidence can help us to determine that?
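To make that question concrete, here is a minimal sketch (our own illustration in Python using scipy, not drawn from any specific scheme or from the analysis behind this blog). If annual casualty counts at a site are treated as Poisson-distributed, the confidence intervals for the underlying rate behind counts of 10 and of 9 overlap almost entirely.

```python
# Illustrative sketch: how different are 10 and 9 casualties, statistically?
# Annual casualty counts at a site are modelled as Poisson-distributed.
from scipy.stats import chi2

def poisson_ci(count, alpha=0.05):
    """Exact (Garwood) confidence interval for the underlying casualty rate,
    given a single observed annual count."""
    lower = chi2.ppf(alpha / 2, 2 * count) / 2 if count > 0 else 0.0
    upper = chi2.ppf(1 - alpha / 2, 2 * (count + 1)) / 2
    return lower, upper

print(poisson_ci(10))  # roughly (4.8, 18.4)
print(poisson_ci(9))   # roughly (4.1, 17.1)
# The intervals overlap almost entirely: a drop from 10 to 9 sits well within
# normal year-to-year variation and, on its own, says little about underlying risk.
```

In other words, a one-casualty reduction is entirely consistent with no change at all in underlying risk, which is exactly why other forms of evidence matter.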
In this latest blog, Sam and Nathan from Agilysis reflect on what evaluating really means when you don't have enough data, and what we might do about it. We consider how this can present difficult scenarios that make it harder to generate reliable results that provide the basis for improved delivery and justification for intervention. From both a practical and philosophical perspective, evaluation is riddled with pitfalls, often right from the get-go. But as we look to build and cement a culture of honest and open evaluation across the sector, one that recognises the critical need for evidence-led delivery, addressing these challenges proactively and in open dialogue with each other is the way to go.
A Difficult Reality – Delivery Context and Data Scarcity
Learning and accountability are just two of the words that summarise what a culture of evaluation looks like, starting with the critical question of what are we trying to achieve? and, afterwards, what have we achieved and how do we learn from it to inform future action? Evaluation is a cycle of learning, one that should be embedded across the design, testing, implementation and selection of all measures associated with a given intervention. In an environment where operational delivery, available finance, resource and organisational will to evaluate effectively have all been heavily impacted in recent years, evaluating (whether directly or through commissioned partners) has for many become a luxury. This is despite the intrinsic value we know it brings to Safe System success and quality-assured, cost-effective delivery.
The principle of proactive action on evaluation (a core feature of the Safe System!) collides with these political, strategic and physical realities – something which is especially true for local actors and responsible partners who are most at the mercy of competing demands, directives from above, and fiscal constraints (both capital and revenue related).
Even when we have the support and culture that lends itself to a proactive approach, evaluation (and persuading others of its value!) is naturally held back by a lack of locally-collated data and sufficient evidence to corroborate its impact for planned or continued delivery at a specific time and place. To overcome these everyday barriers, evaluators have a number of options.
Weighing Up Our Options
There are a number of recurring barriers for evaluators when it comes to this data conundrum, with just as many opportunities to remedy, or at the very least mitigate, their negative methodological impacts (see Table 1). Engagement with different options is vital to securing the future of evaluation as a highly valuable Safe System activity.
Interpreting Evaluation Data
Once we have developed an evaluation plan to address the challenge of limited data, we should look to ensure we interpret results in a manner that is proportionate to the data and evidence at hand:
- Size of change versus relative change: Magnitude of change should be interpreted alongside comparative analysis; large changes can be misinterpreted if we don’t consider the change in context (from baselines or long-term trends etc.).
- Accounting for exposure and use of denominators: Change does not exist in a vacuum and is not isolated from 'the amount of network activity' taking place. Expressing values as 'rates of change' can help assess change relative to contextual factors (see the sketch after this list).
- Accuracy in reporting: Theory-based, experimental, and quasi-experimental designs (collectively involving numerous methods as potential options) should be interpreted based on what the data and type of evidence used can actually tell us. The temptation to make statements that go beyond what has been found undermines the fabric of an evaluation and the basic validity of results.
- Natural variation, regression, and causation: Being honest about whether we are attributing change directly to an intervention's impact (causation) or simply drawing connections between changes and intervention effects (correlation) feeds into plausible evaluation outputs that account for expected statistical variation such as 'regression to the mean'.
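As an illustration of the exposure point above, the short sketch below (our own, with made-up figures) expresses the same before-and-after casualty counts as rates per unit of traffic. A raw fall in casualties can disappear entirely once the change in exposure is taken into account.

```python
# Illustrative sketch (made-up figures): expressing change as a rate per unit
# of exposure rather than as a raw casualty count.
site = {
    "casualties_before": 12, "casualties_after": 9,            # 3-year totals
    "traffic_before_mvkm": 48.0, "traffic_after_mvkm": 36.0,   # million vehicle-km
}

rate_before = site["casualties_before"] / site["traffic_before_mvkm"]
rate_after = site["casualties_after"] / site["traffic_after_mvkm"]

print(f"Raw change:  {site['casualties_after'] - site['casualties_before']:+d} casualties")
print(f"Rate before: {rate_before:.3f} casualties per million vehicle-km")
print(f"Rate after:  {rate_after:.3f} casualties per million vehicle-km")
# Casualties fell by a quarter, but so did traffic: the casualty rate is
# unchanged, so the raw drop alone cannot be read as the road becoming safer.
```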
All of the options for overcoming barriers, as well as interpreting results effectively, require a solid basis if published results are to be credible. This means having a robust logic model or 'theory of change' that connects how change is intended to come about, from initial inception and inputs through to output activities and measured outcomes.
Strategic Opportunities - Addressing Evaluation Barriers and Options Together
Whilst evaluators can seem to be ‘specialist operators’ and evaluation methodologies can seem daunting for those who engage only periodically with this type of evidence, we can all play our part in making this space more accessible and resilient as we address barriers together:
- Using only traditional datasets such as STATS19 and tube survey speed data limits opportunities to gather evaluation evidence on indirect or ‘surrogate’ measures for safety. Use of datasets in isolation can also limit opportunities to push for innovative analytical methods and data integration that could really transform how the sector evaluates.
- Engaging with the most up-to-date evaluation guidance is critical to extracting as much value as possible from the question of whether an intervention works. Evaluation of the strengths, weaknesses and efficacy of existing approaches, as well as of verified and effective behavioural components and emotional levers, has transformed the sector's thinking. Similarly, updated government communications and evaluation guidelines, alongside international guidance, are placing evaluation at the heart of policymaking and helping to reach audiences that have proved especially difficult to influence.
- Evaluating as part of a 'team' or consortium helps to share the load and makes the whole thing less daunting: pooling resource to either directly deliver evaluation or to commission and collaborate with independent experts who undertake evaluation on our behalf.
- Publishing and sharing all results where possible is key to progress, being honest about what has and has not worked. Even interventions that do not result in desired changes or improvements should be published so that others can learn from the results and practitioners can incorporate the learning, avoiding commitment to, and duplication of, ineffective interventions. This is especially important for interventions that have behaviour change as a central premise of their design (where behavioural theories, intervention functions and 'BCTs' are utilised in an evaluation).
- Long-term planning for what can and cannot be evaluated helps articulate what can plausibly be concluded at a given or future point in time. Critically, this fosters a focus on how best to ensure longer-term evaluation can be secured through embedding it across scheduled activities, helping to avoid a lack of data and evidence later on.
Evaluation is not one-size-fits-all. Different scenarios lend themselves to specific needs and requirements before, during and after intervention rollout, all whilst getting the fundamentals right for a credible approach.
So, 9 casualties is definitely better than 10 from a Vision Zero perspective. But from the standpoint of delivering a fit-for-purpose, proportionate and actionable evaluation based on a large enough 'pool of data', there are many acute barriers. These can only be addressed if we work together on this persistent challenge and foster transparency. After all, this is about learning and accountability, right?
Table 1 - Barriers and options with a limited ‘data pool’
| Barrier | Description | Remedial Option | Option Implications (+) | Option Implications (-) |
|---|---|---|---|---|
| Limited sample size | Having a large enough sample size is key to meaningful results. Sample sizes that are too small make analytical methods less powerful, with results unlikely to reach statistical significance. | Extend the evaluation period (timeframe). | Provides data representative of medium- to long-term patterns, with no 'new' sources needed. This also helps smooth out random variation and account for 'regression to the mean'. | Extending data collection periods can incur resource and time costs, as well as longer waits for data and for final evaluation reporting. |
| | | Extend the scope of the evaluation. | Opens the evaluation up to new areas (either new target areas or new evaluation parameters) that may increase the likelihood of gathering additional insight, such as around unintended consequences. This could take the form of expanding the evaluation design (approach type, methods, sampling) and the number of measured or controlled variables. | Extending the evaluation's scope may mean extra time incorporating, measuring and reporting on the additional insight gathered, especially if new evaluation questions, sources or methods are introduced which fundamentally alter the evaluation's basic design parameters. |
| Diminishing 'rate of returns' | Increasingly effective intervention decreases casualty numbers, leaving an ever-smaller 'pool of data' from which patterns and trends can be detected. | Ground intervention design in best practice and/or link to what has worked elsewhere, replicating evaluation methods where possible. | Provides a methodological basis for evaluation where data is limited in an area where there are already effective measures and published results. Utilising an established logic model can demonstrate value across all phases of intervention delivery. | Leaning too heavily on best practice or guidance documentation without sufficient data to measure change can create more of a theory-based premise, although even for complex evaluations this can be the right approach when the interactions at play are hard to disentangle. |
| | | Collaborate with partners to expand the 'target area' of the evaluation. | Grants potential access to more outreach channels and hence a greater data pool (target population survey responses, numerical count data, etc.). Collaborating with, or commissioning, independent support can help secure quality outputs and a strong independent voice, leading to trustworthy results. | Expanding the target area via partners means responsibility for the evaluation's delivery is likely to be shared and, if not managed well, could lead to duplication, inconsistent data collection or even compromised reporting. |
| Reliance on traditional data and analysis | Relying on traditional sources that measure casualty numbers, such as STATS19. With smaller numbers of casualties there is a need to look beyond these sources and take a more proactive approach (in line with the Safe System), looking at indicators and risk rather than final outcomes. | Supplement traditional data with 'proxy' or 'surrogate' indicators that are inherently linked to the amount of harm on the roads. | This can massively increase the sample size; for example, 'near miss' events are much more frequent than collisions. Provided the evidenced link between an indicator or 'proxy measure' and the final outcome is strong, these can provide good evidence relating to the desired change, e.g. a reduction in casualty numbers. | Exploring appropriate proxy measures and indicators is time-consuming when done in isolation and can detract from the known value of using validated data. It can also be difficult to establish the evidence base behind the indicators and why they are useful. |
| | | Integrate novel datasets and pursue innovative methods. | Trying, testing and verifying new types of data and methods builds sector-wide confidence in their use; relying solely on traditional data and analytical methods is a sector-level risk. As a sector we need to establish more innovative approaches that utilise the rich data environment we are in. | The high level of technical skill and expertise required to use novel datasets and methods is a persistent barrier. Investing time and resource to do this adequately is often not feasible for stakeholders by themselves; agreements to acquire these skills (or personnel) are often more feasible alongside partner organisations. |
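To put rough numbers on the 'limited sample size' and surrogate-indicator rows above, the sketch below (our own, using a standard normal-approximation power calculation for comparing two Poisson rates, with purely illustrative figures) estimates how long a simple before/after comparison would need to run to detect a 20% reduction, first for rare casualty counts and then for a much more frequent surrogate indicator such as near-miss events.

```python
# Illustrative power calculation (normal approximation for comparing two
# Poisson event rates over equal before/after observation periods).
from scipy.stats import norm

def years_needed(rate_before, rate_after, alpha=0.05, power=0.80):
    """Observation period (same length before and after) needed to detect the
    change between two Poisson event rates at the given significance and power."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z**2 * (rate_before + rate_after) / (rate_before - rate_after) ** 2

# Casualties at a single treated site: 5 per year falling to 4 per year (-20%)
print(f"Casualties:  {years_needed(5, 4):.0f} years of data each side")     # roughly 70 years
# A surrogate indicator, e.g. near misses: 500 falling to 400 per year (-20%)
print(f"Near misses: {years_needed(500, 400):.1f} years of data each side")  # under a year
```

Even with these simplifying assumptions, the contrast illustrates why extended timeframes, pooled target areas or well-evidenced surrogate indicators are often the only routes to statistically meaningful results at individual sites.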
References
Agilysis. (2024). A Nudge in the Right Direction: Evaluation Report of the Adult Pedestrian Trial. So-Mo.
Agilysis. (2024). Behaviour in the Safe System.
Box, E. (2023). Empowering young drivers with road safety education: practical guidance emerging from the Pre-Driver Theatre and Workshop Education Research (PDTWER). RAC Foundation.
Box, E. (2024, July 10). OPINION: young driver interventions should focus on positive, empowering strategies. Retrieved from https://roadsafetygb.org.uk/news/opinion-young-driver-interventions-should-focus-on-positive-empowering-strategies/
European Commission: Directorate-General for International Partnerships, Hassnain, H., McHugh, K., Lorenzoni, M., Alvarez, V., Augustyn, A., Davies, R., Rogers, P., & Buchanan-Smith, M. (2024). Evaluation handbook. Publications Office of the European Union.
Fosdick, T. (2019). Effectiveness of UK Road Safety Behaviour Change Interventions. RAC Foundation.
Fylan, F. (2017). Using Behaviour Change Techniques: Guidance for the road safety community. RAC Foundation.
Government Communications Service. (2024). The GCS Evaluation Cycle.
HM Treasury. (2020). Magenta Book: Central Government guidance on evaluation.
Waring, S., Almond, L., & Halsall, L. (2024). Safe Drive Stay Alive Road Safety Intervention for Young People: A Process and Outcome Evaluation. Safer Roads Greater Manchester Partnership.
Wass, N., & Hope, H. (2022). Message Not Received: Evaluation Report. So-Mo.