Measuring Impact When You Cannot See the Program Directly
By Frank Salet
Most evaluation approaches are still built around one assumption: that, at some point, you can go and see the program.
As argued in our last piece, that assumption no longer holds across much of the development and humanitarian landscape.
The practical question is no longer whether data can be collected. It is how to turn what is available—often partial, indirect, and uneven—into a credible assessment of impact.
From data availability to evidence construction
In access-constrained settings, the limitation for evaluations is rarely the absence of data. Reports exist. Surveys can be conducted through host-national teams. Satellite imagery is available. Community-level feedback can be collected remotely.
The difficulty lies in what comes next.
Individually, these sources provide only fragments. They reflect different perspectives, capture different moments in time, and carry different biases. The challenge is not gathering them but combining them in a way that supports a defensible assessment.
This requires a shift in how evaluation is approached. Instead of treating data as standalone inputs, we need to treat evidence as something that is constructed—deliberately, and in layers.
Step 1: Build an evidence stack
In Apricity’s recent WASH (water, sanitation and hygiene) evaluation in Yemen, where direct access to most project sites was not possible, we designed our approach around technology-supported evidence layering from the outset.
Illustrative: Geo-referenced field observations collected in a hard-to-access environment as part of a layered evidence approach to impact assessment.
Rather than relying on a single method, we developed multiple evidence streams in parallel. Host-national teams conducted structured surveys, site visits, and key informant interviews. Simultaneously, mobile-based data collection captured geo-referenced observations, photos, and short interviews across a wider set of locations. We acquired satellite imagery for selected sites, allowing comparison of physical infrastructure before and after implementation. We then aggregated these evidence layers through geospatial analysis, linking each observation to specific locations and interventions.
Each layer provided something different. Field teams described how systems were being used and managed, while community contributors captured more informal, day-to-day realities—sometimes confirming, sometimes challenging formal reporting. Satellite imagery offered an independent view of whether physical infrastructure existed and when it appeared.
Individually, none of these sources was complete. Together, they formed a picture that no single method could have produced on its own.
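To make the idea of linking observations to locations concrete, here is a minimal sketch in Python. It is not the pipeline used in the Yemen evaluation. The `Site` and `Observation` types, the `build_evidence_stack` function, the 1 km matching radius, and the coordinates in the usage lines are all hypothetical, chosen only to illustrate how geo-referenced records from different sources might be grouped by site.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Site:
    site_id: str
    lat: float
    lon: float


@dataclass(frozen=True)
class Observation:
    source: str  # hypothetical labels, e.g. "survey", "community", "satellite"
    lat: float
    lon: float
    note: str


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def build_evidence_stack(sites, observations, max_km=1.0):
    """Group observations by nearest site and by source.

    Observations farther than max_km (an assumed threshold) from every
    site are returned separately instead of being forced onto a site.
    """
    stack = {site.site_id: defaultdict(list) for site in sites}
    unlinked = []
    for obs in observations:
        nearest, distance = None, None
        for site in sites:
            d = haversine_km(site.lat, site.lon, obs.lat, obs.lon)
            if distance is None or d < distance:
                nearest, distance = site, d
        if nearest is None or distance > max_km:
            unlinked.append(obs)
        else:
            stack[nearest.site_id][obs.source].append(obs)
    return stack, unlinked


# Usage with made-up coordinates and notes (not real project data).
sites = [Site("site_a", 15.350, 44.200), Site("site_b", 13.580, 44.020)]
observations = [
    Observation("survey", 15.351, 44.201, "tap stand in use"),
    Observation("satellite", 15.3502, 44.2001, "new structure visible"),
    Observation("community", 14.000, 45.000, "location not matched"),
]
stack, unlinked = build_evidence_stack(sites, observations)
print(sorted(stack["site_a"].keys()), len(unlinked))
```

Keeping unmatched observations in their own list, rather than dropping them, preserves a record of what could not be tied to a site, which is itself useful information about coverage.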
Step 2: Test signals against each other
The value of this layered approach does not come from the number of data sources, but from how they are used.
A common interpretation of triangulation is that multiple sources should confirm the same finding. In practice, the opposite is often more useful. Differences between sources are not a problem to resolve; they are a signal to investigate.
In one case in Yemen, all sources aligned: a newly constructed water tower was visible in satellite imagery, verified on-site by enumerators, and reported as fully operational by facility staff. In another case, the picture was less straightforward: survey data suggested that systems were functioning well, with high reported satisfaction and consistent access. At the same time, community-level reporting pointed to irregular supply or declining reliability. Satellite imagery, where applicable, confirmed that infrastructure had been built, but could not speak to how well it was working.
These differences did not weaken the analysis. They made it more precise. They allowed the evaluation to move beyond the simple question of whether something had been built, toward a more meaningful assessment of whether it was working as intended.
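The logic of treating disagreement as a signal can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the evaluation's actual method: each source's answer to a single question ("is the system functioning?") is reduced to `True`, `False`, or `None` when the source cannot speak to the question, and the `compare_signals` function and its labels are hypothetical.

```python
from typing import Dict, Optional


def compare_signals(signals: Dict[str, Optional[bool]]) -> str:
    """Classify how sources relate on one yes/no question.

    signals maps a source name to True, False, or None, where None means
    the source cannot speak to the question (for example, imagery that
    shows a structure exists but not whether it works).
    """
    informative = {name: value for name, value in signals.items() if value is not None}
    if len(informative) < 2:
        return "insufficient"
    if len(set(informative.values())) == 1:
        return "aligned"
    return "divergent"


# Hypothetical encodings loosely modelled on the two cases described above.
aligned_case = {"enumerators": True, "facility_staff": True, "satellite": None}
mixed_case = {"survey": True, "community": False, "satellite": None}

print(compare_signals(aligned_case))
print(compare_signals(mixed_case))
```

A "divergent" result is not an error state in this sketch. It marks the sites where follow-up questions are most likely to be informative.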
Step 3: Interpret, don’t just aggregate
At this stage, the task shifts from assembling data to interpreting relationships, because more data does not automatically lead to better insight. In many cases, the critical step is understanding why different sources point in different directions.
In Yemen, this revealed patterns that would have been difficult to identify through any single method. Systems that appeared technically sound were sometimes underperforming because there was no mechanism to fund maintenance. Infrastructure that remained physically intact showed declining use over time. In other cases, design decisions—such as the accessibility of a water tank—limited effectiveness despite successful construction.
These are not discrepancies to be averaged out. They are the substance of the evaluation. Impact, in this context, is not something that can be directly or singularly observed. It emerges through the interaction of these signals—where they reinforce each other, and where they diverge.
Step 4: Arrive at defensible confidence
This layered approach does not aim to eliminate uncertainty. It aims to make it explicit and manageable.
Rather than presenting findings as binary—verified or not, successful or not—it supports calibrated judgments. Where multiple sources align, confidence increases. Where evidence is partial or mixed, conclusions can still be drawn, but with appropriate caution.
This is how our Yemen evaluation approached its findings. Outcomes were assessed with moderate confidence, based on the combined weight of evidence from surveys, community reporting, and satellite observation. The result was not perfect certainty, but a level of clarity that was sufficient to support decision-making.
In practice, that is what matters. Evaluations are not conducted to eliminate all doubt, but to inform choices in environments where uncertainty is unavoidable.
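One way to make calibrated judgments explicit is to write the rubric down. The sketch below is a hypothetical example in Python, not the rubric applied in the Yemen evaluation: the `confidence_level` function, its thresholds, and its labels are assumptions, and a real rubric would also weigh the quality and independence of each source rather than only counting them.

```python
def confidence_level(agreeing: int, disagreeing: int) -> str:
    """Map counts of informative sources to a confidence label.

    agreeing: sources that support the finding.
    disagreeing: sources that contradict it.
    The thresholds are illustrative assumptions only.
    """
    if agreeing < 0 or disagreeing < 0:
        raise ValueError("counts must be non-negative")
    if agreeing + disagreeing == 0:
        return "no basis"
    if disagreeing == 0 and agreeing >= 3:
        return "high"
    if agreeing >= 2 and agreeing > disagreeing:
        return "moderate"
    return "low"


# Usage: two supporting sources and one contradicting source.
print(confidence_level(agreeing=2, disagreeing=1))
```

Writing the rule down does not remove judgment from the process, but it does make the basis for a "moderate" or "low" rating visible to whoever relies on the finding.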
What this changes
This way of working challenges a long-standing assumption in monitoring and evaluation: that credibility is primarily a function of proximity.
Direct observation remains valuable, but it is not the only basis for reliable evidence. In many contexts, it is not even the most realistic one.
Credibility comes from how evidence is built. It depends on whether sources are brought together in a structured way and tested against each other, and on whether their limitations are acknowledged rather than ignored.
For practitioners, this means moving beyond a focus on individual methods and toward designing coherent evidence systems. For donors, this requires recognizing that remote and hybrid approaches are essential to working in fragile and hard-to-access environments.
The question is no longer whether we can see the program directly. It is whether we can construct a clear enough picture to understand what is happening—and what needs to change.
Done well, that is not a second-best option. It is a different, and often more rigorous, way of working.
What does it actually mean to treat a map as evidence rather than illustration? And how can this affect evaluation findings? In our next piece, we dive into these questions by looking at one of the most underused tools in the evaluator’s kit: spatial analysis.
Frank Salet is part of the Apricity team. This piece draws on experience gained through that work but reflects their own views and conclusions and does not necessarily represent the views of Apricity.