I’ve just finished reading The Science of Evaluation: A Realist Manifesto by Ray Pawson (SAGE, 2013), which restates the case for realist forms of evaluation and further explicates ways of doing it (e.g. in systematic reviews). Pawson is Professor of Social Research Methodology at the University of Leeds, and he describes social science inquiry as a “febrile, pre-paradigmatic world” which is “unable to settle on a definitive set of first principles”. The book continues to make the case for realist philosophical grounding, and is a “sequel” to Realistic Evaluation and Evidence-Based Policy: A Realist Perspective.
A number of useful crossovers between evaluation and foresight practice are clear in his latest work. For example, Pawson argues that, at its core, evaluation functions as a form of decision-support (similar to the arguments made by many foresight practitioners). He also focusses on the problem of complexity, describing the layers of complexity facing evaluators and the associated methodological challenges (similar to foresight practitioners who stress the implications of complexity for forecasting and strategy). It has also been argued that the strategic foresight field is in a pre-paradigmatic phase of its development (e.g. see this paper), which Pawson argues is the case for social science inquiry as a whole.
In this post I’ll focus on the discussion of the problem of complexity, including Pawson’s critique of alternative approaches to complexity such as systems theory and the ‘systems thinking’ perspective, and pragmatic perspectives – all of which are common in the foresight field. I’ll mainly draw on Part 2 of the book, which is provocatively titled “The Challenge of Complexity: Drowning or Waving?”
Complexity in evaluation research
A key chapter – entitled “A complexity checklist” – outlines the key characteristics of program complexity, providing a useful ‘checklist’ with an easy-to-remember acronym: VICTORE. He argues that these aspects of complexity are characteristics of all programs. The acronym stands for:
- Volitions: the “choice architecture” of a program, including how program subjects might respond to a program or intervention;
- Implementation: the implementation chains of an intervention/program which “are prone to inconsistency and interpretation, blockages, delays, and unintended consequences” (p.36);
- Contexts: the context of an intervention refers to the circumstances in which it plays out. Pawson outlines a “four I’s” framework: Individuals (characteristics and capacities of stakeholders in the program); Interpersonal relations; Institutional settings; and Infrastructure (the wider social, economic, and cultural setting of a program/intervention);
- Time: the history and timing of an intervention;
- Outcomes: approaches to monitoring outcomes and the ways stakeholders might interpret them;
- Rivalry: the pre-existing policy landscape in which the program is embedded – this primarily refers to how “other, contiguous programs and policies may share or oppose the ambitions of the intervention under study and actions of stakeholders and subjects under study” (p.44); and
- Emergence: potential emergent effects, long-term adaptations, and unintended consequences associated with the program/intervention.
Pawson advises that all evaluators should step back before they commence research design and enter the field, and attempt an initial “mapping” of the contours of complexity “as they envelop the intervention(s) under study” (p.43). This advice is just as relevant to foresight practice. In my work I’ve faced all of these elements – the checklist makes sense to me. A key question is how best to deal with this complexity.
‘Non-realist’ approaches for dealing with complexity
Four approaches are examined: the augmented trials perspective, the systems perspective, the critical realist perspective, and pragmatic perspectives. With regard to evaluation science, Pawson argues that the first three are counterproductive. The fourth, the pragmatic perspective, has merit but is argued to offer only piecemeal solutions and to lack “a unified philosophy” (p.47).
Augmented experimental trials (i.e. adapted forms of randomised controlled trial frameworks) are criticised for conceiving “of complexity in a sufficiently truncated form so that it can be dealt with within the mainstream of evaluation approaches”. A particularly important issue is whether the core experimental logic can be adopted when evaluating social interventions. That is, whether ‘experiment’ and ‘control’ groups are in stable and identical conditions, and whether the intervention can be conceived as ‘singular and stable’. Pawson is emphatic on this point: he argues that the “uniformity demanded [by this experimentalist logic] … cannot be realised in social interventions” (p.49).
The critique of systems perspectives is too detailed to do justice to here; instead I’ll note a few aspects. First, systems concepts are seen as important for evaluation research but not viewed as “master concepts” (i.e. they don’t underpin it). Second, a key concern is that ‘systems thinking’ multiplies rather than eases the complexity burden (Pawson argues it “embellishes” complexity), and that the very high level of abstraction constrains efforts to subject these concepts to empirical inquiry.
Importantly for my research, Pawson critiques Rittel and Webber’s characterisation of “wicked problems”. He is especially critical of two characteristics that I’ve previously discussed on this blog: “Every solution to a wicked problem is a ‘one-shot operation’ because there is no opportunity to learn by trial and error: every attempt counts significantly”; and “Every wicked problem is essentially unique”.
These characteristics “seem to embrace solipsism and deny that we can learn from inquiry to inquiry” (p.55), which is a key concern that I raised in blog posts last year. He comments:
To be sure, no two programmes will be exactly alike but we would be unable to recognise them as programmes unless they contained common elements: resources, implementation chains, stakeholders, inputs, outputs and so on. We would be unable to understand that the problems confronted were indeed wicked without the pre-existing and chequered family history of attempts to overcome health inequalities, underdevelopment, drug abuse, crime victimisation, and so on. On balance, I prefer Mark Twain’s starting point that history does not repeat itself, but it rhymes. (p.55)
Pragmatic perspectives are adopted and developed by various evaluators in response to unforeseen challenges and emergence – for example, new models for addressing inconsistent program implementation, evaluation in the context of supporting innovation and adaptation in dynamic environments (e.g. ‘developmental evaluation’, which is not committed to any particular method), or other tailored blendings of methods to address research challenges. Pawson argues these approaches are piecemeal, rather than offering a genuine solution to complexity. He writes that “one can modify a design to accommodate one arm of complexity only for its other limbs to trip up the inquiry” (p.74).
Finally, the critical realist perspective developed by Roy Bhaskar is argued to offer a philosophical solution to explanation in complex systems that “is a parody of science” (p.71). It is further argued to be “yet another grab for the totalising explanatory systems for which vainglorious social science has an insatiable appetite”. Pawson does not mince words! The critique is too detailed to retell in full here.
Towards evaluative research that takes complexity seriously
So what is the realist alternative that Pawson is arguing for? It is simultaneously an expansion of the scope of evaluation and a more modest approach to such inquiry.
The modesty aspect refers to the notion of corrigible realism, which draws on the nomenclature of Karl Popper and Donald Campbell. Corrigible realists “admit to a permanent state of partial knowledge”. Research coverage is always partial “and the understanding of any intervention is always imperfect, impermanent, and thus corrigible” (p.85). This is argued to be a permanent condition.
Pawson also calls for “breaking the link between evaluation and the programme”. This is the core of the proposed expansion in evaluation scope. By this he means moving beyond the current practice of one-off evaluations which “start each inquiry from scratch” to programs of research “that are co-ordinated, cumulative and mutually informative” (p.84). The research strategy he proposes is to focus on underlying ‘intervention theories’, which are more generic than the specific program being evaluated. He writes: “rather than starting from scratch each inquiry should begin where previous ones have left off. The basic antidote to complexity is for inquiry to be iterative” (p.84):
Just as the old saying teaches that it takes a whole village to raise a child, it might indeed require an entire scientific discipline to evaluate complex programmes. (p.85)
A new set of organising principles is laid out with the acronym TARMATO, drawing on the ideas of founding realist scholars: Theory, Abstraction, Reusable conceptual platforms, Model building, Adjudication, Trust, and Organised scepticism. The first is central: as noted above, Pawson argues intervention/program theories should be the central unit of analysis and “the gathering point for cumulative inquiry” (p.86). Moreover, theory is viewed as the starting point for evaluative research.
I’ll just briefly describe a couple of the other key principles. Abstraction is the idea of developing ‘middle-range theory’ (as developed by Robert K. Merton) to provide the basis for drawing transferable lessons and identifying the “common conceptual ground” that links different programs. The various ‘internal and necessary components’ of a program theory are termed a reusable ‘generic conceptual platform’ (e.g. the necessary components of a public disclosure intervention, such as “league tables” for schools). Model building is essentially a process of proposition development and refinement: it occurs as subsidiary hypotheses are added and examined, applying a realist logic.
The following quote captures the core emphasis on theory and model-building in the above principles: “evaluation science assumes that there will be some pattern to success and failure across interventions, and that we can build a model to explain it” (p.96).
The other principles concern how evaluation research is done (e.g. adjudicating between rival hypotheses) and organised (e.g. a community of researchers maintaining continuing close scrutiny of each other’s work).
Overall, The Science of Evaluation: A Realist Manifesto is an ambitious and important work. It is written in a surprisingly engaging and plain-speaking manner (for the most part).
A few key aspects immediately jump out for my own research: 1) the need to do an initial “mapping” of the contours of complexity for the cases being examined (applying the VICTORE checklist); 2) the emphasis on intervention/program theories as the focus of learning; and 3) drawing more systematically on the findings of previous evaluative inquiries into similar projects. Additionally, the argument that realist forms of evaluation can help to avoid some of the potential pitfalls in efforts to handle “wicked” problems is especially important for my research and sustainability science more broadly.