Category Archives: Program Evaluation

Presentation Recap: My AEA 2015 (#eval15) Presentations on #ProgDes

AEA 2015 was a special one for me because it was the first time the concept of program design gained traction. I presented with fellow design-minded evaluators in two sessions.

In the first session, I reported on my experience of embedding design principles into a developmental evaluation. The presentation was entitled "Lessons Learned from Embedding Design into a Developmental Evaluation: The Significance of Power, Ownership, and Organizational Culture." Here’s the abstract:

Recent attempts at developmental evaluation (DE) are incorporating human-centered design (HCD) principles (Dorst, 2011; IDEO, n.d.) to facilitate program development. HCD promotes a design-oriented stance toward program development and articulates a set of values that takes the evaluation beyond the ideals expressed by stakeholders. Embedding design into DE promises a more powerful means of promoting program development than either approach alone. Yet embedding design into DE introduces additional challenges. Drawing on a case study of a design-informed DE, this panelist discusses the tensions and challenges that arose as one developmental evaluator attempted to introduce design into a DE. Insights from the case study point to the importance of:

– Attending to power dynamics that could stifle or promote design integration; and,

– Evaluator sensitivity to the deep attachment program developers had to program decisions.

These findings point to the significance of organizational culture in enabling a design-informed DE.

In the second presentation, Chithra Adams (@ChithraAdams), John Nash (@jnash), Beth Rous (@bethrous), and I discussed how principles of human-centered design could be applied to the development of programs.

Specifically, we introduced two design exercises, Journey Mapping and User Archetyping, as a means of bringing human-centered design principles into program design and evaluation.

In an upcoming post, we’ll take a deep dive into these design exercises and examine their application to program design.

Are you curious about program design? Do you have any particular questions about its methods and methodologies that you’d like us to write about? Drop me a note below or find me on Twitter @chiyanlam, where I curate tweets on evaluation, design, social innovation, and creativity.

Until next time. Onwards!

Merit/Worth/Significance Explained in Plain Language

I recently received an e-mail from a fellow doctoral student asking me to explain Scriven’s notion of merit/worth/significance. One part of her dissertation focuses on determining the value of test preparation training (e.g., MCAT/GMAT/LSAT prep courses) among language learners. One of her committee members suggested that she use M/W/S as a framework for tackling this aspect of her work. So I wrote back suggesting that we Skype and talk it through.

I’ve been thinking about this problem ever since. As an evaluator, I am reminded that one of the basic purposes of evaluation is to determine the merit, worth, or significance of something. We typically refer to whatever we are evaluating (the ‘something’) as the evaluand. This classical definition of evaluation forms part of what Scriven (1991) calls the logic of evaluation, in the entry of the same name in his Evaluation Thesaurus. The logic of evaluation is a seminal contribution to the field because it gets at the core of what makes evaluation unique compared to, say, research: evaluation allows us to make evaluative claims. The distinctions among merit, worth, and significance, and how they apply in evaluation, are important, but accessible writing on the topic is hard to find. Perhaps m/w/s is obvious to everyone but me :). Hopefully not.

So… what’s merit, worth, and significance?

Merit, worth, and significance can be easily explained by reference to evaluating an apple. Say you’re at a grocery store. The decision you have to make is which apple to buy.


Merit has to do with the intrinsic properties, characteristics, or attributes of an evaluand. When buying an apple, most people would prefer one that is not rotten, tastes sweet, and is not otherwise damaged or deformed. That’s typically what people look for if the apple is to be eaten on its own. But what if you were buying the apple to make an apple pie? Then you might want an apple that is tart rather than sweet. So what we consider desirable attributes of an object depends on contextual factors.

Here is another example. A car has merit if it is reliable (i.e., it is predictable and does not break down while you’re driving down the highway), is safe (i.e., it has adequate safety features and operates as intended), and is powerful relative to its intended use (e.g., a commuter car vs. a pickup truck hauling construction material). Now, you may say a car has merit only if it has an integrated air conditioning unit or a stereo system. A design-conscious person may insist that a car be visually appealing. Increasingly, drivers want good fuel economy. Different people hold different views of what constitutes merit. In other words, an evaluand may be evaluated against different dimensions of quality, i.e., criteria. Part of what makes evaluation fun is surfacing the criteria one might use to evaluate an evaluand. What’s ‘good’ to you is not necessarily ‘good’ to me. That’s why there are so many kinds of cars out there.

In a program evaluation, we typically think of a program as having merit if 1) it does what it sets out to do, i.e., achieves its intended outcomes, and 2) it makes a meaningful difference as a consequence of its operation.


Now, worth is a trickier concept. In everyday parlance, we might say that an apple (assuming it is ‘good’) is worth something; that ‘something’ is typically expressed as a monetary value (e.g., this apple is worth $2.00; that car is worth $24,999). So worth is the value of an evaluand expressed as an equivalence to something else. We might also say that an activity is ‘worth my time’. Whereas merit can be difficult to measure, worth is usually expressed in a more easily measurable unit.

Another way to think about worth is in a comparative situation. Let’s say you’re evaluating two instances of the same program: Program Breakfast-for-all at Site A and at Site B. While both may have merit, the worth of the program at Site A may differ from that at Site B depending on its impact on each site’s constituents. The worth of two comparable but different programs may also differ if one is cheaper to run (all else being equal, the cheaper one is worth more).
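To make the comparative sense of worth a little more concrete, here is a minimal Python sketch that expresses worth as cost per outcome achieved. Cost per outcome is just one simple convention assumed here for illustration, not part of Scriven’s framework, and the function name and the figures for the two Breakfast-for-all sites are hypothetical.

```python
def cost_per_outcome(total_cost, outcomes_achieved):
    """Hypothetical helper: cost of producing one unit of outcome
    (e.g., one child fed breakfast for a school year)."""
    if outcomes_achieved <= 0:
        raise ValueError("no outcomes achieved; cost per outcome is undefined")
    return total_cost / outcomes_achieved


# Hypothetical figures: both sites reach 400 children, but Site B costs more to run.
site_a = cost_per_outcome(50_000, 400)
site_b = cost_per_outcome(60_000, 400)
print(f"Site A: ${site_a:.2f} per child; Site B: ${site_b:.2f} per child")
```

With equal outcomes, the lower cost per child at Site A is what "worth more" means in this narrow, measurable sense; it says nothing about either site’s merit or significance.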

Finally, significance.

Significance is the fuzziest of the three. Significance refers to the values and meanings that one ascribes to an evaluand. Typically, one can learn about the significance of something by asking questions such as: What makes this evaluand special? What meaning does it hold for particular individuals?

Ask any young bride about her diamond ring. While it may not feature a big diamond (so the ring is of limited worth), it probably holds great significance. A young college graduate may be driving a high-mileage car that is nearing the end of its service life. We might say the car has limited merit (e.g., the transmission is wonky and the body is rusting, though the car is still roadworthy) and, as a result, is of limited worth to anybody; but to the college graduate it may hold great significance, because his or her livelihood depends on it for getting to work every day.

Notice that significance often has little to do with merit. Indeed, a program may be shown to have limited impact on a community, yet hold great significance for its symbolic value. We may say, “It matters! Even if only to a few.” As another example, a program may be shown to be ineffective, but if it is the only program of its kind serving a specific need of a vulnerable population, that’s significant to know, isn’t it?

So what?

Knowing m/w/s well not only enables us to unpack what others mean by ‘good’ but also helps us raise questions about quality, say, when designing an interview guide or constructing survey questions.

Question for you: Is this how you understand merit/worth/significance? Might you have other powerful ways of explaining m/w/s to others? Comment below. Thanks for reading!

PS: For all you educators out there, is a grade an indication of merit, worth, or significance, or any or all of the three?

Highlights from Michael Quinn Patton’s #eval13 talk on the ‘State of Developmental Evaluation’

Michael Patton gave a great talk today at AEA13 on the State of Developmental Evaluation. Here are some highlights.

1. The ‘Doors’ to Discovering Developmental Evaluation.

Patton observed that developmental evaluators and clients typically arrive at DE through one of several doors. Some come through the door of innovation. Others come seeking systems change. A third group arrives because they are dealing with complexity. The final door is for those working in unstable, changing contexts.

Driving this ‘search for the alternative’ is evaluation users’ desire for a compatible evaluation framework.

2. DE is becoming a bona fide approach.

AEA 13 features more than 30 sessions on developmental evaluation.

The Australasian Evaluation Society recently awarded its Best Policy and Evaluation Award to a team of developmental evaluators.

(The CES gave its best student essay award to an empirical study of DE’s capacity for developing innovative programs.)

3. DE is best enabled by clients who are willing to explore and experiment.

4. DE is methods-agnostic, and in fact, defies prescription.

Patton emphasized the importance of operating from the principles of DE, applying and adapting them when conducting DE. (Another way of looking at this is to frame DE as engaging in inquiry… this might actually make a nice blog post.)

Some observations…

Participants raised some great questions during the Q&A session. Part of the confusion, it seems to me, lies in the subtler aspects of how and why developmental evaluation might be more appropriate or useful in some contexts. This confusion arises because developmental evaluation is, by design, necessarily responsive. On-ramping can be difficult for someone who hasn’t done DE but wants to. So I wonder if there might be a place for a clearinghouse of sorts for frequently asked questions, i.e., the sort often asked by newcomers.

Key Takeaways from Tom Chapel’s AEA13 Workshop: Logic Models for Program Evaluation and Planning

Learning is never an easy task, but, boy, is it worth it. One of the best aspects of the American Evaluation Association annual conference is actually what precedes it: the preconference workshops. More than 60(!) workshops are being offered this year. It is a great opportunity to hear some of our field’s luminaries, thinkers, theorists, practitioners, and innovators share what they know and love doing. It’s also a chance to ‘stay close to the ground’ and learn about the very real concerns and challenges practitioners are experiencing.

I just finished Tom Chapel’s (Chief Evaluation Officer, Centers for Disease Control and Prevention) 2-day workshop on “Logic Models for Program Evaluation and Planning”. In this blog post, I share some of the more salient insights gathered from his session. Rarely can someone abstract evaluation issues so clearly from a practitioner’s perspective and teach them so succinctly. He draws on great case examples: they are rich and sufficiently complex, yet simple enough to carry great educational value. Kudos to Tom.

My interest in this is twofold. I am interested in the practical aspects of logic modeling. I am also interested, on a theoretical level, in how he argues for its role in evaluation practice. So, in no particular order, here are nine key insights from the session. Some are basic and obvious, while others are deceptively simple.

Some foundational ideas:

1)   At the most basic level, a logic model is concerned with the relationship between activities and outcomes. It follows the logic: if we do this, then we can expect this to occur.

2)   Program outcomes (more appropriately, a series of outcomes) drive at a “need”, i.e., the social problem that the program aspires to change.

3)   A logic model is aspirational in nature. It captures the intentions of a program. It is not a representation of truth or how the program actually is (that’s the role of evaluation).

4)   Constructing a logic model often exposes gaps in logic (e.g., how do we get from this step to the next?). Bringing clarity to a logic model often requires input from stakeholders (drawing on practical wisdom) or empirical evidence (drawing on the substantive knowledge underlying the field). It also makes the case for collecting certain evidence in the evaluation, where doing so proves meaningful.

5)   And in talking with program folks about their conceptions of a program, we often expose differing logics about why and how the program works. These differing views are not trivial, because they influence the evaluation design and the resulting value judgments we make as evaluators.

6)   Indeed, explicating that logic can surface assumptions about how change is expected to occur, the sequencing of activities through which change is expected to occur, and the chain of outcomes through which change progresses toward ameliorating the social problem. Some of these assumptions are so critical that, unless attended to, they could lead to program failure (e.g., community readiness to engage with potentially taboo topics, cultural norms, necessary relationships between service agencies).

7)   Employing logic modeling thus avoids black-box evaluation (a causal-attribution orientation), which can be of limited value in most program situations. I like the way Tom puts it: increasingly, evaluation is in the improving business, not just the proving business. Logic modeling permits you to open the black box and look at how change is expected to flow from action and, more importantly, where potential pitfalls might lie.

But here’s the real take-away.

8)   These kinds of observations from logic modeling can be raised not only at the evaluation stage but also during planning and implementation. Such process-use insights (process use is an idea usually attributed to Michael Patton) could prove tremendously useful even at these early stages.

9)   Indeed, problems with the program logic are especially costly when raised at the end. Imagine telling the funder in year 5 that there is little evidence the money made any real impact on the problem it set out to address. Early identification of where problems could lie, and the negotiations that ensue, can be valuable to the program.
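Insights 1, 4, and 6 lend themselves to a small illustration. The Python sketch below treats a logic model as a set of if-then links (“if we do this, then we expect that”) and flags two kinds of gaps: activities with no chain of outcomes leading to the need, and elements that no activity leads to, which often signal an unstated assumption or precondition. The program elements are hypothetical, and this graph representation is simply one way to encode a logic model, not the method taught in the workshop.

```python
from collections import deque

# Hypothetical logic model: each element points to what we expect it to lead to.
LOGIC_MODEL = {
    "run parent workshops": ["parents gain knowledge"],
    "parents gain knowledge": ["parents change practices"],
    "parents change practices": ["need is reduced"],
    "distribute pamphlets": [],  # no stated link to any outcome
    "community trusts the agency": ["parents change practices"],  # no activity leads here
}


def reachable(model, start):
    """Every element reachable from `start` by following if-then links (breadth-first)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in model.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def find_gaps(model, activities, need):
    """Return (activities with no chain to the need, elements no activity leads to)."""
    dead_ends = [a for a in activities if need not in reachable(model, a)]
    covered = set().union(*(reachable(model, a) for a in activities))
    elements = set(model) | {t for targets in model.values() for t in targets}
    unsupported = sorted(elements - covered)
    return dead_ends, unsupported


dead_ends, unsupported = find_gaps(
    LOGIC_MODEL,
    activities=["run parent workshops", "distribute pamphlets"],
    need="need is reduced",
)
print(dead_ends)    # ['distribute pamphlets']
print(unsupported)  # ['community trusts the agency']
```

The second list is the kind of finding worth raising with stakeholders early: the model assumes community trust, but nothing the program does is expected to produce it.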

The Design Argument Inherent in Using Logic Modeling for Planning

First, what Tom is essentially suggesting here is that attention paid to the program logic is worthwhile for evaluators and program staff at any point during the program life cycle.

Where these conversations stand to make a real, meaningful contribution is before the “program is let out of the barn”. This is important because the intentions inherent in the logic underlying a program give rise to, govern, or promote the emergence of certain program behaviours and activities (in much the same way that DNA or language syntax gives rise to complex behaviour). The logic defines both what IS and what IS NOT within the program, doesn’t it?

So, if we accept the premise that a program can be an object of design (i.e., that we can indeed design a program), then we could argue that the program logic constitutes a major aspect of the design. And because we can evaluate the design itself, as we can with any design object, evaluating the program design becomes a plausible focus within program evaluation.

Jennifer Ann Morrow on 12 Steps for Cleaning and Prepping Datasets

Jennifer Ann Morrow, faculty member in Evaluation, Statistics, and Measurement at the University of Tennessee, recently blogged about data cleaning and dataset preparation at AEA365. She describes 12 steps in her post, excerpted below. This is a skill that all quantitative (and qualitative!) researchers should have.

She’ll be running a Professional Development workshop on the same topic at the upcoming Evaluation 2013 conference in Washington, DC.

1. Create a data codebook
a. Datafile names, variable names and labels, value labels, citations for instrument sources, and a project diary
2. Create a data analysis plan
a. General instructions, list of datasets, evaluation questions, variables used, and specific analyses and visuals for each evaluation question
3. Perform initial frequencies – Round 1
a. Conduct frequency analyses on every variable
4. Check for coding mistakes
a. Use the frequencies from Step 3 to compare all values with what is in your codebook. Double check to make sure you have specified missing values
5. Modify and create variables
a. Reverse code (e.g., recode a 1-to-5 scale as 5-to-1) any variables that need it, recode any variable values to match your codebook, and create any new variables (e.g., total score) that you will use in future analyses
6. Frequencies and descriptives – Round 2
a. Rerun frequencies on every variable and conduct descriptives (e.g., mean, standard deviation, skewness, kurtosis) on every continuous variable
7. Search for outliers
a. Define what an outlying score is and then decide whether or not to delete, transform, or modify outliers
8. Assess for normality
a. Check to ensure that your values for skewness and kurtosis are not too high and then decide on whether or not to transform your variable, use a non-parametric equivalent, or modify your alpha level for your analysis
9. Dealing with missing data
a. Check for patterns of missing data and then decide if you are going to delete cases/variables or estimate missing data
10. Examine cell sample size
a. Check for equal sample sizes in your grouping variables
11. Frequencies and descriptives – The finale
a. Run your final versions of frequencies and descriptives
12. Assumption testing
a. Conduct the appropriate assumption analyses based on the specific inferential statistics that you will be conducting.
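Several of these steps are easy to script. Below is a minimal Python sketch, standard library only, of Steps 3 through 9 for a single variable. It assumes a hypothetical 5-point Likert item with missing values coded as None, and it uses a 3-SD rule as one possible definition of an outlier; these are illustrative assumptions, not part of the original steps, and in practice you would likely do this work in SPSS, R, or pandas.

```python
from collections import Counter
import statistics

# Hypothetical codebook entry: a 5-point Likert item, with None marking missing.
CODEBOOK_VALUES = {1, 2, 3, 4, 5}


def frequencies(values):
    """Steps 3 and 6: count every value, missing (None) included."""
    return Counter(values)


def out_of_codebook(values, allowed=CODEBOOK_VALUES):
    """Step 4: non-missing values the codebook does not define (likely coding mistakes)."""
    return sorted({v for v in values if v is not None and v not in allowed})


def reverse_code(values, low=1, high=5):
    """Step 5: recode low..high as high..low (1 -> 5, 5 -> 1); missing stays missing."""
    return [None if v is None else low + high - v for v in values]


def descriptives(values):
    """Steps 6 and 8: n, mean, sample SD, skewness, and excess kurtosis.

    Skewness and kurtosis use plain moment formulas without small-sample
    correction, so they will differ slightly from most statistics packages.
    Both are None when the variable has no variation.
    """
    xs = [v for v in values if v is not None]
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "n": n,
        "mean": mean,
        "sd": statistics.stdev(xs) if n > 1 else 0.0,
        "skewness": m3 / m2 ** 1.5 if m2 else None,
        "kurtosis": m4 / m2 ** 2 - 3 if m2 else None,
    }


def flag_outliers(values, z=3.0):
    """Step 7: values more than z sample SDs from the mean (one possible definition)."""
    xs = [v for v in values if v is not None]
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if sd and abs(x - mean) / sd > z]


def missing_rate(values):
    """Step 9: share of cases missing on this variable."""
    return sum(v is None for v in values) / len(values)


# Hypothetical responses: 9 is a keying error, None is a skipped item.
item = [1, 2, 5, None, 4, 9, 2, 3]
print(frequencies(item))
print(out_of_codebook(item))          # [9]
print(reverse_code([1, 2, 5, None]))  # [5, 4, 1, None]
print(missing_rate(item))             # 0.125
```

Decisions such as whether to delete, transform, or keep a flagged outlier, or how to handle missing data, remain judgment calls (Steps 7 and 9); the code only surfaces the cases that need a decision.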


The Design-Informed Program Evaluation Manifesto

the mastery within.

Consider the lyrical lines
from a Verdi bel canto aria,
or the highly evolved
from Picasso’s bull.

Consider the artful sentence,
or a poet’s communication through
white space.

the injustice of genocide
on a photograph;

the peaceful, pulsating
indicator light
on an Apple computer;

the transformation of an apprentice
under the tutelage of a master.

the catharsis from a Shakespearian tragedy
(or, say, a modern-day Stephen Sondheim Sweeney Todd);

to inspire, to catalyze, to set in motion.

rhythm, sound, harmony, syntax,
colour, shade, composition.
This gestalt generates creative tension.

the ingenious thinking that lies within springs to life.

To the trained eyes,
simplicity may reveal complexity
and from chaos reveal order;
the resolution is one of beauty and wholeness.
To the untrained eyes,
the sophistication remains,
though unnoticed;

An experience is nevertheless shaped,

experienced, and inspired by the


To design is to render our intentions into the active voice.