MERL Tech News

MERL Tech DC Session Ideas are due Monday, Apr 22!

MERL Tech is coming up in September 2019, and there are only a few days left to get your session ideas in for consideration! We’re actively seeking practitioners in monitoring, evaluation, research, learning, data science, technology (and other related areas) to facilitate every session.

Session leads receive priority for the available seats at MERL Tech and a discounted registration fee. Submit your session ideas by midnight ET on April 22, 2019. You will hear back from us by May 20 and, if selected, you will be asked to submit the final session title, summary and outline by June 17.

Submit your session ideas here by April 22, midnight ET

This year’s conference theme is MERL Tech: Taking Stock

Conference strands include:

Tech and traditional MERL:  How is digital technology enabling us to do what we’ve always done, but better (consultation, design, community engagement, data collection and analysis, databases, feedback, knowledge management)? What case studies can be shared to help the wider sector learn and grow? What kinks do we still need to work out? What evidence base exists that can support us to identify good practices? What lessons have we learned? How can we share these lessons and/or skills with the wider community?

Data, data, and more data: How are new forms and sources of data allowing MERL practitioners to enhance their work? How are MERL Practitioners using online platforms, big data, digitized administrative data, artificial intelligence, machine learning, sensors, drones? What does that mean for the ways that we conduct MERL and for who conducts MERL? What concerns are there about how these new forms and sources of data are being used and how can we address them? What evidence shows that these new forms and sources of data are improving MERL (or not improving MERL)? What good practices can inform how we use new forms and sources of data? What skills can be strengthened and shared with the wider MERL community to achieve more with data?

Emerging tools and approaches: What can we do now that we’ve never done before? What new tools and approaches are enabling MERL practitioners to go the extra mile? Is there a use case for blockchain? What about facial recognition and sentiment analysis in MERL? What are the capabilities of these tools and approaches? What early cases or evidence is there to indicate their promise? What ideas are taking shape that should be tried and tested in the sector? What skills can be shared to enable others to explore these tools and approaches? What are the ethical implications of some of these emerging technological capabilities?

The Future of MERL: Where should we be going and what should the future of MERL look like? What does the state of the sector, of digital data, of technology, and of the world in which we live mean for an ideal future for the MERL sector? Where do we need to build stronger bridges for improved MERL? How should we partner and with whom? Where should investments be taking place to enhance MERL practices, skills and capacities? How will we continue to improve local ownership, diversity, inclusion and ethics in technology-enabled MERL? What wider changes need to happen in the sector to enable responsible, effective, inclusive and modern MERL?

Cross-cutting themes include diversity, inclusion, ethics and responsible data, and bridge-building across disciplines. Please consider these in your session proposals and in how you are choosing your speakers and facilitators.

Submit your session ideas now!

MERL Tech is dedicated to creating a safe, inclusive, welcoming and harassment-free experience for everyone. Please review our Code of Conduct. Session submissions are reviewed and selected by our steering committee.

Upping the Ex Ante: Explorations in evaluation and frontier technologies

Guest post from Jo Kaybryn, an international development consultant currently directing evaluation frameworks, evaluation quality assurance services, and leading evaluations for UN agencies and INGOs.

“Upping the Ex Ante” is a series of articles aimed at evaluators in international development, exploring how our work is affected by – and affects – digital data and technology. I’ve been having lots of exciting conversations with people from all corners of the universe about our brave new world. But I’ve also been conscious that for those who have not engaged much with the rapid changes in technology around us, it can be a bit daunting to know where to start. These articles explore a range of technologies and innovations against the backdrop of international development and the particular context of evaluation. For readers not yet well versed in technology, there are plenty of pointers to sources for further research on areas of interest.

The series is halfway through, with four articles published so far.

Computation? Evaluate it!

Part 1 goes back to the olden days (1948!) to consider the origin story of cybernetics and the influences it still exerts on today’s algorithms and big data. The philosophical and ethical dilemmas it raises are a recurring theme in later articles.

Distance still matters

Part 2 examines the problem of distance, something technology has helped us make huge strides on yet has never fully solved, with a discussion of what blockchains mean for the veracity of data.

Doing things the way it’s always been done but better (Qualified)

Part 3 considers qualitative data and shines a light on the gulf between our digital data-centric and analogue-centric worlds and the need for data scientists and social scientists to cooperate to make sense of it.

Doing things the way it’s always been done but better (Quantified)

Part 4 looks at quantitative data and its implications for better decision-making; why evaluators really don’t like an algorithmic “black box”; and how humans’ assumptions and biases leak into our technologies, whether digital or analogue.

What’s next?

The next few articles will focus on ethics, psychology and bias; a case study of a hypothetical machine learning intervention to identify children at risk of maltreatment (with lots more risk and ethical considerations); and some thoughts about putting it all in perspective (i.e. Don’t Panic!).

Good, Cheap, Fast — Pick Two!

By Chris Gegenheimer, Director of Monitoring, Evaluating and Learning Technology at Chemonics International; and Leslie Sage, Director of Data Science at DevResults. (Originally posted here).

Back in September, Chemonics and DevResults spoke at MERL Tech DC about the inherent compromise involved when purchasing enterprise software. In short, if you want good software that does everything you want exactly the way you want it, cheap software that is affordable and sustainable, and fast software that is available immediately and responsive to emerging needs, you may have to relax one of those requirements. In other words: “good, cheap, fast – pick two!”

Of course, no buyer or vendor would ever completely neglect any one of those dimensions to maximize the other two; instead, we all try to balance these competing priorities as best our circumstances allow. It’s not an “all or nothing” compromise. It’s not even a monolithic compromise: both buyer and vendor can choose which services and domains will prioritize quality and speed over affordability, or affordability and quality over speed, or affordability and speed over quality (although that last one does sometimes come back to bite).

Chemonics and DevResults have been working together to support Chemonics’ projects and its monitoring and evaluation (M&E) needs since 2014, and we’ve had to learn from each other how best to achieve the mythical balance of quality, affordability, and speed. We haven’t always gotten it right, but we do have a few suggestions on how technological partnerships can ensure long-term success.

Observations from an implementer

As a development implementer, Chemonics recognizes that technology advances development outcomes and enables us to do our work faster and more efficiently. While we work in varied contexts, we generally don’t have time to reinvent technology solutions for each project. Vendors bring value when they can supply configurable products that meet our needs in the real world faster and cheaper than building something custom. Beyond the core product functionality, vendors offer utility with staff who maintain the IT infrastructure, continually upgrade product features, and ensure compliance with standards, such as the General Data Protection Regulation (GDPR) or the International Aid Transparency Initiative (IATI). Not every context is right for off-the-shelf solutions. Just because a product exists, it doesn’t mean the collaboration with a software vendor will be successful. But, from an implementer’s perspective, here are a few key factors for success:

Aligned incentives

Vendors should have a keen interest in ensuring that their product meets your requirements. When they are primarily interested in your success in delivering your core product or service — and not just selling you a product — the relationship is off to a good start. If the vendor does not understand or have a fundamental interest in your core business, this can lead to diverging paths, both in features and in long-term support. In some cases, fresh perspectives from non-development outsiders are constructive, but being the outlier client can contribute to project failure.

Inclusion in roadmap

Assuming the vendor’s incentives are aligned with your own, it should be interested in your feedback as well as responsive to making changes, even to some core features. As our staff puts systems through their paces, we regularly come up with feature requests, user interface improvements, and other feedback. We realize that not every feature request will make it into code, but behind every request is a genuine need, and vendors should be willing to talk through each need to figure out how to address it.

Straight talk

There’s a tendency for tech vendors, especially sales teams, to have a generous interpretation of system capabilities. Unmet expectations can result from a client’s imprecise requirements or a vendor’s excessive zeal, which leads to disappointment when you get what you asked for, but not what you wanted. A good vendor will clearly state up front what its product can do, cannot do, and will not ever do. In return, implementers have a responsibility to make their technical requirements as specific, well-scoped, and operational as possible.

Establish support liaisons

Many vendors offer training, help articles, on-demand support, and various other resources for turning new users into power users, but relying on the vendor to shoulder this burden serves no one. By establishing a solid internal front-line support system, you can act as intermediaries and translators between end users and the software vendor. Doing so has meant that our users don’t have to be conversant in developer-speak or technical language, nor does our technology partner have to field requests coming from every corner of our organization.

Observations from a tech vendor

DevResults’ software is used to manage international development data in 145 countries, and we support M&E projects around the world. We’ve identified three commonalities among organizations that implement our software most effectively: 1) the person who does the most work has the authority to make decisions, 2) the person with the most responsibility has technical aptitude and a whatever-it-takes attitude, and 3) breadth of adoption is achieved when the right responsibilities are delegated to the project staff, building capacity and creating buy-in.

Organizational structure

We’ve identified two key factors that predict organizational success: dedicated staff resources and their level of authority. Most of our clients are implementing a global M&E system for the first time, so the responsibility for managing the rollout is often added to someone’s already full list of duties, which is a recipe for burnout. Even if a “system owner” is established and space is made in their job description, if they don’t have the authority to request resources or make decisions, it restricts their ability to do their job well. Technology projects are regularly entrusted to younger, more junior employees, who are often fast technical learners, but their effectiveness is hindered by having to constantly appeal to their boss’ boss’ boss about every fork in the road. Middle-sized organizations are typically advantaged here because they have enough staff to dedicate to managing the rollout, yet few enough layers of bureaucracy that such a person can act with authority.

Staffing

Technical expertise is critical when it comes to managing software implementations. Too often, technical duties are foisted upon under-prepared (or less-than-willing) staffers. This may be a reality in an era of constrained budgets, but asking experts in one thing to operate outside of their wheelhouse is another recipe for burnout. In the software industry, we conduct technical exams for all new hires. We would be thrilled to see the practice extended across the ICT4D space, even for roles that don’t involve programming but do involve managing technical products. Even so, there’s a certain aspect of the ideal implementation lead that comes down to personality and resourcefulness. The most successful teams we work with have at least one person who has the willingness and the ability to do whatever it takes to make a new system work. Call it ‘ownership,’ call it a ‘can-do’ attitude, but whatever it is, it works!

Timing and resource allocation

Change management is hard, and introducing a new system requires a lot of work up front. There’s a lot that headquarters personnel can do to unburden project staff (configuring the system, developing internal guidance and policies, etc.), but sometimes it’s better to involve project staff directly and early. When project staff are involved in the system configuration and decision-making process, we’ve seen them demonstrate more ownership of the system and less resentment of “another thing coming down from headquarters.” System setup and configuration can also be a training opportunity, further developing internal capacity across the organization. Changing systems requires conversations across the entire org chart; well-designed software can facilitate those conversations. But even when implementers do everything right, they should always expect challenges, plan for change management, and adopt an agile approach to managing a system rollout.

Good, cheap, fast: pick THREE!

As we said, there are ways to balance these three dimensions. We’ve managed to strike a successful balance in this partnership because we understand the incentives, constraints, and priorities of our counterpart. The software as a service (SaaS) model is instrumental here because it ensures software is well-suited to multiple clients across the industry (good), more affordable than custom builds (cheap), and immediately available on day one (fast). The implicit tradeoff is that no one client can control the product roadmap, but when each and every customer has a say, the end product represents the collective wisdom, best practices, and feedback of everyone. It may not be perfectly tailored to each and every client’s preferences, but in the end, that’s usually a good thing.

Join us for MERL Tech DC, Sept 5-6th!

MERL Tech DC: Taking Stock

September 5-6, 2019

FHI 360 Academy Hall, 8th Floor
1825 Connecticut Avenue NW
Washington, DC 20009

We gathered at the first MERL Tech Conference in 2014 to discuss how technology was enabling the field of monitoring, evaluation, research and learning (MERL). Since then, rapid advances in technology and data have altered how most MERL practitioners conceive of and carry out their work. New media and ICTs have permeated the field to the point where most of us can’t imagine conducting MERL without the aid of digital devices and digital data.

The rosy picture of the digital data revolution and an expanded capacity for decision-making based on digital data and ICTs has been clouded, however, with legitimate questions about how new technologies, devices, and platforms — and the data they generate — can lead to unintended negative consequences or be used to harm individuals, groups and societies.

Join us in Washington, DC, on September 5-6 for this year’s MERL Tech Conference where we’ll be taking stock of changes in the space since 2014; showcasing promising technologies, ideas and case studies; sharing learning and challenges; debating ideas and approaches; and sketching out a vision for an ideal MERL future and the steps we need to take to get there.

Conference strands:

Tech and traditional MERL:  How is digital technology enabling us to do what we’ve always done, but better (consultation, design, community engagement, data collection and analysis, databases, feedback, knowledge management)? What case studies can be shared to help the wider sector learn and grow? What kinks do we still need to work out? What evidence base exists that can support us to identify good practices? What lessons have we learned? How can we share these lessons and/or skills with the wider community?

Data, data, and more data: How are new forms and sources of data allowing MERL practitioners to enhance their work? How are MERL Practitioners using online platforms, big data, digitized administrative data, artificial intelligence, machine learning, sensors, drones? What does that mean for the ways that we conduct MERL and for who conducts MERL? What concerns are there about how these new forms and sources of data are being used and how can we address them? What evidence shows that these new forms and sources of data are improving MERL (or not improving MERL)? What good practices can inform how we use new forms and sources of data? What skills can be strengthened and shared with the wider MERL community to achieve more with data?

Emerging tools and approaches: What can we do now that we’ve never done before? What new tools and approaches are enabling MERL practitioners to go the extra mile? Is there a use case for blockchain? What about facial recognition and sentiment analysis in MERL? What are the capabilities of these tools and approaches? What early cases or evidence is there to indicate their promise? What ideas are taking shape that should be tried and tested in the sector? What skills can be shared to enable others to explore these tools and approaches? What are the ethical implications of some of these emerging technological capabilities?

The Future of MERL: Where should we be going and what should the future of MERL look like? What does the state of the sector, of digital data, of technology, and of the world in which we live mean for an ideal future for the MERL sector? Where do we need to build stronger bridges for improved MERL? How should we partner and with whom? Where should investments be taking place to enhance MERL practices, skills and capacities? How will we continue to improve local ownership, diversity, inclusion and ethics in technology-enabled MERL? What wider changes need to happen in the sector to enable responsible, effective, inclusive and modern MERL?

Cross-cutting themes include diversity, inclusion, ethics and responsible data, and bridge-building across disciplines.

Submit your session ideas, register to attend the conference, or reserve a demo table for MERL Tech DC now!

You’ll join some of the brightest minds working on MERL across a wide range of disciplines – evaluators, development and humanitarian MERL practitioners, small and large non-profit organizations, government and foundations, data scientists and analysts, consulting firms and contractors, technology developers, and data ethicists – for 2 days of in-depth sharing and exploration of what’s been happening across this multidisciplinary field and where we should be heading.

MERL for Blockchain Interventions: Integrating MERL into Token Design

Guest post by Michael Cooper, a Senior Social Scientist at Emergence who advises foreign assistance funders, service providers, and evaluators on blockchain applications. He can be reached at emergence.cooper@gmail.com.

Tokens Could be Our Focus

There is no real evidence base about what does and does not work when applying blockchain technology to interventions seeking social impact. Most current blockchain interventions are driven by developers (programmers) and visionary entrepreneurs. There is little thinking in current blockchain interventions about designing for “social” impact (there is an overabundant trust in the technology to achieve the outcomes and little focus on the humans interacting with that technology) and about integrating relevant evidence from behavioral economics, behavior change design, human-centered design, and related fields.

To build the needed evidence base, Monitoring, Evaluation, Research and Learning (MERL) practitioners will have to get to know not only the broad strokes of blockchain technology but also the specifics of token design and tokenomics (the political economy of tokenized ecosystems). Token design could become the focal point for MERL on blockchain interventions since:

  • The vast majority of blockchain interventions, if not all of them, will involve some type of desired behavior change
  • The token provides the link between the ledger (which is the blockchain) and the social ecosystem in which the behavior change is meant to happen
  • Hence the token is the “nudge” meant to leverage behavior change in the social ecosystem while governing the transactions on the blockchain ledger.

(While this blog will focus on these points, it will not go into a full discussion of what tokens are and how they create ecosystems. There are some very good resources out there that do this, which you can review at your leisure and to the degree that works for you. The Complexity Institute has published a book exploring the various attributes of complexity and the main themes involved in tokenomics, while Outlier Ventures has published what I consider to be the best guidance on token design. The Outlier Ventures guidance contains many of the tools MERL practitioners will be familiar with (problem analysis, stakeholder mapping, etc.) and should be consulted.)

Hence, by understanding token design and its requirements, and by mapping them against our current MERL thinking, tools, and practices, we can develop new thinking and tools that could be the starting point for building our much-needed evidence base.

What is a “blockchain intervention”? 

As MERL practitioners, we roughly define an “intervention” as a group of inputs and activities meant to leverage outcomes within a given ecosystem. “Interventions” are what we are usually mandated to assess, evaluate, and help improve.

When thinking about MERL and blockchain, it is useful to think of two categories of “blockchain interventions”. 

1) Integrating the blockchain into MERL data collection, entry, management, analysis or dissemination practices and

2) MERL strategies for interventions using the blockchain in some way, shape, or form.

Here we will focus on #2 and, in so doing, demonstrate that while the blockchain is an innovative, potentially disruptive technology, evaluating its application to social outcomes is still a matter of assessing behavior change against the dimensions of intervention design.

Designing for Behavior Change

We generally design interventions (programs, projects, activities) to “nudge” a certain type of behavior (stated as outcomes in a theory of change) amongst a certain population (beneficiaries, stakeholders, etc.).  We often attempt to integrate mechanisms of change into our intervention design, but often do not for a variety of reasons (lack of understanding, lack of resources, lack of political will, etc.).  This lack of due diligence in design is partly responsible for the lack of evidence around what works and what does not work in our current universe of interventions. 

Enter blockchain technology, which, as MERL practitioners, we will be responsible for assessing in the foreseeable future. Hence, we will need to determine how interventions using the blockchain attempt to nudge behavior, which behaviors they seek to nudge, amongst whom, when, and how well the design of the intervention accomplishes these functions. In order to do that, we will need to better understand how blockchains use tokens to nudge behavior.

The Centrality of the Token

We have all used tokens before. Stores issue coupons that can only be used at those stores, we get receipts for groceries as soon as we pay, and arcades make you buy tokens instead of just using quarters. The coupons and arcade tokens can be considered utility tokens, meaning that they can only be used in a specific “ecosystem”, in this case a store and an arcade respectively. The grocery store receipt is a token because it demonstrates ownership: if you are stopped on the way out of the store and you show your receipt, you are demonstrating that you now have rights of ownership over the foodstuffs in your bag.

Whether you realize it or not at the time, these tokens are trying to nudge your behavior. The store gives you the coupon because the more time you spend in their store trying to redeem coupons, the greater the likelihood you will spend additional money there. The grocery store wants you to pay for all your groceries, while the arcade wants you to buy more tokens than you end up using.

If needed, we could design MERL strategies to assess how well these different tokens nudged the desired behaviors. We would do this, in part, by thinking about how each token is designed relative to the behavior it wants (i.e. the value, frequency and duration of coupons, etc.).

Thinking about these ecosystems and their respective tokens will help us understand the interdependence between 1) the blockchain as a ledger that records transactions, 2) the token that captures the governance structures for how transactions are stored on the blockchain ledger as well as the incentive models for 3) the mechanisms of change in the social eco-system created by the token. 

Figure #1:  The inter-relationship between the blockchain (ledger), token and social eco-system

Token Design as Intervention Design  

Just as we assess theories of change and their mechanisms against intervention design, we will assess blockchain based interventions against their token design in much the same way.  This is because blockchain tokens capture all the design dimensions of an intervention; namely the problem to be solved, stakeholders and how they influence the problem (and thus the solution), stakeholder attributes (as mapped out in something like a stakeholder analysis), the beneficiary population, assumptions/risks, etc. 

Outlier Ventures has adapted what they call a Token Utility Canvas as a milestone in their token design process. The canvas captures many of the problem diagnostic, stakeholder assessment, and other due diligence tools that are familiar to MERL practitioners who have seen them used in intervention design, and it can be correlated to the various dimensions of an evaluability assessment tool. (I am using the evaluability assessment tool as a stand-in for the necessary dimensions of an intervention’s design, in that it assesses the health of all the components of that design.) Hence token design could largely be thought of as intervention design and evaluated as such.

Table#1: Comparing Token Design with Dimensions of Program Design (as represented in an Evaluability Assessment)

This table is not meant to be exhaustive, and not all of the fields are explained here, but it could be a useful starting point in developing our own thinking and tools for this emerging space.

The Token as a Tool for Behavior Change

Coming up with a taxonomy of blockchain interventions and their relevant tokens is a necessary task, but one thing is already clear: any blockchain intervention that needs to nudge behavior will have to have a token.

Consider supply chain management. Blockchains are increasingly being used as the ledger system for supply chain management. Supply chains typically comprise numerous actors packaging, shipping, receiving, and applying quality control protocols to various goods, each with their own ledgers of the relevant goods as they snake their way through the supply chain. This creates ample opportunities for fraud and theft, along with high costs associated with reconciling the different ledgers of the different actors at different points in the supply chain. With the blockchain as the common ledger system, many of these costs diminish: a single ledger with trusted data is used, so transactions (shipping, receiving, repackaging, etc.) can happen more seamlessly and reconciliation costs drop.

However, even in “simple” applications such as this there are behavior change implications. We still want the supply chain actors to perform their functions in a manner that adds value to the supply chain ecosystem as a whole, rewarding them for good behavior within the ecosystem and punishing them for bad.

What if a shipper trying to pass on a faulty product had already deposited a certain value of currency in an escrow account (housed in a smart contract on the blockchain)? If they were found to be attempting a prohibited behavior (passing on faulty products), they would automatically surrender a certain amount from the escrow account in the blockchain smart contract. How much should be deposited in the escrow account? What is the ratio between the degree of punishment and the undesired action? These are behavioral questions around a mechanism of change that are dimensions of current intervention designs and will be increasingly relevant in token design.
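To make the mechanism concrete, here is a minimal sketch in plain Python (a simulation, not actual on-chain or smart contract code) of the escrow-and-penalty logic described above. The class name, deposit amount, and penalty ratio are illustrative assumptions, not recommendations from this post.

```python
from dataclasses import dataclass

@dataclass
class EscrowAccount:
    shipper: str
    deposit: float        # value staked up front by the supply chain actor
    penalty_ratio: float  # share of the remaining deposit forfeited per validated violation

    def apply_validation(self, violation_found: bool) -> float:
        """Return the amount slashed after a validation check (human or automated)."""
        if not violation_found:
            return 0.0
        slashed = self.deposit * self.penalty_ratio
        self.deposit -= slashed
        return slashed

# Illustrative use: a shipper stakes 100 units with a 25% penalty per validated violation.
escrow = EscrowAccount(shipper="shipper_A", deposit=100.0, penalty_ratio=0.25)
print(escrow.apply_validation(violation_found=True))  # 25.0 forfeited
print(escrow.deposit)                                 # 75.0 still at stake
```

The two parameters the paragraph asks about (how much to deposit, and what ratio of punishment to undesired action) show up directly as the deposit and penalty_ratio values, which is exactly where MERL feedback on behavior would inform token design.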

The point of this is to demonstrate that even “benign” applications of the blockchain, like supply chain management, have behavior change implications and thus require good due diligence in token design.

There is a lot that could be said about the validation function of this process: who validates that the bad behavior has taken place and should be punished, or that good behavior should be rewarded? There are lessons to be learned from results-based contracting and the role of the validator in such a contracting vehicle. This validating function will need to be thought out in terms of what can be automated and what needs a “human touch” (and who is responsible, what methods they should use, etc.).

Implications for MERL

If tokens are fundamental to MERL strategies for blockchain interventions, there are several critical implications:

  • MERL practitioners will need to be heavily integrated into the due diligence processes and tools for token design
  • MERL strategies will need to be highly formative, if not developmental, in facilitating the timeliness and overall effectiveness of the feedback loops informing token design
  • New thinking and tools will need to be developed to assess the relationships between blockchain governance, token design and mechanisms of change in the resulting social ecosystem. 

The less MERL practitioners are integrated into the due diligence of token design, the higher the opportunity cost for impact and “learning”. This is because the costs of adapting token design are relatively low compared to current social interventions, partly due to the ability to integrate automated feedback.

Blockchain-based interventions present us with significant learning opportunities, because we can use the technology itself as a data collection and management tool in learning about what does and does not work. Feedback from an appropriate MERL strategy could inform decision-making around token design that could then be coded into the token on an iterative basis. For example, as stakeholders’ incentives shift (e.g., supply chain shippers incur new costs and their value proposition changes), token adaptation can respond in a timely fashion, so long as the MERL feedback that informs the token design is accurate.

There is a need to determine which components of these feedback loops can be handled by automated functions and which require a “human touch”. For example, what dimensions of token design can be informed by smart infrastructure (e.g., temperature gauges on shipping containers in the supply chain) versus household surveys completed by enumerators? This will be a task to complete and iteratively improve, starting with initial token design and lasting through the lifecycle of the intervention. Token design dimensions, outlined in the Token Utility Canvas, and the related decision-making will need to result in MERL questions that are matched to the best strategy for answering them, automated or human, much as we do now in current interventions.

Many of our current due diligence tools used in both intervention and evaluation design (stakeholder mapping, problem analysis, cost-benefit analysis, value propositions, etc.) will need to be adapted to the types of relationships found within tokenized ecosystems. These include the relationships of influence between the social ecosystem and the blockchain ledger itself (or, more specifically, the governance of that ledger), as shown in Figure 1.

This could be our biggest priority as MERL practitioners. While blockchain interventions could create incredible opportunities for social experimentation, the need for human-centered due diligence (incentivizing humans toward positive behavior change) in token design is critical. Overreliance on the technology to drive social outcomes is an already well-evidenced opportunity cost, one that could be avoided with blockchain-based solutions if the gap between technologists, social scientists, and practitioners can be bridged.

Can digital tools be used to help young mothers in Kenya form new habits?

Guest post from Haanim Galvaan, Content Designer at Every1Mobile

A phone is no longer just a phone. It’s your connection to the rest of the world, it’s your personal assistant, and now, it’s your best friend who gives you encouragement and reinforcement for your good habits.

At least that’s what mobile phones have become for those who make use of habit-boosting apps.

If you’re trying to quit smoking and want to build a streak of puff-free days, the HabitBull app can help you do that. Want to establish a habit in your team that makes use of social accountability? Try Habitica. Do you want positive reinforcement for your activities in a motivational, rewarding voice? Productive is the app for that.

But what if you’re a young mum, living in the urban slums of Nairobi and you want to improve the health and wellbeing of your children? Try U Afya’s 10-Day Challenge.

U Afya is an online community for young mothers and mothers-to-be to learn about topics related to health, hygiene and family life. The site takes a holistic approach to giving young mothers the knowledge and confidence they need to enact certain healthy behaviours. It’s a place to discuss, give and receive advice, take free online courses, and now, to establish good habits with a custom-built habit tracking tool.

The 10-Day Handwashing Challenge was launched using new habit-tracking functionality. Users were encouraged to perform an activity related to handwashing each day, e.g. wash your hands with soap for 20 seconds. The challenges were formulated around the Lifebuoy “5 Key Moments” model. Participants were required to log their activity on the site by completing a survey.

Each day the site fed users a different hygiene-related tip, as well as links to additional content. At the end of the challenge, users were pushed to take a pledge and make a commitment to handwashing.

U Afya’s Habit Tracker is different from other habit-boosting apps in that it is not an app! It has been built onto a low-data-usage site optimised for the data-sensitive target audience in the Nairobi slums. The tracker provides a rich, visual experience using simple functionality that is compatible with both feature phones and smartphones.

We created a sense of urgency.

Users were required to log their activity for 10 days within a 30-day period. Attaching a “deadline” added a measure of urgency to the activity. There is no space for procrastination. The message is: establish your habit now or you never will!
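As a rough illustration (an assumed reconstruction of the rule described above, not U Afya’s actual implementation), the completion logic boils down to counting distinct logged days inside the challenge window:

```python
from datetime import date, timedelta

def completed_challenge(start: date, log_dates: list, required_days: int = 10,
                        window_days: int = 30) -> bool:
    """True if the user logged the activity on enough distinct days within the window."""
    window_end = start + timedelta(days=window_days)
    logged_days = {d for d in log_dates if start <= d <= window_end}
    return len(logged_days) >= required_days

# Example: a user who logs ten separate days in the first two weeks completes the challenge.
start = date(2019, 3, 1)
logs = [start + timedelta(days=i) for i in range(10)]
print(completed_challenge(start, logs))  # True
```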

It is based on behaviour change levers.

The 10-Day Handwashing Challenge and its accompanying content around the site were all based on the behaviour change approach employed by Lifebuoy in Way of Life, namely Awareness, Commitment, Reinforcement and Reward.

The approach was executed in the following ways:

Awareness: Introducing the handwashing theme with engaging, educational content that linked to and from the 10-Day Handwashing Challenge:

  • Diseases caused by lack of handwashing (article)
  • 5 Tips for washing your hands correctly (article)
  • Global Handwashing Day! – The 5 Key times to wash our hands (article)
  • How much do you know about handwashing? (quiz)

Commitment: Encouraging users to take the Handwashing Pledge

Reinforcement: Habit tracker, come back to self-report your daily activity

Reward: Participants stood the chance to win a hygiene gift bag

Contents of the hygiene gift bag given to 5 winners.

The results

86 users started the challenge and 26 users completed it within the 30-day challenge period. That makes a completion rate of 30% overall. Considering that users had to return to the challenge 10 times, the response rate is quite high.

The biggest drop-off happened between Day 1 and Day 2, when 28 users fell away; drop-off rates then decreased gradually throughout the 10 days. Most users who made it to Day 5 ended up completing the challenge: only 11 users dropped off between Day 5 and Day 10.

26 out of 86 users created a habit.

In addition to participation data, further feedback was gathered by interspersing survey questions throughout the challenge. This additional questioning found that 91% of challenge-takers feel they can afford to buy soap for their families.

Feedback:

Users had overwhelmingly positive feedback about the challenge.

“It was so educating and hygienically I have improved. It’s now a routine to me, washing hands in any case”

Learnings:

Keep it simple

It’s not always necessary to create a fancy app to push a new activity. The U Afya 10-Day Challenge was built on a platform that users are already familiar with. By building it into their current environment, it offered them something new and exciting on their visit.

Users were required to do one thing each day and report it with one action, i.e. answering a single-question survey. Requiring minimal effort from your users can maximise uptake.

Overall, the approach was simplicity: simplicity in the design of the functionality, simplicity in the daily action, and simplicity in creating the habit.

With this approach the U Afya 10-Day Handwashing Challenge helped 26 young mothers to create a new habit of washing their hands every day at key moments.

Conclusion:

This first iteration of U Afya’s 10-Day Handwashing Challenge was a pilot, but the results suggest that it is possible to use low-cost, low-tech means to encourage habit formation. It is also possible for sophisticated behaviour change theory and practice to reach some of the most vulnerable groups, using the very phones they have in their hands.

It is also a useful tool to help us to understand the impact of our behaviour change campaigns in the real world.

Next steps

All the user feedback and learnings mentioned above will be analysed to understand how the approach can be strengthened to reach even more people, increase compliance, and encourage positive habit creation.

Oops! Satellite Imagery Cannot Predict Human Development Indicators

Guest post from Wayan Vota

In many developing country environments, it is difficult or impossible to obtain recent, reliable estimates of human development. Nationally representative household surveys, which are the standard instrument for determining development policy and priorities, are typically too expensive to collect with any regularity.

Recently, however, researchers have shown the potential for remote sensing technologies to provide a possible solution to this data constraint. In particular, recent work indicates that satellite imagery can be processed with deep neural networks to accurately estimate the sub-regional distribution of wealth in sub-Saharan Africa.

Testing Neural Networks to Process Satellite Imagery

In the paper, Can Human Development be Measured with Satellite Imagery?, Andrew Head, Mélanie Manguin, Nhat Tran, and Joshua Blumenstock explore the extent to which the same approach – of using convolutional neural networks to process satellite imagery – can be used to measure a broader set of human development indicators, in a broader range of geographic contexts.

Their analysis produces three main results:

  • They successfully replicate prior work showing that satellite images can accurately infer a wealth-based index of poverty in sub-Saharan Africa.
  • They show that this approach can generalize to predicting poverty in other countries and continents, but that the performance is sensitive to the hyperparameters used to tune the learning algorithm.
  • They find that this approach does not trivially generalize to predicting other measures of development such as educational attainment, access to drinking water, and a variety of health-related indicators.

This paper shows that while satellite imagery and machine learning may provide a powerful paradigm for estimating the wealth of small regions in sub-Saharan Africa, the same approach does not trivially generalize to other geographical contexts or to other measures of human development.

In making this assessment, it is important to emphasize what they mean by “trivially,” because the point they are making is a carefully circumscribed one. Specifically, what they have shown is that the exact framework (retraining a deep neural network on night-lights data, and then using those features to predict the wealth of small regions in sub-Saharan Africa) cannot be directly applied to predicting arbitrary indicators in any country with uniformly good results.
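For readers who want to see the general shape of the recipe, below is a minimal, illustrative Python sketch of the transfer-learning approach: extract features from daytime satellite image tiles with a pretrained convolutional network, then fit a simple regression from those features to a cluster-level wealth index. It skips the intermediate night-lights training stage the authors use, and the model choice, file paths, and inputs are assumptions, not the paper’s code.

```python
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# Pretrained backbone used as a fixed feature extractor (classification head removed).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_features(image_paths):
    """Return one feature vector per satellite image tile (hypothetical file paths)."""
    feats = []
    with torch.no_grad():
        for path in image_paths:
            x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            feats.append(backbone(x).squeeze(0).numpy())
    return np.vstack(feats)

# tile_paths: one image per survey cluster; wealth_index: asset-based index per cluster (assumed inputs).
# X = image_features(tile_paths)
# model = RidgeCV(alphas=np.logspace(-3, 3, 13))
# print(cross_val_score(model, X, wealth_index, scoring="r2", cv=5))
```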

This is an important point to make because, absent empirical evidence to the contrary, policymakers eager to gain quick access to micro-regional measurements of development might be tempted to do exactly what the authors have done in this paper, without paying careful attention to the thorny issues of generalizability uncovered in this analysis.

It is not the researchers’ intent to impugn the potential for related approaches to provide important new methods for measuring development, but rather to say that such efforts should proceed with caution, and with careful validation.

Why Satellite Imagery Might Fail to Predict Development

The results showed that while some indicators like wealth and education can be predicted reasonably well in many countries, other development indicators are much more brittle, exhibiting high variance between and within countries, and others perform poorly everywhere.

Thus it is useful to distinguish between two possible reasons why the current approach may have failed to generalize to these measures of development.

  • It may be that this exercise is fundamentally not possible, and that no amount of additional work would yield qualitatively different results.
  • It is quite possible that their investigation to date has not been sufficiently thorough, and that more concerted efforts could significantly improve the performance of these models.

Insufficient “signal” in the satellite imagery.

The researchers’ overarching goal is to use information in satellite images to measure different aspects of human development. The premise of such an approach is that the satellite imagery must contain useful information about the development indicator of interest. Absent such a signal, no matter how sophisticated the computational model, it is destined to fail.

The fact that wealth specifically can be measured from satellite imagery is quite intuitive. For instance, there are visual features one might expect to correlate with wealth: large buildings, metal roofs, nicely paved roads, and so forth.

It may be the case that other measures of human development cannot be seen from above. For instance, it may be fundamentally difficult to infer the prevalence of malnutrition from satellite imagery if regions with high and low rates of malnutrition appear similar from above, even though the authors hypothesize that these indices should correlate with the wealth index.

They were, however, surprised by the relative under-performance of models designed to predict access to drinking water, as they expected the satellite-based features to capture proximity to bodies of water, which in turn might affect access to drinking water.

(Over-) reliance on night-lights may not generalize.

Their reliance on night lights might help explain why some indicators were predicted less successfully in some countries than others. An example in their study includes Nepal, where the accuracy in predicting access to electricity was much lower (R2 = 0.24) than in the other countries (R2 = 0.69, 0.44, and 0.54 in Rwanda, Nigeria, and Haiti, respectively).

This may be partly due to the fact that Nepal has a very low population density (half as dense as Haiti and Rwanda) and very high levels of electrification (twice as high as Haiti, Rwanda, and Nigeria).

If the links between electrification, night-lights, and daytime imagery are broken in Nepal, they would expect their modeling approach to fail. More generally, they expect that when a development indicator does not clearly relate to the presence of nighttime lights, it may be unreasonable to expect good performance from the transfer learning process as a whole.

Deep learning vs. supervised feature engineering.

In this paper, the researchers focused explicitly on using the deep/transfer learning approach to extracting information from satellite images. While powerful, it is also possible that other approaches to feature engineering might be more successful than the brute force approach of the convolutional neural network.

For instance, Gros and Tiecke have recently shown how hand-labeled features from satellites, and specifically information about the types of buildings that are present in each image, can be quite effective in predicting population density. Labeling images in this manner is resource intensive, and they did not have the opportunity to test such approaches.

However, they believe that careful encoding of the relevant information from satellite imagery would likely bolster the performance of specific prediction tasks.

Neural Networks Can Still Process Satellite Imagery

Broadly, the researchers remain optimistic that future work using novel sources of data and new computational algorithms can engender significant advances in the measurement of human development.

However, it is imperative that such work proceeds carefully, with appropriate benchmarking and external calibration. Promising new tools for measurement have the potential to be implemented widely, possibly by individuals who do not have extensive expertise in the underlying algorithms.

Applied blindly, these algorithms have the potential to skew subsequent policy in unpredictable and undesirable ways. They view the results of this study as a cautionary example of how a promising algorithm should not be expected to work “off the shelf” in a context that is significantly different from the one in which it was originally developed.

The post Oops! Satellite Imagery Cannot Predict Human Development Indicators appeared first on ICTworks.

Tips for Increasing Mobile Survey Response Rates

Guest post from James Mwangi, the Deployment Lead at Echo Mobile. This post was originally published on the Echo Mobile blog.

Users often ask us, “What response rate will I get from my survey?” or “How can I increase my survey’s response rate?”

The truth is… it depends!

Response rates depend on your organisation, your respondents, and their motivation for responding. Most of our users assume that financial incentives are the most effective for stimulating engagement, and indeed research shows they can enhance response rates. But they are not always necessary and rarely sufficient. The design of your survey — its structure, tone and content — is equally important and often ignored.

In a recent SMS survey conducted for the third time on behalf of a UN agency and government ministry, Echo’s Deployment team demonstrated that minor adjustments to survey design can drastically increase response rates, regardless of financial incentives.

In May 2017, the team sent a survey with a KES 35 airtime incentive to 25,000 Kenyan government employees, 21% of whom completed it. In October 2017, Deployment sent the same survey to the same group with the same airtime incentive. This time only 16% completed it. In February 2018, we sent the survey again, with minor design tweaks and no financial incentives. The completion rate nearly doubled to 29%.

Win-win! Our client saved money by dropping the airtime transfers and got more results. More of their beneficiaries were able to engage and provide critical feedback. Here are the design changes we made to the survey. Consider them next time you’re using Echo for Monitoring and Evaluation (M&E):

Personalize the content

The Echo Platform allows users to personalize messages using standard fields — basic, common data points like name, ID and location, which can be stored in Echo contact profiles and can be integrated into large-scale messages.

Unlike in 2017, in the 2018 version of the UN survey our Deployment team added the NAME field to the first SMS. As a result, all recipients immediately saw their name before automatically progressing to the first question. This builds a sense of trust, captures recipients’ attention, and makes the message less likely to be mistaken for spam.

And you don’t need to stick to just standard fields! Any prior response to a survey can be stored as a custom field. If you ask recipients their favorite football team and store the response as a custom field, the next time you send them an SMS you can personalize your content even further: “Hi [NAME]. Hope [FOOTBALL_TEAM] is doing well this week….”
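As a rough sketch of what this kind of merge looks like (illustrative Python only, not the Echo Platform’s actual API; the field names come from the examples above, and the contact data is invented):

```python
# Contact profiles holding a standard field (NAME) and a stored custom field (FOOTBALL_TEAM).
contacts = [
    {"phone": "+254700000001", "NAME": "Amina", "FOOTBALL_TEAM": "Gor Mahia"},
    {"phone": "+254700000002", "NAME": "Brian", "FOOTBALL_TEAM": "AFC Leopards"},
]

template = "Hi [NAME]. Hope [FOOTBALL_TEAM] is doing well this week. Reply 1 to start our survey."

def personalize(template: str, contact: dict) -> str:
    """Replace [FIELD] placeholders with the contact's stored profile values."""
    message = template
    for field, value in contact.items():
        message = message.replace(f"[{field}]", str(value))
    return message

for contact in contacts:
    print(personalize(template, contact))
```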

Skip the “opt-in”

The Echo platform’s survey builder allows you to add an invitation message as the first SMS sent to a contact. To move from this intro message on to the first question, recipients must “opt in” by responding to the initial message with something like “ok” or “begin” (any word or number will do).

Sample survey designs, before optimisation.

Invitation messages are extremely useful. They help you be polite, introduce yourself if the recipient doesn’t know you, and say what your survey is about and why and how they can proceed (more below on instructions!). But they can also create a barrier to completion.

Observing that many respondents had failed to opt in to our 2017 survey, we dropped the invitation message for the 2018 version. Instead, we took that content and sent it as an info question, which, by design, automatically progresses to the next question whether or not the recipient responds.

Optimised survey: personalised, does not require the respondent to opt in, and has clear instructions on how to reply.

Removing the opt-in invitation message won’t always be an option, but in this case respondents were employees of our client and had been engaging on their shortcode for years. In some ways the intro message just added an extra step for them, as they had already provided their phone numbers and given consent for our client to engage them. Personally Identifiable Information (PII) is not collected or shared, and respondents can unsubscribe entirely from our system by sending the word STOP at any time, an option that has been communicated to them repeatedly.

In other cases, users might be suspicious of the opt-in request. Many Kenyans have encountered premium SMS services that push messages to unknowing respondents and deduct airtime from them once they opt in. Messaging with Echo is totally free for your respondents, but consider how they might react to an opt-in intro message, and design your survey accordingly!

Give clear Instructions

Keeping in mind the SMS character limit, our Deployment team added quick instructions at the end of each question in the 2018 survey. These guided respondents on how to answer specific question types. In the prior 2017 versions, each SMS had contained only the question, without instructions on how to answer.

Send reminders

For the 2017 surveys, we automated a reminder, sent 24 hours after the survey to those who had not yet started or completed it. For the 2018 version we added a second reminder, sent 12 hours later.
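In rough pseudocode terms, the schedule amounts to something like the sketch below (illustrative Python, not the Echo Platform’s scheduler): a first reminder 24 hours after the survey send, and a second 12 hours after that, sent only to contacts who have not yet completed.

```python
from datetime import datetime, timedelta

def reminder_times(sent_at: datetime) -> list:
    """First reminder at +24h; second reminder 12 hours after the first."""
    first = sent_at + timedelta(hours=24)
    second = first + timedelta(hours=12)
    return [first, second]

def due_reminders(sent_at: datetime, completed: bool, now: datetime) -> list:
    """Reminders that should have gone out by `now` for a contact who hasn't completed."""
    if completed:
        return []
    return [t for t in reminder_times(sent_at) if t <= now]

sent = datetime(2018, 2, 1, 9, 0)
print(due_reminders(sent, completed=False, now=datetime(2018, 2, 2, 22, 0)))
# both reminders are due: 2018-02-02 09:00 and 2018-02-02 21:00
```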

Reminders like these nudge contacts who are willing to respond to the survey but may have become distracted before completing it. This is especially true for long surveys like the one we have been deploying for the UN, which risk respondent fatigue. Reminders are a subtle way of urging them to finish the survey. Better yet — keep it lean!

So, what’s the take away here?

While research on the potential impact of financial incentives is clear, no amount of money or airtime can make up for suboptimal survey design!

Monetary rewards can move the response rate in the margins, but not always, and only if you get the design right first. Financial incentives are complementary to a well designed survey that has useful and clear content, an efficient structure, and a personal tone.

That said, non-financial incentives — the broader reasons why your contacts might want to engage with you at all — are an extremely important consideration. Not everyone’s time and information can be bought.

Consider for your next survey or engagement what informational, relational, or emotional incentives you might be explicitly or implicitly offering up front. As with any relationship, both sides ultimately need to feel like there is some benefit to the commitment. We’ll blog more about this idea soon!

Want to learn more from the Echo Deployment team? We consult on mobile engagement strategy and techniques, and can provide implementation support for survey creation, setup, optimization, deployment, and tracking on the Echo Platform.

Early Concepts for Designing and Evaluating Blockchain Interventions for Behavior Change

Guest post by Michael Cooper, a former DoS, MCC Associate Director for Policy and Evaluation who now runs Emergence.  Mike advises numerous donors, private clients and foundations on program design, MEL, adaptive management and other analytical functions.

International development projects using the blockchain in some way are increasing at a rapid rate, and our window for developing evidence around what does and does not work (and, more importantly, why) is narrow before we run into unintended consequences. Given that blockchain is a highly disruptive technology, those unintended consequences could be significant, creating greater urgency to generate the evidence to guide how we design and evaluate blockchain applications.

Our window for developing evidence around what does and does not work (and more importantly why) is narrow before we run into unintended consequences.

To inform this discussion, Emergence has put out a working paper that outlines 1) what the blockchain is, 2) how it can be used to leverage behavior change outcomes in international development projects, and 3) the implications for how we could design and evaluate blockchain-based interventions. The paper uses systems and behaviorism principles to compare how we currently design behavior change interventions with how we could design and evaluate the same interventions using the blockchain. This article summarizes the main points of the paper and its conclusions to generate discussion around how best to produce the evidence we need to fully realize the potential of blockchain interventions for social impact.

Given the scope of possibilities surrounding the blockchain, both in how it could be used and in the impact it could leverage, the implications for how MEL is conducted are significant. The days when value-adding MEL practitioners were not involved in intervention design are long gone. Blockchain-based interventions will require even deeper integration of MEL skill sets in the early design phases, since so much will need to be “tested” to determine what is and is not working. While rigorous statistical evaluations will be needed for some of these blockchain-based interventions, the level of complexity involved and the lack of an evidence base indicate that more flexible, adaptive, and formative MEL approaches will also be needed. The more these approaches are proactive and involved in intervention design, the more frequent and informative the feedback loops into our evidence base will be.

The Blockchain as a Decentralizing Technology

At its core, the blockchain is just a ledger, but the importance of ledgers in how society functions cannot be overstated.  Ledgers, and control over them, are crucial to how supply chains are managed, how financial transactions are conducted, how data is shared, and more.  Control of ledgers is a primary factor in limiting access to life-changing goods and services, especially for the world's poor.  In part, the discussion over decentralization is essentially a discussion over who owns ledgers and how they are managed. 

Decentralization has been a prominent theme in international development, and there is strong evidence of its positive impact across various sectors, especially in local service delivery.  One of the primary value adds of decentralization is empowering those further from traditional concentrations of power to have more authority over the problems that affect them.  As a decentralizing technology, the blockchain holds a lot of potential to achieve these same impacts of decentralization (empowerment, etc.) in a more efficient and effective manner, partly due to its ability to better align interests around common problems.  With better-aligned interests, fewer resources (inputs) are needed to facilitate a desired behavior change. 

Up until now, the efforts of international development actors have focused on “nudging” behavior change amongst stakeholders and, in rare cases such as results-based financing, on giving implementers loosely defined parameters with less emphasis on the manner in which outcomes are achieved.  Both of these approaches are relevant to the design and testing of blockchain-based interventions, but they will be integrated in new ways that will require new thinking and skill sets amongst practitioners. 

Current Designing and Evaluating for Behavior Change

MEL usually starts with the relevant theory of change, namely what mechanisms bring about targeted behavior change and how.  Recent years have seen a focus on how behavior change is achieved through an understanding of mindsets and how they can be nudged to achieve a social outcome.  However, the international development space has recognized the limitations of designing interventions that attempt to nudge behavior change.  These limitations center on the level of complexity involved, the inability to recognize and manage that complexity, and a lack of awareness of the root causes of problems.  Hence the rise of approaches like results-based financing, where the kind of prescribed, top-down causal pathway usually laid out in a theory of change is not as heavily emphasized as in more traditional interventions.  Donors using this approach can still mandate certain principles of implementation (such as the inclusion of vulnerable populations, environmental safeguards, timelines, etc.), but there is much more flexibility in how the causal pathway to the outcome is created. 

Or take, for example, the popular PDIA approach, where the focus is on iteratively identifying and solving problems encountered on the pathway to reform.  These efforts do not start with a mandated theory of change; they start with generally described target outcomes, and the pathway to those outcomes is then created iteratively, similar to what Lant Pritchett has called “crawling the design space”.  Such an approach overlaps substantially with adaptive management practices and other more integrative MEL frameworks, and could lend itself to how blockchain-based interventions are designed, implemented, and evaluated. 

How the Blockchain Could Achieve Outcomes and Implications for MEL

Because of its decentralizing effects, the theory of change for a blockchain-based intervention could include some common attributes that influence how outcomes are achieved:

  • Empowerment of those closest to problems to inform the relevant solutions
  • Alignment of interests around these solutions
  • Alleviation of traditional intermediary services and relevant third party actors

Assessing these three attributes, and how they influence outcomes, could be the foundation of any appropriate MEL strategy for a blockchain-based intervention, because these attributes are the “value add” of a blockchain-based intervention.  For example, traditional financial inclusion interventions may seek to extend the financial services of a bank to rural areas through digital money, extension agents, etc.  A blockchain-based solution, however, may cut out the bank entirely and empower local communities to receive financial services from completely new providers anywhere in the world, on much more affordable terms and in a much more convenient manner.  Such a solution could see an alignment of interests amongst producers and consumers of these services, since the new relationships are mutually serving.  Because of this alignment, there is less of a need, or even less of a benefit, for donors to script out the causal pathway by which the outcomes are to be achieved: those closest to the problems and the solutions can work it out, because it is in their interest to do so. 

Hence, while a MEL framework for such a project could still use more standardized measures around outcomes like increased access to financial services, and could even use statistical methods to evaluate questions around attributable changes in poverty status, there will also need to be adaptive and formative MEL that assesses the dynamics of these attributes, given how critical they are to whether and how outcomes are achieved.  The dynamics between these attributes and the surrounding social ecosystem have the potential to be very fluid (going back to the disruptive nature of blockchain technology), so flexible MEL will be required to respond to new trends as they emerge. 

Table: Blockchain Intervention Attributes and the Skill Sets to Assess Them

Blockchain attribute | Possible MEL approaches
Empowerment of those closest to problems to inform the relevant solutions | Problem-driven design and MEL approach; stakeholder mapping (to identify relevant actors); decentralization-focused MEL (MEL that focuses on outcomes associated with decentralization)
Alignment of interests | Political economy analysis to identify incentives and interests; adaptive MEL to assess the shifting alignment of interests between various actors
Alleviation of traditional intermediary services | Political economy analysis to inform a risk mitigation strategy for potential spoilers, and relevant MEL

While MEL feedback will still serve standard accountability and other uses, it could have two primary end uses in a blockchain-based intervention: governance and trust.

The Role of Governance and Trust

Blockchain governance sets out the rules for how consensus (i.e., agreement) is reached on which transactions are valid on a blockchain.  While this may sound mundane, it is critical for achieving outcomes, since how the blockchain is governed determines how well those closest to the problems are empowered to identify and achieve solutions and to align interests.  Hence the governance framework for the blockchain will need to be informed by an appropriate MEL strategy.  A giant learning gap we currently have is how to iteratively adapt blockchain governance structures, using MEL feedback, into increasingly more efficient versions.  Closing this gap will be critical to assessing the cost effectiveness of blockchain-based solutions relative to alternatives (i.e., through cost-benefit analysis and similar tools), as well as to maximizing impact. 

A giant learning gap we currently have is how to iteratively adapt blockchain governance structures, using MEL feedback, into increasingly more efficient versions. 

Another focus of an appropriate MEL strategy would be to facilitate trust in the blockchain-based solution amongst users, much as with other technology-led solutions like mobile money or pay-as-you-go metering for service delivery.  This includes not only the digital interface between the user and the technology (a phone app, SMS, or another interface) but also other dimensions of “trust” that would facilitate uptake of the technology.  These dimensions of trust would be informed by an analysis of the barriers to uptake amongst intended users, given that the technology could be an entirely new service for beneficiaries or an old service delivered in a new fashion.  There is already a good evidence base around what works in this area (e.g., marketing and communication tools for digital financial services, assistance in completing registration paperwork for pay-as-you-go metering, etc.). 

The Road Ahead

There is A LOT we need to learn and a short time to do it in before we feel the negative effects of a lack of preparedness.  This risk is heightened when you consider that the international development industry has a poor track record of designing and evaluating technology-led solutions, primarily because these projects usually neglect uptake of the technology and assume that the technology itself, rather than the people using it as a tool, will drive outcomes. 

The lessons from MEL in results-based financing could be especially informative for the future of evaluating blockchain-based solutions, given the similarities in letting solutions work themselves out and the role of the “validator” in ensuring outcomes are achieved.  In fact, the blockchain has already been used in this role in some simple output-based programming. 

As alluded to above, pre-existing MEL skill sets can add a lot of value to building an evidence base, but MEL practitioners will need to develop a greater understanding of the attributes of blockchain technology; otherwise our MEL strategies will not be suited to blockchain-based programming.

Mobile survey completion rates: Correlation versus causation

by Kim Rodgers, Software Engineer at Echo Mobile. Original post appeared on Medium.

Introduction

We hear the terms “correlation” and “causation” a lot, but what do they actually mean?

Correlation: describes how two variables move in relation to each other. When one variable increases, the other may increase, decrease, or stay the same. For example, when it rains more, people tend to buy more umbrellas.

Causation: implies that one variable causes another variable to change. For example, we can confidently conclude that more rain causes more people to acquire umbrellas.
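To make the distinction concrete, here is a minimal sketch (with entirely made-up numbers) showing that a correlation coefficient can be computed for the rain and umbrella example, but the number itself says nothing about which variable, if either, drives the other:

  import numpy as np

  # Hypothetical daily rainfall (mm) and umbrella sales, purely for illustration.
  rainfall_mm = np.array([0, 5, 12, 20, 35, 50])
  umbrellas_sold = np.array([2, 4, 9, 15, 22, 30])

  # np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is the
  # Pearson correlation between the two variables.
  r = np.corrcoef(rainfall_mm, umbrellas_sold)[0, 1]
  print(f"Pearson correlation: {r:.2f}")  # close to 1.0 here, but that alone proves nothing about causation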

In this post, I will explore what these terms mean and describe a way of deciding how they relate, using a real-world example.

Survey completion rate correlations

Echo Mobile helps organizations in Africa engage, influence, and understand their target audience via mobile channels. Our core product is a web-based SaaS platform that, among many other things, enables users to design, send and analyze the results of mobile surveys. Our users can deploy their surveys via SMS (Short Messaging Service), USSD (Unstructured Supplementary Service Data), IVR (Interactive Voice Response), and Android apps, but SMS is the most heavily used channel.

Surveys are key to our overall mission, as they give our users a tool to better understand their target audiences — usually their customers or beneficiaries. To optimize the effectiveness of this tool, one thing that we really wanted to do was identify key factors that lead to more people completing surveys sent by our users from the Echo platform. This would enable us to advise our users on how to get more value from our platform through better engagement and understanding of their audiences.

The completion rate of a survey is the percentage of people who complete a survey after being invited to take part in it. We came up with different factors that we thought could affect the completion rate of surveys:

  • post_incentive: The incentive (a small amount of money or airtime) offered after completing the survey
  • invite_day_of_month: The date of the month a respondent was invited to the survey
  • invite_day_of_the_week: The day of the week a respondent was asked to take part in the survey
  • invite_hour: The hour of the day the respondent was invited to the survey
  • num_questions: The number of questions in the survey
  • reminded: whether the respondent was reminded to complete the survey or not
  • channel: The channel through which the survey was deployed (SMS, USSD, IVR, web, or Android app). SMS is the most popular channel and accounts for over 90% of surveys
  • completion_rate: Of those invited to a survey, the percentage that completed

We used the performance of surveys deployed from the beginning of 2017 to August 2017 to look for correlations between the factors above. The correlations between the factors are shown in the table below. Since the focus was on how the completion rate relates to the other factors, I will concentrate on those relationships.
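For readers who want to reproduce this kind of analysis on their own data, here is a minimal sketch of the computation. The file name and exact column names are assumptions for illustration (matching the factor list above), not the actual Echo platform export format:

  import pandas as pd

  # Hypothetical per-survey performance export; one row per survey, columns
  # named after the factors listed above.
  surveys = pd.read_csv("survey_performance_2017.csv")

  # Pairwise Pearson correlations between the numeric factors, then the
  # correlations with completion_rate sorted from strongest positive downwards.
  corr = surveys.select_dtypes(include="number").corr()
  print(corr["completion_rate"].sort_values(ascending=False))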

The bigger the correlation magnitude, the stronger the relationship. A positive correlation indicates that when one factor increases, the other tends to increase as well. For a negative correlation, the relationship is inverse: when one increases, the other decreases.

Table: Correlations between different survey factors (completion_rate has the strongest correlation with invite_hour)

The rows of the table are arranged in descending order of correlation with the completion rate. Looking at the table, invite_hour, with a positive correlation of 0.25, has the strongest correlation with the completion rate. It is followed by reminded, while invite_day_of_month is the most negatively correlated with completion_rate. The correlation between any other pair of factors can also be read from the table; for example, the correlation between num_questions and reminded is 0.05.

Survey completion causations?

The findings above can lead to incorrect conclusions if one is not careful. For example, one might conclude that invite hour, with a correlation of 0.25, has the greatest causal influence on the completion_rate of a survey. As a result, you might start searching for the right time to send out surveys in the hope of getting more of them completed, and eventually conclude that a particular invite hour is the optimal time to send a survey. But that would be to assume, incorrectly, that correlation implies causation.

A high correlation may mean that one factor causes the other, that the two factors jointly influence each other, that both are caused by the same separate third factor, or even that the correlation is simply a coincidence.

We can, therefore, see that correlation does not always imply causation. With careful investigation, however, it is possible to more confidently conclude whether correlation implies that one variable causes the other.

How can we verify if correlation might imply causation?

1. Use statistically sound techniques to determine the relationship.

Ensure that you use statistically legitimate methods to find the correlation. These include:

  • Use variables that correctly quantify the relationship.
  • Check for, and deal with, outliers that could distort the correlation.
  • Ensure the sample is an appropriate representation of the population.
  • Use a correlation coefficient appropriate to the scales of the relationship metrics (see the sketch after this list).
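
To illustrate the last point, here is a small sketch comparing two coefficients on hypothetical values of two of the factors above. Pearson assumes interval-scale data and a roughly linear relationship, while Spearman only assumes a monotonic relationship on ranks, so it is the safer choice for ordinal data:

  from scipy.stats import pearsonr, spearmanr

  # Hypothetical per-survey values for two of the factors above.
  invite_hour = [8, 9, 11, 13, 15, 17, 20]
  completion_rate = [35, 40, 48, 44, 52, 58, 50]

  # Pearson: interval-scale data, roughly linear relationship assumed.
  r, p = pearsonr(invite_hour, completion_rate)

  # Spearman: rank-based, only assumes a monotonic relationship.
  rho, p_rank = spearmanr(invite_hour, completion_rate)

  print(f"Pearson r = {r:.2f} (p = {p:.3f})")
  print(f"Spearman rho = {rho:.2f} (p = {p_rank:.3f})")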

2. Explain the relationships found

  • Check that the exposure always precedes the outcome: if A is supposed to cause B, A should always occur before B.
  • Check whether the relationship ties in with existing theories.
  • Check whether the proposed relationship is similar to relationships found in related fields.
  • Check that no other variable can explain the relationship. The classic example: sleeping with your shoes on is correlated with waking up with a headache, but the more plausible explanation for both is drinking the night before.

3. Validate the relationships

  • The relationships established in steps 1 and 2 above should be tested to determine whether they hold. The common methods of testing are experiments and checking the consistency of the relationship. An experiment usually requires a model of the relationship, a testable hypothesis based on the model, variance control measures, collection of suitable metrics for the relationship, and an appropriate analysis. Experiments repeated several times should lead to consistent conclusions (see the sketch below).
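
As a sketch of what the analysis of such an experiment could look like, imagine respondents randomly assigned to a morning or an evening invite hour; a chi-square test then checks whether the difference in completion is larger than chance alone would suggest. The group sizes and counts below are entirely hypothetical:

  from scipy.stats import chi2_contingency

  # Hypothetical randomized experiment: invitations sent at 9am vs 6pm.
  # Each row: [completed, did not complete]
  morning_group = [180, 320]
  evening_group = [150, 350]

  chi2, p_value, dof, expected = chi2_contingency([morning_group, evening_group])

  print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
  # A small p-value, replicated across repeated well-controlled experiments,
  # would support (not prove) a causal effect of invite hour on completion.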

We have not yet carried out these tests on our completion rate correlations. So we don’t yet know, for example, whether particular invite hours cause higher completion rates — only whether they are correlated.

Conclusion

We need to be careful before concluding that a particular relationship implies causation. It is generally better not to have a conclusion than to land on an incorrect one which might lead to wrong actions being taken!


The original version of this post was written by Kim Rodgers. Kim works at Echo Mobile as a Software Engineer, is interested in data science, and enjoys writing.