By Mala Kumar, GitHub Social Impact, Open Source for Good
I lead a program on the GitHub Social Impact team called Open Source for Good — detailed in a previous MERL Tech post and (back when mass gatherings in large rooms were routine) at a lightning talk at the MERL Tech DC conference last year.
Before joining GitHub, I spent a decade wandering around the world designing, managing, implementing, and deploying tech-for-international-development (ICT4D) software products. In my career, I found open source in ICT4D tends to be a polarizing topic, and often devoid of specific arguments. To advance conversations on the challenges, barriers, and opportunities of open source for social good, my program at GitHub led a year-long research project and produced a culminating report, which you can download here.
One of the hypotheses I posed at the MERL Tech conference last year, and that our research subsequently confirmed, is that IT departments and ICT4D practitioners in the social sector* have less budgetary decision-making power than their counterparts at corporate IT companies. This makes it hard for IT and ICT4D staff to justify the use of open source in their work.
In the past year, Open Source for Good has solidified its strategy around helping the social sector more effectively engage with open source. To that end, we started the MERL Center, which brings together open source experts and MERL practitioners to create resources to help medium and large social sector organizations understand if, how, and when to use open source in their MERL solutions.**
With the world heading into unprecedented economic and social change and uncertainty, we’re more committed than ever at GitHub Social Impact to helping the social sector effectively use open source and build on the digital ecosystem that already exists.
Thanks to our wonderful working group members, the MERL Center has identified its target audiences, fleshed out the goals of the Center, set up a basic content production process, and is working on a few initial contributions to its two working groups: Case Studies and Beginner’s Guides. I’ll announce more details in the coming months, but I am also excited to announce that we’re committing funds to get a MERL Center public-facing website live to properly showcase the materials the MERL Center produces and how open source can support technology-enabled MERL activities and approaches.
As we ramp up, we’re now inviting more people to join the MERL Center working groups! If you are a MERL practitioner with an interest in or knowledge of open source, or you’re an open source expert with an interest in and knowledge of MERL, we’d love to have you! Please feel free to reach out to me with a brief introduction to you and your work, and I’ll help you get on-boarded. We’re excited to have you work with us!
*We define the “social sector” as any organization or company that primarily focuses on social good causes.
In their MERL Tech DC session on qualitative coding, Charles Guedenet and Anne Laesecke from IREX together with Danielle de Garcia of Social Impact offered an introduction to the qualitative coding process followed by a hands-on demonstration on using Excel and Dedoose for coding and analyzing text.
They began by defining content analysis as any effort to make sense of qualitative data that takes a volume of qualitative material and attempts to identify core consistencies and meanings. More concretely, it is a research method that uses a set of procedures to make valid inferences from text. They also shared their thoughts on what makes for a good qualitative coding method.
In their view, a good qualitative coding method should:
consider what is already known about the topic being explored
be logically grounded in this existing knowledge
use existing knowledge as a basis for looking for evidence in the text being analyzed
With this definition laid out, they moved to a discussion of the coding process, where they elaborated on four general steps:
develop codes and a codebook
decide on a sampling plan
code your data (and go back and do it again!)
test for reliability
Developing codes and a codebook is important for establishing consistency in the coding process, especially if there will be multiple coders working on the data. A good way to start developing these codes is to consider what is already known. For example, you can think about literature that exists on the subject you’re studying. Alternatively, you can simply turn to the research questions the project seeks to answer and use them as a guide for creating your codes. Beyond this, it is also useful to go through the content and think about what you notice as you read. Once a codebook is created, it will lend stability and some measure of objectivity to the project.
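To make the idea concrete, a codebook can be as simple as a structure that pairs each code with a definition and an example quote, so that every coder applies the codes the same way. The codes, definitions, and quotes below are invented purely for illustration:

```python
# A hypothetical codebook: each entry names a code, defines when to
# apply it, and gives an example quote to keep coders consistent.
codebook = {
    "barrier_cost": {
        "definition": "Respondent cites price or fees as an obstacle",
        "example": "We stopped going because the clinic fees doubled.",
    },
    "barrier_distance": {
        "definition": "Respondent cites travel time or distance",
        "example": "The nearest office is a three-hour walk away.",
    },
}

# Print a quick reference sheet for the coding team
for code, entry in codebook.items():
    print(f"{code}: {entry['definition']}")
```

Keeping the codebook in a structured file (rather than scattered notes) also makes it easy to version and revise as the team refines the codes.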
The next important issue is the question of sampling. When determining sample size, though a larger sample will yield more robust results, one must of course consider the practical constraints of time, cost, and effort. Does the benefit of higher quality results justify the additional investment? Fortunately, the type of data will often inform sampling. For example, if there is a huge volume of data, it may be impractical to analyze it all; in that case, it is prudent to sample at least 30% of it. On the other hand, interview and focus group data will usually all be analyzed, because otherwise the effort of obtaining the data would have gone to waste.
Regarding sampling method, the session leads highlighted two strategies that produce sound results. One is systematic random sampling; the other is quota sampling, a method employed to ensure that demographic groups are represented in fair proportion to the data.
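As a rough sketch, systematic random sampling of roughly the 30% suggested above might look like this; the corpus and fraction are illustrative, not from the session:

```python
import random

def systematic_sample(items, fraction=0.3):
    """Systematic random sampling: choose a random start, then take
    every k-th item, with k set to approximate the target fraction."""
    k = max(1, round(1 / fraction))   # fraction 0.3 -> every 3rd item
    start = random.randrange(k)       # random offset into the first k items
    return items[start::k]

# Hypothetical corpus of 100 open-ended survey responses
responses = [f"response_{i}" for i in range(100)]
sample = systematic_sample(responses)
print(len(sample))  # 33 or 34 items, roughly 30% of the corpus
```

The random starting offset is what keeps the method "random": every item has an equal chance of selection, while the fixed interval spreads the sample evenly across the corpus.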
Once these key decisions have been made, the actual coding can begin. Here, all coders should work from the same codebook and apply the codes to the same unit of analysis. Typical units of analysis are: single words, themes, sentences, paragraphs, and items (such as articles, images, books, or programs). Consistency is essential. A way to test the level of consistency is to have a 10% overlap in the content each coder analyzes and aim for 80% agreement between their coding of that content. If the coders are not applying the same codes to the same units, this could mean either that they are not properly trained or that the codebook needs to be altered.
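The 10% overlap / 80% agreement check described above amounts to computing simple percent agreement on the shared units. The codes below are invented for illustration; note that real studies often also report a chance-corrected measure such as Cohen's kappa:

```python
def percent_agreement(codes_a, codes_b):
    """Share of overlap units where two coders applied the same code."""
    assert len(codes_a) == len(codes_b), "coders must code the same units"
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

# Hypothetical codes applied by two coders to the same 10 overlap units
coder_1 = ["access", "cost", "cost", "trust", "access",
           "trust", "cost", "access", "trust", "cost"]
coder_2 = ["access", "cost", "trust", "trust", "access",
           "trust", "cost", "access", "trust", "access"]

agreement = percent_agreement(coder_1, coder_2)
print(f"{agreement:.0%}")  # 80% -- just meets the 80% threshold
```

If agreement falls below the threshold, the fix is usually more coder training or a clearer codebook, not discarding the data.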
In a similar vein, the fourth step in the coding process is to test for reliability. Challenges in producing stable and consistent results in coding could include: using a unit of analysis that is too large for a simple code to be reliably applied, coding themes or concepts that are ambiguous, and coding nonverbal items. For each of these, the central problem is that the units of analysis leave too much room for subjective interpretation that can introduce bias. Having a detailed codebook can help mitigate this.
After giving an overview of the coding process, the session leads suggested a few possible strategies for data visualization. One is to use a word tree, which helps one look at the context in which a word appears. Another is a bubble chart, which is useful if one has descriptive data and demographic information. Thirdly, correlation maps are good for showing what sorts of relationships exist among the data. The leads suggested visiting the website stephanieevergreen.com/blog for more ideas about data visualization.
Finally, the leads covered low-tech and high-tech options for coding. On the low-tech end of the spectrum, paper and pen get the job done. They are useful when there are few data sources to analyze, when the coding is simple, and when there is limited tech literacy among the coders. Next up the scale is Excel, which works when there are few data sources and when the coders are familiar with Excel. Then the session leads closed their presentation with a demonstration of Dedoose, which is a qualitative coding tool with advanced capabilities like the capacity to code audio and video files and specialized visualization tools. In addition to Dedoose, the presenters mentioned NVivo and ATLAS.ti as other available qualitative coding software.
Despite the range of qualitative content available for analysis, a few core principles, chief among them consistency and a disciplined methodology, can help ensure that it is analyzed well. And if qualitative coding will be an ongoing part of your organization’s operations, there are several options for specialized software available for you to explore. [Click here for links and additional resources from the session.]
For this year’s MERL Tech DC, we teamed up to do a session on Responsible Data. Based on feedback from last year, we knew that people wanted less discussion on why ethics, privacy and security are important, and more concrete tools, tips and templates. Though it’s difficult to offer specific do’s and don’ts, since each situation and context needs individualized analysis, we were able to share a lot of the resources that we know are out there.
To kick off the session, we quickly explained what we meant by Responsible Data. Then we handed out some cards from Oxfam’s Responsible Data game and asked people to discuss their thoughts in pairs. Some of the statements that came up for discussion included:
Being responsible means we can’t openly share data – we have to protect it
We shouldn’t tell people they can withdraw consent for us to use their data when in reality we have no way of doing what they ask
Biometrics are a good way of verifying who people are and reducing fraud
Following the card game, we asked people to gather around 4 tables, each with a die and a printout of the data lifecycle where each phase corresponded to a number (Planning = 1, Collecting = 2, Storage = 3, and so on…). Each person rolled the die and, based on their number, told a “data story” of an experience, concern, or data failure related to that phase of the lifecycle. Then the group discussed the stories.
For our last activity, each of us took a specific pack of tools, templates and tips and rotated around the 4 tables to share experiences and discuss practical ways to move towards stronger responsible data practices.
Responsible data policy, practices and evaluation of their roll-out
Oxfam released its Responsible Program Data Policy in 2015. Since then, they have carried out six pilots to explore how to implement the policy in a variety of countries and contexts. Emily shared information on these pilots and the results of research carried out by the Engine Room called Responsible Data at Oxfam: Translating Oxfam’s Responsible Data Policy into practice, two years on. The report concluded that the staff who have engaged with Oxfam’s Responsible Data Policy find it both practically relevant and important. One of the recommendations of this research was that Oxfam needed to increase uptake amongst staff and provide an introductory guide to the area of responsible data.
In response, Oxfam created the Responsible Data Management pack (available in English, Spanish, French and Arabic), which includes the game that was played in today’s session along with other tools and templates. The card game introduces some of the key themes and tensions inherent in making responsible data decisions. The examples on the cards are derived from real experiences at Oxfam and elsewhere, and they aim to generate discussion and debate. Oxfam’s training pack also includes other tools, such as advice on taking photos, a data planning template, a poster of the data lifecycle, and general information on how to use the training pack. Emily’s session also encouraged discussion with participants about governance and accountability issues, like who in the organisation manages responsible data and how to make responsible data decisions when each context may require a different action.
Nina shared early results of four case studies mSTAR is conducting together with Sonjara for USAID. The case studies are testing a draft set of responsible data guidelines, determining whether they are adequate for ‘on the ground’ situations and whether projects find them relevant, useful, and usable. The guidelines were designed collaboratively, based on a thorough review and synthesis of responsible data practices and policies of USAID and other international development and humanitarian organizations. To conduct the case studies, Sonjara, Nina, and other researchers visited four programs which are collecting large amounts of potentially sensitive data in Nigeria, Kenya, and Uganda. The researchers interviewed a broad range of stakeholders and looked at how the programs use, store, and manage personally identifiable information (PII). Based on the research findings, adjustments are being made to the guidelines. It is anticipated that they will be published in October.
Linda mentioned that a literature review of responsible data policy and practice has been done as part of the above-mentioned mSTAR project (which she also worked on). The literature review will provide additional resources and analysis, including an overview of the core elements that should be included in organizational data guidelines, an overview of USAID policy and regulations, emerging legal frameworks such as the EU’s General Data Protection Regulation (GDPR), and good practice on how to develop guidelines in ways that enhance uptake and use. The hope is that both the Responsible Data Literature Review and the Responsible Data Guidelines will be suitable for other organizations to adopt and adapt. The guidelines will offer a set of critical questions and orientation, but ethical and responsible data practices will always be context specific and cannot be a “check-box” exercise, given the complexity of all the elements that combine in each situation.
Check out this responsible data resource list, which includes additional tools, tips and templates. It was developed for MERL Tech London in February 2017 and we continue to add to it as new documents and resources come out. After a few years of advocating for ‘responsible data’ at MERL Tech to less-than-crowded sessions, we were really excited to have a packed room and high levels of interest this year!