Responsible Data Resource List

Curated by participants of MERL Tech.

Responsible data is:
The duty to ensure people’s rights to consent, privacy, security and ownership around the information processes of collection, analysis, storage, presentation and reuse of data, while respecting the values of transparency and openness.  (Responsible Data Forum, working definition, September 2014)

A slightly different definition (as outlined in work by Sonjara/mSTAR for USAID) considers responsible data as a balance between 1) responsible data use (e.g., being sure that development organizations actually use the data we collect/that have been collected and shared by others); 2) transparency and accountability (to those we collect data from, to those who fund data collection/use, to the public); and 3) data privacy and security when we are stewarding or using data we have collected.

DISCUSSIONS

The Responsible Data Forum and mailing list
The Responsible Data Forum is a collaborative effort to develop useful tools and strategies for dealing with the ethical, security and privacy challenges facing data-driven advocacy. It is a collaboration between Amnesty International, Aspiration, The Engine Room, Greenhost, HURIDOCS, Leiden University’s Peace Informatics Lab, Open Knowledge and Ushahidi.
https://responsibledata.io/
https://lists.theengineroom.org/lists/subscribe/responsible_data

How to Balance the Tension Between Open Data and Privacy and Security (2016) blog post by Siobhan Green and Linda Raftree
Major opportunities for open, interoperable and shared data in international development include improved data for monitoring and evaluation and performance management; improved subnational data; creation of data “assets” by different audiences; re-use and validation of data. However, most of the pros are also cons, and vice versa. For example, open data can improve accountability but it can also increase liability. Tracking personally identifiable information can mean improved transparency but also greater vulnerability. This post provides a summary of a Technology Salon round table discussion.
https://www.ictworks.org/how-to-balance-the-tension-between-open-data-and-privacy-and-security/#.W1nd5dj0mgQ

POLICY, STANDARDS & FRAMEWORKS

Oxfam Responsible Data Policy (2016)
A policy focused on Oxfam’s commitment to treat programme data with respect and to uphold the rights of the people the data is about.
http://policy-practice.oxfam.org.uk/publications/oxfam-responsible-program-data-policy-575950

ICRC Professional Standards for Protection Work (2013)
Chapter 6 of the ICRC standards focuses on managing protection information.
https://www.icrc.org/eng/assets/files/other/icrc-002-0999.pdf

ICRC Data Protection in Humanitarian Action (2017)
This publication builds on previous guidance from the ICRC and includes new guidance on the management of personal data in humanitarian situations, including guidance on data analytics and big data; use of UAVs, drones and satellite imagery; remote sensing; biometrics; cash transfer programming; cloud services and mobile messaging apps.
http://reliefweb.int/sites/reliefweb.int/files/resources/4305_002_Data_protection_and_humanitarian_action.pdf

Using Spatial Data Wisely and Ethically (2017)
A framework for addressing privacy in geospatial data, from Measure Evaluation.
https://www.measureevaluation.org/resources/publications/sr-17-143/at_download/document

Ethical considerations when using geospatial technologies for evidence generation (2018)
Geospatial technologies have transformed the way we visualize and understand social phenomena and physical environments. There are significant advantages in using these technologies and data; however, their use also presents ethical dilemmas such as privacy and security concerns as well as the potential for stigma and discrimination resulting from being associated with particular locations. Therefore, the use of geospatial technologies and resulting data needs to be critically assessed through an ethical lens prior to implementation of programmes, analyses or partnerships. This paper examines the benefits, risks and ethical considerations when undertaking evidence generation using geospatial technologies. It is supplemented by a checklist that may be used as a practical tool to support reflection on the ethical use of geospatial technologies.
https://www.unicef-irc.org/publications/971-ethical-considerations-when-using-geospatial-technologies-for-evidence-generation.html

The International Organization for Migration’s Data Protection Manual (2010)
A comprehensive data protection guide from the IOM.
http://publications.iom.int/system/files/pdf/iomdataprotection_web.pdf

World Food Programme’s Guide to Personal Data Protection and Privacy (2016)
A comprehensive data protection guide from the WFP.
https://docs.wfp.org/api/documents/e8d24e70cc11448383495caca154cb97/download/

Mapping and Comparing Responsible Data Approaches (2016)
The need for a responsible data governance approach has now been recognized by a wide variety of organizations, both within and outside the humanitarian space. The goal of this paper is to examine some of these existing approaches and, based on a comparative analysis, to identify best practices and innovative approaches to governing data in humanitarian contexts. To that end, Leiden University’s Centre for Innovation and the Governance Laboratory at New York University (The GovLab) have together undertaken a mapping exercise of 17 existing responsible data approaches. Taken together, these can serve as a toolkit for any organizations—particularly those operating in the humanitarian space—seeking to use data more responsibly and effectively.
http://www.thegovlab.org/static/files/publications/ocha.pdf

Girl Effect’s Girl’s Digital Safeguarding Guidelines (2016 and 2018)
The document offers staff and partners guidance on how to protect the privacy, security and safety of adolescent girls when developing digital tools and platforms, partnering with others, or using data in monitoring, evaluation and learning efforts.

2018 version (now with GDPR!):
https://prd-girleffect-corp.s3.amazonaws.com/documents/Digital_Safeguarding_-_FINAL.pdf

2016 version:
http://www.ictworks.org/wp-content/uploads/2016/05/GE-Girl-Digital-Privacy-Security-Safety-v-May-2016.pdf

Protection Information Management (PIM) Training Resource Pack (2018)
Protection Information Management (PIM) refers to the principled, systematized, and collaborative processes to collect, process, analyse, store, share, and use data and information to enable evidence-informed action for quality protection outcomes. These 5 training modules aim to help protection staff learn to manage data responsibly.
http://pim.guide/uncategorized/pim-training-resource-pack/

Protecting data, protecting residents (2017)
10 principles for municipal authorities on managing data
https://sunlightfoundation.com/wp-content/uploads/2017/02/Protecting-data-protecting-residents-whitepaper.pdf

Blockchain Ethical Design Framework (2018)
Provides a set of guidelines for ethical design of blockchain when being used by social impact organizations. “This is particularly important in blockchain, in which the rules governing the human interactions with the technology are determined from the earliest stages of design and can be exceedingly difficult to change once the technology is implemented.”
http://beeckcenter.georgetown.edu/wp-content/uploads/2018/06/The-Blockchain-Ethical-Design-Framework.pdf

Ethical considerations when using social media for evidence generation (2018)
There are significant ethical implications in the adoption of technologies and the production and use of the resulting data for evidence generation. The potential benefits and opportunities need to be understood in conjunction with the potential risks and challenges. When using social media to directly engage children and their communities, or when establishing partnerships with these organizations for data collection and analysis, adoption of these technologies and their resultant data should not be exclusively driven by short-term necessity but also by the long-term needs of our younger partners. When engaging with social media and indeed most technology, thoughtfulness, reflection and ongoing interrogation is required. This paper examines the benefits, risks and ethical considerations when undertaking evidence generation: (a) using social media platforms and (b) using third-party data collected and analysed by social media services. It is supplemented by practical tools to support reflection on the ethical use of social media platforms and social media data.
https://www.unicef-irc.org/publications/967-ethical-considerations-when-using-social-media-for-evidence-generation-discussion.html

Getting to Good Human Trafficking Data: Everyday Guidelines for Frontline Practitioners (2018)
Ultimately this document serves as a catalyst to assess and enhance existing data collection efforts – tailored to the local context with a view to the regional potential – for good, responsible data to combat human trafficking. This guide is intended to serve as a reference document, offering baseline standards and recommendations based on current understanding around good, responsible data practices.
https://handacenter.stanford.edu/publications/getting-good-human-trafficking-data-everyday-guidelines-frontline-practitioners

GENERAL BACKGROUND, TOOLS & TEMPLATES

Responsible Data Frameworks in their Own Words (2017)
This paper reviews 18 frameworks. Part I of this literature review discusses these foundational principles. Part II discusses how responsible data frameworks combine FIPs-based data protection principles and research ethics principles to form a baseline framework for responsible data. The paper uses Oxfam’s Responsible Data Policy, one of the first examples of an organization attempting to implement a transparent data protection policy, as an exemplar of how organizations are embracing responsible data governance. Part III reviews 18 data use frameworks and organizes their principles into six common themes that exist across the frameworks.
https://cdt.org/files/2018/06/2018-06-25-Responsible-Data-Frameworks-In-Their-Own-Words-FULL.pdf

Responsible Data in Development Toolkit (2016)
A practical guide to help people and organisations think through responsible data issues that might apply to their project.
https://responsibledata.io/resources/handbook/

Data Starter Kit for humanitarian staff (2016)
Concise tip sheets on privacy impact assessments, data minimisation, encryption, archiving and more by the Electronic Cash Transfer Learning Action Network, with content from Oxfam, SIMLab, FHI360, The Engine Room, Norwegian Refugee Council and more.
http://elan.cashlearning.org/

Oxfam Responsible Data Training Resources (2016)
A stimulating, adaptable training pack for humanitarian organizations on managing programme data. Available in English, Arabic, French and Spanish.
oxfam.org.uk/responsibledata

A Harm Reduction Framework for Algorithmic Fairness (2018)
This article recognizes the profound effects that algorithmic decision-making can have on people’s lives and proposes a harm-reduction framework for algorithmic fairness. The authors argue that any evaluation of algorithmic fairness must take into account the foreseeable effects that algorithmic design, implementation, and use have on the well-being of individuals. They further demonstrate how counterfactual frameworks for causal inference developed in statistics and computer science can be used as the basis for defining and estimating the foreseeable effects of algorithmic decisions. Finally, they argue that certain patterns of foreseeable harms are unfair. An algorithmic decision is unfair if it imposes predictable harms on sets of individuals that are unconscionably disproportionate to the benefits these same decisions produce elsewhere. Also, an algorithmic decision is unfair when it is regressive, i.e., when members of disadvantaged groups pay a higher cost for the social benefits of that decision.
https://poseidon01.ssrn.com/delivery.php?ID=490074106114121095124080010025092104041021014087045043087028011025124107014085004075048120001122008096009117106065084014023080008007035089016082081078093069010006038012032125092095021066107067125122024107114118108127093121075006100113126078120081115&EXT=pdf

Signal Code (2017)
The Signal Code by the Harvard Humanitarian Initiative (HHI) aims to help advance current and future efforts to create shared ethical obligations for practitioners. Most importantly, the primary goal of the code is to help reduce and prevent the threat of harm to vulnerable populations negatively affected by humanitarian information activities that may violate their rights.
https://signalcode.org/

Indigenous Peoples and Responsible Data: An Introductory Reading List (2017)
Compiles a number of responsible data protocols by and for indigenous communities.
https://responsibledata.io/indigenous-peoples-responsible-data-readings/

Responsible Data: Sustainability of ICTs by Siobhan Green (2017)
Discusses the importance of having realistic expectations of sustainability for any ICT4D project, since our industry (like most others) faces headwinds against transfer to local partners. Gives four definitions of sustainability to consider.
http://sonjara.com/blog?article_id=158

DATA COLLECTION

Improving Data Quality in Mobile Community-Based Health Information Systems–Guidelines for Design and Implementation from Measure Evaluation 
Increasingly, community-based health programs collect data that flow into donor programs and national health information system(s) (HIS). Programs are turning to mobile health (mHealth) technology to address a variety of challenges, including those associated with paper-based reporting systems, such as inefficient filing systems and operational challenges including storage space associated with transporting paper forms and receiving data in a timely manner. Mobile technologies can help programs improve the completeness and accuracy of data, tap the potential for real-time reporting, and strengthen communication and supervisory feedback practices.
https://www.measureevaluation.org/resources/publications/tr-17-182

mHealth Data Security, Privacy, and Confidentiality: Guidelines for Program Implementers and Policymakers and Companion Checklist
The mHealth Data Security, Privacy, and Confidentiality: Guidelines for Program Implementers and Policymakers are intended to strengthen national health information systems (HIS), by providing a tool to guide decisions on security, privacy, and confidentiality of personal health information collected and managed using mobile devices. The guidelines have a companion checklist to guide users through these decisions.
https://www.measureevaluation.org/resources/publications/ms-17-125a

mVAM: Guidance on remote mobile technology for household food security data collection
So, you want to use mobiles to collect food security data? You’re in the right place. This page provides a collection of tutorials and articles on how to use SMS polling software, run a call center and implement an Interactive Voice Response (IVR) system to automatically call households or to disseminate information to beneficiaries.
https://resources.vam.wfp.org/mVAM

Conducting Mobile Surveys Responsibly (from World Food Programme) (2017)
WFP first circulated a corporate policy on data privacy and security in 2016. To implement this policy through practical guidance at the field level, the organization has issued this guide for field staff in collaboration with the International Data Responsibility Group. The field book outlines the main risks for staff engaged in mobile data collection and helps promote responsible data collection, storage and sharing in the very complex environment in which WFP operates.
http://documents.wfp.org/stellent/groups/public/documents/manual_guide_proced/wfp292067.pdf

DATA MANAGEMENT

De-identifying data for non-profit organisations: a webinar series
This series discusses de-identification basics with Mark Elliot, risk analysis and mitigation strategies with Sara-Jayne Terp, and technical de-identification frameworks that are available to practitioners.
https://responsibledata.io/online-discussion-k-anonymity-and-other-de-identification-frameworks-an-introduction/
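
As a rough illustration of one framework named in the series link above, k-anonymity, the sketch below counts how many records share each combination of quasi-identifiers and reports the smallest group size (the dataset's k). It is a minimal sketch only: the field names, the sample rows and the k_anonymity helper are hypothetical and are not taken from the webinars.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for the given quasi-identifier fields (0 if no records)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical survey rows with direct identifiers (names) already removed.
records = [
    {"district": "North", "age_band": "20-29", "sex": "F"},
    {"district": "North", "age_band": "20-29", "sex": "F"},
    {"district": "North", "age_band": "30-39", "sex": "M"},
    {"district": "South", "age_band": "20-29", "sex": "F"},
    {"district": "South", "age_band": "20-29", "sex": "F"},
]

# With all three fields, one person is unique in the data (k = 1).
k_full = k_anonymity(records, ["district", "age_band", "sex"])

# Releasing only the district raises the smallest group size.
k_coarse = k_anonymity(records, ["district"])

print(k_full, k_coarse)
```

A low k means some individuals stand out on the released fields alone; coarsening or dropping fields raises k at the cost of detail. This check says nothing about other risks, such as linkage with outside datasets.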

NIST guidance on de-identification of personal information (2015) 
by Simson Garfinkel
https://nvlpubs.nist.gov/nistpubs/ir/2015/NIST.IR.8053.pdf

NIST guidance on de-identifying government data sets (2016)
by Simson Garfinkel
https://csrc.nist.gov/CSRC/media/Publications/sp/800-188/draft/documents/sp800_188_draft2.pdf

MCC Evaluation Microdata Documentation and De-Identification Guidelines (2017)
These guidelines provide guidance to Millennium Challenge Corporation (MCC) staff and contractors, as well as the staff and contractors of partner governments that receive MCC funding on how to store, manage, and disseminate evaluation microdata collected as part of an MCC-funded program.
https://www.mcc.gov/resources/doc/guidance-evaluation-microdata-guidelines

UK Data Archive
The site is slightly hard to navigate, but has some useful information on data management, ethics, etc.
http://www.data-archive.ac.uk/create-manage/life-cycle

University of Michigan’s Data Stewardship Publications for Researchers
Inter-university Consortium for Political and Social Research (ICPSR) staff and researchers have authored and compiled here a number of white papers, reports, and published articles covering data confidentiality, data curation, data preservation, data sharing, metadata, research transparency, and sustaining domain repositories.
https://www.icpsr.umich.edu/icpsrweb/content/about/data-stewardship.html#metadata 

DATA DELETION / CLOSING DOWN A PROJECT

Your Project Deserves a Good Death
What does a Do Not Resuscitate look like for your organization? Tips on making an imaginary plan for what you would do if you had to shut everything down or step away tomorrow.
https://medium.com/chrysaora-weekly/your-project-deserves-a-good-death-f345026b6e77#.411zgwg03

What it takes to truly delete data
Three key principles to how data gets deleted in today’s technological age.
https://fivethirtyeight.com/features/what-it-takes-to-truly-delete-data/

RISK-BENEFIT ANALYSIS

Future of Privacy Forum: Benefit-Risk Analysis for big data projects (2014)
Offers a pathway for combining a privacy impact assessment (PIA) with a data benefit analysis, and for mapping benefits to privacy risks.
https://fpf.org/wp-content/uploads/FPF_DataBenefitAnalysis_FINAL.pdf

UK Information Commissioner’s Office guidance on Data Protection Impact Assessments (2018)
https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/data-protection-impact-assessments-dpias/

BIG DATA

GSMA guidelines on the protection of privacy in the use of mobile phone data for responding to the Ebola outbreak (2014)
When Call Data Records (CDRs) are used to help in the response to the Ebola outbreak, mobile operators wish to ensure mobile users’ privacy is respected and protected and any associated risks are addressed. This document outlines, in broad terms, the privacy standards that mobile operators will apply when subscriber mobile phone data is used, in these exceptional circumstances, for responses to the Ebola outbreak.
https://www.gsma.com/mobilefordevelopment/wp-content/uploads/2014/11/GSMA-Guidelines-on-protecting-privacy-in-the-use-of-mobile-phone-data-for-responding-to-the-Ebola-outbreak-_October-2014.pdf

Children and the Data Cycle: Rights and Ethics in a Big Data World (2017)
A paper discussing issues related to the use of big data about children.
https://www.unicef-irc.org/publications/pdf/IWP_2017_05.pdf

UNDG Big Data Guidance note (2017)
Sets out general guidance on data privacy, data protection and data ethics concerning the use of big data, collected in real time by private sector entities as part of their business offerings, and shared with the UN for the purposes of strengthening operational implementation of programmes to support the achievement of the 2030 Agenda. The Guidance Note is designed to establish common principles; serve as a risk-management tool taking into account fundamental human rights; and set principles for obtaining, retention, use and quality control for data from the private sector.
https://undg.org/wp-content/uploads/2017/03/UNDG-Big-Data-Guidance-Note.pdf

A Guide to Data Innovation for Development (2016)
A guide from UN Global Pulse that provides a step-by-step approach to a big data project, including data applicability and data responsibility aspects.
http://unglobalpulse.org/sites/default/files/UNGP_BigDataGuide2016_%20Web.pdf

The Privacy Tools Project (Harvard University)
The Privacy Tools Project is a broad effort to advance a multidisciplinary understanding of data privacy issues and build computational, statistical, legal, and policy tools to help address these issues in a variety of contexts. It is a collaborative effort between Harvard’s Center for Research on Computation and Society, Institute for Quantitative Social Science, Berkman Klein Center for Internet & Society, and Data Privacy Lab, and MIT Libraries’ Program on Information Science.
https://privacytools.seas.harvard.edu/

OPEN DATA

Responsible Data in Agriculture
This paper discusses how to improve the quality and use of, and access to, open data in the agriculture sector so that more people are able to use it.
http://www.godan.info/documents/responsible-data-agriculture

CASE STUDIES

Security lapses at aid agency leave beneficiary data at risk (IRIN, 2017)
An article about a security lapse at CRS in which a system called Red Rose was breached, exposing the financial and personal data of vulnerable groups.
https://www.irinnews.org/investigations/2017/11/27/security-lapses-aid-agency-leave-beneficiary-data-risk

Legal battle over information collection on undocumented migrants
Lawmakers arguing that information collected for one purpose (access to social and educational services) should be made available for the public to inspect.
http://www.kingscountypolitics.com/malliotakis-de-blasio-duke-idnyc-records/

HIV clinic reveals hundreds of patients’ identities
The 56 Dean Street clinic in London apologises after sending newsletter disclosing names and email addresses of 780 people, many living with HIV
https://www.theguardian.com/technology/2015/sep/02/london-clinic-accidentally-reveals-hiv-status-of-780-patients

Public bodies regularly releasing personal information by accident in Excel files
When officers within public bodies prepare FOI releases, they often import personally identifiable information into Excel and attempt to summarise it in anonymous form, typically using pivot tables or charts, and then release a file they think they have anonymised. We have seen a variety of public bodies, including councils, the police, and parts of the NHS, accidentally release personal information in this way.
https://www.mysociety.org/2013/06/13/whatdotheyknow-team-urge-caution-when-using-excel-to-depersonalise-data/

Ebola: A big data disaster
The paper highlights the absence of a dialogue around the significant legal risks posed by the collection, use, and international transfer of personally identifiable data and humanitarian information, and the grey areas around assumptions of public good. The paper calls for a critical discussion around the experimental nature of data modeling in emergency response due to mismanagement of information has been largely emphasized to protect the contours of human rights.
https://cis-india.org/papers/ebola-a-big-data-disaster

The difficulty of truly anonymising data
Yahoo released data from 20 million users, which the company says have been “anonymized.” But Yahoo’s data could be used by someone with access to other datasets or public records to infer someone’s identity.
https://motherboard.vice.com/en_us/article/yahoos-gigantic-anonymized-user-dataset-isnt-all-that-anonymous

‘Data is a fingerprint’: why you aren’t as anonymous as you think online
This article gives a fairly straightforward explanation of several cases where “anonymized” data was re-identified.
https://www.theguardian.com/world/2018/jul/13/anonymous-browsing-data-medical-records-identity-privacy

Home Office uses charity data map to deport rough sleepers
The Home Office secretly acquired sensitive data showing the nationality of people sleeping rough on the streets, in order to remove them from Britain, the Observer can reveal. Internal correspondence shows the Home Office repeatedly requesting and finally gaining access to a map created by the Greater London Authority (GLA) that identified and categorised rough sleepers by nationality. The secret arrangement meant frontline outreach workers tasked with helping the homeless by collating data for the GLA were inadvertently helping the Home Office to remove people who were from the EU or central eastern Europe.
https://www.theguardian.com/uk-news/2017/aug/19/home-office-secret-emails-data-homeless-eu-nationals

AI is in desperate need of an ethical watchdog – Wired Magazine
https://www.wired.com/story/ai-research-is-in-desperate-need-of-an-ethical-watchdog?mbid=social_fb

Scarcity of Data Protection Laws in Africa Leaves NGOs Exposed
The level of data protection regulations varies widely across the African continent. Some countries, such as Senegal, have rushed to adopt and implement regulations. But the majority have no legislation in place. And where efforts at regulations exist, experts are concerned that governments are primarily interested in giving themselves more leeway to pursue cybercrimes — or cybercritics — than they are in protecting people’s data. This landscape of patchy privacy legislation presents a significant risk to local organizations working with marginalized communities or around sensitive subjects, leaving both themselves and the people they are trying to help vulnerable. It also creates problems for international groups looking to partner with local agencies, raising questions about what information can safely be gathered and shared.
https://www.devex.com/news/scarcity-of-data-protection-laws-in-africa-leaves-ngos-exposed-93008

DATA PLATFORMS

Humanitarian Data Exchange
Find, share and use humanitarian data all in one place, powered by UNOCHA.
https://data.humdata.org/

Global Healthsites Mapping Project
Healthsites is an initiative to build an open data commons of health facility data with OpenStreetMap
https://healthsites.io/

STANDARDS 

Humanitarian Exchange Language
HXL is a different kind of data standard, designed to improve information sharing during a humanitarian crisis without adding extra reporting burdens.
http://hxlstandard.org/

50 Humanitarian IM tips
A quick guide by Simon Johnson for anyone who is working, or is going to be working, in information management in a humanitarian context.
http://simonbjohnson.github.io/im-tips/#/frontcover