Tag Archives: excel

Making (some) sense of data storage and presentation in Excel

By Anna Vasylytsya. Anna is in the process of completing her Master’s in Public Policy with an emphasis on research methods. She is excited about the role that data can play in improving people’s lives!

At the MERL Tech Conference, I attended a session called “The 20 skills that solve 80% of M&E problems” presented by Dr. Leslie Sage of DevResults. I was struck by the practical recommendations Leslie shared that can benefit anyone that uses Excel to store and/or present data.

I boiled down the 20 skills presented in the session into three key takeaways, below.

1. Discerning between data storage and data presentation

Data storage and data presentation serve two different functions and never the two shall meet. In other words, data storage is never data presentation.

Proper data storage should not contain merged cells, subheadings, color used to denote information, different data types within cells (numbers and letters), more than one piece of data in a cell (such as disaggregations). Additionally, in proper data storage, columns should be the variables and rows as the observations or vice versa. Poor data storage practices need to be avoided because they mean that you cannot use Excel’s features to present the data.

A common example of poor data storage:

Excel 1

 

One of the reasons that this is not good data storage is because you are not able to manipulate this data using Excel’s features. If you needed this data in a different format or you wanted to visualize it, you would have to do this manually, which would be time consuming.

Here is the same data presented in a “good” storage format:

2Good_Data_Storage

 

Data stored this way may not look as pretty, but it is not meant to be presented or read in within the sheet. This is an example of good data storage because each unique observation gets a new row in the spreadsheet. When you properly store data, it is easy for Excel to aggregate the data and summarize it in a pivot table, for example.

2. Use Excel’s features to organize and clean data

You do not have to use precious time to organize or clean data manually. Here are a few recommendations on Excel’s data organization and cleaning features:

  • To join to cells that have text into one cell, use the concatenate function.
  • To split text from one cell into different cells, use the text to columns
  • To clean text data, use Excel’s functions: trim, lower, upper, proper, right, left, and len.
  • To move data from rows into columns or columns into rows, use Excel’s transpose feature.
  • There is a feature to remove duplicates from the data.
  • Create a macro to automate simple repetitive steps in Excel.
  • Insert data validation in an excel spreadsheet if you are sending a data spreadsheet to implementers or partners to fill out.
    • This restricts the type of data or values that can be entered in certain parts of the spreadsheet.
    • It also saves you time from having to clean the data after you receive it.
  • Use the vlookup function in Excel in your offline version to look up a Unique ID
    • Funders or donors normally require that data is anonymized if it is made public. While not the best option for anonymizing data, you can use Excel if you haven’t been provided with specific tools or processes.
    • You can create an “online” anonymized version that contains a Unique ID and an “offline version” (not public) containing the ID and Personally Identifiable Information (PII). Then, if you needed to answer a question about a Unique ID, for example, your survey was missing data and you needed to go back and collect it, you can use vlookup to find a particular record.

3. Use Excel’s features to visualize data

One of the reasons to organize data properly so that you can use Excel’s Pivot Table feature.

Here is an example of a pivot table made from the data in the good data storage example above (which took about a minute to make):

3Pivot_Table

Using the pivot table, you can then use Excel’s Create a Chart Feature to quickly make a bar graph:

4BarGraph

In the Future

I have fallen prey to poor data storage practices in the past. Now that I have learned these best practices and features of Excel, I know I will improve my data storage and presentation practices. Also, now that I have shared them with you; I hope that you will too!

Please note that in this post I did not discuss how Excel’s functions or features work or how to use them. There are plenty of resources online to help you discover and explore them. Some helpful links have been included as a start. Additionally, the data presented here are fictional and created purely for demonstration purposes.

Exploring Causality in Complex Situations using EvalC3

By Hur Hassnain, Monitoring, Evaluation, Accountability and Learning Adviser, War Child UK

At the 2017 MERL Tech London conference, my team and I gave a presentation that addressed the possibilities for and limitations of evaluating complex situations using simple Excel-based tools. The question we explored was: can Excel help us manipulate data to create predictive models and suggest  promising avenues  to project success? Our basic answer was “not yet,” at least not to its full extent. However, there are people working with accessible software like Excel to make analysis simpler for evaluators with less technical expertise.

In our presentation, Rick Davies, Mark Skipper and I showcased EvalC3, an Excel based evaluation tool that enables users to easily identify sets of attributes in a project dataset and to then compare and evaluate the relevance of these attributes to achieving the desired outcome. In other words, it helps answer the question ‘what combination of factors helped bring about the results we observed?’ In the presentation, after we explained what EvalC3 is and gave a live demonstration of how it works, we spoke about our experience using it to analyze real data from a UNICEF funded War Child UK project in Afghanistan–a project that helps children who have been deported back to Afghanistan from Iran.

Our team first learned of EvalC3 when, upon returning from a trip to our Afghanistan country programme, we discussed how our M&E team in Afghanistan uses Excel for storing and analysing data but is not able to use the software to explore or evaluate complex causal configurations. We reached out to Rick with this issue, and he introduced us to EvalC3. It sounded like the solution to our problem, and our M&E officer in Afghanistan decided to test it by using it to dig deeper into an Excel database he’d created to store data on one thousand children who were registered when they were deported to Afghanistan.  

Rick, Hosain Hashmi (our M&E Officer in Afghanistan) and I formed a working group on Skype to test drive EvalC3. First, we needed to clean the data. To do this, we asked our social workers to contact the children and their caretakers to collect important missing data. Missing data is a common problem when collecting data in fragile and conflict affected contexts like those where War Child works. Fortunately, we found that EvalC3 algorithms can work with some missing data, with the tradeoff being slightly less accurate measures of model performance. Compare this to other algorithms (like Quine-McCluskey used in QCA) which do not work at all if the data is missing for some variables. We also had to reduce the number of dimensions we used. If we did not, there could be millions of combinations that could be possible outcome predictors, and an algorithm could not search all of these possibilities in a reasonable span of time. This exercise spoke to M. A. Munson’s theory that “model building only consumes 14% of the time spent on a typical [data mining] project; the remaining time is spent on the pre and post processing steps”.

With a few weeks of work on the available dataset of children deported from Iran, we found that the children who are most likely to go back to Iran for economic purposes are mainly the children who:

  • Are living with friends (instead of with. relatives/caretakers)
  • Had not been doing farming work when they were in Iran
  • Had not completed 3 months vocational training
  • Are from adult headed households (instead of from child headed households).

As the project is still ongoing, we will continue to  investigate the cases covered by the model described here in order to better understand the causal mechanisms at work.

This experience of using EvalC3 encouraged War Child to refine the data it routinely collects with a view to developing a better understanding of where War Child interventions help or don’t help. The in-depth data-mining process and analysis conducted by the national M&E Officer and programmes team resulted in improved understanding of the results we can achieve by analyzing quality data.  EvalC3 is a user-friendly evaluation tool that is not only useful in improving current programmes but also designing new and evidence based programmes.