Data Is Not Neutral: Why Gender-Based Statistics Don’t Tell The Full Story (Yet.)



I’m on the line with Emily Courey Pryor, listening to the thrum of construction work in the background, when she says something that grabs my full attention.

“Data is not neutral.”

Behind her, the team in hard hats proves her point. The US construction market accounted for 3.6% of the nation’s GDP in 2014, has seen 32.5% growth since 2011, and employs 7.8 million people with an average salary of $35,020. From this data, one could extrapolate that construction work is financially stable. However, that raw (and seemingly positive) data fails to account for the 19% of jobs in the industry that have been lost since 2007, unpaid and illegal employment, and rising competition. When we include DIY repairs, unpaid internships, and underpaid immigrant labour, the average salary sinks. Add in the high-cost medical bills inherent to dangerous work, as well as the predicted growth of ambitious sectors muscling into profits, and suddenly the sector seems much less stable.

Those four words are a wakeup call for those of us who collect and utilize data.

Emily continued, “As a citizen who consumes data, I often hear ‘the numbers don’t lie’... but data is not neutral. Who is being asked the questions and how are they being asked: that's what matters. People think measurement is value-neutral, but it’s not.”

The assumption that data is neutral (that numbers alone tell a full story, without accounting for how those numbers are collected, who they were collected from and by, and how we should read them in the greater societal context) can lead to unwise and even dangerous policy investment, while cementing commonly-held stereotypes and stigmas. Worse, when that bias informs how data is collected, it can lead to Bad Data, or in cases of social inequality where insight isn’t prioritized, No Data.

Emily is on the front lines of the battle for Good Data, in one of the sectors where implied data neutrality, Bad Data, and a lack of data, hit the hardest: Gender Equality.

Emily, a member of the UN Foundation, is on the Organizing Committee for Data2x. The initiative was announced by U.S. Secretary of State Hillary Clinton in 2012, with the aim of closing key gaps in gender-based data, to help empower women by creating a foundation for evidence-based “smart policies”. Gender-based data has been plagued by inefficient and insufficient data collection, across the globe, in five crucial sectors: Health, Education, Economic Opportunity, Political Participation, and Human Security.

“There’s a lot of work to do in this field, but a big, significant piece is to familiarize people with what gender-based data is, what it means.” What gender has to do with data can often be a difficult idea to export, as gender inequalities (and thus, gaps in data) express themselves differently in each society. There are no broad strokes, which makes the problem incredibly difficult to tackle and requires each data collection system to be tailored to the ecosystem it’s being applied to. Data2x’s efforts, supported by the Hewlett Foundation and the Gates Foundation, work to bring together global partners to address these issues on a worldwide scale.

The light at the end of the tunnel is progress: “Good Data helps mobilize positive action, which gets a positive reaction. It’s a feedback loop.” With the inclusion of technical experts, policy advisors, private sector business, and governmental input, Data2x hopes to ignite a “gender data revolution” that will guide investment and development to areas where it’s the most needed, while also shedding light on a subject that has been much in the dark: the key difference between Good Data, Bad Data, and No Data.

Bad Data vs. No Data: The Repercussions of an Undiagnosed Problem

“There is a historic lack of focus on gender-based data. [It’s not that there’s] a reticence towards the idea that gender is important… it’s more the ability to advocate and communicate just why it’s important. There can be data that already exists on girls and women… but often the case is that there’s no data, or it’s bad data.”

“No Data” refers to data that cannot be collected, or data that cannot be disaggregated, either due to a lack of prioritization or resources. It’s an important distinction to note that data in this category doesn’t just ‘not exist’, but that there are roadblocks to gathering it.

Perhaps obviously, real data can’t exist on people who aren’t included in its collection. Less obviously, the typical wording of many household surveys often bars women from reporting their own contributions to society.

“Generally, one finds No Data especially applies to aspects of the lives of women and girls, [as they] are not highly valued by society.”

In practice, ‘No Data’ can mean that women’s output is designated broadly as ‘housework’ (e.g. textiles, cooked meals, home repairs, manual labour, ferrying water and fuel, farming and market sales, childcare and education). Women’s economic independence and dependence is another area where ‘No Data’ rears its head. Household income is often recorded as a whole, which gives an inaccurate view of female dependency and individual poverty in societies where women cannot own or inherit property. Further, ‘No Data’ strikes particularly at Human Security and Health issues. Data2x notes that “beliefs on boundaries of the home to public action” have stanched data collection on violence against women, with systems turning a blind eye to ‘personal matters’. Similarly, a combination of cultures that value “female modesty” and societies that have traditionally assumed female health follows the same paradigms as male health has meant a real lag in proper medical support for women and girls.
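To see how a pooled household figure can hide individual dependency, consider a minimal sketch. The records, member labels, and poverty threshold below are all hypothetical, invented purely for illustration; they are not drawn from Data2x or from any real survey.

```python
# Hypothetical survey records: (household id, member, sex, personal income).
# All values are invented for illustration only.
records = [
    ("H1", "A", "M", 900),
    ("H1", "B", "F", 0),
    ("H2", "C", "F", 300),
    ("H2", "D", "M", 350),
]

POVERTY_LINE = 200  # hypothetical per-person threshold


def household_view(rows):
    """Pool income per household and split it evenly across members."""
    totals, sizes = {}, {}
    for hh, _member, _sex, income in rows:
        totals[hh] = totals.get(hh, 0) + income
        sizes[hh] = sizes.get(hh, 0) + 1
    return {hh: totals[hh] / sizes[hh] for hh in totals}


def individual_view(rows):
    """List members whose own income falls below the threshold."""
    return [member for _hh, member, _sex, income in rows if income < POVERTY_LINE]


per_capita = household_view(records)
poor_households = [hh for hh, value in per_capita.items() if value < POVERTY_LINE]
poor_individuals = individual_view(records)

print(poor_households)   # households flagged when income is pooled
print(poor_individuals)  # individuals flagged when income is recorded per person
```

In this toy data, pooling flags no household as poor, while the per-person view flags one member who has no income of her own. That is the kind of gap a household-level figure cannot show.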

Where there is No Data, there is no accountability, and where there is no accountability, there can be no real policy change, advocacy, or solutions.

However, Bad Data is even more Machiavellian.

Bad Data is any instance where data is collected in a way that doesn’t tell the full story, or tells an untrue story. Because data informs where governmental and organizational aid and money goes, the false pretense of Bad Data means that available resources are not meted out to those most in need. Gaps become not only invisible, but mythical.

Data2x’s newest research paper, “What is Wrong with Data on Women and Girls?”, presents a prime example of how Bad Data is gathered. It notes that in many regions, household surveys approach the “head of household” for answers. The head of household is often, de facto, a male. (Indeed, many surveys define men as the head of house, bypassing female-led families altogether.) This “sex-role stereotyping” presents the predisposed idea of a male breadwinner and a female caretaker, whether or not that is truly the case.

“The questions you ask a man, and the questions you ask a woman, will be different.” Even if you did ask a woman, it turns out that with a husband or father in the room, her answer may be influenced and not a true reflection of her situation. “Women are asked about their roles as reproducers, not producers.”

In this way, Bad Data reinforces stereotypes and stymies effective policy-building. The data collection process misconstrues women’s roles as being unproductive, and does not offer survey options to prove otherwise. Female employment, which in many areas is relegated to part-time, unpaid, seasonal, or market work, including agricultural labor on family farms, is not considered a valid form of employment by many data collection systems. Another way these surveys undermine women is by allowing families to list only each member’s ‘primary’ role: so a mother who works may be recorded as a ‘mother’, and only that. In some cases, even informal employment (anything outside of a contractual enterprise, which many women’s jobs are) is invalidated by the criteria. This reinforces the idea that women don’t work, which in turn supports the use of biased questions that confirm their own results.

This Bad Data is insidious not only on a social level: it also leaves government unable to deal with the reality. For example, the prevalent idea that ‘women don’t work’ has led to a number of education and training reforms across countries, devoted to offering multi-hour courses to women, to ‘encourage their participation’. However, as women already have full loads between housework, child-rearing, and ‘illegitimate work’, these programs see a 40% dropout rate, much higher than that of the men who enroll, who are often actually unemployed and have time to devote. This setup by Bad Data would seem to ‘prove’ that women are less capable, while missing the fact that both the assessment and the policy are inherently flawed.

Bad Data is particularly problematic in countries stricken by poverty, where gender data gaps are at their most dramatic, because the capacity to collect data or update current systems is strapped by financial realities.

On the other hand, Uganda is a positive example of reconciling Bad Data systems. In 1992-93, the country posed labour questions to certain regions with new wording that covered secondary activities along with primary activities. The result was that the labour force saw an uptick from 78.3% to 86.6% of the population, accounting for 702,149 new workers, the majority of them female. This is not entirely unexpected, as women fill a majority of “informal employment roles”, especially in agriculture, the world over. (Notably, they even have “higher participation rates than men in three world regions.”)
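The Uganda figures allow a rough back-of-the-envelope check. Assuming both percentages are shares of the same reference population (an assumption, since the base is not stated here), the rise in the participation rate and the number of added workers together imply the approximate size of that population:

```python
# Figures quoted in the article.
rate_before = 0.783
rate_after = 0.866
new_workers = 702_149

# Assumption: both rates are shares of one and the same reference population.
point_increase = rate_after - rate_before          # about 0.083, i.e. 8.3 points
implied_population = new_workers / point_increase  # roughly 8.46 million people

print(f"Increase: {point_increase * 100:.1f} percentage points")
print(f"Implied reference population: {implied_population:,.0f}")
```

Under that assumption, the change in wording alone reclassified about one in twelve people in the surveyed population as part of the labour force.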

This turnaround raises the question: how can Bad Data be avoided?

Some effective ideas so far have been using broader language, and less detailed descriptions when presenting ideas in surveys: this allows for commonalities to be found, in place of biases being presented. Self-reporting, rather than household reporting, is another means to differentiate the needs of men, women, and children within a society. It is also time to drop the idea of the male breadwinner: Data2x notes, “The proportion of rural female-headed households doubled in Costa Rica and El Salvador, and increased by more than 50 percent in Honduras and Nicaragua when an unbiased gender survey was carried out…”, while in Latin America, the Caribbean, and parts of Africa there have been increasing numbers of female heads of house.

“The way many surveys are set up makes it impossible to accurately portray women’s participation in society. If you can change the way definitions are applied, you can change the future of women’s roles in production.”

The Data2x Story: Global Partners for Good

Data2x recognizes that to support Good Data properly, they must not only focus on collection methodologies, but also help promote systems that can measure successes.

Measuring success is vital to keeping data relevant in the mainstream, and validating the link between Good Data and positive change. Good Data informs who should be making what changes, and how to do so. The measure of that progress proves that informed legislation is affecting its citizens positively. Measurable success, in turn, quiets naysayers, garners more support and funding for social good movements, and puts a greater focus onto the relevance of research into Good Data collection and interpretation.

Implementing this, however, is no easy task.

Good Data is typically generated institutionally (the DHS health survey, for example), and Data2x’s institutional partnerships are currently working towards incorporating new and varied sources of data, to cross-hatch resources and fill in the gaps. Data2x is bringing capable people together to tackle these broad-spectrum problems.

Partnerships focused on Civil Registry (CR) systems are the cornerstone of this philosophy. CR data collection doesn’t focus on a topical area, but on big data collection streams: births, deaths, marriages, and divorces. Strengthening CR systems forms the bedrock on which to launch Good Data collection systems. To solve niche problems, it is essential to know the basic numbers: how many people are there in a region? How many have recently died? What did they die of? How many are married, how does that number affect the birth outlook, and who has access to a divorce? The implications of these baseline statistics can give us a window into why certain gaps exist, especially in Gender Inequality.

“One-third of all births and one-half of all deaths aren’t registered. Marriage and divorce are even worse. ...This lack of data has a huge impact on everybody, but especially girls and women.”

Without births being registered, for example, how can Health ministries plan for proper prenatal care, and how can Education ministries plan for schools? Without knowing class size, how can new schools be built accordingly, lunches offered appropriately, textbooks and other resources allocated? In societies that value men as breadwinners, who suffers when there aren’t enough desks for the students, and who drops out?

Women who are poor or uneducated have even more limited access to registering their life events. In many locations, a marriage certificate is necessary to legitimize children, and is often too expensive for families. In some cases, only the father can register the birth of a new child, so complications arise when either there is no father present, or no marriage certificate to legitimize the birth of a couple’s child. Another issue is that the laws regarding CR vary wildly from country to country, making it difficult to implement widespread, systematic change. But when addressing these systems, the concept of gender must be understood: for example, that children who are born to unmarried women should be legitimate, and that marriage should not be considered a prerequisite of motherhood.

The gaps that Data2x works with specifically, down to the local CR level, can seem harrowing. They have identified 28 main gaps across their five main sectors, which seems brutally large. However, the framework followed to pinpoint these gaps was thorough: Data2x looked at Need (disparities that were affecting women), Population (how many women and girls would be helped by data), and Policy Relevance (where it was possible for policy to actually be changed, on a global or local level). The resulting map of these gaps was then reviewed by a technical Advisory Board, and consulted on by WikiGender.

With such a broad spectrum of critical issues affecting women and girls, from exclusion to migration, it becomes obvious why measures of success are so important.

To that end, when asked what area is perhaps the most in need of Good Data and policy backing, Emily decided “Economics”, and ranked Education as the least in-need. Why? Because Education, while still a work in progress, is also a measurable success story.

Fifteen years ago, in the Millennium Development Goals, very few gender-based goals were identified. The ones that were selected, however, were Education and Women’s Reproductive Health. With sustainable goals set to reduce inequality in those areas, girls’ Education and Women’s Reproductive Health became a cornerstone of activism, investment, and policy change. Because systems were in place to measure whether or not goals were met, people were able to see, when they were met, that greater gender equality was truly possible. That positivity begot more of the same: “You can’t have a goal without having a way to measure your success and progress on that goal.”

Now, 80% of countries report gender-disaggregated data on education and 70% on reproductive health to the UN Statistics Division (UNSD) survey. Contrastingly, as Emily suggested, only about 35% report on economic and employment topics for women.

Data’s Golden Age: Why Now?

Data2x presented at the Social Good Summit, alongside Humanise CEO, Melissa Jun Rowley, and described why we are in a Golden Age for data.

People have a genuine interest in data right now, not only in what it can measure, but in how it measures. Capturing that interest (and the capital that comes with it) is essential to highlighting topics that have long been overlooked, and Gender Data as an open frontier for data collection and data improvement can attract the proper resources. Data2x is playing a bridge role in this, to bring the topic of women and girls to a different sort of audience, the companies invested in data topics.

“If the data exists, people want to prioritize the issue… but you have to create the systems that collect it,” Emily says.

A caveat in all of this is the problematic mindset that new data collection methods are always the best, or the easiest to use. While geo satellite and social media data are seeing a massive increase in appeal, implementing these high-tech systems in certain regions can be more costly and time-consuming than engaging in census and survey data, which is just as valid. Interest in big data sources can cause confusion about how quickly data can and should be collected, and obscure the importance of “passive data” gathered through local channels. New systems can be helpful, however, in instances where existing systems are worn and tapped of resources: regional statistics offices with small teams stretched too thin.

“New data collection methodologies and what they can tell us are important; it’s about finding a happy medium where those collection channels can work together. Data will tell us a lot more about the trajectory of women and girls. That’s what Data2x is doing.”

And that’s what we need from Data2x, its partners, and organisations. The obligation is no longer up for discussion. And if advocacy can’t inspire a positive change, then the economics should.

Female infanticide promotes male poverty, and violence against women costs a country between 1.2-3.7% of its GDP yearly, through criminal proceedings as well as mental and physical healthcare costs. Conversely, educating girls in Ethiopia could add as much as $4 billion to the national GDP.

Showing this economic impetus is vital to change, because it keeps governments and businesses engaged.

Successful measures have given proof positive that bridging gender data gaps revitalizes societies and economies, which means the future looks bright for gender data collection, and Good Data. Societies are learning not only how to apply data in a way that holds change-makers accountable, but how to collect it properly from day one. When it comes to gendered data, that means overcoming bias, phrasing questions correctly, and looking at the fundamental way society views women and their roles. Data must be directed by individuals: the data must not lead the witness.

Data2x just launched major partnerships 10 months ago, so they’re expecting outcomes in 3-12 months. They hope to show more markers of success in the field. Moreover, they’re getting the world thinking about gender data. We can only build constructive systems if we recognize that such data exists, and that its current iteration leaves much to be desired. But, Data2x hopes to get there. After all…

“Data not only measures progress, it inspires it. Gender equality and data are synonymous.”


This article is part of the #WomenInFront series, powered by Humanise.
