In 2016, we joined a project in South Africa working to make tax data available for research purposes. Tax authorities collect an enormous amount of data on companies and individuals, data that is useful for economics research and economic policy-making, and the potential to learn from such a large collection of data on the economy is huge.
But most revenue authorities around the world don’t have the data science capacity or financial resources to build such a library of knowledge in a way that meets global best practice in data security. In 2016, when our journey began, only a handful of very rich countries had taken on the task of making this data available to researchers as a public good, understanding that big data held the potential to improve policy and, by extension, economic fairness and economic performance.
Creating, anonymising, and securing administrative tax datasets is an arduous task, but we were excited to join the team working to make South Africa a data pioneer, and eager to do research with the data ourselves. When the datasets were finally ready and made available to researchers, South Africa became the first developing country in the world to provide such a public good.
‘Big data’ is revolutionising our ability to understand economies
In economics, ‘big data’ often refers to administrative tax microdata. These data are crucial to answering the questions we care about, and to providing actionable recommendations on issues such as gender equality.
Since 2015, the South African Revenue Service (SARS), in partnership with the National Treasury (NT) and UNU-WIDER, has worked to make anonymised tax data available to researchers at the Secure Data Facility in South Africa’s National Treasury. The tax data available includes personal, company, and trade-related information for the full population of taxpayers in South Africa.
But, until 2018, these data did not include taxpayers’ gender identifiers. Without this variable, researchers were unable to answer crucial questions about gender equality in the labour market, such as:
- How big is the gender pay gap in South Africa, and is South Africa doing enough to promote gender equality?
- How do women move between firms, and in and out of employment, throughout their careers?
- What is the relationship between time employed and career advancement, and does this differ for men and women?
- Is the gender pay gap higher or lower in firms engaged in international trade?
- Which types of firms are more likely to employ more women and pay higher wages, and can national policy promote the growth of such firms to help make outcomes for women more equal to those for men?
By adding this variable, our team has now enabled big data research on important gender questions in South Africa for the first time.
This new addition will empower researchers to more thoroughly answer questions crucial to making policy on the basis of good evidence. This can improve our ability to create employment, raise wages, and fight inequality.
The advantages of tax microdata
Internationally, the use of administrative tax data for research is becoming more common. Compared with survey data, administrative data offers researchers several advantages.
First, studies that use administrative data are often less costly to finance than those that rely on carrying out surveys. As the data has already been collected as part of normal government operations, making it available for research can provide major cost savings in answering policy questions. While it is predominantly rich countries that have taken on this task, its cost-effectiveness means it makes sense for more developing countries to invest the resources required to do the same.
Additionally, administrative records can provide more dependable information: tax records are usually audited, which adds a level of reliability. And, of course, administrative data covers far more people than any researcher could hope to reach with a survey, and represents the entire population rather than just a sample.
Finally, the regular collection of tax records also means that tax data tracks firms and individuals over time. This means we can use data for policy evaluation as we have access to key variables from before, during, and after policy implementation. With the addition of gender in the data, we now have access to information that allows researchers to critically and credibly examine policy interventions as they happen, and how they affect men and women differently.
Making tax data a public good gives researchers an opportunity to re-examine old questions and the chance to ask and answer new ones. Several research papers using South Africa’s tax microdata have already been published, and many are available open access. These papers are a testament to the policy relevance of the research produced using tax microdata, and they have already been used to inform policy-making in South Africa.