Industry classification in the South African tax microdata
This paper documents the industry classification variables in the anonymized tax microdata available for research at the National Treasury Secure Data Facility in Pretoria. It describes how these variables relate to the raw records captured on various tax forms and outlines the different industry classification systems. We present a recoding that maps the idiosyncratic industry classifications onto a single comparable system. For each industry variable, we examine its internal consistency (across years and against other industry variables), external validity (by comparison with other data sources), and completeness (for important subsets of the data). On this basis, we suggest a set of ‘best’ industry variables for researchers, derived from the underlying raw variables, and note potential issues with the major options.