Chapter 9
Understanding
The case for understanding in data
In 2018, I began a large-scale qualitative research project to understand data and diversity in the cultural sector. More specifically, Arts Council England (ACE)((ACE is a non-departmental public body (NDPB) and the largest funder of the arts in England. ACE wanted to introduce a measure of social mobility or class inequality to its data-monitoring processes. I was asked to conduct research and to recommend a new inequality metric.)) wanted to introduce additional questions to its existing equality monitoring processes.((There has been pressure on organisations and the public sector to collect workforce demographic data as a result of the Equality Act 2010 and the Equality and Human Rights Commission Employment Statutory Code of Practice (EHRC 2015). This typically involves ‘Equal Opportunities’ forms that draw on the same questions as national surveys, although the formatting and wording may differ. In the cultural sector, equality of access to jobs and access to commercial content, such as cinema visits, or publicly funded culture, such as the BBC’s broadcasts, is ascertained using national-level survey data, consumer insight data and these mandatory monitoring processes. The BBC has, for example, added proxy questions to its data processes to understand the class of its workforce, in line with recent Civil Service developments (BBC 2017; Cabinet Office 2016).)) The research was undertaken in partnership with ACE to advise on how to improve data in the sector and to introduce potential new data to better measure inequality.
Inequality and inequality data are contentious issues across the UK cultural sector.((There is so much rich evidence on lack of diversity in the sector; the arguments about this and about data are summarised in Brook et al. (2020) and Oman (2019c), but it is crucial to acknowledge the wider research across film, museums, television and broadcast, music, theatre and so on.)) Commitment to social inclusion is integral to the sector’s identity and values, as this book has argued. However, qualitative and quantitative data reveal, first, the failure to achieve diversity goals in terms of who gets to participate in, and work in, the arts1 and, second, the amount of missing data from administrative processes2. What does ‘missing data’ mean? In this instance, it means a gap where there should be a value. For example, all those households that did not complete the census in March 2021 became missing data, which is why people were hired to knock on doors and remind householders to complete it. Missing data reduce the accuracy of understanding that is possible from data, which can affect government decision-making, including how resources are allocated.
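To make concrete how missing data can distort understanding, here is a minimal sketch in Python. Everything in it is invented for illustration (the dataset, the response rates and the occupation variable are not from the research described here): it simply shows that when the people who do not respond differ systematically from those who do, a figure calculated from respondents alone misrepresents the population.

```python
import numpy as np
import pandas as pd

# Hypothetical illustration: 1,000 households, of which roughly 30% are
# in 'routine' occupations. Suppose those households are also less
# likely to return the (fictional) survey form.
rng = np.random.default_rng(42)
households = pd.DataFrame({
    "routine_occupation": rng.random(1000) < 0.30,
})

# Response probability depends on the very characteristic we care about:
# 90% of other households respond, but only 60% of 'routine' households do.
response_prob = np.where(households["routine_occupation"], 0.60, 0.90)
households["responded"] = rng.random(1000) < response_prob

true_share = households["routine_occupation"].mean()
observed_share = households.loc[households["responded"], "routine_occupation"].mean()

print(f"True share in routine occupations:     {true_share:.1%}")
print(f"Share estimated from respondents only: {observed_share:.1%}")
# The respondents-only estimate will typically understate the true share,
# because the missingness is not random: it is linked to the thing being measured.
```

The point is not the particular numbers but the mechanism: missingness that is linked to the thing being measured quietly biases whatever is estimated from the data that remain.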
An example of missing data in the cultural sector equality monitoring story can be found in organisations that refused to ask people about their sexuality. One organisation I spoke with heartily believed that this question was irrelevant to their workplace, especially as they had such good LGBTQI representation in their senior workforce. They therefore did not collect these data, or report them to ACE for a sector-wide picture. Linked to this are longstanding discussions between people who don’t like feeling audited by existing data collection processes that aim to understand inequality issues. It feels like this organisation took a pretty understanding position, then. However, an organisation may think it is being sensitive to people’s privacy in not asking them the question and may not think it has issues of discrimination, but how could it know? When asked about their sexuality in a subsequent sub-study at this organisation, one person wrote that they were relieved this issue was finally being looked at, as they had experienced discrimination. Understanding what is best for knowledge and understanding is therefore far from easy.
We can see a disconnect emerging: data are collected for good, but giving them can feel bad while it is happening. This tension has exacerbated issues related to data practices and diversity practices in the sector, both of which required attention at the same time. How can the sector know how to change, when it doesn’t know what changes to make and where? Data and research can help answer these questions in different ways, but research on data needed to be done first.
The thrust of the empirical research I was doing was to understand how inequality data currently worked in organisations funded by ACE and, crucially, how this might be improved (in terms of data quality and process). In essence, this was very much a project to understand the complexities of the existing context before we might know what to do to improve it. To do this, I collected and analysed many different types of data((More detail on the data and the methodology can be found in Oman 2019c and Oman 2021, forthcoming.)) to help me understand the main problems across various areas and layers of the sector, and in different ways. You may remember that in Chap. 3 we covered how different kinds of data help us understand things from different standpoints. I describe the value of understanding a complex issue like this ‘in the round’3. Here, I needed to capture the complex ecosystem of data collection and analysis that informs inequality policy in the publicly funded cultural sector.
Alongside various desk-based policy research, 15 organisations funded by ACE, called National Portfolio Organisations (NPOs), were sampled. Each NPO was chosen to achieve a balanced distribution of geography, size of organisation, size of grant from ACE, discipline area (e.g. dance or visual arts) and social mission (e.g. reaching local working-class communities or working with disabled performers). In each NPO, I undertook participant observation, interviews with experts in data or diversity and focus groups with staff who held no management responsibilities in these areas.
One crucial aspect of this project was to improve understanding of how people feel about questions that are used to gather data about class and social mobility alongside other inequalities((The Equality and Human Rights Commission Employment Statutory Code of Practice (EHRC 2015) has also placed pressure on organisations and the public sector to collect workforce demographic data, again of protected characteristics. These are currently listed as age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual orientation (EHRC n.d.).)) that are protected by the Equality Act4. So, I am going to concentrate on my focus groups here, as these were about how people understood data in their everyday lives. People were grouped together in teams within their workplace and asked to fill in ‘fake’ equality monitoring forms. When I say fake, I mean that they were fabricated by bringing questions used elsewhere together onto one form for people to answer and then reflect on. It was hoped that this would help me understand the data differently, by looking at the questions that generate them through other people’s eyes.
The context and set-up were important because, as I keep saying, context is central to people’s understandings of data and how they work. It is also vital to researchers’ understanding. Context is, again, another one of those ‘contested concepts’5. It is often discussed as a problem for the researcher: qualitative researchers need to be sensitive to the contexts they are researching. The same is true of evaluative research: irrespective of your approach, a researcher should understand as much as possible about the contexts they are evaluating. It is an important concern in data studies, with the concept of ‘contextual integrity’ proposed as a framework for good practice when it comes to using personal data and protecting privacy6. So what is context? In this book, it is all of the whos, wheres, whats, whens, hows and whys, as well as the how much? and the so whats? and what nexts? Context is, therefore, vital to how we understand how people feel about data more generally, and how data get used, more specifically. It is also vital to sharing understandings of data, which we will return to in a bit.
Keeping context at the forefront of the research design and analyses enabled interesting insights into how the data work. People’s reflections on the questions used to gather these data offered new understanding of their utility and their accuracy. After asking everyone to complete these ‘fake’((It is important to acknowledge that, as these questionnaires were fabricated, and while the context was comparable in some respects, it differed from how one would normally complete an Equality Monitoring form. The complexities of this are discussed in Oman7 and are touched on in the working paper8. Much care and attention were also paid to protecting participants who did not want to have personal conversations with colleagues.)) equal opportunities forms, we spent time discussing how people felt about the questions: how they were formatted, what they were asking, and any other reactions. People indicated that they felt a combination of the types of understanding defined by the dictionary (mentioned earlier) towards data and data processes, in which they could see the benefits and harms that I discuss below.((In the working paper (Oman 2019c), which is open access, I outline these concerns, challenges and issues in greater detail.))
I categorised four main issues, which touch on the differing aspects of understanding we have encountered above. I grouped people’s responses into four categories: political, personal, practical and proxy9. When I say political issues, I refer to those who raised objections to collecting these data in this way as an issue of public concern. These sorts of responses are characterised by people asserting that it is not right to collect these data like this, from a position of sympathy and shared understanding. I used the term ‘personal issues’ to describe people’s responses about how the process was, or could be, hurtful for, or to, themselves and others. These data were seen as too private, and the processes could disproportionately affect some more than others. There were a number of practical issues raised, including people not knowing the answer to the questions, or not being able to answer using the categories provided. This probably feels very familiar to many of you who have tried to fill in a questionnaire and not been able to make your answer fit the form. There was a lack of shared understanding between the person asking the question and the lives of the people trying to complete it. Despite the importance of all the responses across categories, I want to focus on the final category, ‘proxy’, below.
You may remember, a ‘proxy’ is an indirect measure of something. The example I gave in Chap. 2 is that someone’s income does not necessarily tell you about their quality of life directly, but because the relationship has been long studied, assumptions are made about well-being using what we know about how income relates to well-being. Or so the theory goes. Another example, from Chap. 5, is that 5% of teachers were sacked in Washington, D.C., as a result of a determined mayor wanting to turn around the city’s underperforming schools. However, the teachers were judged and then let go off the back of a complex and flawed algorithm, called a value-added model, which ‘define[d] its own reality and use[d] it to justify their results’10. The idea was that ‘the numbers would speak more clearly and be more fair’, but those who interacted with these models, numbers and judgements said, ‘I don’t think anyone understood them’11. The example of the use of proxies in managing schools is more complex than the question of a class metric in the arts that I outline above, but the premise is the same: these proxies categorise people, telling someone else something about performance, identity and background, and they are not often presented in a way that is easy to understand.
In the case of equalities data, personal characteristics are used to understand class and social mobility, but it is not as simple as measuring something like age. Class tends to be categorised in bands, but the meaning of, and dividing lines between, these bands (e.g. working class and middle class) are not universally understood. People are notoriously bad at self-defining their class12. This means that a direct measure of class using self-definition is unlikely to be accurate. Instead, asking people questions about their lives can indirectly establish aspects of privilege and disadvantage that result from their socio-economic status, or their class. Some obvious questions might be to do with the house people live in or their salary; another popular one is what newspaper you read. You probably have a different picture in your head of a person reading the Sun (a UK right-wing tabloid) than you do of, say, a Guardian reader (a UK left-wing broadsheet). These questions get at different indicators of class: salary, wealth and cultural consumption, for example, and have all been shown to have different pros and cons.((Dave O’Brien (2018), in Arts Pro, explains this well.))
Although the class proxy questions that were trialled in these group discussions were new to many answering these equality monitoring forms, they draw on long-established methods with their own institutional histories. Many of the questions have been used for decades in sociological measures of social mobility13. One question asks for the occupation of the main wage earner in your household when you were 14. It is considered a more accurate measure of class than income, self-identification or any of the other proxy options14. This question is part of a schema that informed the National Statistics Socio-Economic Classification (NS-SEC) system, used for half a century15. The schema identifies someone’s class origins by way of the school they attended, whether their parents attended higher education, and parental occupation at 14. While policy and data experts consider these questions best able to produce the most robust metric, the parental occupation question in particular was queried in every one of my focus groups, because of these issues of understanding as political, practical, personal or proxy.
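To illustrate what happens to an answer once it is given, here is a minimal sketch of how a response to the parental occupation question might be coded into a broad socio-economic background group. The answer options and the groupings are simplified and invented for illustration; they are not the official NS-SEC coding frame or the wording used in the research discussed here.

```python
# A simplified, illustrative mapping from answers to the question
# "What was the occupation of the main wage earner in your household
# when you were 14?" to broad socio-economic background groups.
# These categories and groupings are invented for illustration and are
# NOT the official NS-SEC coding frame.
ILLUSTRATIVE_GROUPS = {
    "modern professional occupations": "professional/managerial background",
    "senior managers and administrators": "professional/managerial background",
    "clerical and intermediate occupations": "intermediate background",
    "small business owners": "intermediate background",
    "technical and craft occupations": "working-class background",
    "routine and semi-routine occupations": "working-class background",
    "long-term unemployed or never worked": "working-class background",
    "prefer not to say": "unknown",
    "don't know": "unknown",
}

def classify_background(answer: str) -> str:
    """Map a survey answer to a broad background group (illustrative only)."""
    return ILLUSTRATIVE_GROUPS.get(answer.strip().lower(), "unknown")

if __name__ == "__main__":
    responses = [
        "Routine and semi-routine occupations",
        "Modern professional occupations",
        "Don't know",
    ]
    for r in responses:
        print(f"{r!r} -> {classify_background(r)}")
```

Seeing the question as one step in a coding pipeline like this helps explain the focus-group reactions discussed next: the respondent sees only the question, not the categorisation it quietly feeds.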
Returning to the findings on the proxy question, what were the issues with it? People by and large understood that this was a proxy question, even if they did not understand what is meant by the term ‘proxy’. Let me explain: one person said, ‘I know that you are trying to get at something, but I don’t know what it is, exactly’. The participant grasped that what their mum or dad did for a job years ago was not really the important thing for the researchers who would be looking at these data to understand class and social mobility. But they could not work out what the connection was between what they were being asked and inequality. What did it mean in the context of equality monitoring in their workplace at that moment, many years later? They found themselves in a process of trying to understand what the proxy question was doing, but it did not quite make sense to them.
Wanting to understand the rationale behind the question was not an isolated incident. There was a palpable moment in most of these group discussions where someone, or several people, identified that these are not neutral processes. There was more going on than met the eye and they wanted to understand. I was asked numerous questions by participants in almost every group, such as ‘What are you trying to get at?’, ‘Why has this question been worded like this?’, ‘Why my parents? What have they got to do with my job now?’, ‘Why the employment of only one?’, ‘Why employment at all?’ and, most frequently, ‘Why 14?’ and ‘What about the information about my life that this question does not capture?’ It is clear that this proxy question, which aims to produce robust, objective data, provokes many more questions when it comes into play with ordinary understandings. The key thing to learn from this was that many people did not feel comfortable answering the question for various reasons, but largely this was because they did not understand what it was doing, or how the data would be useful. They couldn’t imagine what would happen next or how it would be valuable.
As a researcher doing research for a policy organisation, I was asked to make recommendations on what to do next. So, my key recommendation was to improve communications about what was happening when people gave their data16. Essentially, context is not only important for the researcher’s understanding of how data work; communicating these contexts is also vital to moving towards a shared understanding of how data work and why they are important.
It seemed clear that people needed to know why a question is being asked, what that question does, and why. They also wanted to understand why these personal, intimate data are important to share. The question was not so much a question about questions as a question about data. Given that the nature of the proxy was so far removed from everyday understandings of what these data were for, this is understandable.
People in the focus groups were (or at least claimed to be) committed to helping address issues of inequality, which is typical of people working in this sector17. In other words, the people I spoke to by and large had the empathy part of understanding down, but equality monitoring processes were not designed for shared understanding.
Remember that well-being data or inequality data are data about us. Yet, it is not common practice to help people understand what their data can do and how their data can improve anything. Cultivating communications about the whats, whys, whos, hows and so whats and what nexts is important to increase public understandings and trust18. We are seeing increasing attention to public engagement with data19. Yet, to date,((However, this will change, as it is one of the aims of the Living with Data project (and others) I’ve mentioned elsewhere in this chapter.)) this work is not necessarily concerned with how people come to understand data, and is still too focussed on how the tech/media company or the government wants people to engage with what they are doing.
The recommendations I made as a result of the inequality research aimed not only to improve understanding of why measuring class was important, but also to be more understanding when collecting data16. As a director of a major museum said to me while I was setting the research up:
This [understanding inequality] is a project of care. It’s about trying to make the sector a better place for everyone, but somehow, the way it is done is the opposite. It’s unfriendly, and I think, can feel hostile.
(Oman forthcoming-b)
Interestingly, this sentiment that people collecting data don’t care about people was quite common in the UK’s Measuring National Well-being debate (2010–2011). The quote below is one I chose to illustrate the feeling you got when reading the comments people wrote in the free-text fields: that people who completed the debate survey felt the survey authors were speaking a different language from them, almost as if they were from two different cultures.
Your [sic] talking to people about their lives, not selling them a product. Empathy and understanding with how you word your surveys will make people actually give a damn and ‘want’ to take part as they believe (rightly or wrongly) that they will be listened too [sic] and their opinion might just count for something.
(Oman 2015, p. 82)
Being more understanding when collecting data reduces these ‘hostile’ conditions of data collection in a project of social justice and well-being20. Those who want data, especially to improve things, need to be mindful of the well-being of those whose data they need. They need to be more understanding of those whose data they ask for, and they need to take account of the personal nature of these kinds of questions and the experience of being asked questions about your identity and your background21. They also need to move towards an idea of shared understanding of data and inequality.
Context should not only be a concern for researchers to improve their understanding on their terms, but needs to account for sharing understanding more broadly. We encountered this in Chap. 8, where research to understand the culture–well-being relationship is designed to prove this relationship and presented in a way that speaks to decision-makers, when in fact work should be done in the social, cultural and charity sectors so that research is designed to work with, and speak to, the sector that wants to better understand the value of the work it does. Again, this means moving towards more shared understandings of data and their processes.
Subsequent to my research with ACE22 and policy recommendations23, this advice now features in the Social Mobility Commission’s new guidelines on collecting data24. The focus on the questions rather than the data is more people-centred:
Asking someone what their socio-economic background is can seem like a personal question to ask, and some people may not be used to being asked it.
In order to build trust, help employees understand why the question is being asked—to help get a better picture of the socio-economic diversity in the business. People need to hear a purpose.
This movement towards being understanding when collecting data to understand society is an important one, and one that has been little acknowledged up to this point in much large-scale data collection, whether those data are about well-being or inequality. Crucially, those marginalised by inequalities are most at risk of suffering from ill-being as a result of data25. While the Government Statistical Service (GSS) has a pledge for statistics for ‘public good’26, this still does not formally((To be fair, there is good work happening in this area; it has just not been formalised yet.)) account for being understanding of the public in data’s collection, analysis and use.
Notes
1. Brook et al. 2020
2. DC Research 2017; Oman 2019c
3. Oman 2021, forthcoming
4. 2010
5. Gallie 1956
6. Nissenbaum 2009
7. forthcoming-a
8. Oman 2019c
9. Oman 2019b, see below; Oman forthcoming-a
10. O’Neil 2016, 7
11. O’Neil 2016, 5
12. O’Brien 2018
13. Goldthorpe and Hope 1972
14. O’Brien 2018; Brook et al. 2020
15. ONS 2010
16. Oman 2019d
17. Brook et al. 2020
18. Oman 2019c, d
19. Kennedy et al. 2020
20. Oman 2019c, 2015
21. Oman 2019a
22. Oman 2019c
23. Oman 2019d
24. SMC 2021
25. Data Justice Lab n.d.; Kennedy et al. 2020
26. GSS n.d.