Chapter 5 Getting a sense of Big Data and well-being
Why we need to ask critical questions of data in the context of well-being
Many issues related to Big Data don’t have clear-cut answers, especially where well-being is concerned. Data reveal details about vulnerable people, often putting them and their communities at risk, while the State runs data systems that people increasingly need to be part of in order to access healthcare and welfare support1. This is why the growing body of research that problematises the utility and ethics of Big Data, and how they are used, is vital. In this area of critical data science, some researchers use Big Data to reveal the limits of, and social issues connected to, everyday datasets that we all use, such as a search engine’s image database2. These critical studies of data and their effects on society reveal how data are capable not only of creating new problems, but also of perpetuating racism and misogyny, as we discovered in Chap. 1 with Safiya Noble’s example of what happens when you search for the phrase ‘black girls’3. These projects reveal data’s negative social effects, and how they are already embedded in society, exacerbating existing issues.
Other research investigates what people know and think is going on, while also looking at the possibilities of Big Data (and their associated technologies) for understanding aspects of well-being. One such example4 presents real-life cases of public sector data practices to members of the public, aiming to understand how much people appreciate the possible benefits, and how much they doubt or distrust the possible implications, of data systems and data sharing in their everyday lives. One possibility, of course, is that many people may not care as much as we think they do, or should.
We touch on these issues in this chapter. Most notable is the increase in concerns regarding the harms that Big Data and new technologies are capable of, and which are happening unchecked5. There are two main problems here. The first is that we are compromising well-being in the name of better understanding the human condition. The second is that we are not only using these data and technologies to understand people, but also sorting and managing them in ways that suit those who are already more powerful.
It is vital to note that key to concerns about datafication is how these practices disproportionately affect the well-being of those who are already most vulnerable. Facial recognition, for example, negatively impacts people who are already disadvantaged, owing to its gendered, heteronormative, classed and racialised biases6. These technologies have also been trialled in UK policing, where more than 90% of matches were reported to be incorrect7. More generally, public services of all kinds are adopting new data practices and possibilities.
Data-driven decision-making is becoming an everyday feature of public services. Who receives welfare8, housing9 and other interventions, such as child protection10 or education11, is increasingly decided by algorithms rather than people. Even when automated decisions are questioned by people12, it is unclear whether ‘experienced workers’13 or the data system has the greater influence over key decisions.
Beyond welfare, algorithms intervene in other social policy areas. They monitor the ‘quality’ of education using dubious proxies14, with various damaging outcomes, including teachers undeservedly losing their jobs.((O’Neil describes how the bottom-scoring 2–5% of teachers were fired. Yet the modelled target student scores and small class sizes made the scoring of teachers little better than random: there was almost no correlation between a teacher’s scores from one year to the next, and qualitative data described one of the sacked teachers as ‘one of the best teachers I’ve ever come into contact with’ (O’Neil 2016, 4).)) In the UK in 2020, during the COVID-19 pandemic, an algorithm also decided the grades awarded to school-leavers, in the absence of exams owing to social distancing measures. One national media headline15 called this ‘punishment by statistics’.
The UK’s A Level algorithm example was extremely high profile, causing outrage that data-driven decision-making would have such an enormous effect on the futures of these young people. It was seen as morally outrageous for a number of reasons. First, because our society dictates that these young people’s well-being should be protected. Second, the algorithm used data that no one had consented to: no one knew at the time that their prior grades could be used as a final grade. Third, the data model also included proxies for expected performance which had nothing to do with each student’s own academic record. Instead, it used their school’s overall performance in previous years: scores based on previous students’ grades, not theirs. While the regulator, Ofqual, insisted its standardisation arrangements ‘are the fairest possible to facilitate students progressing on to further study or employment as planned’15, there were further controversies over the lack of transparency around how it had arrived at ‘fair’. Ofqual subsequently published a 319-page document explaining its methodology15, which was criticised for not being accessible to the general public. Not only did the whole process seem far from fair, then, but Ofqual did not make explicit how the approach was fair to those affected.
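To make the critique concrete, here is a deliberately simplified, hypothetical sketch of the kind of school-level standardisation described above. It is not Ofqual’s published methodology; the data, names and quota rule are invented purely for illustration. It shows how, once a school’s historical grade distribution is imposed on this year’s within-school ranking, a student’s own teacher-assessed grade has almost no bearing on the grade they are awarded.

```python
# Hypothetical sketch (not Ofqual's actual model): rank students within a school,
# then map each rank onto the school's historical grade distribution.
from collections import Counter

GRADES = ["A", "B", "C", "D", "E"]  # best to worst

# Invented data: grades awarded to previous cohorts at this school.
school_history = ["B", "C", "C", "D", "D", "D", "E", "E"]

# This year's cohort: teacher-assessed grades and within-school rank (1 = top).
cohort = [
    {"name": "student_1", "teacher_grade": "A", "rank": 1},
    {"name": "student_2", "teacher_grade": "B", "rank": 2},
    {"name": "student_3", "teacher_grade": "B", "rank": 3},
    {"name": "student_4", "teacher_grade": "C", "rank": 4},
]

def standardise(cohort, school_history):
    """Assign grades by imposing the school's historical grade distribution
    onto this year's within-school ranking, ignoring individual records."""
    n = len(cohort)
    history_counts = Counter(school_history)
    total = len(school_history)
    # How many of this year's n students get each grade, proportional to history.
    quota = {g: round(history_counts[g] / total * n) for g in GRADES}
    ranked = sorted(cohort, key=lambda s: s["rank"])
    grade_iter = (g for g in GRADES for _ in range(quota[g]))
    return [(s["name"], s["teacher_grade"], next(grade_iter, "E")) for s in ranked]

for name, teacher_grade, final in standardise(cohort, school_history):
    print(f"{name}: teacher-assessed {teacher_grade} -> awarded {final}")
```

In this toy example the top-ranked student, assessed by their teacher as an A, is awarded a C simply because the school has never historically produced an A. That is precisely the proxy-driven unfairness at issue: the decisive input is previous students’ performance, not the individual’s own record.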
Here we see public services failing to look after well-being by using data in ways which go against the moral code of fairness, accountability and transparency((Critical Data Studies scholars are pushing for more fairness, accountability and transparency in data practices. See the FAccT conference for more on this: https://facctconference.org.)), and without the young people’s consent. Beyond their high-profile nature, what is different about these data uses? While Chap. 2 discussed the greater role of data in public services from the 1980s onwards, this ostensibly had a different rationale: to evaluate qualities of these services, such as efficiency or cost-effectiveness. While those approaches also led to flawed decisions and evaluations, their assessments were made at a societal level. Contemporary data-driven decision-making, whether allocating resources to people or labelling individuals as at risk, is a different approach and uses data at a different level. Or, to use the language of Chap. 3, there is a different unit of analysis, and that unit could be a vulnerable person.
In sum, why do we need to ask critical questions about how people and their well-being are being understood, and about how the data and data systems used to understand people can compromise well-being? Going back to those definitions, people are often concerned with the speed, size and so on of Big Data. Yet, as Kitchin indicates, it is the contexts of these data that mark the most important ways in which they are different. Not only are the contexts in which Big Data originate more varied, and further from the contexts of use, than before, but the practices of analysing data feel less human. By this I mean that less human attention is now required in data analysis and in the important processes that rely on data. What does that mean for decisions made about people and well-being?
As we will discover in a few sections’ time, the response to COVID-19 relied on older data and data systems, and on more human judgement, than you would have imagined from media reports about the promise of artificial intelligence (AI) in the first half of 2020. However, as the financial value of data increases, and as data can be analysed ever more expediently, we must ask other questions. Who stands to gain and who stands to lose? Who has chosen to participate? But then, did people ever get to choose to participate in systems of well-being data? Were we even thinking about data as ‘a thing’ about us, something that affects our lives and is valuable? The next two sections deconstruct the financial value of Big Data and ask whether this reality is even new.
Notes
1. See Bates 2016; Dencik 2020.
2. E.g. Otterbacher et al. 2017.
3. Noble 2018.
4. Living With Data n.d.
5. I.e. the UK’s Data Justice Lab n.d.; Eubanks 2018; O’Neil 2016; Noble 2018; Benjamin 2019.
6. Ada Lovelace Institute 2019.
7. Fussey and Murray 2019; Davies et al. 2018.
8. Eubanks 2018, 37.
9. Eubanks 2018, 93.
10. Eubanks 2018, 135.
11. O’Neil 2016, 5–9, 52–60.
12. Eubanks 2018, 141.
13. Eubanks 2018, 77.
14. O’Neil 2016.
15. Pidd 2020.