Chapter 5 Getting a sense of Big Data and well-being
Why we need to ask critical questions of data in the context of well-being
Many issues related to Big Data don’t have clear-cut answers, especially where well-being is concerned. Data reveal details about vulnerable people, often putting them and their communities at risk, while the State runs data systems that people increasingly need to be part of in order to access healthcare and welfare support1. This is why the growing body of research that problematises the utility and ethics of Big Data, and how they are used, is vital. In this area of critical data science, some researchers use Big Data to reveal the limits of, and social issues connected to, everyday datasets that we all use, such as a search engine’s image database2. These critical studies of data and their effects on society reveal how data are capable not only of creating new problems, but also of perpetuating racism and misogyny, as we discovered in Chap. 1 with Safiya Noble’s example of what happens when you search for the phrase ‘black girls’3. These projects reveal data’s negative social effects, and how they are already embedded in society, exacerbating existing issues.
Other research investigates what people know and think is going on, while also looking at the possibilities of Big Data (and their associated technologies) for understanding aspects of well-being. One such example4 presents real-life cases of public sector data practices to members of the public, aiming to understand how much people appreciate the possible benefits, and how much they doubt or distrust the possible implications, of data systems and data sharing in their everyday lives. One possibility, of course, is that many people may not care as much as we think they do, or should.
We touch on these issues in this chapter. Most notable is the increase in concerns regarding the harms that Big Data and new technologies are capable of, and which are happening unchecked5. There are two main problems here. The first is that we are compromising well-being in the name of better understanding the human condition. The second is that we are not only using these data and technologies to understand people, but also sorting and managing them in ways that suit those who are already more powerful.
It is vital to note that key to concerns about datafication is how these practices disproportionately affect the well-being of those who are already most vulnerable. Facial recognition, for example, negatively impacts people who are already disadvantaged, owing to its gendered, heteronormative, classed and racialised biases6. These technologies have also been trialled in UK policing, where more than 90% of matches were reported to be incorrect7. More generally, public services of all kinds are adopting new data practices and possibilities.
Data-driven decision-making is becoming an everyday feature of public services. Who receives welfare8, housing9 and other interventions, such as child protection10 or education11, is increasingly decided by algorithms rather than people. Even when automated decisions are questioned by people12, it is unclear whether ‘experienced workers’13 or the data system has the greater influence over key decisions.
Beyond welfare, algorithms intervene in other social policy areas. They monitor the ‘quality’ of education using dubious proxies14, with various damaging outcomes, including teachers undeservedly losing their jobs.((O’Neil describes how the bottom-scoring 2–5% of teachers were fired. Yet the modelled target student scores and small class sizes made the scoring of teachers little better than random: there was almost no correlation between a teacher’s scores from one year to the next, and qualitative data described one of the sacked teachers as ‘one of the best teachers I’ve ever come into contact with’ (O’Neil 2016, 4).)) In the UK in 2020, during the COVID-19 pandemic, an algorithm also decided the grades awarded to school-leavers, in the absence of exams owing to social distancing measures. One national media headline15 called this ‘punishment by statistics’.
The UK’s A Level algorithm example was extremely high profile, causing outrage that data-driven decision-making would have such an enormous effect on the futures of these young people. It was seen as morally outrageous for a number of reasons. First, because our society dictates that these young people’s well-being should be protected. Second, the algorithm used data that no one had consented to: no one knew at the time that their prior grades could be used as a final grade. Third, the data model also included proxies for expected performance which had nothing to do with each student’s own academic record. Instead, it used their school’s overall performance in previous years: scores based on previous students’ grades, not theirs. While the regulator, Ofqual, insisted its standardisation arrangements ‘are the fairest possible to facilitate students progressing on to further study or employment as planned’15, there were further controversies over the lack of transparency around how it had arrived at ‘fair’. Ofqual subsequently published a 319-page document explaining its methodology15, which was criticised for not being accessible to the general public. Not only did the whole process seem far from fair, then, but Ofqual did not make explicit how the approach was fair to those affected.
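To make the critique concrete, here is a deliberately simplified, hypothetical sketch of the kind of school-level standardisation described above. It is not Ofqual’s published methodology; the data, names and quota rule are invented purely for illustration. It shows how, once a school’s historical grade distribution is imposed on this year’s within-school ranking, a student’s own teacher-assessed grade has almost no bearing on the grade they are awarded.

```python
# Hypothetical sketch (not Ofqual's actual model): rank students within a school,
# then map each rank onto the school's historical grade distribution.
from collections import Counter

GRADES = ["A", "B", "C", "D", "E"]  # best to worst

# Invented data: grades awarded to previous cohorts at this school.
school_history = ["B", "C", "C", "D", "D", "D", "E", "E"]

# This year's cohort: teacher-assessed grades and within-school rank (1 = top).
cohort = [
    {"name": "student_1", "teacher_grade": "A", "rank": 1},
    {"name": "student_2", "teacher_grade": "B", "rank": 2},
    {"name": "student_3", "teacher_grade": "B", "rank": 3},
    {"name": "student_4", "teacher_grade": "C", "rank": 4},
]

def standardise(cohort, school_history):
    """Assign grades by imposing the school's historical grade distribution
    onto this year's within-school ranking, ignoring individual records."""
    n = len(cohort)
    history_counts = Counter(school_history)
    total = len(school_history)
    # How many of this year's n students get each grade, proportional to history.
    quota = {g: round(history_counts[g] / total * n) for g in GRADES}
    ranked = sorted(cohort, key=lambda s: s["rank"])
    grade_iter = (g for g in GRADES for _ in range(quota[g]))
    return [(s["name"], s["teacher_grade"], next(grade_iter, "E")) for s in ranked]

for name, teacher_grade, final in standardise(cohort, school_history):
    print(f"{name}: teacher-assessed {teacher_grade} -> awarded {final}")
```

In this toy example the top-ranked student, assessed by their teacher as an A, is awarded a C simply because the school has never historically produced an A. That is precisely the proxy-driven unfairness at issue: the decisive input is previous students’ performance, not the individual’s own record.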
Here we see public services failing to look after well-being by using data in ways which go against the moral code of fairness, accountability and transparency((Critical Data Studies scholars are pushing for more fairness, accountability and transparency in data practices. See the FAccT conference for more on this: https://facctconference.org.)), and without the young people’s consent. Beyond their high-profile nature, what is different about these data uses? While Chap. 2 discussed the greater role of data in public services from the 1980s onwards, this ostensibly had a different rationale: to evaluate qualities of these services, such as efficiency or cost-effectiveness. While those approaches also led to flawed decisions and evaluations, their assessments were made at a societal level. Contemporary data-driven decision-making, whether allocating resources to people or labelling individuals as at risk, is a different approach and uses data at a different level. Or, to use the language of Chap. 3, there is a different unit of analysis, and that unit could be a vulnerable person.
In sum, why do we need to ask critical questions about how people and their well-being are being understood, and about how the data and data systems used to understand people can compromise well-being? Going back to those definitions, people are often concerned with the speed, size and so on of Big Data. Yet, as Kitchin indicates, it is the contexts of these data that mark the most important ways in which they are different. Not only are the contexts in which Big Data originate more varied, and further from the contexts of use, than before, but the practices of analysing data feel less human. By this I mean that less human attention is now required in data analysis and in the important processes that rely on data. What does that mean for decisions made about people and well-being?
As we will discover in a few sections’ time, the response to COVID-19 relied on older data and data systems, and on more human judgement, than you would have imagined from media reports about the promise of artificial intelligence (AI) in the first half of 2020. However, as the financial value of data increases, and as data can be analysed ever more expediently, we must ask other questions. Who stands to gain and who stands to lose? Who has chosen to participate? But then, did people ever get to choose to participate in systems of well-being data? Were we even thinking about data as ‘a thing’ about us, something that affects our lives and is valuable? The next two sections deconstruct the financial value of Big Data and ask whether this reality is even new.
Notes
1. See Bates 2016; Dencik 2020.
2. E.g. Otterbacher et al. 2017.
3. Noble 2018.
4. Living With Data n.d.
5. I.e. the UK’s Data Justice Lab n.d.; Eubanks 2018; O’Neil 2016; Noble 2018; Benjamin 2019.
6. Ada Lovelace Institute 2019.
7. Fussey and Murray 2019; Davies et al. 2018.
8. Eubanks 2018, 37.
9. Eubanks 2018, 93.
10. Eubanks 2018, 135.
11. O’Neil 2016, 5–9, 52–60.
12. Eubanks 2018, 141.
13. Eubanks 2018, 77.
14. O’Neil 2016.
15. Pidd 2020.