chapter 5 Getting a sense of Big Data and well-being

chapter 5 contents

What even is ‘Big Data’?

Big data – a new way to understand well-being?

Why we need to ask critical questions of data in the context of well-being

Value

Are Big Data even actually new?

The darker side of historical well-being data and commercial gain

A case study on the promise of commercial Big Data

Linking Big Datasets – for well-being?

Social media data – a game changer?

Social media data-mining in social and cultural sectors

Understanding where people are and how they feel using Twitter data

Fit for Purpose? Health and well-being tracking and apps

Conclusion

Bibliography


Are Big Data even actually new?

While data are ‘sold’ to us as ‘the new oil’¹, large datasets, and their use to understand human behaviour, are not new; neither is the relationship between governments, commerce and value, when it comes to data. Mary Poovey’s A History of the Modern Fact: Problems of Knowledge in the Sciences of Wealth and Society (1998) describes the rise of merchants and their influence over the State, including campaigns to promote the balance of trade as the index of national well-being from the early seventeenth century onwards². The new ‘enthusiasm for numbers’ in the early to mid-nineteenth century³ coincided with a growing infrastructure to collect and analyse data. This desire for numbers, and the data processes that were required to provide them, led to the ‘great explosion of numbers that made the term statistics’⁴. Indeed, the term ‘statistics’ originated as a way for governments to understand ‘the quantum of happiness’⁵. In this ‘avalanche of numbers’, ‘nation-states classified, counted and tabulated their subjects anew’⁶. However, while ‘statistics’ may be hundreds of years old, large datasets go back further.

Managing land, agricultural hierarchies and the desire to control populations have long required systems of recording. One of the oldest-known writing systems is Sumerian script, which is approximately 6000 years old⁷. This script is called cuneiform, and its uses are said to include the tracking of trade and taxes: you need records on who has paid, how much; who has not paid, and what they owe⁸. While the clay tablets these records were written on may not seem like a database, or feel like the Big Data futures outlined in the previous and subsequent sections, they were a dataset of sorts. Crucially, these data were used to monitor and control resources, including the management of people.

Most countries now undertake a census of sorts. The UK Census takes place every ten years and has done since 1801. The first four were only headcounts, with the 1841 Census being the first to intentionally record names of all individuals in a household or institution. The UK’s ONS website offers an interesting history of censuses in the UK, back to the Domesday Book ordered by the Norman (French) King, William the Conqueror, in 1086⁹. Again, censuses precede these European data moments by some 4000 years in both Egypt and China, whose governments (as they would have been formed and named in those days) recorded who lived where and how wealthy they were. The Romans held regular censuses to keep track of their expanding, and then contracting, empire. Evidence of other institutionalised data practices exists in the Bible: the book of Genesis talks of kinship and marriage records and Exodus mentions a population census to support the tabernacle. The Church collected information on births, christenings, marriages, wills and deaths; this tracked the business of a church and its parish, but was also a means of counting the faithful and tracking their wealth.

You will note that the recording of trade and births, marriages and deaths is not so different from the administrative data that appear in all our examples of well-being data, from Table 3.1 to 5.3. So, what is new about Big Data? We’ve long had large datasets that hold multiple data points on people and nations, but these are thought to be ‘state simplifications’ for officials¹⁰. Rationalisation and standardisation mean these representations ‘did not successfully represent the actual activity of the society depicted, nor were they intended to; they represented only the slice of it that interested the official observer’¹¹. What the historian James Scott tells us here is that the sorts of information that were collected at scale lacked detail that could be used to improve quality of life. He implies, of course, that those in charge did not actually care about quality of life, only quantity of resource, whether this was people to work the land, make armies, or pay taxes. More recently, as we have seen, governments were charged with responsibility for people’s well-being, and therefore, more complex data were required. (Although, of course, given what we have seen elsewhere in the book, we might question whether the changing possibilities for what data could describe changed policy, rather than the other way around.) One such development was the social survey.

The social survey has been used to collect data which capture various qualities of lives in richer ways, and for longer, than it is often credited for. For example, surveys in the UK in the mid-1940s (in World War II) discovered almost one in ten households did not have the number of cups deemed necessary for essential use, and ‘the shortage of scrubbing brushes seems to have been extensively felt’¹². Whilst still administrative records of resource and scarcity, the survey began to be used to articulate more qualitative aspects of quality of life as proxies for well-being. This presents richer detail than many of the contemporary surveys that generate the well-being data we have seen as either objective or subjective data so far.

These more qualitative data were not only collected by the government social scientists that we might imagine with clipboards. A project called Mass Observation was established in 1937 by anthropologist Tom Harrisson, poet Charles Madge and filmmaker Humphrey Jennings. (There were a number of iterations of Mass Observation, with different people initiating them, but these were the original founding members.) Mass Observation aimed to record everyday life in Britain. There were paid investigators who anonymously recorded people’s conversations and their behaviour: at work, on the street and at memorable occasions, including public meetings or sporting and religious events.

This project was reminiscent of the current idea of ‘Big Data’, not only in the scope of the data gathered, but also in how they were gathered. Mass Observation had numerous phases and at one point also used a panel of around 500 voluntary ‘observers’. The initial aims of Mass Observation were to research everyday life, making use of ‘the untrained observer, the man in the street’ (there were no women observing anything in those days, of course) as much as those who were thought to be skilled and qualified in gathering data of this sort¹³. The observers used various data collection methods to generate large datasets on different topics: some maintained diaries, while others replied to open-ended questionnaires. In 1938, there was ‘a competition’ for the residents of Bolton, Lancashire (see Fig. 5.2), asking people what happiness meant for them. This was one of many themes, and people would reply to what were called directives with often very long texts describing what they thought and how they felt. The data from these and from the 1938 project can still be accessed via a vast archive at the University of Sussex. (See the Mass Observation (n.d.) website for more on the data available and how to access them.)

Mass Observation began with a positive vision of democratising the processes behind how data were gathered to better understand people’s lives. However, over time, much qualitative social research shifted towards the narrower analysis of consumer choice, and Mass Observation became a market-research firm in 1949¹⁴. Mass Observation relaunched in 1981, returning to its original egalitarian ideals, and the archives are testament to the ways that Mass Observation aims to engage the public in the documenting of their own lives.

These historical examples of large datasets are, therefore, not so different from the qualities found in previously crowdsourced, location-based, time-based data on how people feel about things, as seen in Table 5.3. The purchasing of scrubbing brushes was used as proxy data for other qualities of life in the same way our purchasing data are analysed to better understand us. Similarly, a lack of cups was indicative of a particular kind of poverty and lack of resources at a point in time, and this was analysed across the population. However, the democratic promise of Mass Observation and other projects of the time was superseded by the potential of understanding what makes people happy for commercial gain.

Fig. 5.2 What is happiness? Mass Observation competition flyer, 1938

¹ The Economist 2017
² Poovey 1998, 93–94
³ Hacking 1991, 186; Porter 1986, 1996
⁴ Porter 1986, 11
⁵ Sinclair 1798, vol. 20, p. xiii
⁶ Hacking 1990, 2; 1991, 186
⁷ Bellet and Frijters 2019
⁸ Harford 2017
⁹ ONS 2016
¹⁰ Scott 1998
¹¹ Scott 1998, 3
¹² Oman 2015, 88; ONS 2001, 9
¹³ Madge and Harrisson 1937, 10
¹⁴ Albert 2019
