Data in Context

In Brief Research Read

Katy McKinney-Bock | April 2, 2024

A full version of this report is available.

This report brings together a set of questions around putting data in context. How can we develop and begin to test the feasibility of a general set of commitments regarding data stewardship, intended to be applicable to a wide variety of contexts? What contextual variables or parameters are important for avoiding harms and for empowering people to understand and participate in how their data is used? What role do governance frameworks play in creating and understanding context?

Throughout this year, A Better Deal for Data collaborators have been asked, and have asked, questions around context. We have been challenged to consider the intractability of creating a set of principles that exists independent of any context. The question of context arose again in the desk research on data licenses, when examining license components, including newer AI licenses that contain use restrictions, such as the RAIL and Allen AI ImpACT licenses (Data Sharing: Licenses and Agreements; Responsible AI Licenses (RAIL), n.d.; Allen Institute for AI, n.d.). And it returned a third time in our research on data governance, where we found discussions of the context-sensitive nature of data governance, and concerns about the ability to use a ‘one-size-fits-all’ model for governance, or to reproduce a governance model easily in a new context (Data Governance, Part 1; Lopez Solano et al., 2022).

In this report, I set out to examine data context from two perspectives: data feminism and data privacy. In the full report, I summarize a set of selected resources that (i) apply data feminism to agriculture, (ii) explore how data feminism is being used in AI ethics, research, and development, and (iii) develop concepts aligned with data feminism that share a goal of understanding how data is context-dependent, or situated, such as data productivism and data arenas.

Context in Data Feminism

Data Feminism (D’Ignazio & Klein, 2020) argues that data science needs intersectional feminism. The authors’ view of feminism includes the ways in which power is distributed unequally between men, women, and people of other genders, but it also explores how other “isms”, such as racism, classism, ableism, and colonialism, intersect with sexism to contribute to injustice and oppression. This concept is intersectionality (Crenshaw, 1989). Intersectionality describes the ways in which different aspects of a person’s identity can compound to exacerbate the effects of power and privilege and contribute to oppression; intersectional feminism works to “challenge and change the distribution of power” and, critically, is grounded in a belief that oppression hurts everyone.

In the book, they define seven principles that are based in intersectional feminism and that would guide data science toward a feminist perspective. The seven principles are (D’Ignazio & Klein, 2020, pp. 17–18):

  1. Examine power
  2. Challenge power
  3. Elevate emotion and embodiment
  4. Rethink binaries and hierarchies
  5. Embrace pluralism
  6. Consider context
  7. Make labor visible

Principle 6, Consider Context, states: “Data feminism asserts that data are not neutral or objective. They are the products of unequal social relations, and this context is essential for conducting accurate, ethical analysis” (Ibid., Chapter 6, p. 2). Some illustrations of ways in which data may lack context:

    • Lack of documentation (who, what, when, where, how, why) for an increasing amount of data available on the web
    • Misunderstanding or misinterpretation of what a dataset actually represents or counts
    • Failure to understand who is or isn’t included in the dataset
    • Misaligned incentives around data collection (and when, or when not, to collect it)

In addition to showing the shortcomings of data provided without sufficient context, D’Ignazio and Klein also provide examples of approaches to data science that build context, or “restore” context around data (see Chapter 6, pp. 23–24).

Other recent work that appears aligned with building context around data includes data nutrition labels (Kasia Chmielinski et al.) and the Civic Software Foundation’s context-aware approach (Cat Nikolovski et al.). These projects aim to create a practical process for recording and presenting information about data.
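As a rough illustration of what such a process might record, the sketch below defines a minimal, hypothetical context record for a dataset. The field names echo the who/what/when/where/why/how documentation gaps listed above; they are illustrative assumptions of mine, not the schema of the data nutrition label or any other existing project.

    # Minimal sketch of a dataset "context record" (illustrative field names only).
    from dataclasses import dataclass, field, asdict
    import json

    @dataclass
    class DatasetContext:
        name: str
        collected_by: str                  # who collected the data
        collected_when: str                # when collection occurred
        collected_where: str               # geographic or institutional setting
        purpose: str                       # why the data was collected
        method: str                        # how the data was collected
        population_included: str           # who is counted
        population_excluded: str           # who is missing or undercounted
        known_limitations: list[str] = field(default_factory=list)

        def to_label(self) -> str:
            # Render the record as a human-readable "label".
            return json.dumps(asdict(self), indent=2)

    # Example with made-up values:
    ctx = DatasetContext(
        name="example_farm_survey",
        collected_by="hypothetical state agency",
        collected_when="2022 survey cycle",
        collected_where="one U.S. state",
        purpose="estimate production totals",
        method="voluntary mail and web survey",
        population_included="registered principal operators",
        population_excluded="unregistered and very small operations",
        known_limitations=["self-reported", "non-response bias"],
    )
    print(ctx.to_label())

Even a thin record like this makes visible who collected the data, who is missing from it, and why it exists, which is the kind of context Principle 6 asks analysts to consider.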

In a 2024 perspective piece in Nature Food, Andrea Rissing and colleagues apply the seven principles of data feminism to the National Agricultural Statistics Service (NASS) Quick Stats database. The database contains the Census of Agriculture, which includes a count of all farms and ranches and their primary operators, as well as a set of annual surveys, and it is a key primary source for research about U.S. agriculture. Using principles from data feminism, the authors illustrate shortcomings of NASS data for food systems research that aims to be equitable and sustainable.

The authors also provide a set of recommendations for data providers and for data users that are aligned with feminist principles and would lead toward better research on sustainable and equitable agriculture. These recommendations include improving access to and documentation of datasets and tools, sharing more data, improving what data is collected, remaining critical of data, and using mixed methods of analysis (combining qualitative and quantitative analysis).

In the full report, I also discuss concepts related to data feminism and putting data in context, such as data arenas and data productivism, and I explore recent research by the Distributed AI Research Institute and the book and podcast The Good Robot.

Context in Privacy

In the 1990s, Helen Nissenbaum proposed the theory of contextual integrity (CI), which rejects the idea that privacy means stopping the flow of information and instead defines a set of five contextual parameters that establish norms of privacy. Evaluating a given situation with respect to these contextual parameters then determines whether privacy has been violated. The theory of contextual integrity has had immense practical value in examining situations and evaluating whether and how privacy norms were violated.

However, there are new challenges in applying contextual integrity to current machine learning systems, in particular with data that Nissenbaum refers to as “lower-order data” (data such as IoT sensor readings). Nissenbaum’s (2019) paper is organized into two sections: a review of the original work on CI, and an update that considers the scale of data and AI, where she introduces the concept of a data food chain to account for privacy in a larger, more networked data landscape.

I present a summary of Nissenbaum’s work roughly in the order she presents it in Nissenbaum (2019), and then move to a discussion of how some of her concepts for putting privacy in context relate to those in the data feminism literature.

Nissenbaum offers four thesis statements that describe the theory of contextual integrity, which have been applied and tested extensively in subsequent years (Ibid., pp. 225, 228, 231). A schematic sketch of the parameters named in the third thesis appears after the list:

  1. “Privacy is the appropriate flow of personal information”
  2. “Appropriate flows conform with contextual information norms (“privacy norms”)”
  3. “Five Parameters Define Privacy (Contextual Informational) Norms: Subject, Sender, Recipient, Information Type, and Transmission Principle”
  4. “The Ethical Legitimacy of Privacy Norms is Evaluated in Terms of: A) Interests of Affected Parties, B) Ethical and Political Values, and C) Contextual Functions, Purposes, and Values.”
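To make the third thesis concrete, here is a schematic sketch of how an information flow might be represented by its five CI parameters and checked against a list of entrenched contextual norms. The Flow class and conforms function are my own illustrative names with deliberately simplified semantics; they are not Nissenbaum’s formalization or an existing library, and a flow that matches no norm is only flagged for scrutiny rather than automatically judged a violation.

    # Schematic sketch: an information flow described by the five CI parameters.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Flow:
        subject: str                 # whose information it is
        sender: str                  # who transmits it
        recipient: str               # who receives it
        information_type: str        # what kind of information flows
        transmission_principle: str  # the terms under which it flows

    def conforms(flow: Flow, norms: list[Flow]) -> bool:
        # Crude check: does the flow match some entrenched contextual norm?
        # Real CI evaluation also weighs the legitimacy of the norms themselves (thesis 4).
        return flow in norms

    # Hypothetical healthcare-context norm and two candidate flows:
    norms = [Flow("patient", "patient", "physician", "medical history", "confidentiality")]
    appropriate = Flow("patient", "patient", "physician", "medical history", "confidentiality")
    suspect = Flow("patient", "physician", "advertiser", "medical history", "sale")

    print(conforms(appropriate, norms))  # True: flow conforms to a privacy norm
    print(conforms(suspect, norms))      # False: flagged for further scrutiny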

In part two, Nissenbaum constructs the idea of a data food chain: a hierarchy of data in which higher-order data depends in some way on lower-order data (Ibid., p. 236). In one example, the lower-order data are an employment database and a database of welfare recipients, and the higher-order data are records of welfare fraudsters (which depend on inferences drawn from the lower-order data). In today’s data world, this has scaled up by orders of magnitude, and the current state of data permits this type of inference from lower-order to higher-order data largely without restriction, which is a problem for privacy (Ibid., p. 236).
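As a toy illustration of that example (all records and the threshold below are made up), higher-order data can be derived by combining two lower-order databases, neither of which contains the inference on its own:

    # Lower-order data: two separate administrative databases (made-up records).
    employment = {"alice": 48_000, "bob": 0, "carol": 61_000}  # name -> reported income
    welfare_recipients = {"alice", "bob"}                      # names receiving benefits

    # Higher-order data: an inference that exists in neither source alone.
    INCOME_THRESHOLD = 30_000  # hypothetical eligibility cutoff
    suspected_fraud = {
        name
        for name in welfare_recipients
        if employment.get(name, 0) > INCOME_THRESHOLD
    }

    print(suspected_fraud)  # {'alice'} -- derived entirely from the lower-order sources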

Nissenbaum argues that “CI [contextual integrity] offers far more nuanced alternatives than simply ‘ban access!’, ‘ban flow,’ or ‘give subjects control!’” (Ibid., p. 248). CI allows, for example, restricting a flow to a particular set of recipients or constraining the transmission principle. In other words, Nissenbaum’s core argument is that contextual integrity, based on information flows and augmented with directionality in the form of a data food chain, can help us understand, in a nuanced way, how to describe privacy violations in the age of machine learning. However, there are still challenges for practical application, because the nature of data is changing so rapidly that the evaluation of contextual integrity needs to keep shifting. She describes a current project working on this (see Nissenbaum, 2019, fn. 57).

In exploring data in context, this report describes two perspectives (among many others) that define and evaluate data in relation to a context. I summarize current discussions about context in a data feminist approach, which treats data as non-objective and non-neutral and which requires context for analysis. I then summarize a longstanding, practical approach to privacy based on contextual integrity, which requires contextual variables to be filled in to determine whether information flows are appropriate.

In putting these side by side, some lessons are reinforced by both approaches. First, data is context-dependent: any analysis, evaluation, or use is incomplete without an understanding of context. Second, it is critical to consider data use as part of context. Even so, it does seem possible to develop an understanding of which parameters are relevant to contextual evaluation, a la Nissenbaum, in order to create a framework for evaluating a set of principles against situations involving data. In this regard, the careful goal of developing a practical set of tools for evaluating data in context does not seem out of reach.


 

This work is licensed under CC BY 4.0. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

This work is supported by a subaward from OpenTEAM as an initiative of Wolfe’s Neck Center for Agriculture and the Environment, specifically funded by the U.S. Department of Agriculture under agreement number NR233A750004G032. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of any funder. In addition, any reference to specific brands or types of products or services does not constitute or imply an endorsement.

Please send feedback via email to [email protected].

References

Abrams, M., Abrams, J., Cullen, P., & Goldstein, L. (2019). Artificial Intelligence, Ethics, and Enhanced Data Stewardship. IEEE Security & Privacy, 17(2), 17–30. https://doi.org/10.1109/MSEC.2018.2888778

Ada Lovelace Institute. (2021, March 4). Disambiguating data stewardship. https://www.adalovelaceinstitute.org/blog/disambiguating-data-stewardship/

Airoldi, M. (2022). Machine Habitus: Toward a Sociology of Algorithms. Wiley. https://www.wiley.com/enus/Machine+Habitus%3A+Toward+a+Sociology+of+Algorithms-p-9781509543274

Allen Institute for AI. (n.d.). AI2 ImpACT Licenses. Retrieved October 30, 2023, from https://allenai.org/impact-license

Arnstein, S. R. (1969). A Ladder Of Citizen Participation. Journal of the American Institute of Planners, 35(4), 216–224. https://doi.org/10.1080/01944366908977225

Baker, K. S., & Yarmey, L. (2009). Data Stewardship: Environmental Data Curation and a Web-of-Repositories. International Journal of Digital Curation, 4(2), Article 2. https://doi.org/10.2218/ijdc.v4i2.90

Bloom, G. (2020). The Principles of Governing Open Source Commons. SustainOSS: Exploring Sustainability for Open Source Communities. https://sustainoss.pubpub.org/pub/jqngsp5u/release/1

Boeckhout, M., Zielhuis, G. A., & Bredenoord, A. L. (2018). The FAIR guiding principles for data stewardship: Fair enough? European Journal of Human Genetics, 26(7), Article 7. https://doi.org/10.1038/s41431-018-0160-0

Chandler, K. (2024). Good Technology Subverts Militarism. In The Good Robot: Why technology needs feminism, eds. Eleanor Drage and Kerry McInerney. https://www.thegoodrobot.co.uk/thebook

Crenshaw, K. (1989). Demarginalizing the Intersection of Race and Sex: A Black Feminist Critique of Antidiscrimination Doctrine, Feminist Theory and Antiracist Politics. University of Chicago Legal Forum, 1989(1), Article 8. http://chicagounbound.uchicago.edu/uclf/vol1989/iss1/8

D’Ignazio, C. (2024a). Counting Feminicide. The MIT Press. https://mitpress.mit.edu/9780262048873/countingfeminicide/

D’Ignazio, C. (2024b). Good Technology Challenges Power. In The Good Robot: Why technology needs feminism, eds. Eleanor Drage and Kerry McInerney. https://www.thegoodrobot.co.uk/thebook

D’Ignazio, C., & Klein, L. F. (2020). Data Feminism. The MIT Press. https://doi.org/10.7551/mitpress/11805.001.0001

Distributed AI Research Institute (DAIR). (n.d.). Retrieved March 8, 2024, from https://www.dair-institute.org/

Drage, E., & McInerney, K. (2024). The Good Robot: Why technology needs feminism. https://www.thegoodrobot.co.uk/thebook

Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. (2009). National Academies Press. https://doi.org/10.17226/12615

Favour Borokini & Bonaventure Saturday. (n.d.). Exploring the Future of Data Governance in Africa Data Stewardship, Collaboratives, Trusts and More [White Paper]. Pollicy.

Hilgartner, S., & Brandt-Rauf, S. I. (1994). Data Access, Ownership, and Control: Toward Empirical Studies of Access Practices. Knowledge, 15(4), 355–372. https://doi.org/10.1177/107554709401500401

Hoffmann, C. P., Lutz, C., & Ranzini, G. (2024). Inequalities in privacy cynicism: An intersectional analysis of agency constraints. Big Data & Society, 11(1), 20539517241232629. https://doi.org/10.1177/20539517241232629

Information Commissioner’s Office of the UK. (2023, May 19). Legal definitions. ICO. https://ico.org.uk/fororganisations/data-protection-fee/legal-definitions-fees/

Jasanoff, S., & Kim, S.-H. (Eds.). (2015). Dreamscapes of Modernity: Sociotechnical Imaginaries and the Fabrication of Power. University of Chicago Press. https://press.uchicago.edu/ucp/books/book/chicago/D/bo20836025.html

Joshi, D., & Singh, A. (2021). The Histories, Practices, and Policies of Community Data Governance in the ‘Global South’ (SSRN Scholarly Paper 4506644). https://doi.org/10.2139/ssrn.4506644

Kapoor, A., & Whitt, R. S. (2021). Nudging Towards Data Equity: The Role of Stewardship and Fiduciaries in the Digital Economy (SSRN Scholarly Paper 3791845). https://doi.org/10.2139/ssrn.3791845

Kidd, D. (2019). Extra-activism: Counter-mapping and data justice. Information, Communication & Society, 22(7). https://www.tandfonline.com/doi/abs/10.1080/1369118X.2019.1581243

Kitchin, R., & Lauriault, T. (2014). Towards Critical Data Studies: Charting and Unpacking Data Assemblages and Their Work (SSRN Scholarly Paper 2474112). https://papers.ssrn.com/abstract=2474112

Lewis, J. E. (2024). Good Technology is Messy. In The Good Robot: Why technology needs feminism, eds. Eleanor Drage and Kerry McInerney. https://www.thegoodrobot.co.uk/thebook

Lopez Solano, J., de Souza, S., Martin, A., & Taylor, Linnet. (2022). Governing data and artificial intelligence for all: Models for sustainable and just data governance. European Parliament. https://doi.org/10.2861/915401

Loukissas, Y. A. (2022). All Data Are Local: Thinking Critically in a Data-Driven Society. https://mitpress.mit.edu/9780262545174/all-data-are-local/

Lutz, R., & Greene, S. (1999). Data Stewardship: The Care and Handling of Named Entities. Proceedings of the ASIST Annual Meeting, 36. https://www.learntechlib.org/p/87517/

Mattoni, A. (2020). The grounded theory method to study data-enabled activism against corruption: Between global communicative infrastructures and local activists’ experiences of big data. European Journal of Communication, 35(3), 265–277. https://doi.org/10.1177/0267323120922086

MindKind: A mixed-methods protocol for the feasibility of global digital mental health studies in young people. (2022). Wellcome Open Research, 6, 275. https://doi.org/10.12688/wellcomeopenres.17167.2

Mitchell, M. (2024a). Good Technology is Inclusive. In The Good Robot: Why technology needs feminism, eds. Eleanor Drage and Kerry McInerney. https://www.thegoodrobot.co.uk/thebook

Mitchell, M. (2024b, February 29). Ethical AI Isn’t to Blame for Google’s Gemini Debacle. TIME. https://time.com/6836153/ethical-ai-google-gemini-debacle/

Montenegro de Wit, M., & Canfield, M. (2024). ‘Feeding the world, byte by byte’: Emergent imaginaries of data productivism. The Journal of Peasant Studies, 51(2), 381–420. https://doi.org/10.1080/03066150.2023.2232997

Moorosi, N., Sefala, R., & Luccioni, S. (2023, December 15). AI for Whom? Shedding Critical Light on AI for Social Good. NeurIPS 2023 Computational Sustainability: Promises and Pitfalls from Theory to Deployment. https://openreview.net/forum?id=vjwYYlA8Pj

National Research Council, Division on Earth and Life Studies, Board on Atmospheric Sciences and Climate, & Committee on Climate Data Records from NOAA Operational Satellites. (2005). Review of NOAA’s Plan for the Scientific Data Stewardship Program. National Academies Press.

Nissenbaum, H. (2019). Contextual Integrity Up and Down the Data Food Chain. Theoretical Inquiries in Law, 20(1), 221–256. https://doi.org/10.1515/til-2019-0008

Nost, E., & Goldstein, J. E. (2022). A political ecology of data. Environment and Planning E: Nature and Space, 5(1), 3–17. https://doi.org/10.1177/25148486211043503

O’hara, K. (2019). Data Trusts: Ethics, Architecture and Governance for Trustworthy Data Stewardship. WSI White Papers, University of Southampton, 1. https://doi.org/10.5258/SOTON/WSI-WP001

O’Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (1st edition). Crown.

Ostrom, E. (1990). Governing the commons: The evolution of institutions for collective action. Cambridge University Press.

Peng, G. (2018). The State of Assessing Data Stewardship Maturity – An Overview. Data Science Journal, 17, 7–7. https://doi.org/10.5334/dsj-2018-007

Pequeño, A., IV. (2024, March 26). Google’s Gemini Controversy Explained: AI Model Criticized By Musk And Others Over Alleged Bias. Forbes. https://www.forbes.com/sites/antoniopequenoiv/2024/02/26/googles-geminicontroversy-explained-ai-model-criticized-by-musk-and-others-over-alleged-bias/?sh=177b3c0d4b99

Raghavan, P. (2024, February 23). Gemini image generation got it wrong. We’ll do better. The Keyword. https://blog.google/products/gemini/gemini-image-generation-issue/

Ramdeen, S., & Hills, D. J. (2013). ESIP’s Emerging Provenance and Context Content Standard Use Cases: Developing Examples and Models for Data Stewardship. 2013, IN53C-1578.

Responsible AI Licenses (RAIL). (n.d.). About. Responsible AI Licenses (RAIL). Retrieved October 30, 2023, from https://www.licenses.ai/about

Rosenbaum, S. (2010). Data Governance and Stewardship: Designing Data Stewardship Entities and Advancing Data Access. Health Services Research, 45(5p2), 1442–1455. https://doi.org/10.1111/j.1475-6773.2010.01140.x

Sagli, J. R., & Egeland, O. (1991). Dynamic coordination and actuator efficiency using momentum control for macro-micro manipulators. 1201–1206. https://doi.org/10.1109/ROBOT.1991.131773

Saxena, S. (2023, September 25). Data Sandboxes: Managing the Open Data Spectrum. Data Stewards Network. https://medium.com/data-stewards-network/data-sandboxes-managing-the-open-data-spectrum6ef3bf9c5133

Sefala, R., Gebru, T., Mfupe, L., Moorosi, N., & Klein, R. (2021, August 29). Constructing a Visual Dataset to Study the Effects of Spatial Apartheid in South Africa. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). https://openreview.net/forum?id=WV0waZz9dTF

Ślosarski, B. (2023). Data arenas: The relational dynamics of data activism. https://journals.sagepub.com/doi/full/10.1177/20539517231177617

Soni, S. (2021, November 22). Empowering communities with data stewardship. The Data Economy Lab. https://thedataeconomylab.com/2021/11/22/empowering-communities-with-data-stewardship/

Strasser, C. (2013). DataUp: Enabling data stewardship for researchers. https://doi.org/10.9776/13300

Toczydlowski, R. H., Liggins, L., Gaither, M. R., Anderson, T. J., Barton, R. L., Berg, J. T., Beskid, S. G., Davis, B., Delgado, A., Farrell, E., Ghoojaei, M., Himmelsbach, N., Holmes, A. E., Queeno, S. R., Trinh, T., Weyand, C. A., Bradburd, G. S., Riginos, C., Toonen, R. J., & Crandall, E. D. (2021). Poor data stewardship will hinder global genetic diversity surveillance. Proceedings of the National Academy of Sciences, 118(34), e2107934118. https://doi.org/10.1073/pnas.2107934118

United Nations Economic Commission for Africa. (2023, May 25). StatsTalk-Africa: Data Stewardship in Africa. Events, UNECA. https://www.uneca.org/eca-events/statstalk-africa-data-stewardship-africa

van den Hoven, J. (1999). Information Resource Management: Stewards of Data. Information Systems Management, 16(1), 88–90. https://doi.org/10.1201/1078/43187.16.1.19990101/31167.13

Verhulst, S. G. (2021). Reimagining data responsibility: 10 new approaches toward a culture of trust in re-using data to address critical public needs. Data & Policy, 3, e6. https://doi.org/10.1017/dap.2021.4

Verhulst, S. G. (2023, March 13). Wanted: Data Stewards — Drafting the Job Specs for A Re-imagined Data Stewardship Role. Data Stewards Network. https://medium.com/data-stewards-network/wanted-datastewards-drafting-the-job-specs-for-a-re-imagined-data-stewardship-role-f7cd28a83379

Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. PublicAffairs.
