Data Sharing: Licenses and Agreements
In Brief Research Read
Katy McKinney-Bock | November 20, 2023
A full version of this report is available:
A Better Deal for Data focuses on empowering people to participate in shared governance of their data, and on making organizational commitments to data subjects, to data communities, and for the public good. When data is collected about you, we want to make a set of easily understood promises about how that data will be used. This research brief is a shorter version of a full report on data sharing agreements and licenses.
Part of this is understanding how data can be shared with other parties, and which legal mechanisms can both enable data sharing and protect your rights when your data is used by others.
For October, we asked the questions: What conditions the environment for sharing data, and what are some examples of mechanisms for data sharing? More specifically, we explored questions like: what are the common elements present in data sharing agreements? What licenses are commonly used to share data? What policies do organizations put in place around sharing information?
We start from a position that sharing data is (mostly) good; the more shared knowledge we have in the world, the more good can be done through efficient re-use. At the same time, it is important to protect people’s privacy and ensure that there are safeguards for how, when, why, and to whom data is being shared. We also can’t make things too complicated or impractical, or people will not be able to tell whether their rights are protected and will be less likely to share at all.
The Open Data Institute (ODI) describes different types of data as lying along a spectrum from “open” to “closed”, with an understanding that small and/or personal data is likely to be more closed, while larger and/or government data is likely to be more open. When data can be opened or made public, this is accomplished through a license that grants permission for its use. To define open data, the ODI as well as the Digital Public Goods Alliance (below), among many others, adopt the Open Definition, published by the Open Knowledge Foundation:
“Knowledge is open if anyone is free to access, use, modify, and share it — subject, at most, to measures that preserve provenance and openness.” (Open Knowledge Foundation, 2015)
The Digital Public Goods Alliance (DPGA), an initiative that supports investment into digital public goods with a focus on achieving SDGs, maintains a list of licenses for software, data, and content that they believe to be conformant to the Open Definition, by SPDX identifiers (Digital Public Goods Alliance, 2022). One of the minimum requirements to enter a digital asset into the DPGA registry is licensing with an approved open license. Their standard states:
“For open content collections the use of a Creative Commons license is required. DPGs are encouraged to use a license that allows for both derivatives and commercial reuse (CC BY and CC BY-SA), or dedicate content to the public domain (CC0); licenses that do not allow for commercial reuse (CC BY-NC and CC BY-NC-SA) are also accepted. For open data, an Open Data Commons approved license is required.” (Digital Public Goods Alliance, n.d.)
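The rule in the quoted standard can be sketched as a small lookup. This is a minimal illustration only: the SPDX identifiers below are a hand-picked subset chosen for the example, the function and category names are hypothetical, and the DPGA's published list of approved licenses remains the authority.

```python
from enum import Enum


class AssetKind(Enum):
    CONTENT = "open content"
    DATA = "open data"


# Illustrative subset of SPDX identifiers, grouped the way the quoted
# standard groups them. Assumption: this is NOT the DPGA's approved list.
CC_ENCOURAGED = {"CC-BY-4.0", "CC-BY-SA-4.0", "CC0-1.0"}
CC_ACCEPTED_NONCOMMERCIAL = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0"}
ODC_LICENSES = {"ODbL-1.0", "ODC-By-1.0", "PDDL-1.0"}


def classify_license(kind: AssetKind, spdx_id: str) -> str:
    """Return 'encouraged', 'accepted', or 'not listed' for a license."""
    if kind is AssetKind.CONTENT:
        if spdx_id in CC_ENCOURAGED:
            return "encouraged"
        if spdx_id in CC_ACCEPTED_NONCOMMERCIAL:
            return "accepted"
    elif kind is AssetKind.DATA and spdx_id in ODC_LICENSES:
        return "accepted"
    return "not listed"
```

For example, `classify_license(AssetKind.CONTENT, "CC-BY-NC-4.0")` returns `"accepted"`, while a license outside the illustrative sets returns `"not listed"`, which here only means that the sketch does not cover it.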
Creative Commons is a nonprofit organization that provides a set of simple copyright licenses that, over the past twenty years, have become a standard for giving permission to share and use creative works. CC licenses have been used for purposes beyond copyrighted creative content, and there has been some discussion about the appropriateness of these additional uses. In particular, it is important to be aware of how and when CC licenses can be used for both data and databases; often, the data is highly factual and not necessarily covered under copyright law (e.g. facts/knowledge are not subject to copyright protection in many jurisdictions).
Open Data Commons, created in 2007, has a set of three open licenses for data: the Open Data Commons Open Database License (ODbL), aka “Attribution Share-Alike for data/databases”, the Open Data Commons Attribution License (ODC-By), aka “Attribution for data/databases”, and the Open Data Commons Public Domain Dedication and License (PDDL), aka “Public Domain for data/databases” (Open Knowledge Foundation, n.d.-c). Along with the PDDL, Open Data Commons provides a set of Community Norms that can be appended. Open Data Commons and Creative Commons both discuss the differences between their licenses, pointing out that CC licenses are intended for a broader set of artifacts, and only for those subject to copyright, while the Open Data Commons licenses are specific to databases.
Summary table:
As part of the rapid development of AI technologies, new licenses have been developed that attempt to manage some of the ambiguities arising from the complexities of AI’s data use. Three widely discussed examples are the Montreal Data License Framework (Benjamin et al., 2019), the Responsible AI Licenses (RAIL, Contractor et al., 2022), and the Allen AI ImpACT licenses (see e.g. Allen Institute for AI, n.d.; Dumas, 2023). The use of data in AI/ML applications has not only challenged current licensing models, because the technology and the way it applies data are not covered by previous open licenses, but has also stimulated discussion about whether AI/ML uses should have behavioral use restrictions to prevent harm. The latter is a deeper question that goes back to the Open Definition, and one that several major organizations are discussing at present (e.g. the Open Source Initiative, Open Knowledge Foundation, and Creative Commons).
A different approach is required when access to a dataset must be restricted, such as when it contains personal or sensitive information, or when there is another risk of harm that would prevent its release. In such cases, (at least) two parties must enter into an agreement, whether informal or formal, to share the data for use. Recent examples of data sharing policies, principles, and data sharing agreements (“DSAs”) include:
Policy: Wellcome Trust Sanger Institute, March 2023: The Wellcome Trust Sanger Institute is a genomic research institute established in 1992 in the UK. Their Data Sharing Policy document contains a characteristic set of “practical guidance” for researchers. Both open data and restricted data are covered, though it is presumed that most data will be released according to the policy. A limited set of listed exceptions permits withholding data entirely, e.g. for sensitive studies where release would risk harm to participants or their reidentification. The components of the policy address the types of data that fall within its scope (e.g. sequencing data, genomic data) and the minimum requirements for furnishing metadata. The policy also addresses risk assessment for the release of summary statistics and the possibility of reidentification, with examples of low- and high-risk data. There is a release timing table based on data type and a section on anonymization procedures. The policy also requires all external collaborators to adhere to it.
Model Data Sharing Agreement: Chatham House Model Principles for Public Health, 2017: Chatham House (formally the Royal Institute of International Affairs) is a UK think tank that has existed since 1920. In 2017, they published a principles-based report on data sharing in public health called A Guide to Sharing the Data and Benefits of Public Health Surveillance. Part of this is a model data sharing agreement. The agreement defines two parties, the data provider and the data recipient, and states a general purpose of sharing the data to improve public health. The principles call for sharing mechanisms that are compatible with laws and ethical principles and authorized by both parties; that involve timely sharing and interoperable formats; that have transparent objectives for the sharing; that promote cooperation and protect privacy interests; and that are equitable and ensure benefits return to the communities where the data originated.
The ODI, the UK Information Commissioner’s Office, the Australian Research Data Commons, and the Contracts for Data Collaboration project also have checklists/best practices for data sharing agreements. There is consensus that a DSA should state who is involved in the sharing, why data is being shared, and what data will be shared, and should set conditions on the sharing that take into account the legal and policy context(s), privacy contexts, the sensitivity of the data, and what the data will be used for. Broader compliance with governance, policy, and legal frameworks should also be part of the agreement. As a resource for model agreements, the Contracts for Data Collaboration project, a collaborative effort between NYU’s GovLab (an NYU lab focused on governance research), TReNDS (Thematic Research Network on Data and Statistics), the University of Washington’s Information Risk Research Initiative (in the UW Applied Physics Laboratory), and the World Economic Forum, includes a library of 43 sample agreements that can be referenced and searched by question/topic.
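The common DSA elements described above can be sketched as a simple completeness check. This is a hypothetical illustration, not a format taken from any of the cited checklists: the class name, field names, and function are assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class DataSharingAgreement:
    """Hypothetical record of the elements a DSA is expected to cover."""
    parties: str = ""           # who is involved in the sharing
    purpose: str = ""           # why data is being shared
    data_description: str = ""  # what data will be shared
    conditions: str = ""        # legal, policy, privacy, and sensitivity conditions
    permitted_uses: str = ""    # what the data will be used for


def missing_elements(dsa: DataSharingAgreement) -> list[str]:
    """List the names of the elements that are still blank."""
    return [f.name for f in fields(dsa) if not getattr(dsa, f.name).strip()]


draft = DataSharingAgreement(
    parties="Data provider and data recipient",
    purpose="Improve public health",
)
print(missing_elements(draft))
```

A check like this only confirms that each element has been addressed; whether the content is adequate for the legal and privacy context still requires review by the parties.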
Some other recognized frameworks that guide decision-making around data sharing include the Five Safes Framework, which asks five questions to ensure that safety is prioritized for the data, projects, people, settings, and outputs; the TRUST Code, a global research code of conduct that addresses involving local communities in data ownership and sharing results; and the Data Responsibility Journey, a framework developed by GovLab that identifies trust and collaboration as two key components of data sharing.
The full report also reviews some of the available mechanisms for data sharing for both open and non-open data, a topic that is currently overshadowed by fast-moving discussions of what it means to be ‘open’ in AI and how to respond to data being used in new AI technologies. I discuss licenses and data sharing agreements as mechanisms for sharing, in addition to principles and frameworks for decision-making around how to share data in a way that builds trust, collaboration, and safety.
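As a rough sketch of how the five dimensions of the Five Safes might be tracked during a review, the snippet below records a yes/no judgment per dimension. The function name and the yes/no simplification are assumptions for illustration; the framework itself is an aid to judgment rather than a set of boolean tests.

```python
# The five dimensions named by the Five Safes framework.
FIVE_SAFES = ("data", "projects", "people", "settings", "outputs")


def unresolved_safes(answers: dict[str, bool]) -> list[str]:
    """Return the dimensions not yet judged safe.

    A dimension with no recorded answer is treated as unresolved.
    """
    return [dim for dim in FIVE_SAFES if not answers.get(dim, False)]


review = {"data": True, "people": True, "projects": False}
print(unresolved_safes(review))
```

Treating a missing answer as unresolved is a deliberate choice in this sketch: a review is not complete until every dimension has been considered.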
This work is licensed under CC BY 4.0. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
This work is supported by a subaward from OpenTEAM as an initiative of Wolfe’s Neck Center for Agriculture and the Environment, specifically funded by the U.S. Department of Agriculture under agreement number NR233A750004G032. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of any funder. In addition, any reference to specific brands or types of products or services does not constitute or imply an endorsement.
References
Allen Institute for AI. (n.d.). AI2 ImpACT Licenses. Retrieved October 30, 2023, from https://allenai.org/impact-license
Australian Research Data Commons. (2023). Data sharing agreement development guidelines. https://doi.org/10.5281/zenodo.7553198
Ávila, R. (2023, March 16). Updating the Open Definition to meet the challenges of today. Open Knowledge Foundation Blog. https://blog.okfn.org/2023/03/16/updating-the-open-definition-to-meet-the-challenges-of-today/
Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License (arXiv:1903.12262). arXiv. http://arxiv.org/abs/1903.12262
Carroll, M. W. (2015). Sharing Research Data and Intellectual Property Law: A Primer. PLOS Biology, 13(8), e1002235. https://doi.org/10.1371/journal.pbio.1002235
Chatham House. (2017). A Guide to Sharing the Data and Benefits of Public Health Surveillance (p. 40). Chatham House, The Royal Institute of International Affairs. https://chathamhouse.soutron.net/Portal/Public/en-GB/RecordView/Index/169144
Contractor, D., McDuff, D., Haines, J. K., Lee, J., Hines, C., Hecht, B., Vincent, N., & Li, H. (2022). Behavioral Use Licensing for Responsible AI. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 778–788. https://doi.org/10.1145/3531146.3533143
Creative Commons. (n.d.-a). About CC Licenses. Creative Commons. Retrieved October 30, 2023, from https://creativecommons.org/share-your-work/cclicenses/
Creative Commons. (n.d.-b). Open Data. Creative Commons. Retrieved October 10, 2023, from https://creativecommons.org/about/open-data/
Creative Commons. (2020). Creative Commons Strategy 2021-2025. https://drive.google.com/file/d/10rQDv5Hzuss38oi1ovGuoxHagmFzqn_f/view?usp=embed_facebook
Creative Commons. (2022, December 9). Our Work in Policy at CC: Data. Creative Commons. https://creativecommons.org/2022/12/09/our-work-in-policy-at-cc-data/
Data—Creative Commons. (n.d.). Retrieved October 10, 2023, from https://wiki.creativecommons.org/wiki/data
Digital Public Goods Alliance. (n.d.). Digital Public Goods Standard. Retrieved October 10, 2023, from https://digitalpublicgoods.net/standard/
Digital Public Goods Alliance. (2022, September 20). Approved Licenses for Digital Public Goods. GitHub. https://github.com/DPGAlliance/publicgoods-candidates/blob/main/helpcenter/licenses.md
Dumas, J. (2023, July 27). The AI2 ImpACT License - A new way to think about AI licensing. Medium. https://blog.allenai.org/the-ai2-impact-license-a-new-way-to-think-about-ai-licensing-bc90ff26a9ee
FAQ. (n.d.). Responsible AI Licenses (RAIL). Retrieved October 30, 2023, from https://www.licenses.ai/faq-2
Five safes. (2022). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Five_safes&oldid=1117226298
Five Safes framework | Australian Bureau of Statistics. (2021, August 11). https://www.abs.gov.au/about/data-services/data-confidentiality-guide/five-safes-framework
GovLab. (n.d.). Data Responsibility Journey—Sharing. Retrieved October 20, 2023, from https://dataresponsibilityjourney.org/sharing
Growth and Adoption of RAIL Licenses. (n.d.). Responsible AI Licenses (RAIL). Retrieved October 30, 2023, from https://www.licenses.ai/license-adoption
National Institutes of Health. (2023a). Data Management and Sharing Policy: Frequently Asked Questions. https://sharing.nih.gov/faqs#/data-management-and-sharing-policy.htm?anchor=56545
National Institutes of Health. (2023b). NOT-OD-21-013: Final NIH Policy for Data Management and Sharing. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
Open Data Institute. (2022). Health data governance: A playbook for non-technical leaders. https://open-data-institute.gitbook.io/data-governance-playbook/
Open Data Institute. (2020, September 26). The Data Spectrum. The ODI. https://theodi.org/insights/tools/the-data-spectrum/
Open Knowledge Foundation. (n.d.-a). About. Open Data Commons: Legal Tools for Open Data. Retrieved October 30, 2023, from https://opendatacommons.org/about/
Open Knowledge Foundation. (n.d.-b). Frequently Asked Questions. Open Data Commons: Legal Tools for Open Data. Retrieved October 30, 2023, from https://opendatacommons.org/faq/
Open Knowledge Foundation. (n.d.-c). Licenses. Open Data Commons: Legal Tools for Open Data. Retrieved October 30, 2023, from https://opendatacommons.org/licenses/
Open Knowledge Foundation. (n.d.-d). ODC Attribution-Sharealike Community Norms. Open Data Commons: Legal Tools for Open Data. Retrieved October 30, 2023, from https://opendatacommons.org/norms/odc-by-sa/
Open Knowledge Foundation. (n.d.-e). Open Data Commons Attribution License (ODC-By) Summary. Open Data Commons: Legal Tools for Open Data. Retrieved October 30, 2023, from https://opendatacommons.org/licenses/by/summary/
Open Knowledge Foundation. (n.d.-f). Open Data Commons Open Database License (ODbL). Open Data Commons: Legal Tools for Open Data. Retrieved October 30, 2023, from https://opendatacommons.org/licenses/odbl/
Open Knowledge Foundation. (n.d.-g). Open Data Commons Public Domain Dedication and License (PDDL). Open Data Commons: Legal Tools for Open Data. Retrieved October 30, 2023, from https://opendatacommons.org/licenses/pddl/
Open Knowledge Foundation. (2015). Open Definition 2.1. https://opendefinition.org/od/2.1/en/
Open Knowledge Foundation. (2023, October 10). Open Knowledge Foundation joins the Digital Public Goods Alliance. https://blog.okfn.org/2023/10/10/open-knowledge-foundation-joins-the-digital-public-goods-alliance/
Responsible AI Licenses (RAIL). (n.d.). About. Responsible AI Licenses (RAIL). Retrieved October 30, 2023, from https://www.licenses.ai/about
The Five Safes. (n.d.). Retrieved October 31, 2023, from http://www.fivesafes.org/
TRUST. (2018). The TRUST Code - A Global Code of Conduct for Equitable Research Partnerships. https://doi.org/10.48508/GCC/2018.05
UK Data Service. (n.d.). What is the Five Safes framework? UK Data Service. Retrieved October 20, 2023, from https://ukdataservice.ac.uk/help/secure-lab/what-is-the-five-safes-framework/
UK Information Commissioner. (2022). Data sharing: A code of practice (v1.0.31). UK Information Commissioner. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/data-sharing/data-sharing-a-code-of-practice/
UN Department of Economic and Social Affairs. (n.d.). Goal 3. UN Department of Economic and Social Affairs. Retrieved October 31, 2023, from https://sdgs.un.org/goals/goal3
Wellcome Trust Sanger Institute. (2023, March). Data Sharing Policy and Guidelines. https://fairsharing.org/FAIRsharing.cg2j1y
What are ‘controllers’ and ‘processors’? (2023, September 29). ICO. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/controllers-and-processors/controllers-and-processors/what-are-controllers-and-processors/