Data spaces and data portability
One of the cornerstones of the European Union’s data strategy is a substantial investment in European data spaces. But what, exactly, is a “data space”? And how does this effort relate to data portability and the work that DTI is doing? While the goals and infrastructure models of data spaces and data portability differ in some respects, the values motivating them overlap considerably, as do some of the tools and experience needed to implement them.
About data spaces
The concept of a data space is typically framed around a subject, such as a data space for the energy sector, agriculture, or health data. Digital data relevant to any of these topics can support significant value creation for users. If a business is building, say, a smart energy tool designed to help users reduce their monthly bills, it’s helpful if both the tool provider and end users are able to access data on prices and fees from a variety of providers. Other stakeholders can benefit from the existence of a data space, including government agencies and businesses in that sector, but it requires investment to achieve interoperability, fairness, and trust.
The sense in which the data is gathered in a “space” is mostly conceptual. It’s not literally housed in one data store or data warehouse. Data is generated by a diverse range of sources and managed by a diverse range of actors, and any attempt to centralize all of a sector’s data in a single controlled store would be fragile and would hinder both effective use by businesses and people and innovation. Instead, the data space concept recognizes that data on energy prices and data on energy usage, for example, have more in common than not. An agent that uses one kind of energy data is likely also to produce or consume another kind of energy data, so the schema and trust work can be leveraged for both kinds of data.
Some data in a data space may be collected in a centralized data store to a certain extent. An intermediary, such as one described in the EU’s Data Governance Act, might collect data from a large number of sources. It can provide important consistency, anonymization and aggregation functions, which allow more value to be derived from inconsistent data sources or simply reduce privacy concerns.
The forces behind data spaces and related frames
The EU is funding work on data spaces through the Digital Europe Programme. Organizations such as Prometheus-X are built in part on this funding, and are developing shared trust frameworks. In the EU, trust frameworks must be multi-layered, allowing member nations to make their own determinations of which regulators can attest to the trust status of industry participants, while still allowing pan-EU trust questions to be answered and cross-border interoperability to be achieved.
Open source software is not neglected in the policy and regulatory work, although open source licenses by themselves are no guarantee of trust or interoperability. The EU uses the term Simpl to describe its intention to build open-source tools to facilitate data transfers within its data spaces framework; a consortium of European organizations announced earlier this year that it had won the bidding to build the Simpl “middleware” platform.
Many stakeholders in the United Kingdom are participating in the same data space concepts, and additionally in a couple of other architectures. For example, the proposed General Data Exchange Layer, as articulated by Jon Nash in Rewiring the State - Public First, describes a future where the United Kingdom’s Department for Work and Pensions can, “with one click,” gather information from a range of companies and public sector agencies to determine an individual’s eligibility for benefits. In many cases, the data subject (citizen, resident or business) would approve the data request but would not have to supply the data themselves; instead, some other entity that already holds the needed data could, with that approval, satisfy the request. Data request routing, approval, and response automation would significantly reduce paperwork and, if carefully managed, could reduce fraud.
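To make the request, approval, and response flow concrete, here is a minimal Python sketch. It is an illustration only, not a description of the General Data Exchange Layer's actual design: the DataRequest, Holder, and fulfil names are hypothetical, and the sketch assumes each requested item is answered by the first holder that has it, and only once the data subject has approved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataRequest:
    """A request for data about a subject (all names here are hypothetical)."""
    requester: str           # e.g. an agency assessing eligibility
    subject: str             # the citizen, resident or business concerned
    items: List[str]         # the pieces of data being asked for
    approved: bool = False   # set to True once the data subject approves


@dataclass
class Holder:
    """An entity that already holds some data about some subjects."""
    name: str
    records: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def lookup(self, subject: str, item: str) -> Optional[str]:
        return self.records.get(subject, {}).get(item)


def fulfil(request: DataRequest, holders: List[Holder]) -> Dict[str, str]:
    """Route each requested item to the first holder that can answer it.

    Nothing is released unless the data subject has approved the request.
    Items that no holder has are simply left out of the response.
    """
    if not request.approved:
        raise PermissionError("data subject has not approved this request")
    response: Dict[str, str] = {}
    for item in request.items:
        for holder in holders:
            value = holder.lookup(request.subject, item)
            if value is not None:
                response[item] = value
                break
    return response
```

The approval check comes before any lookup, so an unapproved request never touches a holder's records. A real system would also need to authenticate the requester and log each disclosure, which this sketch leaves out.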
Relatedly, the UK has invested significantly in the concept of Smart Data, led by the Department for Business and Trade. Despite the UK’s recent change of government, the issue remains alive, having been highlighted in the King’s Speech. DTI just returned from London, in fact, where we sponsored and contributed to the Smart Data Forum. The department defines smart data as “the secure sharing of customer data with authorized third parties (ATPs).” Of course, that language articulates only the first piece of the overall puzzle, and the Smart Data agenda is much broader. The UK is designing funding and a number of other strategies to advance this vision.
Globally, the OECD also appears to be exploring interventions in this space, as it runs a consultation through 2024 on the concept of “trusted data intermediaries”; the MyData organization and community, itself oriented towards many intersecting goals, is playing a supporting role.
How our work at DTI fits into the picture
All these efforts share a lot with the effort to improve data portability, where DTI is most active. In all of these problem spaces, shared formats are crucial to make data accessible and avoid translation errors. Trust is paramount: the senders and receivers of data must be able to trust each other, and the subject of the data, where applicable, must have authorized the transfer. Another important concept is the ability to bootstrap through discovery, so that stakeholders can learn who has the data they need, at what Internet address, and what procedure to use to access it.
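As a rough illustration of discovery, the Python sketch below models a registry that answers those three questions for a given kind of data. It is a toy under stated assumptions, not DTI's or any data space's actual mechanism: the Endpoint and DiscoveryRegistry names, the data type labels, and the example.com address are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Endpoint:
    """One answer to a discovery query (hypothetical structure)."""
    holder: str      # who has the data
    url: str         # at what Internet address
    procedure: str   # what procedure to use to access it


class DiscoveryRegistry:
    """Maps a kind of data to the endpoints known to serve it."""

    def __init__(self) -> None:
        self._by_type: Dict[str, List[Endpoint]] = {}

    def register(self, data_type: str, endpoint: Endpoint) -> None:
        self._by_type.setdefault(data_type, []).append(endpoint)

    def discover(self, data_type: str) -> List[Endpoint]:
        # Return a copy so callers cannot alter the registry's own list.
        return list(self._by_type.get(data_type, []))
```

A participant would register once, for example registry.register("photos", Endpoint("Example Photos", "https://photos.example.com/export", "oauth2")), after which anyone calling registry.discover("photos") learns where to go and how to ask. Discovery only locates the data; whether the two parties trust each other, and whether the subject has authorized the transfer, are separate checks.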
Can DTI’s work on the Data Transfer Project or on a trust framework be considered part of data spaces in the broadest sense? We have collaborated with several of the aforementioned projects, but to call our work “data spaces” would be to blur our scope and complicate our requirements. Language is imprecise, and we need to make sure not to overgeneralize simply because two efforts use similar words. Both a railway flatcar and an Olympic shot putter are mechanisms that transfer iron from one location to another, but the physical differences are so obvious that we wouldn’t try to unify them under a single “metal transport” solution. We call user data “data” when we generalize, but it’s often more accurately called “content”, and user content such as email, photos or social media posts is not the subject of any existing data space (though activity data around content, such as rates of photo uploads, magnitudes of social media activity, and moderation or harmful content rates, is a more likely candidate).
In some ways, personal data transfer is simpler. There are fewer parties involved: only the user, the source of their data and the destination. In data spaces, the data flows of an entire industry include public data, private data, and data that is proprietary to businesses. Those flows can originate in many more places and can involve multilayer trust relationships and multi-step data translations, transfers and aggregations. In other ways, the personal data transfer requirements are more demanding, because our use cases always involve user consent or initiation and privacy concerns, whereas data spaces often involve sharing non-personal data such as government data, business data and aggregated anonymized consumer data.
Could personal data transfer be included under the large umbrella of data spaces work anyway, to achieve unification, at least when the personal data transfer relates to the topic of a data space? (Personal retail activity, for example, is data that a user might want to download or transfer, but it is also part of retail supply chain data as a whole.) Certainly in some cases, but such an approach might come with unacceptable delay in arriving at solutions. Engineers know that trying to solve too many problems at once can result in untamable complexity. Consider the complexity of Gaia-X’s specifications for data exchange services, for example. This complexity may be appropriate to the outcome of boosting the effectiveness of an entire industry, but it might be overkill for simply allowing a user to transfer their calendar from one service to another. We can work on these smaller, simpler, standalone features and make solutions available to users more quickly, while at the same time opening doors to competitiveness and innovation, even from small startups.
Looking to the future
Ultimately, some of this work may converge anyway, and in the meantime we can share expertise at a high level because these efforts are complementary. In addition to exchanging notes on governance and trust models, we can share experience with schema specifics and detailed use cases. I hope and expect that we will align strategies where we can, so that together we can better shape and measure the impact of these efforts for our global digital future.