The complex problem space of data portability
This week, I was invited by my friend Chinmayi Sharma to speak to her class at UT Austin (virtually - another instance where I’m glad for the ways we’ve all collectively embraced remote engagement!). Her students are studying decentralization and how protocols and interoperability factor in - a timely space that has seen a ton of new ideas in recent years. I was of course eager to talk about the Data Transfer Initiative and our relevant work, but to do so properly in an educational setting requires quite a lot of context and, well, education. A lot of what I talked about falls into a category of “data portability fundamentals” that isn’t yet widely known, so I’m writing up a version of that here - enjoy!
The starting point for this intellectual journey - at least judging by what people are reading and thinking about in the news - is competition. We’re in the midst of a broadly perceived shift in how we think about “big” in tech: a seesaw (or perhaps a persistent false binary?) from the classical “Chicago School” of economic thought, which prized the efficiencies of vertical integration over basically everything else, to the rise of “break them up” analysis and politics, in which big is seen as inherently problematic.
For my part, I believe that a major contributing factor to modern tech tensions is people feeling trapped in a service. (I emphasized this quite a lot in my Techdirt podcast appearance.) From that perspective, it’s less the size of the company that matters and more how much users feel (and are) empowered to shape their digital and data futures. That, of course, segues right into data portability, because people have a natural and rational desire to keep access to the pictures they’ve uploaded, the posts they’ve written, and the music playlists they’ve created within a service. Free the data, free the user, or so the story goes.
The General Data Protection Regulation, the European Union’s big privacy law adopted in 2016, established data portability as a fundamental right for digital citizens: everyone should have access to their data, and the right to export it and use it as they see fit. While data portability predates the GDPR by many years (see, for example, this blog post from Google in 2009), the GDPR marks a milestone in the broad acceptance of data portability as a valuable good. And today, online services offer a number of ways for users to download a copy of their data, including activity data.
If the goal of data portability is grounded in the values of data control and privacy, it’s sufficient for a user to be able to download their data in a structured, machine-readable form. But if a user wants to export data from one service and import it into another, the nature of that structure matters. In most cases, data isn’t represented the same way in two different systems, so some level of translation is required. Facilitating that translation and transfer is DTI’s core product goal, and it’s what we build tools for.
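To make the translation step a little more concrete, here is a minimal Python sketch of moving a music playlist between two services. Everything in it is hypothetical: “Service A”, “Service B”, their field names, and the `Track`/`Playlist` types are invented for illustration and are not DTI’s actual data models or code. The sketch assumes one common design, reading the source export into a neutral intermediate model and then writing that model out in the destination’s shape, so each service only needs a translator to and from the shared model rather than one for every other service.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Track:
    """Neutral, service-independent track (hypothetical model)."""
    title: str
    artist: str
    duration_seconds: float


@dataclass
class Playlist:
    """Neutral, service-independent playlist (hypothetical model)."""
    name: str
    tracks: list[Track]


def import_from_service_a(export: dict[str, Any]) -> Playlist:
    """Read a hypothetical Service A export into the neutral model."""
    tracks = [
        Track(
            title=t["song"],
            artist=t["artist_name"],
            # Hypothetical Service A stores durations in milliseconds.
            duration_seconds=t["length_ms"] / 1000,
        )
        for t in export["tracks"]
    ]
    return Playlist(name=export["playlist_title"], tracks=tracks)


def export_to_service_b(playlist: Playlist) -> dict[str, Any]:
    """Write the neutral model in a hypothetical Service B import shape."""
    return {
        "name": playlist.name,
        "items": [
            {
                "title": t.title,
                # Hypothetical Service B expects a list of artists.
                "artists": [t.artist],
                # Hypothetical Service B only accepts whole seconds (lossy).
                "duration": round(t.duration_seconds),
            }
            for t in playlist.tracks
        ],
    }


if __name__ == "__main__":
    service_a_export = {
        "playlist_title": "Road Trip",
        "tracks": [
            {"song": "Song One", "artist_name": "Artist X", "length_ms": 215000},
            {"song": "Song Two", "artist_name": "Artist Y", "length_ms": 187000},
        ],
    }
    print(export_to_service_b(import_from_service_a(service_a_export)))
```

Even in this toy case the two formats disagree on field names, units (milliseconds vs. seconds), and shape (one artist vs. a list of artists), and rounding to whole seconds makes the transfer slightly lossy. Real services differ in far more ways, which is why the structure of an export matters once the goal is import and not just download.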
All of this is what I’ve started (sometimes, at least) calling “classic data portability” - the use case where one user wants to move data between two different platforms. It stands in contrast to a related, but fundamentally distinct, problem: “protocol interoperability.” That’s where two users on two different platforms exchange information in real time, because they have agreed in advance on a protocol for the communication. The abstract frame of interoperability - the idea that systems should work together and be able to exchange data - encompasses both, but from an engineering perspective (and thus from a regulatory one, if the regulation is designed properly) the problems are very different.
(Brief tangent, indulge me!) I often encounter the false notion that data portability is a sort of compromise position relative to interoperability. But from my perspective, it’s an apples and oranges comparison. Consider messaging services. Interoperability is forward-looking in the sense that messages sent by one user going forward must be able to reach another user on another platform. That doesn’t give a user the ability to extract their message history from a service (aka data portability), because that’s a different problem with different technical solutions. There are, however, some considerations and constraints common to both, such as the need to authenticate the user and to protect the privacy and security of the user and the data. (OK, tangent over.)
Important as it is, the distinction between classic data portability and protocol interoperability is also incomplete. Data transfer between platforms also includes bulk, aggregate, enterprise-style transfers of data (as distinct from personal data). And the EU’s newest law related to data portability, the Digital Markets Act, adds a fourth category, because it requires data portability to be offered continuously and in real time. That’s … technically very different from bulk, stateless, retrospective “classic” data portability. The value it serves is the same universally recognized good, but the implementation will need to be quite different. Look for more from DTI on that soon - and for a start, you can check out my slides from the Brussels workshop in May, available on the Commission’s website (look for the link to materials used by the panelists, or message me here!).