Tokenization of data is still a nascent field, but it is emerging as an area of growing interest, as highlighted in a recent Forbes article: Why Is the AI Engine Data the Most Overlooked Real-World Asset?
The field is often overlooked because other types of "real-world assets" are easier to tokenize and, in that sense, represent low-hanging fruit. Data, on the other hand, is inherently more complex for several reasons:
Sensitivity: data can include personal information or reveal business secrets.
Intellectual property: data may be owned or controlled by third parties, which restricts how it can be used.
Unclear value: unlike physical assets, the value of a dataset depends heavily on the specific use case. The same dataset may be extremely valuable for one purpose but nearly irrelevant for another.
Quality and trust: data quality varies and is often hard to verify in advance. Assigning value reliably is therefore difficult.
Because of these challenges, data provenance — the ability to verify where data came from, how it has been processed, and what rights are associated with it — becomes even more important than for other real-world assets. Provenance helps establish not only whether the data is legally safe to use, but also whether it is reliable and fit for purpose.
There are essentially two sides to data tokenization: ownership of data and access to data. While the two are linked through provenance, let’s briefly look at them separately.
Ownership tokenization could work as follows: any individual or entity contributing a dataset is issued tokens that represent their stake. For example, in a shared data pool, contributors might each receive a "fair share" of tokens proportional to the value or quantity of data they contributed. These tokens would entitle them to revenue or profit sharing from the use of that data pool. The tokens could also be traded, transferring the right to future revenue streams to another holder.
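To make this concrete, here is a minimal Python sketch of how such a shared pool might track contributions, allow shares to be traded, and split revenue pro rata. The names (DataPool, contribute, transfer) and the assessed contribution values are illustrative assumptions, not a reference to any existing token standard.

```python
# Minimal sketch (not a production token contract): contributors to a shared
# data pool receive share units proportional to an agreed contribution value,
# and revenue from the pool is later split pro rata among current holders.

from dataclasses import dataclass, field


@dataclass
class DataPool:
    # token balance per contributor, in arbitrary "share" units
    balances: dict[str, float] = field(default_factory=dict)

    def contribute(self, contributor: str, assessed_value: float) -> None:
        """Issue shares equal to the (externally assessed) value of the data."""
        self.balances[contributor] = self.balances.get(contributor, 0.0) + assessed_value

    def transfer(self, sender: str, receiver: str, amount: float) -> None:
        """Trade shares, i.e. transfer the right to future revenue."""
        if self.balances.get(sender, 0.0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0.0) + amount

    def distribute_revenue(self, revenue: float) -> dict[str, float]:
        """Split revenue from use of the pool pro rata to current holdings."""
        total = sum(self.balances.values())
        return {holder: revenue * share / total for holder, share in self.balances.items()}


pool = DataPool()
pool.contribute("alice", assessed_value=70.0)
pool.contribute("bob", assessed_value=30.0)
pool.transfer("alice", "carol", 20.0)
print(pool.distribute_revenue(1000.0))  # {'alice': 500.0, 'bob': 300.0, 'carol': 200.0}
```

How the "assessed value" is arrived at is exactly the open question discussed next.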
Of course, determining what constitutes a “fair share” is not straightforward, and it may depend on how the data is being used. Additionally, complexities arise if data can later be withdrawn — for example, personal data collected with consent that is subsequently revoked.
Provenance is crucial here: contributors must provide information on the origin, handling, and legal rights connected to the dataset. By tokenizing ownership, datasets could become liquid and tradeable assets. For companies, this could even mean that datasets appear on balance sheets with realistic, market-based valuations.
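As a rough illustration, a provenance record attached to a tokenized dataset might capture origin, processing history, and rights, with each record referencing a hash of the previous one so the chain of handling steps can be verified. The field names below are assumptions made for the sake of the sketch, not an established schema.

```python
# Sketch of a provenance record a contributor might attach to a tokenized
# dataset: origin, processing history, and associated rights, chained by
# content hashes so later steps can be verified against earlier ones.

import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class ProvenanceRecord:
    dataset_id: str
    origin: str               # where the data came from (source, collection method)
    processing_step: str      # what was done to it (cleaning, anonymization, aggregation)
    rights: str               # licence / consent terms attached to the data
    previous_hash: str = ""   # hash of the preceding record, forming a chain

    def digest(self) -> str:
        """Content hash of this record; the next record stores it as previous_hash."""
        return hashlib.sha256(json.dumps(asdict(self), sort_keys=True).encode()).hexdigest()


raw = ProvenanceRecord("sensor-pool-1", "IoT sensors, opt-in consent", "raw ingest", "CC BY-NC")
cleaned = ProvenanceRecord("sensor-pool-1", "IoT sensors, opt-in consent",
                           "outlier removal + anonymization", "CC BY-NC",
                           previous_hash=raw.digest())
print(cleaned.digest())
```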
On the user side, tokenization could enable controlled access to data. Holding a specific token might grant an entity (or even an AI agent) permission to access a dataset for particular use cases. While the immediate benefits may not be as clear as with tokenized ownership, this becomes highly relevant in the context of decentralized AI and AI agents. Tokens could facilitate automated transactions between AI systems, supporting agent-based economies where data flows seamlessly among autonomous entities.
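A minimal sketch of the access side might look like the following: a token binds a holder (a person, organization, or AI agent) to a dataset, a set of permitted use cases, and an expiry. Again, the names and fields are illustrative assumptions rather than a specific token standard.

```python
# Sketch of access-side tokenization: a token grants its holder access to a
# dataset for specific use cases until it expires. Plain in-memory model only.

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AccessToken:
    holder: str
    dataset_id: str
    allowed_uses: frozenset[str]   # e.g. {"model-training", "analytics"}
    expires_at: datetime

    def permits(self, dataset_id: str, use_case: str) -> bool:
        """Check whether this token covers the requested dataset and use case."""
        return (
            dataset_id == self.dataset_id
            and use_case in self.allowed_uses
            and datetime.now(timezone.utc) < self.expires_at
        )


token = AccessToken(
    holder="pricing-agent-42",
    dataset_id="sensor-pool-1",
    allowed_uses=frozenset({"model-training"}),
    expires_at=datetime.now(timezone.utc) + timedelta(days=30),
)
print(token.permits("sensor-pool-1", "model-training"))  # True
print(token.permits("sensor-pool-1", "resale"))          # False
```

In an agent-based economy, a check like `permits` would sit in front of every automated data request, so machines can transact without a human approving each access.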
Note also that, because provenance is maintained, all actors in the “data supply chain” could be identified and rewarded proportionally.
Between providers and consumers of data, an intermediary will likely have a role in setting fair prices, issuing tokens, and maintaining marketplace integrity. Over time, such intermediaries could even be AI-driven agents themselves, automating negotiations and clearing transactions in real time.
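As a hedged sketch of what such an intermediary might do when clearing a single transaction — assuming a flat marketplace fee and a listed price agreed elsewhere — the settlement logic could be as simple as the following; the fee rate and field names are purely illustrative.

```python
# Sketch of an intermediary clearing one data transaction: it checks the
# buyer's bid against the listed price, keeps a marketplace fee, and passes
# the remainder on for pro-rata distribution to the dataset's token holders.

FEE_RATE = 0.02  # assumed 2% marketplace fee


def clear_transaction(bid: float, listed_price: float) -> dict[str, float] | None:
    """Settle a purchase if the bid covers the listed price, else return None."""
    if bid < listed_price:
        return None
    fee = listed_price * FEE_RATE
    return {
        "paid_by_buyer": listed_price,
        "marketplace_fee": fee,
        "to_token_holders": listed_price - fee,  # distributed pro rata to holders
    }


print(clear_transaction(bid=120.0, listed_price=100.0))
# {'paid_by_buyer': 100.0, 'marketplace_fee': 2.0, 'to_token_holders': 98.0}
print(clear_transaction(bid=80.0, listed_price=100.0))  # None
```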
Tokenization of datasets opens up entirely new opportunities for data economies. It has the potential to:
create fairer pricing mechanisms,
unlock liquidity for data assets,
expand access to quality data, and
support new models for AI-driven economies.
Although this space is still early and comes with unique challenges, data tokenization is poised to become an important field in the coming years.