Request for Discussion: Data Economy Index (DATA)

I am a big proponent of this Index - the vision of Web3 relies on decentralized data economies. This sector has massive growth potential, and I believe there is substantial short-term AUM to be gained. I have heard rumblings about this type of index at the Coop for a few months, so it is great to see this proposal come through! A few of the highlights of the proposal from my perspective:


Both immediate and long-term growth potential

Strong sector belief, similar to MVI and the Metaverse

Imo, this is the clear next choice for a sector index

Now, onto a couple initial questions that I have:

Could you provide more color behind why these ^ were chosen as the final weights?

What are y’alls thoughts on IP with DATA? GRT tokens have a great opportunity to earn via delegation or curation. OCEAN can be staked against data on the Ocean Protocol. NMR can be staked against models if you are a Data Scientist, but there is also a possibility of lending NMR to data scientists on Aave. Do you envision the Coop taking an active role in these economies using the underlying tokens? What are the possibilities you see there?

I really want to see a DATA index. Looking forward to the review and discussion here. Great work @Thomas_Hepner & @Kiba


Hey @Thomas_Hepner and @Kiba

These are the same comments I shared with you in private and adding here for everyone’s benefit.

  • My gut reaction on the economic activity weight is a negative one. Main concern is that I don’t know what it adds to the index. Perhaps it would be helpful to explain why the economic activity component is necessary? Perhaps some performance backtest or something like that. If you think about people investing in broad, thematic indices, they usually want a simple and straightforward methodology. Economic activity would confuse most people. As a side note, all the sources of data for protocol revenue are different and some are rather ad-hoc. This would be hard to keep up-to-date and ensure reliability of the data.

  • Same comment as above for liquidity. Are there liquidity issues with any of the tokens? That was the primary reason to include liquidity in the MVI weighting.

  • With that, economic activity and liquidity weights feel like an unnecessary overcomplication of the methodology. Are there any specific issues with simply running a square root of mcap index?

  • I’m still unclear what justifies inclusion in the index. What exactly are “on-chain data-based services”? Perhaps more clearly spelling out the categories would be helpful.

  • 4 tokens is a bit underwhelming. I would love to invest in this product if it had like 6 or 8 tokens. Otherwise, I can just hold LINK, GRT and NMR and use them productively in the ecosystem. I see the potential customers for DATA being crypto native and capable of using assets productively.


This ^ is a good point, and why I asked about IP with DATA. There could be strong incentive to hold these due to the ability to participate in the economy as an individual rather than get exposure through an Index.

I think @verto0912 brings up some good questions around the weights, which I second. My initial question was a little more broad (i.e. why these weights?). So interested to see y’alls answers here!


Glad to hear you believe in the vision and potential for the Data Economy Index!

@Kiba and I felt that it was most important for a given token’s weight in the index to be based on a combination of market value (the market cap weight) and on-chain economic activity relative to market value (the economic weight).

The biggest challenge in creating a methodology for a DATA index at this time is the extreme power distribution of market capitalization and liquidity for the assets (LINK is 90% of circulating market cap and 94% of liquidity). DPI has this challenge as well, but not to the same extent.

Liquidity weight was added as DEX liquidity for GRT, NMR, and OCEAN was under $5m per token so liquidity could become an issue if AUM exceeds $20m for DATA.
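A rough way to see the liquidity concern is to multiply the index AUM by each token's weight and compare the resulting position with that token's DEX liquidity. The sketch below does that. The weights and the LINK liquidity figure are hypothetical placeholders, not the proposal's numbers; only the $20m AUM and the $5m liquidity level echo the post.

```python
def position_to_liquidity(aum_usd, weights, dex_liquidity_usd):
    """Return each token's index position as a fraction of its DEX liquidity."""
    return {
        token: (aum_usd * weight) / dex_liquidity_usd[token]
        for token, weight in weights.items()
    }

# Hypothetical weights and liquidity, for illustration only.
weights = {"LINK": 0.55, "GRT": 0.20, "NMR": 0.15, "OCEAN": 0.10}
dex_liquidity_usd = {"LINK": 200e6, "GRT": 5e6, "NMR": 5e6, "OCEAN": 5e6}

ratios = position_to_liquidity(20e6, weights, dex_liquidity_usd)
for token, ratio in ratios.items():
    print(f"{token}: position is {ratio:.0%} of DEX liquidity")
```

With these placeholder inputs, the smaller tokens' positions are a large fraction of their available DEX liquidity while LINK's is not, which is the situation the liquidity weight is meant to guard against.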


@verto0912 Thank you for sharing your feedback with us - these are excellent points for consideration!

@Kiba and I both feel that the Economic Activity Weight (EW) is a valuable part of the Matrix Scoring System because it gives greater weight to tokens with stronger economic fundamentals (i.e. higher on-chain revenues as a percentage of circulating market cap). We do not believe that adding economic activity is confusing as it is fully transparent in the Matrix Scoring System.
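As a minimal sketch of the ratio described here (on-chain revenue as a percentage of circulating market cap), the snippet below turns that ratio into normalized weights. The token symbols and figures are hypothetical, and the actual Matrix Scoring System may score or bucket the ratio differently; this only illustrates the idea.

```python
def economic_activity_weights(revenue_usd, market_cap_usd):
    """Weight tokens by on-chain revenue as a share of circulating market cap."""
    ratios = {
        token: revenue_usd[token] / market_cap_usd[token]
        for token in market_cap_usd
    }
    total = sum(ratios.values())
    return {token: ratio / total for token, ratio in ratios.items()}

# Hypothetical annualized revenue and market cap figures (USD), illustration only.
revenue_usd = {"AAA": 30e6, "BBB": 4e6, "CCC": 1e6}
market_cap_usd = {"AAA": 9e9, "BBB": 400e6, "CCC": 50e6}

ew = economic_activity_weights(revenue_usd, market_cap_usd)
for token, weight in ew.items():
    print(f"{token}: {weight:.1%}")
```

In this example the smallest token by market cap receives the largest economic weight, because its revenue is highest relative to its size.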

In regards to the data sources, the links provided are well established sites where each community tracks data relevant to their markets. We can improve data collection processes for the index over time as the Data Economy matures and develops as an industry.

As I mentioned in my comment to @jdcook, the biggest challenge in creating a methodology for a DATA index at this time is the extreme power distribution of market capitalization and liquidity for the assets (LINK is 90% of circulating market cap and 94% of liquidity). DPI has this challenge as well, but not to the same extent. Liquidity weight was added to the matrix scoring system as DEX liquidity for GRT, NMR, and OCEAN is under $5m per token so liquidity could become an issue if AUM exceeds $20m for DATA.

We could simply calculate the square root of circulating market capitalization, but that would skew the index heavily into a single asset, Chainlink (LINK), and also not take into account fundamental economic activity for a given protocol, which we believe should be an important factor for the index.
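To make the comparison concrete, here is a minimal sketch of plain market cap weighting next to square root of market cap weighting. The market caps are hypothetical and chosen only so that LINK holds 90% of the total, matching the share mentioned in this thread; they are not the proposal's data.

```python
import math

def cap_weights(market_cap_usd):
    """Plain market cap weighting."""
    total = sum(market_cap_usd.values())
    return {token: cap / total for token, cap in market_cap_usd.items()}

def sqrt_cap_weights(market_cap_usd):
    """Square root of market cap weighting, which compresses large caps."""
    roots = {token: math.sqrt(cap) for token, cap in market_cap_usd.items()}
    total = sum(roots.values())
    return {token: root / total for token, root in roots.items()}

# Hypothetical market caps (USD); LINK is set to 90% of the total.
market_cap_usd = {"LINK": 9.0e9, "GRT": 0.6e9, "OCEAN": 0.25e9, "NMR": 0.15e9}

plain = cap_weights(market_cap_usd)
rooted = sqrt_cap_weights(market_cap_usd)
for token in market_cap_usd:
    print(f"{token}: cap-weighted {plain[token]:.1%}, sqrt-weighted {rooted[token]:.1%}")
```

With these placeholder figures the dominant asset drops from 90% to roughly 64% of the index under square root weighting, which is lower but still a majority position.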

For all of the tokens/projects in the Data Economy Index (DATA), data is the product.

For instance, Chainlink Node operators power decentralized price feeds, Graph Indexers create consumable GraphQL endpoints, data owners publish data assets for consumers on Ocean Protocol, and data scientists submit predictions from machine learning models to power Numerai’s hedge fund.

Starting with 4 tokens does feel a bit underwhelming, but we have to start somewhere! We’re extremely early in the development of the Data Economy. I believe that the Index Coop should define and own the category before someone else does. As noted in the post, DeFi Pulse started when Maker dominance of DeFi was ~90% and Lightning Network and Augur were still considered DeFi projects:

Crypto evolves rapidly. It is easy to imagine many additional projects meeting the Token Inclusion Criteria outlined above and being included in DATA. For example, if cross-chain interoperability were a solved problem on Ethereum and Set Protocol then projects like Filecoin and Helium would already be eligible for inclusion.


@verto0912 @jdcook In regards to your comments on Intrinsic Productivity (IP):

@Kiba has lots of ideas about Intrinsic Productivity (IP) for DATA, but we did not want to overload initial considerations for the index given that IP is still in development for DPI after much consideration and many spirited debates we have both been a part of. :grinning_face_with_smiling_eyes:

In the original draft, we had a section for Future Considerations, here is what @Kiba wrote for IP:

Intrinsic Productivity (IP): IP will make DATA completely unforkable if done correctly. There are two types of IP possible - protocol level staking and DeFi farming. We will start with DeFi strategies similar to DPI. Once staking becomes available on a protocol and we start staking on it we can provide superior returns to token holders because 1) higher yields with native staking 2) we can give all returns to DATA holders maxing out APY which no other operator can do. This leads to flywheel effect where higher yield gets more TVL → more staking power → more jobs (superlinear unlike L1/DeFi staking) → higher APY → higher TVL. If we reach capacity on native staking we can switch assets to DeFi farming.


Just wanted to respond to this @verto0912 . My perspective is that these assets are fundamentally different than the assets in DPI. They perform functions within a specific data economy where data is the overall product. The economic weights can also be seen as a sign of future growth, imo, so you aren’t just giving weight to what is big now, but also to what has strong tokenomics & usage and potential for high growth. I think it is a valuable addition and one that probably only makes sense with the DATA index at the moment.

The point about data collection is very fair. That would have to be transparent at all times.


Great work bringing this to the Forum @Thomas_Hepner and @Kiba - clearly a boatload of work has gone into this.

I strongly desire more index products from the Coop which help capture new investment themes in crypto. DPI and MVI do this well and I think DATA could too - though to be honest I would personally be more excited if the product had more tokens (8+) and a broader scope.

There’s been talk of this decentralized middleware index from DFP for some time with nothing forthcoming - and given that I’d be open to your ‘data’ theme being opened up to cover all decentralized cloud (data, compute, query, graph, etc). I ultimately think decentralized cloud is a more comparable theme to DPI and MVI - DATA being more comparable to decentralized exchanges or decentralized insurers, a category lower.

I’m sure DATA could be quite successful - and maybe we have subcategories captured by our index products in future - but for now, I think the Coop should prioritize a decentralized cloud index, which I think could be many multiples larger and help us build our brand better (regardless of who is the methodologist!).


Hey @DevOnDeFi - appreciate the thoughtful comments! Here is my thinking on some of your concerns:

@DevOnDeFi I hear your concern about wanting more tokens in the DATA index - that is definitely the long-term goal! I think it will happen naturally as the Data Economy grows and matures, just like DeFi did, but there are also some current technological limitations preventing this from happening today.

DATA’s scope already covers the decentralized cloud and much more.

The reason DATA does not include tokens in what you are calling the decentralized cloud (data, compute, query, graph, etc.) has nothing to do with the Token Inclusion Criteria of DATA. Decentralized data storage projects Filecoin ($5.4B), Siacoin ($0.85B), Arweave ($0.55B), and Akash Network ($0.2B) are all excluded because they are not ERC-20 tokens. This is a technological limitation, not a methodological one. DATA would already include these projects if Ethereum and Set Protocol supported cross-chain interoperability.

If we wanted to include Filecoin in DATA at launch we would need to use wrapped Filecoin (WFIL) or renFIL. Siacoin, Arweave, and Akash Network do not have wrapped or derivative ERC-20 tokens at the present time.

@DevOnDeFi Do these points address your concerns or do you still have reservations? My main point is that we plan to include decentralized cloud projects in the DATA index when it is technologically feasible to do so.

@Thomas_Hepner and @Kiba, great work for collating so much information for this proposal.
As an engineer, I can see the big potential and demand for decentralized Web3 services because they make the whole ecosystem anti-fragile.

However, having four tokens to start the DATA index is quite risky for an investor, and degens will just buy them separately.

We are still in the early stage of Web3 and it is very hard to guess who will be the winners. I would also prefer to have 8+ tokens in an index for diversification.

I would recommend that we dig deeper on other Web3 ERC20 tokens which have huge potential. Lastly, I think running a survey on Discord about which Web3 tokens people like will give us an idea of which tokens to dig into. I would also be delighted to help you on this journey.


Another point on the economic activity component - it limits your potential inclusions. Assume there’s a token that meets all of your criteria (which still looks like you are handpicking tokens) but they don’t have a credible source for revenue. What happens?

If there are only 4 tokens that fit your criteria, what it tells me is that either 1) your criteria are too strict or 2) the space is not mature enough for an index. Or potentially both.

I think trying to expand the universe (maybe you’ve done this and there’s just no other tokens) could be a helpful exercise. What happens if you lower market cap to $50m from $100m? It would also reduce the dominance of LINK in a sqr root of market cap index.

Really quickly on liquidity weights, if all tokens but LINK have liquidity constraints, having a liquidity weight gives LINK token a higher allocation at the expense of the other tokens which is something you are trying to avoid.
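A small sketch of this effect: blend a set of base weights with each token's share of total liquidity and watch what happens to the most liquid asset. The base weights, the equal split of the remaining liquidity, and the 20% liquidity share of the blend are all hypothetical; only LINK's 94% share of liquidity comes from the thread.

```python
def blend(base_weights, liquidity_share, liquidity_factor):
    """Mix base weights with liquidity shares; liquidity_factor is in [0, 1]."""
    return {
        token: (1 - liquidity_factor) * base_weights[token]
        + liquidity_factor * liquidity_share[token]
        for token in base_weights
    }

# Hypothetical base weights (e.g. from a square root of market cap scheme).
base_weights = {"LINK": 0.64, "GRT": 0.17, "OCEAN": 0.11, "NMR": 0.08}
# LINK's 94% liquidity share is from the thread; the even split of the rest is assumed.
liquidity_share = {"LINK": 0.94, "GRT": 0.02, "OCEAN": 0.02, "NMR": 0.02}

blended = blend(base_weights, liquidity_share, liquidity_factor=0.2)
for token, weight in blended.items():
    print(f"{token}: {base_weights[token]:.1%} -> {weight:.1%}")
```

Under these assumptions LINK's weight rises from 64% to 70% and every other token's weight falls, which is the trade-off raised here.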


Really impressive analysis. I have not seen the Infrastructure/Middleware proposal by DeFi Pulse - is their proposal limited to Ethereum?



How do you think we are “handpicking tokens”? We have defined objective criteria (the Token Inclusion Criteria) and instantiated a Token List with 4 tokens that meet all of those criteria.

Revenue and Earnings are both very commonly used for index construction. For example, Tesla was not added to the S&P 500 until December 2020 because it did not meet the S&P 500’s criteria of needing 4 straight quarters of GAAP profits.

Personally, I thought that was pretty stupid (I have been a major TSLA holder since mid 2019), and think Revenue is the right fundamental metric for an industry like the Data Economy experiencing rapid growth.

Yes, the economic activity component in the Token Inclusion Criteria absolutely limits inclusions by design. Augur and Gnosis were both excluded from the DATA Token List for having insufficient on-chain data-based economic activity. We view exclusion of tokens that do not fit the Token Inclusion Criteria as a feature, not a bug.

I disagree strongly with this point. As I noted in my comment to @DevOnDeFi:

DATA as designed has a higher market capitalization than all components in MVI; Filecoin alone has a higher market cap than all components in MVI and has been live for 6 years. DATA would already meet your goal to have 8+ tokens if Ethereum and Set Protocol supported cross-chain interoperability.

Lowering the market capitalization requirement would bring in interesting data-based projects like FOAM and Robonomics Network on size alone, but these projects would still be excluded due to the economic activity criteria.
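A minimal sketch of how the filters interact: a project can clear a lower market cap floor and still fail on economic activity. The three checks below are simplified stand-ins for the Token Inclusion Criteria, and the candidates are hypothetical placeholders rather than real projects.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    symbol: str
    market_cap_usd: float
    is_erc20: bool
    has_onchain_revenue: bool

def eligible(candidates, min_market_cap_usd):
    """Return symbols that pass all three simplified criteria."""
    return [
        c.symbol
        for c in candidates
        if c.is_erc20
        and c.has_onchain_revenue
        and c.market_cap_usd >= min_market_cap_usd
    ]

# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("AAA", 9e9, True, True),
    Candidate("BBB", 400e6, True, True),
    Candidate("CCC", 60e6, True, False),  # clears a $50m floor, but no on-chain revenue
    Candidate("DDD", 5e9, False, True),   # large, but not an ERC-20
]

at_100m = eligible(candidates, 100e6)
at_50m = eligible(candidates, 50e6)
print(at_100m, at_50m)
```

Here lowering the floor from $100m to $50m does not change the result, because the token that newly clears the size test fails the revenue test.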

This is a good point. I don’t think we are trying to avoid LINK having an outsized weight in the index, given how much greater its market cap is than everything else, so much as give relatively more weight to projects with smaller market caps that nonetheless have great data economy fundamentals (NMR’s weight relative to OCEAN’s is the prime example of this phenomenon).

Thanks for all the responses and feedback everyone.
So far there seem to be 3 main topics: # of tokens, purpose of economic weight, and Intrinsic Productivity

# of Tokens

There are many tokens that qualify for inclusion in DATA but aren’t on the Ethereum chain. We listed at least 4 that we would like to include but can’t for this reason. It’s possible that wrapped tokens like WFIL could be used if they get enough on-chain liquidity. We did discuss reducing the mcap requirement to $30m, I think, but it didn’t make a big difference.


This is very early stages for this sector with lots of near term potential in these tokens and for new tokens that come up as it develops. @verto0912 mentioned we use different data sources for economic activity of each and that’s because something like or doesn’t exist yet. This is obviously a tool we are looking to build and as Thomas said the Coop can be a major part in shaping this industry in its infancy just like DPI and MVI.

Intrinsic Productivity

short summary on IP here
Even with a few tokens DATA has the same value as DPI with easy market coverage, rebalances, lower volatility, etc. so I don’t think we need IP to get PMF. That being said I think the case for IP in DATA is stronger than DPI, MVI and maybe SMI.

The first reason is that if we run our own nodes we can provide DATA holders with higher staking returns than any other node on the market, because we don’t need to charge staking fees; we only take a small 95bps streaming fee. So you get the historical edge of ETFs over actively managed accounts plus higher staking returns than anywhere else. Second, unlike the pro-rata rewards of DeFi staking, most DATA economies can offer superlinear rewards to stakers. Nodes have access to even more jobs the more they have staked (because of minimum requirements, bandwidth, etc.) and could theoretically monopolize the market and set prices at will, something not possible in DeFi. Third, with the amount of tokens we’ll ideally have in DATA we can go to nodes/pools and bargain for better reward splits.
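A back-of-the-envelope sketch of the fee argument, assuming the whole position is staked and the streaming fee is charged on the same base. The gross yield and the third-party operator commission are hypothetical; only the 95bps streaming fee comes from the post.

```python
def net_yield(gross_staking_yield, operator_commission, streaming_fee):
    """Holder's yield after the node operator's cut and the index streaming fee."""
    return gross_staking_yield * (1 - operator_commission) - streaming_fee

STREAMING_FEE = 0.0095   # 95bps, from the post
GROSS_YIELD = 0.10       # hypothetical gross staking yield
COMMISSION = 0.10        # hypothetical commission charged by a third-party node

via_third_party = net_yield(GROSS_YIELD, COMMISSION, STREAMING_FEE)
via_own_node = net_yield(GROSS_YIELD, 0.0, STREAMING_FEE)
print(f"third-party node: {via_third_party:.2%}, own node: {via_own_node:.2%}")
```

The gap between the two figures is simply the commission that a self-run node would not charge; the superlinear reward effect described above is not modeled here.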

This is all on top of the normal DeFi options available for tokens like LINK.

Economic Weight

I’d say jd captured the main reason here:

We are tracking economies, and economies are networks, so the strongest network will perform better over time, especially if its token price is undervalued relative to the fundamentals and a price increase then leads to higher economic bandwidth for it to operate. We use sources from the protocol itself or reputable community members and verify on-chain; an aggregator would be nice though.

It’s the Data Economy Index, saying it’s weighted by economic activity has made sense to all the normie friends I’ve sent it to (like they buy DOGE and ADA) and cryptonatives.


Thank you @patb! I do not think DeFi Pulse has put forward a proposal yet, but I’ve heard it mentioned on the weekly Index Coop community calls a few times. So, I do not know if their proposal is limited to Ethereum-based projects or not in the long-term.

I actually don’t think your response to @DevOnDeFi covers “the space is not mature enough for an index.” My reaction based on what I’ve seen so far is “a phenomenal idea that’s too early.”

Why would doing this now benefit the space or the Coop? DPI wasn’t the first tokenized DeFi index. It seems that didn’t matter too much. If you answer this question, you win my support.

The other notion would be to broaden the theme. I’m hesitant to go that route because I think you guys are onto something great with DATA. Have you considered a broader scope? If so, what would that be?


@fallow8 and I caught up over Discord about his question:

Here’s a brief summary of our discussion:

  • Establish Index Coop as the Institution that Tracks the Data Economy: The value of launching DATA now is that it establishes the Index Coop as the premier institution tracking the Data Economy. DPI inherited the credibility that DeFi Pulse built over 2 years by tracking DeFi from the beginning of its development. We believe it’s really important that the Index Coop position itself as the category owner of the Data Economy, even with only 4 tokens in the index, before a different index provider does.

  • Number of Tokens Will Naturally Expand: As blockchain technology matures and cross-chain interoperability solutions develop, the number of tokens in the index will naturally expand. There would be 8 tokens in the index now if Filecoin, Siacoin, Arweave, and Akash Network were ERC-20s.

  • Creation of a Public Scoreboard to Build Brand: People love scoreboards. There would be lots of brand and reputation value in the index methodologists (@kiba and I) maintaining a website with the Data Economy Token List, akin to what DeFi Pulse has done, showing which assets are included in DATA and which are not. For instance, there are many great DeFi assets, like Curve (CRV), which are included on DeFi Pulse, but not in DPI for various reasons.

@fallow8 Do you agree with this summarization of our conversation? Is there anything I missed or you feel I misrepresented?


Hello @Don-ETH - thank you for your thoughtful comment!

Here are some of my thoughts on different points you have made:

What do you mean by risk? If you are referring to volatility, serious cryptoasset investors are quite fine with volatility given that even the crypto blue chip assets like BTC and ETH dropped ~50% in the past month in USD-terms. DPI also declined >50% from peak to trough despite having 14 assets in the index.

DATA would not have fared any better on this dimension, regardless of whether it included 4 or 8 tokens. The original 4 tokens described in the post all lost 65%+ of their value from peak to trough; LINK declined almost 70%! Filecoin, Arweave, Siacoin, and Akash Network all declined by 70%+ in the recent drop so including them would not have decreased volatility of the DATA index materially.

If you are referring to idiosyncratic or specific risk, then having 8+ tokens instead of 4 would certainly be better.

As I noted in some of my other replies, there would be 8 tokens in the index now if Filecoin, Siacoin, Arweave, and Akash Network were ERC-20s.
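The peak-to-trough declines quoted above are maximum drawdowns. A minimal sketch of that calculation, using a made-up price series rather than real market data:

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline in a price series, as a fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)                      # highest price seen so far
        worst = max(worst, (peak - price) / peak)    # decline from that peak
    return worst

# Hypothetical prices, for illustration only.
prices = [100, 120, 60, 90]
print(f"max drawdown: {max_drawdown(prices):.0%}")
```

Running the same function on an index's value series and on each component shows whether adding tokens would have reduced the drawdown in a market-wide sell-off.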


That covers it. For my part, our conversation turned me around on “it’s too early” as an objection. That may still affect the product in the prioritization process, but it’s not a reason not to do it. Especially because DG1 to DG2 can really refine a product. This should be on the list if you believe in the trajectory of the Data Economy as @Kiba and @Thomas_Hepner lay out (which I do). And it gives us the unique opportunities @Thomas_Hepner covers above (entrenching the Coop as a locus for signal effect in burgeoning markets).


I want to start this off by saying that this proposal sets the standard for future Index proposals. This is the exact level of quality and attention to detail our community should expect from future proposals.

From reading through the comments there seem to be two main concerns:

  1. Defining economic activity is difficult and the value of using it as an inclusion criterion is uncertain

I tend to agree here - for these broad thematic indices investors want extremely transparent methodologies that are also conceptually simple. Investors from all backgrounds value the simplicity of methods like market weighting because they are well established and battle tested in a variety of market conditions. The only real variable is the data input. Adding another layer of complexity of selection may also add another layer of un-anticipated risk.

For broad thematic indices I tend to support this approach. With that said - if you guys strongly feel like this is necessary it definitely warrants further discussion.

  2. The current token basket feels limited

Spending some more time hashing out exactly what tokens we want to include would be very helpful. I see a strong argument for sidechains and potentially select L2 infrastructure being included in this index.

With that said - I am strongly in favor of moving this forward. I also see this as an opportunity to quickly release an index that has clear market fit and minimal engineering requirements. Our bias needs to be towards simple solutions and simple implementations. I would rather get v1 of these indices out the door than spend unnecessary time speccing out more elaborate architecture.