A short update was provided last Monday during our weekly stand-up, as well as during the Product WG call.
With the weekend approaching, I thought it would be worth taking some time to go through the details again and, above all, the progress being made to finally unleash the Robot.
1/ Flashback: what happened?
After reprocessing an old dataset from May, different results came out across the roughly 800 assets whose average return and downside volatility are tracked on a daily basis.
The initial investigation revealed that the length of the datasets provided by a long-standing reference API had changed at random for several assets. After digging a bit more, it appeared that the beginning of the available price history was also delayed by anywhere from a couple of days to several weeks compared with the data available from CoinGecko. The variation in dataset length was simply additional bits of history being added on a random basis.
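For those who want to run this kind of check themselves, here is a minimal sketch of how a delayed history start could be spotted. It assumes each source has already been loaded into a dict mapping an asset to its list of dates; the function name `start_delays` and the sample data are hypothetical and are not taken from the actual tool.

```python
from datetime import date


def start_delays(reference, baseline):
    """Return {asset: days} for assets whose reference history
    starts later than the baseline history."""
    delays = {}
    for asset, ref_dates in reference.items():
        base_dates = baseline.get(asset)
        if not ref_dates or not base_dates:
            continue  # nothing to compare for this asset
        delay = (min(ref_dates) - min(base_dates)).days
        if delay > 0:
            delays[asset] = delay
    return delays


# Made-up sample data, for illustration only.
reference = {
    "AAA": [date(2021, 1, 15), date(2021, 1, 16)],
    "BBB": [date(2021, 1, 1), date(2021, 1, 2)],
}
coingecko = {
    "AAA": [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 15), date(2021, 1, 16)],
    "BBB": [date(2021, 1, 1), date(2021, 1, 2)],
}

delays = start_delays(reference, coingecko)
print(delays)  # {'AAA': 14}
```

Only the first available date is compared here; gaps in the middle of a series would need a separate check.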
2/ What consequences did this have, and what has been done so far to rectify the situation?
While this issue has only a minor impact on the ranking of the oldest assets, the missing bits of history introduced a bias in the calculation of the Sortino ratio for the “youngest” tokens. This prevented their integration in the list of potential candidates for iRobot, so that the draft compositions used for backtesting or liquidity analysis no longer matched the original methodology.
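To illustrate why a truncated history matters more for a young token, here is a small sketch using a common textbook form of the Sortino ratio (mean excess return divided by downside deviation). This is an assumption: the exact formula, return frequency and target used in the iRobot methodology may differ, and the return series below is made up.

```python
import math


def sortino_ratio(returns, target=0.0):
    """Mean excess return over downside deviation (one common definition).
    Returns None when there is no downside, as the ratio is undefined."""
    if not returns:
        raise ValueError("returns must not be empty")
    n = len(returns)
    mean_excess = sum(r - target for r in returns) / n
    # Only returns below the target contribute to downside deviation.
    downside_sq = sum(min(0.0, r - target) ** 2 for r in returns)
    downside_dev = math.sqrt(downside_sq / n)
    if downside_dev == 0.0:
        return None
    return mean_excess / downside_dev


# Made-up daily returns for a short-lived token.
full_history = [-0.10, -0.05, 0.02, -0.01, 0.04, 0.03]
# Same token, but with the first two days missing from the source.
truncated_history = full_history[2:]

full_ratio = sortino_ratio(full_history)
truncated_ratio = sortino_ratio(truncated_history)
print(full_ratio, truncated_ratio)
```

With only six observations, dropping the first two flips the ratio from negative to positive. On a series covering several years, the same two missing days would barely move the result, which is consistent with the minor impact seen on the oldest assets.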
At this point, it was decided to pause the DG2 process in order to rebuild confidence in the whole toolchain and deliver transparent, unbiased data to the community before the vote.
To achieve this, the whole data processing suite has been rebuilt and automated nearly from scratch. Most importantly, it is now based on data imported directly from CoinGecko (for the analytics geeks out there who would like to have a look at the tool, please DM me on Discord).
The old dataset from May, which led to this discovery, has been reprocessed. For the same asset tracked over the same period of time, it was checked that the new suite delivers results identical to those of the old one.
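A check of this kind can be sketched as follows, assuming the outputs of both suites are available as dicts mapping an asset to a metric value. The name `find_mismatches`, the tolerances and the figures are hypothetical; a small tolerance is used because floating-point results rarely match bit for bit.

```python
import math


def find_mismatches(old, new, rel_tol=1e-9, abs_tol=1e-12):
    """Return the sorted list of assets present in both result sets
    whose values differ by more than the given tolerances."""
    mismatched = []
    for asset in sorted(old.keys() & new.keys()):
        if not math.isclose(old[asset], new[asset], rel_tol=rel_tol, abs_tol=abs_tol):
            mismatched.append(asset)
    return mismatched


# Made-up Sortino values from the two suites, for illustration only.
old_suite = {"AAA": 1.25, "BBB": 0.80, "CCC": -0.10}
new_suite = {"AAA": 1.25, "BBB": 0.80000000001, "CCC": -0.30}

mismatched = find_mismatches(old_suite, new_suite)
print(mismatched)  # ['CCC']
```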
This part of the investigation was concluded by drafting a new composition as of 1st May and checking the differences with the original backtest. Of 15 underlyings, the new composition retains 10 from the old one and introduces 5 new tokens, with resulting variations in weights.
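The comparison between two compositions boils down to a few set operations. The sketch below assumes each composition is a dict mapping a token to its weight; the tokens, weights and the function name `compare_compositions` are placeholders and do not reflect the actual draft.

```python
def compare_compositions(old, new):
    """Split tokens into retained, added and dropped, and report
    the weight change for each retained token."""
    retained = sorted(old.keys() & new.keys())
    added = sorted(new.keys() - old.keys())
    dropped = sorted(old.keys() - new.keys())
    weight_changes = {token: new[token] - old[token] for token in retained}
    return retained, added, dropped, weight_changes


# Placeholder compositions, for illustration only.
old_composition = {"AAA": 0.50, "BBB": 0.30, "CCC": 0.20}
new_composition = {"AAA": 0.40, "BBB": 0.35, "DDD": 0.25}

retained, added, dropped, weight_changes = compare_compositions(
    old_composition, new_composition
)
print(retained)  # ['AAA', 'BBB']
print(added)     # ['DDD']
print(dropped)   # ['CCC']
```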
It is worth mentioning that this new result reinforces my confidence in the methodology’s ability to identify strong candidates, but I will do my best to support this with more data in a future post.
3/ What’s next?
While the backtest could simply be run again based on this latest composition, it’s also important to mention that automating access to the CoinGecko price history for any token significantly extends the universe of “processable” candidates.
This is exactly why it was planned to give this a try before launch… Even if the experience has been more painful than expected, now that the hardest part is done, I think it makes sense to focus on integrating this extended universe to:
- Draft the final iRobot composition
- Pick the most relevant selection to backtest again from May onwards
We can reasonably expect that this will require at least one more week of work, but of course further updates will follow along the way. As usual, if you have any questions or comments, don’t hesitate to get in touch here or on Discord!