DataCrunch Equity Market Neutral #2
This weekly cross-sectional problem targets the expected returns of the 3000 most liquid US equities.
DataCrunch uses the quantitative research of the CrunchDAO to manage its systematic market-neutral portfolio.
The long-term strategic goal of the fund is capital appreciation by capturing idiosyncratic return at low volatility.
To achieve this goal, DataCrunch needs the community to assess the relative performance of all assets in a subset of the Russell 3000 universe. In other words, DataCrunch expects your model to predict the performance of the constituents of its investment universe.
This dataset is the new version of the original DataCrunch dataset that can be found here.
To read more about the evolutions between #1 and #2 of the DataCrunch datasets, please read the full article.
Reward Scheme
DataCrunch distributes 1,000 USDC every week.
The reward scheme is calculated as follows for both metrics:
# Reward Calculation
weekly_rewards = 1000

# 0 = worst, 1 = best
percentile_rank = your_rank / nb_participants

if percentile_rank <= 0.5:
    reward = 0
else:
    # excess above the median, scaled 0-1
    e = 2 * (percentile_rank - 0.5)
    weight = e ** 20
    reward = weekly_rewards * (weight / sum(participants_weights))
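To see how the pool is split, the sketch below applies the formula above to a hypothetical field of participants (the participant count and ranks are made up for illustration; this is not the official payout code):

```python
# Illustrative reward split (hypothetical ranks, not official code).
weekly_rewards = 1000
nb_participants = 5

def weight(rank):
    # rank 1 = worst, nb_participants = best
    percentile_rank = rank / nb_participants
    if percentile_rank <= 0.5:
        return 0.0
    e = 2 * (percentile_rank - 0.5)  # excess above the median, scaled 0-1
    return e ** 20

weights = [weight(r) for r in range(1, nb_participants + 1)]
rewards = [weekly_rewards * w / sum(weights) for w in weights]
```

Because of the exponent of 20, the weights decay extremely fast below the top ranks, so the pool is concentrated on the very best performers of the week.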
All rewards are computed on the leaderboards.
The Historical Rewards are the sum of every payout you have received from DataCrunch.
The Projected Rewards are the current estimated rewards yet to be distributed.
Dataset
Each row of the dataset represents a single stock at a given weekly timestamp.
X_train
moon: A sequentially increasing integer representing a date. The time between subsequent moons is constant, denoting the weekly fixed frequency at which the data is sampled.
id: A unique identifier representing a stock at a given moon. Note that the same asset has a different id in different moons.
Feature_1, …, Feature_1150: Anonymised features that describe the state of assets on a given moon. They are ways of assessing the relative performance of each stock on a given moon.
y_train
moon: Same as in X_train.
id: Same as in X_train.
target: The target that will help you build your models. It is derived from the 28-day forward returns.
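Since X_train and y_train share the (moon, id) keys, a typical first step is to join them before training. The sketch below uses toy values with a single feature; the real files have 1,150 features and are provided through the competition tooling:

```python
import pandas as pd

# Toy stand-ins for the real X_train / y_train files (real data has
# Feature_1 ... Feature_1150 and many more rows per moon).
X_train = pd.DataFrame({
    "moon": [0, 0, 1, 1],
    "id": ["a1", "a2", "b1", "b2"],
    "Feature_1": [0.1, -0.2, 0.3, 0.0],
})
y_train = pd.DataFrame({
    "moon": [0, 0, 1, 1],
    "id": ["a1", "a2", "b1", "b2"],
    "target": [0.5, -0.5, 0.25, -0.25],
})

# Align features and targets on (moon, id) before fitting a model.
train = X_train.merge(y_train, on=["moon", "id"], how="inner")
```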
X_test and y_test
X_test and y_test have the same structure as X_train and y_train but comprise only one moon at each iteration. These files are used to simulate the submission process locally via crunch.test() (within the code), or crunch test (via the CLI). The aim is to help participants debug their code and achieve successful submissions. A successful local test usually means no errors during execution on the submission platform.
Embargo
The embargo is defined by the length of the target and is thus 4 moons.
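One practical consequence for local validation: when splitting moons into train and validation sets, leave a 4-moon gap between them so that targets overlapping the boundary do not leak information. A minimal sketch of such a split (the helper below is illustrative, not part of the official tooling):

```python
EMBARGO = 4  # moons; matches the 28-day target horizon at weekly sampling

def embargoed_split(moons, n_valid):
    """Split a sorted list of moons into train / validation sets,
    dropping EMBARGO moons between the two to avoid target leakage."""
    valid = moons[-n_valid:]
    train = moons[: -(n_valid + EMBARGO)]
    return train, valid

moons = list(range(100))
train_moons, valid_moons = embargoed_split(moons, n_valid=10)
```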
Tournament Structure
Data Splits
The DataCrunch competition follows a fixed and transparent structure:
Local data:
The first 15 years of the dataset. A smaller version, covering the latest 2.5 of these 15 years, is provided for smaller configurations.
Cloud data:
A public Out-of-Sample allowing participants to test their submissions in the cloud. This data covers the first two months of 2020.
A private Out-of-Sample allowing DataCrunch to have historical performance of the models in order to do meta-modelling and ensemble research.
The last available date of the dataset will be scored on a weekly basis, after the target is resolved.
Submission Cut-off
The window to submit and lock your model has been removed: you can submit your code / model whenever you want. If the run completes before Sunday 12pm UTC, it will be taken into account for the week; otherwise, the run will be terminated and ignored for the week.
The prediction target takes 1 day + 4 weeks to be fully resolved, so the score for a model submitted in week #1 will be available and published in week #6.
Scoring and Evaluation
Participants are evaluated using the Pearson correlation between their predictions and the target on the last date of the dataset.
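Concretely, this is a cross-sectional Pearson correlation between the prediction vector and the target vector on a single date. It can be reproduced with plain numpy (toy values below):

```python
import numpy as np

def pearson(predictions, targets):
    # Pearson correlation between two 1-D arrays: covariance of the
    # centred vectors divided by the product of their norms.
    p = np.asarray(predictions, dtype=float)
    t = np.asarray(targets, dtype=float)
    p = p - p.mean()
    t = t - t.mean()
    return float((p * t).sum() / np.sqrt((p ** 2).sum() * (t ** 2).sum()))

score = pearson([0.1, 0.4, 0.2, 0.9], [0.0, 0.5, 0.1, 1.0])
```

Since Pearson correlation is invariant to affine rescaling of the predictions, only the relative ordering and spread of your predictions matter, not their absolute scale.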
Computing Resources
Participants will be allocated a specified quantity of computing resources within the cloud environment for the execution of their code.
Participants are entitled to 15 hours of GPU or CPU compute time per week. Since the competition is weekly, you will have to manage your weekly computing resource consumption to comply with this constraint.
Quota will be reset each Sunday at 12pm UTC.
Getting Started
Participants can begin by implementing the required train and infer functions and validating locally using the provided quickstarter notebook.
The API is as follows:
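As a starting point, a minimal skeleton of the two required functions might look like this. Note that the exact signatures and argument names below are assumptions for illustration; the quickstarter notebook defines the real interface expected by the platform:

```python
import pandas as pd

# NOTE: signatures are illustrative assumptions; check the quickstarter
# notebook for the exact interface expected by the platform.
def train(X_train: pd.DataFrame, y_train: pd.DataFrame) -> dict:
    # Toy baseline: remember the mean target and predict it everywhere.
    return {"constant": float(y_train["target"].mean())}

def infer(model: dict, X_test: pd.DataFrame) -> pd.DataFrame:
    # One prediction per (moon, id) row of X_test.
    predictions = X_test[["moon", "id"]].copy()
    predictions["prediction"] = model["constant"]
    return predictions

# Toy data mirroring the column layout described above.
X = pd.DataFrame({"moon": [5, 5], "id": ["x1", "x2"], "Feature_1": [0.1, -0.1]})
y = pd.DataFrame({"moon": [1, 1], "id": ["a1", "a2"], "target": [0.2, 0.4]})
predictions = infer(train(X, y), X)
```

A constant prediction will of course score zero correlation; it only demonstrates the shape of the inputs and outputs so that the local crunch test run completes.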
Last updated
