CrunchDAO Docs V3
Data


Last updated 23 days ago

In all Crunches, crunchers can access our datasets as follows:

Split datasets

X_train, y_train, X_test

  1. X_train, y_train: Labeled training dataset;

  2. X_test, y_test: Small portion of the test set that can be used to run local test.

In Crunches, crunchers submit code or models in the form of notebooks or Python files. These submissions are then run on testing or live data by the system. As a result, crunchers never have direct access to the testing data.

The example prediction

A sample prediction is provided to show the required format for the output of the submission. Any deviation from this format will result in an invalid submission.

The crunch-cli also uses this file to perform a local check.

The values are usually either random or a constant.
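
As a rough illustration of what a format check against the example prediction might look like, the sketch below compares column names and row counts of two DataFrames. The helper name `matches_example_format` and the toy columns are hypothetical, and this is not a description of how the crunch-cli performs its own check.

```python
import pandas as pd


def matches_example_format(prediction: pd.DataFrame, example: pd.DataFrame) -> bool:
    """Return True when both frames have the same columns (in order) and row count."""
    same_columns = list(prediction.columns) == list(example.columns)
    same_length = len(prediction) == len(example)
    return same_columns and same_length


# Toy example prediction filled with a constant (column names are made up).
example = pd.DataFrame({"id": ["a", "b", "c"], "prediction": [0.0, 0.0, 0.0]})

# Same layout as the example, different values.
good = pd.DataFrame({"id": ["a", "b", "c"], "prediction": [0.1, -0.2, 0.3]})

# Wrong column name, so the layout deviates from the example.
bad = pd.DataFrame({"id": ["a", "b", "c"], "score": [0.1, -0.2, 0.3]})

print(matches_example_format(good, example))
print(matches_example_format(bad, example))
```

A real check may also need to compare dtypes or the set of ids; those rules are not stated here, so they are left out.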

Data Formats

Data can come in various formats.

Cross-sectional DataFrame

A cross-sectional DataFrame contains only moon, id, and feature columns.

Moons are a proxy for timestamps; in other words, each moon stands for a date.
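
Because moons stand in for dates, rows sharing a moon can be read as one cross-section. The sketch below uses a toy frame with made-up feature names and values to count rows per moon and select the most recent one. Treating the largest moon value as the most recent is an assumption.

```python
import pandas as pd

# Toy cross-sectional frame: one row per (moon, id) pair.
X = pd.DataFrame({
    "moon": [0, 0, 1, 1, 1],
    "id": ["a", "b", "a", "b", "c"],
    "feature_1": [0.1, 0.4, 0.3, 0.2, 0.9],
})

# Number of ids observed at each moon.
rows_per_moon = X.groupby("moon").size().to_dict()

# Cross-section for the latest moon (assumes larger moon means later date).
latest = X[X["moon"] == X["moon"].max()]

print(rows_per_moon)
print(latest["id"].tolist())
```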

How to load the data?

Loading the data from a notebook is straightforward: the load_data function makes sure that the latest version of the data is available locally, downloading it if necessary, and returns three DataFrames.

# X_train: pandas.DataFrame
# y_train: pandas.DataFrame
#  X_test: pandas.DataFrame

X_train, y_train, X_test = crunch.load_data()

Examples from the DataCrunch competition

Prediction format

There should be one column per target.

DAG

DAG (Directed Acyclic Graph) data is distributed as a pickled dict of pandas.DataFrame objects.

How to load the data?

# X_train: typing.Dict[str, pandas.DataFrame]
# y_train: typing.Dict[str, pandas.DataFrame]
#  X_test: typing.Dict[str, pandas.DataFrame]

X_train, y_train, X_test = crunch.load_data()
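
To make the dict-of-DataFrames shape concrete, the sketch below builds a toy dict, round-trips it through pickle in memory, and iterates over the result. The dataset key `"00000"` and the column names are invented for illustration. In practice load_data returns the dicts directly, so unpickling by hand should not be needed.

```python
import io
import pickle

import pandas as pd

# Toy stand-in for X_train: dataset key -> DataFrame of observed variables.
toy_X_train = {
    "00000": pd.DataFrame({"X": [0.1, 0.2], "Y": [1.0, 0.5], "A": [0.3, 0.7]}),
    "00001": pd.DataFrame({"X": [0.9, 0.8], "Y": [0.2, 0.4]}),
}

# Round-trip through pickle to mimic how the data is distributed.
buffer = io.BytesIO()
pickle.dump(toy_X_train, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

# Each entry is an independent DataFrame with its own set of columns.
shapes = {dataset_id: frame.shape for dataset_id, frame in loaded.items()}
print(shapes)
```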

Examples from the ADIA Lab Causality Discovery competition

Prediction format

The main column is example_id, formatted as follows:

<dataset_id>_<from_node>_<to_node>
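
The identifier pattern above can be sketched with plain string formatting. The helper names, the dataset id `"00000"`, and the node names are hypothetical. Enumerating every ordered pair of distinct nodes is also an assumption, since the exact set of required rows is not stated here.

```python
from typing import List


def make_example_id(dataset_id: str, from_node: str, to_node: str) -> str:
    """Join the three parts with underscores: <dataset_id>_<from_node>_<to_node>."""
    return f"{dataset_id}_{from_node}_{to_node}"


def all_edge_ids(dataset_id: str, nodes: List[str]) -> List[str]:
    """Build one id per ordered pair of distinct nodes (assumed enumeration)."""
    return [
        make_example_id(dataset_id, source, target)
        for source in nodes
        for target in nodes
        if source != target
    ]


ids = all_edge_ids("00000", ["X", "Y", "Z"])
print(ids)
```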

Stream

Crunch streams are iterator objects that let you traverse the elements of a time series one at a time.

How to load the data?

# X_train: typing.List[typing.Iterator[dict]]
#  X_test: typing.List[typing.Iterator[dict]]

X_train, X_test = crunch.load_streams()
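
Going by the type hints above, each stream yields dicts one element at a time. The sketch below consumes toy streams built with a generator and computes a running mean. The key name `"x"` and the helper names are hypothetical, and real stream elements may carry different fields.

```python
from typing import Dict, Iterator, List


def make_toy_stream(values: List[float]) -> Iterator[Dict[str, float]]:
    """Yield one dict per observation, mimicking a stream of a time series."""
    for value in values:
        yield {"x": value}


def running_means(stream: Iterator[Dict[str, float]]) -> List[float]:
    """Consume the stream once, recording the mean of all values seen so far."""
    total = 0.0
    means = []
    for count, element in enumerate(stream, start=1):
        total += element["x"]
        means.append(total / count)
    return means


# A list of streams, matching the List[Iterator[dict]] shape.
streams = [make_toy_stream([1.0, 2.0, 3.0]), make_toy_stream([10.0, 20.0])]
results = [running_means(stream) for stream in streams]
print(results)

# An iterator can only be traversed once: it is now exhausted.
leftover = list(streams[0])
```

Because an iterator is consumed as it is read, any state needed later has to be kept while traversing, as the running total is here.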

Examples from the Mid+One competition

Prediction format

Participants can access the names of the columns (the features) with the parameters of the code interface.

Stream competitions have a different way of submitting results. Please follow the Stream Code Interface instead.

Links referenced on this page:

  • crunch-cli
  • run local test
  • DataCrunch's X_train.parquet
  • DataCrunch's y_train.parquet
  • DataCrunch's example_prediction_reduced.parquet
  • Causality Discovery's X_train.pickle
  • Causality Discovery's y_train.pickle
  • Causality Discovery's example_prediction_reduced.parquet
  • Mid+One's X_train.parquet
  • Mid+One's y_train.parquet
  • Mid+One's example_prediction_reduced.parquet
  • parameters of the code interface
  • Stream Code Interface