Data
In every Crunch, crunchers can access the datasets as follows:
Split datasets
x_train, y_train, x_test
x_train, y_train: Labeled training dataset.
x_test, y_test: Small portion of the test set that can be used to run local tests.
The example prediction
A sample prediction is provided to show the required format for the output of the submission. Any deviation from this format will result in an invalid submission.
The crunch-cli also uses this file to perform a local check.
The values are usually either random or a constant.
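As a rough illustration of what "same format" can mean, the sketch below compares a candidate prediction against an example prediction using pandas. It is not the check that the crunch-cli performs; the helper name `same_format` and the column names `id` and `prediction` are assumptions made up for this example.

```python
import pandas as pd


def same_format(prediction: pd.DataFrame, example: pd.DataFrame) -> bool:
    """Return True when both frames share column names, column order and row count."""
    return (
        list(prediction.columns) == list(example.columns)
        and len(prediction) == len(example)
    )


# Toy example prediction filled with a constant (hypothetical column names).
example_prediction = pd.DataFrame(
    {"id": ["a", "b", "c"], "prediction": [0.0, 0.0, 0.0]}
)

# A candidate with the same layout but different values.
good = pd.DataFrame({"id": ["a", "b", "c"], "prediction": [0.1, -0.2, 0.3]})

# A candidate with a renamed column, which the check rejects.
bad = good.rename(columns={"prediction": "score"})

format_ok = same_format(good, example_prediction)
format_bad = same_format(bad, example_prediction)
```

A real check may also compare dtypes or the set of ids; this sketch only looks at column layout and row count.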
Data Formats
Data can come in various formats:
Cross-sectional DataFrame
Contains only the moon, id, and feature columns.
Participants can access the names of the columns (the features) through parameters of the code interface.
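To make the layout concrete, here is a small sketch of a cross-sectional frame built by hand with pandas. The feature names and values are invented, and treating every non-key column as a feature, with each moon value identifying one cross-section, is an assumption for illustration rather than the platform's documented behaviour.

```python
import pandas as pd

# Toy frame; the feature names and values are made up for illustration.
x_train = pd.DataFrame(
    {
        "moon": [0, 0, 1, 1],
        "id": ["a", "b", "a", "b"],
        "feature_1": [0.1, 0.4, 0.3, 0.9],
        "feature_2": [1.0, 0.5, 0.2, 0.7],
    }
)

# Key column names taken from the description above.
key_columns = ["moon", "id"]

# Assumption: everything that is not a key column is a feature.
feature_columns = [c for c in x_train.columns if c not in key_columns]

# Assumption: each moon value identifies one cross-section; count rows per moon.
rows_per_moon = x_train.groupby("moon").size().to_dict()

print(feature_columns)
print(rows_per_moon)
```

In an actual submission, the feature names would come from the code interface parameters rather than from filtering the columns by hand.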
How to load the data?
Loading the data from a notebook is straightforward: the load_data function makes sure that the latest version of the data is available locally, downloading it if necessary, and returns three dataframes.
Examples from the DataCrunch competition
Prediction's format
DAG
...or Directed Acyclic Graph data are distributed as a pickled dict of pandas.DataFrame.
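The sketch below shows what a pickled dict of pandas.DataFrame looks like in practice by round-tripping a toy dict through an in-memory buffer. The keys (`00001`, `00002`) and the column names `X` and `Y` are hypothetical and not taken from any competition file.

```python
import io
import pickle

import pandas as pd

# Toy dict keyed by a made-up dataset identifier.
datasets = {
    "00001": pd.DataFrame({"X": [0.1, 0.2], "Y": [1.0, 0.9]}),
    "00002": pd.DataFrame({"X": [0.5, 0.6], "Y": [0.3, 0.4]}),
}

# Serialize to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(datasets, buffer)

# Rewind and load the dict back, as one would from an opened file.
buffer.seek(0)
loaded = pickle.load(buffer)

# Each value is an independent DataFrame that can be inspected on its own.
for key, frame in loaded.items():
    print(key, frame.shape)
```

Unpickling can execute arbitrary code, so only load pickle files from a source you trust.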
How to load the data?
Examples from the ADIA Lab Causality Discovery competition
Prediction's format
The main column is example_id, formatted as follows:
Stream
Crunch's Streams are iterator objects that let you traverse the elements of a time series one at a time.
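The one-at-a-time access pattern can be illustrated with a plain Python generator. This is a generic sketch of iterator-style consumption, not Crunch's Stream class; the names `stream` and `running_mean` are hypothetical.

```python
from typing import Iterable, Iterator, List


def stream(values: Iterable[float]) -> Iterator[float]:
    """Yield the elements of a series one at a time."""
    for value in values:
        yield value


def running_mean(values: Iterator[float]) -> List[float]:
    """Consume a stream and record the mean of all elements seen so far."""
    total = 0.0
    means = []
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


series = [1.0, 2.0, 3.0, 4.0]
means = running_mean(stream(series))
```

Because the consumer only ever sees the current element and what it has stored itself, it cannot look ahead in the series, which is the main point of exposing the data as an iterator.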
How to load the data?
Examples from the Mid+One competition
Prediction's format
Stream competitions submit results differently. Please follow the Stream Code Interface instead.