Data
In all Crunches, crunchers can access the datasets as follows:
Split datasets
x_train, y_train, x_test
x_train, y_train: Labeled training dataset.
x_test, y_test: Small portion of the test set that can be used to run local tests.
In Crunches, crunchers submit code or models in the form of notebooks or Python files. These submissions are then run on testing or live data by the system. As a result, crunchers never have direct access to the testing data.
The example prediction
A sample prediction is provided to show the required format for the output of the submission. Any deviation from this format will result in an invalid submission.
The crunch-cli also uses this file to perform a local check.
The values are usually either random or a constant.
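To illustrate what "same format" means in practice, here is a minimal sketch that compares the columns of a prediction with those of an example prediction. This is not the check that crunch-cli actually runs; the helper names (`read_header`, `same_format`) and the column names (`moon`, `id`, `target`, `prediction`) are assumptions made for the illustration.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def same_format(example_csv, prediction_csv):
    """Return True when both files have the same columns in the same order."""
    return read_header(example_csv) == read_header(prediction_csv)


# Hypothetical example prediction filled with a constant value.
example_csv = "moon,id,target\n0,a,0.5\n0,b,0.5\n"

# A prediction that follows the example's columns, and one that does not.
good_csv = "moon,id,target\n0,a,0.12\n0,b,0.87\n"
bad_csv = "id,prediction\na,0.12\nb,0.87\n"

format_ok = same_format(example_csv, good_csv)
format_bad = same_format(example_csv, bad_csv)
print(format_ok, format_bad)
```

Comparing against the example file rather than a hard-coded column list keeps a local check like this valid across competitions with different targets.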
Data Formats
Data can come in various formats:
Cross-sectional DataFrame
Contains only moon, id, and feature columns.
Moons are proxies for timestamps; in other words, moons are dates.
Participants can access the column names (the features) through the parameters of the code interface.
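The layout described above can be sketched with plain Python rows. This is only an illustration: the real data is a pandas DataFrame, the feature names below are invented, and in an actual Crunch the column names would come from the code interface parameters rather than being hard-coded.

```python
from collections import defaultdict

# Hypothetical rows: `moon` and `id` identify a row, everything else is a feature.
rows = [
    {"moon": 0, "id": "a", "feature_1": 0.1, "feature_2": 0.7},
    {"moon": 0, "id": "b", "feature_1": 0.4, "feature_2": 0.2},
    {"moon": 1, "id": "a", "feature_1": 0.3, "feature_2": 0.9},
]

# Assumed column names; a real submission would receive these as parameters.
moon_column, id_column = "moon", "id"
feature_columns = [c for c in rows[0] if c not in (moon_column, id_column)]

# Each moon is one cross-section: the set of ids observed at that date.
by_moon = defaultdict(list)
for row in rows:
    by_moon[row[moon_column]].append(row[id_column])

moons = sorted(by_moon)
print(feature_columns, moons, dict(by_moon))
```

The same id can appear under several moons, which is why rows are identified by the (moon, id) pair rather than by id alone.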
How to load the data?
Loading the data from a notebook is easy: the load_data function makes sure the latest version of the data is available locally, downloading it if necessary, and returns three DataFrames.
Examples from the DataCrunch competition
Prediction's format
There should be one column per target.
DAG
...or Directed Acyclic Graph data are distributed as a pickled dict of pandas.DataFrame.
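As a rough sketch of what a pickled dict looks like from the reader's side, the snippet below writes and reads one with the standard `pickle` module. The keys (`example_0`, `example_1`) are hypothetical, and nested lists stand in for the pandas.DataFrame values so that the example runs without extra dependencies.

```python
import io
import pickle

# Stand-in for the distributed file. In the real dataset each value would be
# a pandas.DataFrame; the key names here are assumptions.
dataset = {
    "example_0": [[0.1, 0.2], [0.3, 0.4]],
    "example_1": [[0.5, 0.6], [0.7, 0.8]],
}

# Serialize to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(dataset, buffer)
buffer.seek(0)

# Reading it back yields an ordinary dict keyed the same way.
loaded = pickle.load(buffer)
example_ids = sorted(loaded)
print(example_ids)
```

Unpickling can execute arbitrary code, so only load pickle files from a source you trust.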
How to load the data?
Examples from the ADIA Lab Causality Discovery competition
Prediction's format
The main column is example_id, formatted as follows:
Stream
Crunch's Streams are iterator objects that let you traverse all the elements of a time series one at a time.
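The one-at-a-time behaviour can be sketched with a plain Python generator. The `make_stream` helper is hypothetical and is not the Crunch Stream class; it only shows the iterator pattern the description refers to.

```python
from typing import Iterable, Iterator


def make_stream(values: Iterable[float]) -> Iterator[float]:
    """Hypothetical stand-in for a stream: yields one observation at a time."""
    for value in values:
        yield value


stream = make_stream([1.0, 2.0, 4.0])

# Consume the series sequentially; only the current element is available.
running_total = 0.0
seen = []
for value in stream:
    running_total += value
    seen.append(value)

# An iterator is single-pass: once traversed, it yields nothing more.
leftover = list(stream)
print(seen, running_total, leftover)
```

Because an iterator is single-pass, any state needed later (such as the running total here) has to be kept while iterating.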
How to load the data?
Examples from the Mid+One competition
Prediction's format
Stream competitions have a different way of submitting results. Please follow the Stream Code Interface instead.