Data
In every Crunch, crunchers can access the datasets as follows:
Split datasets
X_train, y_train, X_test
X_train, y_train: Labeled training dataset;
X_test, y_test: Small portion of the test set that can be used to run local tests.
The example prediction
A sample prediction is provided to show the required format for the output of the submission. Any deviation from this format will result in an invalid submission.
The crunch-cli also uses this file to perform a local check.
The values are usually either random or a constant.
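As a rough illustration of what a format check against the example prediction could look like, here is a minimal sketch using pandas. It is not the check that crunch-cli performs; the helper format_problems and the column names id and prediction are hypothetical and only serve the example.

```python
import pandas as pd


def format_problems(prediction: pd.DataFrame, example: pd.DataFrame) -> list:
    """Return a list of format mismatches; an empty list means none were found."""
    problems = []

    # Column names and their order must match the example exactly.
    if list(prediction.columns) != list(example.columns):
        problems.append(
            f"columns {list(prediction.columns)} != {list(example.columns)}"
        )
        return problems

    # Each column should hold the same kind of values as in the example.
    for column in example.columns:
        if prediction[column].dtype != example[column].dtype:
            problems.append(
                f"column {column!r}: dtype {prediction[column].dtype} "
                f"!= {example[column].dtype}"
            )

    return problems


# Hypothetical stand-ins for the example prediction and two candidate outputs.
example = pd.DataFrame({"id": ["a", "b"], "prediction": [0.5, 0.5]})
good = pd.DataFrame({"id": ["a", "b"], "prediction": [0.1, 0.9]})
bad = pd.DataFrame({"id": ["a", "b"], "score": [0.1, 0.9]})

print(format_problems(good, example))
print(format_problems(bad, example))
```

In practice the example frame would be read from the provided example prediction file rather than built by hand.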
Data Formats
Data can come in various formats.
Cross-sectional DataFrame
Contains only moon, id, and feature columns.
Participants can access the column names (the features) through the parameters of the code interface.
How to load the data?
Loading the data from a notebook is straightforward. The load_data function makes sure that the latest version of the data is available locally, downloading it if necessary, and returns three DataFrames.
# X_train: pandas.DataFrame
# y_train: pandas.DataFrame
# X_test: pandas.DataFrame
X_train, y_train, X_test = crunch.load_data()
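To make the cross-sectional layout concrete, the sketch below builds a small synthetic frame with the same column roles (moon, id, and features) and inspects it. The values and the feature names feature_1 and feature_2 are invented for illustration; a real frame would come from crunch.load_data().

```python
import pandas as pd

# Synthetic stand-in for X_train; real column names and values will differ.
X_train = pd.DataFrame({
    "moon": [0, 0, 1, 1],
    "id": ["a", "b", "c", "d"],
    "feature_1": [0.1, 0.2, 0.3, 0.4],
    "feature_2": [1.0, 0.0, 1.0, 0.0],
})

# Everything that is not moon or id is treated as a feature column.
feature_columns = [c for c in X_train.columns if c not in ("moon", "id")]

# Count how many rows belong to each distinct moon value.
rows_per_moon = X_train.groupby("moon").size()

print(feature_columns)
print(rows_per_moon)
```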
Examples from the DataCrunch competition

X_train.parquet

y_train.parquet
Prediction format

example_prediction_reduced.parquet
DAG
...or Directed Acyclic Graph data is distributed as a pickled dict of pandas.DataFrame objects.
How to load the data?
# X_train: typing.Dict[str, pandas.DataFrame]
# y_train: typing.Dict[str, pandas.DataFrame]
# X_test: typing.Dict[str, pandas.DataFrame]
X_train, y_train, X_test = crunch.load_data()
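Because each object is a dict of DataFrames, working with the data usually starts by iterating over its items. The sketch below uses a hand-made dict in place of the real one; the keys 00001 and 00002 and the columns X, Y, Z are hypothetical.

```python
import pandas as pd

# Synthetic stand-in for X_train: one DataFrame per dataset key.
X_train = {
    "00001": pd.DataFrame({
        "X": [0.1, 0.2, 0.3],
        "Y": [1.0, 0.9, 0.8],
        "Z": [0.5, 0.4, 0.6],
    }),
    "00002": pd.DataFrame({
        "X": [0.7, 0.8],
        "Y": [0.2, 0.1],
    }),
}

# Collect the (rows, columns) shape of every dataset in the dict.
shapes = {dataset_id: frame.shape for dataset_id, frame in X_train.items()}

for dataset_id, shape in shapes.items():
    print(dataset_id, shape)
```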
Examples from the ADIA Lab Causality Discovery competition

X_train.pickle

y_train.pickle
Prediction format
The main column is example_id, formatted as follows:
<dataset_id>_<from_node>_<to_node>

example_prediction_reduced.parquet
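The example_id format described above can be built and parsed with plain string operations. This is a minimal sketch, assuming that node names contain no underscore so that splitting from the right is unambiguous; the helper names build_example_id and parse_example_id are hypothetical and not part of the crunch package.

```python
def build_example_id(dataset_id: str, from_node: str, to_node: str) -> str:
    """Join the three parts into <dataset_id>_<from_node>_<to_node>."""
    return f"{dataset_id}_{from_node}_{to_node}"


def parse_example_id(example_id: str) -> tuple:
    """Split an example_id back into (dataset_id, from_node, to_node).

    Splitting from the right keeps any underscore inside dataset_id intact.
    Assumption: node names themselves contain no underscore.
    """
    dataset_id, from_node, to_node = example_id.rsplit("_", 2)
    return dataset_id, from_node, to_node


example_id = build_example_id("00001", "X", "Y")
print(example_id)
print(parse_example_id(example_id))
```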
Stream
Crunch's Streams are iterator objects that let you traverse the elements of a time series one at a time.
How to load the data?
# X_train: typing.List[typing.Iterator[dict]]
# X_test: typing.List[typing.Iterator[dict]]
X_train, X_test = crunch.load_streams()
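The type hints above describe a list of iterators that yield dicts. The sketch below mimics that shape with generators to show the consumption pattern; the make_stream helper and the key x are hypothetical, and the real elements may carry different fields.

```python
from typing import Iterator, List


def make_stream(values) -> Iterator[dict]:
    """Yield one dict per time step, mimicking a stream of observations."""
    for value in values:
        yield {"x": value}


# Synthetic stand-in for X_train: a list of independent streams.
X_train: List[Iterator[dict]] = [
    make_stream([1.0, 2.0, 3.0]),
    make_stream([10.0, 20.0]),
]

totals = []
for stream in X_train:
    running_total = 0.0
    # Elements arrive one at a time; an iterator can only be consumed once.
    for element in stream:
        running_total += element["x"]
    totals.append(running_total)

print(totals)
```

Since an iterator is exhausted after one pass, anything needed later has to be stored while iterating.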
Examples from the Mid+One competition

X_train.parquet

y_train.parquet
Prediction format
Stream competitions have a different way of submitting results. Please follow the Stream Code Interface instead.

example_prediction_reduced.parquet