Data
In every Crunch, Crunchers can access the datasets as follows:
Split datasets
X_train, y_train, X_test
X_train, y_train: the labeled training dataset.
X_test, y_test: a small portion of the test set that can be used to run local tests.
The example prediction
A sample prediction is provided to show the required format for the output of the submission. Any deviation from this format will result in an invalid submission.
The crunch-cli also uses this file to perform a local check.
The values in this file are usually either random or a constant.
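As a rough illustration of what a local format check can look like, the sketch below compares a prediction against the example prediction. It is an assumption-laden stand-in, not the check crunch-cli actually performs: the helper `check_prediction_format` and the column names `id` and `prediction` are hypothetical, and it only compares column names and row counts.

```python
import pandas as pd


def check_prediction_format(prediction: pd.DataFrame, example: pd.DataFrame) -> list:
    """Return a list of format problems; an empty list means no problem was found.

    Hypothetical helper: it only compares column names and row counts,
    which is a subset of what a real submission check may verify.
    """
    problems = []
    if list(prediction.columns) != list(example.columns):
        problems.append(f"columns {list(prediction.columns)} != {list(example.columns)}")
    if len(prediction) != len(example):
        problems.append(f"{len(prediction)} rows != {len(example)} rows")
    return problems


# The example prediction holds constant values; only its shape matters here.
example = pd.DataFrame({"id": [1, 2, 3], "prediction": [0.0, 0.0, 0.0]})
good = pd.DataFrame({"id": [1, 2, 3], "prediction": [0.2, 0.5, 0.9]})
bad = pd.DataFrame({"id": [1, 2], "pred": [0.2, 0.5]})

print(check_prediction_format(good, example))  # []
print(check_prediction_format(bad, example))
```

The second call reports both a column-name mismatch and a missing row, which is the kind of deviation that would make a submission invalid.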
Data Formats
Data can come in various formats:
Cross-sectional DataFrame
Contains only the moon, id, and feature columns.
Participants can access the column names (the features) through the parameters of the code interface.
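To make the layout concrete, here is a small sketch of a cross-sectional frame with moon, id, and feature columns. The feature names `Feature_1` and `Feature_2` are hypothetical, and the sketch derives the feature list from the frame itself rather than from the code-interface parameters. It also assumes each moon identifies one time period, so grouping by moon yields one cross-section at a time.

```python
import pandas as pd

# Toy cross-sectional frame; "Feature_1" and "Feature_2" are hypothetical names.
X = pd.DataFrame({
    "moon": [0, 0, 1, 1],
    "id": ["a", "b", "a", "b"],
    "Feature_1": [0.1, 0.4, 0.3, 0.7],
    "Feature_2": [1.0, 0.5, 0.2, 0.9],
})

# Every column other than moon and id is treated as a feature in this sketch.
feature_columns = [c for c in X.columns if c not in ("moon", "id")]

# Each moon is handled as one cross-section of the data.
for moon, section in X.groupby("moon"):
    print(moon, section[feature_columns].mean().round(2).to_dict())
```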
How to load the data?
Loading the data from a notebook is straightforward: the load_data function makes sure that the latest version of the data is available locally, downloads it if necessary, and returns 3 dataframes.
# X_train: pandas.DataFrame
# y_train: pandas.DataFrame
# X_test: pandas.DataFrame
X_train, y_train, X_test = crunch.load_data()
Examples from the DataCrunch competition

X_train.parquet
y_train.parquet
Prediction's format

example_prediction_reduced.parquet
DAG
DAG (Directed Acyclic Graph) data is distributed as a pickled dict of pandas.DataFrame objects.
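The sketch below shows what a pickled dict of DataFrames looks like once it is loaded back. The dataset ids and the assumption that each frame's columns are the nodes of one graph are illustrative only, not a description of the actual competition files.

```python
import pickle

import pandas as pd

# Hypothetical DAG data: each key is a dataset id, each value a DataFrame
# whose columns are assumed to be the nodes of one graph.
datasets = {
    "00000": pd.DataFrame({"X": [0.1, 0.2], "Y": [0.3, 0.1], "Z": [0.5, 0.9]}),
    "00001": pd.DataFrame({"X": [1.0, 0.0], "Y": [0.4, 0.6]}),
}

# Round-trip through pickle to mimic how the dict is distributed.
payload = pickle.dumps(datasets)
restored = pickle.loads(payload)

# Each entry is an independent DataFrame that can be processed on its own.
for dataset_id, frame in restored.items():
    print(dataset_id, frame.shape, list(frame.columns))
```

Since the datasets do not share columns, handling them one key at a time is simpler than concatenating them into a single frame.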
How to load the data?
# X_train: typing.Dict[str, pandas.DataFrame]
# y_train: typing.Dict[str, pandas.DataFrame]
# X_test: typing.Dict[str, pandas.DataFrame]
X_train, y_train, X_test = crunch.load_data()
Examples from the ADIA Lab Causality Discovery competition

X_train.pickle
y_train.pickle
Prediction's format
The main column is example_id, formatted as follows:
<dataset_id>_<from_node>_<to_node>
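A minimal sketch of building and splitting an example_id in this format follows. The helper names and the sample values are hypothetical. Parsing assumes that neither node name contains an underscore: splitting from the right with maxsplit=2 then leaves any underscores inside dataset_id intact.

```python
def make_example_id(dataset_id: str, from_node: str, to_node: str) -> str:
    """Build an example_id of the form <dataset_id>_<from_node>_<to_node>."""
    return f"{dataset_id}_{from_node}_{to_node}"


def parse_example_id(example_id: str) -> tuple:
    """Split an example_id back into (dataset_id, from_node, to_node).

    Assumes node names contain no underscore; dataset_id may contain some.
    """
    dataset_id, from_node, to_node = example_id.rsplit("_", 2)
    return dataset_id, from_node, to_node


example_id = make_example_id("00012", "X", "Y")
print(example_id)                    # 00012_X_Y
print(parse_example_id(example_id))  # ('00012', 'X', 'Y')
```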
example_prediction_reduced.parquet
Stream
Crunch's Streams are iterator objects that let you traverse all the elements of a time series one at a time.
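To illustrate element-by-element traversal, the sketch below builds a list of iterators of dicts, matching the shape given for X_train, and consumes each one in a single pass. The generator `make_stream` and the key `"x"` are hypothetical stand-ins for the real stream contents.

```python
from typing import Dict, Iterator, List


def make_stream(values: List[float]) -> Iterator[Dict[str, float]]:
    """Hypothetical stream: yields one dict per time step."""
    for value in values:
        yield {"x": value}


# Same shape as X_train: a list of iterators of dicts.
streams: List[Iterator[Dict[str, float]]] = [
    make_stream([1.0, 2.0, 3.0]),
    make_stream([10.0, 20.0]),
]

means = []
for stream in streams:
    total, count = 0.0, 0
    for element in stream:  # elements arrive one at a time
        total += element["x"]
        count += 1
    means.append(total / count)

print(means)  # [2.0, 15.0]
```

In this sketch each stream is a generator, so it is exhausted after one pass: a second loop over the same object yields nothing. Any state you need later, such as the running total here, has to be kept while iterating.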
How to load the data?
# X_train: typing.List[typing.Iterator[dict]]
# X_test: typing.List[typing.Iterator[dict]]
X_train, X_test = crunch.load_streams()
Examples from the Mid+One competition

X_train.parquet
y_train.parquet
Prediction's format
Stream competitions have a different way of submitting results. Please follow the Stream Code Interface instead.

example_prediction_reduced.parquet