Data
Last updated
In all Crunches, crunchers can access our datasets as follows:
X_train, y_train: Labeled training dataset;
X_test, y_test: Small portion of the test set that can be used to .
A sample prediction is provided to show the required format for the output of the submission. Any deviation from this format will result in an invalid submission.
This file is also used to perform a local check.
The values in this file are usually either random or constant.
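To illustrate what matching the sample prediction's format can mean in practice, here is a minimal sketch that compares a prediction dataframe against the example one. It is not the platform's actual local check: the function name check_prediction_format and the toy column names are assumptions made for this example, and the real check may verify more (or different) properties.

```python
import pandas as pd


def check_prediction_format(prediction, example):
    """Return a list of problems; an empty list means the formats match."""
    problems = []

    # Column names and their order must match the example exactly.
    if list(prediction.columns) != list(example.columns):
        problems.append(
            f"columns differ: expected {list(example.columns)}, "
            f"got {list(prediction.columns)}"
        )
        return problems  # comparing dtypes is meaningless without matching columns

    # Each column should hold the same kind of data as in the example.
    for column in example.columns:
        if prediction[column].dtype != example[column].dtype:
            problems.append(
                f"column {column!r}: expected dtype {example[column].dtype}, "
                f"got {prediction[column].dtype}"
            )

    return problems


# Toy stand-ins; these column names are illustrative only.
example = pd.DataFrame({"id": [1, 2], "prediction": [0.5, 0.5]})
good = pd.DataFrame({"id": [1, 2], "prediction": [0.1, 0.9]})
bad = pd.DataFrame({"id": [1, 2], "pred": [0.1, 0.9]})

print(check_prediction_format(good, example))
print(check_prediction_format(bad, example))
```

Running a comparison like this before submitting catches renamed or missing columns early, which is the most common way to end up with an invalid submission.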
Data can come in various formats
Containing only moon, id and feature columns.
Loading the data from a notebook is easy: the load_data function makes sure that the latest version of the data is available locally, downloading it if necessary, and returns 3 dataframes.
...or Directed Acyclic Graph data are distributed as a pickled dict of pandas.DataFrame.
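As a rough sketch of how such a file can be read outside of load_data, the snippet below unpickles a dict and checks that every value is a pandas.DataFrame. The helper name load_frames, the toy keys and the toy columns are assumptions made for this example; a temporary stand-in file is created so the code runs without the real dataset.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd


def load_frames(path):
    """Load a pickled dict and verify that every value is a DataFrame."""
    with open(path, "rb") as fd:
        frames = pickle.load(fd)  # only unpickle files from a trusted source

    if not isinstance(frames, dict):
        raise TypeError(f"expected a dict, got {type(frames).__name__}")
    for key, value in frames.items():
        if not isinstance(value, pd.DataFrame):
            raise TypeError(f"value for {key!r} is not a DataFrame")
    return frames


# Build a tiny stand-in file; keys and columns are illustrative only.
toy = {
    "a": pd.DataFrame({"X": [0.1, 0.2], "Y": [1.0, 2.0]}),
    "b": pd.DataFrame({"X": [0.3], "Y": [3.0]}),
}
toy_path = Path(tempfile.mkdtemp()) / "X_train.pickle"
with open(toy_path, "wb") as fd:
    pickle.dump(toy, fd)

X_train = load_frames(toy_path)
for key, frame in X_train.items():
    print(key, frame.shape)
```

Because the container is a plain dict, each entry can be processed independently by iterating over its items.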
The main column is example_id, formatted as follows:
Crunch's Streams are iterator objects that allow you to traverse all the elements of a time series one at a time.
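The iterator behaviour can be pictured with plain Python. The sketch below does not use the real stream class: ToyStream and running_mean are hypothetical names introduced here, and the only property they rely on is the one stated above, namely that elements are handed out one at a time and each can be consumed only once.

```python
class ToyStream:
    """Hypothetical stand-in for a stream: yields one value at a time."""

    def __init__(self, values):
        self._values = list(values)
        self._position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._position >= len(self._values):
            raise StopIteration
        value = self._values[self._position]
        self._position += 1
        return value


def running_mean(stream):
    """Consume a stream element by element, emitting one output per input."""
    total = 0.0
    outputs = []
    for count, value in enumerate(stream, start=1):
        total += value
        outputs.append(total / count)
    return outputs


means = running_mean(ToyStream([1.0, 2.0, 6.0]))
print(means)  # [1.0, 1.5, 3.0]
```

Since the stream is exhausted after one pass, any state needed for later elements (here, the running total) has to be kept by the consumer.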
Participants can access the names of the columns (the features) with .
Stream competitions have a different way of submitting results. Please follow the instead.
X_train.parquet
y_train.parquet
example_prediction_reduced.parquet
X_train.pickle
y_train.pickle
example_prediction_reduced.parquet
X_train.parquet
y_train.parquet
example_prediction_reduced.parquet