Code Interface


Your submission needs to provide at least three components: imports, train(), and infer().

  1. imports: As with any script, import every external package your solution depends on. The system installs your dependencies automatically, but only whitelisted packages may be used.

  2. train(): In the training function, you build and train the model that will later make inferences on the test data. The trained model must be stored in the resources/ directory.

  3. infer(): In the inference function, the trained model is loaded and used to make inferences on a sample of data that matches the characteristics of the training set.
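As a minimal sketch of how the three components fit together, the example below trains a trivial mean predictor, stores it in the model directory, and loads it back for inference. To stay dependency-free it uses plain Python lists where the platform passes pandas.DataFrame objects; the file name model.json and the mean-based model are assumptions made for illustration, not part of the platform.

```python
import json
import os
import tempfile


def train(X_train, y_train, model_directory_path):
    # Hypothetical model: always predict the mean of the training targets.
    model = {"mean": sum(y_train) / len(y_train)}
    # Persist the model so that infer() can load it later.
    with open(os.path.join(model_directory_path, "model.json"), "w") as file:
        json.dump(model, file)


def infer(X_test, model_directory_path):
    # Load the model stored by train().
    with open(os.path.join(model_directory_path, "model.json")) as file:
        model = json.load(file)
    # Return one prediction per test row.
    return [model["mean"] for _ in X_test]


# Local run: a temporary directory stands in for resources/.
with tempfile.TemporaryDirectory() as model_directory_path:
    train([[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0], model_directory_path)
    predictions = infer([[4.0], [5.0]], model_directory_path)

print(predictions)
```

The point of the sketch is the hand-off: infer() does not rely on anything computed in train() except what was written to the model directory.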

Dynamic Parameters

If required, parameters can also be queried by name:

  • If no parameter with that name exists, None is passed.

  • If a default value is specified, it is retained (useful for local testing).

  • Type annotations are always ignored, so make sure they are correct.

They can be used in both the train() and the infer() functions:

import pandas


def train(
    X_train: pandas.DataFrame,
    y_train: pandas.DataFrame,
    model_directory_path: str,
    id_column_name: str,
    target_column_name: str,
    has_gpu: bool,
    embargo: int,
    my_custom_value=42,  # user specified
) -> None:
    ...


def infer(
    X_test: pandas.DataFrame,
    model_directory_path: str,
    id_column_name: str,
    prediction_column_name: str,
) -> pandas.DataFrame:
    ...
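The lookup rules above can be illustrated with a small sketch built on inspect.signature. This is not the platform's actual implementation: the helper call_with_requested_parameters and the dictionary of available values are hypothetical, and only show how name-based injection, the None fallback, and retained defaults could behave.

```python
import inspect


def call_with_requested_parameters(function, available):
    # Hypothetical runner helper: fill each parameter by looking up its name.
    kwargs = {}
    for name, parameter in inspect.signature(function).parameters.items():
        if name in available:
            # Known name: inject the value (annotations are not checked).
            kwargs[name] = available[name]
        elif parameter.default is not inspect.Parameter.empty:
            # Unknown name with a default: keep the default.
            kwargs[name] = parameter.default
        else:
            # Unknown name without a default: pass None.
            kwargs[name] = None
    return function(**kwargs)


def train(model_directory_path: str, has_gpu: bool, unknown_name, my_custom_value=42):
    return model_directory_path, has_gpu, unknown_name, my_custom_value


result = call_with_requested_parameters(
    train,
    {"model_directory_path": "resources", "has_gpu": False, "embargo": 0},
)
print(result)
```

Note that embargo is available but never requested, so it is simply not passed.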

Cross-Sectional and DAG

Function Signature

import pandas


def train(
    X_train: pandas.DataFrame,
    y_train: pandas.DataFrame,
    model_directory_path: str,
) -> None:
    ...


def infer(
    X_test: pandas.DataFrame,
    model_directory_path: str,
) -> pandas.DataFrame:
    ...

Available parameters

Parameter Name | Description
 | the number of features in the dataset
model_directory_path | the path to the directory in which your model should be saved to and loaded from
id_column_name | the name of the id column
 | the name of the moon column
target_column_name | the name of the target column
prediction_column_name | the name of the prediction column
 | the moon currently being processed
 | same as moon
embargo | the data embargo
has_gpu | whether the runner has a GPU
 | whether the moon will train


Function Signature

import typing

import crunch


def train(
    streams: typing.List[typing.Iterable[crunch.StreamMessage]],
    model_directory_path: str,
) -> None:
    ...


def infer(
    stream: typing.Iterator[crunch.StreamMessage],
    model_directory_path: str,
) -> typing.Generator[float, None, None]:
    ...

The train function will only be called if the resources/ directory is empty.
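One way to picture this rule is the check sketched below. It is only an assumption about how such a check might look: should_train is a hypothetical helper, and a temporary directory stands in for resources/.

```python
import os
import tempfile


def should_train(model_directory_path):
    # Hypothetical check: train only when the directory holds no files yet.
    return len(os.listdir(model_directory_path)) == 0


with tempfile.TemporaryDirectory() as model_directory_path:
    # Empty directory: training would run.
    before = should_train(model_directory_path)
    # Once a model file exists, training would be skipped.
    with open(os.path.join(model_directory_path, "model.bin"), "wb") as file:
        file.write(b"\x00")
    after = should_train(model_directory_path)

print(before, after)
```

In practice this means that leftover files in resources/ can prevent retraining, so clear the directory when you want train() to run again.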

Iterable vs Iterator

An iterable can be iterated many times, whereas an iterator can only be consumed once.

The difference is subtle, but really important:

  • The train() function is called once, but can consume the streams as many times as necessary

  • The infer() function is called only once per stream, and there is no going back
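The difference can be seen in a short sketch. Plain floats stand in for crunch.StreamMessage objects, and running_mean is a hypothetical single-pass generator written only for this illustration.

```python
def running_mean(stream):
    # Hypothetical single-pass generator: yields one float per message.
    total = 0.0
    for count, value in enumerate(stream, start=1):
        total += value
        yield total / count


# A list is an iterable: every loop over it starts from the beginning.
messages = [1.0, 2.0, 3.0]
first_pass = sum(messages)
second_pass = sum(messages)

# An iterator is consumed as it is read: nothing is left afterwards.
stream = iter(messages)
means = list(running_mean(stream))
leftover = list(stream)

print(first_pass, second_pass, means, leftover)
```

This is why anything infer() needs from earlier messages has to be kept in local state as the stream is read.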

Available parameters

The system exposes additional parameters that you can request by name.

Parameter Name | Description
model_directory_path | the path to the directory in which your updated model will be saved
has_gpu | whether the runner has a GPU

Global variables

Global variables being commented out is a known issue.
