Code Interface

Requirement

Your submission needs to provide at least three components: imports, train(), and infer().

  1. imports: As with any script, if your solution depends on external packages, be sure to import them. The system will automatically install your dependencies; make sure you only use packages that are whitelisted.

  2. train(): In the training function, you build and train the model that will later make inferences on the test data. The model must be stored in the resources/ directory.

  3. infer(): In the inference function, the trained model is loaded and used to make inferences on a sample of data that matches the characteristics of the training set.
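The three components fit together roughly as in the following minimal sketch, which uses the cross-sectional signature documented further down. The mean-value "model" and the model.pkl file name are placeholders chosen for illustration and are not part of the interface. A real submission would train an actual model.

```python
# imports: a real submission should only use whitelisted packages
import os
import pickle

import pandas


def train(
    X_train: pandas.DataFrame,
    y_train: pandas.DataFrame,
    model_directory_path: str,
) -> None:
    # Placeholder "model": the mean of each numeric target column.
    model = y_train.mean(numeric_only=True)
    # Persist the model in the directory provided by the system
    # ("model.pkl" is a hypothetical file name).
    with open(os.path.join(model_directory_path, "model.pkl"), "wb") as f:
        pickle.dump(model, f)


def infer(
    X_test: pandas.DataFrame,
    model_directory_path: str,
) -> pandas.DataFrame:
    # Load the model saved by train().
    with open(os.path.join(model_directory_path, "model.pkl"), "rb") as f:
        model = pickle.load(f)
    # Predict the stored mean for every row of X_test.
    return pandas.DataFrame(
        {name: [value] * len(X_test) for name, value in model.items()},
        index=X_test.index,
    )
```

Because train() only writes to model_directory_path and infer() only reads from it, the two functions can run in separate processes.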

Dynamic Parameters

If required, parameters can also be requested by name by adding them to the function signature:

  • If the name does not exist, None is passed.

  • If a default value is specified, that value is kept instead (useful for local testing).

  • Type annotations are always ignored, so make sure they are correct.
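The rules above can be illustrated with a small resolver. This is a hypothetical sketch of how a runner might fill arguments by name using inspect.signature. The call_with_parameters helper and the available dictionary are invented for illustration and do not describe the platform's actual implementation. The sketch also assumes that a default value applies when the system does not provide that name.

```python
import inspect
from typing import Any, Callable, Dict


def call_with_parameters(function: Callable, available: Dict[str, Any]) -> Any:
    """Hypothetical resolver: fill every argument of `function` by its name."""
    kwargs = {}
    for name, parameter in inspect.signature(function).parameters.items():
        if name in available:
            # Known name: the provided value is passed as-is (annotations are not checked).
            kwargs[name] = available[name]
        elif parameter.default is not inspect.Parameter.empty:
            # Unknown name with a default value: the default is kept.
            kwargs[name] = parameter.default
        else:
            # Unknown name without a default value: None is passed.
            kwargs[name] = None
    return function(**kwargs)


def demo(model_directory_path, has_gpu: bool, not_provided, my_custom_value=42):
    return model_directory_path, has_gpu, not_provided, my_custom_value


# The string "yes" is passed even though has_gpu is annotated as bool.
result = call_with_parameters(demo, {"model_directory_path": "resources", "has_gpu": "yes"})
# result == ("resources", "yes", None, 42)
```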

They can be used in both the train() and the infer() functions:

import pandas

def train(
    X_train: pandas.DataFrame,
    y_train: pandas.DataFrame,
    model_directory_path: str,
    id_column_name: str,
    target_column_name: str,
    has_gpu: bool,
    embargo: int,
    my_custom_value=42,  # user specified
) -> None:
    ...

def infer(
    X_test: pandas.DataFrame,
    model_directory_path: str,
    id_column_name: str,
    prediction_column_name: str,
) -> pandas.DataFrame:
    ...
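As an illustration, infer() can use id_column_name and prediction_column_name to shape its output without hard-coding column names. This sketch assumes that X_test contains the id column and that the expected output is one row per test sample holding the id and the prediction. It uses a constant placeholder instead of a loaded model. The competition's own rules define the exact output format.

```python
import pandas


def infer(
    X_test: pandas.DataFrame,
    model_directory_path: str,
    id_column_name: str,
    prediction_column_name: str,
) -> pandas.DataFrame:
    # Placeholder prediction; a real submission would load its model
    # from model_directory_path here.
    predictions = [0.0] * len(X_test)
    # Column names are supplied by the system, so they are never hard-coded.
    return pandas.DataFrame({
        id_column_name: X_test[id_column_name].values,
        prediction_column_name: predictions,
    })
```

For local testing, the named parameters can simply be passed by hand, for example infer(X_test, "resources", id_column_name="id", prediction_column_name="prediction").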

Cross-Sectional and DAG

Function Signature

import pandas

def train(
    X_train: pandas.DataFrame,
    y_train: pandas.DataFrame,
    model_directory_path: str,
) -> None:
    ...

def infer(
    X_test: pandas.DataFrame,
    model_directory_path: str,
) -> pandas.DataFrame:
    ...

Available parameters

Parameter Name          Description
number_of_features      the number of features in the dataset
model_directory_path    the path to the directory your model should be saved to and loaded from
id_column_name          the name of the id column
moon_column_name        the name of the moon column
target_column_name      the name of the target column
prediction_column_name  the name of the prediction column
moon                    the moon currently being processed
current_moon            same as moon
embargo                 the data embargo
has_gpu                 whether the runner has a GPU
has_trained             whether the model will be trained on this moon

Stream

Function Signature

import typing

import crunch

def train(
    streams: typing.List[typing.Iterable[crunch.StreamMessage]],
    model_directory_path: str,
) -> None:
    ...

def infer(
    stream: typing.Iterator[crunch.StreamMessage],
    model_directory_path: str,
) -> typing.Generator[float, None, None]:
    ...

The train function will only be called if the resources/ directory is empty.
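A minimal sketch of a stream submission is shown below. The real attributes of crunch.StreamMessage are not documented in this section, so the sketch replaces it with a hypothetical StreamMessage dataclass that has a single numeric x field. The "prediction" is a placeholder: the distance of each value from the training mean.

```python
import dataclasses
import json
import os
import typing


@dataclasses.dataclass
class StreamMessage:
    # Hypothetical stand-in for crunch.StreamMessage.
    x: float


def train(
    streams: typing.List[typing.Iterable[StreamMessage]],
    model_directory_path: str,
) -> None:
    # Mean of every value across every training stream.
    values = [message.x for stream in streams for message in stream]
    mean = sum(values) / len(values)
    # "model.json" is a hypothetical file name.
    with open(os.path.join(model_directory_path, "model.json"), "w") as f:
        json.dump({"mean": mean}, f)


def infer(
    stream: typing.Iterator[StreamMessage],
    model_directory_path: str,
) -> typing.Generator[float, None, None]:
    with open(os.path.join(model_directory_path, "model.json")) as f:
        mean = json.load(f)["mean"]
    # Single pass: each message is read once and answered with one prediction.
    for message in stream:
        yield message.x - mean
```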

Iterable vs Iterator

An iterable can be iterated over many times, whereas an iterator can only be consumed once.

The difference is subtle but important:

  • The train() function is called once, but it can consume the streams as many times as necessary.

  • The infer() function is called only once per stream, and there is no going back: each message can be read only once.
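The difference can be reproduced with plain Python objects. In this sketch, a list stands in for an Iterable stream passed to train(), and iter() produces a stand-in for the Iterator passed to infer(). Neither is the object the system actually passes.

```python
# A list is an Iterable: every pass asks it for a fresh iterator.
messages = [1, 2, 3]
first_pass = list(messages)
second_pass = list(messages)  # the same three items again

# iter() returns an Iterator: it remembers its position and is exhausted after one pass.
stream = iter([1, 2, 3])
first_read = list(stream)   # [1, 2, 3]
second_read = list(stream)  # [] because nothing is left
```

As a result, if infer() needs past messages (for example, a window of recent values), the submission has to keep them itself as the messages arrive.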

Available parameters

The system also provides additional parameters that can be requested by name:

Parameter Name          Description
model_directory_path    the path to the directory in which your updated model will be saved
has_gpu                 whether the runner has a GPU

Spatial

Function Signature

import pandas

def train(
    data_directory_path: str,
    model_directory_path: str,
) -> None:
    ...

def infer(
    data_file_path: str,
    model_directory_path: str,
) -> pandas.DataFrame:
    ...

Available parameters

The system also provides additional parameters that can be requested by name:

Parameter Name          Description
data_directory_path     the path to the directory where the data is located
model_directory_path    the path to the directory in which your updated model will be saved
target_names            the names of the targets to predict; usually one file per target is provided in the data directory
has_gpu                 whether the runner has a GPU
has_trained             whether the train function has been called before

Infer-only parameters

Parameter Name          Description
data_file_path          the path to the data that must be used to predict the target
target_name             the name of the target currently being predicted
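The following sketch shows how the spatial parameters fit together. For illustration only, it assumes that each target has a CSV file named <target_name>.csv in data_directory_path with a numeric value column, and that a prediction is one row per input row in a column named prediction. The real file layout and output format depend on the competition. The per-target mean is a placeholder model.

```python
import json
import os
import typing

import pandas


def train(
    data_directory_path: str,
    model_directory_path: str,
    target_names: typing.List[str],
) -> None:
    # One placeholder model (a mean) per target; the file layout is hypothetical.
    model = {}
    for target_name in target_names:
        frame = pandas.read_csv(os.path.join(data_directory_path, f"{target_name}.csv"))
        model[target_name] = float(frame["value"].mean())
    with open(os.path.join(model_directory_path, "model.json"), "w") as f:
        json.dump(model, f)


def infer(
    data_file_path: str,
    model_directory_path: str,
    target_name: str,
) -> pandas.DataFrame:
    with open(os.path.join(model_directory_path, "model.json")) as f:
        model = json.load(f)
    frame = pandas.read_csv(data_file_path)
    # Predict the training mean of the current target for every input row.
    return pandas.DataFrame({"prediction": [model[target_name]] * len(frame)})
```

Because infer() receives target_name, a single trained artifact can serve every target, and each call picks the model for its own target.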

Global variables

Global variables being commented out is a known issue.
