Code Interface

Requirement

Your submission needs to provide at least three components: imports, train(), and infer().

  1. imports: As with any script, if your solution has dependencies on external packages be sure to import them. The system will automatically install your dependencies. Make sure that you only use packages that are whitelisted.

  2. train(): In the training function, users build and train the model to make inferences on the test data. The model must be stored in the resources/ directory.

  3. infer(): In the inference function, the trained model is loaded and used to make inferences on a sample of data that matches the characteristics of the training set.

Dynamic Parameters

If required, parameters can also be queried by name:

  • If the name does not exist, None is used.

  • If a default value is specified, that default is retained (useful for local testing).

  • Type annotations are never checked, so make sure they are correct.
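The three rules above can be pictured with a small sketch. This is not the platform's implementation: resolve_arguments is a hypothetical helper, the values in available are made up, and giving the default priority over None for an unknown name is one reading of the rules.

```python
import inspect


def resolve_arguments(function, available):
    """Build keyword arguments for `function` from the `available` values.

    Hypothetical helper, written only to illustrate the rules above.
    """
    kwargs = {}
    for name, parameter in inspect.signature(function).parameters.items():
        if name in available:
            # Known name: the value is passed as-is, annotations are not checked.
            kwargs[name] = available[name]
        elif parameter.default is not inspect.Parameter.empty:
            # Unknown name with a default: the default is retained.
            kwargs[name] = parameter.default
        else:
            # Unknown name without a default: None is used.
            kwargs[name] = None
    return kwargs


# `has_gpu` is deliberately annotated with the wrong type to show it is ignored.
def train(model_directory_path: str, has_gpu: str, unknown_name, my_custom_value=42):
    return model_directory_path, has_gpu, unknown_name, my_custom_value


available = {"model_directory_path": "resources", "has_gpu": False}
result = train(**resolve_arguments(train, available))
print(result)
```

Here has_gpu receives a bool despite its str annotation, unknown_name falls back to None, and my_custom_value keeps its default of 42.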

Dynamic parameters can be used in both the train() and the infer() functions:

def train(
    X_train: pandas.DataFrame,
    y_train: pandas.DataFrame,
    model_directory_path: str,
    id_column_name: str,
    target_column_name: str,
    has_gpu: bool,
    embargo: int,
    my_custom_value=42,  # user specified
) -> None

def infer(
    X_test: pandas.DataFrame,
    model_directory_path: str,
    id_column_name: str,
    prediction_column_name: str,
) -> pandas.DataFrame
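As a minimal sketch, the signatures above could be filled in as follows. The "model" is just the mean of the target, and the file name model.json as well as the toy column names (id, feature, target, prediction) are assumptions chosen for illustration. Only the parameters that the sketch needs are requested.

```python
import json
import os
import tempfile

import pandas


def train(X_train, y_train, model_directory_path, target_column_name):
    # Toy "model": the mean of the target column, stored as JSON.
    mean = float(y_train[target_column_name].mean())
    os.makedirs(model_directory_path, exist_ok=True)
    with open(os.path.join(model_directory_path, "model.json"), "w") as fd:
        json.dump({"mean": mean}, fd)


def infer(X_test, model_directory_path, id_column_name, prediction_column_name):
    # Load what train() stored, then predict the same value for every row.
    with open(os.path.join(model_directory_path, "model.json")) as fd:
        mean = json.load(fd)["mean"]
    return pandas.DataFrame({
        id_column_name: X_test[id_column_name].values,
        prediction_column_name: mean,
    })


# Local test with a temporary directory standing in for resources/.
with tempfile.TemporaryDirectory() as directory:
    X = pandas.DataFrame({"id": ["a", "b"], "feature": [0.1, 0.2]})
    y = pandas.DataFrame({"id": ["a", "b"], "target": [1.0, 3.0]})
    train(X, y, directory, "target")
    prediction = infer(X, directory, "id", "prediction")

print(prediction)
```

Writing through model_directory_path rather than a hard-coded path keeps the same code usable for local testing and on the runner.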

Cross-Sectional and DAG

Function Signature

def train(
    X_train: pandas.DataFrame,
    y_train: pandas.DataFrame,
    model_directory_path: str
) -> None

def infer(
    X_test: pandas.DataFrame,
    model_directory_path: str
) -> pandas.DataFrame

Available parameters

Parameter Name            Description

number_of_features        the number of features in the dataset
model_directory_path      the path to the directory in which your model should be saved and from which it is loaded
id_column_name            the name of the id column
moon_column_name          the name of the moon column
target_column_name        the name of the target column
prediction_column_name    the name of the prediction column
moon                      the moon currently being processed
current_moon              same as moon
embargo                   the data embargo
has_gpu                   whether the runner has a GPU
has_trained               whether the moon will train

Stream

Function Signature

def train(
    streams: typing.List[typing.Iterable[crunch.StreamMessage]],
    model_directory_path: str
) -> None

def infer(
    stream: typing.Iterator[crunch.StreamMessage],
    model_directory_path: str
) -> typing.Generator[float, None, None]

The train function will only be called if the resources/ directory is empty.
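A rough local sketch of this interface is shown below. The StreamMessage dataclass is a hypothetical stand-in for crunch.StreamMessage (its real fields may differ), the file name model.json is an assumption, and yielding one float per message is an assumption about the expected protocol, not something this page confirms.

```python
import json
import os
import tempfile
import typing
from dataclasses import dataclass


@dataclass
class StreamMessage:
    # Hypothetical stand-in for crunch.StreamMessage.
    x: float


def train(streams: typing.List[typing.Iterable[StreamMessage]],
          model_directory_path: str) -> None:
    # Toy "model": the mean of every value seen across all streams.
    values = [message.x for stream in streams for message in stream]
    model = {"mean": sum(values) / len(values)}
    with open(os.path.join(model_directory_path, "model.json"), "w") as fd:
        json.dump(model, fd)


def infer(stream: typing.Iterator[StreamMessage],
          model_directory_path: str) -> typing.Iterator[float]:
    with open(os.path.join(model_directory_path, "model.json")) as fd:
        mean = json.load(fd)["mean"]
    # Single pass over the iterator, one value yielded per message.
    for message in stream:
        yield message.x - mean


with tempfile.TemporaryDirectory() as directory:
    history = [[StreamMessage(1.0), StreamMessage(3.0)], [StreamMessage(2.0)]]
    # Mirror the rule above: train only when the model directory is empty.
    if not os.listdir(directory):
        train(history, directory)
    live = iter([StreamMessage(2.5), StreamMessage(1.0)])
    predictions = list(infer(live, directory))

print(predictions)
```

Because training is skipped when the directory already contains files, a model saved by an earlier run is reused as-is.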

Iterable vs Iterator

An iterable can be iterated many times, whereas an iterator can only be consumed once.

The difference is subtle, but really important:

  • The train() function is called once, but can consume the streams as many times as necessary

  • The infer() function is called only once per stream, and there is no going back
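The distinction can be checked in plain Python, independently of the platform. In this short illustration a list plays the role of an iterable and iter() produces an iterator over it.

```python
messages = [1.0, 2.0, 3.0]   # a list is an iterable
iterator = iter(messages)    # an iterator over that list

# An iterable can be looped over again and again.
first_pass = list(messages)
second_pass = list(messages)

# An iterator is exhausted after the first full pass.
consumed = list(iterator)
exhausted = list(iterator)

print(first_pass == second_pass, consumed, exhausted)
```

This is why infer() has to produce everything it needs during its single pass over the stream, while train() is free to loop over each stream several times.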

Available parameters

The system has a lot of hidden parameters that the user can use.

Parameter Name            Description

model_directory_path      the path to the directory in which your updated model will be saved
has_gpu                   whether the runner has a GPU

Global variables

Global variables being commented out is a known issue.
