Participate

To get started and submit your first model, you will need to complete the following steps.

Register

Creating an account on the CrunchDAO platform will allow you to be identified and get access to the competition dataset. Follow the link below to join the competition.

Submit

Two submission formats are accepted for competitions:

  • Jupyter Notebook (.ipynb), which is a self-contained version of the code

  • Python Script (.py), which allows more flexibility and lets you split the code into multiple files

All the work you submit remains your exclusive property. The Crunch Foundation guarantees the privacy of both client data and competitors' code.

Jupyter Notebook

Notebook users can start from the Quickstarters provided by CrunchDAO, which give them a working solution to experiment with and build on.

Setting the Environment

Before trying to execute any cell, users must set up their environment by copying the command available on the competition page:

The "Submit a Notebook" tab from the "Submit" page of a competition

Run the commands to set up your environment and download the data to be ready to go:

Python Notebook Cell
# Upgrade the Crunch-CLI to the latest version
%pip install crunch-cli --upgrade

# Authenticate yourself; this will download your last submission and the data
!crunch setup <competition name> <model name> --notebook --token <token>

Users can now load the data locally:

Python Notebook Cell
# Load the notebook, run me once
import crunch
crunch = crunch.load_notebook()

# Load the data, re-run me if you corrupt the dataframes
X_train, y_train, X_test = crunch.load_data()
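
Before testing, the notebook must also define the interface functions that the platform will call. Below is a minimal sketch: the exact signatures, the scikit-learn model, and the file name used here are assumptions for illustration, so check the Quickstarter of your competition for the real interface.

Python Notebook Cell
# Minimal illustrative train()/infer() pair; the signatures, the
# scikit-learn model, and the file name are assumptions, not the
# canonical interface.
import os

import joblib
from sklearn.linear_model import LinearRegression

def train(X_train, y_train):
    # Fit a simple baseline and persist it so infer() can reload it
    model = LinearRegression()
    model.fit(X_train, y_train)
    os.makedirs("resources", exist_ok=True)
    joblib.dump(model, os.path.join("resources", "model.joblib"))

def infer(X_test):
    # Reload the persisted model and produce predictions
    model = joblib.load(os.path.join("resources", "model.joblib"))
    return model.predict(X_test)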

Local Testing

When users are satisfied with their work, they can easily test their implementation:

Python Notebook Cell
# Run a local test
crunch.test()

Submitting your Notebook

After testing the code, users need to download the .ipynb file.

  • If you are on Google Colab: File > Download > Download .ipynb

  • If you are on Kaggle: File > Download Notebook

  • If you are on Jupyter Lab: File > Download

Then submit on the Submit a Notebook page:

Model files can also be uploaded along with the notebook; they will be stored in the resources/ directory. Read more about the file selection dialog.
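
If you upload a model file this way, your code can read it back from resources/. A hypothetical example, assuming an uploaded file named model.joblib:

Python Notebook Cell
import os

import joblib

# Hypothetical file name: load a model that was uploaded along
# with the notebook into the resources/ directory
model = joblib.load(os.path.join("resources", "model.joblib"))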

Global variables

If you want to use global variables in your notebook, put them in a class; this will improve the readability of your code:

Python Notebook Cell
class Constants:

    TRAIN_DEPTH = 42
    IMPORTANT_FEATURES = [ "a", "b", "c" ]

def infer():
    print(Constants.TRAIN_DEPTH)
    # 42

Automatic line commenting

The notebook is automatically converted into a Python script that only includes the functions, imports, and classes.

Everything else is commented out to prevent side effects when your code is loaded into the cloud environment (e.g. when you're exploring the data, debugging your algorithm, or doing visualizations with Matplotlib).

You can prevent this behavior by using special comments to tell the system to keep part of your code:

  • To start a section that you want to keep, write: @crunch/keep:on

  • To end the section, write: @crunch/keep:off

Python Notebook Cell (before)
# @crunch/keep:on

# keep global initialization
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# keep constants
TRAIN_DEPTH = 42
IMPORTANT_FEATURES = [ "a", "b", "c" ]

# @crunch/keep:off

# this will be ignored
x, y = crunch.load_data()

def train(...):
    ...

The result will be:

Python Notebook Cell (after)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

TRAIN_DEPTH = 42
IMPORTANT_FEATURES = [ "a", "b", "c" ]

#x, y = crunch.load_data()

def train(...):
    ...

The directive does not affect comments, functions, classes, or imports.

You can put a @crunch/keep:on at the top of the cell and never close it to keep everything.

Specifying package versions

Since submitting a notebook does not include a requirements.txt, users can instead specify the version of a package using import-level requirement specifiers in a comment on the same line.

Python Notebook Cell
# Valid statements
import pandas # == 1.3
import sklearn # >= 1.2, < 2.0
import tqdm # [foo, bar]
import scikit # ~= 1.4.2
from requests import Session # == 1.5

Specifying a version for the same package multiple times will cause the submission to be rejected if the versions differ.

Python Notebook Cell
# Inconsistent versions will be rejected
import pandas # == 1.3
import pandas # == 1.5

Version specifiers on standard-library modules are ignored (but inconsistent versions will still cause a rejection).

Python Notebook Cell
# Will be ignored
import os # == 1.3
import sys # == 1.5

If an optional dependency is required for the code to work properly, an import statement must be added, even if the code does not use it directly.

Python Notebook Cell
import castle.algorithms

# Keep me, I am needed by castle
import torch

It is possible for the same import name to be provided by different libraries on PyPI. If this happens, you must specify which one you want. If you do not want a specific version, use @latest; without it, we cannot distinguish between commented-out code and version specifiers.

Python Notebook Cell
# Prefer https://pypi.org/project/EMD-signal/
import pyemd # EMD-signal @latest

# Prefer https://pypi.org/project/pyemd/
import pyemd # pyemd @latest

Embed Files

Additional files can be embedded in cells to be submitted with the Notebook. In order for the system to recognize a cell as an Embed File, the following syntax must be followed:

Markdown Notebook Cell
---
file: <file_name>.md
---

<!-- File content goes here -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Aenean rutrum condimentum ornare.

Submitting multiple cells with the same file name will be rejected.

While the focus is on Markdown files, any text file will be accepted, including but not limited to .txt, .yaml, and .json.

Python Script

Script users can refer to the Quickstarters provided by CrunchDAO to see how the project should be structured.

A main.py file is mandatory and must contain both functions (train and infer) in order for your code to run properly.
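
A minimal main.py might look like the sketch below. The exact signatures and the model used here are assumptions that vary by competition, so refer to the Quickstarters for the real interface.

main.py
# Illustrative skeleton only; the exact signatures vary by competition.
import os

import joblib
from sklearn.linear_model import LinearRegression

def train(X_train, y_train):
    # Store the model in resources/ so it is persisted across runs
    # (see Directory Layouts below)
    model = LinearRegression()
    model.fit(X_train, y_train)
    joblib.dump(model, os.path.join("resources", "model.joblib"))

def infer(X_test):
    # Reload the persisted model and return the predictions
    model = joblib.load(os.path.join("resources", "model.joblib"))
    return model.predict(X_test)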

Setting the Environment

Before starting to work, users must set up their environment, which will be similar to a git repository.

The "Submit via CLI" tab from the "Submit" page of a competition

Run the commands to set up your environment and download the data to be ready to go:

Terminal
# Upgrade the Crunch-CLI to the latest version
$ pip install crunch-cli --upgrade

# Authenticate yourself; this will download your last submission and the data
$ crunch setup <competition name> <model name> --token <token> [directory]

# Change the directory to the configured environment
$ cd <directory>

Read more about how setup tokens work and why it is safe to (accidentally) "leak" them.

Directory Layouts

File Explorer
# Example of a folder structure.
# The data files may change depending on the competition.
.
├── data/
│   ├── X_test.parquet
│   ├── X_train.parquet
│   └── y_train.parquet
├── main.py
├── requirements.txt
└── resources/
    └── model.joblib
  • data/ : Directory containing the competition data. It should never be modified by the user and is always kept up to date by the CLI.

  • main.py : Code entry point. Must contain the train() and infer() functions. Can import other files if necessary. Learn more...

  • requirements.txt : List of packages used by your code. They are installed before your code is invoked.

  • resources/ : Directory where your model should be stored. The content is persisted across runs during the transition between the Submission and Out-of-Sample phases.

Local Testing

When users are satisfied with their work, they can easily test their implementation:

Terminal
# Run a local test using a shell command
$ crunch test

Pushing your Code

After the code has been tested, the submission needs to be uploaded to the server.

The message is optional; it is just a label to help users remember what they did.

Terminal
$ crunch push --message "hello world"

Remember to include all your dependencies in a requirements.txt file.

Package version freezes

Before submitting, the CLI does a pip freeze in the background to find out which package versions you are using locally. These versions are then used to freeze the requirements.txt on the server.

This is to ensure that if your code works locally with the exact same versions, it should theoretically work the same on the server.
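
As an illustration (the package names and versions here are invented for the example), a requirements.txt listing unpinned packages:

requirements.txt (local)
pandas
scikit-learn

could be frozen on the server as:

requirements.txt (frozen on the server)
pandas==1.5.3
scikit-learn==1.2.2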

However, this behavior may result in your packages not being installed because:

  • you are using custom/externally installed package versions,

  • you have force-installed some packages that PyPI considers incompatible,

  • the architecture is different and the package is platform-specific.

If you want to disable this behavior, use the --no-pip-freeze flag.

Terminal
$ crunch push --no-pip-freeze --message "hello world"

Your original requirements will always be preserved. Don't hesitate to reach out to us on Discord or the Forum for help.

Hybrid

For some complex setups, users may need to use the CLI to submit a Jupyter Notebook. This can happen if they want to submit a large pre-trained model or include non-PyPI packages.

The process is very similar to the Python Script setup:

  • Set up the environment, as described in Setting the Environment for a Python Script.

  • Remove the main.py.

  • Move your notebook into the project directory and name it main.ipynb.

The main file name can be changed with the --main-file <new_file_name>.py option (keep the .py at the end).

If done correctly, on each crunch push the CLI will first convert the notebook to a script file and then send it.

Package version freezes are still performed in the background.

Files

If you do not want to use the CLI and did not use a notebook to write your code, you can submit files directly. This is an advanced feature that requires preparation.

The directory layout must be the same as the CLI directory layout.

File Selection Dialog

To add files you can:

  • select multiple files by clicking on the "Add file(s)" button

  • or select the contents of an entire directory by clicking on the "Add directory" button

Files selection dialog

If no files have been selected yet and you add a directory, that directory will be used as the root.

Due to a limitation of web browsers, it is not possible to select a directory via the "Add file(s)" button, or to select more than one directory at a time via the "Add directory" button.

To work around this, either:

  • add your directories one at a time

  • or place your whole submission in a single directory and add it once.

Once added, files can be disabled if you add too many, or renamed if the name is incorrect.

A model is selected

Setup Tokens

The site generates new tokens every minute, and each token can only be used once within a 3-minute timeframe.

This prevents any problems if your token is accidentally shared, as it will likely have already been used or expired. Even the team shares their expired tokens in Quickstarters.

This token allows the CLI to download the data and submit your submission on your behalf.

Run

Checking your submission

A successful submission.

The system parses your work to retrieve the code of the interface functions (train() and infer()) and their dependencies. By clicking on the right arrow, you can access the contents of your submission.

The view of a submission once properly uploaded

Running in the Cloud

Once you've submitted, it's time to make sure your model can run in the cloud environment. Click on a submission and then click the Run in the Cloud button.

Click Run in the Cloud to start your run

Your code is fed a standard epoch of data and the system simulates an inference.

If your submission ran properly, you'll see the status as successful.

A successful run means that the system will be able to call your code on new data to produce the inferences for that customer.

Debugging with the logs

If your run crashes or you want to better understand how your code behaved, you can review the logs.

How to check your execution logs
Logs of a run
