Participate
To get started and submit your first model, go through the following steps.
Register
Creating an account on the CrunchDAO platform identifies you and gives you access to the competition dataset. Follow the link below to join the competition.
Submit
Two distinct submission formats are accepted for the competitions:
Jupyter Notebook (.ipynb), a self-contained version of the code
Python Script (.py), which allows more flexibility and lets you split the code into multiple files
All the work you submit remains your exclusive property. The Crunch Foundation guarantees the privacy of both client data and competitors' code.
Jupyter Notebook
Notebook users can start from the Quickstarters provided by CrunchDAO: working solutions they can run right away and tinker with.
Setting the Environment
Before trying to execute any cell, users must set up their environment by copying the command available on the competition page:
Run the commands to set up your environment and download the data to be ready to go:
Users can now load the data locally:
Local Testing
When users are satisfied with their work, they can easily test their implementation:
Submitting your Notebook
After testing the code, users need to download the notebook as an .ipynb file.
If you are on Google Colab: File > Download > Download .ipynb
If you are on Kaggle: File > Download Notebook
If you are on JupyterLab: File > Download
Then submit on the Submit a Notebook page:
Model files can also be uploaded along with the notebook; they will be stored in the resources/ directory.
The notebook is automatically converted to a Python script, keeping only the functions, imports, and classes. Everything else will be commented out.
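To make the conversion rule concrete, here is a minimal sketch of the same idea using Python's ast module. It is not CrunchDAO's converter: the function name notebook_cells_to_script is hypothetical, it assumes each cell is a string of Python code, and it simply comments out shell (!) and magic (%) lines before parsing.

```python
import ast

# Top-level statements of these types are kept; all others are commented out.
KEPT_NODES = (ast.Import, ast.ImportFrom, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def notebook_cells_to_script(cells: list[str]) -> str:
    """Join code cells into one script, keeping only imports, functions and classes."""
    # Shell (!) and magic (%) lines are not valid Python, so comment them out first.
    lines = [
        f"# {line}" if line.lstrip().startswith(("!", "%")) else line
        for cell in cells
        for line in cell.splitlines()
    ]
    keep = [False] * len(lines)

    for node in ast.parse("\n".join(lines)).body:
        if isinstance(node, KEPT_NODES):
            # Decorators sit above the def/class line, so start from the first one.
            decorators = getattr(node, "decorator_list", [])
            start = min([node.lineno] + [d.lineno for d in decorators])
            for index in range(start - 1, node.end_lineno):
                keep[index] = True

    # Blank lines and existing comments are left untouched.
    return "\n".join(
        line if keep[i] or not line.strip() or line.lstrip().startswith("#") else f"# {line}"
        for i, line in enumerate(lines)
    )
```

A consequence of this rule is that top-level statements such as x = 1 or print(...) end up commented out, so any setup a function relies on should live inside that function (or come from an import).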
Specifying package versions
Since a notebook submission does not include a requirements.txt file, users can instead specify the version of a package using import-level requirement specifiers: a comment on the same line as the import.
Specifying a version for the same package more than once will cause the submission to be rejected if the specifiers differ.
Specifying a version for a standard library module has no effect, but the submission will still be rejected if the versions are inconsistent.
If an optional dependency is required for the code to work properly, an import statement must be added, even if the code does not use it directly.
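These rules can be checked locally before submitting. The sketch below is a hypothetical helper, not the platform's parser: it assumes the specifier is a PEP 440 version specifier placed right after the # (for example import pandas  # == 2.1), that each import fits on a single line, and it keys specifiers by the top-level imported module. Comments that do not start with a comparison operator are treated as ordinary notes and ignored.

```python
import ast
import io
import re
import tokenize

# A specifier comment is assumed to start with a PEP 440 comparison operator.
SPECIFIER = re.compile(r"^(==|~=|!=|>=|<=|>|<)")


def collect_version_specifiers(source: str) -> dict[str, str]:
    """Map each imported top-level module to the specifier in the comment on
    its import line, raising ValueError when two specifiers disagree."""
    # Comments are not part of the AST, so read them from the token stream.
    # Whitespace is removed so that "== 2.1" and "==2.1" compare equal.
    comments = {
        token.start[0]: re.sub(r"\s+", "", token.string.lstrip("#"))
        for token in tokenize.generate_tokens(io.StringIO(source).readline)
        if token.type == tokenize.COMMENT
    }
    specifiers: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        spec = comments.get(node.lineno, "")  # assumes single-line imports
        if not SPECIFIER.match(spec):
            continue  # plain notes are not version specifiers
        for module in modules:
            top_level = module.split(".")[0]
            previous = specifiers.setdefault(top_level, spec)
            if previous != spec:
                raise ValueError(f"conflicting specifiers for {top_level!r}: {previous!r} vs {spec!r}")
    return specifiers
```

For example, a cell containing import pandas  # == 2.1, import pyarrow  # needed by pandas.read_parquet and from pandas import DataFrame  # == 2.1 yields {"pandas": "==2.1"}; the pyarrow line stays as a plain import of an optional dependency. Adding import pandas.io  # == 2.2 raises a ValueError.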
Embed Files
Additional files can be embedded in cells and submitted with the notebook. For the system to recognize a cell as an embedded file, the cell must follow this syntax:
Submitting multiple cells with the same file name will be rejected.
While the focus is on Markdown files, any text file is accepted, including but not limited to .txt, .yaml and .json.
Python Script
Script users can refer to the Quickstarters provided by CrunchDAO to see how the project should be structured.
A main.py file is mandatory and must define both functions (train and infer) for your code to run properly.
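A skeleton of main.py might look like the sketch below. The parameter names (X_train, y_train, X_test, model_directory_path), the model file name and the mean-predicting placeholder model are assumptions for illustration; the exact signatures differ between competitions, so copy them from the Quickstarter.

```python
import os
import pickle
import statistics

# Hypothetical file name inside the model directory.
MODEL_FILE = "model.pkl"


def train(X_train, y_train, model_directory_path: str) -> None:
    """Fit a placeholder model that predicts the mean target, then persist it."""
    model = {"mean": statistics.fmean(y_train)}
    os.makedirs(model_directory_path, exist_ok=True)
    with open(os.path.join(model_directory_path, MODEL_FILE), "wb") as f:
        pickle.dump(model, f)


def infer(X_test, model_directory_path: str) -> list[float]:
    """Load the persisted model and return one prediction per row of X_test."""
    with open(os.path.join(model_directory_path, MODEL_FILE), "rb") as f:
        model = pickle.load(f)
    return [model["mean"] for _ in X_test]
```

Writing the fitted model to the given directory, rather than keeping it in memory, lets infer() load it in a separate run without retraining.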
Setting the Environment
Before starting to work, users must set up their environment, which works similarly to a git repository.
Run the commands to set up your environment and download the data to be ready to go:
Directory Layouts
data/
Directory containing the competition data. It should never be modified by the user and is always kept up to date by the CLI.
main.py
Code entry point. Must contain the train() and infer() functions. Can import other files if necessary.
requirements.txt
List of packages used by your code. They are installed before your code is invoked.
resources/
Directory where your model should be stored. The content is persisted across runs during the transition between the Submission and Out-of-Sample phases.
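Before pushing, a quick check that the project still matches this layout can save a failed submission. The sketch below is a hypothetical helper (check_layout is not part of the CLI) and assumes all four entries should exist; for the Hybrid setup described later, main.py would be swapped for main.ipynb in EXPECTED_LAYOUT.

```python
from pathlib import Path

# Entries from the directory layout above, mapped to the kind of entry expected.
EXPECTED_LAYOUT = {
    "data": "dir",
    "main.py": "file",
    "requirements.txt": "file",
    "resources": "dir",
}


def check_layout(project_dir: str) -> list[str]:
    """Return a description of every layout problem (empty list if none)."""
    root = Path(project_dir)
    problems = []
    for name, kind in EXPECTED_LAYOUT.items():
        path = root / name
        if not path.exists():
            problems.append(f"missing {kind}: {name}")
        elif kind == "dir" and not path.is_dir():
            problems.append(f"expected a directory: {name}")
        elif kind == "file" and not path.is_file():
            problems.append(f"expected a file: {name}")
    return problems
```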
Local Testing
When users are satisfied with their work, they can easily test their implementation:
Pushing your Code
After the code has been tested, the submission needs to be uploaded to the server.
The message is optional; it is only a label to help users remember what each submission contains.
Remember to include all your dependencies in a requirements.txt file.
Hybrid
For some complex setups, users may need to use the CLI to submit a Jupyter Notebook, for example when submitting a large pre-trained model or including packages that are not available on PyPI.
The process is very similar to the Python Script setup:
Follow the Setting the Environment steps, as for a Python Script.
Remove main.py.
Move your notebook to the project directory and name it main.ipynb.
The main name can be changed by using the --main-file <new_file_name>.py option.
If done correctly, on each crunch push the CLI will first convert the notebook to a script file and then send it.
Note that package version specifiers will not work and the requirements.txt file must be updated manually.
Setup Tokens
The site generates new tokens every minute, and each token can only be used once within a 3-minute timeframe.
This limits the damage if your token is accidentally shared, since it will most likely already have been used or have expired. The team even shares its expired tokens in the Quickstarters.
This token allows the CLI to download the data and upload your submission on your behalf.
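The validity rule described above can be modelled in a few lines. This is a toy illustration of the stated behaviour, not the platform's implementation: TokenIssuer and its methods are hypothetical, times are plain numbers of seconds, and the once-a-minute generation of new tokens is left out.

```python
import secrets

TOKEN_LIFETIME_SECONDS = 3 * 60  # a token must be used within 3 minutes


class TokenIssuer:
    """Toy model of single-use, short-lived setup tokens."""

    def __init__(self) -> None:
        self._issued: dict[str, float] = {}  # token -> issue time in seconds
        self._used: set[str] = set()

    def issue(self, now: float) -> str:
        token = secrets.token_urlsafe(16)
        self._issued[token] = now
        return token

    def redeem(self, token: str, now: float) -> bool:
        """Accept a token only once, and only within its lifetime."""
        issued_at = self._issued.get(token)
        if issued_at is None or token in self._used:
            return False
        if now - issued_at > TOKEN_LIFETIME_SECONDS:
            return False  # expired
        self._used.add(token)
        return True
```

A token issued at t = 0 is accepted once at t = 60 and rejected on any later attempt; an unused token presented at t = 181 is rejected as expired.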
Run
Checking your submission
The system parses your work to retrieve the code of the interface functions (train() and infer()) and their dependencies. Click the right arrow to view the contents of your submission.
Running in the Cloud
Once you've submitted, it's time to make sure your model can run in the cloud environment. Click on a submission and then click the Run in the Cloud button.
Your code is fed a standard epoch of data and the system simulates an inference.
A successful run means that the system will be able to call your code on new data to produce the inferences for the client.
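Before using the cloud run, a similar train-then-infer cycle can be rehearsed locally. The harness below is a hypothetical sketch, not the platform's runner: it assumes train(X, y, model_directory_path) and infer(X, model_directory_path) signatures, uses a temporary directory in place of resources/, and only checks that one prediction comes back per test row.

```python
import tempfile
from typing import Callable, Sequence


def simulate_run(train_fn: Callable, infer_fn: Callable, X: Sequence, y: Sequence,
                 split: float = 0.8) -> list:
    """Train on the first part of the data, infer on the rest, and check that
    exactly one prediction is returned per test row."""
    cut = int(len(X) * split)  # rows keep their original order, no shuffling
    with tempfile.TemporaryDirectory() as model_dir:
        train_fn(X[:cut], y[:cut], model_dir)
        predictions = list(infer_fn(X[cut:], model_dir))
    expected = len(X) - cut
    if len(predictions) != expected:
        raise ValueError(f"expected {expected} predictions, got {len(predictions)}")
    return predictions
```

Splitting by position rather than at random keeps the rows in their original order, which is the safer default when the order of the data carries meaning.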
Debugging with the logs
If your run crashes or you want to better understand how your code behaved, you can review the logs.
Because of abuse, only the first 1,500 lines of the logs produced by your code are displayed.
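With only the first 1,500 lines shown, verbose training loops can push the useful messages out of view. One option, assuming your code logs through Python's standard logging module, is a filter that keeps only every n-th INFO or DEBUG record while always letting warnings and errors through. EveryNth is a hypothetical name.

```python
import logging


class EveryNth(logging.Filter):
    """Pass every n-th INFO/DEBUG record; warnings and errors always pass."""

    def __init__(self, n: int) -> None:
        super().__init__()
        self.n = n
        self.seen = 0

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never hide warnings or errors
        self.seen += 1
        return (self.seen - 1) % self.n == 0  # keeps the 1st, (n+1)-th, ... record
```

Attach it with handler.addFilter(EveryNth(100)): a loop logging 1,000 progress messages then produces 10 lines.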