CrunchDAO Docs V3

Known Issues

A list of known problems and how to solve them.

Global variables

If you want to have global variables in your notebook and you do not want them to be commented out, please put them in a class:

class Constants:

    TRAIN_DEPTH = 42
    IMPORTANT_FEATURES = [ "a", "b", "c" ]

def infer():
    print(Constants.TRAIN_DEPTH)
    # 42

NaN in the Cloud, but none locally

The index might not be the same in the Cloud as it is locally, because the data shared in each environment is different.

Depending on your code, this can cause issues when you perform operations between pandas objects that do not share the same index: pandas aligns on index labels, and labels without a match produce NaN.

# will use a default RangeIndex, from 0 to len(X_test) - 1
final_ensemble = pd.Series(
    [0] * len(X_test),
    dtype='float'
)

# the pandas objects do not share the same index, so this will likely result in only NaNs
final_ensemble += (X_test.loc[:, 'some_column'] * 2)

Use the same index

final_ensemble = pd.Series(
    [0] * len(X_test),
    index=X_test.index,
    dtype='float',
)

Reset the index

# replaces the index of X_test with a default RangeIndex (the old index is kept as a column)
X_test.reset_index(inplace=True)

# then do your operations
final_ensemble += (X_test.loc[:, 'some_column'] * 2)
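
The difference between the misaligned and the aligned case can be reproduced locally with a small sketch. The data below is made up for illustration: the column name `feature` and the index labels 100 to 102 are assumptions, chosen only so that the index does not start at 0.

```python
import pandas as pd

# Hypothetical test data whose index does not start at 0,
# standing in for data that is indexed differently in the Cloud.
X_test = pd.DataFrame(
    {"feature": [1.0, 2.0, 3.0]},
    index=[100, 101, 102],
)

# Default RangeIndex (0, 1, 2): no label matches X_test's index,
# so the addition aligns on the union of labels and yields only NaN.
misaligned = pd.Series([0] * len(X_test), dtype="float")
misaligned = misaligned + X_test.loc[:, "feature"] * 2

# Same index as X_test: every label matches, values are added as expected.
aligned = pd.Series([0] * len(X_test), index=X_test.index, dtype="float")
aligned = aligned + X_test.loc[:, "feature"] * 2

print(misaligned.isna().all())  # True
print(aligned.tolist())         # [2.0, 4.0, 6.0]
```

Checking `isna().all()` on an intermediate result like this is a quick way to spot an index mismatch before submitting.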

CatBoostError: Can't create train working dir: catboost_info error

CatBoost creates a directory to persist its state, but the Run does not allow you to create files just anywhere.

Change the train directory to: /tmp

If the state doesn't need to be persisted, the /tmp directory is the way to go.

model.set_params(train_dir='/tmp/catboost_info')

Change the train directory to: model_directory

If the state does need to be persisted, store everything inside the model_directory as this folder will be reused for the Out-of-Sample phase.

import os

info_path = os.path.join(model_directory, 'catboost_info')
model.set_params(train_dir=info_path)
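
The two options above can be combined in one small helper. This is only a sketch: `catboost_train_dir` is a hypothetical name, not part of the crunch or catboost packages, and falling back to the system temporary directory is an assumption about what is writable in your Run. The `catboost` import is guarded so the snippet also runs where the library is not installed.

```python
import os
import tempfile


def catboost_train_dir(model_directory=None):
    """Return a writable directory for CatBoost's working files.

    Hypothetical helper: uses model_directory when the state must be
    persisted, otherwise falls back to the system temporary directory.
    """
    base = model_directory if model_directory is not None else tempfile.gettempdir()
    path = os.path.join(base, "catboost_info")
    os.makedirs(path, exist_ok=True)  # create it up front so failures surface early
    return path


# No model_directory given: the state is not persisted.
train_dir = catboost_train_dir()

try:
    from catboost import CatBoostRegressor
except ImportError:
    CatBoostRegressor = None  # catboost is not installed; the helper still works

if CatBoostRegressor is not None:
    # train_dir can also be passed when the model is created
    model = CatBoostRegressor(train_dir=train_dir)
```

Inside your own code you would pass the `model_directory` you were given, for example `catboost_train_dir(model_directory)`, when the state has to survive until the Out-of-Sample phase.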

Other

If your problem is not listed, don't hesitate to reach out to the team!


Last updated 7 months ago

Note on the CatBoostError above: this issue should be fixed automatically (by monkey patching the catboost library).