Known Issues

A list of known problems and how to solve them.

Global variables

If you want to keep global variables in your notebook and do not want them to be commented out, define them as attributes of a class:

class Constants:

    TRAIN_DEPTH = 42
    IMPORTANT_FEATURES = [ "a", "b", "c" ]

def infer():
    print(Constants.TRAIN_DEPTH)
    # 42

NaN in the Cloud, but none locally

The index might not always be the same in the Cloud and locally, because the data shared in each environment is different.

Depending on your code, this can cause issues when you perform operations between pandas objects that do not share the same index.

import pandas as pd

# will use a range index, from 0 to len(X_test) - 1
final_ensemble = pd.Series(
    [0] * len(X_test),
    dtype='float'
)

# the pandas objects do not share the same index, so this will likely result in only NaNs
final_ensemble += (X_test.loc[:, 'some_column'] * 2)
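
The behavior described above can be reproduced with a minimal, self-contained sketch. The data below is hypothetical: a small `X_test` whose index starts at 10 instead of 0, which is only an assumption about how the Cloud data might differ from the local data.

```python
import pandas as pd

# Hypothetical data: the index starts at 10, as it might in the Cloud
X_test = pd.DataFrame({"some_column": [1.0, 2.0, 3.0]}, index=[10, 11, 12])

# The default range index (0, 1, 2) shares no label with X_test.index
final_ensemble = pd.Series([0] * len(X_test), dtype="float")

# A quick check that reveals the mismatch before doing any arithmetic
indexes_match = final_ensemble.index.equals(X_test.index)

# pandas aligns on index labels, so no label pairs up and every value is NaN
final_ensemble += X_test.loc[:, "some_column"] * 2
all_nan = bool(final_ensemble.isna().all())
```

Checking `index.equals(...)` before combining two pandas objects is a cheap way to catch this locally, even when the local data happens to hide the problem.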

Use the same index

final_ensemble = pd.Series(
    [0] * len(X_test),
    index=X_test.index,
    dtype='float',
)

Reset the index

X_test.reset_index(inplace=True)

# then do your operations
final_ensemble += (X_test.loc[:, 'some_column'] * 2)
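
Note that `reset_index` keeps the old index as a regular column unless `drop=True` is passed. The sketch below, on hypothetical data, shows the difference. Whether the extra column matters depends on your own feature selection.

```python
import pandas as pd

# Hypothetical data with an index that does not start at 0
X_test = pd.DataFrame({"some_column": [1.0, 2.0, 3.0]}, index=[10, 11, 12])

# Default: the old (unnamed) index is kept as a new column called "index"
kept = X_test.reset_index()

# drop=True: the old index is discarded and no column is added
dropped = X_test.reset_index(drop=True)

# Both objects now use a range index, so the labels line up
final_ensemble = pd.Series([0] * len(dropped), dtype="float")
final_ensemble += dropped.loc[:, "some_column"] * 2
```

If the model would otherwise pick up the old index as a feature, `drop=True` is likely the safer choice.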

CatBoostError: Can't create train working dir: catboost_info error

CatBoost creates a directory to persist its state, but the Run does not allow you to create files in arbitrary locations.

Change the train directory to: /tmp

If the state doesn't need to be persisted, the /tmp directory is the way to go.

model.set_params(train_dir='/tmp/catboost_info')

Change the train directory to: model_directory

If the state does need to be persisted, store everything inside the model_directory as this folder will be reused for the Out-of-Sample phase.

import os

info_path = os.path.join(model_directory, 'catboost_info')
model.set_params(train_dir=info_path)
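
The two options above can be combined in one place. The helper below is a hypothetical sketch, not part of CatBoost or of the platform: `catboost_train_dir` is an illustrative name, and it assumes the system temporary directory (usually `/tmp`) is writable in the Run.

```python
import os
import tempfile


def catboost_train_dir(model_directory=None):
    """Hypothetical helper: return a writable directory for CatBoost's state."""
    # Persist inside model_directory when one is given,
    # otherwise fall back to the system temporary directory
    if model_directory is not None:
        base = model_directory
    else:
        base = tempfile.gettempdir()

    path = os.path.join(base, "catboost_info")
    os.makedirs(path, exist_ok=True)  # create the directory up front
    return path


# Usage, assuming `model` is an existing CatBoost model:
# model.set_params(train_dir=catboost_train_dir(model_directory))
```

Creating the directory up front means a permission problem shows up in your own code rather than inside CatBoost's training call.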

Other

If your problem is not listed, don't hesitate to reach out to the team!
