Duplicate Predictions
How the duplicate badge is triggered.
Some competitions have a duplicate detection feature that flags models whose prediction correlation exceeds a threshold and makes them ineligible for prizes.
The correlation is computed using pandas.DataFrame.corr(method="spearman").
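As a rough illustration of the correlation check, the sketch below builds a Spearman correlation matrix with one column per model and lists the pairs above a cutoff. The model names, the sample values, and the THRESHOLD value are all assumptions made for the example; the actual threshold is not stated here.

```python
import pandas as pd

# Hypothetical predictions: one column per model, one row per sample.
predictions = pd.DataFrame({
    "model_a": [0.10, 0.40, 0.35, 0.80, 0.95],
    "model_b": [0.20, 0.50, 0.45, 0.90, 0.99],  # same ranking as model_a
    "model_c": [0.90, 0.10, 0.60, 0.30, 0.50],  # different ranking
})

# Assumed cutoff, for illustration only.
THRESHOLD = 0.95

# Pairwise Spearman (rank) correlation between all model columns.
corr = predictions.corr(method="spearman")

# Collect each unordered pair once (upper triangle of the matrix).
flagged_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > THRESHOLD
]

print(flagged_pairs)
```

Because Spearman correlation compares ranks rather than raw values, model_b is paired with model_a even though none of its values are identical.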
Grouping
Predictions are first grouped by:
user, to detect duplicates among a single user's models
team member, to detect duplicates among all models of every team member
The correlation function is then called on every pair of predictions.
Keeping
Predictions are then grouped into correlated pairs to isolate them.
The first model created in a pair is always retained. This makes the choice deterministic. Other models are treated as copies of the first.
For example:
If A & B are considered duplicates, and C & D are also considered duplicates, then A and C will be retained and B and D will receive the duplicate badge.
However, if B & C are also considered duplicates, then only A will be retained, and B, C, and D will all receive the duplicate badge.
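The example above is consistent with chaining duplicate pairs together and keeping only the earliest model of each chain. The sketch below shows one way to reproduce that behaviour with a small union-find; it is an assumption about the logic, not the platform's actual implementation, and find_retained is a hypothetical helper name.

```python
def find_retained(models, duplicate_pairs):
    """Split models into (retained, flagged).

    models: names ordered by creation time, oldest first (assumed input).
    duplicate_pairs: pairs of names whose correlation exceeded the threshold.
    """
    order = {m: i for i, m in enumerate(models)}
    parent = {m: m for m in models}

    def find(m):
        # Follow parent links to the root, shortening the path on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the oldest model as the root of the merged chain.
            if order[root_a] > order[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    retained = [m for m in models if find(m) == m]
    flagged = [m for m in models if find(m) != m]
    return retained, flagged


models = ["A", "B", "C", "D"]
case_1 = find_retained(models, [("A", "B"), ("C", "D")])
case_2 = find_retained(models, [("A", "B"), ("C", "D"), ("B", "C")])
print(case_1)
print(case_2)
```

Making the oldest model the root of every chain keeps the outcome deterministic regardless of the order in which the pairs are processed.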