ADIA Lab Structural Break Open Benchmark Challenge
New edition of the ADIA Lab Structural Break Challenge with a new dataset.
Overview
The ADIA Lab Structural Break Open Benchmark Challenge is a continuation of the original ADIA Lab Structural Break Challenge.
The benchmark addresses the same scientific problem, uses data with the same structure and characteristics, and preserves the original evaluation philosophy. It extends the original challenge into a long-lived, continuously evaluated public benchmark.
Scientific Continuity
The open benchmark preserves:
The original research question
The data generating philosophy
The evaluation metric
The definition of structural breaks
The objective is not to reset the problem, but to extend it over time under comparable conditions.
Changes Introduced
Originality Constraint:
An originality metric has been introduced to prevent solution cloning and excessive convergence.
Submissions that are highly correlated with any of the original top models will be rejected.
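The exact originality metric and rejection threshold are not specified here, but the idea can be illustrated with a rank correlation between a candidate's scores and a reference model's scores on the same series (the arrays below are invented for illustration):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-series break scores from a candidate submission and
# from one of the fixed top-10 reference models (values invented).
candidate = np.array([0.1, 0.7, 0.4, 0.9, 0.2])
reference = np.array([0.2, 0.8, 0.5, 0.95, 0.1])

# Rank correlation is one plausible similarity measure; a benchmark
# could reject submissions whose correlation exceeds some threshold.
rho, _ = spearmanr(candidate, reference)
```

Here `rho` is close to 1 because the two score vectors rank the series almost identically, which is exactly the kind of convergence the constraint is meant to prevent.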
Leaderboard Initialization:
The top ten models from the original competition are visible on the leaderboard as fixed reference benchmarks.
Out-of-Sample Evaluation Schedule:
Models are evaluated on a rolling out-of-sample basis.
Scoring occurs at the end of each calendar quarter, starting with Q1 2026. The first official scoring date is March 31, 2026.
Possible Use
This benchmark is intended for:
Academic research on structural breaks
Method comparison under regime change
Long horizon robustness evaluation
It is designed to remain open and relevant beyond a single competition cycle.
Problem Description
Participants are asked to build models that determine whether a structural break has occurred at a known breakpoint in a univariate time series.
A structural break corresponds to a change in the underlying data generating process. Such breaks are common in economic and financial time series and can significantly impact inference, forecasting, and decision making.
Each time series consists of two segments separated by a known boundary. The task is to assess whether the statistical properties of the two segments differ in a way consistent with a structural break.

Timeline
Opening of the new competition: December 18, 2025
Quarterly Out-of-Sample: March 31, 2026
Quarterly Out-of-Sample: June 30, 2026
Quarterly Out-of-Sample: September 30, 2026
Quarterly Out-of-Sample: December 31, 2026
Evaluation
For each time series in the test set, your task is to predict a score between 0 and 1, where values towards 0 mean that no structural break occurred at the specified boundary point, and values towards 1 mean that a structural break did occur. The evaluation metric will be the ROC AUC (Area Under the Receiver Operating Characteristic Curve), which measures the performance of detection algorithms regardless of their specific calibration.
A ROC AUC value around 0.5 means that the algorithm is not able to detect structural breaks better than random chance, while values approaching 1.0 indicate perfect detection. The ROC AUC allows us to compare the output of different detection methods by removing the specific bias of each method towards false positives or false negatives.
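The calibration-invariance property described above follows from the rank-based definition of the ROC AUC: it equals the probability that a randomly chosen break series is scored above a randomly chosen no-break series. A minimal sketch with invented labels and scores:

```python
import numpy as np

def roc_auc(y_true, y_score):
    """AUC as the probability that a randomly chosen positive (break)
    is scored above a randomly chosen negative (no break); ties count 0.5."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every (positive, negative) pair of scores.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1, 1, 0])                  # 1 = break occurred
y_score = np.array([0.1, 0.4, 0.8, 0.65, 0.9, 0.3])    # model scores in [0, 1]

print(roc_auc(y_true, y_score))        # 1.0: every break outscores every non-break

# Any monotone rescaling of the scores leaves the AUC unchanged, which is
# why the metric ignores each method's specific calibration:
print(roc_auc(y_true, y_score ** 2))   # still 1.0
```

In practice `sklearn.metrics.roc_auc_score` computes the same quantity; the explicit pairwise form above is only meant to make the definition concrete.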
The competition follows a two-stage evaluation process:
Public Leaderboard: Each submission is immediately scored against a portion of the test data, and results appear on the public leaderboard;
Private Leaderboard: At the end of the competition, selected submissions are evaluated on the remaining test data to determine the final rankings.
This approach ensures that models are evaluated on their ability to generalize rather than potentially overfitting to the public leaderboard data.
Code Submission
This is a code competition where participants are required to submit their Python code (files or notebooks) directly to the CrunchDAO platform. Your submission should:
Process and analyze the data;
Output a score between 0 and 1 for each time series id in the test set, representing the likelihood of a structural break;
Your code must produce deterministic output, or it will be ineligible for any rewards;
Only the team leader will be ranked on the leaderboard and will be eligible for a reward.
Your submitted code will be executed on the competition platform and automatically scored against a portion of the test set. Shortly after submission, your score will appear on the public leaderboard of the competition.
Submission Requirements
Your main solution file should follow the template provided by the competition host here;
Your solution must include train() and infer() functions. The first is meant to train your model on the training set, in case your model needs that; otherwise it can be left empty. The second takes the test data as input and returns predictions;
The execution time of your solution should not exceed the platform's time limit of 15 hours per week.
The yield-based approach is now optional.
If no use of yield is detected in your code, the entire X_test will be provided as a DataFrame.
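A minimal sketch of the required structure, assuming the whole-DataFrame (non-yield) path. The exact signatures come from the host's template, and the heuristic inside infer() is a placeholder, not a recommended method:

```python
import pandas as pd

def train(X_train: pd.DataFrame, y_train: pd.Series) -> None:
    # Optional: fit and persist a model here. A purely statistical
    # detector needs no training, so this can be left empty.
    pass

def infer(X_test: pd.DataFrame) -> pd.Series:
    # Return one score in [0, 1] per unique time series id.
    scores = {}
    for series_id, df in X_test.groupby(level="id"):
        before = df.loc[df["period"] == 0, "value"]
        after = df.loc[df["period"] == 1, "value"]
        # Toy heuristic: absolute shift in segment means, clipped to [0, 1];
        # a real solution would substitute a proper break detector here.
        scores[series_id] = float(min(1.0, abs(after.mean() - before.mean())))
    return pd.Series(scores, name="prediction")
```

Because the heuristic involves no randomness, repeated calls on the same input return identical scores, which also satisfies the determinism requirement.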
Dataset Description
The dataset for this competition comprises tens of thousands of synthetic univariate time series, each containing approximately 1,000 to 5,000 values with a designated boundary point. For each training time series, a label (True for break, False for no break) indicates whether a structural break occurred at this boundary point.
The time series in this competition are designed to represent various real-world scenarios where structural breaks may occur, with different levels of difficulty in detection. This includes scenarios similar to those found in financial markets, climate data, industrial sensor readings, and biomedical signals, among others. The challenge is to develop algorithms that can generalize across these scenarios and accurately detect structural breaks in new, unseen data.
Data Format
For the training data, the X variable will be provided as a pandas.DataFrame with a MultiIndex structure.
The DataFrame has the following structure:
A MultiIndex with two levels:
id: Identifies the unique time series (each time series has a unique ID);
time: The timestep within each time series.
The columns include:
value: The actual time series value at that timestep;
period: A binary indicator where 0 represents the period before the boundary point and 1 represents the period after it. The structural break occurs where period changes from 0 to 1, but it may take some time (several values) to become detectable/apparent.
The y variable is a boolean pandas.Series, with id as index, indicating whether a structural break occurred at the boundary point for that time series (True if there was a break, False otherwise).
The test data will follow the same format. Your code will need to process these time series and generate predictions of the likelihood of a structural break for each unique time series ID.
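A toy illustration of the format described above, with invented values and two very short series:

```python
import pandas as pd

# Two-level MultiIndex: series id, then timestep within the series.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1)],
    names=["id", "time"],
)
X = pd.DataFrame(
    {"value": [0.11, -0.03, 0.25, 0.40, -0.20, 0.10],
     "period": [0, 0, 1, 1, 0, 1]},
    index=index,
)
# Labels: one boolean per series id.
y = pd.Series([True, False], index=pd.Index([0, 1], name="id"))

# Selecting one id yields that series; the boundary is where period flips 0 -> 1.
first_series = X.loc[0]
print(first_series["period"].tolist())  # [0, 0, 1, 1]
```

Real series are far longer (roughly 1,000 to 5,000 values each); the shape of the index and columns is what carries over.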
Data Size
The training set consists of 10,000 datasets and is identical to the original challenge's training set.
There will also be 10,000 new datasets for each of the public and private test sets.
The test data provided for local use consists of only 100 datasets. Participants must account for the fact that their submitted code will run on 100 times more datasets, within the maximum limit of 15 hours of computing time.
A determinism check will re-run the infer function on 30% of the data (3,000 datasets) to ensure your model is deterministic. The results must be equal, with a tolerance of 1e-8.
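One way to pass the determinism check is to give every stochastic component a fixed seed, so repeated calls produce identical output. A minimal sketch (the function and its heuristic are illustrative only):

```python
import numpy as np

def score_series(values: np.ndarray) -> float:
    # A fresh, fixed-seed generator on every call guarantees the same
    # random draws on every re-run (the platform allows a 1e-8 tolerance).
    rng = np.random.default_rng(0)
    sample = rng.choice(values, size=min(200, len(values)), replace=False)
    # Placeholder score: clipped absolute mean of the subsample.
    return float(np.clip(np.abs(sample.mean()), 0.0, 1.0))

values = np.linspace(-1.0, 2.0, 1_000)
assert score_series(values) == score_series(values)  # identical on re-run
```

Seeding inside the function, rather than once at module import, keeps the output independent of how many times or in what order the function is called.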
Comparison against the Top 10
The participants with the highest scores in the original challenge will be used as fixed reference points. These models will only be used as a baseline for benchmarking purposes and will not be eligible for prizes.
Methodology Suggestions
Methods such as change point detection algorithms, tests for equality of distributions, anomaly detection, or supervised learning models can be utilized to recognize patterns associated with structural breaks. Some approaches to consider include:
Statistical tests comparing the distributions before and after the boundary point;
Feature extraction from both parts of the time series for comparative analysis;
Time series modeling to detect deviations from expected patterns;
Deep learning approaches for automated pattern recognition.
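As an example of the first suggestion, a two-sample Kolmogorov-Smirnov test can compare the distributions of the two segments; the mapping from p-value to a break score below is a crude illustration, not a calibrated method:

```python
import numpy as np
from scipy.stats import ks_2samp

# Simulated series: a mean shift at the boundary stands in for a break.
rng = np.random.default_rng(7)
before = rng.normal(0.0, 1.0, size=500)   # segment before the boundary
after = rng.normal(0.8, 1.0, size=500)    # segment after, shifted mean

stat, p_value = ks_2samp(before, after)
# A tiny p-value indicates the two segments differ in distribution;
# 1 - p_value then gives a rough break score in [0, 1].
score = 1.0 - p_value
```

Since ROC AUC only depends on the ranking of scores, even such an uncalibrated statistic can be evaluated directly, though distribution tests alone miss breaks that preserve the marginal distribution (e.g. changes in autocorrelation).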
Careful preprocessing of the time series data is an essential step in developing robust detection models.
The ultimate goal is to develop reliable algorithms for detecting structural breaks in time series data across various domains where such changes have significant implications for decision-making and risk management.
Prizes
1st place
$3,000 USD
2nd place
$2,000 USD
3rd place
$1,000 USD
Scoring occurs at the end of each calendar quarter, starting with Q1 2026.