Obesity ML Competition: Tackling Metabolic Diseases
Can you design algorithms that identify genes driving obesity and metabolic disease?
You will get to work with cutting-edge biological data collected in partnership with the Eric and Wendy Schmidt Center at the Broad Institute, the Broad Diabetes Initiative, Massachusetts General Hospital, and Beth Israel Deaconess Medical Center. Your algorithms could directly guide biological discoveries in obesity!
Quick TL;DR
The Goal: Identify genetic "switches" that can trick human cells into burning fat instead of storing it.
The Data Science: You will be given single-cell RNA sequencing (scRNA-seq) data from cells in which specific genes have been knocked out (turned off) using CRISPR/Cas9.
The Challenge: Build a model to predict how a cell’s behavior and development change when a new, unseen gene is turned off.
Introduction
The Biology: Storing Energy vs. Burning Energy
Not all body fat is created equal. Our bodies primarily rely on two types of fat cells (adipocytes) to manage energy:
White Fat: The "storage" cells. They hold onto excess energy from food. When we store too much, it leads to obesity and metabolic diseases like Type 2 Diabetes.
Brown Fat: The "furnace" cells. Instead of storing energy, they burn sugar and fat to generate heat (thermogenesis).
Most current obesity drugs work by making people feel less hungry. However, scientists are exploring a different approach: What if we could convince the body to produce more brown fat, or trigger white fat to start burning energy?
The Experiment: Flipping Genetic Switches
To find the biological "switches" that control these processes, researchers are conducting massive experiments. They harvest human fat-cell precursors and use CRISPR/Cas9 technology to turn off specific genes, one by one. They then observe:
Does the cell still become a fat cell?
Does it act like a white fat cell (storage) or a brown fat cell (burning)?
How does the cell's internal machinery (gene expression) change?
Why can't we physically test every single gene in the human genome?
The human genome contains roughly 20,000 protein-coding genes, so testing each one in the lab would take too much time and money. We need machine learning to fill in the gaps. By training on the existing experimental data, your model will predict the biological outcome of turning off genes that haven't been tested yet.
Why This Matters
Obesity affects over 890 million people worldwide and is a primary driver of cardiovascular disease and cancer. While weight-loss drugs exist, they don't work for everyone and often carry side effects.
To develop better therapies, we need to understand the fundamental code of metabolism:
Which genes tell fat cells how to develop?
Which genes help turn white fat cells into brown/heat-producing fat cells?
Which genes change how cells store or burn fat?
If your model can successfully simulate how perturbing genes affects fat cells, it will allow researchers to screen thousands of potential drug targets purely computationally. This could dramatically accelerate the discovery of medicines that restore metabolic balance and fight disease.
The Challenge
In this competition, you will work with single-cell RNA sequencing data, which measures how strongly each gene is expressed in each individual cell. Scientists have "knocked out" (deleted) different genes in fat-cell precursors and measured how the cells changed.
Your job is to build a model that predicts what happens when scientists turn off new genes that appear in the test set.
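Because the test set contains genes that never appear in training, a local validation split arguably should hold out whole genes rather than random cells. The following is a minimal sketch of such a split, not the organizers' procedure: the function name, the "control" label for unperturbed cells, and the toy gene names are all assumptions made for illustration.

```python
import random


def split_by_perturbation(cell_genes, holdout_fraction=0.2, seed=0):
    """Split cell indices so held-out genes never appear in the training set.

    cell_genes[i] is the knocked-out gene for cell i. The label "control"
    for unperturbed cells is an assumption, not the competition's schema.
    """
    genes = sorted({g for g in cell_genes if g != "control"})
    rng = random.Random(seed)
    rng.shuffle(genes)
    n_holdout = max(1, int(len(genes) * holdout_fraction))
    holdout_genes = set(genes[:n_holdout])
    # Control cells and non-held-out perturbations stay in training.
    train_idx = [i for i, g in enumerate(cell_genes) if g not in holdout_genes]
    valid_idx = [i for i, g in enumerate(cell_genes) if g in holdout_genes]
    return train_idx, valid_idx, holdout_genes


# Toy example with made-up gene names.
cell_genes = ["control", "GENE_A", "GENE_A", "GENE_B",
              "GENE_C", "control", "GENE_D", "GENE_E"]
train_idx, valid_idx, holdout_genes = split_by_perturbation(cell_genes)
```

Splitting by gene mimics the "unseen perturbation" setting; a random split over cells would let the model see other cells from the same knockout and overstate its performance.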
You must predict two specific outcomes for these unseen gene knockouts:
1. The "Internal State" (Gene Expression Profiles)
You must predict the gene expression profile of single cells after the target gene is knocked out. This is a high-dimensional vector representing the activity levels of thousands of genes within the cell.
2. The "Cell Identity" (Cell Type Differentiation)
Normally, these precursor cells differentiate into specific types of fat cells. You must estimate the resulting proportions of four distinct cell states:
Pre-adipocyte: Early-stage cells that haven't differentiated yet.
Adipocyte: Mature fat cells (the standard white fat).
Lipogenic: Specialized fat-producing cells.
Other: Cells that followed a different developmental path.
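As a rough illustration of what "proportions of four cell states" means, the sketch below turns per-cell state labels for one perturbation into a proportion vector. The label strings and their ordering are assumptions taken from the list above; the official submission format may differ.

```python
from collections import Counter

# Order of the four states is an assumption for this sketch.
CELL_STATES = ["Pre-adipocyte", "Adipocyte", "Lipogenic", "Other"]


def state_proportions(labels):
    """Return the fraction of cells in each state, in CELL_STATES order."""
    if not labels:
        raise ValueError("need at least one cell label")
    unknown = set(labels) - set(CELL_STATES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(state, 0) / total for state in CELL_STATES]


# Four hypothetical cells from a single knockout.
labels = ["Adipocyte", "Adipocyte", "Pre-adipocyte", "Other"]
props = state_proportions(labels)  # [0.25, 0.5, 0.0, 0.25]
```

The resulting vector sums to 1, so a model that predicts proportions directly may want to enforce the same constraint (for example with a softmax output).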
Explore the full specifications for in-depth details.
Phases
The challenge is broken down into three Crunches.
Crunch 1 – Predicting the effect of held-out single-gene perturbations
From Dec 8 to May 1, Crunchers will build a model to predict the single-cell transcriptomic response to unseen single-gene perturbations.
Crunch 2 – Predicting the effect of held-out double-gene perturbations
From Feb 28 to May 1, Crunchers will build a model to predict the single-cell transcriptomic response to unseen double-gene perturbations. It will be very similar to Crunch 1, so we highly encourage participation in both!
More details about this challenge will be announced soon!
Crunch 3 – Identifying combinatorial perturbations to drive white and brown adipocyte differentiation
Crunchers will predict combinatorial perturbations to drive adipocyte differentiation, and the Eric and Wendy Schmidt Center will test these perturbations directly in the lab!
More details about this challenge will be announced soon!
Timeline
December 2025:
Beginning of Crunch 1
February 2026:
Closing of Crunch 1
Beginning of Crunch 2
March 2026:
Closing of Crunch 2
Beginning of Crunch 3
April 2026:
Closing of Crunch 3
Evaluation Criteria
For each Crunch, participants must submit predictions in h5ad format along with matrices containing the predicted cell state proportions for each perturbation.
Outputs will be evaluated using:
Pearson Delta (Crunch 1 & 2)
Maximum Mean Discrepancy or MMD (Crunch 1 & 2)
L1-Distance (Crunch 1 & 2)
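The exact metric definitions are given in the full specifications, not here. The sketch below uses common textbook forms as stand-ins, all of which are assumptions: Pearson Delta as the correlation between predicted and observed mean expression shifts relative to control cells, MMD as a biased squared estimate with an RBF kernel (the `gamma` value is arbitrary), and L1 distance between two cell-state proportion vectors.

```python
import numpy as np


def pearson_delta(pred, true, control):
    """Correlation of predicted vs. observed mean shift from control.

    Each argument is a (cells x genes) array.
    """
    ctrl_mean = control.mean(axis=0)
    pred_delta = pred.mean(axis=0) - ctrl_mean
    true_delta = true.mean(axis=0) - ctrl_mean
    return float(np.corrcoef(pred_delta, true_delta)[0, 1])


def mmd_rbf(x, y, gamma=1.0):
    """Biased estimate of squared MMD with kernel exp(-gamma * ||a - b||^2)."""
    def kernel(a, b):
        sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
        return np.exp(-gamma * np.maximum(sq, 0.0))
    return float(kernel(x, x).mean() + kernel(y, y).mean()
                 - 2.0 * kernel(x, y).mean())


def l1_distance(p, q):
    """Sum of absolute differences between two proportion vectors."""
    return float(np.abs(np.asarray(p) - np.asarray(q)).sum())


# Synthetic data: 50 control cells, 5 genes, and a known expression shift.
rng = np.random.default_rng(0)
control = rng.normal(size=(50, 5))
shift = np.array([1.0, -0.5, 0.0, 2.0, 0.3])
true = control + shift          # "observed" perturbed cells
pred = control + 0.8 * shift    # prediction with the right direction, wrong scale

r = pearson_delta(pred, true, control)
mmd_same = mmd_rbf(true, true)
mmd_diff = mmd_rbf(pred, true)
l1 = l1_distance([0.25, 0.5, 0.0, 0.25], [0.25, 0.25, 0.25, 0.25])
```

In this toy setup the correlation is essentially 1 because the prediction gets the direction of every gene's shift right, while the MMD is still positive because the predicted cells sit in the wrong place. That contrast is one reason a competition might rank on both kinds of metric.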
To avoid overfitting, Crunch will only publish the public leaderboard once each week.
Prizes
Crunch 1 and Crunch 2 will use multiple metrics to rank participants. Your final prize will be the sum of the prizes for each metric on which you or your team places.
All prizes are in USDC, a cryptocurrency with the same value as the US dollar.
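As a worked example of the "sum over metrics" rule, the sketch below adds up prizes for a hypothetical team using the per-metric amounts from the Crunch 1 and Crunch 2 tables below. Two things are assumptions: that the three prize columns correspond to the three listed metrics, and that a rank outside the top 10 earns nothing on that metric.

```python
# Per-metric prize in USDC by rank, as listed for Crunch 1 and Crunch 2.
PRIZE_PER_METRIC = {1: 1400, 2: 800, 3: 480, 4: 320, 5: 240,
                    6: 200, 7: 200, 8: 160, 9: 120, 10: 80}


def total_prize(ranks_by_metric):
    """Sum the prize for each metric; ranks beyond 10th are assumed to pay 0."""
    return sum(PRIZE_PER_METRIC.get(rank, 0) for rank in ranks_by_metric.values())


# Hypothetical team: 1st on one metric, 3rd on another, 12th on the third.
example_ranks = {"Pearson Delta": 1, "MMD": 3, "L1-Distance": 12}
total = total_prize(example_ranks)  # 1400 + 480 + 0 = 1880
```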
Crunch 1 (prize per metric, in USDC)
Place         Pearson Delta   MMD       L1-Distance
1st place     1,400           1,400     1,400
2nd place     800             800       800
3rd place     480             480       480
4th place     320             320       320
5th place     240             240       240
6th place     200             200       200
7th place     200             200       200
8th place     160             160       160
9th place     120             120       120
10th place    80              80        80
Total         4,000           4,000     4,000
Crunch 2 (prize per metric, in USDC)
Place         Pearson Delta   MMD       L1-Distance
1st place     1,400           1,400     1,400
2nd place     800             800       800
3rd place     480             480       480
4th place     320             320       320
5th place     240             240       240
6th place     200             200       200
7th place     200             200       200
8th place     160             160       160
9th place     120             120       120
10th place    80              80        80
Total         4,000           4,000     4,000
Crunch 3 (prize in USDC)
Place         Prize
1st place     7,000
2nd place     6,500
3rd place     5,000
4th place     3,000
5th place     1,000
6th place     900
7th place     800
8th place     700
9th place     600
10th place    500
Total         26,000
External Resources
Crunchers are encouraged to use publicly available external resources, including gene perturbation datasets and pre-trained models, as long as they are properly credited.
