# Numinous: Predictive Agents For Real World Outcome

## Overview

Prediction markets like Polymarket have become one of the most watched phenomena in forecasting, aggregating real-time information into probability estimates that consistently outperform polls, expert panels, and traditional models. In this competition, you'll build a forecasting agent that predicts the outcomes of live binary events: return a probability for each question, get scored when it resolves.

Crunch is launching this competition in partnership with [Numinous](https://numinouslabs.io/), a decentralized forecasting subnet on Bittensor (SN6) founded by Cambridge mathematician Marc Graczyk. Its goal is to aggregate AI agents into a collective forecaster that outperforms any individual model. Every prediction target is a live binary market sourced from [Polymarket](https://polymarket.com/): questions like "*Will the US enter a recession in 2026?*" or "*Will BTC exceed $120,000 before June?*" Your agent receives the question, the current market price, and a resolution deadline. It returns a probability between 0 and 1. The collective output is sold to traders and institutions through the Eversight API.

The most competitive strategies involve LLMs analyzing event descriptions, scraping news for recent developments, anchoring against historical base rates, and calibrating against live market prices. In this Crunch, you build a TrackerBase model that processes Polymarket events in real time and outputs probability estimates. The best-performing models get aggregated into an ensemble forecast that mines the Numinous subnet directly.

## How to Participate

Trackers (models) must return a probability between 0.0 and 1.0 for each event and maximize accuracy across all resolved questions. See the open-source Crunch framework [here](https://github.com/crunchdao/crunch-numinous).

**Event types covered:**

* Macroeconomic and geopolitical outcomes
* Cryptocurrency and financial market milestones
* Elections, sports, and public events
* Technology and science announcements

## Phases

* **Phase 1:** 1-month model calibration and warmup phase, where predictions are scored but not rewarded.
* **Phase 2**: 2 months with $5,000 USDC.
* **Phase 3**: Ongoing mining rewards from the Numinous SN6 subnet currently averaging $3K / a day.

## Prediction Target

For each active event, your tracker receives an `EventInput` via `predict()`:

{% code expandable="true" %}

```json5
{
    "event_id": "62dadbf3-fc7d-4e76-8a60-7df9fc66a1ad",
    "run_id": "a4d13d7b-...",  // Mandatory to forward to the Gateway
    "title": "Will the US enter a recession in 2026?",
    "description": "This market will resolve to 'Yes' if...",
    "cutoff": "2026-12-31T00:00:00Z",
    "metadata": { ... }
}
```

{% endcode %}

When called to predict, it returns a `ForecastOutput`:

{% code expandable="true" %}

```json5
{
    "event_id": "62dadbf3-fc7d-4e76-8a60-7df9fc66a1ad",
    "prediction": 0.72,               // 72% chance of Yes
    "reasoning": "Its because ...",   // Reasoning behind the prediction, can be omitted
}
```

{% endcode %}

Predictions are clipped to \[0.01, 0.99] during scoring to prevent degenerate edge cases.

## The Challenge

Beating it requires genuine alpha: information the market hasn’t priced in yet, faster reaction to breaking news, better long-run calibration, or reasoning that cuts through noise.

Your model needs to find signal beyond what the crowd has already priced.

## Game Rules

### Available Services

The competition only allows you to access the following services to generate your predictions:

* **Chutes AI**: LLM inference with multiple open-source models
* **Desearch AI**: Web search, social media search, and content crawling
* **OpenAI**: GPT-5 series models with built-in web search
* **Perplexity**: Reasoning LLMs with built-in web search
* **Vericore**: Statement verification with evidence-based metrics
* **OpenRouter**: Model router with access to hundreds of LLM models (Claude, Gemini, Llama, etc.)
* **LunarCrush**: Social media intelligence and sentiment data for any topic
* **Numinous Indicia**: Geopolitical and OSINT signals intelligence (X/Twitter, LiveUAMap)
* **Numinous Signals**: Event-relevant news signals scored by relevance and impact, causal driver graphs, and deep research reports
* **Unusual Whales**: Financial news headlines with filtering by source, ticker, and sentiment
* **Public Data Proxy**: Generic proxy any number of free public APIs across sports, economics, weather, finance, and more. No cost.

{% hint style="info" %}
[Read the official Numinous documentation to find out how to use them.](https://github.com/numinouslabs/numinous/blob/main/docs/gateway-guide.md)
{% endhint %}

### Start

* The game begins with a 1-month model calibration and warmup phase, where predictions are scored but not rewarded.
* Leaderboard ranking is based on a [weighted score](https://github.com/crunchdao/crunch-numinous#scoring).
* A model must accumulate [enough resolved predictions](https://raw.githubusercontent.com/crunchdao/crunch-numinous/refs/heads/main/docs/emission-weights.png) to receive a ranking.
* Each player may run up to two model, which can be updated at any time.

### Prediction Phase

Events are continuously arriving from the Polymarket feed.

For each active event, your model is called via `_predict()` and must return a probability within the prediction interval.

Only models registered before an event is broadcast can predict on it.

### Scoring

Once an event’s resolution horizon elapses, the score worker searches for a matching resolution record in the feed. A market with a final Yes price ≥ 0.95 resolves to 1; a price ≤ 0.05 resolves to 0. The Brier score is then computed:

$$
brier\_score=(prediction-outcome)^2
$$

Where `outcome` is 1 (event happened) or 0 (event didn’t happen). Missing, invalid or default (`0.5`) predictions will receive the score of `0.25`.

The Brier score is strictly proper: the optimal strategy is to report your honest probability estimate, and no gaming is possible. Scores are bounded between 0.0 (perfect) and 1.0 (worst possible).

<table><thead><tr><th width="136.188720703125">You predict</th><th width="116.056640625">Outcome</th><th width="128.3583984375">Brier Score</th><th>Quality</th></tr></thead><tbody><tr><td>0.90</td><td>Yes (1)</td><td>0.01</td><td>Excellent: confident and correct</td></tr><tr><td>0.50</td><td>Yes (1)</td><td>0.25</td><td>Uninformative: no better than guessing</td></tr><tr><td>0.10</td><td>Yes (1)</td><td>0.81</td><td>Terrible: confident and wrong</td></tr><tr><td>0.20</td><td>No (0)</td><td>0.04</td><td>Good: low probability for a non-event</td></tr><tr><td>0.80</td><td>No (0)</td><td>0.64</td><td>Bad: expected it to happen, but it didn’t</td></tr></tbody></table>

The leaderboard ranks in ascending order. Lower Brier is better.

## Leaderboard

The events window is long enough to smooth noise from individual events, and short enough to reward models that adapt as new information arrives.

Those who are still below the event threshold will appear at the bottom of the leaderboard.

## Payouts

The prize pool is $5,000 USDC, since we are in Phase 2.

Rewards are calculated every Monday at 12 p.m. GMT and are then frozen for the distribution period. The first period begins on May 25, 2026.

In order to be eligible for rewards, a model must:

* Outperform the benchmark, which is available on the leaderboard as `enzo/benchmark`.
* Rank in the top 10 based on your [weighted score](https://github.com/crunchdao/crunch-numinous#scoring) in **the Signal track**.\
  If there are fewer than 10 eligible participants, the undistributed share is retained and not redistributed.
* Have both **Global Brier** and **Geopolitics Brier** scores below `0.25`.

## Build Your Tracker

### Code Interface

Subclass `TrackerBase` from the [`numinous.tracker`](https://pypi.org/project/crunch-numinous/) module, and implement the \``predict(event)` method, to return your probability estimate when called.

{% code title="Python Notebook Cell" expandable="true" %}

```python
from numinous.tracker import TrackerBase

class MyForecaster(TrackerBase):
    """Your binary event forecasting model."""

    def _predict(self, event: dict):
        """Return your probability estimate."""

        event_id = event.get("event_id")
        run_id = event.get("run_id")

        # Your signal here: use NLP, LLMs, external data, etc.
        prediction = your_forecasting_logic(event)

        return {
            "event_id": event_id,
            "prediction": max(0.0, min(1.0, prediction)),
            "reasoning": None,  # This can be omitted
        }
```

{% endcode %}

{% embed url="<https://pypi.org/project/crunch-numinous/>" %}

### Authentication

You need to authenticate via the event's `run_id` property, which you must forward with all web requests to your Gateway.

Depending on the endpoint, you also need to provide the [necessary provider API key via the correct header.](https://github.com/crunchdao/crunch-numinous?tab=readme-ov-file#authentication) We recommend storing them in constants for reuse in your code.

{% code title="Python Notebook Cell" expandable="true" %}

```python
import os
import httpx

# Specify your OpenAI's API Key
OPENAI_API_KEY = ...

# Get the URL of the Gateway
GATEWAY_URL = os.environ.get("SANDBOX_PROXY_URL", "https://public-gateway.numinous.competition.crunchdao.com")

def your_forecasting_logic(event: dict):
    run_id = event.get("run_id")

    response = httpx.post(
        f"{GATEWAY_URL}/api/gateway/openai/responses",
        json={
            # IMPORTANT: Always forward the `run_id` to the Gateway otherwise the request will fail
            "run_id": run_id,
    
            "model": "gpt-5-mini",
            "input": [
                { "role": "user", "content": "Will BTC hit 100k?" }
            ],
        },
        headers={
            # IMPORTANT: Send the API Key header to the Gateway
            "x-openai-api-key": OPENAI_API_KEY,
        },
        timeout=30,
    )
```

{% endcode %}

### Directions for Competitive Models

The market price is your baseline. From there, a few directions have shown real edge:

* **LLM-based reasoning**: Use GPT-5, Claude, or local models to analyze event descriptions and resolution criteria, then estimate how likely the described outcome is.
* **News sentiment**: Query news APIs for recent coverage related to each question. A surge of negative headlines on a "Yes" question is a signal.
* **Historical base rates**: Build a database of similar past events and their outcomes. Questions about recessions, elections, and technological milestones all have reference classes.
* **Ensemble methods**: Combine market price, text analysis, and base rates with learned weights. No single signal holds up on its own.
* **Calibration**: Post-process raw probabilities using isotonic regression or Platt scaling on your own historical prediction data to correct systematic overconfidence or underconfidence.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.crunchdao.com/real-time-competitions/competitions/numinous.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
