El Niño Probability Tracker, Methodology Overview

This document describes how the weekly El Niño probability brief is constructed. It is meant for someone reviewing the methodology cold, without prior context on the project.

What this is

A weekly internal probability tracker focused on the 2026-27 El Niño event. The brief reports peak-season strength probabilities (DJF 2026-27, with NDJ used as a proxy when CPC's strength table doesn't extend that far) and a small set of physical-state observations.

It is an aggregator, not an original model. We harmonize forecasts from publicly published agency outputs into a common framing (traditional Niño 3.4 ONI), surface disagreements between centers, and let the reader judge. We do not run any custom statistical or ML model. The historical sample of super El Niño events is too small (n~4) to constrain any classifier or regression that would beat the agency forecasts.

Headline buckets

The brief reports four cumulative probability buckets in traditional Niño 3.4 ONI terms (3-month running mean SST anomaly vs 1991-2020 climatology, peak season DJF): the probability that peak ONI exceeds +1.0, +1.5, +2.0, and +2.5 °C.

The +2.5 °C bucket is reported as a lo-hi range when it depends on how the open-ended top RONI bin is discretized. See "RONI to traditional ONI" below.

Sources

Seven publicly-published sources are fetched live each Monday. Each carries an issued date stamped by the agency, distinct from when we fetched it. The diff layer uses issued dates to distinguish "agency re-released this week" from "agency stale, we're carrying forward".
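As an illustration of the issued-date check, the sketch below classifies a source as re-released or carried forward. The function name and status labels are hypothetical and are not taken from the diff layer's code.

```python
from datetime import date


def release_status(prior_issued: date, current_issued: date) -> str:
    """Compare agency-issued dates from two consecutive weekly snapshots."""
    if current_issued > prior_issued:
        return "re-released"      # agency published a new product this week
    return "carried forward"      # same issuance as last week; value is stale


# Hypothetical issued dates for one monthly source across two weekly briefs.
print(release_status(date(2026, 4, 9), date(2026, 5, 14)))
print(release_status(date(2026, 5, 14), date(2026, 5, 14)))
```

The comparison uses only the agency's issued date, so a successful re-fetch of an unchanged product still counts as carried forward.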

Source | Contributes | Cadence
NOAA CPC ENSO strength table | Quantitative bin-by-bin probabilities (RONI) for 9 overlapping seasons | Monthly, 2nd Thursday
OISST weekly Niño 3.4 | Current weekly traditional-ONI anomaly | Weekly, Mondays
CPC heat content index | 0-300m subsurface heat content anomaly, 180W-100W | Monthly
IRI ENSO Quick Look | 3-category probabilities (La Niña / Neutral / El Niño) for 9 seasons | Monthly, ~19th
BoM ENSO outlook | Australian Bureau categorical alert + summary | Fortnightly
ECMWF SEAS5 (via Copernicus CDS) | 51-member Niño 3.4 SST forecast, leads 1-6 | Monthly, ~5th
ERA5 cumulative westerly wind anomaly (CWWA) | Cumulative positive 850 hPa zonal-wind anomaly since March 1, m/s · days | Continuous (5-day lag)
ERA5 spatial-peak WWB events | Count + detail of westerly wind bursts (sliding 5x10 deg sub-region area-mean, dual threshold) | Continuous (5-day lag)

If a fetcher fails, the brief falls back to the last successful cache for that source, then to a hand-curated seed value, and surfaces the fallback in a "Source freshness" panel at the bottom of the brief. The pipeline is designed to never fail to produce a brief on Monday.
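The fallback order described above (live fetch, then last successful cache, then seed) can be sketched as follows. This is a minimal illustration; `SourceValue`, `resolve_source`, and the origin labels are hypothetical names, not the pipeline's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SourceValue:
    value: float
    origin: str  # "live", "cache", or "seed"; shown in the freshness panel


def resolve_source(fetch_live: Callable[[], float],
                   cached: Optional[float],
                   seed: float) -> SourceValue:
    """Try the live fetcher, then the last cached value, then the seed."""
    try:
        return SourceValue(fetch_live(), "live")
    except Exception:
        # Any fetcher failure triggers the fallback path.
        if cached is not None:
            return SourceValue(cached, "cache")
        return SourceValue(seed, "seed")


def failing_fetch() -> float:
    raise ConnectionError("simulated outage")


print(resolve_source(lambda: 1.2, cached=1.1, seed=1.0))
print(resolve_source(failing_fetch, cached=1.1, seed=1.0))
print(resolve_source(failing_fetch, cached=None, seed=1.0))
```

Because the seed is always present, the function returns a value for every source, which is what lets the Monday brief render even during an outage.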

Quantitative aggregation

RONI to traditional ONI

NOAA CPC switched from traditional ONI to RONI (Relative Oceanic Niño Index) as the official ENSO index in February 2026. RONI is the traditional Niño 3.4 SST anomaly minus the tropical-mean SST anomaly, which removes the background global-warming signal from the ENSO measurement.

For the brief, all headline buckets are stated in traditional ONI because that is what most readers and most analog references (1997-98, 2015-16) are expressed in. By definition, offset = ONI − RONI = tropical_mean_SST_anomaly. We compute it directly each week from CPC's published indices: the difference between the most recent week's traditional Niño 3.4 anomaly (wksst9120.for) and relative Niño 3.4 anomaly (rel_wksst9120.txt) gives the offset, observed live. The brief reports it in the section 1 preamble. The current offset of around +0.4 °C reflects the present tropical-mean warmth; a static +0.3 °C was used previously and would have drifted under the warming trend.

The forecast horizon (NDJ 2026-27) introduces a small additional uncertainty: we use the latest observed weekly offset as the best estimate for the offset at the target season, on the assumption that tropical-mean SST anomaly drifts only slowly (the 30-year trend is ~+0.15 °C/decade; seasonal variability is small in the tropics). The brief flags whether the offset is live-fetched or seeded.
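A minimal sketch of the offset arithmetic, assuming the two weekly anomalies have already been parsed from the CPC files. The function names and the input values are hypothetical.

```python
def live_offset(oni_weekly: float, roni_weekly: float) -> float:
    """offset = ONI - RONI, i.e. the tropical-mean SST anomaly."""
    return oni_weekly - roni_weekly


def oni_threshold_in_roni(oni_threshold: float, offset: float) -> float:
    """RONI value equivalent to a traditional-ONI threshold."""
    return oni_threshold - offset


# Hypothetical weekly anomalies in deg C, for illustration only.
offset = live_offset(oni_weekly=0.9, roni_weekly=0.5)
for t in (1.0, 1.5, 2.0, 2.5):
    roni_t = oni_threshold_in_roni(t, offset)
    print(f"ONI > +{t:.1f} C corresponds to RONI > +{roni_t:.1f} C")
```

Each headline threshold is shifted down by the offset before it is evaluated against the RONI-based CPC table, so a larger offset raises every headline probability.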

Bin-interior shape: skew-normal fit, not uniform mass

CPC publishes a strength table in 0.5 °C-wide RONI bins, with the top bin open-ended at >= +2.0 °C. To convert these bin probabilities to the probability that traditional ONI exceeds a specific threshold (e.g., +2.5 °C), we need an assumption about how mass is distributed within each bin.

The brief fits a skew-normal distribution to the nine bin probabilities (loc, scale, shape; minimized via BFGS to match observed bin masses) and evaluates the survival function at each headline threshold. This is more defensible than a uniform-within-bins assumption because peak Niño 3.4 anomaly distributions are inherently right-skewed (rare super events sit in the right tail with low probability mass that a uniform interpolation would misallocate).

Sensitivity range on the +2.5 °C bucket comes from a bootstrap that jitters each bin probability by Gaussian noise (sigma = 1 percentage point, matching CPC's whole-percent reporting precision), refits the skew-normal, and reports the 5th and 95th percentile of the resulting +2.5 °C survival probabilities. The range therefore captures reporting-quantization uncertainty in CPC's published table; it does not capture forecast uncertainty in the underlying CPC ensemble or methodological uncertainty in the choice of distribution family.
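The fit and the bootstrap can be sketched together as below. This is not the production code. Several details are assumptions made for the example: the bin edges (nine bins, open-ended at both ends), the squared-error objective, the log-scale parameterization, the clipping of negative jittered masses, and the strength table itself, which is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import skewnorm

# ASSUMPTION: nine RONI bins, 0.5 C wide, open-ended at both ends.
EDGES = np.array([-np.inf, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, np.inf])
# Nominal bin centers, used only to seed the optimizer.
CENTERS = np.array([-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75, 2.25])


def bin_masses_pct(params):
    """Bin masses (percent) implied by skew-normal (shape, loc, log_scale)."""
    shape, loc, log_scale = params
    cdf = skewnorm.cdf(EDGES, shape, loc=loc, scale=np.exp(log_scale))
    return 100.0 * np.diff(cdf)


def fit_skewnorm(probs_pct):
    """Least-squares match of skew-normal bin masses to the published ones."""
    target = np.asarray(probs_pct, dtype=float)
    target = 100.0 * target / target.sum()
    mean0 = float(np.sum(CENTERS * target) / 100.0)
    sd0 = float(np.sqrt(np.sum(target * (CENTERS - mean0) ** 2) / 100.0))
    x0 = np.array([1.0, mean0, np.log(max(sd0, 0.1))])
    result = minimize(lambda p: np.sum((bin_masses_pct(p) - target) ** 2),
                      x0, method="BFGS")
    return result.x


def prob_above(params, roni_threshold):
    """Survival function of the fitted distribution at a RONI threshold."""
    shape, loc, log_scale = params
    return float(skewnorm.sf(roni_threshold, shape, loc=loc,
                             scale=np.exp(log_scale)))


def bootstrap_range(probs_pct, roni_threshold, n_draws=200, sigma_pct=1.0, seed=0):
    """5th and 95th percentile of the tail probability under bin jitter."""
    rng = np.random.default_rng(seed)
    base = np.asarray(probs_pct, dtype=float)
    tails = []
    for _ in range(n_draws):
        # ASSUMPTION: negative jittered masses are clipped to zero.
        jittered = np.clip(base + rng.normal(0.0, sigma_pct, base.size), 0.0, None)
        tails.append(prob_above(fit_skewnorm(jittered), roni_threshold))
    lo, hi = np.percentile(tails, [5, 95])
    return float(lo), float(hi)


# Hypothetical strength table (percent per bin), not CPC's actual numbers.
table = [0, 0, 0, 1, 6, 20, 30, 24, 19]
offset = 0.4                      # ONI - RONI, deg C
threshold = 2.5 - offset          # +2.5 C ONI expressed in RONI terms
params = fit_skewnorm(table)
print("P(ONI > +2.5):", round(prob_above(params, threshold), 3))
print("bootstrap 5-95:", bootstrap_range(table, threshold, n_draws=50))
```

Optimizing the log of the scale keeps the scale positive without a constrained solver, which is what allows plain BFGS here.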

Headline smoothing: CPC anchor with bounded SEAS5 deflection (v1.5)

CPC reissues the strength table monthly, on the 2nd Thursday with the ENSO Diagnostic Discussion. Between issuances, the raw skew-normal output is mathematically frozen for 4-5 weeks, and during that window new observational evidence and updated model ensembles can move the underlying probabilities materially. The brief's audience is climate-curious operators who read it weekly; a static headline reads as a brief that has stopped tracking the event between CPC's monthly releases.

We close that gap by deflecting CPC's anchor probability each week based on the current ECMWF SEAS5 ensemble, while keeping CPC firmly in charge of the headline.

For each headline bucket b, given anchor probability p_anchor[b] (the skew-normal-fitted CPC probability above threshold b in traditional-ONI terms) and current SEAS5 fraction p_seas5[b] (the share of the 51-member ensemble exceeding threshold b at the longest available lead, in traditional-ONI-equivalent terms after model-climatology subtraction):

deflection[b] = clamp(W × (p_seas5[b] − p_anchor[b]), −cap, +cap)
headline[b]   = clamp(p_anchor[b] + deflection[b], 0, 100)

with W = 0.2 (SEAS5 contributes 20% of the gap to the anchor) and cap = ±10 percentage points per bucket per week.
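The two formulas translate directly into code. The sketch below works in percentage points; the function names and the example inputs are illustrative only.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def deflected_headline(p_anchor: float, p_seas5: float,
                       w: float = 0.2, cap: float = 10.0) -> float:
    """Headline probability (percent) for one bucket."""
    deflection = clamp(w * (p_seas5 - p_anchor), -cap, cap)
    return clamp(p_anchor + deflection, 0.0, 100.0)


# Hypothetical inputs: SEAS5 20 points above the anchor, then 69 points above.
print(deflected_headline(p_anchor=45.0, p_seas5=65.0))   # deflection of +4
print(deflected_headline(p_anchor=21.0, p_seas5=90.0))   # deflection capped at +10
```

With W = 0.2 the cap only binds when SEAS5 and the anchor differ by more than 50 percentage points, since 0.2 × 50 = 10.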

When CPC reissues the strength table, p_anchor jumps to the new CPC value and the deflection is recomputed against the new anchor; there is no carryover. The diff section surfaces both the CPC anchor change and the new deflection separately so the reader can see where the weekly motion came from.

Why these specific values:

What this is not:

Audit trail: every headline number can be reconstructed from (p_anchor, p_seas5, W, cap), all of which are reported in the brief's caveat section per issue. A reviewer can replicate the math without access to private state.

ECMWF SEAS5 cross-check

ECMWF's SEAS5 produces 51 ensemble members per monthly forecast. For each member at the longest available lead month (typically 6 months out), we compute the Niño 3.4 area-mean SST and subtract the SEAS5 model climatology, computed as the mean across 24 years of hindcasts (1993-2016, 25 members per year, same start month and lead). The result is one anomaly per member, in traditional ONI units.

We then count members above {+1.0, +1.5, +2.0, +2.5} °C and report both the counts and the median.
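A minimal sketch of the member count, assuming the Niño 3.4 area-mean SST per member and per hindcast run is already available as plain numbers. The function names are hypothetical, and the toy ensemble has 5 members rather than 51.

```python
from statistics import fmean, median

THRESHOLDS_C = (1.0, 1.5, 2.0, 2.5)


def model_climatology(hindcast_sst):
    """Mean Nino 3.4 SST over all hindcast years and members."""
    return fmean(sst for year in hindcast_sst for sst in year)


def exceedance_summary(member_sst, clim):
    """Count members above each threshold and return the median anomaly."""
    anomalies = [sst - clim for sst in member_sst]
    counts = {t: sum(a > t for a in anomalies) for t in THRESHOLDS_C}
    return counts, median(anomalies)


# Hypothetical values in deg C: two hindcast years with two members each.
clim = model_climatology([[27.4, 27.6], [27.5, 27.5]])
counts, med = exceedance_summary([28.0, 28.6, 29.1, 29.7, 30.2], clim)
print(counts, round(med, 2))
```

The hindcast values passed in must share the forecast's start month and lead, otherwise the subtraction does not remove the lead-dependent model bias.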

These ECMWF numbers are presented as a cross-check to CPC, not averaged in. The two centers can disagree materially: the choice to surface that disagreement rather than smooth it away is deliberate.

Why model climatology, not observational

ECMWF SEAS5 has a known warm bias in the equatorial Pacific. Subtracting the model's own climatology removes that bias from the anomaly count; subtracting an observational climatology (the more common practice in news summaries) does not. The model-anomaly approach answers "the model itself thinks this run is X °C above its own typical forecast for this calendar window," which is a cleaner signal-detection question. The observational-anomaly approach answers "the model output, evaluated against the real-world climatology," which mixes signal with the warm bias.

We use the model approach for the brief's headline numbers and note in the caveat text that the observational comparison would shift the counts higher.

Spatial-peak westerly wind burst detection (v1.7)

CWWA (above) gives a smooth cumulative scalar but is blind to burst structure: a 5-day intense burst at 7N generates a Kelvin wave that 50 days of weak equatorial westerlies of the same area-integrated magnitude would not, yet CWWA scores them similarly. v1.6 added a complementary discrete-event indicator that captures the burst structure directly; v1.7 (current) refines the event-detection algorithm to fix a count-merging issue documented under v1.6.
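The blindness to burst structure follows directly from the definition of CWWA as a running sum of positive daily anomalies. The sketch below assumes the input is a daily area-mean series, which is our reading of the definition; the function name and the two synthetic series are illustrative.

```python
from itertools import accumulate


def cwwa(u850, clim):
    """Running sum of positive daily zonal-wind anomalies, in m/s · days."""
    return list(accumulate(max(u - c, 0.0) for u, c in zip(u850, clim)))


days = 50
clim = [0.0] * days
burst = [10.0] * 5 + [0.0] * (days - 5)   # 5-day intense burst
weak = [1.0] * days                        # 50 days of weak westerlies

print(cwwa(burst, clim)[-1], cwwa(weak, clim)[-1])
```

Both series end at the same cumulative value even though only the first contains a burst, which is the case the discrete-event indicator is meant to separate.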

Method (fetchers/era5_burst.py):

  1. Pull ERA5 daily 850 hPa zonal wind at 12 UTC over the wider domain 10N-10S, 130E-150W (broader than CWWA's 5N-5S, per Daniel Swain's 2026-05-10 observation that productive bursts often sit just outside the equatorial band).
  2. Build a full-field 1991-2020 same-calendar-day climatology over the same domain (chunked monthly, 30y x 1mo per CDS call, six chunks total). Cached on disk; re-pulled only when invalidated.
  3. For each observation day, compute the full anomaly field obs(lat, lon) - clim(mmdd, lat, lon).
  4. Slide a 5deg lat x 10deg lon window over the anomaly field; the day's "spatial peak" is the maximum area-mean anomaly across all window positions (computed via scipy.ndimage.uniform_filter for efficiency, with edge pixels masked to avoid biased edge windows).
  5. From the daily-spatial-peak time series, detect events via peak-detection with a recovery interval (v1.7, replaces v1.6 run-detection):

     a. Candidate peaks: days where the spatial peak exceeds the peak threshold (7 m/s).
     b. Non-maximum suppression: sort candidates by amplitude descending. Greedily select the strongest, then suppress all candidates within +/- 10 days of any already-selected peak. Repeat. This yields a set of distinct peak days separated by at least 10 days each, matching the typical inter-burst separation in the super-event literature (McPhaden 1999, Lengaigne et al. 2003).
     c. Event boundaries: for each surviving peak, walk outward while the spatial peak stays above the base threshold (5 m/s), bounded by the midpoint to neighboring selected peaks. This lets a sustained westerly period containing multiple sub-bursts split into multiple events instead of collapsing into one.
     d. Duration filter: drop events shorter than 5 days.
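Step 4 of the method can be sketched with scipy.ndimage.uniform_filter as follows. The function name, the synthetic 0.5-degree grid, and the unweighted window mean (no cos-latitude weighting) are assumptions of this example rather than details of fetchers/era5_burst.py.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def daily_spatial_peak(anom: np.ndarray, win_lat: int, win_lon: int) -> float:
    """Maximum window-mean anomaly over windows lying fully inside the field.

    anom is a 2-D (lat, lon) anomaly field; window sizes are odd counts of
    grid points. The mean is unweighted, a simplification for this sketch.
    """
    if win_lat % 2 == 0 or win_lon % 2 == 0:
        raise ValueError("this sketch assumes odd window sizes")
    means = uniform_filter(anom.astype(float), size=(win_lat, win_lon),
                           mode="constant", cval=0.0)
    half_lat, half_lon = win_lat // 2, win_lon // 2
    # Drop window centers whose window would overlap the padded edge.
    interior = means[half_lat:anom.shape[0] - half_lat,
                     half_lon:anom.shape[1] - half_lon]
    return float(interior.max())


# Synthetic 0.5-degree field over 10N-10S, 130E-150W (41 x 161 grid points).
field = np.zeros((41, 161))
field[4:15, 30:51] = 8.0          # hypothetical 5 x 10 degree burst at 8 m/s
print(daily_spatial_peak(field, win_lat=11, win_lon=21))
```

On a 0.5-degree grid a 5 x 10 degree window spans 11 x 21 grid points when both end points are included, which is why odd window sizes are a natural fit here.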

Each detected event carries start date, end date, duration, peak amplitude, and the peak date (the day of the surviving local maximum). The fetcher returns the current-year event list plus full event detail for each analog year (1997, 2015, 2023, 2025).
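The peak-detection steps (a to d) and the event record can be sketched in plain Python. Two details are our assumptions: a candidate is suppressed when it lies 10 days or fewer from a selected peak, and the shared boundary between two neighboring peaks is assigned to the earlier event. Days are integer indices here rather than dates.

```python
from dataclasses import dataclass
from typing import List, Sequence

PEAK_THRESHOLD = 7.0   # m/s
BASE_THRESHOLD = 5.0   # m/s
RECOVERY_DAYS = 10
MIN_DURATION = 5       # days


@dataclass
class Event:
    start: int
    end: int
    peak_day: int
    peak_amplitude: float

    @property
    def duration(self) -> int:
        return self.end - self.start + 1


def detect_events(series: Sequence[float]) -> List[Event]:
    # a. Candidate peaks: days above the peak threshold.
    candidates = [d for d, v in enumerate(series) if v > PEAK_THRESHOLD]
    # b. Non-maximum suppression, strongest candidate first.
    selected: List[int] = []
    for d in sorted(candidates, key=lambda d: (-series[d], d)):
        if all(abs(d - s) > RECOVERY_DAYS for s in selected):
            selected.append(d)
    selected.sort()
    events = []
    for k, peak in enumerate(selected):
        # c. Walk outward above the base threshold, bounded by midpoints.
        lo = (selected[k - 1] + peak) // 2 + 1 if k > 0 else 0
        hi = ((peak + selected[k + 1]) // 2 if k + 1 < len(selected)
              else len(series) - 1)
        start = end = peak
        while start > lo and series[start - 1] > BASE_THRESHOLD:
            start -= 1
        while end < hi and series[end + 1] > BASE_THRESHOLD:
            end += 1
        # d. Duration filter.
        if end - start + 1 >= MIN_DURATION:
            events.append(Event(start, end, peak, series[peak]))
    return events


# Synthetic series: 26 westerly days at 6 m/s with two distinct peaks.
series = [6.0 if 5 <= d <= 30 else 2.0 for d in range(40)]
series[8], series[24] = 9.0, 8.0
for ev in detect_events(series):
    print(ev, ev.duration)
```

A run-based detector would report the synthetic series as one 26-day event; the sketch splits it at the midpoint between the two peaks.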

Cache architecture. The full-field climatology is cached once. Per-year spatial peak series are cached as algorithm-independent JSON. Per-year event lists are cached with an algorithm-version suffix (_v17). When the detection algorithm changes again, only the events caches need invalidating; peak series are reused. No CDS re-pull needed for analog years.
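One way to express that split in file naming is sketched below. Only the _v17 suffix and the .fetch_cache directory come from this document; the file-name stems and function names are hypothetical.

```python
from pathlib import Path

ALGORITHM_VERSION = "v17"


def peak_series_path(cache_dir: str, year: int) -> Path:
    """Algorithm-independent cache: survives detector changes."""
    return Path(cache_dir) / f"era5_burst_peaks_{year}.json"


def events_path(cache_dir: str, year: int,
                version: str = ALGORITHM_VERSION) -> Path:
    """Versioned cache: a new detector version maps to a new file name."""
    return Path(cache_dir) / f"era5_burst_events_{year}_{version}.json"


print(peak_series_path(".fetch_cache", 1997))
print(events_path(".fetch_cache", 1997))
```

Bumping the version string leaves the old event files untouched and makes the next run miss the events cache, while the peak-series lookup still hits.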

Relationship to CWWA. CWWA and WWB are presented as complementary, not as a swap. CWWA remains useful for season-on-season comparisons (a smooth cumulative scalar). WWB is the operationally important diagnostic for "did a Kelvin-wave-generating burst occur." Brief readers see both numbers; the case where CWWA is low but WWB count is non-zero is exactly the methodologically interesting one Swain flagged.

Parameter choices. RECOVERY_DAYS = 10 is the central tunable parameter introduced in v1.7. The typical autocorrelation length of equatorial WWBs in the 850 hPa zonal wind field is 7-14 days (McPhaden 1999 Figure 3; Lengaigne et al. 2003 Section 3). At 10 days the algorithm separates distinct bursts riding on a sustained westerly background without over-splitting a single burst's natural multi-day shape. A sensitivity check at 7 and 14 days produces event counts within +/-1 of the production value for all analog years; the analog ordering is preserved.

v1.6 algorithm (deprecated). The prior detector counted "consecutive runs of days above the base threshold" as single events. During persistently westerly periods, this merged multiple distinct bursts into one long event: 1997's first event ran 71 days, 2015's 104 days, even though each contained 3-5 distinct sub-bursts. v1.7 splits these correctly by peak. The methodology change log entry under "Versioning and change log" documents the swap and resulting count changes.

What is explicitly out of scope

Impact aggregation

Starting in v1.3, the brief includes an "Impact outlook" section after the analog tracker. This section aggregates institutional impact ranges for the developing event in the same posture as the headline ENSO numbers: we surface what major institutions and peer-reviewed regional analyses are saying, with named sources, and surface disagreement rather than averaging it away.

Method. Regional probabilities are stated as the source states them (high, medium, ~70%, etc.). When multiple sources address the same region with materially different ranges, both are surfaced and the disagreement is noted. The probabilities are conditional on the headline strong-to-super case from section 1 materializing; we do not multiply them out, which would overstate confidence given that the headline probability is itself a 90 / 72 / 45 / 21 distribution across four buckets.

Sources. The institutional sources for the impact section include WMO, NOAA CPC seasonal outlooks, IMF WEO and the Cashin-Mohaddes-Raissi (2017) framework, Allianz Research, FAO Food Price Index and Crop Prospects, Swiss Re sigma, IIF Capital Flows, OCHA and SADC humanitarian reporting, World Weather Attribution, NASA SERVIR and the Brazilian Geologic Service, IMD MMCFS, the International Coral Reef Initiative and NOAA Coral Reef Watch, the IEA Oil Market Report, IFPRI and IFA fertilizer-trade statistics, Lloyd's Market Association Joint War Committee, and sell-side framing notes (Goldman, JPMorgan) cited for analytical framing rather than for asset-price targets. Cadence ranges from continuous (heat stress, river gauges) to annual (IFA statistics). The impact section is currently re-curated by hand each issue (edit impacts.md at the project root); there are no impact fetchers in v1.3.

What the impact section does not do

The credibility of this brief depends on holding a strict aggregator line. Specifically, the impact section does not:

The reason for these exclusions is reader-specific. The brief is read by climate scientists and policy-elite audiences who lose trust the moment the prose resembles a sell-side trade note or a political column.

Known limitations

  1. Spring predictability barrier. Forecasts issued in April and May for the following peak season carry materially wider error bars than what we'll see by July or August. All current numbers are preliminary in a way that won't be true later in the year.
  2. CWWA in place of WWE event counting. The brief originally reported a discrete count of westerly wind events using a simplified area-mean criterion (≥5 m/s sustained ≥5 days). In v1.2 we replaced that with a continuous Cumulative Westerly Wind Anomaly index (CWWA) over 5N-5S, 130E-150W (the standard equatorial WWE source domain): the running sum of positive daily 850 hPa zonal wind anomalies vs the 1991-2020 same-calendar-day climatology, from March 1 of the develop year. The chart panel below the ONI analog plot overlays the 2026 CWWA against 1997, 2015, 2023, and 2025 reference curves at the same calendar offset. Limitation: the cumulative integral does not preserve which dates carried most of the forcing, only the running total. A discrete spatial-peak event count (Gemini's full method per WWE follow-up) is on the V2 list.

Latitude-band sensitivity check (2026-05-09). An external reviewer flagged that some of the most anomalous westerlies sit slightly off-equator and asked whether the 5N-5S band understates forcing. We computed CWWA at 10N-10S for the same domain for 1997, 2015, 2023, 2025, and 2026 develop years (one-off script at scripts/cwwa_sensitivity.py; pulls cached at .fetch_cache/era5_cwwa_*_10NS*).

Result: the wider band reduces the super-event CWWA values, not raises them. At May 1 of the develop year: 1997 narrow 306 → wide 237 (−22%); 2015 narrow 248 → wide 177 (−29%); 2023 narrow 42 → wide 49 (+16% but small absolute); 2026 narrow 137 → wide 121 (−12%). The full-season totals (end-of-August) show the same pattern. Physical interpretation: westerly wind bursts during super-event development are equatorially confined within roughly ±3-5° latitude; subequatorial bands (5-10°N and 5-10°S) are typically neutral-to-easterly during ENSO development, so averaging the wider band dilutes the dynamically relevant signal rather than capturing more of it. The narrow band correctly isolates the Kelvin-wave forcing region.

Conclusion: keep production at 5N-5S. McPhaden 1999 alignment plus the Kelvin-wave physics both argue for the narrow band. No methodology version change.

Noteworthy side finding: the wider band changes 2026's "closest analog" assessment. 2015's forcing was tightly equatorially confined (loses 29% when widening), while 2026's is more latitudinally diffuse (loses only 12%). The narrow-band ranking puts 2026 closest to 2023; the wide-band ranking puts it closest to 2015. This is not a band-choice problem but a real observation: 2026's wind forcing has different latitudinal structure than the 2015 super event, broader but weaker at the equator. Worth watching whether this convergence holds as the season develops.

Follow-up from Daniel Swain (2026-05-10), reframing the limitation more substantively. The cumulative-area-mean framing systematically understates the operationally important signal: transient localized westerly wind bursts. A burst lasting 5 days at 7N (just outside the production band) generates a downwelling Kelvin wave that propagates east and surfaces 2 months later, doing physical work that 50 days of weak equatorial westerlies of the same area-integrated magnitude would not. CWWA scores them similarly because cumulative integrals are blind to burst structure.

This is a real metric-design limitation, not just a band-choice issue. The right diagnostic for "did a Kelvin-wave-generating burst occur in the basin" is spatial-peak event detection (moving 5x10 deg sub-region area-mean exceeding a threshold with persistence and continuity), not a fixed-domain cumulative integral. That method (per Gemini's earlier WWE follow-up, independently re-motivated by Swain's input) is being added alongside CWWA as a parallel indicator (added in v1.6, algorithm refined to peak-detection with 10-day recovery interval in v1.7); CWWA is being retained because the cumulative scalar is more comparable across analog years than a discrete event count and because the metrics capture complementary aspects of forcing.

For 2026 specifically, Swain's operational read: the surfacing Kelvin wave (visible in heat content jumping +1.36 to +2.24 °C in one issue, exceeding 1997 and 2015 at this calendar week, and record-breaking warm water volume in some datasets) is a more meaningful indicator of ENSO trajectory than the CWWA cumulative. The brief should weight ocean response signals (heat content, subsurface Kelvin wave evidence) at least as heavily as the wind metric.

  3. Heat content climatology mismatch. CPC's published heat content series uses a 1981-2010 climatology, while the rest of the brief uses 1991-2020. Difference is small at current magnitudes (~0.1-0.3 °C) and the value enters the brief qualitatively, not in any headline probability.
  4. No formal uncertainty propagation. The lo-hi range on the +2.5 °C bucket comes from a single discretization assumption (the open-ended top bin width). It does not reflect uncertainty in the RONI offset, in the bin-interior interpolation, or in the agency forecasts themselves.
  5. Forecast-horizon offset stability. The offset is now fetched live each week, but applied unchanged to the target season several months ahead. Tropical-mean SST anomaly varies on inter-annual timescales that are smaller than seasonal Niño 3.4 swings, but a residual ±0.05 to ±0.10 °C uncertainty over an 8-month horizon translates into ±1 to ±2 percentage point shifts in the upper-tail headline buckets.
  6. Distribution-family choice. The skew-normal is one defensible right-skewed family; generalized extreme value (GEV) and log-normal would also fit. The brief commits to skew-normal for tractability and predictability of fit; alternative families are not currently reported as a sensitivity range.
  7. Single-model ensemble breadth. Our forecast cross-check pulls from one model (ECMWF SEAS5, 51 members) and reports the 5-95 percentile band as the visualization of forecast uncertainty. This captures initial-condition uncertainty (member-to-member differences in starting state) but not structural uncertainty (different ocean-atmosphere coupling, parameterizations, and bias characteristics across forecast centers). A multi-model pool (CFSv2, JMA, UKMO, NMME, etc., as on the Climate Brink dashboard) captures both, and at October 2026 the multi-model 5-95 spread is roughly three times wider than SEAS5's alone (a ~3°C span vs our 1°C). The single-model fan in the chart is therefore an over-confident representation of total forecast uncertainty. The central tendency is similar (within ~0.1°C of multi-model median in observational-anomaly terms); the spread understatement is the live methodological gap.
  8. Forecast horizon limit. ECMWF SEAS5 system 51 publishes 6-7 leads operationally, so an April issue cannot reach the DJF 2026-27 target peak season directly. Each successive monthly run extends the visible window forward; full DJF coverage from SEAS5 alone arrives with the August 2026 brief. Public dashboards using multi-model pools (CFSv2 and JMA both have 9-month leads) reach January 2027 from an April issue. Adding a longer-lead second model (CFSv2 the cleanest candidate, accessed via the same CDS API with originating_centre=ncep, system=2) is queued for V2, primarily for the breadth argument in #7 above rather than the horizon argument alone.

Snapshot and diff machinery

Each issue freezes the input state to a JSON snapshot. The next issue loads the prior snapshot and computes:

The methodology version (METHODOLOGY_VERSION, currently 1.7) is bumped any time the conversion math, RONI offset, analog list, or bucket logic changes. When a snapshot loaded for diffing has a different methodology version, the brief flags this loudly so the reader knows headline numbers are not strictly week-over-week comparable across the change.
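A minimal sketch of the version check on a loaded snapshot. The JSON field names (methodology_version, headline), the bucket keys, and the numbers are hypothetical; the real snapshot schema is not described in this document.

```python
import json


def version_mismatch(prior_snapshot: dict, current_version: str) -> bool:
    """True when the prior snapshot was produced under another methodology."""
    return prior_snapshot.get("methodology_version") != current_version


def headline_changes(prior: dict, current: dict) -> dict:
    """Week-over-week change per bucket, in percentage points."""
    return {b: current["headline"][b] - prior["headline"][b]
            for b in current["headline"] if b in prior["headline"]}


# Hypothetical snapshots, serialized the way an issue might freeze them.
prior = json.loads('{"methodology_version": "1.6", "headline": {"+2.0": 45, "+2.5": 21}}')
current = json.loads('{"methodology_version": "1.7", "headline": {"+2.0": 49, "+2.5": 24}}')

if version_mismatch(prior, current["methodology_version"]):
    print("WARNING: methodology changed; changes are not strictly comparable")
print(headline_changes(prior, current))
```

A snapshot with no version field at all is also treated as a mismatch, which errs on the side of warning the reader.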

What a reviewer should focus on

If you are reviewing this for methodological soundness, the highest-value places to push back are:

  1. Forecast-horizon validity of the live offset. We use the most-recent observed tropical-mean SST anomaly as the offset for the target season several months ahead. Is that defensible, or should we project a seasonally-resolved tropical-mean trajectory?
  2. Skew-normal as the parametric family. Is this the right shape for CPC's nine-bin probability distribution at the upper tail? Generalized Pareto (for the tail alone) or GEV (full distribution) are alternatives we did not adopt. Should we report a sensitivity range across families instead of just the bootstrap?
  3. The bootstrap range. Sigma = 1 percentage point Gaussian noise on bin probabilities matches CPC's reporting precision but does not capture true forecast uncertainty (which is much larger). Should the brief surface a separate "forecast uncertainty" range, and how would we estimate it? See limitation #7 (single-model ensemble breadth) for the related concern on ECMWF.
  4. The model-vs-observational climatology choice for ECMWF. Is "model anomaly" the right framing for a "what's the chance peak ONI exceeds X" question, given the question is observational by construction? Or should we report both and let the reader choose?
  5. The bucket thresholds (+1.0/+1.5/+2.0/+2.5 °C). Are these the right cuts for the audience the brief is written for?
  6. The decision to surface CPC vs ECMWF disagreement rather than average them. Pros and cons of pooling vs surfacing?

What we are not asking for is predictive-skill review of any individual agency forecast (we did not produce any of them) nor calibration evidence (we are not trying to outperform the agency median).

Methodology change log

Implementation: new fetcher fetchers/era5_burst.py. For each day in Mar-onward, slide a 5deg lat x 10deg lon window over the search domain 10N-10S, 130E-150W (wider than CWWA's 5N-5S, per Swain's point that productive bursts sit just outside the equator). The per-day "spatial peak" is the maximum area-mean u_850 anomaly across all window positions. Event detection then applied a dual threshold (5 m/s sustained > 5 days, with at least one day > 7 m/s within the event), following McPhaden 1999 + Gemini's earlier follow-up.

CWWA is retained as a complementary indicator: it remains the smoother, more comparable-across-years cumulative scalar, useful for season-on-season comparisons. WWB event count is the burst-structure indicator, useful for "did a Kelvin-wave-generating burst occur in the basin." The brief shows both in the physical-state section.

Known limitation at v1.6 issuance: run-based detection merged multiple sub-bursts inside a sustained westerly period into a single long event (1997's first event ran 71 days; 2015's 104 days). Addressed in v1.7.

Rationale: v1.6's count metric was systematically biased toward episodic-burst years (2023, 2025) over sustained-burst years (1997, 2015) in a way that flipped the analog ordering relative to what the peak-amplitude and peak-date data actually say. v1.7 restores the expected ordering and brings the count metric into line with the literature's burst-counting conventions (McPhaden 1999; Lengaigne et al. 2003).


Methodology version 1.7. RONI offset fetched live each week from CPC. ECMWF anomaly subtracts SEAS5 model climatology (1993-2016 hindcasts). WWE forcing tracked via CWWA (5N-5S, 130E-150W cumulative) AND spatial-peak WWB detection (10N-10S, 130E-150W, McPhaden-inspired dual threshold, peak-detection with 10-day recovery interval). Impact section renders as institutional aggregation only (no editorial synthesis). Headline buckets are CPC-anchored and deflected weekly by SEAS5 ensemble evidence within bounded limits.