Hydrocarbon Reserves in Probabilities

Every hydrocarbon reserves calculation begins with a simple question: how much is down there?

But the simplicity ends right there. The answer — it's never a single number.

For decades, the oil and gas industry relied on deterministic estimation. Take the best estimate for each geological parameter: area from seismic interpretation, net pay thickness from well logs, porosity from core calibration, water saturation from Archie's equation. Multiply everything together. Get a single OGIP or OOIP figure. Done. That number enters the report, lands in the slide deck to management, becomes the basis for investment decisions worth millions of dollars.

But geology is not deterministic. A reservoir is a heterogeneous natural system with inherent uncertainty at every parameter. Ignoring this uncertainty doesn't simplify the problem — it hides it.

This is why Monte Carlo Simulation has become the de facto standard for volumetric reserves estimation in the modern industry. Not because it produces more accurate numbers — it doesn't — but because it answers a more important question. Not "how much are the reserves?" but rather: "what is the probability that reserves exceed X, and what drives that uncertainty?"

The Fundamental Problem with Deterministic Estimation

Consider the classic gas volumetric equation:

OGIP = (43,560 × A × h × φ × (1 − Sw)) / Bgi

Every variable on the right-hand side carries uncertainty. Area might be estimated from seismic interpretation with ±20% accuracy. Net pay thickness varies well to well — one well finds 80 ft, the offset finds 40 ft. Porosity from log calibration can shift ±2 porosity units depending on shale volume cutoff. Water saturation is even more sensitive — Archie's equation has substantial tail uncertainty when saturation exponent and cementation factor aren't well constrained.

The deterministic approach collapses all of this uncertainty into a single number. The probabilistic approach preserves uncertainty from input to output.

Fig 1 — The same central estimate, but the probabilistic view carries information the deterministic view discards.

Why does this matter? Because in investment decision making, downside risk and upside potential are just as important as the central estimate. Banks and investors don't fund projects based on P50 alone — they need the probability curve to compute risk-adjusted NPV and expected value. SEC reporting in the United States and SPE PRMS internationally explicitly require 1P/2P/3P (Proved/Probable/Possible) reserves classification — categories that cannot be produced without a probabilistic framework.

Monte Carlo Simulation: The Core Concept

Monte Carlo Simulation — named after the casino in Monaco, borrowed by Stanislaw Ulam at Los Alamos in the 1940s — is a computational method that uses random sampling to estimate outcomes from systems with uncertain inputs.

The process is conceptually simple:

Define probability distributions for each input variable (A, h, φ, Sw, Bgi, RF)
Sample one value from each distribution randomly — this is one iteration
Compute OGIP using the volumetric equation with those sampled values
Repeat steps 2–3 anywhere from 10,000 to 100,000 times
Analyze the output distribution — histogram, CDF, summary statistics

The output isn't a single number; it's a full distribution of possible OGIP values. From this distribution we extract:

P10: the value exceeded with 10% probability (high/optimistic estimate)
P50: the median; exceeded with 50% probability
P90: the value exceeded with 90% probability (low/conservative estimate, roughly equivalent to Proved)
Mean: arithmetic average (typically larger than P50 for skewed distributions)
Swanson's Mean: approximation = 0.3 × P10 + 0.4 × P50 + 0.3 × P90

These definitions follow SPE PRMS — the industry standard consistent with regulatory reporting in most jurisdictions worldwide.

A note on convention: in the petroleum industry, P10 refers to probability of exceedance — the probability that the actual value is greater than P10. This differs from the general statistical convention where P10 = 10th percentile from below. Be careful when reading international papers or using general-purpose statistical tools.

Choosing the Right Distribution

Perhaps the most critical decision in Monte Carlo isn't iteration count or sampling algorithm — it's the choice of probability distribution for each input variable. The wrong choice can render results meaningless, or worse, misleading with high apparent precision.

Fig 2 — Four distributions, four use cases. Choose based on the physics, not the familiarity.

Normal Distribution (Gaussian) suits variables that emerge from many independent processes contributing in similar magnitudes (Central Limit Theorem). Typical example: porosity averaged across many log measurement points, or average Sw across well-characterized zones. Parameters: mean (μ) and standard deviation (σ). Caution applies for bounded variables like Sw or φ — an untruncated Normal can sample negative values or values greater than 1, which makes no physical sense.

Lognormal Distribution suits variables that are always positive with a long right tail. Net pay thickness often follows this pattern because it cannot be negative and a few wells encounter unusually thick zones. Permeability is almost always lognormal — multi-order-of-magnitude variation. Parameters can be entered as μ and σ in log-space, or alternatively as P10 and P90, which is more intuitive for geologists.

Triangular Distribution is the most popular choice among geologists because it only requires three numbers: minimum, mode (most likely), and maximum. No assumption about the underlying process — just specify range and center of mass. Excellent fit when data is limited but expert judgment is strong. Almost every geologist–engineer collaboration in the industry produces triangular distributions for at least Area and Recovery Factor.

Uniform Distribution is the most conservative choice. Only minimum and maximum, all values in between equally likely. Appropriate when there is no prior knowledge about distribution shape — for example, when screening prospects in a frontier basin.

Deterministic (single value) applies to parameters whose uncertainty is small relative to other variables. Bgi or Boi from well-calibrated PVT often falls in this category — relative uncertainty under 2%, while Area might run ±30%. Modeling PVT as random when Area has 15× greater uncertainty just adds noise without insight.

Distribution selection should be based on three considerations: (1) the physical nature of the variable, (2) the quantity and quality of analog data, and (3) expert judgment. Don't default to Normal just because it's familiar from undergraduate statistics — many reservoir variables aren't normally distributed, and wrong assumptions distort downstream results.

Multi-Zone Aggregation: The Mathematical Trap

Imagine a field with three productive zones: Upper Sand, Middle Sand, and Lower Sand. You compute probabilistic OGIP for each zone, then sum the P50s of all three to obtain total field P50.

That is wrong.

This is an extraordinarily common statistical error in reserves reports worldwide — so common it sometimes appears in JOA submissions that should have undergone rigorous review.

The explanation: P50 of Zone A is the median of Zone A's distribution. When we take one Monte Carlo iteration, the Zone A value might be P30 or P80, not exactly P50. For correct field total, we must sum the sampled values per iteration, not the P50s of each distribution.

The correct procedure:

For each iteration i (1 to N_iter):
    Zone_A_i = sample from Zone A distribution
    Zone_B_i = sample from Zone B distribution
    Zone_C_i = sample from Zone C distribution
    Total_i  = Zone_A_i + Zone_B_i + Zone_C_i

Total_P50 = median(Total_1, Total_2, ..., Total_N_iter)

The Total distribution will be tighter (smaller relative spread) than the sum of zonal ranges — this is the diversification effect. Analogous to portfolio diversification in finance: combining assets with independent uncertainty yields a total with smaller relative uncertainty. The standard deviation of the total equals √(σ_A² + σ_B² + σ_C²), not σ_A + σ_B + σ_C.

The practical consequence: teams using sum-of-P50s typically overstate total uncertainty range while simultaneously misstating the central estimate. Results can be misleadingly optimistic or pessimistic depending on distribution shape.

Inter-Zone Correlation: When Diversification Disappears

Diversification assumes independence between zones. But in geological reality, zones in one field are often correlated for several reasons:

Same depositional environment — porosity in Zone A and Zone B within one deltaic system is likely correlated. If one zone shows low φ due to compaction, others probably do too.
Same structural trap — if Zone A area is overestimated due to biased seismic interpretation, Zone B area is probably overestimated as well (systematic error).
Same PVT regime — Bgi or Boi shared between zones at similar depth and pressure.

Ignoring correlation produces a total distribution that is too tight (overconfident). The independence assumption hides systemic risk — all zones can miss together, not compensate each other.

The solution: implement Iman-Conover rank correlation or Cholesky decomposition. For non-normal distributions (Triangular, Lognormal), rank correlation is more appropriate because it operates in rank-space and is distribution-agnostic. Cholesky is valid only for multivariate normal cases.

MCVR will support inter-zone correlation via a global ρ slider (0 to 1) in V2, with custom correlation matrices for pairwise control in subsequent phases.

Tornado Analysis: Identifying the Primary Drivers

Once simulation completes, the natural next question is: which variable drives most of the output uncertainty? Because not all variables are equally important — and the resources for reducing uncertainty (additional seismic, additional logs, additional cores) are limited.

Fig 3 — Tornado chart from Spearman rank correlation between input samples and output samples. Acquisition budget belongs at the top.

The tornado chart answers this question visually. Each variable is ranked by its contribution to output variance. The variable with the longest bar is the primary driver — focus uncertainty reduction effort there.

Two main methods are used in industry:

One-at-a-time (OAT) sensitivity: vary each variable from P10 to P90 while holding others at P50. Measure the output range. The main weakness: it doesn't capture interactions between variables. If two variables actually interact (for example φ and Sw via Archie's equation), OAT misses that effect.

Spearman rank correlation: compute the rank correlation between each input variable's samples and the output samples from the same Monte Carlo run. More rigorous, captures interactions naturally, and is extracted directly from simulation results without requiring re-runs. Distribution-agnostic — works equally well for Triangular and Lognormal.

MCVR uses Spearman rank correlation. The output is a bar chart with variables sorted by absolute rank correlation. Sign (positive or negative) indicates the direction of influence.

Practical insight: if Area has rank correlation 0.78 and Porosity 0.62 while Sw is only 0.34, the priority for additional data acquisition is clear — infill seismic or angle stack inversion to lock down area first, then additional log calibration for porosity. Spending acquisition budget on Sw refinement while A and φ remain dominant is a common but easily avoidable resource misallocation.

P-Values and Swanson's Mean

A subtle but important point: in reserves estimation, the P-value convention refers to probability of exceedance, not cumulative probability.

P10: 10% probability that the actual value exceeds this (= 90% cumulative from the low side). Called "high estimate".
P50: median, 50% probability of being exceeded.
P90: 90% probability that the actual value exceeds this. Called "low estimate" or roughly Proved.

This is the opposite of the convention used in finance and general statistics, where P10 means the 10th percentile from below. Misreading the convention is a surprisingly frequent source of confusion.

Swanson's Mean is an approximation of the distribution mean:

Mean ≈ 0.3 × P10 + 0.4 × P50 + 0.3 × P90

This weighted average is surprisingly accurate for distributions common in reserves estimation (lognormal, beta, triangular). Hurst, Brown, and Swanson (2000) demonstrated accuracy within 5% for moderate skewness.

When to use Swanson's Mean versus the arithmetic mean from simulation? Swanson's serves as a back-of-envelope check — if it diverges substantially from the simulation mean, that's a warning sign about extreme distribution shape that warrants review.

An AI Agentic Layer for the Reservoir Engineer

Monte Carlo simulation produces a lot of output — distributions, summary statistics, tornado charts, sensitivity analyses. Correct interpretation requires experience and broad context. This is where agentic AI becomes a powerful tool — not to replace the engineer, but to accelerate iteration and democratize access to senior-level interpretation patterns.

MCVR includes an AI layer powered by Claude with several capabilities:

Passive Summary: Once simulation completes, the AI auto-generates a one-paragraph executive summary — interpreting the P50, naming the top two drivers from the tornado, and providing a sanity check against typical analog fields. No user trigger needed — it appears automatically below the results.

Interactive Q&A: Users can ask in natural language — "why doesn't porosity make the top 3 drivers even though its range is fairly wide?" or "compare these results with a typical shallow gas analog." The AI has access to the input table, sample data, and results through tool calls.

Distribution Fitting Suggestion: Paste analog data from offset wells or analog fields, and the AI suggests the best-fit distribution with goodness-of-fit metrics (Kolmogorov-Smirnov or Anderson-Darling). Useful when a geologist has a porosity database from many wells but isn't sure which distribution form is appropriate.

The distinction between agentic and plain chatbot: the AI can act — calling tools like get_samples(variable), fit_distribution(data), or recompute_with_overrides(changes). Not just answering from training knowledge, but computing on the user's actual data.

MCVR: Launch the Tool

MCVR (Monte Carlo Volumetric Reserves) is now live as a standalone tool at rfourenergy.com/mcvr. Following the MABAL pattern — single HTML file, 100% client-side, zero-knowledge architecture. Your data never leaves the browser. No server-side storage, no uploads, no login walls.

Feature roadmap:

V1 MVP (target launch within weeks): Gas + Oil mode, multi-zone with independent sampling, five distributions (Normal, Lognormal, Triangular, Uniform, Deterministic), histogram + CDF + summary statistics, Excel template I/O, AI passive summary.
V2: Inter-zone correlation (Iman-Conover rank correlation), tornado chart with Spearman rank, interactive AI chat with tool calls.
V3: Distribution fitting from pasted analog data, SI unit support, custom correlation matrix editor.
V4 polish: PDF report export, URL-encoded scenario sharing, truncated distributions for bounded variables.

The end goal: a tool that reservoir engineers and geologists worldwide can use from a laptop or tablet, without convoluted Excel macros, without Crystal Ball or @RISK licenses, without data leak risk. Open-knowledge tools for everyone serious about reserves estimation.

Closing: Probability is the Language of Modern Reserves

The oil and gas industry is maturing. The large, easy assets were found decades ago. What remains is marginal fields, unconventional plays, tight reservoirs, and mature assets with high uncertainty. In this context, the ability to quantify and communicate uncertainty is no longer nice-to-have — it's the core skill that separates modern engineers from those stuck in 1980s workflows.

Deterministic estimation offers an illusion of precision that doesn't actually exist. Probabilistic estimation is more honest — acknowledging the range of possibilities, giving stakeholders a basis for informed decision making, and enabling capital allocation based on risk-adjusted expected value rather than hopeful single-point estimates.

Monte Carlo simulation is a tool. Just as the P/Z plot is a tool for material balance, or Arps decline is a tool for DCA. The tool doesn't replace engineering judgment — it strengthens and externalizes that judgment within a transparent, auditable mathematical framework. When an engineer states "P10 reserves are 30.8 BSCF with Area as the top driver," that's a claim that can be reviewed, challenged, and improved with new data — something that is difficult to do with a single deterministic number.

For RFour Energy, MCVR is a natural addition to the existing tool suite: GWIS (gas well surveillance), MABAL (material balance), and soon GOWIS (gas + oil surveillance). The consistent goal across all products: practical, modern, accessible petroleum engineering tools — deployed to the browser, without compromising privacy, without a steep learning curve.

References
Cronquist, C. (2001). Estimation and Classification of Reserves of Crude Oil, Natural Gas, and Condensate. SPE.
SPE PRMS (2018). Petroleum Resources Management System.
Capen, E. (1976). The Difficulty of Assessing Uncertainty. JPT, August.
Murtha, J. A. (1997). Monte Carlo Simulation: Its Status and Future. JPT, April.
Iman, R. L. & Conover, W. J. (1982). A Distribution-Free Approach to Inducing Rank Correlation Among Input Variables. Communications in Statistics.
Hurst, A., Brown, G. C., Swanson, R. I. (2000). Swanson's 30-40-30 Rule. AAPG Bulletin.

Frequently asked questions

What is Monte Carlo volumetric estimation?

A probabilistic method that samples distributions of volumetric inputs — area, thickness, porosity, saturation, recovery factor — to produce a full reserves distribution.

What are P90, P50, and P10?

Percentiles of the reserves distribution: P90 is the conservative estimate, P50 the median, and P10 the optimistic estimate.

Why use probabilistic instead of deterministic volumetrics?

Because a single deterministic number hides uncertainty; a distribution communicates the range and likelihood of outcomes for better decisions.

What inputs drive volumetric uncertainty?

Reservoir area and thickness, net-to-gross, porosity, hydrocarbon saturation, formation volume factor, and recovery factor.