AI and machine learning in upstream oil & gas

● Data Science & ML · June 10, 2026 · 19 min read

The upstream industry has always run on inference under uncertainty — estimating what lies kilometres underground from sparse, expensive, indirect measurements. That is precisely the kind of problem machine learning is built for. But the subsurface is also where naive machine learning fails most spectacularly: tiny datasets, brutal extrapolation, and physics that punishes a model for ignoring it. This is a systematic tour of where AI and ML genuinely add value across the upstream value chain, how the methods actually work, where they break, and why the field is converging on physics-informed approaches rather than pure black boxes.

Why upstream is both ideal and hostile for ML

Upstream generates enormous volumes of data: seismic surveys measured in terabytes, continuous sensor streams from drilling rigs and producing wells, decades of production history, and millions of feet of well logs. The promise is obvious — patterns too subtle or too high-dimensional for a human to see, surfaced automatically and at scale. Yet the same domain is unusually hostile to off-the-shelf machine learning. Labelled data is scarce and expensive (every label may cost a multi-million-dollar well). The systems are governed by hard physics — mass, momentum, and energy balances — that a correlation-only model will happily violate. And the cost of a confident wrong answer is measured in dry holes and abandoned facilities, not a mis-served advertisement. Everything that follows is shaped by that tension.

A practical precondition for any of this is a usable data foundation. The industry’s move toward standardized, vendor-neutral data platforms — the Open Subsurface Data Universe (OSDU) being the most prominent — exists because the single biggest blocker to upstream ML is not algorithms but siloed, inconsistent, poorly governed data. No model survives contact with a spreadsheet whose units nobody documented.

A map of the value chain

AI/ML is not one thing applied once; it is many techniques applied at distinct stages, each with different data, different stakes, and different maturity. The cleanest way to organize the field is by where in the asset lifecycle the model lives.

Figure 1. AI/ML across the upstream lifecycle. Early stages are data-poor and interpretive, so physics and domain priors dominate; late stages are sensor-rich time-series problems where conventional ML and predictive maintenance deliver fast, measurable returns.

The spectrum from black box to physics

Before the applications, one organizing idea earns its place above all others: every upstream model sits somewhere on a spectrum between a pure data-driven model that knows only correlations and a pure physics simulator that encodes the governing equations. Pure ML is fast and flexible but extrapolates dangerously and ignores conservation laws. Pure simulation is faithful but slow and hungry for parameters you cannot measure. The fertile middle is physics-informed machine learning — models that learn from data while being constrained to respect the physics. This is the single most important trend in technical upstream AI, and it is where the field is heading.

Figure 2. The defining trade-off. With abundant data and weak theory, lean left; with sparse data and strong governing equations (the usual subsurface case), lean right. Physics-informed ML deliberately occupies the middle, and it is why upstream AI has matured beyond pure pattern-matching.

Domain by domain

1 · Seismic and exploration

Seismic interpretation was among the first upstream domains transformed by deep learning, because a seismic volume is essentially a 3D image and convolutional neural networks (CNNs) excel at images. The landmark example is automated fault detection: where interpreters once hand-picked faults slice by slice, a 3D CNN trained on synthetic seismic now outputs a fault-probability volume in minutes (Wu et al.’s FaultSeg3D being the canonical reference). Related tasks include salt-body and channel segmentation, seismic facies classification, and noise attenuation. A deeper frontier is velocity inversion — recovering the subsurface velocity model from raw waveforms — where physics-informed networks encode the wave equation directly so the inversion respects wave physics rather than merely fitting amplitudes.
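Fault voxels are a small minority of a seismic volume. A segmentation network trained with plain cross-entropy can therefore score well by predicting "no fault" almost everywhere. One remedy is a class-balanced cross-entropy that up-weights the rare class, similar in spirit to the loss Wu et al. describe for FaultSeg3D. The sketch below is a minimal, framework-free version over flattened voxel labels. The function name, the choice of β as the fraction of non-fault voxels, and the 8-voxel patch are our assumptions. A real network would compute this on tensors inside a deep-learning framework.

```python
import math


def balanced_bce(labels, probs, eps=1e-7):
    """Class-balanced binary cross-entropy over flattened voxels (illustrative sketch).

    labels: 1 for fault, 0 for non-fault; probs: predicted fault probabilities.
    beta is the fraction of non-fault voxels, so the rare fault class gets the
    larger weight (beta) and the common background class gets (1 - beta).
    """
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    n = len(labels)
    beta = sum(1 for y in labels if y == 0) / n
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= beta * y * math.log(p) + (1.0 - beta) * (1 - y) * math.log(1.0 - p)
    return total / n


# Hypothetical 8-voxel patch containing a single fault voxel.
labels = [0, 0, 0, 1, 0, 0, 0, 0]
good = [0.05, 0.1, 0.05, 0.9, 0.1, 0.05, 0.1, 0.05]
bad = [0.05, 0.1, 0.05, 0.1, 0.1, 0.05, 0.1, 0.05]  # misses the fault
print(balanced_bce(labels, good), balanced_bce(labels, bad))
```

Missing the single fault voxel costs far more than it would under an unweighted loss, because that voxel carries weight β = 0.875 here.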

2 · Petrophysics and well logs

Logs are the highest-resolution direct window into the rock, and ML serves several roles. Log prediction reconstructs missing or degraded curves (synthesizing a sonic or a density log from the others). Lithofacies classification assigns each depth a rock type from its log responses. It is the textbook supervised-learning task in petrophysics, popularized by Hall's (2016) tutorial and the open machine-learning contest built on its dataset, which is still a teaching staple. ML also automates formation-top picking and feeds directly into rock typing, where clustering and classification group the reservoir into flow units. The recurring caution: logs from one field rarely transfer to another without recalibration, because the same tool reading means different things in different rocks.
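The shape of a lithofacies classifier is easy to sketch. The example below swaps in a k-nearest-neighbour classifier, which fits in a few lines, instead of the gradient-boosting or neural models listed as typical for petrophysics. It uses three logs: gamma ray, log10 of deep resistivity, and bulk density. The training rows are invented for illustration and are not drawn from any field or from the contest dataset. Note the z-score normalization computed from the training wells. That is exactly the step that has to be revisited, not blindly reused, when a model moves to a new field.

```python
import math
import statistics
from collections import Counter

# Hypothetical training samples:
# (gamma ray [API], log10 deep resistivity, bulk density [g/cc]) -> facies
TRAIN = [
    ((35.0, 1.3, 2.30), "sandstone"),
    ((40.0, 1.2, 2.32), "sandstone"),
    ((30.0, 1.5, 2.28), "sandstone"),
    ((110.0, 0.4, 2.55), "shale"),
    ((120.0, 0.3, 2.58), "shale"),
    ((105.0, 0.5, 2.52), "shale"),
    ((20.0, 2.2, 2.68), "limestone"),
    ((15.0, 2.4, 2.70), "limestone"),
    ((25.0, 2.0, 2.66), "limestone"),
]

# Per-log mean and standard deviation from the training data, so that no single
# log (gamma ray spans ~100 API units, density ~0.4 g/cc) dominates the distance.
_cols = list(zip(*(features for features, _ in TRAIN)))
MEANS = [statistics.fmean(c) for c in _cols]
STDS = [statistics.pstdev(c) or 1.0 for c in _cols]


def normalize(sample):
    return [(v - m) / s for v, m, s in zip(sample, MEANS, STDS)]


def classify(sample, k=3):
    """Majority vote among the k nearest training samples in normalized log space."""
    q = normalize(sample)
    ranked = sorted(TRAIN, key=lambda row: math.dist(q, normalize(row[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(classify((38.0, 1.25, 2.31)))  # a clean, resistive, low-density sample
```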

3 · Reservoir characterization and modeling

Full-physics reservoir simulation is accurate but expensive — a single run can take hours, and uncertainty studies need thousands. ML answers with surrogate (proxy) models: a fast statistical emulator trained on a limited set of full simulations that then predicts outcomes across the parameter space in milliseconds, enabling optimization and uncertainty quantification that would otherwise be intractable. ML also accelerates assisted history matching — tuning a model to reproduce observed production — and underpins data-driven reservoir modeling, where field behaviour is learned largely from data with physics as a guide rather than starting from a fully built geological model.
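A Gaussian process is one common choice of surrogate because it returns an uncertainty alongside each prediction. The sketch below fits a one-dimensional GP with a squared-exponential kernel to five runs of a stand-in simulator. Several elements are hypothetical: `expensive_simulator` is a cheap analytic function in place of an hours-long reservoir run, and `GPSurrogate` and `solve` are illustrative names. The kernel length scale and noise level are fixed by hand rather than fitted, and the dense linear solve is only sensible for a handful of training runs.

```python
import math


def expensive_simulator(x):
    """Hypothetical stand-in for a full-physics run (response vs. one normalized input)."""
    return math.sin(3.0 * x)


def rbf(a, b, length=0.3, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class GPSurrogate:
    """Zero-mean GP regression: mean = k*^T K^-1 y, var = k(x,x) - k*^T K^-1 k*."""

    def __init__(self, xs, ys, noise=1e-8):
        self.xs = list(xs)
        self.K = [[rbf(a, b) + (noise if i == j else 0.0)
                   for j, b in enumerate(self.xs)]
                  for i, a in enumerate(self.xs)]
        self.alpha = solve(self.K, list(ys))

    def predict(self, x):
        k = [rbf(x, xi) for xi in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve(self.K, k)
        var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v))
        return mean, max(var, 0.0)  # clip tiny negative round-off


train_x = [0.0, 0.25, 0.5, 0.75, 1.0]  # five "expensive" runs
gp = GPSurrogate(train_x, [expensive_simulator(x) for x in train_x])
mean, var = gp.predict(0.4)  # cheap query between runs
```

The predictive variance is near zero at the training runs and grows back toward the prior variance far from them. That growth is the surrogate's own warning that it is extrapolating.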

4 · Drilling

Drilling is a real-time, sensor-dense activity, which suits ML well. Models optimize rate of penetration (ROP) by recommending weight-on-bit and rotary speed; predict drilling dysfunctions such as stuck pipe, kicks, and washouts before they escalate; and support geosteering by interpreting logging-while-drilling data on the fly. Here the payoff is immediate and measurable — non-productive time avoided is money saved the same day — which is why drilling analytics has been one of the faster areas to reach production deployment.
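Before reaching for a learned model, a useful baseline for a dysfunction alarm is to flag any surface reading that departs sharply from its own recent history. The sketch below applies a trailing-window z-score to a hypothetical standpipe-pressure trace. The function name, window length, threshold, and data are illustrative assumptions. A production system would combine several channels (torque, hook load, flow in and out) rather than rely on one fixed rule.

```python
import statistics


def rolling_zscore_alarms(series, window=5, threshold=3.0, min_std=1e-6):
    """Return indices where a reading deviates from the trailing window by > threshold std devs."""
    alarms = []
    for i in range(window, len(series)):
        ref = series[i - window:i]  # trailing window only: no look-ahead
        mu = statistics.fmean(ref)
        sd = max(statistics.pstdev(ref), min_std)  # avoid division by zero on flat data
        if abs(series[i] - mu) / sd > threshold:
            alarms.append(i)
    return alarms


# Hypothetical standpipe pressure samples [psi] with an abrupt drop at index 8.
spp = [3000, 3005, 2998, 3002, 3001, 2999, 3003, 3000, 2700, 3001]
print(rolling_zscore_alarms(spp))  # → [8]
```

The reading right after the drop is not flagged, because the outlier now inflates the window's standard deviation. That masking effect is one reason real systems move to robust statistics or learned models.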

5 · Production and surveillance

This is where data is richest and ML is most operationally embedded. Production forecasting extends classical decline-curve analysis with sequence models (LSTMs and, increasingly, graph networks that capture well-to-well interference). Virtual flow metering infers rates from pressure and temperature when physical meters are absent or unreliable. And predictive maintenance — most famously electric submersible pump (ESP) failure prediction — flags equipment degradation from sensor trends so an intervention can be planned before an unplanned, production-killing failure. Surveillance dashboards increasingly fold these models in so that the wells needing attention rise automatically to the top of the queue.
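Classical decline-curve analysis is the baseline any ML forecaster has to beat. The sketch below implements the Arps rate equation, q(t) = q_i / (1 + b·D_i·t)^(1/b), with its exponential limit q_i·exp(−D_i·t) at b = 0. It also includes a log-linear least-squares fit for the exponential case. The function names and synthetic data are illustrative. Real workflows also fit b, typically with a nonlinear optimizer, and constrain it to a physically defensible range.

```python
import math


def arps_rate(qi, di, b, t):
    """Arps decline rate at time t (qi: initial rate, di: initial nominal decline per unit time)."""
    if b == 0:
        return qi * math.exp(-di * t)  # exponential decline
    return qi / (1.0 + b * di * t) ** (1.0 / b)  # hyperbolic (b = 1 is harmonic)


def fit_exponential(times, rates):
    """Least-squares fit of ln q = ln qi - D t; returns (qi, D)."""
    n = len(times)
    logs = [math.log(q) for q in rates]
    t_bar = sum(times) / n
    y_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
    slope = sxy / sxx
    return math.exp(y_bar - slope * t_bar), -slope


# Synthetic monthly rates from a known exponential decline (qi = 800, D = 0.1 per month).
months = list(range(12))
rates = [arps_rate(800.0, 0.1, 0.0, t) for t in months]
qi_fit, d_fit = fit_exponential(months, rates)
forecast_24 = arps_rate(qi_fit, d_fit, 0.0, 24)
```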

Domain	Representative task	Typical technique	Output
Seismic	Fault / salt detection	3D CNN (U-Net family)	Probability volume
Seismic	Velocity inversion	Physics-informed NN / FWI	Velocity model
Petrophysics	Facies / log prediction	Gradient boosting, CNN, RNN	Classified / synthesized curve
Reservoir	Surrogate modeling	Neural net / Gaussian process	Fast forecast emulator
Drilling	ROP / dysfunction	Tree ensembles, time-series NN	Recommendation / alarm
Production	Forecast / interference	LSTM, graph neural network	Rate & EUR forecast
Facilities	ESP / equipment failure	Anomaly detection, survival models	Time-to-failure / alert

The physics-informed turn

Why has the industry converged on physics-informed methods? Because a model that scores well on a random test split can still produce physically absurd results — negative saturations, mass that appears from nowhere, pressure responses that violate diffusion. The fix is to make the physics part of the training objective. A physics-informed neural network (PINN) minimizes a composite loss: a data term that fits observations plus a physics term that penalizes violations of the governing partial differential equation, evaluated by automatic differentiation of the network itself.

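Physics-informed composite loss:

$$\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda \, \mathcal{L}_{\text{physics}}$$

Here $\mathcal{L}_{\text{data}}$ is the misfit to measurements, $\mathcal{L}_{\text{physics}}$ is the residual of the governing PDE (a mass, momentum, or energy balance), and λ balances trusting the data against trusting the physics.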
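To make the composite loss concrete, the sketch below evaluates L = L_data + λ · L_physics for a one-dimensional pressure-diffusion equation, ∂p/∂t = η ∂²p/∂x², which we assume as the governing PDE. Two substitutions keep it short and dependency-free, and both are assumptions. First, a plain Python function stands in for the neural network. Second, central finite differences replace the automatic differentiation a real PINN would use. There is no training loop. The point is only to show that a field satisfying the physics drives the physics term to numerical zero, while one that ignores it does not.

```python
import math

ETA = 0.1  # hypothetical (dimensionless) hydraulic diffusivity


def exact(x, t):
    """Analytic solution of dp/dt = ETA * d2p/dx2 with p(0,t) = p(1,t) = 0."""
    return math.exp(-ETA * math.pi ** 2 * t) * math.sin(math.pi * x)


def steady_guess(x, t):
    """A physically wrong candidate: right shape, but ignores pressure decay in time."""
    return math.sin(math.pi * x)


def pde_residual(p, x, t, h=1e-3):
    """dp/dt - ETA * d2p/dx2 by central differences (a stand-in for autodiff)."""
    dp_dt = (p(x, t + h) - p(x, t - h)) / (2.0 * h)
    d2p_dx2 = (p(x + h, t) - 2.0 * p(x, t) + p(x - h, t)) / h ** 2
    return dp_dt - ETA * d2p_dx2


def composite_loss(p, observations, collocation, lam=1.0):
    """Return (L, L_data, L_physics) with L = L_data + lam * L_physics (mean squared terms)."""
    l_data = sum((p(x, t) - obs) ** 2 for x, t, obs in observations) / len(observations)
    l_phys = sum(pde_residual(p, x, t) ** 2 for x, t in collocation) / len(collocation)
    return l_data + lam * l_phys, l_data, l_phys


# Sparse "measurements" (generated from the exact solution) and denser collocation points.
obs = [(x, t, exact(x, t)) for x, t in [(0.25, 0.2), (0.5, 0.5), (0.75, 1.0)]]
colloc = [(i / 10, t) for i in range(1, 10) for t in (0.1, 0.5, 1.0)]

print(composite_loss(exact, obs, colloc))         # all terms ~0
print(composite_loss(steady_guess, obs, colloc))  # physics term clearly > 0
```

In a real PINN, λ and the placement of collocation points are tuning choices in their own right. Too small a λ lets the network fit noisy data at the expense of the physics, and too large a λ lets it ignore the measurements.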

Figure 3. How a physics-informed network learns. The network predicts a field; one loss term pulls it toward the measured data, a second penalizes any violation of the governing equation. Training balances the two, so the result fits observations and stays physically consistent, which is the key to extrapolating safely from sparse subsurface data.

The workflow nobody photographs

The glamour is in the model; the value is in the workflow around it. A deployable upstream ML system is a loop, not a one-off script: assemble and clean data, engineer features with domain meaning, split correctly, train, validate, deploy, and — critically — monitor for drift as the field changes, then retrain. Skip the loop and a model that dazzled in a notebook quietly rots in production as new wells, new operating conditions, and instrument changes pull the live data away from what it was trained on.

Figure 4. The operational lifecycle. The two steps that separate a real system from a demo are domain-aware feature engineering and validation split by well, plus the feedback loop that detects drift and triggers retraining as the asset evolves.
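Drift monitoring can start simply: compare the distribution of each live input against the distribution the model was trained on. The sketch below computes a population stability index (PSI) over equal-width bins set from the training data. PSI is one common drift statistic among several; a two-sample Kolmogorov–Smirnov test is another. The bin count, the small floor that avoids log(0), and the example data are our assumptions. A widely used rule of thumb, borrowed from credit-risk model monitoring, reads values above about 0.25 as a significant shift worth a retraining review.

```python
import math


def psi(reference, current, n_bins=10, floor=1e-4):
    """Population stability index of `current` against `reference` (equal-width bins from reference)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins if hi > lo else 1.0

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # out-of-range values go to the end bins
        return [max(c / len(values), floor) for c in counts]  # floor avoids log(0)

    expected = fractions(reference)
    actual = fractions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Hypothetical normalized inputs: training-period values vs. a later, shifted period.
train_values = [i / 100 for i in range(100)]
live_values = [0.5 + i / 200 for i in range(100)]
print(psi(train_values, train_values))  # 0.0: no drift
print(psi(train_values, live_values))   # well above 0.25: the distribution has shifted
```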

Subsurface-specific pitfalls

Generic ML advice is necessary but not sufficient here; the subsurface adds failure modes of its own.

Data leakage through spatial correlation. Splitting train and test by random row lets samples from the same well sit on both sides, so the model memorizes the well rather than learning the physics — and scores beautifully until it meets a new well. Always split by well, field, or time.
Tiny labelled datasets. Tens of wells, not millions of rows of independent examples. This is why synthetic training data (for seismic CNNs) and physics constraints (for everything) matter so much.
Extrapolation, not interpolation. The questions that matter — undrilled locations, future pressures — lie outside the training range, exactly where pure ML is least trustworthy.
Non-stationarity. Reservoirs deplete and operating conditions change, so the relationship the model learned last year may not hold this year.
Overfitting where a classical method suffices. An ML decline curve that bends to honour every noisy point forecasts worse than a disciplined Arps fit. Flexibility is not free.
Interpretability and trust. A black-box recommendation that an engineer cannot interrogate will not (and should not) drive a multi-million-dollar decision.

Figure 5. The most common upstream ML mistake. Random row splitting (left) scatters each well across train and test, so the model is graded partly on data it has effectively seen, which inflates scores. Holding out entire wells (right) is the only honest test of whether a model generalizes to a new location.
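Splitting by well takes only a few lines. The sketch below is a minimal grouped k-fold split that assigns whole wells to folds round-robin, so no well ever contributes samples to both sides of a split. The function name and well labels are hypothetical. If scikit-learn is available, `GroupKFold` in `sklearn.model_selection` implements the same idea and is the better choice in practice.

```python
def group_kfold(well_ids, n_folds=3):
    """Yield (train_idx, test_idx) pairs in which every well falls entirely on one side."""
    wells = sorted(set(well_ids))
    if n_folds > len(wells):
        raise ValueError("need at least as many wells as folds")
    fold_of = {well: i % n_folds for i, well in enumerate(wells)}  # round-robin by well
    for k in range(n_folds):
        train = [i for i, w in enumerate(well_ids) if fold_of[w] != k]
        test = [i for i, w in enumerate(well_ids) if fold_of[w] == k]
        yield train, test


# Hypothetical depth samples tagged with the well they came from.
samples = ["W1", "W1", "W1", "W2", "W2", "W3", "W3", "W3", "W4", "W5", "W5", "W6"]
for train, test in group_kfold(samples, n_folds=3):
    held_out = sorted({samples[i] for i in test})
    print("held-out wells:", held_out)
```

For time-dependent problems such as production forecasting, the same logic applies along the time axis: train on earlier periods and test on later ones.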

In the subsurface, a model that respects physics and is tested on wells it has never seen beats a higher-scoring black box every time the answer actually matters.

The generative and agentic frontier

The newest wave extends beyond prediction into generation and autonomy. Generative models now synthesize plausible geological realizations and fill data gaps, supporting uncertainty studies that need many equiprobable scenarios. Foundation and large language models are being pointed at the mountain of unstructured upstream knowledge — well reports, end-of-well summaries, historical interpretations — to retrieve and synthesize what used to take an engineer days to dig out. And AI agents operating over unified data platforms can chain tasks: pull a well’s data, run a diagnostic, draft a summary, flag an anomaly. The promise is real, but so are the cautions that run through this whole article — physical consistency, honest validation, interpretability, and a human in the loop for any decision that spends real capital. The trajectory is clear: not AI replacing subsurface judgment, but AI compressing the distance between a question and a defensible, physics-consistent answer.

Closing

AI and ML are now woven through the upstream value chain — CNNs reading seismic, classifiers typing rock, surrogates accelerating simulation, sequence models forecasting production, and anomaly detectors saving pumps. What separates durable value from hype is discipline that the subsurface demands more than most domains: respect the physics, guard against leakage, test on unseen wells, keep the human in the loop, and never confuse a good test score with a good decision. Get those right, and machine learning becomes what it should be in upstream — not a replacement for reservoir engineering and geoscience, but a powerful amplifier of both.

References
LeCun, Y., Bengio, Y., Hinton, G. (2015). Deep Learning. Nature, 521, 436–444.
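Raissi, M., Perdikaris, P., Karniadakis, G. E. (2019). Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. Journal of Computational Physics, 378, 686–707.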
Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., Yang, L. (2021). Physics-Informed Machine Learning. Nature Reviews Physics, 3, 422–440.
Wu, X., Liang, L., Shi, Y., Fomel, S. (2019). FaultSeg3D: Using Synthetic Datasets to Train an End-to-End CNN for 3D Seismic Fault Segmentation. Geophysics, 84(3), IM35–IM45.
Bergen, K. J., Johnson, P. A., de Hoop, M. V., Beroza, G. C. (2019). Machine Learning for Data-Driven Discovery in Solid Earth Geoscience. Science, 363(6433).
Hall, B. (2016). Facies Classification Using Machine Learning. The Leading Edge, 35(10), 906–909.
Mohaghegh, S. D. (2017). Data-Driven Reservoir Modeling. Society of Petroleum Engineers.
Latrach, A., et al. (2024). A Critical Review of Physics-Informed Machine Learning Applications in Subsurface Energy Systems. (Preprint / review).
The Open Group. Open Subsurface Data Universe (OSDU) Data Platform.

Frequently asked questions

How is machine learning used in upstream oil and gas?

It is applied across the value chain, to tasks such as log prediction, facies classification, production forecasting, ESP failure prediction, and reservoir-property inference.

What are the main pitfalls of machine learning in the subsurface?

Data leakage, very small datasets, and extrapolation beyond training conditions — which can make a model look accurate in testing yet fail in practice.

What is physics-informed machine learning?

An approach that embeds physical laws or constraints into the model so predictions stay consistent with reservoir physics, improving reliability on limited data.

Does machine learning replace reservoir engineering?

No. It augments it; the most reliable workflows pair machine-learning pattern-finding with physical understanding and proper validation.