There’s a moment in battery energy storage operations when the system has to decide: charge now at €43/MWh and bet that the evening peak will exceed €120/MWh, or hold capacity for the frequency containment reserve auction, where yesterday’s clearing price was €18.4/MW but volatility is spiking?
This is not a trivial optimisation problem. It’s a multi-objective decision under uncertainty with time-dependent constraints, imperfect forecasts, and opportunity costs that only become visible in retrospect.
It is, in other words, exactly the same problem that agentic AI systems face when orchestrating multiple models across competing tasks.
The Dispatch Problem
In battery storage aggregation — the kind we run at Sidechain Power — the core challenge isn’t the hardware. Lithium iron phosphate cells are commoditised. Inverters are reliable. Grid connections are well-understood. The hard part is the dispatch algorithm: when to charge, when to discharge, when to reserve capacity, and when to sit idle.
The inputs are noisy: weather forecasts that shift hourly, day-ahead prices that gap on political news, frequency containment reserve auctions where competitor behaviour is opaque. The commitments are irreversible: once you’ve dedicated capacity to FCR-D (frequency containment reserve for disturbances), you can’t arbitrage that same megawatt-hour on the spot market.
Sound familiar?
The Agent Orchestration Problem
In multi-agent AI systems, the orchestrator faces the same structural challenge. You have a pool of resources (model tokens, API calls, compute budget). You have competing objectives (answer quality, latency, cost). You have imperfect information (you don’t know which model will produce the best output until you’ve spent the tokens).
The Actor-Critic pattern in agent orchestration is structurally identical to what battery operators call “rolling intrinsic” — a dispatch strategy that continuously re-evaluates position based on updated information:
- Actor proposes an action (charge/discharge or model/prompt selection)
- Critic evaluates the expected value against alternatives
- System executes if the expected value exceeds a dynamic threshold
- Feedback updates the priors for the next decision cycle
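The cycle above can be sketched in a few lines of Python. The names (`dispatch_cycle`, `actor`, `critic`) and the toy prices are illustrative, not drawn from any real dispatch system:

```python
def dispatch_cycle(state, actor, critic, threshold):
    """One rolling-intrinsic decision cycle: propose, evaluate, commit or hold."""
    action = actor(state)                   # actor proposes charge/discharge or model selection
    expected_value = critic(action, state)  # critic scores it against the alternative
    if expected_value > threshold:          # dynamic threshold gates execution
        return action, expected_value       # commit
    return None, expected_value             # hold capacity, wait for better information

# Toy usage: discharge only if spot revenue clearly beats the reserve-market alternative.
actor = lambda state: "discharge"
critic = lambda action, state: state["spot_price"] - state["fcr_value"]
action, ev = dispatch_cycle(
    {"spot_price": 120.0, "fcr_value": 18.4}, actor, critic, threshold=50.0
)
# action == "discharge": the spot premium clears the threshold
```

The same skeleton works whether `state` holds electricity prices or a query and a token budget; only the actor and critic change.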
What Transfers
Three specific lessons from battery dispatch that directly improve agent architecture:
Opportunity cost awareness. Every token spent on one task is unavailable for another. Battery operators learn this viscerally — every MWh committed to arbitrage is capacity not earning ancillary service revenue. In agent systems, this maps to explicit budget allocation across subtasks with real-time rebalancing.
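One way to sketch that rebalancing, assuming a hypothetical `rebalance` helper that shifts remaining budget toward subtasks with higher observed value per token spent:

```python
def rebalance(budgets, spent, scores):
    """Redistribute the remaining token budget toward subtasks with
    higher observed value per token spent (opportunity-cost weighting)."""
    remaining = sum(budgets[k] - spent[k] for k in budgets)
    # Value per token so far; unexplored subtasks get a neutral weight of 1.0.
    weights = {k: (scores[k] / spent[k]) if spent[k] else 1.0 for k in budgets}
    total_weight = sum(weights.values())
    return {k: remaining * weights[k] / total_weight for k in budgets}

# Drafting has earned 8x the value per token, so it captures most of the
# remaining 1400-token budget on the next cycle.
new_budgets = rebalance(
    budgets={"retrieval": 1000, "drafting": 1000},
    spent={"retrieval": 400, "drafting": 200},
    scores={"retrieval": 0.2, "drafting": 0.8},
)
```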
Forecast degradation curves. Weather forecasts are most accurate 4–6 hours ahead and degrade rapidly beyond 48 hours. Battery dispatch strategies weight recent forecasts exponentially higher than distant ones. The same principle applies to agent planning: long-horizon plans should be held loosely, short-horizon actions committed to firmly.
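A minimal way to encode that degradation is a half-life model; the 24-hour half-life here is illustrative, not a measured value:

```python
def forecast_weight(horizon_hours, half_life_hours=24.0):
    """Weight on a forecast horizon_hours ahead: halves every half-life."""
    return 0.5 ** (horizon_hours / half_life_hours)

# Near-term forecasts dominate the decision; a 48-hour-ahead forecast
# carries a quarter of the weight of the current one.
near = forecast_weight(4)   # ~0.89
far = forecast_weight(48)   # 0.25
```

For an agent planner, `horizon_hours` becomes "steps until this part of the plan executes," and the weight controls how firmly the step is committed.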
The value of optionality. Sometimes the optimal battery action is to do nothing — to preserve the option to act later when information improves. In agent systems, this translates to lazy evaluation: don’t commit to a complex multi-step plan when a simpler approach might resolve the query, preserving budget for cases that genuinely need it.
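A sketch of that lazy escalation; `small_model`, `large_model`, and the `good_enough` check are hypothetical stand-ins for a cheap path, an expensive path, and a quality gate:

```python
def answer(query, cheap, expensive, good_enough):
    """Lazy escalation: exercise the expensive option only when the cheap path fails."""
    draft = cheap(query)
    if good_enough(draft):
        return draft            # budget preserved for queries that need it
    return expensive(query)     # option exercised

calls = []

def small_model(query):
    calls.append("small")
    return "direct answer"

def large_model(query):
    calls.append("large")
    return "multi-step answer"

result = answer("simple lookup", small_model, large_model, lambda d: len(d) > 5)
# The large model is never invoked for a query the small one resolves.
```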
The Convergence
This isn’t an analogy. It’s the same mathematics — stochastic optimal control, partially observable Markov decision processes, multi-armed bandit theory. The battery dispatch literature and the agent orchestration literature cite the same foundational papers without knowing it.
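UCB1, a foundational bandit rule, makes the point concretely: the selection logic reads the same whether the "arms" are revenue streams (arbitrage vs. FCR) or candidate models. A minimal sketch:

```python
import math

def ucb1_choice(counts, total_rewards, t):
    """UCB1: pick the arm maximizing mean reward plus an exploration bonus."""
    def score(i):
        if counts[i] == 0:
            return float("inf")  # every arm gets tried at least once
        mean = total_rewards[i] / counts[i]
        bonus = math.sqrt(2 * math.log(t) / counts[i])
        return mean + bonus
    return max(range(len(counts)), key=score)
```

Nothing in the function knows whether index 0 is a megawatt or a model; that domain-blindness is the convergence in miniature.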
The practitioners who will build the best systems in either domain are the ones who read across both.