Why We Designed Our Systems to Be Human-in-the-Loop

There is a version of the pitch that goes like this: build an AI system smart enough, train it long enough, and eventually you can remove the human from the loop entirely. Full autonomy. The system decides, the system acts, the human goes home.

We rejected this framing early. Not because we could not build autonomous systems, but because full autonomy is the wrong goal for the environments we work in. Container terminals are high-stakes, operationally dense, and shaped by human expertise. Once we built StowAI (our RL agent for vessel stowage planning, now deployed in production), we began developing StackAI (yard placement) and JobAI (job dispatching). We designed all of them to keep human operators in the loop. This is a deliberate engineering decision that has made our systems better.

Full autonomy is the wrong target

The argument for full autonomy usually rests on two assumptions: that human decision-making is a bottleneck, and that algorithms can eventually learn everything the human knows. Both assumptions are wrong in the context of container terminal operations.

Terminal planners and crane operators are not bottlenecks. They are integrators. A planner synthesises information from sources that no single system captures: the vessel officer's preference for a particular loading sequence, the fact that a yard block is partially flooded after last night's rain, the informal agreement with a trucking company about gate slot timing. This knowledge is real, operationally significant, and almost entirely absent from any dataset we could train on.

The second assumption, that the algorithm will eventually learn it all, ignores the nature of the knowledge itself. Some operational expertise is not merely unrecorded. It is contextual in a way that makes it resistant to generalisation. A planner's decision to hold a particular container in a suboptimal position because they know that a priority vessel is arriving tomorrow is not a pattern in the data. It is judgment informed by relationships, communication, and situational awareness that exists outside the system entirely.

Building toward full autonomy means building toward a system that discards this knowledge. We chose to build toward a system that uses it.

Operator knowledge as a system input

When we designed our architecture, we treated operator input not as an override mechanism but as a first-class input to the decision process. This is an important distinction.

An override mechanism implies the system made a decision and the human corrected it. That framing positions the operator as a fallback, a safety net for when the algorithm gets it wrong. It also creates an adversarial dynamic: the system proposes, the human disposes, and neither learns much from the interaction.

What we built instead is a system where operator knowledge feeds into the decision process before the agent acts. StowAI receives the loading list and vessel information, but the planner can set constraints and adjust priorities before the agent proposes a stowage plan. A planner who knows that a particular quay crane will be down for maintenance in two hours can communicate that, and the agent adjusts its plan accordingly. A yard supervisor who sees congestion building in a particular block can signal the system to redistribute incoming containers, not by manually assigning each one, but by adjusting the parameters that shape the agent's decisions.
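A minimal sketch of what "operator knowledge as a first-class input" can look like in code. Everything here is hypothetical and for illustration only: the class names, the fields, and the default KPI weights are assumptions, not the actual StowAI interface. The point is that the planner's knowledge is merged into the request before any plan is produced, rather than applied as a correction afterwards.

```python
from dataclasses import dataclass, field

# Hypothetical default weights; the real objectives are not specified here.
DEFAULT_WEIGHTS = {"crane_moves": 1.0, "rehandles": 1.0, "schedule_adherence": 1.0}


@dataclass
class PlannerInput:
    """Knowledge the planner supplies before the agent plans (assumed schema)."""
    unavailable_cranes: dict[str, int] = field(default_factory=dict)  # crane id -> minutes until outage
    congested_blocks: set[str] = field(default_factory=set)
    kpi_weights: dict[str, float] = field(default_factory=dict)


@dataclass
class PlanningRequest:
    """What the agent would receive: system data plus operator knowledge."""
    loading_list: list[str]
    vessel: str
    cranes: list[str]
    kpi_weights: dict[str, float]
    crane_cutoffs: dict[str, int]
    avoid_blocks: set[str]


def build_request(loading_list, vessel, cranes, planner: PlannerInput) -> PlanningRequest:
    # Reject constraints that refer to equipment the system does not know about.
    unknown = set(planner.unavailable_cranes) - set(cranes)
    if unknown:
        raise ValueError(f"unknown cranes: {sorted(unknown)}")
    # Planner priorities take precedence over the defaults.
    weights = {**DEFAULT_WEIGHTS, **planner.kpi_weights}
    return PlanningRequest(
        loading_list=list(loading_list),
        vessel=vessel,
        cranes=list(cranes),
        kpi_weights=weights,
        crane_cutoffs=dict(planner.unavailable_cranes),
        avoid_blocks=set(planner.congested_blocks),
    )
```

In this sketch, a planner who knows a crane goes down in two hours would pass `PlannerInput(unavailable_cranes={"QC2": 120})`, and the request the agent sees already contains that cutoff.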

Our system connects to the terminal's existing Terminal Operating System (TOS) through standardised API connections. The operators keep working with the tools they already know. The AI slots into their existing workflow rather than asking them to adopt a new one. Human-in-the-loop only works if the loop is natural, not bolted on.

This is human-centred design, augmented by AI. The operators have access to state information that our sensors and data feeds do not capture. Excluding that information to preserve the purity of an autonomous system would make the system worse, not better.

Override data as training signal

Even with good information channels, operators will sometimes disagree with the system's recommendations. In a fully autonomous system, these disagreements are failures to be minimised. In our architecture, they are data.

Every time a planner overrides a stowage recommendation or deviates from a suggested sequence, we capture the context: what StowAI recommended, what the operator chose instead, and what the operational conditions were at the time. This override data is one of the most valuable signals we have for improving the system.
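As a rough illustration, an override record along these lines could hold the three things described above: the recommendation, the operator's choice, and the conditions at the time. The field names and the JSON Lines serialisation are assumptions made for this sketch, not our production schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OverrideRecord:
    """One operator override with its context (hypothetical schema)."""
    vessel: str
    recommended: dict   # what the agent proposed
    chosen: dict        # what the operator did instead
    conditions: dict    # operational conditions at the time
    recorded_at: str    # ISO 8601 timestamp, UTC


def record_override(vessel, recommended, chosen, conditions, now=None):
    """Return an OverrideRecord, or None if the operator accepted the proposal."""
    if recommended == chosen:
        return None
    now = now or datetime.now(timezone.utc)
    return OverrideRecord(vessel, recommended, chosen, conditions, now.isoformat())


def to_jsonl(records):
    """Serialise records one JSON object per line for later analysis."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```

Storing the conditions alongside the decision is what makes later analysis possible: an override without its context says only that the operator disagreed, not why.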

Not every override represents superior judgment. Sometimes an operator overrides out of habit, defaulting to a familiar pattern when the agent's recommendation was actually better. But the aggregate signal is remarkably informative. Clusters of overrides in specific operational conditions reveal gaps in our digital twin. Consistent overrides by experienced planners in particular scenarios point to factors our reward function does not adequately weight. The pattern of when operators trust the system and when they do not is itself a map of where our model succeeds and where it falls short.
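One simple way to look for such clusters is to compute the override rate per operational condition and flag the conditions where it is unusually high. The sketch below assumes each decision has already been labelled with a condition; the labels, the minimum sample size, and the threshold are illustrative values, not figures from our deployments.

```python
from collections import defaultdict


def override_rates(decisions, min_samples=20):
    """Override rate per condition.

    decisions: iterable of (condition_label, was_overridden) pairs.
    Conditions with fewer than min_samples decisions are left out,
    since a handful of overrides is not a reliable signal.
    """
    totals = defaultdict(int)
    overridden = defaultdict(int)
    for condition, was_overridden in decisions:
        totals[condition] += 1
        if was_overridden:
            overridden[condition] += 1
    return {c: overridden[c] / n for c, n in totals.items() if n >= min_samples}


def flag_gaps(rates, threshold=0.3):
    """Conditions whose override rate meets the threshold, highest first."""
    flagged = [c for c, rate in rates.items() if rate >= threshold]
    return sorted(flagged, key=lambda c: rates[c], reverse=True)
```

A high rate does not say who is right. It marks a condition worth investigating, whether the cause turns out to be a gap in the simulation, an underweighted factor, or operator habit.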

We feed this signal back into our system. Scenarios where operators consistently deviate become test cases for the next iteration of the policy. The system does not just tolerate human input. It gets better because of it.

Trust scales with transparency

One of the patterns we have observed is that operator trust does not scale with system accuracy alone. A system can be right most of the time, and operators will still not trust it if they cannot understand why it made a particular recommendation.

This is rational behaviour, not resistance to change. A terminal planner who follows an opaque recommendation that leads to a vessel delay bears personal responsibility for the outcome. A planner who understands the reasoning behind a recommendation can evaluate it against their own experience, accept it with confidence, or reject it with cause. Transparency converts the operator from a passive recipient of instructions into an active participant in the decision.

This is why we train our agents against explainable KPIs. The objectives StowAI optimises for are defined in terms the operations team already understands: crane moves, rehandle rates, weight distribution, schedule adherence. When the agent proposes a stowage plan, the planner can see which KPIs drove the recommendation and how alternatives compare on the metrics they care about. They do not need to understand the algorithm itself. They need to understand the trade-offs, expressed in their own language.
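To make the idea concrete, here is a small sketch of a per-KPI comparison between two candidate plans. It assumes, purely for illustration, that a plan is scored as a weighted sum of KPI values where lower is better. The KPI names, weights, and scoring form are hypothetical and are not a description of how StowAI computes its objective.

```python
def kpi_breakdown(plan_kpis, weights):
    """Per-KPI weighted cost contributions and their total (lower is better)."""
    contributions = {k: weights[k] * plan_kpis[k] for k in weights}
    return contributions, sum(contributions.values())


def compare_plans(plan_a, plan_b, weights):
    """Totals for both plans plus the per-KPI difference (b minus a)."""
    contrib_a, total_a = kpi_breakdown(plan_a, weights)
    contrib_b, total_b = kpi_breakdown(plan_b, weights)
    deltas = {k: contrib_b[k] - contrib_a[k] for k in weights}
    return total_a, total_b, deltas


# Illustrative numbers only.
weights = {"crane_moves": 1.0, "rehandles": 5.0, "delay_minutes": 2.0}
plan_a = {"crane_moves": 400, "rehandles": 12, "delay_minutes": 10}
plan_b = {"crane_moves": 420, "rehandles": 6, "delay_minutes": 10}

total_a, total_b, deltas = compare_plans(plan_a, plan_b, weights)
print(total_a, total_b, deltas)
```

With these made-up numbers, plan B costs 20 more in crane moves but saves 30 in rehandles, so it comes out ahead overall. That is the kind of trade-off a planner can judge without knowing anything about the policy that produced the plan.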

This transparency has a compounding effect. When operators can see why the system recommends what it does, they develop trust. They learn in which situations the agent tends to be right and in which situations their own judgment is more reliable. Over time, this produces a human-machine collaboration that is genuinely better than either operating alone. The operator focuses their attention on the decisions where their contextual knowledge matters most, and defers to the system on the decisions where computational analysis adds the most value.

How we actually deploy

The human-in-the-loop philosophy is not just about the running system. It shapes how we get there.

Our deployment follows a staged process. First, we build a digital twin of the terminal. Our RL agents train inside this closed-loop simulation, learning from millions of scenarios rather than from historical data. The agent then arrives at the terminal with broad exposure both to situations that are common in real life and to situations that might never have occurred. The operator's role during ramp-up is to catch the gaps between what the digital twin simulated and what the terminal actually does.

After training, we validate the agent's policies against the terminal's real operational data. Then we deploy with guardrails. The agent runs on a pilot basis, typically on a subset of vessels, under active operator supervision. It proposes stowage plans and flags anomalies for human review. Clear intervention paths mean the planner can override at any point, and those overrides feed back into the next training cycle.

This staged process (digital twin, simulation training, validation, supervised pilot, guarded deployment) is the mechanism through which trust gets built. Each stage gives the operator evidence about what the system can and cannot do, in terms they can verify against their own experience.

As the system demonstrates reliability within its initial scope, operators expand the boundaries: more vessel types, fewer manual reviews of routine plans. The system does not "earn more autonomy" in the sense of removing the human. The human redirects their judgment to where it adds the most value.

Why this makes better systems

The pragmatic argument for human-in-the-loop design is simple: it produces better outcomes than either full autonomy or full manual control.

Human-in-the-loop does not mean the same thing for every agent. The level of human involvement should match the nature of the decision.

StowAI produces an end-to-end stowage plan. It evaluates thousands of permutations in seconds, tracking cascading effects across crane schedules and vessel stability. But the planner reviews the full plan before accepting it. Stowage decisions are high-stakes, tightly coupled, and hard to reverse once loading begins. The human's role here is to evaluate the complete proposal, check it against conditions the agent cannot observe, and approve or send it back for revision.

StackAI, which we are developing now, sits at the other end of the spectrum. It is designed to automate stacking decisions entirely during normal operations. The yard planner does not review every container placement. Instead, the system runs autonomously and alerts the planner when something needs attention: not enough reefer slots for the next vessel, a developing congestion pattern, an allocation that conflicts with a known constraint. The human intervenes on exceptions, not on routine. JobAI follows a similar model for job dispatching.

This is not a single design applied uniformly. It is a deliberate choice about where human judgment adds the most value. For stowage, the planner's review of the whole plan is worth the time. For stacking and dispatching, the volume of decisions is too high and the individual stakes too low for plan-by-plan review. The better use of the planner's attention is watching for the situations the agent flags, and catching the ones it does not.
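The difference between the two involvement levels can be expressed as a small routing rule. The sketch below is an assumption-laden illustration: the `ReviewMode` enum, the `Proposal` type, and the policy table are invented for this example, and only the mapping of agents to review styles comes from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReviewMode(Enum):
    FULL_REVIEW = "full_review"        # human approves every proposal
    EXCEPTION_ONLY = "exception_only"  # human sees only flagged proposals


# Mapping taken from the text; the table itself is a hypothetical construct.
REVIEW_POLICY = {
    "StowAI": ReviewMode.FULL_REVIEW,
    "StackAI": ReviewMode.EXCEPTION_ONLY,
    "JobAI": ReviewMode.EXCEPTION_ONLY,
}


@dataclass
class Proposal:
    agent: str
    summary: str
    alerts: list[str] = field(default_factory=list)  # e.g. "reefer slots short"


def needs_human(proposal: Proposal) -> bool:
    """Decide whether a proposal goes to the planner before it is executed."""
    mode = REVIEW_POLICY[proposal.agent]
    if mode is ReviewMode.FULL_REVIEW:
        return True
    # Exception-only agents run on their own unless they raised an alert.
    return bool(proposal.alerts)
```

Keeping the policy in one table means the level of involvement is an explicit, reviewable setting per agent rather than something buried in each agent's logic.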

The human brings awareness that no model we have built can replicate. They know which trucking company always shows up early. They know the vessel officer who will reject any plan that puts heavy containers in bay fourteen. They know that the yard crane in block C has been running slow all week and maintenance is not coming until Thursday. They bring judgment, accountability, and a willingness to adapt to situations that fall outside any training distribution.

Designing the system to use these capabilities at the right level of involvement is not a philosophical position. It is an engineering choice that produces better results.

Efficiency is the point

I want to be direct about something. These systems will make terminals significantly more efficient. That is the entire reason they exist. Fewer reshuffles, tighter crane schedules, better yard utilisation, faster vessel turnaround. The gains compound as the system's scope expands from a single berth to the full operation. This in turn frees planners to focus on other critical tasks in the terminal.

But here is what matters for how you build: the path to those efficiency gains runs through human-in-the-loop design, not around it. A fully autonomous system that operators do not trust does not get deployed. A system that gets deployed but ignores operator knowledge produces worse plans than one that incorporates it. The terminals that will capture the biggest efficiency gains are the ones where the technology and the operators work together well enough that the system's scope can keep expanding.

The question is whether you build systems that actually reach production and deliver on their potential, or systems that stall in pilot because nobody on the quay trusts them. Human-in-the-loop is how you get to the efficiency gains. It is not a concession that limits them.