Why Start with Compute Governance, Not API Design
Compute and governance boundaries are the true foundation of AI-native infrastructure architecture.
The previous chapter presented a "Three Planes + One Closed Loop" reference architecture. This chapter focuses on a core CTO/CEO-level question:
How should AI-native infrastructure be layered? What belongs in the "control plane" of APIs/Agents, what belongs in the "execution plane" of runtime, and what must be pushed down to the "governance plane" (compute and economic constraints)?
This question is critical because, over the past year, many platform companies "pivoting to AI" have fallen into a common trap: treating AI as a change in API morphology rather than a change in system constraints. When your system shifts from "serving requests" to "governing model behavior" (multi-step Agent actions with side effects), what truly determines system boundaries is often not the elegance of API design, but whether compute, context, and economic constraints are institutionalized as enforceable governance boundaries.
The core argument of this chapter can be summarized as:
AI-native infrastructure must be designed starting from "Consequence" rather than stacking capabilities from "Intent"; the control plane is responsible for expressing intent, but the governance plane is responsible for bounding consequences.
The Purpose of Layering: Engineering the Binding Between "Intent" and "Resource Consequences"
In AI-native infrastructure, mechanisms like MCP, Agents, and Tool Calling enhance system capabilities while also introducing higher risks. These risks are not abstract "uncontrollability," but the concrete engineering problem of "unbudgetable consequences":
- Path explosion in agent behavior, long contexts, and multi-round reasoning produce long-tail resource consumption;
- The same "intent" can lead to orders-of-magnitude differences in tokens, GPU time, and network/storage pressure;
- Without a governance closed loop, systems drift toward "cost and risk runaway" even as they become "more capable."
Therefore, the fundamental purpose of layering is not abstract aesthetics, but achieving a hard constraint goal:
Ensure each layer can translate upper-layer "intent" into executable plans that produce measurable, attributable, and constrainable resource consequences.
In other words, layering is not about making architecture diagrams clearer, but about encoding "who expresses intent, who executes, and who bears consequences" into system structure.
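The binding between intent and consequence can be made concrete with a minimal sketch. The `TokenBudget` and `plan_intent` names below are hypothetical; a real platform would back them with metering and quota services rather than in-memory counters.

```python
# A minimal sketch of "intent -> governable execution plan": an intent is
# only admitted if its estimated consequence fits the remaining budget,
# and actual consumption is metered so it stays attributable.
from dataclasses import dataclass

@dataclass
class TokenBudget:
    """Per-tenant budget that turns intent into a bounded consequence."""
    limit_tokens: int
    used_tokens: int = 0

    def admit(self, estimated_tokens: int) -> bool:
        # Admission control: check the estimated consequence before execution.
        return self.used_tokens + estimated_tokens <= self.limit_tokens

    def charge(self, actual_tokens: int) -> None:
        # Metering: record actual, attributable consumption.
        self.used_tokens += actual_tokens

def plan_intent(budget: TokenBudget, estimated_tokens: int) -> str:
    if not budget.admit(estimated_tokens):
        return "rejected: over budget"   # consequence bounded before execution
    budget.charge(estimated_tokens)
    return "admitted"

budget = TokenBudget(limit_tokens=10_000)
print(plan_intent(budget, 6_000))  # admitted
print(plan_intent(budget, 6_000))  # rejected: over budget
```

The point is structural: the budget check lives below the layer that expresses intent, so no amount of clever orchestration can bypass it.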
AI-Native Infrastructure Five-Layer Structure and "Three Planes" Mapping
To help understand the layering logic, the diagram below refines the "Three Planes" architecture from the previous chapter, proposing a more actionable "five-layer structure":
- Top two layers = Intent Plane
- Middle two layers = Execution Plane
- Bottom layer = Governance Plane
Below is a detailed expansion of the five-layer architecture, showing the primary responsibilities and typical capabilities of each layer:
It is important to note that MCP belongs to Layer 4 (Intent and Orchestration Layer), not Layer 1. The reason is that MCP primarily defines "how capabilities are exposed to models/Agents and how they are invoked," addressing control-plane consistency and composability, but does not directly take responsibility for "how the resource consequences of capability invocations are metered, constrained, and attributed."
MCP/Agent Is the "New Control Plane," But Must Be Constrained by the Governance Layer
MCP/Agent is called the "new control plane" because it moves system "decisions" from static code into dynamic processes:
- "Tool catalogs + schemas + invocations" form a composable capability surface;
- Agents complete tasks by selecting tools, invoking them, and iterating on reasoning;
- "Policy" no longer lives only in code branches but is expressed as routing, priorities, budgets, and compliance intent.
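The "tool catalog + schema + invocation" capability surface can be sketched as follows. This is an illustrative shape, not the actual MCP wire protocol; the `register_tool`/`invoke` names and the `search_docs` tool are hypothetical.

```python
# An illustrative sketch of a tool catalog: each tool exposes a name, a
# schema, and a handler, forming the composable capability surface that an
# Agent selects from at runtime.
from typing import Any, Callable

TOOLS: dict[str, dict[str, Any]] = {}

def register_tool(name: str, schema: dict, handler: Callable[..., Any]) -> None:
    TOOLS[name] = {"schema": schema, "handler": handler}

def invoke(name: str, **kwargs: Any) -> Any:
    tool = TOOLS[name]
    # Minimal schema check: require every declared parameter.
    missing = set(tool["schema"]["required"]) - kwargs.keys()
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return tool["handler"](**kwargs)

register_tool(
    "search_docs",
    schema={"required": ["query"]},
    handler=lambda query: f"results for {query!r}",
)
print(invoke("search_docs", query="kv cache"))
```

Note what this surface does *not* do: it validates the shape of an invocation, but says nothing about what the invocation will cost, which is exactly why it cannot be the governance layer.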
However, it is crucial to emphasize an infrastructure stance, which is also the foundation of this chapter:
MCP/Agent can express intent, but the key to being AI-native is that intent must be translated into governable execution plans, then metered and constrained within economically viable boundaries.
This statement aims to correct two common misconceptions:
- The control plane is not the starting point: treating MCP/Agent as "the entry point for AI platform upgrades" easily leads systems down a "capability-first" path;
- The governance plane is the baseline: when compute and tokens become capacity units, any unconstrained "intent expression" leaks out as cost, latency, or risk.
Therefore, system layering should be clear: Layer 4 is responsible for "expression," Layers 1/2/3 are responsible for "fulfillment and bearing consequences," and the governance loop is responsible for "correction."
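The "correction" half of the loop can be sketched as a policy rewrite driven by metered consumption. The 90% threshold and the policy fields here are illustrative assumptions, not a prescribed mechanism.

```python
# A toy sketch of the governance loop's correction step: compare metered
# consumption against budget and rewrite policy accordingly (here, by
# demoting an over-consuming tenant to a cheaper model tier).
def correct_policy(used_tokens: int, budget_tokens: int, policy: dict) -> dict:
    utilization = used_tokens / budget_tokens
    if utilization > 0.9:
        # Over-consumption observed downstream rewrites upstream policy.
        return {**policy, "model_tier": "small", "priority": "low"}
    return policy

policy = {"model_tier": "large", "priority": "normal"}
print(correct_policy(95_000, 100_000, policy))
```

The essential property is that the correction consumes *measured consequences* (metered tokens), not declared intent, closing the loop between Layer 4 and Layers 1/2/3.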
"Context" Is Rising to a New Infrastructure Layer
In traditional cloud-native systems, request state is mostly short-lived and managed at the application layer. Infrastructure typically handles only "computation and networking" and does not need to understand the economic value of "request context."
AI-native infrastructure is different. Long contexts, multi-turn dialogue, and multi-agent reasoning mean inference state often survives across requests and directly determines throughput and cost. In particular, KV cache and context reuse are evolving from "performance optimization techniques" into "platform capacity structures."
This can be summarized as an infrastructure law:
When a state asset (context/state) becomes a determining variable of system cost and throughput, it rises from an application detail to an infrastructure layer.
This trend is already appearing in the industry: inference context and KV reuse are being explicitly elevated to "infrastructure layer" capabilities. Future expansion will include distributed KV, parameter caching, inference routing state, Agent memory, and a whole series of such "state assets."
The Foundation of AI-Native Infrastructure: Reference Designs and Delivery Systems
AI-native infrastructure is far more than "buying a few GPUs." Compared to traditional internet services, AI workloads have three characteristics that make the "foundation" more engineered and productized:
- Stronger topology dependencies: Network fabric, interconnects, storage tiers, and GPU affinity determine available throughput;
- Harder scarcity constraints: GPU and token throughput boundaries are less "elastic" than CPU/memory;
- Higher delivery complexity: Multi-cluster, multi-tenant, and multi-model/multi-framework coexistence means only "replicable delivery" can scale.
Therefore, AI Infra is not just a component list; it must include the system capabilities of "scalable delivery and repeatable operation":
Reference Designs (validated designs)
- Codify "correct topology and ratios" into reusable solutions.
Automated Delivery
- Institutionalize deployment, upgrade, scaling, rollback, and capacity planning.
Governance Implementation
- Make budgeting, isolation, metering, and auditing default capabilities rather than after-the-fact patches.
From a CTO/CEO perspective, this means: what you purchase is not "hardware" but a "delivery system for predictable capacity."
"Layered Responsibility Boundaries" from a CTO/CEO Perspective
To facilitate internal alignment on "who is responsible for what, and what is the cost of failure," the table below maps "technical layers" to "organizational responsibilities," avoiding the scenario where platform teams only build control planes while no one owns consequence boundaries.
| Layer | Typical Capabilities | Primary Owner (Recommended) | Cost of Failure |
|---|---|---|---|
| Layer 5 Business Interface | SLA, product experience, business goals | Product / Business | Customer experience and revenue impact |
| Layer 4 Intent/Orchestration (MCP/Agent) | Capability catalogs, workflow, policy expression | App / Platform / AI Eng | Behavior runaway, tool abuse |
| Layer 3 Execution (Runtime) | Serving, batching, routing, caching policies | AI Platform / Infra | Insufficient throughput, latency jitter |
| Layer 2 Context/State | KV/cache/context tier | Infra + AI Platform | Token cost spike, throughput collapse |
| Layer 1 Compute/Governance | Quotas, isolation, topology scheduling, metering | Infra / FinOps / SRE | Budget explosion, resource contention, incident spillover |
As you can see, the organizational challenge of AI-native is not "whether we have agents," but "whether inter-layer closed loops are established." When model-driven amplification of consequences occurs, organizations must institutionalize governance mechanisms as platform capabilities: executable budgets, explainable consequences, attributable anomalies, and rewritable policies. This is the true meaning of "starting from compute governance" rather than "starting from API design."
Conclusion
The layered design of AI-native infrastructure centers on engineering the binding between "intent" and "resource consequences." The control plane is responsible for expressing intent, while the governance plane is responsible for bounding consequences. Only by institutionalizing governance mechanisms as platform capabilities can we ensure cost, risk, and capacity remain controllable while capabilities grow. As context, state assets, and other new variables become infrastructure, AI Infra delivery systems will continue to evolve, becoming the foundation for sustainable enterprise innovation.