Migration Roadmap: From Cloud Native to AI Native
Migration is not "rebuilding the platform," but using governance loops and organizational contracts to transform uncertainty into controllable engineering capabilities.
The previous five chapters have established that AI-native infrastructure is Uncertainty-by-Default. Therefore, the architectural starting point must be compute governance loops, not "connect a model and call the migration complete." Otherwise, systems easily spiral out of control along three dimensions: cost (runaway spend), risk (unauthorized actions and side effects), and tail performance (P95/P99 and queue tail behavior).
This is why the FinOps Foundation emphasizes that when running AI/ML on Kubernetes, "elasticity" easily evolves into uncontrollable cost overflow. FinOps must be incorporated into architecture and organization upfront as a shared operating model, not bolted on as an after-the-fact reconciliation exercise.
This article presents an actionable migration roadmap, covering both technical evolution paths and organizational implementation approaches. You don't need to "rebuild an AI platform" all at once, but you must establish working governance loops at each stage: budget/admission, metering/attribution, sharing/isolation, topology/networking, and context assetization.
The North Star: From Platform Delivery to Governance Loops
The diagram below shows the migration path from bypass pilot to AI-first refactoring.
Cloud-native migration typically centers on "capability delivery": CI/CD, self-service platforms, service governance, and auto-scaling. Its default assumptions are that systems are deterministic, costs grow linearly with requests, and scaling does not significantly alter system boundaries.
AI-native migration must center on "governance loops," focusing on cost, risk, tail performance, and state assets. Its default assumptions are precisely the opposite: systems are inherently uncertain, and the "actions and consequences" of inference/agents drive costs and risks into nonlinear territory.
Elevating "Landing Zone" to North Star level here is not chasing trends; it naturally serves an organizational-level task: delineating responsibility boundaries between platform teams and workload teams. Major cloud providers universally use Landing Zones to host "shared governance baselines" (networking, identity, policies, auditing, quota/budget), while business teams iterate on applications within controlled boundaries. For AI, this boundary is the carrier of the governance loop.
Migration Prerequisites: Build Three Foundations First, Then Scale Applications
You can run PoCs and build applications in parallel, but if these three foundations are missing, any "application explosion" can easily turn into platform firefighting and financial disputes.
Foundation A: FinOps / Quotas as Control Plane
The first migration step is not "launch the first agent," but incorporating budgets, alerts, showback/chargeback, and quotas into the infrastructure control plane:
- Budgets and alerts are not just financial reports, but triggers for runtime policies (rate limiting, degradation, queuing, preemption).
- Showback/chargeback is not just accounting, but binding "cost consequences" to organizational decisions and product boundaries.
- Quotas are not static limits, but evolvable governance instruments (dynamic budgets and priorities by tenant/team/use-case).
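The first bullet can be made concrete: a budget reading drives a runtime policy decision, not just a report. The sketch below is illustrative only; the `BudgetStatus` shape, thresholds, and action names are assumptions, not any specific FinOps tool's API.

```python
from dataclasses import dataclass

# Hypothetical budget record; in practice this would come from the
# metering/showback system, not be constructed by hand.
@dataclass
class BudgetStatus:
    tenant: str
    spent: float    # month-to-date spend
    budget: float   # approved monthly budget

def runtime_action(status: BudgetStatus) -> str:
    """Map budget consumption to a runtime policy (admission, rate
    limiting, degradation, queuing), not merely an alert email."""
    ratio = status.spent / status.budget
    if ratio < 0.80:
        return "admit"        # normal admission
    if ratio < 0.95:
        return "rate-limit"   # throttle new inference requests
    if ratio < 1.00:
        return "degrade"      # smaller model / shorter context
    return "queue"            # hold until budget resets or is raised

print(runtime_action(BudgetStatus("team-a", spent=9_700, budget=10_000)))  # degrade
```

The exact thresholds matter less than the principle: the ledger and the admission path share one control plane, so overspend automatically changes runtime behavior.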
Foundation B: Resource Governance (GPU Sharing/Isolation and Orchestration Capabilities)
The "elasticity" of AI-native infrastructure is constrained by how scarce compute is governed. Treating GPUs as ordinary resources typically results in low utilization and uncontrolled contention. Therefore, you need viable combinations of sharing/isolation and orchestration capabilities:
- Sharing/partitioning: MIG/MPS/vGPU paths transform "exclusive" into "pooled."
- Scheduling upgrades: Introduce explicit modeling of topology, queues, fairness, preemption, and cost tiers.
- Orchestration loop: Solidify isolation, preemption, and priority policies into executable rules.
The key is not which partitioning technology you choose, but whether you can elevate GPUs from "machine assets" to first-class governance resources and incorporate them into budget and admission systems.
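"GPUs as first-class governance resources" reduces, at minimum, to quota-aware admission: a request must clear both the tenant's quota and the pool's capacity before it schedules. This is a deliberately simplified sketch; real deployments enforce this in the scheduler (e.g. Kubernetes quotas/DRA), not in an in-process dictionary.

```python
from dataclasses import dataclass, field

@dataclass
class GpuPool:
    """Pooled GPUs governed by per-tenant quotas (illustrative model)."""
    capacity: int
    quotas: dict = field(default_factory=dict)        # tenant -> max GPUs
    allocations: dict = field(default_factory=dict)   # tenant -> GPUs held

    def admit(self, tenant: str, request: int) -> bool:
        held = self.allocations.get(tenant, 0)
        used = sum(self.allocations.values())
        # Admission requires BOTH tenant-quota headroom and pool capacity.
        if held + request > self.quotas.get(tenant, 0):
            return False
        if used + request > self.capacity:
            return False
        self.allocations[tenant] = held + request
        return True

pool = GpuPool(capacity=8, quotas={"team-a": 4, "team-b": 6})
print(pool.admit("team-a", 4))   # True: within quota and capacity
print(pool.admit("team-a", 1))   # False: exceeds team-a's quota
print(pool.admit("team-b", 6))   # False: only 4 GPUs left in the pool
```

Once admission is expressed this way, budget tiers, preemption, and fairness become policies layered on the same check rather than ad-hoc spreadsheet allocation.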
Foundation C: Fabric as a First-Class Constraint (Network/Interconnect as First-Class Constraint)
Training and high-throughput inference are extremely sensitive to congestion, packet loss, and tail latency. Ignoring networking and topology leads to "seemingly sporadic but actually structural" problems:
- Training JCT (job completion time) is amplified by tail behavior, invalidating capacity planning;
- Inference P99 and queue tails are amplified, making SLOs difficult to honor.
Therefore, you need to build reusable AI-ready network baselines: capacity assumptions, lossless strategies, isolation-domain partitioning, and measurement/acceptance criteria. Networking is not "optimize later," but baseline engineering that must land in Days 31-60.
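"Measurement and acceptance criteria" means the fabric baseline is gated on the tail, not the mean. A minimal sketch of such an acceptance gate, assuming a nearest-rank percentile and an illustrative P99 budget:

```python
def percentile(samples, p):
    """Nearest-rank percentile; sufficient for an acceptance gate."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

def fabric_acceptance(latencies_us, p99_budget_us):
    """Pass/fail gate for an AI-ready network baseline: the P99,
    not the average, decides whether the fabric is accepted."""
    return percentile(latencies_us, 99) <= p99_budget_us

# Two stragglers are invisible in the median but dominate the tail.
samples = [10] * 98 + [500, 600]
print(percentile(samples, 50))                        # 10
print(fabric_acceptance(samples, p99_budget_us=50))   # False
```

This is exactly why "seemingly sporadic" problems are structural: a fabric that passes on averages can still fail every tail-sensitive workload placed on it.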
Migration Path Selection: Layered by Organizational Risk and Technical Debt
Migration isn't "pick one path and see it through," but mapping organizations with different risk appetites and debt structures to different starting approaches and exit criteria. Paths can advance in parallel, but each needs defined applicable conditions and exit criteria.
Path 1: Bypass Pilot / Skunkworks
Applicable when cloud-native platforms are running stably, but AI demand is just emerging, organizational uncertainty is high, and governance mechanisms are not yet mature.
The approach is establishing an "AI minimum closed-loop sandbox" alongside the existing platform. The goal is not "feature completeness," but "making the loop work":
- Independent GPU pool (or at least independent queue) + basic admission and budget
- Minimal token/GPU metering and attribution
- Controlled inference/agent entry points (max context / max steps / max tool calls)
- "Failure-acceptable" SLOs and cost caps (define boundaries first, then discuss experience)
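The "controlled entry point" bullet can be sketched as a guard object that enforces step, tool-call, and spend caps on every agent run; all limits, names, and the cost accounting below are illustrative assumptions, not a specific framework's API.

```python
class AgentBudgetExceeded(Exception):
    """Raised when a run exceeds a platform-enforced cap."""

class GuardedAgentRun:
    """Controlled agent entry point: hard caps on steps, tool calls,
    and spend are enforced by the platform, not left to the prompt."""
    def __init__(self, max_steps=10, max_tool_calls=5, max_cost_usd=1.0):
        self.max_steps, self.max_tool_calls = max_steps, max_tool_calls
        self.max_cost_usd = max_cost_usd
        self.steps = self.tool_calls = 0
        self.cost_usd = 0.0

    def record_step(self, cost_usd, used_tool=False):
        self.steps += 1
        self.tool_calls += int(used_tool)
        self.cost_usd += cost_usd
        if self.steps > self.max_steps:
            raise AgentBudgetExceeded("max steps")
        if self.tool_calls > self.max_tool_calls:
            raise AgentBudgetExceeded("max tool calls")
        if self.cost_usd > self.max_cost_usd:
            raise AgentBudgetExceeded("max budget")

run = GuardedAgentRun(max_steps=3)
run.record_step(0.01)
run.record_step(0.01, used_tool=True)
run.record_step(0.01)
try:
    run.record_step(0.01)          # fourth step trips the cap
except AgentBudgetExceeded as exc:
    print("halted:", exc)          # halted: max steps
```

The point of the sandbox is that runs terminate on the platform's terms; "define boundaries first, then discuss experience" becomes enforceable code rather than a review comment.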
Exit criteria:
- Cost curve is explainable (at minimum attributable to team/use-case)
- GPU utilization and isolation strategies form reusable templates
- Pilot capabilities can be absorbed into the platform as platform capabilities (proceed to Path 2)
Path 2: Domain-Isolated Platform
Applicable when AI has entered multi-team, multi-tenant stages, requiring "pilot assets" to be solidified into platform capabilities to prevent cost and risk from spreading across domains.
The approach is building an AI Landing Zone, where the platform team centrally manages shared governance capabilities, and workload teams iteratively build applications within controlled boundaries.
Platform-side essential modules (recommend organizing by "governance loop"):
- Identity/Policy: Unified identity, policy distribution, and auditing (policy-as-intent)
- Network/Fabric baseline: AI-ready network baseline and automated acceptance
- Compute governance: Quotas, budgets, preemption, fairness, isolation/sharing
- Observability & Chargeback: End-to-end metering, alerts, showback/chargeback
- Runtime catalog: "Golden paths" and templated delivery for inference/training runtimes
Exit criteria: Platform provides "replicable AI workload landing approaches" and can scale use-case count under budget constraints, rather than relying on manual firefighting to maintain stability.
Path 3: AI-First Refactor (AI Factory / Replatform)
Applicable when AI is the core business, requiring infrastructure to be treated as a "production line" rather than a "cluster," and optimization objectives to switch from "shipping features" to "throughput/unit cost/energy efficiency."
The approach centers on "state assets + unit cost" refactoring:
- Context/state of inference/agents is explicitly governed and reused (no longer application-level tricks)
- Introduce Context Tier architectural assumptions: long context and agentic inference require inference state / KV cache to be reusable across nodes and sessions
- Drive platform evolution with "unit token cost, tail latency, throughput/energy efficiency," not "number of new components"
Exit criteria: Can consistently make engineering decisions based on "unit cost and tail performance," and treat context reuse as a platform capability rather than an ad-hoc application-level caching trick.
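"Unit token cost" is a simple blended metric, but it changes what the platform optimizes. A sketch with entirely illustrative numbers (GPU price, token volumes, and the assumed KV-cache reuse ratio are made up for the example):

```python
def unit_token_cost(gpu_hours, usd_per_gpu_hour, tokens_served):
    """Blended cost per 1K tokens: the 'production line' metric that
    Path 3 optimizes, instead of counting shipped components."""
    return gpu_hours * usd_per_gpu_hour / tokens_served * 1_000

# Assumed scenario: 8 GPUs for 24h at $2/GPU-hour serving 120M tokens.
baseline = unit_token_cost(8 * 24, 2.0, 120_000_000)
# If KV-cache reuse across nodes/sessions lets the same GPU hours serve
# 1.5x the tokens (assumed ratio), unit cost drops proportionally.
with_reuse = unit_token_cost(8 * 24, 2.0, 180_000_000)
print(f"${baseline:.4f} -> ${with_reuse:.4f} per 1K tokens")
```

Under this metric, context/KV-cache reuse shows up directly as a denominator effect, which is why Path 3 treats it as platform infrastructure rather than an application optimization.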
90-Day Actionable Plan: AI Landing Zone + Minimum Governance Loop
The goal is to establish "AI Landing Zone + minimum governance loop" within 90 days, forming a replicable template. The key is not covering all scenarios, but connecting the admission → metering → enforcement → feedback loop.
Day 0-30: Establish the Ledger (Cost & Usage Ledger)
First, define attribution dimensions, establish budgets/alerts and baseline reports, and implement quotas/usage controls.
- Attribution dimensions: tenant/team/project/model/use-case/tool
- Establish budgets and alerts, baseline reports (cost + business value metrics)
- Implement quotas and usage controls (at minimum covering GPU quotas and key service quotas)
Deliverables:
- Cost and usage dashboard (weekly-level, traceable)
- "Admission Policy v0" (max context / max steps / max budget)
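A minimal sketch of what a ledger entry and showback aggregation could look like, keyed by the attribution dimensions listed above; the record shape and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass, asdict
from collections import defaultdict

@dataclass(frozen=True)
class UsageRecord:
    """One metering event, keyed by attribution dimensions
    (tenant/team/project/model/use-case)."""
    tenant: str
    team: str
    project: str
    model: str
    use_case: str
    tokens: int
    cost_usd: float

def showback(records, key=("team", "use_case")):
    """Aggregate spend along the chosen attribution dimensions."""
    totals = defaultdict(float)
    for r in records:
        d = asdict(r)
        totals[tuple(d[k] for k in key)] += r.cost_usd
    return dict(totals)

records = [
    UsageRecord("acme", "search", "rag", "m-large", "qa", 50_000, 0.40),
    UsageRecord("acme", "search", "rag", "m-small", "qa", 80_000, 0.16),
    UsageRecord("acme", "support", "bot", "m-large", "triage", 20_000, 0.16),
]
print({k: round(v, 2) for k, v in showback(records).items()})
# {('search', 'qa'): 0.56, ('support', 'triage'): 0.16}
```

The design choice that matters: attribution dimensions are fixed on the record at ingest time, so any later showback/chargeback view is a pure aggregation rather than forensic reconstruction.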
Day 31-60: Establish Resource Governance (GPU Governance + Scheduling)
This phase requires evaluating GPU sharing/isolation strategies, introducing topology/networking constraints, and forming two golden paths for inference and training.
- GPU sharing/isolation strategy: MIG/MPS/vGPU/DRA path evaluation and PoC (executable strategy as acceptance criteria)
- Introduce topology/networking constraints, form AI-ready network baseline and capacity assumptions (including acceptance criteria)
- Form two templated delivery paths for inference/training
Deliverables:
- Workload templates (1 each for inference and training)
- Scheduling and isolation strategies (whitelisted, auditable)
Day 61-90: Establish the Loop (Enforcement + Feedback)
The final phase requires executing budget policies, migrating pilot use cases to the landing zone, and solidifying organizational interfaces.
- Execute budgets: rate limiting/queuing/preemption/degradation strategies, linked to SLOs
- Migrate pilot use cases to landing zone (or service landing zone capabilities)
- Solidify the "organizational interface": platform team vs. workload team responsibility boundaries (forming executable contracts)
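The "execute budgets" bullet (rate limiting/queuing/preemption linked to SLOs) can be sketched as a priority queue in which over-budget tenants are demoted rather than hard-rejected; the priority scheme and the over-budget flag are illustrative assumptions.

```python
import heapq

def drain_queue(requests, capacity):
    """Serve requests in priority order. Over-budget tenants sink below
    all in-budget traffic instead of being dropped, so SLO-bearing work
    is protected while budget pressure is still felt."""
    heap = []
    for i, (tenant, priority, over_budget) in enumerate(requests):
        # Demote over-budget traffic below every in-budget request.
        effective = priority + (100 if over_budget else 0)
        heapq.heappush(heap, (effective, i, tenant))
    served = []
    for _ in range(min(capacity, len(heap))):
        _, _, tenant = heapq.heappop(heap)
        served.append(tenant)
    return served

reqs = [("team-a", 1, True), ("team-b", 2, False), ("team-c", 1, False)]
print(drain_queue(reqs, capacity=2))   # ['team-c', 'team-b']
```

This is where the loop closes: the ledger from Day 0-30 produces the over-budget signal, and the scheduler from Day 31-60 enforces it.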
Deliverables:
- "AI Platform Runbook v1" (including oncall, changes, cost auditing)
- Two replicable use-case landing paths (new use cases reach the golden path in ≤ 30 minutes)
Operating Model: The âContractâ Between Platform Teams and Workload Teams
Migration success depends on establishing clear, executable "organizational contracts." The essence of the contract: who is responsible for "capability provision," and who is responsible for "behavioral consequences."
Platform teams provide (must be stable)
Landing zone, network baseline, identity and policies, budget/quota systems, metering/attribution, GPU governance capabilities, runtime golden paths
Workload teams own (must be self-service)
Model selection, prompt/agent logic, tool integration, SLO definition, business value measurement, use case risk classification and rollback paths
This is also why the FinOps Framework emphasizes the operating model (personas, capabilities, maturity) rather than just tools: without "contracts," budgets are difficult to enforce; and if budgets cannot be enforced, loops cannot form.
Migration Anti-Patterns
Below are common migration anti-patterns and their consequences:
| Anti-Pattern | Typical Consequences |
|---|---|
| Build only an API/Agent platform, without ledger and budget | Runaway cost (most common, and difficult to remediate afterwards) |
| Treat GPUs as ordinary resources, without sharing/isolation and scheduling upgrades | Low utilization + uncontrolled contention; platform forced to allocate compute via "administrative means" |
| Ignore networking and topology | Tail latency and training JCT amplified, capacity planning fails, SLOs difficult to honor |
| Context not assetized (only "tricky caching" within applications) | Unit cost out of control in the long-context/agentic era; reuse capabilities difficult to solidify as platform capabilities |
Summary
The core of AI-native migration is not a "migration checklist," but, under the premise of uncertainty, incorporating cost, risk, and tail performance into a unified governance loop, using the Landing Zone to carry organizational contracts, and using the Context Tier to implement state reuse as an infrastructure capability. Only then can platform and business maintain controllability and efficiency during scaled evolution.