The Long Game: How Google and Apple Are Catching Up Fast in the AI Wars
When AI becomes infrastructure, the winners are decided by cost, control, and coherence.
In 1999, at Texas A&M, I was building the university’s first web-based course registration system. The architecture was straightforward and utilitarian. Web servers handled the interface, XML messages carried transactions back to the mainframe, and two teams with very different cultures learned to work in lockstep. It succeeded because each layer did what it was good at, without overengineering the whole.
Six months before launch, the server team recommended buying an Amdahl mainframe-class server for the web layer. We tested one. It was massive, expensive, and clearly built for a world the web was already moving past. It represented a philosophy of abundance and brute force. My team favored small, inexpensive Compaq servers running Windows, and we chose them because of budget constraints, not raw horsepower. That choice worked.
Today, the AI industry equates progress with brute force, assuming ever-larger GPU fleets are the answer to every problem. History suggests otherwise. The long-term winners in computing have rarely been the vendors with the most horsepower. They have been the ones who integrated tightly, drove marginal costs toward zero, and let technology recede into the background. In today’s Dispatch, I use that lens to examine why Apple and Google are better positioned for what comes next, and why the current alliance between OpenAI, Microsoft, and Nvidia is more architecturally fragile than it appears.
The big picture: training vs inference
The current discourse around artificial intelligence is defined by a frantic, speculative energy. Institutions are watching a massive capital expenditure cycle where corporations rush to secure Nvidia GPUs as if they were vital commodities. This behavior reflects a classic gold rush mentality, characterized by announcements made before contracts are signed and commitments that carry no binding obligation.
To understand why this matters, you have to distinguish between the two fundamental modes of AI operation: training and inference. The industry conflates them, but they demand entirely different architectural and economic models.
Training is the act of building the model. It requires massive, centralized compute power. This is where Nvidia currently extracts its profits, selling GPUs that cost tens of thousands of dollars apiece to organizations desperate to participate in the boom. This phase is capital-intensive and centralized.
Inference is the act of using that model to do work. It is the moment a faculty member summarizes a PDF or a student asks for a schedule adjustment. Industry analysis, including work from firms like Sequoia Capital, suggests that by 2030 inference will account for the vast majority of AI compute demand. The existing architectural model for inference does not scale financially. A future where every routine interaction incurs a toll paid to a third-party GPU provider is not a sustainable institutional architecture. Relying on high-wattage, high-cost server hardware for everyday tasks is the Amdahl mistake repeated at scale.
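To make the scaling problem concrete, here is a back-of-envelope sketch in Python. Every figure in it, the campus population, the queries per day, the per-query price, is a hypothetical assumption chosen for illustration rather than a quote of any vendor’s pricing; the point is the shape of the cost curve, not the dollar amounts.

```python
# Back-of-envelope comparison of metered cloud inference vs. on-device inference.
# All figures below are hypothetical assumptions for illustration, not vendor pricing.

USERS = 40_000                   # assumed campus population (students + staff)
QUERIES_PER_USER_PER_DAY = 20    # assumed "everyday AI" interactions: summaries, scheduling, drafts
DAYS_PER_YEAR = 260              # assumed working days

COST_PER_CLOUD_QUERY = 0.002     # assumed blended $/query for metered GPU inference
ON_DEVICE_MARGINAL_COST = 0.0    # marginal $/query once inference runs on hardware already owned

annual_queries = USERS * QUERIES_PER_USER_PER_DAY * DAYS_PER_YEAR

cloud_cost = annual_queries * COST_PER_CLOUD_QUERY
on_device_cost = annual_queries * ON_DEVICE_MARGINAL_COST

print(f"Annual queries:      {annual_queries:,}")
print(f"Metered cloud cost:  ${cloud_cost:,.0f} per year, growing with every new use case")
print(f"On-device marginal:  ${on_device_cost:,.0f} per year, after the hardware is purchased")
```

Change the assumptions and the totals move, but the structure does not: metered inference grows with every interaction, while on-device inference stays flat once the hardware is in hand.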
The future belongs to those who can drive the cost of inference toward zero. Two companies have built the architecture to do that. One has built it at the edge. The other has built it at the backend. Both have done it without depending on Nvidia.
Apple’s Zero Marginal Cost Gambit
Apple’s strategy is a masterclass in leverage. While competitors scramble to build larger data centers, Apple has spent years optimizing its own silicon. The Neural Engine embedded in its M-series and A-series chips moves the primary AI workload from the data center to the device itself.
This is a pure architecture play. When a user engages with Apple Intelligence to rewrite an email, sort notifications, or summarize a document, Apple pays zero in cloud costs. The energy and compute are drawn from the device the user already purchased. That approach creates a powerful economic position. It allows for “Everyday AI” that is financially sustainable because it is local.
For tasks that do require cloud processing, Apple has introduced Private Cloud Compute, a system that redefines the relationship between server and user. Apple has constructed a server architecture where user data is cryptographically inaccessible even to Apple’s own administrators. The system deletes user data immediately after the request is fulfilled, and the hardware stack is verified by third-party auditors. This turns privacy from a compliance burden into a core infrastructure layer, offering a path to adoption that does not require compromising data sovereignty.
A nuance worth acknowledging: Apple’s advanced Siri capabilities are being partially powered by Google’s Gemini models. Some observers have read this as evidence that Apple is falling behind. The more accurate reading is the opposite. Apple is treating frontier language models as a commodity input, something to be sourced from the most capable provider at the lowest cost, while maintaining control of the user experience, the privacy architecture, and the device layer. That is exactly the behavior you would expect from a company that understands the infrastructure maturity thesis. You do not build the generator; you control the grid.
The iPhone 17 cycle has financially validated the on-device approach. Apple reported record holiday quarter revenue in January 2026, driven substantially by the AI-enabled hardware upgrade cycle. Consumers are buying new devices specifically to access on-device intelligence. That revenue pattern is structurally different from a subscription model tied to cloud inference costs. Apple’s margins do not erode with usage.
Google’s Vertical Integration as a Strategic Fortress
While Apple has secured the edge, Google has quietly secured the backend. Unlike its competitors, which are dependent on Nvidia’s pricing power and supply constraints, Google spent the last decade designing its own Tensor Processing Units (TPUs). That foresight, which began as early as 2015, has allowed the company to build a compute infrastructure largely immune to the market volatility affecting others.
Google’s TPU ecosystem offers a significant cost-performance advantage over Nvidia-based equivalents for inference tasks. The TPU is not a general-purpose device; it is a workhorse designed for systolic efficiency, moving data less and calculating more. Because Google controls the entire vertical stack, from chip design to model to end-user application, it is insulated from supply chain volatility and does not pay the Nvidia tax, or the Azure and AWS markup, on its own compute needs.
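For readers who want intuition for what “systolic efficiency” means, the toy Python accounting below compares operand reads in a naive matrix multiply against a weight-stationary dataflow, the scheme a systolic array uses, in which weights are loaded into the array once and reused. It is an illustration of the principle, not a model of any actual TPU, and the matrix sizes are arbitrary.

```python
# Toy accounting of "move data less, calculate more" for a matrix multiply C = A @ B.
# An illustration of dataflow, not a model of any real chip; sizes are arbitrary.

M, K, N = 256, 256, 256   # activations A is MxK, weights B is KxN

macs = M * N * K          # multiply-accumulates are identical in both schemes

# Scheme 1: naive, no reuse. Every MAC re-fetches both operands from memory.
naive_reads = 2 * M * N * K

# Scheme 2: weight-stationary dataflow (the idea behind a systolic array).
# Weights are loaded into the array once and stay put; each activation is
# read once and streamed past every weight it needs to meet.
systolic_reads = (K * N) + (M * K)

print(f"MACs performed:           {macs:,} (same in both schemes)")
print(f"Naive operand reads:      {naive_reads:,} -> {macs / naive_reads:.2f} MACs per read")
print(f"Weight-stationary reads:  {systolic_reads:,} -> {macs / systolic_reads:.1f} MACs per read")
```

The arithmetic is identical in both schemes. What changes is how often operands move, and in modern accelerators data movement, not computation, is where most of the energy and cost goes.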
The broader market is beginning to recognize this position. Anthropic, one of the most significant AI labs operating today, has placed large orders for TPU capacity. OpenAI, seeking to reduce its Nvidia dependence, has signed agreements with Broadcom, the same chip design and manufacturing partner behind Google’s TPUs, to build custom AI accelerators. When your most prominent competitors are routing compute through your infrastructure or your supply chain, you are no longer just a search company with an AI capability. You are becoming the grid operator for the next generation of AI applications.
In a 2035 world where AI is utility infrastructure, Google’s ability to control its own input costs will allow it to price aggressively. The company does not owe a margin to a hardware vendor for every transaction. That is a durable structural advantage.
The Fracture That Was Not Theoretical
The current alliance between OpenAI, Microsoft, and Nvidia appears formidable from a distance. It relies on a complex supply chain where incentives are frequently misaligned, and every layer of the stack demands a margin. That fragility is no longer a prediction. It has become a documented outcome.
In September 2025, Nvidia CEO Jensen Huang and OpenAI CEO Sam Altman stood together to announce what was described as a $100 billion strategic partnership to deploy 10 gigawatts of Nvidia infrastructure for OpenAI. The announcement generated enormous market attention. Five months later, no contract had been signed, no money had changed hands, and Nvidia’s own CFO confirmed publicly that the deal remained “a letter of intent” with no assurance that a definitive agreement would be completed. Huang had privately questioned OpenAI’s business discipline and expressed concern about the company’s competitive position against Google and Anthropic. By early March 2026, Huang stated publicly that the original $100 billion deal was “probably not in the cards.” The revised arrangement, reportedly a $30 billion equity stake with no chip-purchase obligations, represents roughly 30 cents on the dollar from the original headline.
This outcome illustrates something important about modular stacks. When your critical infrastructure provider is simultaneously financing your direct competitors, and when the terms of your supply relationship are non-binding announcements rather than executed contracts, you do not have a utility. You have a dependency. Dependencies are renegotiated when the leverage shifts.
The circular financing concern raised by market observers has also proven legitimate. Nvidia committed $10 billion to Anthropic while simultaneously being OpenAI’s primary hardware supplier. OpenAI, in turn, signed a separate binding agreement with AMD, Nvidia’s largest GPU competitor, for 6 gigawatts of hardware. Each company in this stack is hedging against the others. That is not the behavior of aligned partners. It is the behavior of organizations managing dependency risk.
Apple and Google face neither of these problems. Google does not negotiate a memorandum of understanding to access TPUs; it allocates them internally. Apple does not require a third-party GPU vendor’s approval to deploy intelligence features on 2.5 billion active devices. Their supply chains are self-directed.
What This Means for CIOs and Higher Education Leaders
History teaches us that in mature computing markets, vertical integration tends to prevail over modularity at the user-facing layer. This is not a universal rule; the Wintel era demonstrated that modular architectures can dominate for extended periods under the right conditions. But the conditions that sustained Wintel (standardized hardware, stable software interfaces, and low switching costs) do not fully describe today’s AI environment. AI capability is still volatile. Trust and privacy are active stakes for institutional leaders. The switching costs embedded in on-device AI architectures are significant. These conditions favor integrated platforms.
For a CIO charged with stewardship and long-term architectural honesty, the practical consequence is this: vendor commitments built on non-binding letters of intent are not infrastructure. They are options. When you anchor your institution’s AI capability to a vendor whose supply chain is itself dependent on a third party that is simultaneously financing your vendor’s competitors, you have not secured a capability. You have created a dependency on a negotiation you cannot control.
The “Nvidia tax” is the relevant concept here. Any AI service that routes every inference request through high-cost GPU infrastructure passes that cost along, either in direct pricing or in the financial fragility of the vendor providing it. For institutions managing tight operating budgets across multi-year horizons, that cost structure is not sustainable at scale. Everyday AI, the kind that handles scheduling, summarization, advising support, and administrative workflow, cannot be priced like frontier model research. The architecture has to support the economics.
This does not mean institutions should avoid OpenAI’s or Microsoft’s platforms. Many of those tools deliver genuine value. The question is architectural positioning over a five-to-ten-year horizon. Which vendor relationships are building toward lower marginal costs and greater institutional control, and which ones are building toward deeper dependency on supply chains the institution cannot see or influence?
Huang’s reported criticism of OpenAI’s business discipline was not merely industry gossip. It was a signal about how hardware providers evaluate their customers. Discipline, in this context, means the capacity to make binding commitments, manage capital responsibly, and build toward sustainable unit economics. Those are the same standards a CIO should apply to vendor evaluation. If the leading AI infrastructure provider is questioning whether its largest customer has the discipline to execute, that question deserves space in your own vendor risk assessment.
The final word
The temptation for leadership today is to chase the loudest innovations. There is pressure to deploy whatever generates the most excitement and to sign contracts with vendors dominating the headlines. The role of the CIO is to look past the spectacle.
The real Internet was not built during the boom. It was built in the years after, when scarcity forced architectural honesty and every dollar had to address a real constraint. Apple and Google are not winning the AI race because they are spending the most. They are better positioned because they built infrastructure they actually control, optimized for the economics of inference rather than training, and avoided the circular dependencies that are now visibly straining the OpenAI-Nvidia relationship.
The strongest institutional architectures are not built in abundance. They are built when resources are tight and choices are clear. That is the work of stewardship. It is also, not coincidentally, the work that Apple and Google have been doing quietly while the rest of the industry announced deals that were never signed.

