357: Cache Me If You Can - Now With Durability

Welcome to episode 357 of The Cloud Pod, where the weather is always cloudy! Justin and Matt are in the studio this week to bring you all the latest in cloud and AI news! Is AI costing more than the people it replaced? Are CEOs suffering from AI psychosis? Is Opus 4.8 better than 4.7? We answer all of these questions and more this week – so let’s get started!

Titles we almost went with this week

💔 Valkey Stops Forgetting Your Data Like Your Ex
💸 AI Coding Tools Cost More Than the Coders They Replace
🏧 Microsoft Discovers AI Budgets Burn Faster Than Enthusiasm
💭 Executives Caught Hallucinating About AI Productivity Gains
💃 ABBA Said “Dancing Queen”, but Google Said Data Center
🥂 AI Now Tells Your AWS Apps How Fragile They Really Are
🔨 Stop Playing VM Whack-a-Mole With Maintenance Windows
🧑‍🔬 Chaos Engineering for Apps Too Scared to Change
👓 AWS Rewires the Data Center With One Weird Optical Trick
🪨 IAM the One Spending All Your Bedrock Money
🧳 SQL Server Licenses Finally Pack Their Own Bags
📚 When AI Hype Meets Productivity Research, It Hurts
🕴️ CEOs Gone Wild: Demos Versus Deployment Reality
🛌 Serverless Search Finally Learned to Nap Between Requests
🖥️ ElastiCache Finally Remembers Things After a Reboot
👻 Valkey Gets Durable So Your Data Stops Ghosting You
⏲️ Zero Data Loss Without Losing Your Microseconds Too
🌜 Microsoft Build 2026 Scout AI and Quantum Dreams

A big thanks to this week’s sponsors:

There are many cloud cost management tools out there, but only Archera provides insured commitments. It sounds fancy, but it’s really simple. Archera gives you the cost savings of a 1 or 3-year AWS Savings Plan with a commitment as short as 30 days. If you do not use all the cloud resources you have committed to, Archera will literally cover the difference. Other cost management tools may say they offer “insured commitments”, but remember to ask: Will you actually give me my rebate? Because Archera will.

Check out thecloudpod.net/archera to schedule a demo today.

General News

01:45 Microsoft data suggests using AI is more expensive than hiring people

Microsoft canceled most internal Claude Code licenses just months after encouraging widespread adoption, redirecting employees to GitHub Copilot CLI instead.
This does not affect the broader Foundry partnership with Anthropic, but it signals that token costs at scale have become difficult to justify internally.
Uber’s situation adds context here: the company reportedly burned through its entire 2026 AI coding tools budget in four months after internal teams were incentivized to compete on usage. This illustrates how adoption incentives can create runaway costs that outpace projected savings.
The core economic tension worth discussing is whether AI tooling costs at scale can undercut the labor-savings argument.
When compute bills approach or exceed payroll savings, the ROI case for broad AI deployment gets more complicated for finance and engineering leaders to defend.
Companies appear to be responding with tighter governance rather than full rollbacks, including usage caps, narrower approvals, and more targeted deployments focused on measurable productivity gains. This suggests the industry is moving toward a more selective model of AI access rather than open adoption.
There is also an infrastructure cost layer beyond software licensing, as AI workloads drive substantial data center energy and water consumption. For cloud users, this has downstream implications for pricing on enterprise tools and digital services as providers absorb those operational costs.

03:09 📢 Justin – “This is going to be interesting to see what happens in the FinOps space, as we start getting more maturity in that area, and we start seeing open models become a bigger deal and customers looking at different options beyond the foundational models.”

06:13 Tech CEOs are apparently suffering from AI psychosis

Box CEO Aaron Levie coined the term “AI psychosis” to describe how executives overestimate AI capabilities because they interact with polished demos and prototypes rather than the messy last-mile work required to actually deploy and maintain AI systems in production.
The layoff data is worth noting: 115,430 tech workers have been cut in just the first five months of 2026, nearly matching all of 2025, with many companies citing AI productivity gains as justification even when other business factors are driving the decisions.
The research does not support the productivity assumptions behind these decisions. A UC Berkeley meta-analysis found no robust relationship between AI adoption and aggregate productivity gains, and MIT researchers project that agents will reach base competence on most text tasks by 2029 and will need additional years to outperform humans.
A Harvard Business Review study identified a practical bottleneck problem: when AI increases output volume across an organization, the constraint shifts to the executives who must review and authorize that output, which can create organizational slowdowns rather than efficiency gains.
For cloud practitioners and developers, the practical takeaway is that AI agents still require substantial human review for code, contract terms, and hallucinated library calls, meaning infrastructure and workflow designs should account for human-in-the-loop requirements rather than assuming full automation.

08:30 📢 Matt – “You still need a human in the loop on a lot of things.”

AI Is Going Great – or How ML Makes Money

09:40 Introducing Always-On pricing: automatic savings for Databricks Lakebase

Databricks is introducing Always-On pricing for Lakebase, its managed Postgres offering, which gives a 25% discount on baseline compute capacity while retaining full autoscaling for traffic spikes, eliminating the traditional forced choice between provisioned and serverless database tiers.
The pricing model activates automatically after 24 hours of continuous use with scale-to-zero disabled, requiring no new contracts, no downtime, and no separate product provisioning, just a minimum compute unit configuration change.
Databricks recommends keeping scale-to-zero as the default for new or intermittent workloads where load patterns are unknown, and switching to Always-On only once historical usage data shows a consistent baseline floor of activity.
Through January 31, 2027, an additional 50% promotional discount stacks on top of the Always-On rate, which could meaningfully reduce costs for production Postgres workloads running continuously on Lakebase.
This commercial model change builds on existing Lakebase technical differentiators like storage-compute separation and instant branching, and is positioned as a cost management option for teams running established production workloads rather than a new infrastructure architecture.
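
To make the discount math concrete, here is a minimal sketch. The list rate and capacity figures are made-up placeholders, and it assumes two things the summary above does not spell out: that the promotional discount multiplies against the already-discounted Always-On rate, and that only the baseline is discounted while burst capacity bills at the list rate.

```python
def lakebase_hourly_cost(baseline_cu, burst_cu, list_rate_per_cu_hour,
                         always_on_discount=0.25, promo_discount=0.0):
    """Estimate hourly cost for a baseline plus autoscaled burst capacity.

    Assumption: discounts apply to the baseline only and stack multiplicatively.
    """
    baseline_rate = (list_rate_per_cu_hour
                     * (1 - always_on_discount)
                     * (1 - promo_discount))
    return baseline_cu * baseline_rate + burst_cu * list_rate_per_cu_hour


# Hypothetical workload: 4 compute units of steady baseline, 2 units of burst,
# at a placeholder list rate of $1.00 per compute unit per hour.
LIST_RATE = 1.00
no_discount = lakebase_hourly_cost(4, 2, LIST_RATE, always_on_discount=0.0)  # 6.0
always_on = lakebase_hourly_cost(4, 2, LIST_RATE)                            # 5.0
with_promo = lakebase_hourly_cost(4, 2, LIST_RATE, promo_discount=0.50)      # 3.5

print(no_discount, always_on, with_promo)
```

The larger the steady baseline is relative to burst, the closer the blended saving gets to the headline discount, which is why Databricks suggests waiting for usage history before switching.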

11:33 📢 Justin – “…the reality is that a lot of the things that you pay for are the automation and deploying the server and updating the things. And so if you’re not running those code paths, and you’re not running those architectures, then the company’s also saving money, which is why they can turn some of that savings over to you.”

12:00 Introducing Claude Opus 4.8

Anthropic released Claude Opus 4.8, an incremental upgrade over Opus 4.7 with improvements in agentic task performance, tool calling efficiency, and honesty.
Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, while fast mode is now three times cheaper than previous models at 2.5x the speed.
A notable reliability improvement is that Opus 4.8 is approximately four times less likely than Opus 4.7 to let code flaws pass unremarked, and early testers report it proactively flags uncertainties rather than making unsupported claims.
On the Super-Agent benchmark, it completed every case end-to-end and scored 84% on Online-Mind2Web for browser-agent tasks.
Dynamic Workflows is a new research preview feature in Claude Code for Enterprise, Team, and Max plans that lets the model plan and run hundreds of parallel subagents in a single session. This enables codebase-scale migrations across hundreds of thousands of lines of code from start to merge, which is a practical capability for large engineering teams.
The Messages API now accepts system entries inside the messages array, letting developers update Claude’s instructions mid-task without breaking the prompt cache. This is useful for agentic workflows where permissions, token budgets, or environment context need to change as a task runs.
Anthropic previewed a higher-capability model class called Mythos, currently limited to cybersecurity use cases under Project Glasswing, with broader availability expected in the coming weeks pending additional safety safeguards.
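
For a rough feel of what the quoted list prices mean per task, here is a small estimator. It uses only the $5 and $25 per million token figures above; the token counts are hypothetical, and it deliberately ignores fast mode, prompt caching, and any batch discounts.

```python
INPUT_USD_PER_MTOK = 5.00    # from the pricing quoted above
OUTPUT_USD_PER_MTOK = 25.00  # from the pricing quoted above


def request_cost_usd(input_tokens, output_tokens):
    """Return the list-price cost in USD for one request or session."""
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


# Hypothetical agentic session: 200k tokens read, 20k tokens written.
session_cost = request_cost_usd(200_000, 20_000)
print(f"${session_cost:.2f}")  # $1.50
```

Output tokens cost five times as much as input tokens, so workloads that generate a lot of code or text dominate the bill faster than workloads that mostly read.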

12:40 Introducing dynamic workflows

Anthropic launched dynamic workflows in Claude Code as a research preview, available in the CLI, Desktop, and VS Code extension for Max, Team, and Enterprise plans, as well as via the Claude API on Amazon Bedrock, Vertex AI, and Microsoft Foundry.
The core capability lets Claude dynamically write orchestration scripts that spin up tens to hundreds of parallel subagents in a single session, with independent verification agents checking results before they surface to the user, making it suited for large-scale tasks like codebase-wide security audits or multi-thousand-file migrations.
A concrete example is the Bun runtime rewrite from Zig to Rust, where dynamic workflows produced roughly 750,000 lines of Rust with 99.8% of the existing test suite passing in eleven days, with hundreds of agents working on files in parallel and two reviewers per file.
Token consumption is a meaningful consideration here, as dynamic workflows use substantially more tokens than a standard Claude Code session, and Anthropic recommends starting with scoped tasks to understand usage before scaling up.
For Enterprise plan users, dynamic workflows are off by default and require admin enablement, while Max and Team plan users have them on by default and can trigger them by asking Claude directly or enabling the Ultracode setting via the effort menu.
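
The fan-out-then-verify pattern described above can be sketched in a few lines. This is a toy illustration, not Anthropic's orchestration code: `run_subagent` and `review` are hypothetical stand-ins for model calls, and the two-reviewers-per-file default simply mirrors the Bun example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(path):
    """Hypothetical worker: a real subagent would call a model to migrate the file."""
    return {"path": path, "output": f"migrated:{path}"}


def review(result, reviewer_id):
    """Hypothetical verifier: accept only outputs that look like a completed migration."""
    return result["output"].startswith("migrated:")


def orchestrate(paths, reviewers_per_file=2, max_workers=8):
    """Run one subagent per file in parallel, then gate each result on every reviewer."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subagent, paths))  # map preserves input order

    accepted, rejected = [], []
    for result in results:
        approved = all(review(result, i) for i in range(reviewers_per_file))
        (accepted if approved else rejected).append(result["path"])
    return accepted, rejected


accepted, rejected = orchestrate(["src/a.zig", "src/b.zig", "src/c.zig"])
print(accepted, rejected)
```

Every worker and every reviewer is a separate model call, which is one way to see why token usage climbs so quickly with this feature.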

13:38 📢 Matt – “I feel like every week, Anthropic is just on a roll. I switched over to it; I feel like I saw some improvement, but not that much. You read the internet, and everyone was complaining about 4.7 and 4.6, but I would kind of like to see them update some of the older models, too – Sonnet and Haiku.”

16:40 Anthropic raises $65B in Series H funding at $965B post-money valuation

Anthropic closed a $65 billion Series H round at a $965 billion post-money valuation, with run-rate revenue crossing $4.7 billion earlier this month, reflecting substantial enterprise adoption of Claude across global organizations.
The funding includes $15 billion from hyperscalers, with Amazon contributing $5 billion, and Anthropic has signed compute agreements totaling up to 10 gigawatts of capacity across Amazon, Google/Broadcom TPU infrastructure, and SpaceX Colossus GPU clusters.
Claude is now available on all three major cloud platforms (AWS, Google Cloud, and Microsoft Azure), with AWS remaining the primary cloud and training partner, giving enterprise customers flexibility in how they deploy and integrate the models.
Strategic hardware partnerships with Micron, Samsung, and SK hynix signal that Anthropic is securing memory and storage supply chain relationships directly, addressing compute scaling constraints at the infrastructure level rather than relying solely on cloud providers.
Claude Opus 4.8 was also announced alongside this funding news, targeting stronger performance in coding, agentic tasks, and long-running professional workflows, which aligns with the enterprise deployment focus described throughout the funding announcement.

Cloud Tools

19:27 Announcing no-code application fault injection

Gremlin’s Failure Flags by proxy lets teams run fault injection tests on serverless applications by routing traffic through a sidecar container, requiring zero code changes to the application itself. This addresses a longstanding gap where serverless platforms lack the infrastructure-level access needed for traditional reliability testing.
The proxy approach supports common failure scenarios like dropping availability zone-specific traffic, injecting latency, and generating exceptions to test error-handling logic.
It works across Kubernetes, AWS Lambda, AWS ECS, and Pivotal Cloud Foundry.
Intelligent Health Checks automatically establish baseline metrics for network throughput, latency, and error rate, then halt tests if any metric exceeds its threshold during a test run. This removes the need to configure separate observability integrations or set up API keys.
The practical value here is that teams can validate failure modes in serverless environments that were previously difficult or impossible to test, such as bad API responses, corrupted payloads, and message ordering issues. This gives engineering teams documented evidence of resilience rather than relying on assumptions.
The no-code deployment model lowers the barrier for teams that want chaos engineering coverage without modifying application code or waiting on development cycles to instrument SDKs.
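
To illustrate the idea of injecting faults in front of an unmodified handler and halting when a health threshold is crossed, here is a toy in-process sketch. It is not Gremlin's API or sidecar; the class name, parameters, and thresholds are all hypothetical.

```python
import random
import time


class FaultInjectingProxy:
    """Wraps a handler and injects latency or exceptions in front of it."""

    def __init__(self, handler, latency_s=0.0, error_rate=0.0,
                 max_error_rate=0.5, min_calls=10, rng=None):
        self.handler = handler
        self.latency_s = latency_s
        self.error_rate = error_rate
        self.max_error_rate = max_error_rate  # health threshold that stops the test
        self.min_calls = min_calls            # wait for a small sample before judging
        self.rng = rng or random.Random()
        self.calls = 0
        self.errors = 0
        self.halted = False

    def __call__(self, request):
        if self.halted:
            return self.handler(request)  # experiment stopped: pass traffic through
        self.calls += 1
        if self.latency_s > 0:
            time.sleep(self.latency_s)
        if self.rng.random() < self.error_rate:
            self.errors += 1
            self._halt_if_unhealthy()
            raise RuntimeError("injected fault")
        return self.handler(request)

    def _halt_if_unhealthy(self):
        if (self.calls >= self.min_calls
                and self.errors / self.calls > self.max_error_rate):
            self.halted = True


# The wrapped handler is untouched application code.
proxy = FaultInjectingProxy(lambda req: req.upper(), error_rate=1.0, min_calls=3)
failures = 0
for _ in range(3):
    try:
        proxy("ping")
    except RuntimeError:
        failures += 1

print(failures, proxy.halted, proxy("ping"))  # 3 True PING
```

The application function never changes; only the thing in front of it does, which is the property that makes the proxy approach attractive for serverless code.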

20:29 📢 Justin – “This is a nice enhancement to the Gremlin platform if you are trying to do chaos engineering, although AI does a pretty good job at causing chaos engineering too.”

AWS

22:03 Introducing the next generation of Amazon OpenSearch Serverless for building your agentic AI applications

AWS announced the next generation of Amazon OpenSearch Serverless, which scales from zero to thousands of requests per second and back to zero when idle, offering up to 60% cost savings compared to provisioned OpenSearch Service clusters sized for peak capacity.
The new generation provisions resources in seconds and scales capacity up to 20 times faster than the previous generation, supporting full-text search and vector search collection types with an Express create option that requires no manual configuration.
Native integrations with Vercel and Kiro allow developers to deploy search and vector backends for AI agents directly from those platforms, and the OpenSearch Agent Skills repository provides pre-built domain knowledge and multi-step execution logic for common agent workflows.
Pricing is consumption-based using OpenSearch Compute Units for indexing, search, and GPU acceleration, with storage billed separately per GB-month. The classic OpenSearch Serverless infrastructure remains available for existing users who prefer it.
The next generation is generally available today across all AWS commercial regions where OpenSearch Serverless is currently supported, making it accessible without any regional rollout delays for most customers.

23:42 📢 Matt – “I feel like the ability of these systems to actually scale down to zero was minimal, even Aurora never truly scaled down to zero…So this type of scale up the capability, especially with it going faster, is just going to be really good – potentially for production workloads too.”

24:14 AWS Shield Advanced introduces DDoS attack flow logs

AWS Shield Advanced now provides packet-level DDoS attack flow logs, capturing source and destination IPs, ports, protocols, packet counts, and source country data during active attacks, published to S3, CloudWatch Logs, or Data Firehose at 5-minute intervals.
This fills a notable visibility gap for security teams, enabling post-incident forensic analysis and threat intelligence gathering that was previously difficult without third-party tooling or manual packet capture setups.
The logs integrate naturally with existing AWS analytics tools and workflows, meaning teams can pipe data into Athena, OpenSearch, or third-party SIEMs without significant new infrastructure investment.
A practical consideration: flow logs are only generated during active attacks and require Shield Advanced to already be configured on protected resources, so this is an add-on capability rather than a standalone offering. Shield Advanced pricing starts at $3,000 per month per organization.
Compliance teams benefit directly here, as the structured log data provides an auditable record of DDoS events, which is useful for regulatory reporting in industries like finance and healthcare.
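
As a sketch of the kind of post-incident analysis these logs enable, the snippet below totals packets by source country. The record layout and field names are assumptions based on the fields listed above, not the actual Shield Advanced log schema, and the IP addresses come from documentation-reserved ranges.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackFlowRecord:
    """Hypothetical shape of one flow log entry; field names are illustrative."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    packets: int
    src_country: str


def packets_by_country(records):
    """Return (country, total_packets) pairs, largest contributor first."""
    totals = Counter()
    for record in records:
        totals[record.src_country] += record.packets
    return totals.most_common()


sample = [
    AttackFlowRecord("192.0.2.10", "203.0.113.5", 40000, 443, "TCP", 1200, "AA"),
    AttackFlowRecord("198.51.100.7", "203.0.113.5", 40001, 443, "TCP", 300, "BB"),
    AttackFlowRecord("192.0.2.11", "203.0.113.5", 40002, 443, "UDP", 800, "AA"),
]
ranking = packets_by_country(sample)
print(ranking)  # [('AA', 2000), ('BB', 300)]
```

In practice the same grouping would more likely run as a SQL query in Athena over the S3 export, but the shape of the question is the same.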

24:29 📢 Justin – “Thank you? You only took a hundred years to get us this quality of life improvement.”

25:06 Amazon Thinks the Future of Data Centers Depends on a Technical Problem It Just Solved

AWS has been deploying a new networking architecture called RNG (Resilient Network Graphs) in data centers since late 2024, starting in Dublin and expanding to Germany and Spain, with most newly built data centers now using the design.
RNG uses a quasi-random flat network topology that eliminates the traditional fat-tree hierarchy of switches and routers, addressing longstanding inefficiencies in data center cabling and routing that have persisted since the mid-1980s.
The reported performance numbers are notable: 69 percent fewer routers and switches, 33 percent higher data throughput, 40 percent reduction in network power consumption, and 27 percent lower operating costs compared to traditional network designs.
A key hardware component is the ShuffleBox, a new optical device Amazon developed internally that physically organizes and shuffles cable connections between routers, replacing the tangled cable bundles typical of fat-tree setups with a more structured physical layout.
Notably, Amazon says RNG is not optimized for AI training workloads, which require more coordinated and centrally orchestrated data patterns, so this is primarily an efficiency improvement for general cloud infrastructure rather than a direct response to AI compute demand.

26:26 📢 Justin – “You know, in a situation where every customer has their own VPCs, you know, the reality is the network has to constantly morph and evolve, and so, it’s good to see they’ve done this. And I’m glad to see that this is solving a big problem for them – fully powered by the fact that they have ASICs that can custom do this work.”

27:55 Amazon RDS for SQL Server supports Bring Your Own Media

Amazon RDS for SQL Server now supports Bring Your Own Media (BYOM), allowing customers to reuse existing Microsoft SQL Server licenses, including Software Assurance, through Microsoft’s License Mobility program when migrating to RDS.
This feature directly addresses a common migration blocker where organizations were either paying for duplicate licenses or waiting for existing agreements to expire before moving to a managed database service.
BYOM integrates with AWS License Manager, giving customers a centralized way to track SQL Server license usage across their AWS environment and maintain licensing compliance.
The feature targets customers running SQL Server on-premises, on other clouds, or as self-managed instances on EC2 who want the operational benefits of RDS, such as automated backups, high availability, and monitoring, without additional licensing costs.
Pricing for BYOM differs from standard RDS SQL Server pricing, so customers should review the Amazon RDS for SQL Server pricing page at aws.amazon.com/rds/sqlserver/pricing for regional availability and specific cost details before planning a migration.

28:03 📢 Justin – “You could always bring your own media to install SQL Server. This is really about bringing your own licensing.”

30:21 Amazon ElastiCache for Valkey now supports durability

ElastiCache for Valkey now supports durability via a Multi-AZ transactional log, allowing it to serve workloads where data loss is unacceptable, not just traditional caching scenarios.
Two write modes are available: synchronous writes guarantee zero data loss at single-digit millisecond write latency, while asynchronous writes maintain microsecond latency with a potential loss window of up to 10 seconds, and come at no additional cost.
Both options preserve microsecond read latency, meaning customers do not have to trade read performance for durability, which is a meaningful distinction compared to traditional durable databases.
AWS specifically calls out AI-oriented use cases like agent long-term memory, RAG knowledge bases, and workflow state management, positioning ElastiCache as a viable primary store for latency-sensitive AI applications rather than just a cache layer.
The feature is available now in all commercial, China, and GovCloud regions, starting with Valkey 9.0, and can be enabled at cluster creation via Console, SDK, or CLI.
Pricing details are on the ElastiCache pricing page since synchronous durability costs differ from the free asynchronous option.
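
The difference between the two write modes is easiest to see in a toy model. This is a conceptual sketch only, not how ElastiCache is implemented: the `log` list stands in for the Multi-AZ transactional log, and `flush` stands in for whatever background process persists asynchronous writes.

```python
class ToyDurableCache:
    """Toy model of synchronous vs asynchronous durability."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.memory = {}   # fast in-memory data served to readers
        self.log = []      # stand-in for the durable transactional log
        self.pending = []  # acknowledged writes not yet in the log (async only)

    def set(self, key, value):
        self.memory[key] = value
        if self.mode == "sync":
            self.log.append((key, value))      # persist before acknowledging
        else:
            self.pending.append((key, value))  # acknowledge first, persist later
        return "OK"

    def flush(self):
        """Persist pending writes; in async mode this runs in the background."""
        self.log.extend(self.pending)
        self.pending.clear()

    def crash_and_recover(self):
        """Lose everything in memory, then rebuild state from the durable log."""
        self.pending.clear()
        self.memory = dict(self.log)


sync_cache = ToyDurableCache("sync")
sync_cache.set("order:1", "paid")
sync_cache.crash_and_recover()

async_cache = ToyDurableCache("async")
async_cache.set("order:1", "paid")
async_cache.flush()
async_cache.set("order:2", "paid")  # acknowledged, but not yet persisted
async_cache.crash_and_recover()

print(sync_cache.memory)   # {'order:1': 'paid'}
print(async_cache.memory)  # {'order:1': 'paid'}
```

The second async write was acknowledged to the caller and still vanished, which is exactly the loss window the asynchronous mode trades for lower write latency.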

31:12 📢 Matt – “I understand the use cases, but I still say you’re using cache wrong.”

32:11 AWS Cost and Usage Report 2.0 now supports Athena and Redshift integration

CUR 2.0 now matches CUR 1.0’s Athena and Redshift integration capabilities, closing a feature gap that had been a barrier for customers considering migration to the newer report format.
When selecting Athena or Redshift integration, exports are automatically delivered in the optimal format, either Parquet or GZIP, along with infrastructure templates, table definitions, and data loading instructions, removing the need for manual configuration or custom ETL pipelines.
Cost data refreshes in CUR 2.0 are automatically reflected in Athena and Redshift tables, meaning customers can query up-to-date billing data using standard SQL without building or maintaining additional data pipeline infrastructure.
Pricing for this feature follows existing costs for the underlying services: S3 storage for the exports, Athena query costs at $5 per TB scanned, and Redshift cluster or Serverless costs depending on the query engine chosen.
This feature is available across all commercial AWS regions but excludes GovCloud US and China regions, which is worth noting for customers operating in those environments who may still need to rely on CUR 1.0 or custom solutions.
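
Since Athena bills on data scanned, a quick estimator shows why the export format matters. The $5 per TB rate is from the notes above; the scan sizes are hypothetical, the sketch assumes a binary terabyte, and it does not model per-query minimums or rounding.

```python
ATHENA_USD_PER_TB = 5.00   # rate quoted above
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
BYTES_PER_GB = 1024 ** 3


def athena_query_cost_usd(bytes_scanned):
    """Estimate the scan charge for one query, ignoring minimums and rounding."""
    return bytes_scanned / BYTES_PER_TB * ATHENA_USD_PER_TB


# Hypothetical month of billing data: a full scan of compressed text versus a
# columnar export where the query only reads the few columns it needs.
full_scan_cost = athena_query_cost_usd(50 * BYTES_PER_GB)
columnar_cost = athena_query_cost_usd(5 * BYTES_PER_GB)

print(f"${full_scan_cost:.4f} vs ${columnar_cost:.4f}")  # $0.2441 vs $0.0244
```

Individual queries are cheap either way; the difference shows up when dashboards rerun the same scans many times a day.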

33:17 📢 Justin – “I built this pipeline a couple of times, and use the newer format because it’s much better. But this is actually even better because they’ve kind of automated some of the other sharp edges of dealing with the current, like the pricing lists and that stuff, and the automatic index updates for Athena. So this is a nice quality of life improvement.”

GCP

34:20 Introducing Google AI Threat Defense to help you outpace the adversary

Google AI Threat Defense is a new automated security system that combines Wiz for exposure mapping, CodeMender for code remediation, Gemini for AI reasoning, and Mandiant for threat intelligence into a single vulnerability management workflow.
The goal is to shrink remediation time from weeks to minutes by automating the scan, prioritize, remediate, and monitor cycle.
The multi-model approach is a notable technical detail here: Google explicitly acknowledges no single AI model catches all vulnerability types, so the platform uses multiple frontier models via the Gemini Enterprise Agent Platform to cover application logic, cloud configuration, binary analysis, and exploitability validation across different asset types.
CodeMender is the code remediation agent at the center of the fix workflow, generating patches directly in a developer’s IDE or CLI, rewriting code to memory-safe languages, and automatically generating tests to verify fixes before deployment. It integrates with Wiz and a tool called Antigravity to coordinate library dependency changes across source control and production environments.
Wiz’s context-aware pen-testing agent continuously simulates attacks to validate exploitable paths, including application-layer and identity-driven risks, which distinguishes this from traditional attack surface management tools that only identify what is exposed without confirming actual exploitability.
Pricing details are not publicly disclosed in the announcement. Ecosystem partners, including Accenture, Deloitte, PwC, Netenrich, and TENEX.AI, will handle deployment, ongoing management, and custom workflow integration for enterprise customers.

35:41 📢 Matt – “Here’s my wallet, just set it on fire.”

37:44 Vibe-coded AI Studio apps with Firestore, Firebase, Cloud SQL

Google AI Studio now supports full-stack app deployment to Cloud Run with either Firestore for document storage or Cloud SQL for PostgreSQL as a relational option, with the AI agent automatically selecting the appropriate database based on your prompt. This removes a common decision point for developers prototyping new applications.
New users can deploy up to two full-stack applications through the Google Cloud Starter Tier at no cost and without a billing account, lowering the barrier for developers who want to test production-grade infrastructure before committing to a paid plan.
Cloud SQL integration uses a new PostgreSQL developer edition that scales to zero when not in use, meaning you only pay during active usage. Cloud SQL support on the Starter Tier is noted as coming next month, so it is not fully available at announcement time.
Firebase Auth serves as the single login layer across the stack and enables Google Workspace integrations, including Sheets, Calendar, and Gmail, through a standard Sign in with Google flow.
The agent handles provisioning authentication, Firestore security rules, and database connections automatically, though Google notes users should review security rules before sharing apps.
When a project outgrows the Starter Tier limits, resources transfer directly to a standard billable Google Cloud project without requiring a rebuild, providing a straightforward path from prototype to production at aistudio.google.com.

40:22 📢 Justin – “I think this is an answer to Vercel, right? A lot of developers right now are doing POCs and building apps on top of Vercel because of how easy it is, and how much they don’t have to do. And so I feel like this is a direct response to that, in many ways.”

40:39 Nano Banana 2 and Nano Banana Pro available for everyone

Imagen 4 (marketed as Nano Banana 2) and Imagen 4 Pro (Nano Banana Pro) are now generally available on Vertex AI via the Gemini API, with enterprise SLA support for production deployments.
Both models support 1K and 2K image output at GA, with 4K output still in preview.
A notable new preview capability allows Nano Banana 2 to accept video files as input, enabling the model to analyze visual context and actions within footage to generate context-aware images like thumbnails and infographics. This extends the model beyond text, PDF, and image inputs.
Real-world adoption is already visible across retail and media, with Shopify using the models for product photography expansion, URBN compressing its trend-to-market pipeline, and WPP integrating them into its agentic marketing platform for clients like Verizon and Unilever.
Pricing is usage-based and varies depending on output resolution and throughput tier, with provisioned throughput options available for enterprise-scale deployments at cloud.google.com/gemini-enterprise-agent-platform/models/provisioned-throughput. Developers wanting to experiment without an enterprise SLA can access both models through the standard Gemini API.
The broader pattern here is Google positioning these image models as components within agentic creative workflows rather than standalone tools, which aligns with the Vertex AI platform strategy of bundling multimodal capabilities into end-to-end pipelines.

43:20 AlloyDB Hot Standby: Faster Failovers & Consistent Performance

AlloyDB for PostgreSQL now offers Hot Standby HA, where the standby node continuously applies write-ahead logs from the primary instead of sitting idle, eliminating the database startup phase during failover and reducing downtime to approximately 15 seconds in testing.
The key practical benefit beyond faster failover is post-failover performance consistency.
Because the standby node keeps its buffer cache warm by actively replaying logs, the new primary serves requests at normal throughput almost immediately, rather than degrading for several minutes while caches rebuild from disk.
Hot Standby is rolling out automatically to newly created AlloyDB instances running PostgreSQL 18, with earlier major versions to follow in the coming months. Google states this enhancement comes at no additional cost and remains covered under the existing 99.99% SLA.
Enterprises with low-tolerance workloads, such as financial services, e-commerce, or any application where post-failover performance degradation causes downstream problems, stand to benefit most, since the legacy HA model could leave systems running at reduced throughput for several minutes after a failover event.
For teams evaluating managed PostgreSQL options, this change narrows the gap between managed database services and self-managed setups, where hot standby configurations have long been standard practice.
More details are available at cloud.google.com/alloydb/docs/high-availability.

43:48 📢 Justin – “This is nice if you need high performance from AlloyDB.”

45:21 AlloyDB Remote MCP Server GA: Secure AI Agent Access to Your Data

The Remote MCP Server for AlloyDB is now generally available, giving AI agents a managed HTTP endpoint to securely query operational database data without the infrastructure overhead of local MCP server deployments.
This is part of Google’s broader rollout of 50+ managed MCP servers across its cloud services.
On the security side, the integration uses IAM for fine-grained access control down to specific tables or views, includes Model Armor for prompt injection and data exfiltration protection, and routes all queries through Cloud Audit Logs.
This addresses a real concern for teams connecting AI agents to sensitive production databases.
AlloyDB’s vector capabilities are a notable part of the pitch here, with support for over 10 billion vectors, up to 6x faster vector queries than standard PostgreSQL via the ScaNN index, and built-in AI functions for generating embeddings and reranking results. These features make it a practical backend for RAG-style agentic applications.
The Lakehouse Federation capability lets agents query AlloyDB operational data alongside BigQuery analytical data and Iceberg tables through a single PostgreSQL interface, reducing the need to move or duplicate data across systems.
Pricing is not detailed in the announcement, but AlloyDB offers a 30-day free trial cluster, and the MCP server runs on existing AlloyDB infrastructure.
A hands-on Codelab is available at codelabs.developers.google.com/alloydb-ai-mcp for teams wanting to evaluate the setup.

45:40 📢 Justin – “…managed MCPs are definitely all the rage right now. All the cloud providers are dropping them. I like this one, though, that’s kind of interesting. I don’t know that this should be your primary interface to a DB performance scale, but if you need an interface for an admin or for a user to do more ad hoc querying, this is way better than giving them a select star against the database.”

48:15 GKE standby buffers speed up autoscaling for less spend

GKE standby buffers address the longstanding tradeoff between overprovisioning costs and slow cold starts by suspending pre-initialized nodes to disk, releasing compute and memory costs while retaining only persistent disk and IP address charges.
This results in cost overhead in the low single-digit percent range compared to full overprovisioning.
Standby buffers resume 2-3x faster than provisioning fresh nodes, and when combined with active buffers, the two work in sequence: active buffers handle the immediate spike while standby nodes resume to cover sustained load. Benchmarks showed P50 latency dropping from 4-6 minutes to single-digit seconds under identical traffic conditions.
The feature replaces operationally complex workarounds like balloon pods and lowered HPA thresholds with a declarative CapacityBuffers API, where you simply define how much headroom you need, and GKE manages the rest.
Early customer results from Unico showed time-to-ready dropping from several minutes to 30 seconds.
Practical use cases include agentic workloads, CI/CD pipelines, batch jobs, game servers, and any spiky traffic pattern where scheduling latency matters. GKE benchmarks showed sub-second Agent Sandbox scheduling latency at up to 90% lower cost compared to complete overprovisioning.
Standby buffers are available for GKE clusters running version 1.36.0-gke.2253000 or later, and Google has published an open-source buffer sizing simulator at github.com/gke-labs/buffers-simulator to help teams tune buffer sizes for their specific performance targets.
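
For a rough feel of why suspended nodes cost so much less than fully running spare capacity, here is a minimal back-of-the-envelope sketch in Python. The hourly prices and node counts are made-up placeholders, not GKE pricing, and the names `NodeHourlyCost` and `buffer_overhead` are hypothetical helpers for illustration only. The single assumption taken from the announcement is that a suspended node stops incurring compute and memory charges but keeps paying for persistent disk and its IP address.

```python
from dataclasses import dataclass


@dataclass
class NodeHourlyCost:
    """Hypothetical per-node hourly prices (placeholders, not real GKE rates)."""
    compute_and_memory: float  # released while a node is suspended
    disk_and_ip: float         # still billed while a node is suspended


def buffer_overhead(cost: NodeHourlyCost, buffer_nodes: int, standby: bool) -> float:
    """Hourly cost of keeping `buffer_nodes` spare nodes around.

    A running (overprovisioned) node pays for everything; a standby node
    pays only for persistent disk and IP.
    """
    if standby:
        per_node = cost.disk_and_ip
    else:
        per_node = cost.compute_and_memory + cost.disk_and_ip
    return buffer_nodes * per_node


# Assumed numbers purely for illustration.
cost = NodeHourlyCost(compute_and_memory=0.95, disk_and_ip=0.05)
overprovisioned = buffer_overhead(cost, buffer_nodes=10, standby=False)
suspended = buffer_overhead(cost, buffer_nodes=10, standby=True)
savings = 1 - suspended / overprovisioned

print(f"Overprovisioned buffer: ${overprovisioned:.2f}/hour")
print(f"Standby buffer:         ${suspended:.2f}/hour")
print(f"Savings:                {savings:.0%}")
```

The real savings depend on your disk sizes, machine types, and how often the buffer actually resumes, which is what the buffer sizing simulator mentioned above is meant to help estimate.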

49:45 Blue, yellow and green: Google invests in its first data center in Sweden.

Google broke ground on its first data center in Horndal, Sweden, expanding GCP infrastructure into the Nordic region to support growing demand for Search, Google Cloud, and YouTube services.
The facility uses air cooling instead of water cooling, which reduces water consumption compared to traditional data center designs, and includes off-site heat recovery to supply warmth to nearby homes and businesses.
For GCP customers in Northern Europe, this expansion means lower latency and improved regional availability for cloud workloads, particularly relevant for Swedish and broader Nordic businesses running latency-sensitive applications.
Google has supported over 700 megawatts of renewable energy additions to the Swedish grid since 2013, and this new facility continues that sustainability focus, which matters for enterprises with carbon reporting requirements.
A EUR 5 million community fund targeting education, sustainability, and workforce development accompanies the investment, signaling a longer-term regional commitment beyond just infrastructure capacity.

50:24 📢 Matt – “I really like the fact that they’re using the air to cool it down, but then not just venting it out the other side, but venting it to homes and kind of getting that double whammy.”

Azure

51:03 Generally Available: Application Gateway for Containers – Service Mesh integration with Istio

Application Gateway for Containers now has generally available integration with Istio service meshes, automating mutual TLS connectivity between the gateway and mesh-enabled services to simplify secure north-south traffic management in Kubernetes environments.
The integration supports both upstream open-source Istio and the managed Istio add-on for AKS, giving teams flexibility to choose their preferred deployment model without changing their ingress configuration approach.
A notable operational benefit is the single ingress path for routing traffic to services both inside and outside the mesh, which reduces the need for repetitive mTLS definitions and separate gateway configurations.
Certificate lifecycle management is handled automatically, including trust establishment and rotation, which removes a common manual overhead for teams running secure service mesh workloads.
This feature is relevant for platform and infrastructure teams running AKS with service mesh architectures who want a managed ingress solution that integrates natively with their security model. Pricing follows existing Application Gateway for Containers rates, so teams should review the Azure pricing page for current cost details.

51:40 📢 Matt – “If I remember correctly from the beta of it, the biggest annoyance of this is that you can’t go from a current app gateway to an app gateway for containers. They’re different resources inside of Azure, so the fun part about this is you actually need to move and relaunch. And even though you *tell* customers to never whitelist your IP address on your app gateway, there’s definitely always one customer out there that does and then opens a SEV1 ticket. So great feature. Kinda wish the application gateway was all under one bigger umbrella, but I have other issues with the application gateway.”

52:24 Microsoft Build 2026: The 7 biggest announcements

Microsoft announced Scout, an always-on assistant built on OpenClaw that integrates with Microsoft 365 apps, including Outlook, OneDrive, and Teams. It handles background tasks like calendar management and expense reporting, and is currently in desktop preview for Frontier customers in the US with broader availability planned.
Microsoft revealed seven new AI models under its MAI lineup, including MAI-Thinking-1, its first reasoning model featuring 35 billion active parameters and a 128K context window. The model targets complex multi-step instructions, long-context reasoning, and code generation, signaling a continued push toward in-house model development rather than reliance on OpenAI.
Microsoft Execution Containers (MXC) introduce a sandboxed security layer for AI agents running on Windows via OpenClaw, giving developers defined guardrails over what agents can access on a device. A companion app lets users configure their own agents or connect to existing ones within this controlled environment.
The Surface RTX Spark Dev Box targets developers running local AI models, featuring Nvidia’s Arm-based Spark RTX chip and 128GB of unified memory with Visual Studio Code and GitHub Copilot preinstalled. Pricing and full specs have not been disclosed, with US availability expected later this year.
Microsoft’s Majorana 2 quantum chip delivers qubits rated at 1,000 times greater accuracy than its predecessor, using a new material stack with lead-based compounds. Microsoft projects it could achieve a practical quantum computer by 2029 based on this progress.

Cont’d: Microsoft launches Scout, an OpenClaw-inspired personal assistant

Microsoft launched Scout at Build 2026, an always-on agentic AI assistant built on the OpenClaw framework that integrates directly with Microsoft 365, allowing users to automate tasks across email, calendar, and other productivity tools with a persistent, personalized identity.
Scout operates across cloud, desktop, and web browser, and comes with prepackaged skills for calendar management and meeting agenda drafting, though the intended long-term value is in user-defined custom skills that the assistant learns and refines over time.
Access requires both enrollment in Microsoft’s Frontier early adopter program and an active GitHub Copilot subscription, so this is not a standalone product and adds cost on top of existing Copilot licensing.
Scout includes a built-in policy conformance system that continuously checks agent behavior against set guidelines and generates an audit trail for each check, directly addressing concerns raised by the OpenClaw incident, where an agent acted erratically inside a researcher’s inbox.
The customization loop where Scout adapts to individual user behavior is the core differentiator here, but it also raises practical questions for enterprise IT teams around governance, data access scope, and how user-defined agent skills interact with existing security policies.

53:57 📢 Matt – “It’s nice to see them actually get a new model out there, because it’s been a while and getting something out there that people can use – and honestly them internally getting off of OpenAI, you know… I’m sure for a long time they were using OpenAI and paying somewhere for it. So if they can run it all under their own model, probably gonna be better off in the long run as a business.”

58:10 AI alone won’t change your business. The system running it will.

Microsoft is positioning its agent platform as a five-layer system covering build, contextualize, run, govern, and improve, integrating GitHub, Azure Foundry, Microsoft IQ, Agent 365, and the Microsoft Security stack into a single workflow rather than separate tools.
Microsoft IQ is a notable new component that grounds agents in enterprise data from Microsoft 365, business systems, and the web via Web IQ, with Frontier Tuning allowing organizations to post-train models on their own workflows and data while keeping that trained intelligence within their own environment.
Azure Foundry serves as the production runtime for agents, supporting frameworks beyond Microsoft’s own stack, including LangGraph, Claude Agent SDK, and custom harnesses, with Fireworks AI integration for optimized open model inference and a built-in model router to balance quality, speed, and cost.
Agent 365 addresses a practical enterprise concern by providing a centralized catalog of all deployed agents across an organization, giving IT visibility into who deployed each agent, what data it can access, how it behaves, and what it costs, with policy enforcement built in.
Pricing details are not specified in the announcement, so listeners evaluating this platform should expect to assess costs across multiple components, including Foundry compute, IQ data connectors, and Frontier Tuning separately as they scope out deployments.

Closing

And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with the hashtag #theCloudPod.
