354: US-Tirefire-1 Lives Up To Its Stellar Reputation

Welcome to episode 354 of The Cloud Pod, where the weather is always cloudy! This week was sort of a tire fire for the cloud, with US-East-1 losing power, the TanStack supply chain being hit with an impressively creative attack, and Linux getting hit with a second vulnerability in as many weeks. But it’s not all bad news – Microsoft finally figured out we don’t want (or need) Copilot in EVERYTHING, and Anthropic introduced dreaming via Claude managed agents. There’s even more where that came from, plus an aftershow, so let’s get started!

Titles we almost went with this week

🔒 IAM Not Messing Around With AI Agent Security
🥳 Redis Who? Valkey 9.0 Crashes the Cache Party
🪫 US-EAST-1 Loses Power Again, Architects Say Told You So
💳 HTTP 402 Payment Required Now Actually Required for Bedrock Agents
📉 ElastiCache Finds Your Data With Vectors and Vibes
😆 Stop Squinting at Logs and Let AI Do It
🗺️ GKE Nodes Finally Stop Taking the Scenic Route
👀 AWS MCP Server Goes GA So Your AI Stops Lying
🧑‍💻 AI Agents Now Snitching on Your Sloppy Security Code
🪱 TanStack Supply Chain Worm Trusted SLSA and Lied
💭 I wonder if Claude is dreaming about how bad my code is
😵 US-EAST-1 Loses Power Again, The CloudPod Say Told You So
🪪 Will my credit card company accept my agent bought it as a fraud reason?
☁️ Extended RDS and Cloud SQL is a TAX without representation
🫖 Boston SQL Party – Throw your Extended RDS overboard
🔐 Everyday is a bad day for Cyber Security
⚖️ Azure Scale Sets Finally Let Your VMs Grow Up
📠 From 200 to 1000 VMs Without Starting Over
🧳 Availability Sets Pack Their Bags for Scale Sets

A big thanks to this week’s sponsors:

There are many cloud cost management tools out there, but only Archera provides insured commitments. It sounds fancy, but it’s really simple. Archera gives you the cost savings of a 1 or 3-year AWS Savings Plan with a commitment as short as 30 days. If you do not use all the cloud resources you have committed to, Archera will literally cover the difference. Other cost management tools may say they offer “insured commitments”, but remember to ask: Will you actually give me my rebate? Because Archera will.

Check out thecloudpod.net/archera to schedule a demo today.

Follow Up

01:26 Microsoft Cuts Copilot Bloat

Microsoft is actively removing Copilot integrations from products where adoption was low or user feedback was negative, including Gaming Copilot on Xbox and several Windows 11 entry points in Photos, Widgets, and Notepad.
The scale of the Copilot sprawl became concrete when a tech commentator counted 81 distinct Copilot products, a figure that circulated internally at Microsoft and drew attention from staff.
Microsoft executive Jacob Andreou publicly acknowledged the need to cut underperforming Copilots in a post he later deleted, signaling an internal shift toward consolidation under a single combined consumer and enterprise Copilot organization.
The financial case for trimming Copilots is direct: Microsoft noted during its most recent earnings that running certain Copilots was compressing margins, particularly free integrations in Windows where no additional revenue offsets the inference costs.
The products Microsoft is choosing to retain, such as Microsoft 365 Copilot, which saw 33 percent growth in paying users last quarter, point toward a narrower focus on enterprise workflows with measurable revenue attachment rather than broad surface-area coverage.

03:06 📢 Jonathan – “I think it’s just the invasive nature of the whole thing, because the EULA for Copilot, someone’s going to click right through it and not realize that everything they type in their app is now being sent to Copilot and used for training. And all of a sudden, they added a bunch of apps that everyone uses every day, like Notepad, and I think it’s quite an invasion of privacy. I love AI and using AI, but having it slammed in my face by Microsoft, having it enabled by default, and having it take screenshots and having it do all those things without explicitly opting in is what I’m unhappy about.”

General News

05:04 Linux bitten by second severe vulnerability in as many weeks

Linux has seen two severe privilege escalation vulnerabilities disclosed within a week of each other, with Dirty Frag (CVE-2026-43284 and CVE-2026-43500) allowing low-privilege users to gain root access across virtually all distributions.
The exploit is particularly concerning because it is deterministic, stealthy, and causes no crashes, making it difficult to detect while working reliably across different environments, including shared servers and virtual machines.
Proof-of-concept code was leaked publicly before most distributions had incorporated kernel patches, effectively turning this into a zero-day and accelerating real-world exploitation risk.
Microsoft has already observed signs of active experimentation in the wild.
Cloud and shared hosting environments face elevated exposure since the attack is well-suited to multi-tenant scenarios where untrusted users share underlying infrastructure.
Debian, AlmaLinux, and Fedora have released patches, and organizations running Linux workloads should prioritize checking their distribution’s patch status and applying updates promptly.

06:46 📢 Jonathan – “I think the easiest way to exploit it is through supply chain attacks, because if you do an apt update or something and there’s an open source package that’s been packaged up into somebody’s Ubuntu repo, whenever those things run, they can run shell scripts – they can run arbitrary code when you run the update, and, to be fair, they’re already running as root at that point anyway, so it’s not quite as bad, but yeah.”

07:18 TanStack npm Packages Hit by Mini Shai-Hulud

On May 11, 2026, 84 malicious npm artifacts were published across 42 TanStack packages by hijacking the legitimate release pipeline using OIDC token extraction from runner memory, not stolen credentials.
This is notable as the first documented supply chain attack producing valid SLSA Build Level 3 provenance attestations, meaning standard provenance verification tools would show these packages as trusted.
The attack chained three vulnerabilities: a pull_request_target misconfiguration allowing fork code to run in the base repo context, GitHub Actions cache poisoning using a pre-computed cache key, and OIDC token extraction from runner memory using a technique first documented in the tj-actions/changed-files compromise from March 2025.
The worm self-propagated to over 200 packages beyond the initial TanStack packages, reaching Mistral AI, UiPath, and others within hours, and systematically harvested credentials from AWS IMDSv2, HashiCorp Vault, Kubernetes service accounts, and notably Claude Code session history files.
A critical remediation ordering issue exists: the payload installs a dead-man’s switch that runs rm -rf ~/ if the stolen GitHub token is revoked, meaning teams must disable the monitor service before rotating credentials or risk home directory destruction.
The practical takeaway for teams is that SLSA provenance is a necessary but insufficient supply chain control, and any npm package using OIDC trusted publishing without branch and workflow pinning is vulnerable to this class of attack.
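For teams that want a quick first pass at the first link in that chain, here is a minimal sketch of a workflow check. It is a heuristic we put together for illustration, not a tool from the incident write-up: it flags workflow files that trigger on pull_request_target and also reference the pull request's head commit, which is the combination that lets fork code run in the base repo context. The function names and regular expressions are our own assumptions, and a match is a lead to review by hand, not proof of a vulnerability.

```python
import re
from pathlib import Path

# Heuristic patterns (assumptions for illustration); real workflows vary widely.
TRIGGER = re.compile(r"\bpull_request_target\b")
HEAD_REF = re.compile(r"github\.event\.pull_request\.head\.(sha|ref)|github\.head_ref")


def risky_workflow(text: str) -> bool:
    """True when a workflow uses pull_request_target AND references the PR head."""
    return bool(TRIGGER.search(text)) and bool(HEAD_REF.search(text))


def scan_repo(root: str) -> list[str]:
    """Return workflow files under .github/workflows that match the heuristic."""
    workflows = Path(root) / ".github" / "workflows"
    if not workflows.is_dir():
        return []
    hits = []
    for path in sorted(workflows.glob("*.y*ml")):  # matches .yml and .yaml
        if risky_workflow(path.read_text(encoding="utf-8")):
            hits.append(str(path))
    return hits


if __name__ == "__main__":
    for hit in scan_repo("."):
        print(f"review: {hit}")
```

The check deliberately errs on the side of false positives, since a text match cannot tell whether the checked-out fork code is actually executed with secrets or cache write access.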

08:16 📢 Justin – “Brilliant – bravo for the creativity for this one. I had not thought of that attack vector before.”

15:02 AWS warns of EC2 ‘impairment’ as power loss hits notorious US-EAST-1 region

US-EAST-1 experienced another power loss event, causing EC2 instance impairment, continuing the region’s well-documented history of outages that have made it a cautionary tale for architects who skip multi-region or multi-AZ design.
This incident reinforces why AWS best practices around Availability Zone distribution exist in the first place. Running workloads across multiple AZs or regions is not optional for production systems with meaningful uptime requirements.
For teams still running single-AZ deployments in US-EAST-1, this is a practical reminder to review architecture decisions around Auto Scaling groups, RDS Multi-AZ, and Route 53 health checks as baseline resilience tools.
The article itself is thin on technical specifics, which points to a broader issue with AWS’s incident communication.
Customers often learn more from third-party sources than from the AWS Service Health Dashboard during active events.
Cost consideration for listeners: building multi-AZ or multi-region redundancy does increase infrastructure spend, but the business impact of unplanned downtime in a single-AZ setup typically outweighs that cost for any revenue-generating workload.
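To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The per-AZ availability figure is a made-up placeholder, and the math assumes AZ failures are independent, which is optimistic since region-wide events do happen. Treat the result as an upper bound on the benefit, not a forecast.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_az: float, zones: int) -> float:
    """Availability if the workload stays up as long as any one AZ is up.

    Assumes independent AZ failures (an optimistic simplification).
    """
    return 1 - (1 - per_az) ** zones


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    per_az = 0.995  # hypothetical placeholder, not an AWS figure
    for zones in (1, 2, 3):
        avail = combined_availability(per_az, zones)
        print(f"{zones} AZ(s): {expected_downtime_hours(avail):.3f} hours/year")
```

Multiplying the downtime difference by your own cost per hour of outage gives a number to compare against the extra infrastructure spend.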

16:30 📢 Matthew – “I also feel like if you have a cloud architect at your company that’s recommending you don’t do multi AZ, just in general, you should probably fire the person.”

AI Is Going Great – or How ML Makes Money

18:47 New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration

Anthropic launched several updates to Claude Managed Agents, including dreaming in research preview, plus outcomes, multiagent orchestration, and webhooks in public beta.
Dreaming is a scheduled process that reviews past agent sessions to extract patterns, refine memory stores, and enable agents to self-improve between sessions without human intervention.
The outcomes feature lets developers write a rubric for success, then a separate grader evaluates agent output in its own context window to avoid being influenced by the agent’s reasoning.
Internal benchmarks showed up to 10 percentage point improvement in task success over standard prompting, with file generation gains of 8.4% for docx and 10.1% for pptx.
Multi-agent orchestration allows a lead agent to break complex jobs into parallel workstreams delegated to specialist subagents, each with its own model, prompt, and tools.
All events are persistent and traceable through the Claude Console, giving developers full visibility into which agent did what and in what order.
Real-world results from early adopters show measurable outcomes: Harvey saw roughly 6x improvement in completion rates using dreaming for legal drafting workflows, and Wisedocs reported 50% faster document review cycles using outcomes to enforce quality standards.
The combination of memory, dreaming, and multi-agent orchestration represents a shift toward agents that accumulate institutional knowledge over time rather than starting fresh each session, which has practical implications for enterprise teams running long-horizon or high-volume workloads.
Want to request access to Dreaming? You can do that here.
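The isolation idea behind outcomes can be sketched in a few lines. This is not Anthropic's API: the class names, the rubric shape, and the keyword check are all hypothetical stand-ins (a real grader would be a model call in its own context). The point it illustrates is structural: the grader receives only the final output, so nothing in the agent's reasoning can sway the score.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentResult:
    output: str     # the deliverable the agent produced
    reasoning: str  # the agent's working notes; never passed to the grader


@dataclass
class Criterion:
    name: str
    required_phrase: str  # toy stand-in for a real rubric check


def grade(output: str, rubric: list[Criterion]) -> dict[str, bool]:
    """Score the output alone; the signature gives no access to reasoning."""
    text = output.lower()
    return {c.name: c.required_phrase.lower() in text for c in rubric}


if __name__ == "__main__":
    result = AgentResult(
        output="Summary: the contract expires in 2027.",
        reasoning="I assume a termination clause exists, so I skipped it.",
    )
    rubric = [
        Criterion("has_summary", "summary"),
        Criterion("mentions_termination", "termination"),
    ]
    print(grade(result.output, rubric))
```

In this toy run the agent's notes mention termination, but the deliverable does not, so that criterion fails, which is the behavior a separate grader is meant to guarantee.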

19:47 📢 Jonathan – “It’s a great feature. I built a retrospective agent a while ago, before Dreaming was around… but that went back through and looked at chunks of things, especially if I had to correct it to figure out if I said something wrong earlier in the chat? What led to the divergence from the intent in a way? So I guess this is an automated way of doing the same thing, and probably covers a wider range of problems than I have built.”

21:58 Agent view in Claude Code

Anthropic launched agent view in Claude Code on May 11, 2026, currently available as a Research Preview for Pro, Max, Team, Enterprise, and API plan users.
It provides a unified CLI interface for managing multiple Claude Code sessions simultaneously, accessible via the claude agents command or the left arrow key from any session.
The feature addresses a practical pain point for developers running parallel AI coding sessions, replacing the need to juggle multiple terminal tabs or tmux grids.
Each session row displays status, last response content, and interaction timestamp at a glance.
Two key commands extend session management flexibility: /bg moves an existing session to the background, while claude --bg [task] launches a new session directly in the background without occupying the foreground terminal.
Early use cases include dispatching multiple coding tasks in parallel and reviewing the resulting pull requests from a single list, managing long-running looping jobs like PR monitors with next-run times visible in the agent list, and quickly spinning up related tasks or codebase questions without losing context in the primary session.
For teams and organizations, this tooling supports scaling concurrent AI-assisted development workflows within existing rate limits, which is worth noting as a practical constraint when planning parallel workloads.
Want to install Claude Code? Do that here.

AWS

27:26 The AWS MCP Server is now generally available

The AWS MCP Server is now generally available as a managed remote MCP server that gives AI agents authenticated access to all 15,000+ AWS API operations using existing IAM credentials, with no additional charge beyond normal AWS resource costs. It is currently available in US East N. Virginia and Europe Frankfurt regions.
A core problem this solves is AI coding agents relying on stale training data, producing overly permissive IAM policies, and defaulting to CLI commands instead of CDK or CloudFormation.
The server addresses this by retrieving current AWS documentation at query time and providing Skills, which are curated best practices maintained by AWS service teams.
The new run_script tool lets agents execute sandboxed Python server-side with no network access, allowing multi-step API calls to be chained in a single round-trip rather than sequentially, which reduces both latency and context window consumption.
Enterprise governance is addressed through IAM context keys for fine-grained access control, CloudWatch metrics under the AWS-MCP namespace to separate agent calls from human calls, and full CloudTrail logging for compliance audit trails.
The server works with any MCP-compatible client, including Claude Code, Kiro, Cursor, and Codex, but requires a local proxy called MCP Proxy for AWS to bridge IAM SigV4 authentication to the OAuth 2.1 that MCP currently supports, which adds a setup step worth noting for teams evaluating adoption.

28:54 📢 Jonathan – “It seems like just like another abstraction on top of another abstraction at this point. We’ve already got the cloud configuration API that they built for Terraform to use. I would assume that this MCP talks to that. But why? Why not just teach it how to use CLI commands?”

32:34 AWS Marketplace now supports programmatic procurement with Agreements API

AWS Marketplace launched the Agreements API, allowing organizations to programmatically procure software, accept offers, track charges, manage entitlements, and update purchase orders without leaving their existing procurement tools.
Combined with the existing Discovery API, this creates a full end-to-end programmatic procurement workflow from product discovery through purchase, which is useful for enterprises running automated or policy-driven software acquisition processes.
Partners and ISVs can use these APIs to build custom storefronts on top of AWS Marketplace, giving them more control over how customers experience the procurement process within their own platforms.
The API is currently available only in US East (N. Virginia), which is worth noting for organizations with regional compliance or data residency requirements, as broader regional availability is not yet confirmed.
Getting started requires configuring IAM permissions and calling the API via the AWS SDK, with full documentation available in the AWS Marketplace Agreement APIs reference. No separate pricing for API usage was announced, as costs would reflect the underlying Marketplace product agreements.

33:26 📢 Justin – “This is nice, because they had a Coupa integration maybe six years ago, they announced it, then they basically did nothing – no one adopted it – and they kind of stopped working on it. So this is much better to have an agreements API that you can actually integrate into.”

35:04 Announcing Agent Toolkit for AWS — help AI coding agents build effectively on AWS

AWS launched the Agent Toolkit for AWS, a free suite of tools designed to help AI coding agents work more reliably on AWS by providing validated, up-to-date procedures called agent skills, reducing errors and token waste in multi-service workflows. It succeeds the MCP servers and plugins previously hosted on AWS Labs.
The toolkit launches with over 40 agent skills covering infrastructure-as-code, storage, analytics, serverless, containers, and AI services, with database, networking, and IAM skills planned soon. Skills give agents tested procedures rather than letting them improvise from potentially outdated training data.
The AWS MCP Server, now generally available, adds IAM-based guardrails, CloudWatch and CloudTrail observability, and sandboxed code execution, addressing the governance concerns that have made organizations hesitant to deploy coding agents in production environments. It is currently available only in US East N. Virginia and Europe Frankfurt.
Three pre-bundled plugins simplify setup by combining the MCP server with curated skill sets: AWS Core for full-stack application developers, AWS Data Analytics for data pipeline work, and AWS Agents for building production agents using Amazon Bedrock AgentCore.
The Agent Toolkit is available at no additional charge, with customers paying only for the underlying AWS resources their agents consume, making adoption straightforward for teams already using AWS services.

36:07 📢 Justin – “Everyone’s trying to get to agents, and how agents run on top of Bedrock AgentCore, and so, how baked are these things when things like Bedrock AgentCore are pretty new? I do appreciate it; I think the MCP server is probably where I would spend most of my time for building an agent for this, even though I just mocked it mercilessly, but the plugins might be good, or the skills might be good for certain things if you’re not familiar. If they could do an incognito skill.”

38:07 AWS Console Mobile App adds interactive graphs, AI log summaries, and natural language logs search to CloudWatch Alarms

AWS Console Mobile App now consolidates CloudWatch alarm investigation into a single view, combining interactive metric graphs, AI-generated log summaries, and natural language log search to reduce the time from alert to root cause identification.
The natural language log search supports typed queries, voice input, and pre-saved Logs Insights queries, which lowers the barrier for on-call engineers who need to investigate incidents quickly from a mobile device.
The AI-generated log summaries automatically highlight key contributing factors when an alarm triggers, which could reduce the cognitive load during off-hours incident response without requiring engineers to manually parse raw log data.
The feature is available at no additional cost beyond standard CloudWatch charges and works across all AWS Commercial Regions, accessible by downloading or updating the AWS Console Mobile App from the Apple App Store or Google Play Store.

40:01 Agents that transact: Introducing Amazon Bedrock AgentCore payments, built with Coinbase and Stripe

Amazon Bedrock AgentCore payments, now in preview, let AI agents autonomously pay for APIs, MCP servers, web content, and other agents using either a Coinbase wallet or Stripe Privy wallet, with spending limits enforced per session to prevent open-ended fund access.
The feature is built on the x402 protocol, an HTTP-native standard where agents handle HTTP 402 Payment Required responses automatically, executing stablecoin micropayments and continuing their task without interrupting the reasoning loop. Fiat payment support is on the roadmap.
Developers configure a funded wallet, set session spending limits, and the platform handles credential management, protocol negotiation, and transaction observability through existing AgentCore logs and traces, reducing what AWS describes as months of custom billing integration work.
The Coinbase x402 Bazaar MCP server is available through the AgentCore gateway, giving agents a discovery mechanism to find and pay for x402-enabled services dynamically rather than requiring developers to hardcode each integration.
The feature is available in preview across four regions: US East N. Virginia, US West Oregon, Europe Frankfurt, and Asia Pacific Sydney.
Pricing is not yet publicly specified, though the use cases described involve micropayments typically under one dollar or fractions of a cent per transaction.
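The pay-on-402 loop with a session cap is easy to picture with a simulation. Everything here is a hypothetical model, not the AgentCore or x402 API: the endpoint is a local function, prices are in micro-dollars to avoid float rounding, and the "proof" string stands in for the signed payment payload a real client would send. The sketch shows the control flow only: request, receive 402 with a quote, check the session limit, pay, retry.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: str = ""
    price_micro_usd: int = 0  # quoted price on a 402 (simplified)


@dataclass
class SessionWallet:
    limit_micro_usd: int
    spent_micro_usd: int = 0

    def pay(self, amount: int) -> str:
        """Spend from the session budget or refuse if the cap would be exceeded."""
        if self.spent_micro_usd + amount > self.limit_micro_usd:
            raise PermissionError("session spending limit exceeded")
        self.spent_micro_usd += amount
        return f"proof-{amount}"  # placeholder for a signed payment payload


def paid_endpoint(payment_proof=None) -> Response:
    """Hypothetical paid API: quotes a price until a payment proof is attached."""
    if payment_proof is None:
        return Response(status=402, price_micro_usd=10_000)
    return Response(status=200, body="premium data")


def fetch_with_payment(endpoint, wallet: SessionWallet) -> Response:
    """Call the endpoint; on 402, pay within the session limit and retry once."""
    resp = endpoint()
    if resp.status == 402:
        proof = wallet.pay(resp.price_micro_usd)
        resp = endpoint(payment_proof=proof)
    return resp


if __name__ == "__main__":
    wallet = SessionWallet(limit_micro_usd=25_000)
    print(fetch_with_payment(paid_endpoint, wallet).body)
    print(wallet.spent_micro_usd)
```

Putting the limit check inside the wallet rather than the agent loop means a misbehaving agent cannot talk its way past the cap.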

41:11 📢 Justin – “And this is where you think you’re gonna make blockchain purchasing more popular, I don’t know if that’s the case.”

43:37 Announcing Valkey 9.0 for Amazon ElastiCache

Amazon ElastiCache now supports Valkey 9.0, which adds built-in full-text and hybrid search capabilities on top of existing vector similarity search, enabling real-time semantic retrieval and aggregations over terabytes of data with microsecond latency at no additional cost.
A notable performance improvement in 9.0 is up to 40% higher throughput for pipelined workloads, achieved through engine-level optimizations like faster command parsing and improved memory prefetching, which could reduce over-provisioning costs for high-throughput applications.
Two new operational features stand out: hash field expiration lets you apply TTLs to individual fields within a hash rather than the entire key, and multi-database support in cluster mode provides logical namespaces that simplify multi-tenant architectures and migrations from standalone Redis environments.
Valkey 9.0 is available now across all commercial AWS Regions, GovCloud, and China Regions for both node-based clusters and serverless caches, with no additional pricing beyond standard ElastiCache costs.
Existing clusters can be upgraded via the AWS Console, SDK, or CLI.
AWS continues to position Valkey as its recommended ElastiCache engine over Redis, and with 100-plus enhancements in this release, teams evaluating a Redis-to-Valkey migration have a more complete feature set to work with, particularly for AI-driven and real-time use cases.
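To make the hash field expiration semantics concrete, here is a small in-memory model. It is not a Valkey client and does not use real command names; the class and method names are ours, and the lazy-expiry-on-read approach is a simplification. It only shows the behavior described above: one field in a hash can age out while its siblings stay put.

```python
import time


class HashWithFieldTTL:
    """Toy model of a hash whose fields can expire individually."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._fields = {}  # field -> (value, expires_at or None)

    def hset(self, field, value, ttl_seconds=None):
        """Store a field, optionally with its own time-to-live."""
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._fields[field] = (value, expires_at)

    def hget(self, field):
        """Return the value, or None if the field is missing or expired."""
        entry = self._fields.get(field)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._fields[field]  # lazily drop the expired field only
            return None
        return value


if __name__ == "__main__":
    session = HashWithFieldTTL()
    session.hset("user", "alice")                   # no TTL
    session.hset("otp", "123456", ttl_seconds=30)   # expires on its own
    print(session.hget("user"), session.hget("otp"))
```

Before per-field TTLs, the usual workaround was to split short-lived values into separate keys, which is the extra bookkeeping this feature removes.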

44:30 Amazon ElastiCache now supports real-time full-text, exact-match, and numeric range search

Amazon ElastiCache now supports full-text, exact-match, and numeric range search directly within the cache layer, eliminating the need for a separate search service and enabling latency as low as microseconds with throughput up to millions of operations per second.
The feature is available at no additional cost for clusters running ElastiCache version 9.0 for Valkey, which AWS positions as its recommended open-source Redis alternative. Existing clusters can be upgraded via the console, SDK, or CLI.
Practical use cases include product inventory lookups, user session filtering, financial transaction range queries, and gaming leaderboards, all scenarios where data changes frequently and search results need to reflect the latest writes immediately.
Developers can combine search types in a single query, for example, filtering by category, price range, and text match simultaneously, which reduces application complexity compared to routing queries across multiple services.
The feature is available across all commercial AWS regions, GovCloud, and China regions, making it broadly accessible for regulated and global workloads without regional limitations.
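Here is what "combine search types in a single query" means in practice, modeled in plain Python. This is not ElastiCache query syntax; the Product type and search function are hypothetical, and a real engine answers from indexes instead of scanning a list. The sketch just shows the three condition types (exact match, numeric range, text match) evaluated together in one pass.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    category: str
    price: float


def search(products, category, min_price, max_price, text):
    """Apply exact-match, numeric-range, and text conditions in one pass."""
    terms = text.lower().split()
    return [
        p for p in products
        if p.category == category                       # exact match
        and min_price <= p.price <= max_price           # numeric range
        and all(t in p.name.lower() for t in terms)     # every text term present
    ]


if __name__ == "__main__":
    catalog = [
        Product("Trail Running Shoes", "footwear", 89.0),
        Product("Road Running Shoes", "footwear", 149.0),
        Product("Running Socks", "accessories", 12.0),
    ]
    for hit in search(catalog, "footwear", 50, 100, "running shoes"):
        print(hit.name)
```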

44:40 Amazon ElastiCache now supports real-time hybrid search with vector and full-text

Amazon ElastiCache now supports hybrid search combining vector similarity and full-text search in a single query, available on ElastiCache for Valkey 9.0 at no additional cost.
This eliminates the need for a separate search service while delivering latency as low as microseconds and up to 99% recall across billions of embeddings.
The feature integrates with popular embedding providers, including Amazon Bedrock, SageMaker, Anthropic, and OpenAI, making it practical for teams already using those services to build RAG systems and AI agent memory without adding infrastructure.
A notable technical detail is that search indexes update in real time as writes complete, meaning applications always query current data rather than relying on batch index refreshes common in traditional search architectures.
Practical use cases include e-commerce product search, where users might combine exact product names with semantic descriptions, and generative AI applications where hybrid retrieval can reduce token costs by surfacing more precise context.
Availability spans all commercial AWS Regions, GovCloud, and China Regions for node-based clusters running Valkey 9.0 or above, with upgrade paths available through the AWS Console, SDK, or CLI.
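A toy version of hybrid retrieval helps show why it can surface tighter context than vectors alone. This is an illustrative sketch, not how ElastiCache implements it: documents are dicts in a list, the "full-text" side is a simple all-terms-present filter, and the two-dimensional vectors are made up. The shape is what matters: filter by text first, then rank the survivors by cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(docs, query_text, query_vec, top_k=2):
    """Keep docs containing every query term, then rank by vector similarity."""
    terms = query_text.lower().split()
    matches = [d for d in docs if all(t in d["text"].lower() for t in terms)]
    matches.sort(key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in matches[:top_k]]


if __name__ == "__main__":
    docs = [
        {"id": "a", "text": "trail running shoes", "vec": [1.0, 0.0]},
        {"id": "b", "text": "road running shoes", "vec": [0.6, 0.8]},
        {"id": "c", "text": "running socks", "vec": [1.0, 0.0]},
    ]
    print(hybrid_search(docs, "running shoes", [0.9, 0.1]))
```

Document "c" is as close as "a" in vector space but never reaches the ranking step because it fails the text condition, which is the precision gain hybrid search is after.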

45:18 📢 Justin – “…Amazon really only provides Nova embedding as like a Bedrock embedding model, unless you go use Cohere or Mistral or any of the others. So it’s definitely something to keep in mind, too. So you might have to bring your own embedding model if you don’t like Nova’s.”

46:30 AWS Capabilities by Region now supports availability notifications

AWS Capabilities by Region in AWS Builder Center now supports availability notifications, letting builders subscribe to alerts when specific services or features launch in their target Regions across 1,500+ services and 37 Regions.
Subscriptions work at the service level, meaning a single subscription to something like Amazon Bedrock automatically covers all underlying features, such as Knowledge Bases and Guardrails, removing the need to track each feature individually.
Notifications come through two channels: real-time in-app alerts within AWS Builder Center and a consolidated weekly email digest, giving builders flexibility in how they stay informed.
Practical use cases include monitoring service parity across Regions, planning migrations, and tracking when specific capabilities land in a target Region before committing to an expansion.
The feature is free for all authenticated users with an AWS Builder ID, with no additional cost to access or manage subscriptions through Settings > Notifications in AWS Builder Center at builder.aws.com/build/capabilities.

46:59 📢 Justin – “…this is great, and something that, as they get more regions out there, becomes a bit of a problem. We used to talk about all the different regions getting services – and we even were talking about them here, even though they were lightning round topic – we were just like, we can’t. It’s ridiculous how these things roll out over time, and what’s available and not available. But having the ability to see what it is, but then now sign up for a notification so I don’t have to go back to the builder center, even better.”

48:53 Introducing Claude Platform on AWS: Anthropic’s native platform, through your AWS account

Claude Platform on AWS gives customers access to Anthropic’s native Claude Platform directly through their AWS account, eliminating the need for separate credentials, contracts, or billing relationships with Anthropic.
AWS is noted as the first cloud provider to offer this native integration.
The service uses IAM credentials and AWS Signature Version 4 for authentication, logs activity to CloudTrail, and bills through AWS Marketplace, meaning teams can manage Claude usage alongside existing AWS governance and cost tracking workflows.
An important technical distinction to understand: Claude Platform on AWS is operated by Anthropic and processes data outside the AWS security boundary, making it different from Claude models on Amazon Bedrock, which stay within AWS infrastructure.
Teams with regional data residency requirements should factor this into their decision.
The service includes access to features like Claude Managed Agents, MCP connector, web search, code execution, and the Files API, with workspaces providing IAM-based access controls and isolation between projects or teams.
Pricing follows Anthropic’s consumption-based model billed through AWS Marketplace, and the service is available across roughly 18 regions spanning North America, South America, Europe, and Asia Pacific.
Teams already using Claude Code or other Anthropic tools can point those clients at their workspace with minimal configuration changes.
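For anyone curious what Signature Version 4 actually does with your credentials, the key derivation step is short enough to show. This sketch covers only that step, using the standard library; it does not build the canonical request or the final signature, and in practice the AWS SDKs handle all of this for you. The secret key, date, and service name below are placeholders, and the service name in particular is hypothetical since the notes above do not state which one the platform uses.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key via the chained HMAC-SHA256 steps.

    date_stamp is YYYYMMDD. The derived key is scoped to one day, one region,
    and one service, so the long-term secret never signs a request directly.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


if __name__ == "__main__":
    key = sigv4_signing_key(
        "EXAMPLE-SECRET-KEY",   # placeholder, never hard-code real secrets
        "20260511",
        "us-east-1",
        "example-service",      # hypothetical service name
    )
    print(key.hex())
```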

49:35 📢 Justin – “This is nice because it’s a little bit better than just having the API. You get all the features that you kind of lose typically by using the Bedrock API with Claude, so things like Chrome browse modes – that works in this as well. And basically, what this really is is a different way to contract and buy your Claude managed services through AWS, which I think is handy.”

51:39 AWS Security Agent’s full repository code scanning feature now available in preview

AWS Security Agent now includes full repository code scanning in preview, offering context-aware security analysis that reasons about entire codebases rather than matching individual lines against known vulnerability patterns like traditional SAST tools do.
The scanner operates in four stages: profiling the application to map entry points and trust boundaries, dispatching specialized agents to high-risk components, deduplicating findings, and independently validating each candidate vulnerability before surfacing it to developers.
A notable distinction from existing tools is how findings are structured, with separate Verified and Could not verify sections, so developers know exactly what was confirmed in code versus what depends on runtime or deployment environment factors.
Practical use cases include running scans before penetration tests to clear lower-hanging issues, auditing acquired or open source code without needing institutional knowledge, and surfacing architectural trust boundary issues alongside implementation bugs.
Full repository code review is available now in preview at no additional charge for existing AWS Security Agent customers, with access through the AWS Security Agent console and a quickstart guide here.

52:56 📢 Jonathan – “I wonder what model they’re using underneath, because it’s not Nova. They trained something. I kind of wonder… it must suck that they built a model that’s so good that they can’t sell it to anybody.”

54:56 Microsoft exec Shawn Bice returns to AWS to lead reliability push for AI agents – GeekWire

Shawn Bice is returning to AWS as VP of AI Services to lead the Automated Reasoning Group, reporting to Swami Sivasubramanian, who oversees Agentic AI at Amazon.
Bice previously ran AWS’s database portfolio, including Aurora, DynamoDB, and RDS, before leaving in 2021.
The Automated Reasoning Group focuses on neurosymbolic AI, which combines pattern-matching capabilities with mathematical verification techniques to confirm software is behaving as intended.
The goal is to give businesses stronger guarantees about AI agent behavior before deploying them autonomously.
This hire comes after AWS acknowledged a limited service disruption in February tied to an AI agent making changes without human oversight, which raised questions about reliability controls in agentic systems.
Bice’s background in security at Microsoft, where he oversaw Security Copilot and Sentinel, appears relevant to addressing those concerns.
For AWS customers building or evaluating agentic AI workflows, this signals that AWS is investing in formal verification and trust mechanisms as a differentiator, rather than relying solely on model-level improvements.
Businesses in regulated industries may find this approach particularly relevant when assessing autonomous agent deployments.

GCP

56:44 Google Is Building an AI Agent That Could Be Its Answer to OpenClaw

Google is internally testing an AI agent called Remy, described as a 24/7 personal agent built on Gemini that can take actions on behalf of users rather than just answering questions or generating content.
It is currently in a dogfooding phase with employees using a staff-only version of the Gemini app.
Remy is designed to integrate deeply across Google services, with the ability to monitor for user-defined priorities, handle complex multi-step tasks proactively, and learn user preferences over time.
This goes beyond the existing Agent Mode features already available in Gemini at various subscription tiers.
No public launch timeline has been confirmed, and Google declined to comment on the project.
Google I/O later this month is expected to feature agent-related announcements, though it is unclear if Remy will be part of that.
For GCP customers and enterprise users, a deeply integrated personal agent that connects across Google Workspace and other services could have practical implications for workflow automation, though no pricing or enterprise deployment details are available yet.
The competitive context here is OpenClaw, a viral third-party AI agent whose talent OpenAI moved to acquire earlier this year.
Google’s internal effort signals that autonomous personal agents are becoming a standard product category rather than a niche capability.

57:25 📢 Justin – “I tried to use a lot of Gemini Enterprise every day at the day job, where we’re a big customer of it, and I’m always disappointed in the limited capabilities it has. I hope this comes quickly, because they need much better capabilities here.”

1:00:44 Google’s Gemma 4 AI models get 3x speed boost by predicting future tokens

Google released Multi-Token Prediction drafters for Gemma 4, offering up to 3x faster token generation through speculative decoding.
The drafter models predict future tokens during idle compute cycles rather than waiting for the main model to process each token sequentially.
The MTP drafter for the E2B model is notably small at 74 million parameters, and it shares the main model’s key value cache to avoid redundant context recalculation. It also uses sparse decoding to narrow down likely token clusters, which contributes to the speed improvement on memory-constrained consumer hardware.
This addresses a specific bottleneck in local AI inference where slow VRAM-to-compute transfers leave processing units idle between tokens. Consumer GPUs lack the high-bandwidth memory found in enterprise hardware, so the drafter fills that gap by doing useful work during those transfer delays.
Gemma 4 ships under an Apache 2.0 license, which is more permissive than the custom license used in prior Gemma versions and broadens options for developers building commercial or derivative applications.
The MTP drafters are currently labeled experimental, so production use cases should account for that status.
The practical audience here is developers running local inference on consumer or prosumer hardware who want faster generation without upgrading to enterprise accelerators. No additional cost is associated with the drafter models beyond the hardware and compute already in use.
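To make the draft-and-verify idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is a generic illustration, not Google's MTP drafter implementation: the function names, the toy "models" (plain functions over lists of integers), and the draft length k are all assumptions made for the example, and it does not model the shared key-value cache or the sparse decoding mentioned above.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one call to the big model per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, drafter: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """The drafter proposes k tokens; the target keeps the prefix it agrees with."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The small drafter cheaply guesses the next k tokens.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(drafter(out + draft))
        # 2. The target checks every guess. A real system does this in one
        #    batched forward pass; the loop here only mimics the logic.
        accepted: List[Token] = []
        for tok in draft:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # the target's own token replaces the miss
                break
        else:
            accepted.append(target(out + accepted))  # bonus token when all guesses hit
        out.extend(accepted)
    return out[:len(prompt) + max_new]


# Hypothetical toy models: the target counts upward mod 10,
# and the drafter agrees except right after a 2.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


slow = greedy_decode(toy_target, [0], 8)
fast = speculative_decode(toy_target, toy_drafter, [0], 8)
assert fast == slow  # same text either way; only the number of target passes differs
```

The output always matches what the main model would have produced on its own, so the speedup depends entirely on how often the drafter guesses right and on whether there is spare compute to run it.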

1:02:00 📢 Jonathan – “But it’s only fast if you’ve got spare compute cycles. You know, if you’re to go full capacity, it’s actually a lot slower.”

1:04:18 Gemini 3.1 Flash-Lite is now generally available

Gemini 3.1 Flash-Lite is now generally available on the Gemini Enterprise Agent Platform, positioned as the lowest-latency and most cost-efficient model in the Gemini 3 series, designed for high-volume automated pipelines and agentic tasks like tool calling and orchestration.
Real-world performance metrics from early adopters are notable: Gladly reported roughly 60% lower costs compared to thinking-tier models, with p95 latency around 1.8 seconds for full reply generation and a 99.6% success rate under heavy concurrent load across SMS, WhatsApp, and Instagram channels.
The model supports multimodal inputs, enabling use cases like simultaneous text and image safety checks in gaming platforms and prompt enhancement for image generation pipelines, areas where cost previously limited sophisticated prompt engineering at scale.
Financial services teams are using Flash-Lite for latency-sensitive workflows, including real-time research during live calls, email triage, and high-volume data processing, with Ramp noting it leads on cost, latency, and intelligence tradeoffs across their model stack.
Pricing details are available here, and documentation for getting started is over here.

1:07:16 GKE node startup gets faster

GKE now delivers up to 4x faster node startup times for qualifying nodes in Autopilot mode, addressing cold-start latency that has historically forced teams to over-provision idle compute as a buffer against scaling delays.
The improvement comes from three architectural changes: intelligent compute buffers, fast-starting virtual machines, and a new control plane that allows VMs to resize without rebooting, all applied automatically without any configuration changes from users.
The feature is immediately available for GKE Autopilot workloads on supported hardware, and Standard cluster users can selectively apply it to specific pods using the Autopilot ComputeClass without migrating their entire cluster.
AI inference and GPU workloads stand to benefit most, since faster node provisioning reduces the gap between a traffic spike and when a model can actually serve requests, which directly affects end-user latency and accelerator costs.
Pricing follows existing GKE Autopilot rates with no additional charge for the faster provisioning, meaning the cost benefit comes indirectly through reduced need for idle standby nodes rather than a new pricing tier.

1:08:31 📢 Justin – “The need for instant, on-demand capacity at a Kubernetes node level feels rare to me, unless you’re doing something like agentic training.”

1:09:33 Postgres 18 and Extended Support for legacy versions in AlloyDB

AlloyDB now supports PostgreSQL 18 in general availability, bringing features like B-tree skip scans, parallel GIN index usage, native UUIDv7 support, and virtual generated columns to Google’s managed Postgres service.
Google is introducing Extended Support for older AlloyDB major versions, giving customers up to three years of continued security patches, bug fixes, and SLA coverage beyond community end-of-life dates.
Pricing for Extended Support has not been announced yet, but will carry an additional fee.
Extended Support timelines are automatically applied, starting with PostgreSQL 14 from February 2027 through February 2030, with similar three-year windows rolling forward for versions 15 through 17. Customers can opt out at any time by upgrading to a version still in regular support.
AlloyDB’s in-place major version upgrade path reduces upgrade time to minutes without requiring data migration or connection string changes, which is notable for large multi-tenant environments like UKG’s People Fabric platform that manages thousands of database objects.
AlloyDB’s compute-storage separation architecture offloads logging and maintenance tasks to a dedicated storage layer, with Google citing up to 2x better price-performance compared to self-managed PostgreSQL and elastic storage that scales automatically without pre-provisioning.

1:09:59 📢 Justin – “I can tell you that this is the feature that I hate the most about both Amazon and Google, doing this extended support; basically a taxing process where they start charging you more money because it’s old. It’s kind of annoying. I get why they do it. I mean, you have to maintain it, maintain test harnesses, all that. But I can’t imagine you’re doing that much changing to the orchestration layer. That code doesn’t have to change. It’s really just a way to tax people and make more money on old stuff, in my opinion.”

Azure

1:12:40 Restore a Deleted Logical Server (Preview) – Azure SQL Database

Azure SQL Database now supports soft delete retention for logical servers, currently in preview, allowing deleted servers to be recovered within a configurable window of 1 to 7 days.
This addresses a long-standing gap where accidental deletion of a logical server meant permanent loss of the server configuration and all associated databases.
The feature is particularly relevant for teams running automation, scripted cleanup jobs, or bulk operations where accidental deletion is a realistic risk. It also benefits dev and test environments, where servers are frequently created and destroyed.
Configuration is straightforward through the Azure portal, PowerShell, or Azure CLI, with the SQL Server Contributor role required to enable or use the feature.
One notable portal limitation is that soft delete retention can only be set on existing servers, not during initial server creation.
There are some meaningful limitations to be aware of: restoring a server does not automatically restore managed identities, Customer Managed Key encryption must be reconfigured after restore, and servers protected by the Microsoft Entra-only authentication policy cannot be restored without first removing that policy.
Servers older than two years automatically get a seven-day soft delete retention period, while servers under two years old have the feature disabled by default, so teams should proactively check and configure retention settings rather than assuming protection is in place. No additional pricing details are listed for this feature beyond standard Azure SQL Database costs.
The real question is: what did Matt delete that made him so anxious to share this story?

1:14:22 AI Subagents ‘Coming Soon’ to Visual Studio Copilot

Microsoft principal product manager Mads Kristensen announced that Copilot subagents are coming soon to Visual Studio, bringing a feature that has been available in VS Code since around GitHub Universe 2025 to the full Windows IDE.
A subagent is an independent AI agent that handles a focused task, such as auditing config files or reviewing test coverage, and returns only a summary to the main agent, which helps manage context window consumption in large projects like complex .NET solutions.
VS Code’s current implementation supports custom subagents with their own tools, instructions, and model selections, parallel subagent execution, and recursive delegation up to a depth of 5, giving developers a sense of where Visual Studio’s implementation may eventually land.
Visual Studio already has built-in agents for debugging, profiling, testing, and modernization via agent mode, but subagent orchestration, where a parent agent delegates work internally to child agents, is not yet documented as an available Visual Studio feature.
No pricing details or specific release date have been provided beyond the coming soon signal, and no pricing changes are expected since this builds on existing Copilot subscription tiers.
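The context-saving idea behind subagents can be sketched in a few lines of Python. This is a toy model, not the Copilot or VS Code API: the Agent class, the delegate method, and the file names are hypothetical, and reading "a depth of 5" as five nested delegations is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

MAX_DEPTH = 5  # delegation depth quoted for VS Code in the notes above


@dataclass
class Agent:
    name: str
    depth: int = 0
    context: List[str] = field(default_factory=list)  # stand-in for the context window

    def delegate(self, task: str, files: List[str]) -> "Agent":
        """Spawn a child agent for one focused task and keep only its summary."""
        if self.depth >= MAX_DEPTH:
            raise RecursionError(f"delegation depth limit of {MAX_DEPTH} reached")
        child = Agent(name=f"{self.name}>{task}", depth=self.depth + 1)
        # The child reads every file, so the bulky detail lands in its own context.
        child.context.extend(f"contents of {path}" for path in files)
        # Only a one-line summary flows back to the parent.
        self.context.append(f"{task}: inspected {len(files)} files")
        return child


main = Agent("main")
child = main.delegate("audit-config", ["appsettings.json", "web.config", "nuget.config"])
print(len(main.context), len(child.context))
```

The parent ends up holding a single summary line while the child holds all three file reads, which is the whole point in a large .NET solution: the detail is paid for once, in a context that gets thrown away.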

1:14:46 📢 Justin – “I guess I’m happy that you finally got this; it’s been in Claude and all the other tools for a while, so congrats that you finally got what everyone else has.”

1:15:03 Public Preview: Migrate Availability Sets to Virtual Machine Scale Sets

Azure is finally letting you migrate VMs out of Availability Sets into Virtual Machine Scale Sets without nuking and rebuilding your workloads — this has been a long-standing pain point for anyone who stood up infrastructure before VMSS Flex was the recommended pattern.
The scale ceiling alone is worth paying attention to; Availability Sets cap out at 200 VMs, VMSS Flex goes to 1,000, and you get autoscaling, rolling upgrades, and zone-level resiliency that Availability Sets never had.
The migration is VM-by-VM and cancellable at any point, which is the right call for production workloads.
You can validate each machine before moving on, and anything not yet migrated stays put in the original set if you bail.
The portal experience does all-at-once migration, so if you need zero-downtime or rolling migration, you’ll want CLI, PowerShell, or the REST API. That distinction is easy to miss if you just click through the guided flow.
Zonal migration is the recommendation for anything that needs serious resiliency. You get a 99.99% SLA vs 99.95% with fault domains only, and you can optionally resize VMs as part of the zonal move, which is a nice bonus if you’ve been wanting to right-size.
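For a sense of what the gap between 99.95% and 99.99% means in practice, here is a quick back-of-the-envelope calculation in Python. It is plain arithmetic on the two SLA figures above, assuming a 365-day year; the function name is made up for the example, and real SLA credits are calculated per the provider's own terms.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(sla_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Downtime budget implied by an availability percentage over a period."""
    return period_minutes * (1 - sla_percent / 100)


for sla in (99.95, 99.99):
    print(f"{sla}% allows about {allowed_downtime_minutes(sla):.1f} minutes of downtime per year")
```

That works out to roughly 263 minutes a year with fault domains only versus about 53 minutes with the zonal SLA, a fivefold difference in the downtime budget.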

After Show

1:18:46 Introducing Googlebook, designed for Gemini Intelligence

Google announced Googlebook, a new laptop category that merges Android and ChromeOS into a single platform, with devices expected from Acer, ASUS, Dell, HP, and Lenovo this fall. No pricing has been announced yet.
The Magic Pointer feature, developed with Google DeepMind, adds contextual Gemini suggestions directly to the cursor, allowing users to interact with on-screen content like dates, images, and text without switching applications.
A Create your Widget feature lets users generate custom desktop widgets through natural language prompts, pulling data from Gmail, Calendar, and web searches into a single personalized dashboard.
Quick Access enables direct browsing of phone files from the Googlebook file manager without manual transfers, and Android phone apps can be used online on the laptop without leaving the current workflow.
This announcement is primarily a consumer hardware story rather than a GCP or enterprise cloud infrastructure development, so GCP-focused listeners should note the Gemini integration angle but temper expectations about direct cloud platform implications until more technical details are released at googlebook.com.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, at theCloudPod.net, where you can join our newsletter and Slack team, send feedback, or ask questions, or tweet at us with the hashtag #theCloudPod.
