327: AWS Finally Admits Kubernetes is Hard, Makes Robots Do It Instead

Welcome to episode 327 of The Cloud Pod, where the forecast is always cloudy! Justin, Matt, and Ryan are here to bring you all the latest news (and a few rants) in the worlds of Cloud and AI. I’m sure all our readers are aware of the AWS outage last week, as it was in all the news everywhere. But we’ve also got some new AI models (including Sora in case you’re low on really crappy videos the youths might like), plus EKS, Kubernetes, Vertex AI, and more. Let’s get started! 

Titles we almost went with this week

  • ☁️ Oracle and Azure Walk Into a Cloud Bar: Nobody Gets ETL’d
  • ☕ When DNS Goes Down, So Does Your Monday: AWS Takes Half the Internet on a Coffee Break
  • 📱 404 Cloud Not Found: AWS Proves Even the Internet’s Phone Book Can Get Lost
  • 🧑‍🤝‍🧑 DNS: Definitely Not Staffed – How AWS Lost Its Way When It Lost Its People
  • 👩‍❤️‍👨 When Larry Met Satya: A Cloud Love Story
  • 🔍 Azure Finally Answers ‘Dude, Where’s My Data?’ with Storage Discovery
  • 💔 Breaking: Microsoft Discovers AI Training Uses More Power Than a Small Country
  • 🧑‍🔬 404 Engineers Not Found – AWS Learns the Hard Way That People Are Its Most Critical Infrastructure
  • 🪡 Azure Storage Discovery: Finding Your Data Needles in the Cloud Haystack
  • 🚙 EKS Auto Mode: Because Even Your Clusters Deserve Cruise Control
  • 🎞️Azure Gets Reel: Microsoft Adds Video Generation to AI Foundry
  • 🪙 The Great Token Heist: Vertex AI Steals 90% Off Your Gemini Bills
  • 👛 Cache Me If You Can: Vertex AI’s Token-Saving Feature
  • 🧑‍💼 IaC Just Got a Manager – And It’s Not Your Boss 
  • 🤖 From Musk to Microsoft: Grok 4 Makes the Great Cloud Migration
  • 🦺 No, Harness… You Are Not Going to Make IaCM Happen
  • 👾 Microsoft Drafts a Solution to Container Creation Chaos
  • 🐚 PowerShell to the People: Azure Simplifies the Great Gateway Migration
  • 🏠 IP There Yet? Azure’s Scripts Keep Your Address While You Upgrade

Follow Up

00:53 Glacier Deprecation Email

  • Standalone Amazon Glacier service (vault-based with separate APIs) will stop accepting new customers as of December 15, 2025. 
  • S3 Glacier storage classes (Instant Retrieval, Flexible Retrieval, Deep Archive) are completely unaffected and continue normally.
  • Existing Glacier customers can keep using it forever – no forced migration required. 
  • AWS is essentially consolidating around S3 as the unified storage platform, rather than maintaining two separate archival services.
  • The standalone service will enter maintenance mode, meaning there will be no new features, but the service will remain operational.
  • Migration to S3 Glacier is optional but recommended for better integration, lower costs, and more features. (Justin assures us it is actually slightly cheaper, so there’s that.) 

General News 

02:24 F5 discloses major security breach linked to nation-state hackers – GeekWire

  • F5 disclosed that nation-state hackers maintained persistent access to their internal systems over the summer of 2024, stealing portions of BIG-IP source code and vulnerability details before containment in August.
  • The breach compromised product development and engineering systems, but did not affect customer CRM data, financial systems, or F5’s software supply chain, according to independent security audits.
  • F5 has released security patches for BIG-IP, F5OS, and BIG-IP Next products and is providing threat-hunting guides to help customers monitor for suspicious activity.
  • This represents the first publicly disclosed breach of F5’s internal systems, notable given that F5 handles traffic for 80% of Fortune Global 500 companies through its load-balancing and security services.
  • The incident highlights supply chain security concerns, as attackers targeted source code and vulnerability information, rather than customer data, potentially seeking ways to exploit F5 products deployed across enterprise networks.

03:12 📢 Justin – “A little concerning on this one, mostly because F5 is EVERYWHERE.” 

AI is Going Great – Or How ML Makes Money 

04:55 Claude Code gets a web version—but it’s the new sandboxing that really matters – Ars Technica

  • Anthropic launched web and mobile interfaces for Claude Code, their CLI-based AI coding assistant, with the web version supporting direct access to GitHub repositories and the ability to process general instructions, such as “add real-time inventory tracking to the dashboard.”
  • The web interface introduces multi-session support, allowing developers to run and switch between multiple coding sessions simultaneously through a left-side panel, plus the ability to provide mid-task corrections without canceling and restarting.
  • A new sandboxing runtime has been implemented to improve security and reduce friction, moving away from the previous approach, where Claude Code required permission for most changes and steps during execution.
  • The mobile version is currently limited to iOS and is at an earlier development stage than the web interface, indicating a phased rollout approach.
  • This positions Claude Code as a more accessible alternative to traditional CLI-only AI coding tools, potentially expanding its reach to developers who prefer web-based interfaces over command-line environments

05:51 📢 Ryan – “I haven’t had a chance to play with the web version, but I am interested in it just because I found the terminal interface limiting, but I also feel like a lot of the value is in that local sort of execution and not in the sandbox. A lot of the tasks I do are internal and require access to either company resources or private networks, or the kind of thing where you’re not going to get that from a publicly hosted sandbox environment.”

08:36 Open Source: Containerization Assist MCP Server 

  • Containerization Assist automates the tedious process of creating Dockerfiles and Kubernetes manifests, eliminating manual errors that plague developers during the containerization process.
  • Built on AKS Draft’s proven foundation, this open-source tool goes beyond basic AI coding assistants by providing a complete containerization platform rather than just code suggestions.
  • The tool addresses a critical pain point where developers waste hours writing boilerplate container configurations and debugging deployment issues caused by manual mistakes. (Listener beware, Justin mini rant here.) 
  • As an open-source MCP (Model Context Protocol) server, it integrates seamlessly with existing development workflows while leveraging Microsoft’s containerization expertise from Azure Kubernetes Service. (Expertise is a stretch.) 
  • This launch signals Microsoft’s commitment to simplifying Kubernetes adoption by removing the steep learning curve associated with container orchestration and manifest creation – or you could just use a PaaS. 

09:47 📢 Matt – “The piece I did like about this is that it integrated in as an optional feature, kind of the trivia and the security thing. So it’s not just setting it up, but they integrated the next steps of security code scanning. It’s not Microsoft saying, you know, hey, it’s standard … they are building security in, hopefully.”

Cloud Tools 

33:09 IaC is Great, But Have You Met IaCM?

  • IaCM (Infrastructure as Code Management) extends traditional IaC by adding lifecycle management capabilities, including state management, policy enforcement, and drift detection to handle the complexity of infrastructure at scale.
  • Key features include centralized state file management with version control, module and provider registries for reusable components, and automated policy enforcement to ensure compliance without slowing down teams.
  • The platform integrates directly into CI/CD workflows with visual PR insights showing cost estimates and infrastructure changes before deployment, solving the problem of unexpected costs and configuration conflicts.
  • IaCM addresses critical pain points like configuration drift, secret exposure in state files, and resource conflicts when multiple teams work on the same infrastructure simultaneously.
  • Harness IaCM specifically supports OpenTofu and Terraform with features like Variable Sets, Workspace Templates, and Default Pipelines to standardize infrastructure delivery across organizations.

13:04 📢 Justin – “So let me boil this down for you. We created our own Terraform Enterprise or Terraform Cloud, but we can’t use that name because it’s copyrighted. So we’re going to try to create a new thing and pretend we invented this – and then try to sell it to you as our new Terraform or OpenTofu replacement for your management tier.”

HugOps Corner – Previously Known as AWS 

41:08 AWS outage hits major apps and services, resurfacing old questions about cloud redundancy – GeekWire

  • AWS US-EAST-1 experienced a major outage starting after midnight Pacific on Monday, caused by DNS resolution issues with DynamoDB that prevented proper address lookup for database services, impacting thousands of applications, including Facebook, Snapchat, Coinbase, ChatGPT, and Amazon’s own services.
  • The outage highlighted ongoing redundancy concerns as many organizations failed to implement proper failover to other regions or cloud providers, despite similar incidents in US-EAST-1 in 2017, 2021, and 2023, raising questions about single-region dependency for critical infrastructure.
  • AWS identified the root cause as an internal subsystem responsible for monitoring network load balancer health, with core DNS issues resolved by 3:35 AM Pacific, though Lambda backlog processing and EC2 instance launch errors persisted through the morning recovery period.
  • Real-world impacts included LaGuardia Airport check-in kiosk failures, causing passenger lines, widespread disruption to financial services (Venmo, Robinhood), gaming platforms (Roblox, Fortnite), and productivity tools (Slack, Canva), demonstrating the cascading effects of cloud provider outages.
  • The incident underscores the importance of multi-region deployment strategies and proper disaster recovery planning for AWS customers, particularly those using US-EAST-1 as their primary region due to its status as AWS’s oldest and largest data center location.
  • We have a couple of observations: this one took a LONG time to resolve, including hours before the DNS was restored. Maybe they’re out of practice? Maybe it’s a people problem? Hopefully, this isn’t the new norm as some of the talent have been let go/moved on. 

17:53 📢 Ryan – “If it’s a DNS resolution issue that’s causing a global outage, that’s not exactly straightforward. It’s not just a bug, you know, or a function returning the wrong value – you’re looking at global propagation, you’re looking at clients in different places, resolving different things, at the base parts of the internet for functionality. And so it does take a pretty experienced engineer to sort of have that in their heads conceptually in order to troubleshoot. I wonder if that’s really the cause, where they’re not able to recover as fast. But I also feel like cloud computing has come a long way, and the impact was very widely felt because a lot more people are using AWS as their hosting provider than I think have been in the past. A little bit of everything, I think.”

AWS outage was not due to a cyberattack — but shows potential for ‘far worse’ damage – GeekWire

  • AWS’s US-EAST-1 region experienced an outage due to an internal monitoring subsystem failure affecting network load balancers, impacting major services including Facebook, Coinbase, and LaGuardia Airport check-in systems. 
  • The issue was related to DNS resolution problems with DynamoDB, not a cyberattack.
  • The incident highlights ongoing single-region dependency issues, as US-EAST-1 remains AWS’s largest region and has caused similar widespread disruptions in 2017, 2021, and 2023. Many organizations still lack proper multi-region failover despite repeated outages from this location.
  • Industry experts warn that the outage demonstrates vulnerability to potential targeted attacks on cloud infrastructure monoculture. The concentration of services on single providers creates systemic risk similar to agricultural monoculture, where one failure can cascade widely.
  • The failure occurred at the control-plane level, suggesting AWS should implement more aggressive isolation of critical networking components. This may accelerate enterprise adoption of multi-cloud and multi-region architectures as baseline resilience requirements.
  • AWS resolved the issue within hours but the incident reinforces that even major cloud providers remain vulnerable to cascading failures when core monitoring and health check systems malfunction, affecting downstream services across their infrastructure.

Today is when Amazon’s brain drain finally caught up with AWS • The Register

  • AWS experienced a major outage on October 20, 2025 in the US-EAST-1 region caused by DNS resolution failures for DynamoDB endpoints, taking 75 minutes just to identify the root cause and impacting banking, gaming, social media, and government services across much of the internet.
  • The incident highlights concerns about AWS’s talent retention, with 27,000+ Amazon layoffs between 2022-2025 and internal documents showing 69-81% regretted attrition, suggesting loss of senior engineers who understood complex failure modes and had institutional knowledge of AWS systems.
  • DynamoDB’s role as a foundational service meant the DNS failure created cascading impacts across multiple AWS services, demonstrating the risk of centralized dependencies in cloud architectures and the importance of regional redundancy for critical workloads.
  • AWS’s status page showed “all is well” for the first 75 minutes of the outage, continuing a pattern of slow incident communication that AWS has acknowledged as needing improvement in multiple previous post-mortems from 2011, 2012, and 2015.
  • The article suggests this may be a tipping point where the loss of experienced staff who built these systems is beginning to impact AWS’s legendary operational excellence, with predictions that similar incidents may become more frequent as institutional knowledge continues to leave.

– And that’s the end of HugOps. Moving on to the rest of AWS –

23:58 Monitor, analyze, and manage capacity usage from a single interface with Amazon EC2 Capacity Manager | AWS News Blog

  • EC2 Capacity Manager provides a single dashboard to monitor and manage EC2 capacity across all accounts and regions, eliminating the need to collect data from multiple AWS services like Cost and Usage Reports, CloudWatch, and EC2 APIs. 
  • Available at no additional cost in all commercial AWS regions.
  • The service aggregates capacity data with hourly refresh rates for On-Demand Instances, Spot Instances, and Capacity Reservations, displaying utilization metrics by vCPUs, instance counts, or estimated costs based on published On-Demand rates.
  • Key features include automated identification of underutilized Capacity Reservations with specific utilization percentages by instance type and AZ, plus direct modification capabilities for ODCRs within the same account.
  • Data exports to S3 extend analytics beyond the 90-day console retention period, enabling long-term capacity trend analysis and integration with existing BI tools or custom reporting systems.
  • Organizations can enable cross-account visibility through AWS Organizations integration, helping identify optimization opportunities like redistributing reservations between development accounts showing 30% utilization and production accounts exceeding 95%.

25:45 📢 Ryan – “This is kind of nice to have it built in and just have it be plug and play – especially when it’s at no cost.” 

26:21 New Amazon EKS Auto Mode features for enhanced security, network control, and performance | Containers

  • EKS Auto Mode now supports EC2 On-Demand Capacity Reservations and Capacity Blocks for ML, allowing customers to target pre-purchased capacity for AI/ML workloads requiring guaranteed access to specialized instances like P5s. This addresses the challenge of GPU availability for training jobs without over-provisioning.
  • New networking capabilities include separate pod subnets for isolating infrastructure and application traffic, explicit public IP control for enterprise security compliance, and forward proxy support with custom certificate bundles. These features enable integration with existing enterprise network architectures without complex CNI customizations.
  • Complete AWS KMS encryption now covers both ephemeral storage and root volumes using customer-managed keys, addressing security audit findings that previously flagged unencrypted storage. 
  • This eliminates the need for custom AMIs or manual certificate distribution.
  • Performance improvements include multi-threaded node filtering and intelligent capacity management that can automatically relax instance diversity constraints during capacity shortages. 
  • These optimizations particularly benefit time-sensitive applications and AI/ML workloads requiring rapid scaling.
  • EKS Auto Mode is available for new clusters or can be enabled on existing EKS clusters running Kubernetes 1.29+, with migration guides available for teams moving from managed node groups, Karpenter, or Fargate.
  • Pricing follows standard EKS pricing at $0.10 per cluster per hour plus EC2 instance costs.

27:33 📢 Ryan – “This just highlights how terrible it was before.” 

29:33 Amazon EC2 now supports Optimize CPUs for license-included instances

  • EC2 now lets customers reduce vCPU counts and disable hyperthreading on Windows Server and SQL Server license-included instances, enabling up to 50% savings on vCPU-based licensing costs while maintaining full memory and IOPS performance.
  • This feature targets database workloads that need high memory and IOPS but fewer vCPUs – for example, an r7i.8xlarge instance can be reduced from 32 to 16 vCPUs while keeping its 256 GiB memory and 40,000 IOPS.
  • The CPU optimization extends EC2’s existing Optimize CPUs feature to license-included instances, addressing a common pain point where customers overpay for Microsoft licensing due to fixed vCPU counts.
  • Available now in all commercial AWS regions and GovCloud regions, with no additional charges beyond the adjusted licensing costs based on the modified vCPU count.
  • This positions AWS competitively against Azure for SQL Server workloads by offering more granular control over licensing costs, particularly important as organizations migrate legacy database workloads to the cloud.
  • Interested in CPU options? Check those out here
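
The Optimize CPUs knob described above is set through EC2’s `CpuOptions` at launch. As a rough sketch (the AMI, subnet, and instance IDs below are placeholders, not values from the announcement), halving the cores and disabling hyperthreading on an r7i.8xlarge might look like:

```shell
# Launch an r7i.8xlarge with 16 physical cores and one thread per core,
# so vCPU-based licensing counts 16 vCPUs instead of the default 32,
# while memory and IOPS stay at full instance-size levels.
aws ec2 run-instances \
  --instance-type r7i.8xlarge \
  --image-id ami-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0 \
  --cpu-options "CoreCount=16,ThreadsPerCore=1"

# Verify the effective CPU configuration after launch:
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query "Reservations[].Instances[].CpuOptions"
```

Note that `CpuOptions` can only be set when the instance is launched, so applying this to an existing fleet means relaunching instances with the reduced configuration.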

30:20 📢 Justin – “This is a little weird to me, because I thought this already existed.” 

31:46 AWS Systems Manager Patch Manager launches security updates notification for Windows

  • AWS Systems Manager Patch Manager now includes an “AvailableSecurityUpdate” state that identifies Windows security patches available but not yet approved by patch baseline rules, helping prevent accidental exposure from delayed patch approvals.
  • The feature addresses a specific operational risk where administrators using ApprovalDelay with extended timeframes could unknowingly leave systems vulnerable, with instances marked as Non-Compliant by default when security updates are pending.
  • Available across all AWS Systems Manager regions with no additional charges beyond standard pricing, the feature integrates directly into existing patch baseline configurations through the console at https://console.aws.amazon.com/systems-manager/patch-manager.
  • Organizations can customize compliance reporting behavior to maintain existing workflows while gaining visibility into security patch availability across their Windows fleet, particularly useful for enterprises with complex patch approval processes.
  • The update provides a practical solution for balancing security requirements with operational stability, allowing teams to maintain patch deployment schedules while staying informed about critical security updates awaiting approval.
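
For fleets already using Patch Manager, the new state should surface through the usual compliance APIs. A hedged sketch (the instance ID is a placeholder, and we’re assuming the new state value is filterable like the existing ones):

```shell
# List patches on an instance that are sitting in the new
# AvailableSecurityUpdate state - available security updates that the
# patch baseline's approval rules have not yet approved.
aws ssm describe-instance-patches \
  --instance-id i-0123456789abcdef0 \
  --filters "Key=State,Values=AvailableSecurityUpdate"
```

This gives teams a signal to review pending security patches without waiting out the baseline’s `ApprovalDelay` window blind.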

30:20 📢 Ryan – “It sounds like just a quality of life improvement, but it’s something that should be so basic, but isn’t there, right? Which is like Windows patch management is cobbled together and not really managed well, and so you could have a patch available, but the only way to find out that it was available previously to this was to actually go ahead and patch it and then see if it did something. And so now, at least you have a signal on that; you can apply your patches in a way that’s not going to take down your entire service if a patch goes wrong. So this is very nice. I think for people using the Systems Manager patch management, they’re going to be very happy with this.”

35:26 Introducing CLI Agent Orchestrator: Transforming Developer CLI Tools into a Multi-Agent Powerhouse | AWS Open Source Blog

  • AWS introduces CLI Agent Orchestrator (CAO), an open source framework that enables multiple AI-powered CLI tools like Amazon Q CLI and Claude Code to work together as specialized agents under a supervisor agent, addressing limitations of single-agent approaches for complex enterprise development projects.
  • CAO uses hierarchical orchestration with tmux session isolation and Model Context Protocol servers to coordinate specialized agents – for example, orchestrating Architecture, Security, Performance, and Test agents simultaneously during mainframe modernization projects.
  • The framework supports three orchestration patterns (Handoff for synchronous transfers, Assign for parallel execution, Send Message for direct communication) plus scheduled runs using cron-like automation, with all processing occurring locally for security and privacy.
  • Currently supports Amazon Q Developer CLI and Claude Code with planned expansion to OpenAI Codex CLI, Gemini CLI, Qwen CLI, and Aiden – no pricing mentioned as it’s open source, available at github.com/awslabs/cli-agent-orchestrator.
  • Key use cases include multi-service architecture development, enterprise migrations requiring parallel implementation, comprehensive research workflows, and multi-stage quality assurance processes that benefit from coordinated specialist agents.
  • We definitely appreciate another tool in the Agent Orchestration world. 

37:46 Amazon ECS now publishes AWS CloudTrail data events for insight into API activities

  • Amazon ECS now publishes CloudTrail data events for ECS Agent API activities, enabling detailed monitoring of container instance operations, including polling (ecs:Poll), telemetry sessions (ecs:StartTelemetrySession), and managed instance logging (ecs:PutSystemLogEvents).
  • Security and operations teams gain comprehensive audit trails to detect unusual access patterns, troubleshoot agent communication issues, and understand how container instance roles are utilized for compliance requirements.
  • The feature uses the new data event resource type AWS::ECS::ContainerInstance and is available for ECS on EC2 in all AWS regions, with ECS Managed Instances supported in select regions.
  • Standard CloudTrail data event charges apply – typically $0.10 per 100,000 events recorded, making this a cost-effective solution for organizations needing detailed container instance monitoring.
  • This addresses a previous visibility gap in ECS operations, as teams can now track agent-level activities that were previously opaque, improving debugging capabilities and security posture for containerized workloads.
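
Since these are data events, they have to be opted into per trail. A sketch of enabling them with an advanced event selector, using the resource type named in the announcement (the trail name is a placeholder):

```shell
# Record ECS container-instance data events on an existing CloudTrail
# trail, scoped to the new AWS::ECS::ContainerInstance resource type.
aws cloudtrail put-event-selectors \
  --trail-name my-trail \
  --advanced-event-selectors '[
    {
      "Name": "ECS agent API activity",
      "FieldSelectors": [
        { "Field": "eventCategory", "Equals": ["Data"] },
        { "Field": "resources.type", "Equals": ["AWS::ECS::ContainerInstance"] }
      ]
    }
  ]'
```

Given Ryan’s warning below about event volume, it may be worth adding further field selectors to narrow what gets recorded before turning this on fleet-wide.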

39:33 📢 Ryan – “This is definitely something I would use sparingly, because the ECS agent API is chatty. So this seems like it would get very expensive, very fast.”

GCP

41:22 G4 VMs powered by NVIDIA RTX 6000 Blackwell GPUs are GA | Google Cloud Blog

  • Google Cloud launches G4 VMs with NVIDIA RTX 6000 Blackwell GPUs, offering up to 9x throughput improvement over G2 instances and supporting workloads from AI inference to digital twin simulations with configurations of 1, 2, 4, or 8 GPUs.
  • The G4 VMs feature enhanced PCIe-based peer-to-peer data paths that deliver up to 168% throughput gains and 41% lower latency for multi-GPU workloads, addressing the bottleneck issues common in serving large generative AI models that exceed single GPU memory limits.
  • Each GPU provides 96GB of GDDR7 memory (up to 768GB total), native FP4 precision support, and Multi-Instance GPU capability that allows partitioning into 4 isolated instances, enabling efficient serving of models from under 30B to over 100B parameters.
  • NVIDIA Omniverse and Isaac Sim are now available on Google Cloud Marketplace as turnkey solutions for G4 VMs, enabling immediate deployment of industrial digital twin and robotics simulation applications with full integration across GKE, Vertex AI, Dataproc, and Cloud Run.
  • G4 VMs are available immediately with broader regional availability than previous GPU offerings, though specific pricing details were not provided in the announcement – customers should contact Google Cloud sales for cost information. (AKA $$$$.)  

43:03 Dataproc 2.3 on Google Compute Engine | Google Cloud Blog

  • Dataproc 2.3 introduces a lightweight, FedRAMP High-compliant image that contains only essential Spark and Hadoop components, reducing CVE exposure and meeting strict security requirements for organizations handling sensitive data.
  • Optional components like Flink, Hive WebHCat, and Ranger are now deployed on-demand during cluster creation rather than pre-packaged, keeping clusters lean by default while maintaining full functionality when needed.
  • Custom images allow pre-installation of required components to reduce cluster provisioning time while maintaining the security benefits of the lightweight base image.
  • The image supports multiple operating systems, including Debian 12, Ubuntu 22, and Rocky 9, with deployment as simple as specifying version 2.3 when creating clusters via gcloud CLI.
  • Google employs automated CVE scanning and patching combined with manual intervention for complex vulnerabilities to maintain compliance standards and security posture.
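
Putting the bullets above together, creating a lean 2.3 cluster and pulling in an optional component on demand might look like this (cluster name and region are placeholders):

```shell
# Create a Dataproc cluster on the lightweight 2.3 image (Debian 12
# variant), adding Flink on demand rather than getting it pre-packaged.
gcloud dataproc clusters create my-cluster \
  --region us-central1 \
  --image-version 2.3-debian12 \
  --optional-components FLINK
```

Clusters that skip `--optional-components` entirely get only the essential Spark/Hadoop stack, which is the point of the lightweight image.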

44:14 📢 Ryan – “But on the contrary, like FedRAMP has such tight SLAs for vulnerability management that you don’t have to carry this risk or request an exception because of Google not patching Flink as fast as you would like them to. At least this puts the control at the end user, where they can say, well, I’m not going to use it.”

44:45 BigQuery Studio gets improved console interface | Google Cloud Blog

  • BigQuery Studio’s new interface introduces an expanded Explorer view that allows users to filter resources by project and type, with a dedicated search function that spans across all BigQuery resources within an organization – addressing the common pain point of navigating through large-scale data projects.
  • The Reference panel provides context-aware information about tables and schemas directly within the code editor, eliminating the need to switch between tabs or run exploratory queries just to check column names or data types – particularly useful for data analysts writing complex SQL queries.
  • Google has streamlined the workspace by moving job history to a dedicated tab accessible from the Explorer pane and removing the bottom panel clutter, while also allowing users to control tab behavior with double-click functionality to prevent unwanted tab replacements.
  • The update includes code generation capabilities where clicking on table elements in the Reference panel automatically inserts query snippets or field names into the editor, reducing manual typing errors and speeding up query development workflows.
  • This interface refresh targets data analysts, data engineers, and data scientists who need efficient navigation across multiple BigQuery projects and datasets – no pricing changes mentioned as this appears to be a UI update to the existing BigQuery Studio service.

46:00 📢 Ryan – “Although I’m a little nervous about having all the BigQuery resources across an organization available on a single console, just because it sounds like a permissions nightmare.” 

47:10 Manage your prompts using Vertex SDK | Google Cloud Blog

  • Google launches GA of Prompt Management in Vertex AI SDK, enabling developers to create, version, and manage prompts programmatically through Python code rather than tracking them in spreadsheets or text files.
  • The feature provides seamless integration between Vertex AI Studio’s visual interface for prompt design and the SDK for programmatic management, with prompts stored as centralized resources within Google Cloud projects for team collaboration.
  • Enterprise security features include Customer-Managed Encryption Keys (CMEK) and VPC Service Controls (VPCSC) support, addressing compliance requirements for organizations handling sensitive data in their AI applications.
  • Key use cases include teams building production generative AI applications that need version control, consistent prompt deployment across environments, and the ability to programmatically update prompts without manual code changes.
  • Pricing follows standard Vertex AI model usage rates with no additional charges for prompt management itself; documentation available at cloud.google.com/vertex-ai/generative-ai/docs/model-reference/prompt-classes.

47:43 📢 Justin – “If your prompt has sensitive data in it, I have questions already.” 

49:05 Gemini Code Assist in GitHub for Enterprises | Google Cloud Blog

  • Google launches Gemini Code Assist for GitHub Enterprise, bringing AI-powered code reviews to enterprise customers using GitHub Enterprise Cloud and on-premises GitHub Enterprise Server. 
  • This addresses the bottleneck where 60.2% of organizations take over a day for code changes to reach production due to manual review processes.
  • The service provides organization-level controls, including centralized custom style guides and org-wide configuration settings, allowing platform teams to enforce coding standards automatically across all repositories. 
  • Individual teams can still customize repo-level settings while maintaining organizational baselines.
  • Built under Google Cloud Terms of Service, the enterprise version ensures code prompts and model responses are stateless and not stored, with Google committing not to use customer data for model training without permission. This addresses enterprise security and compliance requirements for AI-assisted development.
  • Currently in public preview with access through the Google Cloud Console, the service includes a higher pull request quota than the individual developer tier. Google is developing additional features, including agentic loop capabilities for automated issue resolution and bug fixing.
  • This release complements the recently launched Code Review Gemini CLI Extension for terminal-based AI assistance and represents part of Google’s broader strategy to provide AI assistance across the entire software development lifecycle. 
  • Pricing details are not specified in the announcement.

51:08 📢 Ryan – “It’s just sort of the ability to sort of do organization-wide things that is super powerful for these tools, and I’m just sort of surprised that GitHub allows that. It seems like they would have to develop API hooks and externalize that.”

53:19 Vertex AI context caching | Google Cloud Blog

  • Vertex AI context caching reduces costs by 90% for repeated content in Gemini models by storing precomputed tokens – implicit caching happens automatically, while explicit caching gives developers control over what content to cache for predictable savings
  • The feature supports caching from 2,048 tokens up to Gemini 2.5 Pro’s 1 million token context window across all modalities (text, PDF, image, audio, video) with both global and regional endpoint support
  • Key use cases include document processing for financial analysis, customer support chatbots with detailed system instructions, codebase Q&A for development teams, and enterprise knowledge base queries
  • Implicit caching is enabled by default with no code changes required and clears within 24 hours, while explicit caching charges standard input token rates for initial caching, then a 90% discount on reuse, plus hourly storage fees based on TTL. 
  • Integration with Provisioned Throughput ensures production workloads benefit from caching, and explicit caches support Customer Managed Encryption Keys (CMEK) for additional security compliance
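The 90% cached-token discount described above is easiest to feel with some back-of-the-envelope math. This sketch uses an illustrative placeholder input-token rate (not published Vertex AI pricing — check the Vertex AI pricing page for real numbers) and ignores the hourly cache-storage fee:

```python
# Rough cost comparison for explicit context caching.
# INPUT_RATE is a hypothetical placeholder, not real Gemini pricing.
INPUT_RATE = 1.25 / 1_000_000   # assumed $ per input token
CACHE_DISCOUNT = 0.90           # cached tokens bill at a 90% discount

def prompt_cost(cached_tokens: int, fresh_tokens: int) -> float:
    """Cost of one request: cached prefix at 10% of the rate, fresh suffix at full rate."""
    cached = cached_tokens * INPUT_RATE * (1 - CACHE_DISCOUNT)
    fresh = fresh_tokens * INPUT_RATE
    return cached + fresh

# A 500k-token document cached once, then queried 100 times with 1k-token questions:
with_cache = 500_000 * INPUT_RATE + 100 * prompt_cost(500_000, 1_000)
without_cache = 100 * (500_000 + 1_000) * INPUT_RATE

print(f"with cache:    ${with_cache:.2f}")     # ~$7
print(f"without cache: ${without_cache:.2f}")  # ~$63
```

The more requests reuse the same large prefix, the closer effective savings climb toward the full 90% — which is why document Q&A and chatbots with long system instructions are the headline use cases.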

54:18 📢 Ryan – “This is awesome. If you have a workload where you’re gonna have very similar queries or prompts and have it return similar data, this is definitely nicer than having to regenerate that every time. They’ve been moving more and more towards this. And I like to see it sort of more at a platform level now, whereas you could sort of implement this – in a weird way – directly in a model, like in a notebook or something. This is more of a ‘turn it on and it works’.”

55:30 Cloud Armor named Strong Performer in Forrester WAVE, new features launched

56:59📢 Ryan – “These are some pretty advanced features for a cloud platform provided WAF. It’s pretty cool.” 

Azure

58:44 Generally Available: Observed capacity metric in Azure Firewall 

  • Azure Firewall’s new observed capacity metric provides real-time visibility into capacity unit utilization, helping administrators track actual scaling behavior versus provisioned capacity for better resource optimization and cost management.
  • This observability enhancement addresses a common blind spot where teams over-provision firewall capacity due to uncertainty about actual usage patterns, potentially reducing unnecessary Azure spending on unused capacity units.
  • The metric integrates with Azure Monitor and existing alerting systems, enabling proactive capacity planning and automated scaling decisions based on historical utilization trends rather than guesswork.
  • Target customers include enterprises with variable traffic patterns and managed service providers who need granular visibility into firewall performance across multiple client deployments to optimize resource allocation.
  • While pricing remains unchanged for Azure Firewall itself (starting at $1.25/hour plus $0.016/GB processed), the metric helps justify right-sizing decisions that could significantly impact monthly costs for organizations running multiple firewall instances.
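Using the rates quoted above ($1.25/hour plus $0.016/GB processed), here's a rough sketch of what right-sizing on observed-capacity data can save. The deployment counts and traffic volume are invented for illustration; real bills vary by region and SKU:

```python
# Rough monthly cost for Azure Firewall Standard deployments at the
# quoted rates. Numbers are illustrative, not a pricing calculator.
HOURLY_RATE = 1.25      # $/hour per deployment
DATA_RATE = 0.016       # $/GB processed
HOURS_PER_MONTH = 730   # Azure's usual monthly-hours convention

def monthly_cost(deployments: int, gb_processed: float) -> float:
    return deployments * HOURLY_RATE * HOURS_PER_MONTH + gb_processed * DATA_RATE

# Suppose the observed capacity metric shows two of five firewalls sit idle:
overprovisioned = monthly_cost(5, 10_000)
right_sized = monthly_cost(3, 10_000)
print(f"savings: ${overprovisioned - right_sized:,.2f}/month")
```

Even at small scale, the per-deployment hourly charge dominates, which is exactly the over-provisioning blind spot this metric is meant to expose.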

Generally Available: Prescaling in Azure Firewall 

  • Azure Firewall prescaling allows administrators to reserve capacity units in advance for predictable traffic spikes like holiday shopping seasons or product launches, eliminating the lag time typically associated with auto-scaling firewall resources.
  • This feature addresses a common pain point where Azure Firewall’s auto-scaling couldn’t respond quickly enough to sudden traffic surges, potentially causing performance degradation during critical business events.
  • Prescaling integrates with Azure’s existing capacity planning tools and can be configured through Azure Portal, PowerShell, or ARM templates, making it accessible for both manual and automated deployment scenarios.
  • Target customers include e-commerce platforms, streaming services, and any organization with predictable traffic patterns that require guaranteed firewall throughput during peak periods.
  • While specific pricing wasn’t detailed in the announcement, prescaling will likely follow Azure Firewall’s existing pricing model where customers pay for provisioned capacity units, with costs varying by region and SKU tier.
  • When you combine these two announcements, they’re pretty good! 

1:01:35 Public Preview: Environmental sustainability features in Azure API Management

  • Azure API Management introduces carbon-aware capabilities that allow organizations to route API traffic and adjust policy behavior based on carbon intensity data, helping reduce the environmental impact of API infrastructure operations.
  • The feature enables developers to implement sustainability-focused policies such as throttling non-critical API calls during high carbon intensity periods or routing traffic to regions with cleaner energy grids.
  • This aligns with Microsoft’s broader carbon negative commitment by 2030 and provides enterprises with tools to measure and reduce the carbon footprint of their digital services at the API layer.
  • Target customers include organizations with ESG commitments and sustainability reporting requirements who need granular control over their cloud infrastructure’s environmental impact.
  • Pricing details are not yet available for the preview, but the feature integrates with existing API Management tiers and will likely follow consumption-based pricing models when generally available.

1:02:44📢 Matt – “So APIMs are, one, stupidly expensive. If you have to be on the premier tier, it’s like $2,700 a month. And then if you want HA, you have to have two of them. So whatever they’re doing under the hood is stupidly expensive. If you’ve ever had to deal with SharePoint, they definitely use them, because I’ve hit the same error codes as we provide to customers. On the second side, when you do scale them, you can scale them to be multi-region APIMs in the paired-region concept, so in theory, based on this, you could route to a cheaper or more environmentally efficient paired region and have the traffic coming that way.”

1:06:09 Unlock insights about your data using Azure Storage Discovery

  • Azure Storage Discovery is now generally available as a fully managed service that provides enterprise-wide visibility into data estates across Azure Blob Storage and Data Lake Storage, helping organizations optimize costs, ensure security compliance, and improve operational efficiency across multiple subscriptions and regions.
  • The service integrates Microsoft Copilot in Azure to enable natural language queries for storage insights, allowing non-technical users to ask questions like “Show me storage accounts with default access tier as Hot above 1TiB with least transactions” and receive actionable visualizations without coding skills. Because a non-technical person is asking this question. In the ever-wise words of Marcia Brady, “Sure, Jan.” 
  • Key capabilities include 18-month data retention for trend analysis, insights across capacity, activity, security configurations, and errors, with deployment taking less than 24 hours to generate initial insights from 15 days of historical data.
  • Pricing includes a free tier with basic capacity and configuration insights retained for 15 days, while the standard plan adds advanced activity, error, and security insights with 18-month retention – specific pricing varies by region at azure.microsoft.com/pricing/details/azure-storage-discovery.
  • Target use cases include identifying cost optimization opportunities through access tier analysis, ensuring security best practices by highlighting accounts still using shared access keys, and managing data redundancy requirements across global storage estates.

1:08:35📢 Ryan – “Well, I’ll tell you when I was looking for this report, I had a lot of natural language – and I was shouting it at my computer.” 

1:09:52 Sora 2 in Azure AI Foundry: Create videos with responsible AI | Microsoft Azure Blog

  • Azure AI Foundry now offers OpenAI’s Sora 2 video generation model in public preview, enabling developers to create videos from text, images, and existing video inputs with synchronized audio in multiple languages.
  • The platform provides a unified environment combining Sora 2 with other generative models like GPT-image-1 and Black Forest Lab’s Flux 1.1, all backed by Azure’s enterprise security and content filtering for both inputs and outputs.
  • Key capabilities include realistic physics simulation, detailed camera control, and creative features for marketers, retailers, educators, and creative directors to rapidly prototype and produce video content within existing business workflows.
  • Sora 2 is currently available via API through Standard Global deployment in Azure AI Foundry, with pricing details available on the Azure AI Foundry Models page.
  • Microsoft positions this as part of their responsible AI approach, embedding safety controls and compliance frameworks to help organizations innovate while maintaining governance over generated content.
  • We’re not big fans of this one. 

1:10:12 Grok 4 is now available in Microsoft Azure AI Foundry | Microsoft Azure Blog

  • Microsoft brings xAI’s Grok 4 model to Azure AI Foundry, featuring a 128K-token context window, native tool use, and integrated web search capabilities. The model emphasizes first-principles reasoning with a “think mode” that breaks down complex problems step-by-step, particularly excelling at math, science, and logic puzzles.
  • Grok 4’s extended context window allows processing of entire code repositories, lengthy research papers, or hundreds of pages of documents in a single query. This eliminates the need to manually chunk large inputs and enables comprehensive analysis across massive datasets without losing context.
  • Azure AI Content Safety is enabled by default for Grok 4, addressing enterprise concerns about responsible AI deployment. Microsoft and xAI conducted extensive safety testing and compliance checks over the past month to ensure business-ready protection layers.
  • Pricing starts at $2 per million input tokens and $10 per million output tokens for Grok 4, with faster variants available at lower costs. 
  • The family includes Grok 4 Fast Reasoning for analytical tasks, Fast Non-Reasoning for lightweight operations, and Grok Code Fast 1 specifically for programming workflows.
  • The model’s real-time data integration allows it to retrieve and incorporate external information beyond its training data, functioning as an autonomous research assistant. This capability is particularly valuable for tasks requiring current information like market analysis or regulatory updates.
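At the quoted rates ($2 per million input tokens, $10 per million output tokens), even the repo-in-one-prompt use case stays cheap. A quick sanity-check calculator, with token counts invented for illustration:

```python
# Cost estimate at the quoted Grok 4 rates on Azure AI Foundry.
GROK4_INPUT = 2.00 / 1_000_000    # $ per input token
GROK4_OUTPUT = 10.00 / 1_000_000  # $ per output token

def grok4_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * GROK4_INPUT + output_tokens * GROK4_OUTPUT

# Feeding a whole repo (~100k tokens) and getting a 5k-token answer:
print(f"${grok4_cost(100_000, 5_000):.2f}")  # $0.25
```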

1:11:04 Generally Available: Enhanced cloning and Public IP retention scripts for Azure Application Gateway migration

  • Azure releases PowerShell scripts to help customers migrate from Application Gateway V1 to V2 before the April 2026 retirement deadline, addressing a critical infrastructure transition need.
  • The enhanced cloning script preserves configurations during migration while the Public IP retention script ensures customers can maintain their existing IP addresses, minimizing disruption to production workloads.
  • This migration tooling targets enterprises running legacy Application Gateway Standard or WAF SKUs who need to upgrade to Standard_V2 or WAF_V2 for continued support and access to newer features.
  • The scripts automate what would otherwise be a complex manual migration process, reducing the risk of configuration errors and downtime during the transition.
  • Customers should begin planning migrations now as the 2026 deadline approaches, with these scripts providing a standardized path forward for maintaining application delivery infrastructure.
  • You know what would be even easier than PowerShell? How about just doing it for them? Too easy? 
  • (Listener alert: This time it’s a Matt rant.) 

Oracle 

1:14:59 Oracle Expands AI Agent Studio for Fusion Applications with New Marketplace, LLMs, and Vast Partner Network

  • Oracle AI Agent Studio expands with new marketplace LLMs and partner integrations for Fusion Applications, allowing customers to build AI agents using models from Anthropic, Cohere, Meta, and others alongside Oracle’s own models.
  • The platform enables the creation of AI agents that can automate tasks across Oracle Fusion Cloud Applications, including ERP, HCM, and CX, with pre-built templates and low-code development tools for business users.
  • Oracle is partnering with major consulting firms like Accenture, Deloitte, and Infosys to help customers implement AI agents, though this likely means significant professional services costs for most deployments.
  • The AI agents can handle tasks like expense report processing, supplier onboarding, and customer service inquiries, with Oracle claiming reduced manual work by up to 50% in some use cases.
  • Pricing details remain unclear, but the service requires Oracle Fusion Applications subscriptions and likely additional fees for LLM usage and agent deployment based on Oracle’s typical pricing model.

1:15:45📢 Ryan – “They’re partnering with these giant firms that will come in with armies of engineers who will build you a thing – and hopefully document it before running away.” 

Closing

And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at theCloudPod.net — or tweet at us with the hashtag #theCloudPod
