
Welcome to episode 319 of The Cloud Pod, where the forecast is always cloudy! Justin, Matt, and Ryan are in the studio to bring you all the latest in cloud and AI news. AWS's Cost MCP makes exploring your FinOps data as simple as plain English. We've got a sunnier view for junior devs, a Microsoft open source development, tokens, and it's even Kubernetes' birthday – let's get into it!
Titles we almost went with this week:
- 👩❤️👨From Linux Hater to Open Source Darling: A Microsoft Love Story
- 🧑💻20,000 Lines of Code and a Dream: Microsoft’s Open Source Glow-Up
- 🐧Ctrl+Alt+Delete Your Assumptions: Microsoft Goes Full Penguin
- 🪙Token and Esteem: Amazon Bedrock Gets a Counter
- 🔍CSI: Cloud Scene Investigation
- 🏃The Great SQL Migration: How AI Became the Universal Translator
- 🧮Token and Ye Shall Receive: Bedrock’s New Counting Feature
- 🪨The Count of Monte Token: A Bedrock Tale – mk
- 📈Ctrl+Z for Your Database: Now with Built-in Lag Time
- 🩹IP Freely: GKE Takes the Pain Out of Address Management
- 🧑💼AWS CEO: AI Can't Replace Junior Devs Because Someone Has to Fix the AI's Code
- ⏰Better Late Than Never: RDS PostgreSQL Gets Time Travel
- 🔈The SQL Whisperer: Teaching AI to Speak Database
- 🌊DigitalOcean Goes Full Chatbot: Your Infrastructure Now Speaks Human
- 🪖Musk vs Cook: The App Store Wars Episode AI
- 💌Firestore Goes Mongo: A Database Love Story
- 🎂GKE Turns 10: Now With More Candles and Less Complexity
- 💻Prime Day Infrastructure: Now With 87,000 AI Chips and a Robot Army
- ⚖️AWS Scales to Quadrillion Requests: Your Black Friday Traffic Looks Cute
- 👱AWS billing now speaks human, thanks to MCPs
- 🏰The Bastion Holds: Azure’s New Gateway to Kubernetes Kingdoms
- 🪫The Surge Before the Merge: Azure’s New Upgrade Strategy
- 📨CNI Overlay: Because Your Pods Deserve Their Own ZIP Code
AI Is Going Great – or How ML Makes Money
00:46 Musk’s xAI sues Apple, OpenAI alleging scheme that harmed X, Grok
- xAI filed a lawsuit against Apple and OpenAI, alleging anticompetitive practices in AI chatbot distribution, claiming Apple deprioritizes competing AI apps like Grok in the App Store while favoring ChatGPT through direct integration into iOS devices.
- The lawsuit highlights tensions in AI platform distribution models, where cloud-based AI services depend on mobile app stores for user access, potentially creating gatekeeping concerns for competing generative AI providers.
- Apple’s partnership with OpenAI to integrate ChatGPT into iPhone, iPad, and Mac products represents a shift toward native AI integration rather than app-based access, which could impact how cloud AI services reach end users.
- The dispute underscores growing competition in the generative AI market, where multiple players, including xAI's Grok, OpenAI's ChatGPT, DeepSeek, and Perplexity, are vying for market position through both cloud APIs and mobile distribution channels.
- For cloud developers, this case raises questions about AI service distribution strategies and whether direct device integration partnerships will become necessary to compete effectively against app store-based distribution models.
01:55 📢 Justin – “There’s always a potential for conflict of interest when you have a partnership like this, but also the app store – there’s a ton of companies that track downloads and track usage of these things, and I don’t know that they have hard evidence here, other than this is just a way to keep Apple distracted while they make Grok better.”
04:14 AWS CEO says AI replacing junior staff is ‘dumbest idea’ • The Register
- AWS CEO Matt Garman argues that using AI to replace junior developers is counterproductive, since they’re the least expensive employees and most engaged with AI tools, warning that eliminating entry-level positions creates a pipeline problem for future senior talent.
- Garman criticizes the standard metric of measuring AI value by percentage of code written, noting that more lines of code don’t equal better code – and that over 80% of AWS developers already use AI tools for various tasks, including unit tests, documentation, and code writing.
- The CEO emphasizes that future tech workers need to learn critical thinking and problem-solving skills, rather than narrowly focused technical skills, as rapid technological change means that specific skills may not sustain a 30-year career.
- This perspective aligns with AWS's push for their Kiro AI coding assistant while acknowledging that AI should augment rather than replace human developers, particularly as organizations need experienced developers to evaluate and implement AI-generated code properly.
- Garman’s comments come amid industry concerns about AI’s impact on employment and follow recent issues with AWS’s Q Developer tool, which had security vulnerabilities, highlighting the ongoing need for human oversight in AI development.
05:25 📢 Ryan – “I do really think the industry is using AI wrong, and I think that the layoffs are a sign of that. And it’s really easy to say ‘oh, well our mid to senior developer staff can now do all these junior tasks, so let’s replace them,’ but I don’t think that’s a sustainable model.”
AWS
11:14 Count Tokens API is now supported for Anthropic's Claude models in Amazon Bedrock
- Amazon Bedrock now offers a Count Tokens API for Claude models, enabling developers to calculate token usage before making inference calls, which helps predict costs and avoid unexpected rate limit issues (a pre-flight sketch follows after this list).
- This API addresses a common pain point where developers would submit prompts that exceed context windows or trigger throttling, only discovering the issue after the fact and potentially incurring unnecessary costs.
- The feature enables more efficient prompt engineering by allowing teams to test different prompt variations and measure their token consumption without actually running inference, which is particularly useful for optimizing system prompts and templates.
- The API is currently limited to Claude models, suggesting Amazon is prioritizing its Anthropic integration while potentially planning similar support for other Bedrock models, such as Titan, or third-party options.
- For cost-conscious organizations, this pre-flight check capability allows better budget forecasting and helps implement guardrails before expensive model calls, critical as enterprises scale their AI workloads.
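Here's what that pre-flight check could look like: a minimal sketch, assuming a recent boto3 where the bedrock-runtime client exposes the new CountTokens operation. Verify the request shape against the Bedrock docs before relying on it.

```python
import boto3

# Hedged sketch: assumes boto3 exposes CountTokens on bedrock-runtime and
# that it accepts a Converse-style payload; verify against the Bedrock docs.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

messages = [{"role": "user", "content": [{"text": "Summarize our Q3 cloud spend."}]}]

resp = bedrock.count_tokens(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    input={"converse": {"messages": messages}},
)

# Pre-flight count, no inference charge: compare against the model's context
# window and your rate limits before calling converse() for real.
print(resp["inputTokens"])
```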
12:10 📢 Justin – “Now, I appreciate the idea of allowing better budget forecasting, but budget forecasting does not move with the scale of AI, so there is no way that you’re getting an accurate forecast unless you have very specific prompts that you’re going to reuse a LOT of times.”
13:39 Announcing the AWS Billing and Cost Management MCP server
- AWS releases an open-source Model Context Protocol (MCP) server for Billing and Cost Management that enables AI assistants like Claude Desktop, VS Code Copilot, and Q Developer CLI to analyze AWS spending patterns and identify cost optimization opportunities.
- The MCP server features a dedicated SQL-based calculation engine that handles large volumes of cost data and performs reproducible calculations for period-over-period changes and unit cost metrics, providing more comprehensive functionality than simple API access.
- This integration enables customers to utilize their preferred AI assistant for FinOps tasks, including historical spending analysis, cost anomaly detection, workload cost estimation, and AWS service pricing queries, all without needing to switch to the AWS console.
- The server connects securely using standard AWS credentials, with minimal configuration required, and is now available in the AWS Labs GitHub repository as an open-source project (a sample client config sketch follows after this list).
- By supporting the MCP standard, AWS enables customers to maintain their existing AI toolchain workflows while gaining access to comprehensive billing and cost management capabilities previously available only in Amazon Q Developer in the console.
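If you want to wire it up, here's a hedged sketch of a Claude Desktop registration; the uvx package name follows the awslabs/mcp repo's naming conventions but should be verified against the project README.

```python
import json
from pathlib import Path

# Hedged sketch: registers the Billing and Cost Management MCP server with
# Claude Desktop (macOS config path shown). The package name is our
# assumption based on awslabs/mcp naming; check the repo for the real one.
config_path = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
config = json.loads(config_path.read_text()) if config_path.exists() else {}

config.setdefault("mcpServers", {})["aws-billing"] = {
    "command": "uvx",
    "args": ["awslabs.billing-cost-management-mcp-server@latest"],
    # Standard AWS credential resolution: the server picks up your profile.
    "env": {"AWS_PROFILE": "finops", "AWS_REGION": "us-east-1"},
}

config_path.write_text(json.dumps(config, indent=2))
```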
14:33 📢 Justin – “All I want to know is, can I ask the MCP to tell me what the hell EC2 Other is?”
16:07 Amazon RDS for Db2 now supports read replicas
- Amazon RDS for Db2 now supports up to three read replicas per database instance, enabling customers to offload read-only workloads from the primary database and improve application performance through asynchronous replication (replica-creation sketch after this list).
- Read replicas can be deployed within the same region or cross-region, providing both performance scaling for read-heavy applications and disaster recovery capabilities through replica promotion to handle read/write operations.
- The feature requires IBM Db2 licenses for all vCPUs on replica instances, which customers can obtain through AWS Marketplace On-Demand licensing or bring their own licenses (BYOL). Note: you're going to want to bring your own. On-Demand pricing is going to be high. Don't say we didn't warn you.
- This addition brings RDS for Db2 to feature parity with other RDS engines, such as MySQL and PostgreSQL, which have long supported read replicas, making it more viable for enterprise workloads that require high availability and read scaling.
- Key use cases include analytics workloads that require consistent read performance, geographic distribution of read traffic, and maintaining standby instances for disaster recovery without the complexity of manually managing replication.
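Replica creation uses the same RDS API the other engines have had for years; a minimal boto3 sketch (the instance identifiers, account ID, and instance class are placeholders):

```python
import boto3

# Cross-region read replica for RDS for Db2; identifiers are placeholders.
rds = boto3.client("rds", region_name="us-west-2")  # the replica's region

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="db2-replica-1",
    # Cross-region replicas reference the source by its full ARN.
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:123456789012:db:db2-primary",
    DBInstanceClass="db.r6i.2xlarge",
)

# For DR, promote the replica to a standalone read/write instance:
# rds.promote_read_replica(DBInstanceIdentifier="db2-replica-1")
```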
11:26 Amazon RDS for PostgreSQL now supports delayed read replicas
- Amazon RDS for PostgreSQL now supports delayed read replicas, allowing you to configure a time lag between source and replica databases to protect against accidental data deletions or modifications (see the recovery sketch after this list).
- The feature enables faster disaster recovery by allowing you to pause replication before problematic changes propagate, then resume up to a specific log position and promote the replica as primary – significantly faster than traditional point-in-time restores, which can take hours for large databases.
- Available in all AWS regions where RDS PostgreSQL operates at no additional cost beyond standard RDS pricing, making it an accessible safety net for production databases.
- This addresses a common enterprise need for protection against human error while maintaining the performance benefits of read replicas for scaling read workloads.
- The implementation follows similar delayed replication features in MySQL and other database systems, bringing PostgreSQL on RDS to feature parity with competitor offerings.
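Here's a rough sketch of the protect-and-recover flow. One loud assumption on our part: that the lag is configured through PostgreSQL's native recovery_min_apply_delay setting via the replica's parameter group; check the RDS docs for the actual knob.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# ASSUMPTION: the delay maps to PostgreSQL's native recovery_min_apply_delay,
# applied through the replica's parameter group; verify in the RDS docs.
rds.modify_db_parameter_group(
    DBParameterGroupName="pg-delayed-replica-params",
    Parameters=[{
        "ParameterName": "recovery_min_apply_delay",
        "ParameterValue": "3600000",  # one hour of lag, in milliseconds
        "ApplyMethod": "immediate",
    }],
)

# After a bad change hits the primary: pause WAL replay on the replica
# before the mistake is applied (native SQL, run on the replica):
#   SELECT pg_wal_replay_pause();
# ...then promote the replica as the new read/write primary.
rds.promote_read_replica(DBInstanceIdentifier="pg-replica-delayed")
```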
18:39 📢 Justin – “The chances of me being able to realize that I screwed up that badly within 15 minutes before this replicated is probably pretty slim.”
23:07 AWS services scale to new heights for Prime Day 2025: key metrics and milestones | AWS News Blog
- AWS infrastructure handled record-breaking Prime Day 2025 traffic, with DynamoDB processing 151 million requests per second, ElastiCache serving 1.5 quadrillion daily requests, and Lambda handling 1.7 trillion invocations per day, demonstrating AWS's ability to scale for extreme workloads.
- Amazon deployed over 87,000 AWS Inferentia and Trainium chips to power the Rufus AI shopping assistant, while SageMaker AI processed 626 billion inference requests, demonstrating a significant investment in custom silicon for AI workloads at scale.
- AWS Outposts at Amazon fulfillment centers sent 524 million commands to 7,000 robots, with peak volumes of 8 million commands per hour (a 160% increase from 2024), highlighting edge computing's role in modern logistics and same-day delivery operations.
- AWS Fault Injection Service ran 6,800 experiments (8x more than in 2024) to test resilience, enabled by new ECS support for network fault injection on Fargate and CI/CD pipeline integration, emphasizing chaos engineering as standard practice for high-availability systems.
- AWS rebranded Infrastructure Event Management to AWS Countdown, expanding support to include generative AI implementation, mainframe modernization, and sector-specific optimization for elections, retail, healthcare, and sports events.
28:22 📢 Justin – "What I don't want our listeners to take away from this is 'Hey, I should install FIS and use it on Black Friday!' If you haven't had a culture of that chaos testing and the resiliency and redundancy built into your engineering culture for more than a year…do not do that."
GCP
36:25 Choose the right Google AI developer tool for your workflow | Google Cloud Blog
- Google has diversified its AI developer tooling into six distinct offerings: Jules for GitHub automation, Gemini CLI for flexible code interactions, Gemini Code Assist for IDE integration, Firebase Studio for browser-based development, Google AI Studio for prompt experimentation, and the Gemini app for prototyping.
- The tools are categorized by interaction model: delegated/agentic (Jules), supervised (Gemini CLI and Code Assist), and collaborative (Firebase Studio and AI Studio), each targeting different developer workflows and skill levels.
- Jules stands out as a GitHub-specific agent that can autonomously handle tasks such as documentation, test coverage, and code modernization through pull requests, offering a free tier and paid Pro/Ultra options.
- Firebase Studio enables non-professional developers to build production-grade applications in a Google-managed browser environment, complete with built-in templates and Gemini-powered code generation, during its free preview period.
- Most tools offer generous free tiers with access to Gemini models, while paid options provide higher rate limits and enterprise features through Vertex AI integration, making AI-assisted development accessible across various budget levels.
37:40 📢 Ryan – “The Gemini App – a lot of the documentation that is accompanying the app – is very likely to lead you astray, in terms of whether this is something that can handle a production deployment referencing that API endpoint.”
40:13 Gemini 2.5 Flash Image on Vertex AI | Google Cloud Blog
- Google has launched Gemini 2.5 Flash Image on Vertex AI in preview, adding native image generation and editing capabilities with state-of-the-art performance for both functions. The feature includes built-in SynthID watermarking for responsible use (a minimal API sketch follows after this list).
- The model introduces three key capabilities: multi-image fusion, which combines multiple reference images into a unified visual; character and style consistency across generations without requiring fine-tuning; and conversational editing using natural language instructions.
- Early adopters include Adobe (integrating it into Firefly and Express), WPP (testing it for retail and CPG applications), and Figma (adding it to its AI image tools), indicating broad enterprise interest across creative workflows.
- The conversational editing feature enables iterative refinement through simple text prompts, maintaining object consistency while allowing for significant adjustments—a capability that Leonardo.ai’s CEO describes as enabling entirely new creative workflows.
- Available now in preview on Vertex AI with documentation for developers, this positions Google to compete directly with other cloud providers’ image generation services while leveraging their existing Vertex AI infrastructure.
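A minimal generation sketch using the google-genai SDK pointed at Vertex AI; the preview model ID is our best guess from the announcement and may change, so confirm it in the model garden.

```python
from google import genai

# Hedged sketch: google-genai SDK against Vertex AI. The model ID is a guess
# based on the preview announcement; confirm it before use.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="A product shot of a ceramic mug on a walnut desk, soft morning light",
)

# Generated images come back as inline parts (SynthID-watermarked bytes).
for part in response.candidates[0].content.parts:
    if part.inline_data:
        with open("mug.png", "wb") as f:
            f.write(part.inline_data.data)
```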
41:49 📢 Justin – "I had complained about how expensive Veo was; now you can make three videos a day with Veo in Gemini Pro."
43:07 Gemini Cloud Assist investigations performs root-cause analysis | Google Cloud Blog
- Gemini Cloud Assist investigations is a new AI-powered root cause analysis tool that automatically analyzes logs, configurations, metrics, and error patterns across GCP environments to diagnose infrastructure and application issues, reducing troubleshooting time from hours to minutes, according to early users.
- The service provides multiple access points, including API integration for Slack and incident management tools, direct triggering from Logs Explorer or monitoring alerts, and seamless handoff to Google Cloud Support with full investigation context preserved.
- Unlike traditional monitoring tools, this approach leverages Google’s internal SRE runbooks and support knowledge bases, combined with Gemini AI, to generate ranked observations, probable root causes, and specific remediation steps, rather than just surfacing raw data.
- The key differentiator is the comprehensive signal analysis across Cloud Logs, Asset Inventory, App Hub, and Log Themes in parallel, automatically building resource topology and correlating changes to identify issues that would be difficult to spot manually in distributed systems.
- Currently in preview with no pricing announced, this positions GCP competitively against AWS DevOps Guru and Azure Monitor’s similar AI-driven troubleshooting capabilities, particularly valuable for organizations with complex Kubernetes or Cloud Run deployments.
46:23 Automate SQL translation: Databricks to BigQuery with Gemini | Google Cloud Blog
- Google introduces automated SQL translation from Databricks Spark SQL to BigQuery using Gemini AI, addressing the growing need for cross-platform data migration as businesses diversify their cloud ecosystems. The solution combines Gemini with Vertex AI's RAG Engine to handle complex syntax differences, function mappings, and geospatial operations like H3 functions.
- The architecture leverages Google Cloud Storage for source files, a curated function mapping guide, and few-shot examples to ground Gemini's responses, resulting in more accurate translations. The system includes a validation layer using BigQuery's dry run mode to catch syntax errors before execution (illustrated in the sketch after this list).
- Key technical challenges include handling differences in window functions (like FIRST_VALUE syntax variations), data type mappings, and Databricks-specific functions that need BigQuery equivalents. The RAG-enhanced approach significantly improves translation accuracy compared to using Gemini alone.
- This capability targets organizations looking to reduce operational costs by migrating analytics workloads from Databricks to BigQuery’s serverless architecture. Industries with complex SQL workloads and geospatial analytics would benefit most from automated translation versus manual query rewriting.
- While no specific pricing is mentioned, the solution promises to reduce migration time and errors compared to manual translation efforts. Google positions this as part of their broader strategy to simplify multi-cloud data operations and lower barriers for customers switching between platforms.
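The translate-then-validate loop is easy to approximate yourself. A hedged sketch, where MAPPING_GUIDE, the example query, and the model ID are illustrative stand-ins rather than anything from Google's post:

```python
from google import genai
from google.cloud import bigquery

# Illustrative stand-ins throughout; only the overall ground-then-dry-run
# pattern comes from the blog post.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")
bq = bigquery.Client()

MAPPING_GUIDE = "FIRST_VALUE(x, true) OVER (...) -> FIRST_VALUE(x IGNORE NULLS) OVER (...)"
spark_sql = "SELECT first_value(price, true) OVER (PARTITION BY sku ORDER BY ts) FROM sales"

prompt = (
    "Translate this Databricks Spark SQL to BigQuery Standard SQL.\n"
    f"Function mappings:\n{MAPPING_GUIDE}\n\nQuery:\n{spark_sql}"
)
translated = client.models.generate_content(model="gemini-2.0-flash", contents=prompt).text

# Dry run: BigQuery parses and plans the query without executing or billing,
# surfacing syntax errors before anything touches production tables.
bq.query(translated, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False))
print("Translation parses cleanly:\n", translated)
```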
47:13 📢 Justin – "I find it interesting that they call out that their product is not as good as Databricks by saying 'we'll help you build all the things that you need for equivalents!' And like, that's helpful. Thanks, Google."
48:28 Measuring the environmental impact of AI inference | Google Cloud Blog
- Google released a technical paper detailing their methodology for measuring AI inference environmental impact, revealing that a median Gemini Apps text prompt uses only 0.24 watt-hours of energy, emits 0.03 grams of CO2e, and consumes 0.26 milliliters of water – substantially lower than many public estimates, and equivalent to watching TV for less than 9 seconds (we check that math below).
- Their comprehensive measurement approach accounts for complete system dynamic power, idle machines, CPU/RAM usage, data center overhead (PUE), and water consumption—factors often overlooked in industry calculations that only consider active GPU/TPU consumption. This makes it one of the most comprehensive assessments of AI’s operational footprint.
- Google achieved a 33x reduction in energy consumption and a 44x reduction in carbon footprint for Gemini text prompts over 12 months through full-stack optimizations, including Mixture-of-Experts architectures, quantization techniques, speculative decoding, and its custom Ironwood TPUs, which are 30x more energy-efficient than first-generation TPUs.
- The methodology provides a framework for consistent industry-wide measurement of AI resource consumption, addressing growing concerns about AI’s environmental impact as inference workloads scale – fundamental as enterprises increasingly deploy generative AI applications.
- Google’s data centers operate at an average PUE of 1.09 and the company is pursuing 24/7 carbon-free energy while targeting 120% freshwater replenishment, demonstrating how infrastructure efficiency directly impacts AI workload sustainability.
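The TV comparison checks out with simple arithmetic, assuming a roughly 100-watt television (our assumption; the paper's exact reference set isn't quoted here):

```python
# Back-of-envelope check on the TV equivalence, assuming a ~100 W television.
prompt_energy_wh = 0.24   # median Gemini Apps text prompt (from the paper)
tv_power_w = 100          # assumed TV power draw

seconds_of_tv = prompt_energy_wh / tv_power_w * 3600
print(f"{seconds_of_tv:.1f} s of TV per prompt")  # ~8.6 s: "less than 9 seconds"

# Scaling up: a million prompts is still only ~240 kWh.
print(f"{prompt_energy_wh * 1_000_000 / 1000:.0f} kWh per million prompts")
```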
50:09 📢 Justin – “I do appreciate that they’re trying something here.”
52:44 From silos to synergy: New Compliance Manager, now in preview | Google Cloud Blog
- Google Cloud Compliance Manager enters preview as an integrated Security Command Center feature, unifying security and compliance management across infrastructure, workloads, and data.
- It addresses the growing challenge of managing multiple regulatory frameworks by providing a single platform for configuration, monitoring, and auditing compliance requirements.
- The platform introduces two core constructs: Frameworks (collections of technical controls mapped to regulations, such as CIS, SOC2, ISO 27001, and FedRAMP) and CloudControls (platform-agnostic building blocks for preventive, detective, and audit modes). Organizations can utilize pre-built frameworks or create custom ones, leveraging AI-powered control authoring to expedite deployment.
- This positions Google Cloud competitively against AWS Security Hub and Azure Policy/Compliance Manager by offering bidirectional translation between regulatory controls and technical configurations. The integration with Security Command Center provides a unified view that competitors typically require multiple tools to achieve.
- The key differentiator is the automated evidence generation for audits, validated through Google's FedRAMP 20X partnership, which could significantly reduce manual compliance work for regulated industries like healthcare, finance, and government. The platform supports deployment at the organization, folder, and project levels for granular control.
- Available now in preview through the Google Cloud Console under Security > Compliance navigation. While pricing details aren’t provided, interested organizations can contact their Google Cloud account team or email [email protected] for access and feedback opportunities.
54:01 📢 Ryan – “The automated evidence gathering is spectacular on these tools. And it’s really what’s needed – even from a security engineer standpoint – being able to view those frameworks to see the compliance metrics, and how you’re actually performing across those things, and what’s actually impactful is super important too.”
59:50 GKE Auto-IPAM simplifies IP address management | Google Cloud Blog
- GKE Auto-IPAM dynamically allocates and deallocates IP address ranges for nodes and pods as clusters scale, eliminating the need for large upfront IP reservations and manual intervention during scaling operations.
- This addresses a critical pain point in Kubernetes networking where poor IP management leads to IP_SPACE_EXHAUSTED errors that halt cluster scaling and deployments, particularly problematic given IPv4 address scarcity.
- The feature works with both new and existing clusters running GKE version 1.33 or higher, currently configurable via gcloud CLI or API, with Terraform and UI support coming soon.
- Unlike traditional static IP allocation approaches used by other cloud providers, GKE Auto-IPAM proactively manages addresses on demand, reducing administrative overhead while optimizing IPv4 utilization.
- Key beneficiaries include organizations running resource-intensive workloads requiring rapid scaling, as the feature ensures sufficient IP capacity is dynamically available without manual planning or intervention.
1:00:58 📢 Ryan – "I think it was just last week that Google announced that you could add IP space to existing clusters."
1:02:47 Firestore with MongoDB compatibility is now GA | Google Cloud Blog
- Firestore now supports MongoDB-compatible APIs in GA, allowing developers to use existing MongoDB code, drivers, and tools with Firestore's serverless infrastructure, which offers up to a 99.999% SLA and multi-region replication with strong consistency (a driver-level sketch follows after this list).
- The service includes over 200 MongoDB Query Language capabilities, unique indexes, and new aggregation stages like $lookup for joining data across collections, addressing enterprise needs for complex queries and data relationships.
- Enterprise features include Point-in-Time Recovery for 7-day rollback capability, database cloning for staging environments, managed export/import to Cloud Storage, and change data capture triggers for replicating data to services like BigQuery.
- Available through both Firebase and Google Cloud consoles as part of the Firestore Enterprise edition with pay-as-you-go pricing and a free tier, targeting industries like financial services, healthcare, and retail seeking MongoDB compatibility without operational overhead.
- This positions Google against AWS DocumentDB and Azure Cosmos DB's MongoDB API by leveraging Firestore's existing serverless architecture rather than building a separate MongoDB-compatible service.
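Because it speaks the MongoDB wire protocol, unmodified drivers should just work. A pymongo sketch with a placeholder connection string (copy the real URI and auth settings from the Firestore console):

```python
from pymongo import MongoClient

# Placeholder URI: grab the real connection string for your database from
# the Firestore console; the driver usage below is the point here.
client = MongoClient("mongodb://USER:PASS@HOST.firestore.goog:443/?tls=true&loadBalanced=true")
db = client["retail"]

# $lookup, one of the newly supported aggregation stages, joins orders to
# customers across collections, exactly as it would on MongoDB itself.
pipeline = [
    {"$match": {"status": "open"}},
    {"$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }},
]
for order in db.orders.aggregate(pipeline):
    print(order["_id"], order["customer"])
```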
1:04:42 GKE gets new pricing and capabilities on 10th birthday | Google Cloud Blog
- GKE is transitioning to a single paid tier in September 2025, which includes multi-cluster management features such as Fleets, Teams, Config Management, and Policy Controller, all at no additional cost. Optional à la carte features will be available as needed.
- Autopilot mode, which provides fully managed Kubernetes without requiring deep expertise, will soon be available for all clusters, including existing GKE Standard clusters on a per-workload basis with the ability to toggle on and off.
- GKE now supports larger clusters to handle AI workloads at scale, with customers such as Anthropic, Moloco, and Signify utilizing the platform for training and serving AI models on TPUs, as well as running global services.
- The new container-optimized compute platform in Autopilot delivers improved efficiency and performance, allowing workloads to serve more traffic with the same capacity or maintain existing traffic with fewer resources.
- After 10 years since its launch and 11 years since Kubernetes was open-sourced from Google's Borg system, GKE continues to incorporate learnings from running Google's own services, such as Vertex AI, into the managed platform.
- Happy Birthday…
Azure
1:09:17 From 20,000 lines of Linux code to global scale: Microsoft's open-source journey | Microsoft Azure Blog
- Microsoft has evolved from contributing 20,000 lines of Linux code in 2009 to becoming the largest public cloud contributor to CNCF over the past three years, with 66% of Azure customer cores now running Linux workloads.
- Azure Kubernetes Service powers some of the world's largest deployments, including Microsoft 365's COSMIC platform, which runs millions of cores, and OpenAI's ChatGPT, serving 700 million weekly users with just 12 engineers managing the infrastructure.
- Microsoft has open-sourced multiple enterprise-grade tools, including Dapr for distributed applications, KAITO for AI workload automation on Kubernetes, and Phi-4 Mini, a 3.8 billion parameter AI model optimized for edge computing.
- The company’s open-source strategy focuses on upstream-first contributions, then downstream product integration, contrasting with AWS and GCP’s tendency to fork projects or build proprietary alternatives.
- Azure’s managed services like AKS and PostgreSQL abstract operational complexity while maintaining open-source flexibility, enabling rapid scaling without large operations teams, as demonstrated by ChatGPT handling over 1 billion queries daily.
1:11:15 📢 Matt – “I’m confused by that fourth thing, because they fully backed Redis when they changed the licensing and were the only cloud that did, but we focus on open source first…”
1:15:02 DocumentDB joins the Linux Foundation – Microsoft Open Source Blog
- Microsoft's DocumentDB, an open-source MongoDB-compatible database built on PostgreSQL, has joined the Linux Foundation to ensure vendor-neutral governance and broader community collaboration.
- The project provides a NoSQL document database experience while leveraging PostgreSQL's reliability and ecosystem (a hedged extension sketch follows after this list).
- The move positions DocumentDB as a potential industry standard for NoSQL databases, similar to ANSI SQL for relational databases, with companies like Yugabyte and SingleStore already joining the technical steering committee. This contrasts with AWS DocumentDB, which remains a proprietary managed service.
- DocumentDB offers developers MongoDB wire protocol compatibility without vendor lock-in, using standard PostgreSQL extensions under the MIT license rather than requiring a forked database engine. This approach enables existing PostgreSQL deployments to add document database capabilities without requiring a migration to a separate system.
- The project targets organizations wanting MongoDB-style document databases but preferring PostgreSQL’s operational model, backup tools, and existing infrastructure investments. Unlike Azure Cosmos DB’s multi-model approach, DocumentDB focuses specifically on document workloads with PostgreSQL’s proven scalability.
- With the Linux Foundation governance, DocumentDB provides an open alternative to proprietary document databases from cloud vendors, potentially reducing costs for self-managed deployments while maintaining compatibility with MongoDB applications and tools.
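Under the hood it's just PostgreSQL extensions, which is the appeal. A hedged psycopg2 sketch; the extension and documentdb_api function names follow the project's public repo examples but should be verified there before use:

```python
import psycopg2

# Hedged sketch: extension/function names are taken from the DocumentDB
# repo's examples and should be verified against the current docs.
conn = psycopg2.connect("dbname=app user=postgres")
conn.autocommit = True
cur = conn.cursor()

# Add document-database capabilities to an existing PostgreSQL instance.
cur.execute("CREATE EXTENSION IF NOT EXISTS documentdb CASCADE;")

# Insert a document into a collection through the SQL surface; MongoDB
# drivers can reach the same data via the wire-protocol gateway.
cur.execute(
    "SELECT documentdb_api.insert_one('shop', 'products', %s);",
    ('{"name": "mug", "price": 12}',),
)
```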
56:01 📢 Justin – “Now the question is, can I take these DocumentDB extensions and put them on Cloud SQL from Google without having to use Firestore? That’s the real question.”
1:17:31 Public Preview: Azure Bastion now supports connectivity to private AKS clusters via tunneling
- Azure Bastion now enables secure tunneling from local machines to private AKS clusters' API servers, eliminating the need for VPN connections or exposing clusters to the public internet while maintaining standard kubectl workflows.
- This feature addresses a common security challenge where organizations want private AKS clusters but struggle with developer access, competing with AWS Systems Manager Session Manager and GCP Identity-Aware Proxy for Kubernetes access.
- The tunneling capability works with existing Kubernetes tooling and supports both private and public clusters with API server authorized IP ranges, reducing operational complexity for teams managing multiple cluster types.
- Target customers include enterprises with strict security requirements and regulated industries that need private clusters but want to avoid managing complex VPN infrastructure or jump boxes for developer access.
- While Azure Bastion pricing starts at $0.095/hour plus data transfer costs, this feature could reduce overall infrastructure costs by eliminating dedicated VPN gateways or bastion hosts typically required for private cluster access.
1:18:36 📢 Matt – “Azure Bastion is actually pretty good. We use it at my day job, and it’s really not bad.”
1:23:37 Generally Available: Application Gateway adds MaxSurge support for zero-capacity-impact upgrades
- Azure Application Gateway now provisions new instances during rolling upgrades before taking old ones offline through MaxSurge support, eliminating the capacity drops that previously occurred during version transitions.
- This addresses a long-standing pain point where Application Gateway upgrades would temporarily reduce available capacity as instances cycled, potentially impacting application availability during maintenance windows.
- The feature brings Azure closer to AWS Application Load Balancer's connection draining capabilities, though AWS still maintains an edge with more granular control over instance replacement timing.
- Enterprise customers running mission-critical workloads will benefit most, as they can now perform gateway updates during business hours without risking performance degradation or connection drops.
- While the feature itself doesn’t add direct costs, it may temporarily increase compute charges during upgrades as both old and new instances run simultaneously before the transition completes.
1:24:53 📢 Matt – “It’s amazing this wasn’t there and native, and why is this something you have to think about? It’s supposed to be a managed service. I have to tell it the number of nodes, tell it to do these things…it just feels like a very clunky managed service. And you still have to bring your own certificate.”
1:26:00 Generally Available: Azure Migrate now supports migration to disks with Zone-Redundant Storage (ZRS) redundancy
- Azure Migrate now enables direct migration to Zone-Redundant Storage (ZRS) disks, which automatically replicate data synchronously across three availability zones in a region for enhanced durability and availability compared to locally redundant storage.
- This feature addresses a key gap for organizations requiring high availability during cloud migrations, as they can now maintain zone redundancy from the start rather than converting disks post-migration, reducing operational overhead and potential downtime.
- ZRS disks provide 99.9999999999% (12 9’s) durability over a given year and protect against datacenter-level failures, making this particularly valuable for mission-critical workloads that need continuous availability during zone outages.
- While AWS offers similar zone-redundant storage options through EBS Multi-Attach and GCP has regional persistent disks, Azure’s integration directly into the migration tool streamlines the process compared to competitors, who require post-migration configuration.
- The feature targets enterprises with strict compliance requirements and those running stateful applications where data loss or extended downtime during zone failures would have a significant business impact, though ZRS disks typically cost 50% more than standard locally redundant storage.
1:28:40 📢 Matt – “This is more for backup. So if you’re running a file server in one region, in one zone, and that zone goes down, your data is still in the other zone – so you spin up a server and attach it.”
Other Clouds
1:31:45 DigitalOcean MCP Server is now available | DigitalOcean
- DigitalOcean launched an MCP (Model Context Protocol) server that enables developers to manage cloud resources using natural language commands through AI tools like Claude and Cursor.
- The server runs locally and currently supports 9 services, including App Platform, Databases, Kubernetes, and Droplets.
- MCP is an open-source standard that provides a consistent way for AI systems to connect with external tools and data sources. This eliminates the need for fragmented integrations and allows developers to perform cloud operations directly within their development environment.
- The implementation allows developers to use plain English commands like "deploy a Ruby on Rails app from my GitHub repo" or "create a new PostgreSQL database" instead of writing scripts or navigating multiple dashboards. Users maintain control of their API credentials, which stay local.
- Security is managed through service scoping, where developers can restrict AI assistant access to only specific services using flags (sketched below). This prevents context bloat and limits access to only necessary resources while maintaining audit trails and error handling.
- The service is currently free and in public preview, with hundreds of developers already using it daily for provisioning infrastructure, monitoring usage, and automating cloud tasks. It works with Claude, Cursor, VS Code, Windsurf, and other MCP-compatible clients.
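A hedged setup example for Cursor showing the service scoping; the npm package name and --services flag follow the announcement's description, but confirm both against DigitalOcean's docs.

```python
import json
from pathlib import Path

# Hedged sketch: scopes the DigitalOcean MCP server to only the services an
# assistant needs. Package name and flag are assumptions from the write-up.
config_path = Path("~/.cursor/mcp.json").expanduser()
config = json.loads(config_path.read_text()) if config_path.exists() else {}

config.setdefault("mcpServers", {})["digitalocean"] = {
    "command": "npx",
    "args": ["-y", "@digitalocean/mcp", "--services", "apps,databases"],
    # The API token never leaves your machine; the local server calls the
    # DigitalOcean API on the assistant's behalf.
    "env": {"DIGITALOCEAN_API_TOKEN": "dop_v1_example"},
}

config_path.write_text(json.dumps(config, indent=2))
```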
Cloud Journey
1:00:42 A guide to platform engineering | Google Cloud Blog
We had homework to watch the full video. We tried, but it was so boring.
The blog post is good; the video is a recording of a conference talk…but man. We promise to find more interesting topics for the next Cloud Journey installment.
Closing
And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod