307: The AI Assistant That Finally Understands Your Kubernetes Cluster (We are Doomed)

Welcome to episode 307 of The Cloud Pod – where the forecast is always cloudy! Who else is at a conference? Justin is coming to us this week from sunny San Diego where he’s attending FinOps – so we have that news to look forward to for next week. Matt and Ryan are also on hand today to share the latest news from Kubernetes, Salesforce acquisitions, and the strange case of Azure making AWS more cost effective.

Titles we almost went with this week:

  • 🏖️The Great Redis Escape: One Year Later, Valkey is Living Its Best Life
  • 🎽Cache Me If You Can: How Valkey Outran Redis’s License Policies
  • 📈Tier Today, Gone Tomorrow: AWS’s New Storage Class That Moves Your Data So You Don’t Have To
  • 🤖Hey AI, Deploy My App: AWS Makes It Actually Work
  • 🧮AWS Finally Calculates What You’ll Actually Pay
  • 🎲The Price is Right: AWS Edition
  • 💰From List Price to Real Price: AWS Gets Transparent
  • 🌳Red Hat and AWS Sitting in a Tree, R-H-E-L-I-N-G
  • 📁Dockerfile? More Like Dockerfile-It-For-Me with Amazon’s New MCP Server
  • 🔍Elementary, My Dear Watson: Amazon Q Becomes Sherlock Holmes for AWS
  • 🧢CUD You Believe It? Red Hat Gets the Discount Treatment
  • 👩‍❤️‍👨Committed Relationship Status: It’s Complicated (But 20% Cheaper)
  • 🫳RHEL Yeah! Google Drops Prices on Enterprise Linux
  • 📀Disk Today, Gone Tomorrow: Azure’s Vanishing OS Storage
  • 🫖ATL1: Where GPUs Meet Sweet Tea and Southern Hospitality
  • ☁️AWS Launches Operation Cloud Sovereignty
  • 🧱The Great Firewall of Europe: AWS Edition
  • 🏰Amazon Builds a GDPR Fortress in Germany

General News 

01:46 What Salesforce’s $8B acquisition of Informatica means for enterprise data and AI | VentureBeat

  • Salesforce just dropped $8 billion to acquire Informatica
  • This purchase was really about building the data foundation needed for agentic AI to actually work in enterprise environments – we’re talking about combining Informatica’s 30 years of data management expertise with Salesforce’s cloud platform to create what they’re calling a “unified architecture for agentic AI.”
  • This acquisition fills a massive gap in Salesforce’s data management capabilities, bringing in critical pieces like data cataloging, integration, governance, quality controls, and master data management – all the unsexy but absolutely essential plumbing that makes AI agents trustworthy and scalable in real enterprise deployments.
  • The timing here is fascinating, because Informatica literally just announced their own agentic AI offerings last week at Informatica World, so Salesforce is essentially buying a company that’s already pivoted hard into the AI space – rather than trying to build these capabilities from scratch.
  • There’s going to be some interesting overlap with MuleSoft, which Salesforce bought for $6.5 billion back in 2018, but analysts are saying Informatica’s data management capabilities are more comprehensive and updated – this could mean some consolidation challenges ahead as they figure out how to integrate these overlapping technologies.
  • For enterprise customers, this could be a game-changer because it promises to automate those painful, time-consuming data processes that typically take days or weeks. These AI agents can handle data ingestion, integration, and pipeline orchestration with minimal human intervention.
  • The $8 billion price tag is actually lower than the rumored $11 billion bid from last year, which might indicate either tough negotiations or perhaps some concerns about integration challenges. Remember, Salesforce has already spent over $50 billion on acquisitions including Slack, Tableau, and MuleSoft.

02:56 📢 Justin – “Just keep your hands off Slack, okay guys? That’s all I care about.”

Cloud Tools

05:13 Gomomento: Valkey Turns One: How The Community Fork Left Redis In The Dust

  • Valkey has officially hit its one-year milestone as the community-driven fork of Redis, and it’s fascinating to see how quickly it’s gained traction after Redis switched to a more restrictive license in March 2024. 
  • The Linux Foundation stepped in to support this open-source alternative, and major players like AWS, Google Cloud, and Oracle have all thrown their weight behind it, essentially creating a unified response to Redis’s licensing changes.
  • What’s really impressive about Valkey is how it’s maintained complete compatibility with Redis while actually pushing innovation forward – they’ve already released version 8.0 with features like improved memory efficiency and better performance for large-scale deployments. 
  • This shows the community isn’t just maintaining a fork, they’re actively improving upon the original codebase.
  • For developers and engineers, the practical impact is that you can continue using all your existing Redis tooling and client libraries without any changes (see the quick sketch after this list), but now you have the peace of mind that comes with a truly open-source solution backed by the Linux Foundation. No more worrying about future licensing surprises or restrictions on how you can use your in-memory data store.
  • The performance improvements in Valkey 8.0 are particularly noteworthy – they’ve managed to reduce memory overhead by up to 20% for certain workloads while maintaining the same blazing-fast performance Redis users expect. This is crucial for companies running large-scale caching layers where even small efficiency gains can translate to significant cost savings.
  • Looking ahead, Valkey’s roadmap includes some exciting features like native support for vector similarity search and improved clustering capabilities, which suggests they’re not just playing catch-up but actually positioning themselves to lead in the in-memory database space.
  • The irony here is that Redis’s attempt to monetize through licensing restrictions may have actually accelerated innovation in the space by spurring the creation of a well-funded, community-driven alternative that’s now pushing the entire ecosystem forward faster than before.
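To make that drop-in compatibility concrete, here’s a minimal sketch using redis-py, the standard Redis client for Python – the hostname is a placeholder for your own Valkey endpoint, and nothing about the client code changes from a Redis deployment:

```python
import redis  # redis-py works unchanged against Valkey's Redis-compatible protocol

# Point the existing client at a Valkey endpoint; the host is a placeholder.
r = redis.Redis(host="my-valkey.example.internal", port=6379, decode_responses=True)
r.set("greeting", "hello from valkey")
print(r.get("greeting"))  # -> "hello from valkey"
```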

06:37 📢 Ryan – “I haven’t seen a lot of talk of Redis recently and every new greenfield application that I’ve seen or worked around now is looking at Valkey or using Valkey actively. So I feel like this is going to go the same way as Elasticsearch and the licensing change there where it just won’t be the go-to option anymore.”

07:59 The Harness MCP Server

  • Harness just released their MCP Server, which implements the Model Context Protocol – an open standard that lets AI agents like Claude Desktop, Windsurf, or Cursor securely connect to your Harness workflows without writing custom APIs or brittle glue code, essentially turning Harness into a plug-and-play backend for AI agents.
  • This addresses a major pain point where customers are excited about AI but struggle with giving their agents secure access to delivery data from pipelines, environments, and logs. 
  • The MCP Server acts as a lightweight local gateway that translates between AI tools and the Harness platform while maintaining enterprise-grade security controls.
  • What’s clever here is that Harness is dogfooding their own solution – they’re using the same MCP server internally that they’re offering to customers, which means it’s battle-tested and provides consistency across different AI agents and environments without the maintenance headache of multiple adapters.
  • The security story is particularly strong – it uses JSON-RPC 2.0 for communication, integrates with Harness’s existing RBAC model, handles API keys directly in the platform, and ensures no sensitive data ever gets sent to the LLM, which should make security teams much more comfortable with AI integrations.
  • From a practical standpoint, this enables some interesting use cases like customer success engineers using AI to instantly check release statuses without bothering the dev team, or building Slack bots that alert on failed builds and surface logs with minimal setup time – the sketch below shows what one of those calls looks like on the wire.
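For a feel of the protocol layer, here’s a hedged Python sketch of the JSON-RPC 2.0 `tools/call` message an MCP client sends over stdio – the server binary name and the `get_pipeline_status` tool are hypothetical placeholders, since the real tool catalog is published by the Harness MCP Server itself:

```python
import json
import subprocess

# Launch a local MCP server as a stdio subprocess (binary name is a placeholder).
proc = subprocess.Popen(
    ["harness-mcp-server"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

# MCP frames everything as JSON-RPC 2.0; "tools/call" is the spec's method for
# invoking a named tool. (A real session exchanges an initialize handshake
# first; omitted here for brevity.) Tool name and arguments are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_pipeline_status",
        "arguments": {"pipeline": "deploy-prod"},
    },
}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()

# The JSON-RPC 2.0 response comes back keyed to the same id.
print(json.loads(proc.stdout.readline()))
```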

10:31 📢 Justin – “The key success of being able to build a successful MCP though is to have APIs. So if you were already behind on getting to APIs, I think this is the struggle for you. Now you’re doubly behind – because you’re not only behind on the API spec, but you’re also behind on the MCP part as well.”

12:12 HashiCorp: Terraform Adds New Pre-Written Sentinel Policies

  • HashiCorp has released a collection of pre-written Sentinel policies that automatically enforce AWS Foundational Security Best Practices within Terraform workflows, essentially giving teams a ready-made security guardrail system that prevents common misconfigurations before infrastructure gets deployed. This is huge for organizations struggling to balance developer velocity with security compliance requirements.
  • These policies cover critical security controls like ensuring S3 buckets aren’t publicly accessible, requiring encryption for EBS volumes and RDS instances, and enforcing proper IAM configurations – basically all those security checks that teams know they should implement but often get overlooked in the rush to ship features. The beauty is that these policies run during the plan phase, catching issues before any resources are actually created.
  • What’s particularly clever about this release is how it addresses the skills gap problem. Not every organization has security experts who can write complex policy-as-code rules, so having HashiCorp provide battle-tested policies out of the box dramatically lowers the barrier to entry for implementing proper cloud security governance. 
  • Teams can literally copy-paste these policies into their Terraform Cloud or Enterprise setup and immediately start benefiting.
  • The timing of this release is perfect given the increasing focus on supply chain security and infrastructure compliance. With regulations getting stricter and breach costs rising, having automated policy enforcement that aligns with AWS’s own security recommendations gives organizations a defensible security posture they can point to during audits. 
  • Plus, it shifts security left in the development process without requiring developers to become security experts overnight – the sketch below illustrates the kind of plan-time check these policies encode.
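To illustrate what these policies check – written here in Python against `terraform show -json tfplan` output purely for readability, not in the Sentinel language itself – two of the controls above might reduce to something like this (attribute names follow the AWS provider schema):

```python
import json
import sys

# Load the machine-readable plan produced by `terraform show -json tfplan`.
plan = json.load(open(sys.argv[1]))

violations = []
for rc in plan.get("resource_changes", []):
    after = (rc.get("change") or {}).get("after") or {}
    # Control 1: EBS volumes must be encrypted.
    if rc["type"] == "aws_ebs_volume" and not after.get("encrypted"):
        violations.append(f"{rc['address']}: EBS volume is not encrypted")
    # Control 2: S3 public access blocks must be fully enabled.
    if rc["type"] == "aws_s3_bucket_public_access_block":
        required = ("block_public_acls", "block_public_policy",
                    "ignore_public_acls", "restrict_public_buckets")
        if not all(after.get(k) for k in required):
            violations.append(f"{rc['address']}: public access not fully blocked")

if violations:
    sys.exit("\n".join(violations))  # fail the run before anything is created
```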

AWS

17:00 Amazon FSx for Lustre launches new storage class with the lowest-cost and only fully elastic Lustre file storage

  • Amazon just launched FSx for Lustre Intelligent-Tiering, essentially the first fully elastic Lustre file storage in the cloud – it automatically grows and shrinks as you add or delete data, so you’re only paying for what you actually use instead of overprovisioning storage like you would on-premises. At less than $0.005 per GB-month, it’s claiming to be the lowest-cost high-performance file storage option available.
  • This is a game-changer for HPC workloads like seismic imaging, weather forecasting, and genomics analysis that generate petabytes of data – the service automatically moves your data between three tiers (Frequent Access, Infrequent Access after 30 days, and Archive after 90 days), potentially reducing storage costs by up to 96% compared to other managed Lustre options without any manual intervention.
  • For AI/ML teams trying to maximize their expensive GPU utilization, this is particularly interesting because it delivers up to 34% better price performance than on-premises HDD file systems, and with Elastic Fabric Adapter and GPU Direct Storage support, you’re getting up to 12x higher per-client throughput compared to previous FSx for Lustre systems.
  • The tiering is completely transparent to applications – whether your data is in the Frequent Access tier or has been moved to Archive, you can still retrieve it instantly in milliseconds, which means you can migrate existing HDD or mixed HDD/SSD workloads without any application changes.
  • The service is launching in 15 AWS regions including major hubs in North America, Europe, and Asia Pacific, and the pricing model is consumption-based – you pay for the data and metadata you store, operations when you write or read non-cached data, plus your provisioned throughput capacity, metadata IOPS, and SSD cache size. The quick arithmetic after this list shows what the headline storage rate works out to.
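Back-of-the-envelope on that headline rate – storage only, since operations, provisioned throughput, metadata IOPS, and SSD cache are billed separately per the pricing model above:

```python
# 1 PB of seismic/genomics data parked at the quoted ceiling rate.
gb_stored = 1_000_000          # 1 PB expressed in GB
rate_per_gb_month = 0.005      # "less than $0.005 per GB-month" headline
print(f"~${gb_stored * rate_per_gb_month:,.0f}/month")  # ~$5,000/month
```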

18:28  📢 Justin – “I imagine this is truly fantastic for people who have workloads where they’re getting the performance increase out of Lustre. So that’s pretty rad that it’s automatic. It feels a little strange that you can retrieve it at the same speed, but at different costs; I would just force everything to the lower tier, but I imagine you don’t have that option.”

19:45 Enhance AI-assisted development with Amazon ECS, Amazon EKS and AWS Serverless MCP server | AWS News Blog

  • AWS is bringing AI-powered development assistance to the next level with new Model Context Protocol servers for ECS, EKS, and Serverless, which essentially give your AI coding assistants like Amazon Q Developer real-time, contextual knowledge about your specific AWS environment instead of relying on outdated documentation. 
  • Imagine having an AI that actually knows your current cluster configuration and can help you deploy containers in minutes using natural language commands.
  • The real game-changer here is that these MCP servers bridge the gap between what LLMs know from their training data and what’s actually happening in your AWS account right now, so when you ask your AI assistant to help deploy an application, it can configure load balancers, networking, auto-scaling, and monitoring with current best practices rather than generic advice from two years ago.
  • What’s particularly impressive is how these tools handle the entire development lifecycle – in the demo, they showed creating a serverless video analysis application using Amazon Nova models, then migrating it to containers on ECS, and finally deploying a web app on EKS, all through natural language prompts in the command line without writing deployment scripts or YAML files.
  • The troubleshooting capabilities are where this really shines for DevOps teams – when deployments fail, the MCP servers can automatically fetch logs, identify issues, and even fix configuration problems, turning what used to be hours of debugging into a conversational problem-solving session with your AI assistant.
  • This fits perfectly into AWS’s broader AI strategy by making their services more accessible to developers who might not be container or Kubernetes experts, essentially democratizing cloud deployment by letting you say “deploy this app to EKS and make it scalable” instead of learning the intricacies of Kubernetes manifests and AWS networking.

21:58  📢 Ryan – “I want it to completely shield me from learning Kubernetes. I’ll never know it now – I’m just gonna ask the robot to do it.” 

22:13 AWS Pricing Calculator, now generally available, supports discounts and purchase commitment – AWS

  • In news we’ve been waiting FOREVER for, AWS finally brings their Pricing Calculator into the console as a generally available feature, and it’s about time – this tool now lets you create cost estimates that actually reflect what you’ll pay after applying your existing discounts and commitments like Savings Plans or Reserved Instances, which is a game-changer for financial planning.
  • The big innovation here is that you can now import your historical usage data directly into the calculator to create estimates based on real-world patterns, or build estimates from scratch for new workloads – and it gives you three different rate configurations to see costs before discounts, after AWS pricing discounts, and after both discounts AND your purchase commitments are applied.
  • This is particularly valuable for enterprises doing their annual budget planning or preparing for board presentations because you can finally show realistic cost projections that account for your negotiated Enterprise Discount Programs and existing Reserved Instance coverage, rather than just list prices that nobody actually pays.
  • The ability to export estimates in both CSV and JSON formats with resource-level detail is a subtle but important feature that’ll make FinOps teams happy – you can now integrate these estimates directly into your internal financial planning tools or build automated workflows around cost modeling (see the sketch after this list).
  • What’s interesting is that AWS is positioning this as both a workload estimator AND a full AWS bill estimator, which suggests they’re trying to help customers understand not just what a new project will cost, but how it impacts their overall AWS spend when layered onto existing infrastructure.
  • For organizations considering multi-year commitments or trying to optimize their Savings Plans strategy, this tool becomes essential because you can now model different commitment scenarios and see the actual impact on your bottom line before pulling the trigger on those purchases.
  • The fact that this is available in all commercial regions (except China) means most AWS customers can start using it immediately – and given that it’s free to use, there’s really no excuse not to be doing more sophisticated cost modeling for your AWS workloads.
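As a sketch of that FinOps-integration angle, here’s how an exported CSV estimate might be folded into an internal rollup – the column names are assumptions for illustration, so check them against the headers of your actual export:

```python
import csv
from collections import defaultdict

# Sum estimated monthly cost per service from an exported estimate.
# "Service" and "Monthly cost" are assumed column names.
totals = defaultdict(float)
with open("estimate.csv") as f:
    for row in csv.DictReader(f):
        totals[row["Service"]] += float(row["Monthly cost"])

for service, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{service:30s} ${cost:,.2f}/month")
```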

23:58  📢 Ryan – “I hope it’s not something terrible where you have to feed it all your discount data and your code usage.” 

24:30 Announcing Red Hat Enterprise Linux for AWS

  • Red Hat is finally bringing RHEL 10 to AWS with deep native integration, marking a significant shift from just running RHEL on EC2 instances to having a purpose-built, AWS-optimized version that includes pre-tuned performance profiles and built-in CloudWatch telemetry right out of the box.
  • This isn’t just another Linux distro in the AWS Marketplace – they’ve baked in AWS CLI, optimized networking with Elastic Network Adapter support, and created AWS-specific performance profiles, which means enterprises can skip a lot of the manual optimization work they typically do when deploying RHEL workloads.
  • This comes as organizations are looking to standardize their Linux deployments across hybrid environments, and having RHEL with native AWS integration could simplify migrations for shops that are already heavy Red Hat users on-premises.
  • One of the more innovative aspects is the inclusion of “image mode using container-native tooling,” which suggests Red Hat is bringing their edge computing and immutable OS concepts from RHEL for Edge into the cloud, potentially making updates and rollbacks much cleaner.
  • While the announcement mentions flexible procurement options through EC2 Console and AWS Marketplace, the real question will be pricing – traditionally RHEL has commanded a premium, and it’ll be interesting to see if the AWS-optimized version carries additional costs beyond standard RHEL subscriptions.
  • This is available across all AWS regions including GovCloud, which signals that AWS and Red Hat are serious about capturing government and compliance-heavy workloads that have traditionally relied on RHEL’s security certifications and long-term support guarantees.

24:58  📢 Justin – “Let’s be honest – no one does the manual optimization work.” 

26:21 Introducing agentic capabilities for Amazon Q Developer Chat in the AWS Management Console and chat applications – AWS

  • Amazon Q Developer just got a major upgrade with new agentic capabilities that essentially turn it into your personal AWS troubleshooting detective – it can now break down complex problems into steps, consult multiple AWS services, and piece together answers from across your entire infrastructure without you having to manually dig through logs and configurations.
  • This is a game-changer for DevOps teams because instead of asking simple questions like “What’s an S3 bucket?”, you can now ask something like “Why is my payment processing Lambda throwing 500 errors?” and Q will automatically check CloudWatch logs, examine IAM permissions, investigate connected services like API Gateway and DynamoDB, and even look at recent changes to figure out what’s going wrong.
  • The multi-step reasoning capability is the real innovation here – Amazon Q now shows its work as it investigates your problem, asking for clarification when needed and explaining its reasoning process, which not only helps solve the immediate issue but also helps engineers understand their systems better and learn troubleshooting patterns.
  • What’s particularly impressive is that this works across 200+ AWS services through their APIs, meaning Q can pull together information from virtually any part of your AWS infrastructure to answer questions, making it incredibly powerful for organizations with complex, multi-service architectures.
  • The integration with Microsoft Teams and Slack is brilliant for enterprise teams because it brings this troubleshooting power directly into where engineers are already working and collaborating, eliminating the context switching between chat apps and the AWS console during incident response.

27:35  📢 Ryan – “And, if you add in instructions for your agent to respond in a snarky and sort of condescending way, you really have automated me out of a job.”

**Show note editor note: Welcome to my world, Ryan.**

28:59 AWS cooks up Euro cloud outfit to soothe sovereignty nerves • The Register

  • AWS is launching a European Sovereign Cloud by the end of 2025, creating a legally independent entity based in Germany with EU-only staff, infrastructure, and leadership – essentially building a firewall between European customer data and potential US government reach under laws like the Cloud Act.
  • This move directly responds to growing European anxiety about data sovereignty, especially with the Trump 2.0 administration’s aggressive foreign policy stance, and follows similar announcements from Microsoft and Google Cloud who are also scrambling to address European concerns about US tech dependence.
  • AWS is creating a completely autonomous infrastructure with its own Route 53 DNS service using only European top-level domains, a dedicated European Certificate Authority, and the ability to operate indefinitely even if completely disconnected from AWS’s global infrastructure.
  • What’s really interesting is the governance structure – they’re establishing an independent advisory board with four EU citizens, including at least one person not affiliated with Amazon, who are legally obligated to act in the best interest of the European Sovereign Cloud rather than AWS corporate.
  • The timing couldn’t be more critical as European politicians are increasingly vocal about reducing dependence on US tech, especially after Microsoft reportedly blocked ICC prosecutor access to email in compliance with US sanctions, which really spooked EU officials about their vulnerability.
  • For AWS customers in Europe, this means they’ll finally have an option that addresses regulatory compliance concerns while maintaining AWS’s service quality, though it remains to be seen how pricing will compare to standard AWS regions and whether the Cloud Act truly has no reach here.
  • The bigger picture shows how geopolitical tensions are literally reshaping cloud infrastructure – we’re moving from a globally interconnected cloud to regional sovereign clouds, which could fundamentally change how multinational companies architect their systems.
  • While AWS promises “no critical dependencies on non-EU infrastructure,” the parent company remains American-owned, so there’s still debate about whether this truly protects against Cloud Act requirements – it’s a legal gray area that will likely need court testing to resolve.

GCP

37:07 Get committed use discounts for RHEL | Google Cloud Blog

  • Google Cloud is bringing committed use discounts to Red Hat Enterprise Linux, offering up to 20% savings for customers running predictable RHEL workloads on Compute Engine – this is a big deal for enterprises who’ve been paying full on-demand prices for their RHEL subscriptions in the cloud.
  • The way these RHEL CUDs work is pretty straightforward – you commit to a one-year term for a specific number of RHEL subscriptions in a particular region and project, and in exchange you get that 20% discount off the standard on-demand pricing, which really adds up when you’re running enterprise workloads 24/7.
  • What’s interesting here is Google’s positioning compared to AWS and Azure – while both competitors offer various discount mechanisms for compute resources, Google is specifically targeting the RHEL subscription costs themselves, which is a significant expense for many enterprises running traditional workloads in the cloud.
  • The sweet spot for these discounts kicks in when you’re utilizing RHEL instances about 80% or more of the time over the year, which honestly describes most production enterprise workloads – Google’s research shows the majority of RHEL VMs run 24/7, so this pricing model actually aligns well with real-world usage patterns, and the arithmetic below shows why 80% is exactly the break-even point.
  • One thing to watch out for is that these commitments are completely inflexible – once you purchase them, you can’t edit or cancel, and you’re on the hook for the monthly fees regardless of actual usage, so you really need to nail your capacity planning before pulling the trigger.
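The break-even arithmetic in miniature – the hourly rate below is a made-up placeholder, but the structure holds for any rate, because the CUD bills 80% of on-demand for every hour of the term whether you use it or not:

```python
# Compare a year of on-demand RHEL at varying utilization to a 1-year CUD.
on_demand_hourly = 0.06   # hypothetical RHEL subscription rate, $/hour
hours_in_year = 8760

cud_cost = on_demand_hourly * 0.80 * hours_in_year  # fixed, usage-independent
for utilization in (0.6, 0.8, 1.0):
    on_demand_cost = on_demand_hourly * hours_in_year * utilization
    print(f"{utilization:.0%} busy: on-demand ${on_demand_cost:,.0f} "
          f"vs CUD ${cud_cost:,.0f}")
# 60% busy: on-demand wins; 80%: dead even; 100%: the CUD saves 20%.
```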

38:22  📢 Justin – “So if I’m committing to the license, but I can move it between any type of instance class, I actually am okay with that – and if that’s something we’re going to see for other operating systems in the future, where maybe Windows has a discount if I’m willing to commit and things like that, this could be an interesting move by Google in general.”

39:11 Launching our new state-of-the-art Vertex AI Ranking API | Google Cloud Blog

  • Google just launched their Vertex AI Ranking API, which is essentially a precision filter that sits on top of your existing search or RAG systems to dramatically improve result relevance – they’re claiming it can help businesses avoid that scary 82% customer loss rate when users can’t find what they need quickly, and it addresses the fact that up to 70% of retrieved passages in traditional search often don’t contain the actual answer you’re looking for.
  • Google is positioning this as a drop-in enhancement rather than a rip-and-replace solution – you can keep your existing search infrastructure and just add this API as a reranking layer, which means companies can get state-of-the-art semantic search capabilities in minutes instead of going through months of migration, and they’re offering two models: a default one for accuracy and a fast one for latency-critical applications.
  • The performance benchmarks are pretty impressive – Google’s claiming their semantic-ranker-default-004 model leads the industry in accuracy on the BEIR dataset compared to other standalone reranking services, and they’re backing this up by publishing their evaluation scripts on GitHub for reproducibility, plus they say it’s at least 2x faster than competitive reranking APIs at any scale.
  • This feels like Google’s answer to the reranking capabilities we’ve seen from players like Cohere and their Rerank API, but Google’s bringing some unique advantages with their 200k token context window for long documents and native integrations across their ecosystem – you can use it directly in AlloyDB with a simple SQL function, integrate it with RAG Engine, or even use it with Elasticsearch, which shows they’re thinking beyond just their own stack. A minimal call sketch follows below.
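For the drop-in claim, a minimal call via the google-cloud-discoveryengine Python client (which hosts the Ranking API) might look like this sketch – the project ID, query, and documents are placeholders, and the model name is the one cited above:

```python
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.RankServiceClient()
request = discoveryengine.RankRequest(
    # The default ranking config lives under the project's global location;
    # "my-project" is a placeholder.
    ranking_config=client.ranking_config_path(
        "my-project", "global", "default_ranking_config"
    ),
    model="semantic-ranker-default-004",  # model named in the benchmarks above
    query="how do I rotate service account keys?",
    records=[
        discoveryengine.RankingRecord(
            id="1", content="Rotate keys with the IAM API or gcloud."),
        discoveryengine.RankingRecord(
            id="2", content="Service accounts are project-scoped identities."),
    ],
)
# Records come back re-ordered, each with a relevance score attached.
for record in client.rank(request=request).records:
    print(f"{record.score:.3f}  {record.id}")
```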

40:13  📢 Justin – “Basically this is their answer to Cohere and Elasticsearch.” 

41:02 Project Shield blocked a massive recent DDoS attack. Here’s how. | Google Cloud Blog

  • Google’s Project Shield just proved its worth by defending KrebsOnSecurity against a staggering 6.3 terabits per second DDoS attack – that’s roughly 63,000 times faster than average US broadband and one of the largest attacks ever recorded, showing that even free services can provide enterprise-grade protection when backed by Google’s infrastructure.
  • Project Shield is completely free for eligible organizations like news publishers, government election sites, and human rights defenders. It’s essentially Google weaponizing their massive global infrastructure for good, letting at-risk organizations piggyback on the same defenses that protect Google’s own services.
  • The technical stack behind Project Shield is impressive – it combines Cloud Load Balancing, Cloud CDN, and Cloud Armor to create a multi-layered defense that blocked this attack instantly without any manual intervention, filtering 585 million packets per second at the network edge before they could even reach the application layer.
  • This is a great example of how cloud providers are differentiating beyond just compute and storage – while AWS has Shield and Azure has DDoS Protection, Google’s approach of offering this as a free service to vulnerable organizations shows they’re thinking about cloud infrastructure as a force for protecting free speech and democracy online.
  • For regular GCP customers, this attack validates Google’s DDoS protection capabilities – the same technologies protecting KrebsOnSecurity through Project Shield are available to any Google Cloud customer, with features like Adaptive Protection using machine learning to dynamically adjust rate limits in real-time.
  • The simplicity of implementation is noteworthy – organizations just change their DNS settings to point to Project Shield’s IP addresses and configure their hosting server info, making it easy to enable or disable protection with a simple DNS switch, which is crucial for organizations that might not have dedicated security teams.
  • This incident highlights the escalating DDoS threat landscape – attacks have grown from the 620 Gbps Mirai botnet attack in 2016 to this 6.3 Tbps monster in 2025, a 10x increase that shows why organizations need to think seriously about DDoS protection as attacks become more sophisticated and volumetric.

44:07 Cloud Run GPUs are now generally available | Google Cloud Blog

  • Google just made GPU computing truly serverless with Cloud Run GPUs going GA, and the killer feature here is that you only pay for what you use down to the second.
  • Imagine spinning up an NVIDIA L4 GPU for AI inference, having it automatically scale to zero when idle, and only paying for the actual seconds of compute time, which is a game-changer compared to keeping GPU instances running 24/7 on traditional cloud infrastructure.
  • The cold start performance is genuinely impressive – they’re showing sub-5 second startup times to get a GPU instance with drivers installed and ready to go, and in their demo they achieved time-to-first-token of about 19 seconds for a Gemma 3 4B model including everything from cold start to model loading to inference, which makes this viable for real-time AI applications that need to scale dynamically.
  • What’s really clever is how they’ve removed the traditional barriers to GPU access – there’s no quota request required for L4 GPUs anymore, you literally just add `--gpu 1` to your command line or check a box in the console, making this as accessible as regular Cloud Run deployments, which democratizes GPU computing for developers who previously couldn’t justify the complexity or cost.
  • The multi-regional deployment story is strong with GPUs available in five regions including US, Europe, and Asia, and you can deploy across multiple regions with a single command for global low-latency inference – they showed deploying Ollama across three continents in one go, which would be a nightmare to set up with traditional GPU infrastructure.
  • At Next ’25 they demonstrated scaling from 0 to 100 GPU instances in just 4 minutes running Stable Diffusion, which really showcases the elasticity – this kind of burst scaling would cost a fortune with reserved GPU instances but makes perfect sense with per-second billing for handling viral AI applications or unpredictable workloads.
  • Early customers like Wayfair are reporting 85% cost reductions by combining L4 GPU performance with Cloud Run’s auto-scaling, while companies like Midjourney are using it to process millions of images – the combination of reasonable GPU pricing with true scale-to-zero capabilities seems to be hitting a sweet spot for AI workloads that don’t need constant GPU availability.

45:49  📢 Ryan – “Anything that scales down to zero is ok in my book.”

46:50 GKE Volume Populator streamlines AI/ML data transfers | Google Cloud Blog

  • Google just released GKE Volume Populator, and it’s actually a pretty clever solution to a real pain point in AI/ML workflows: if you store your training data or model weights in Cloud Storage but need to move them to faster storage like Hyperdisk ML for better performance, you previously had to build custom scripts and workflows to orchestrate all those data transfers. Now GKE handles it automatically through the standard Kubernetes PersistentVolumeClaim API – see the sketch after this list.
  • What’s really interesting here is that Google is leveraging the Kubernetes Volume Populator feature that went GA in Kubernetes 1.33, but they’re adding their own special sauce with native Cloud Storage integration and fine-grained namespace-level access controls – this means you can have different teams or projects with their own isolated access to specific Cloud Storage buckets without having to manage complex IAM policies across your entire cluster.
  • The timing on this is perfect for AI/ML workloads because one of the biggest challenges teams face is efficiently loading massive model weights – Abridge AI reported they saw up to 76% faster model loading speeds and reduced pod initialization times by using Hyperdisk ML with this feature, which is huge when you’re dealing with large language models that can be hundreds of gigabytes.
  • From a cost optimization perspective, this is actually quite smart because your expensive GPU and TPU resources aren’t sitting idle waiting for data to transfer – the pods are blocked from scheduling until the data transfer completes, so you can use those accelerators for other workloads in the meantime, which could save significant money on compute costs.
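Here’s a hedged sketch of the PVC side via the Kubernetes Python client – the `GCPDataSource` kind and its API group are assumptions standing in for the custom resource that names your Cloud Storage bucket, so verify both against the GKE docs:

```python
from kubernetes import client, config

config.load_kube_config()

# A PVC whose dataSourceRef points at a Cloud Storage-backed custom resource;
# GKE populates the Hyperdisk ML volume, and pods that mount it stay pending
# until the transfer completes.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="model-weights"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadOnlyMany"],
        storage_class_name="hyperdisk-ml",      # assumed StorageClass name
        resources={"requests": {"storage": "1Ti"}},
        data_source_ref=client.V1TypedObjectReference(
            api_group="datalayer.gke.io",       # assumed CRD API group
            kind="GCPDataSource",               # CR that names the GCS bucket
            name="gemma-weights",
        ),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim("default", pvc)
```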

Azure

49:44 New AI innovations that are redefining the future for software companies | Microsoft Azure Blog

  • Microsoft is making a big push to turn every software developer into an AI developer with Azure AI Foundry, their new unified platform that brings together models, tools, and services for building AI apps and agents at scale. 
  • What’s really interesting here is they’re positioning this as the shift from AI assistants that wait for instructions, to autonomous agents that can actually be workplace teammates.
  • The Azure AI Foundry Agent Service is now generally available, and it lets developers orchestrate multi-agent workflows where AI agents can work together to solve complex problems. 
  • This is Microsoft’s answer to the growing demand for agentic AI that can automate decision-making and complex business processes, which AWS and GCP haven’t quite matched yet in terms of a unified platform approach.
  • Microsoft is seriously expanding their model catalog with some heavy hitters – they’ve got Grok 3 from xAI available today, Sora from OpenAI coming soon in preview, and over 10,000 open-source models from Hugging Face, all with full fine-tuning support, which gives developers way more choice than what you typically see in competing cloud platforms.
  • The real game-changer here might be what they’re calling “Agentic DevOps” – GitHub Copilot is evolving from just helping you write code to actually doing code reviews, writing tests, fixing bugs, and even handling app modernization tasks that used to take months but can now be done in hours, which could fundamentally change how software teams operate.
  • They’ve introduced a Site Reliability Engineering agent that monitors production systems 24/7 and can autonomously troubleshoot issues as they arise across Kubernetes, App Service, serverless, and databases – essentially giving every developer access to the same expertise that powers Azure at global scale, which is a pretty compelling value proposition for teams that can’t afford dedicated SRE staff.
  • For startups and ISVs, Microsoft is sweetening the deal with flexible Azure credits through Microsoft for Startups, and they’re reporting that AI and machine learning offer revenue in their marketplace grew 100% last year – companies like Neo4j have seen 6X revenue growth in 18 months through the marketplace, which shows there’s real money to be made here.

53:13  📢 Ryan – “The way I hope AI rolls out is that it does stuff like this, but then it still requires supervision – the SRE engineers, the DevOps engineers that you already have – are now freed up to do more impactful things. So maybe it’s refining prompts for these agents, giving them those constraints by, you know, thinking about how they basically operate and all those like things that aren’t written down as intangibles and really getting that executed into prompts.”

54:05 Announcing dotnet run app.cs – A simpler way to start with C# and .NET 10 – .NET Blog

  • Microsoft just made getting started with C# dramatically easier with .NET 10 Preview 4 by introducing the ability to run a single C# file directly using `dotnet run app.cs`, eliminating the need for project files or complex folder structures – essentially bringing Python-like simplicity to C# development while maintaining the full power of the .NET ecosystem.
  • This new file-based approach introduces clever directives that let you reference NuGet packages, specify SDKs, and set MSBuild properties right within your C# file using simple syntax like `#:package <PackageName>@<Version>`, making it perfect for quick scripts, learning scenarios, or testing code snippets without the overhead of creating a full project structure.
  • What’s particularly brilliant about this implementation is that it’s not a separate dialect or limited version of C# – you’re writing the exact same code with the same compiler, and when your script grows beyond a simple file, you can seamlessly convert it to a full project using `dotnet project convert app.cs`, which automatically scaffolds the proper project structure and translates all your directives.
  • The feature even supports Unix-style shebang lines, allowing you to create executable C# scripts that run directly from the command line on Linux and macOS, positioning C# as a viable alternative to Python or Bash for automation scripts and CLI utilities – imagine writing your cloud automation scripts in strongly-typed C# instead of wrestling with shell scripts.
  • This addresses a long-standing pain point where developers had to rely on third-party tools like dotnet-script or CS-Script to achieve similar functionality, but now it’s built right into the core .NET CLI, requiring no additional installations or configurations beyond having .NET 10 Preview 4 installed.
  • The timing is perfect as more cloud platforms and services provide .NET SDKs, allowing developers to quickly prototype API integrations, test cloud service connections, or build automation scripts without the ceremony of setting up a full project – you could literally test an Azure Storage connection in a single file and run it immediately.
  • Visual Studio Code support is already available through the pre-release version of the C# extension, with IntelliSense for the new directives, and Microsoft is exploring multi-file support and performance improvements for future previews, suggesting this feature will only get more powerful as .NET 10 approaches release.
  • This democratizes C# development in a way that makes it accessible to beginners while still being useful for experienced developers who want to quickly test ideas or build utilities, effectively positioning C# as both a powerful enterprise language and a convenient scripting language in one package.

56:20  📢 Ryan – “I’m very mixed on this, because it’s like, .NET development; the development patterns I see are already so detached from the running environment, so I feel like this is a further abstraction on top of all the leveraged libraries and frameworks that are part of .NET.” 

57:45 Announcing General Availability: Ephemeral OS Disk support for v6 Azure VMs | Microsoft Community Hub

  • Microsoft just made ephemeral OS disks generally available for their latest v6 VM series, and this is a big deal for anyone running stateless workloads because you’re getting up to 10X better OS disk performance by using local NVMe storage instead of remote Azure Storage – essentially eliminating network latency for your operating system disk operations.
  • The beauty of ephemeral disks is that they’re perfect for scale-out scenarios like containerized microservices, batch processing jobs, or CI/CD build agents where you don’t need persistent OS state – you can reimage a VM in seconds and get back to a clean state, which is fantastic for auto-scaling scenarios where you’re constantly spinning up and tearing down instances.
  • This puts Azure in a really competitive position against AWS’s instance store volumes and GCP’s local SSDs, though Microsoft’s implementation is particularly interesting because it specifically targets the OS disk placement on NVMe storage while still allowing you to use regular managed disks for your data volumes if needed.
  • The v6 VM series that support this feature – like the AMD EPYC-based Dadsv6 and Intel-based Ddsv6 families – are Azure’s latest generation, so you’re combining cutting-edge CPU performance with blazing-fast local storage, making these ideal for performance-sensitive workloads that can tolerate the ephemeral nature of the OS disk.
  • From a cost perspective, ephemeral OS disks are essentially free since you’re not paying for managed disk storage – you’re just using the local storage that comes with your VM, which could lead to significant savings for large-scale deployments where you might have hundreds or thousands of VMs that don’t need persistent OS disks.
  • One thing to keep in mind is that these disks are truly ephemeral – if your VM gets deallocated or moved to different hardware for maintenance, you lose everything on that OS disk, so this isn’t for everyone – you really need to architect your applications to be stateless and store any important data elsewhere.
  • The deployment is surprisingly straightforward with just a few extra parameters in your ARM templates or CLI commands, and the fact that it works with marketplace images, custom images, and Azure Compute Gallery images means you can pretty much use it with any existing VM deployment pipeline you already have – the sketch below shows where those parameters land.
  • For DevOps teams and platform engineers, this feature is particularly exciting because it enables faster VM boot times, quicker scale-out operations, and better performance for temporary workloads like build agents or test environments where persistence is actually a liability rather than an asset.
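On that “few extra parameters” point, here’s a hedged Python sketch using the azure-mgmt-compute SDK – the resource names, image, and especially the “NvmeDisk” placement value are assumptions to validate against the announcement and your SDK version; everything else is a standard VM create:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

compute = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The diff_disk_settings block is what makes the OS disk ephemeral; the
# "NvmeDisk" placement (assumed value for v6 local NVMe) is the new part.
compute.virtual_machines.begin_create_or_update(
    "my-rg", "build-agent-01",
    {
        "location": "eastus2",
        "hardware_profile": {"vm_size": "Standard_D8ads_v6"},
        "os_profile": {
            "computer_name": "build-agent-01",
            "admin_username": "azureuser",
            "linux_configuration": {
                "disable_password_authentication": True,
                "ssh": {"public_keys": [{
                    "path": "/home/azureuser/.ssh/authorized_keys",
                    "key_data": "<ssh-public-key>",
                }]},
            },
        },
        "storage_profile": {
            "image_reference": {"publisher": "Canonical",
                                "offer": "ubuntu-24_04-lts",
                                "sku": "server", "version": "latest"},
            "os_disk": {
                "create_option": "FromImage",
                "caching": "ReadOnly",  # ephemeral OS disks have required read-only caching
                "diff_disk_settings": {"option": "Local", "placement": "NvmeDisk"},
            },
        },
        "network_profile": {"network_interfaces": [{"id": "<nic-resource-id>"}]},
    },
).result()
```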

1:03:22 Generally Available: Support for AWS Bedrock API in AI Gateway Capabilities in Azure API Management

  • Announcing expanded support for AWS Bedrock model endpoints across all Generative AI policies in Azure API Management’s AI Gateway
  • This release enables you to apply advanced management and optimization features such as Token Limit Policy, Token Metric Policy, and Semantic Caching Policy to AWS Bedrock models, empowering you to seamlessly manage and optimize your multi-cloud AI workloads.  
  • Key benefits include:
    • Apply token limiting, tracking, and logging to AWS Bedrock APIs for better control
    • Enable semantic caching to enhance performance and response times for Bedrock models
    • Achieve unified observability and governance across multi-cloud AI endpoints

1:04:06 📢 Justin – “Azure, we thank you for making AWS more cost effective and responsive with your capabilities and features.” 

Other Clouds

1:07:20 Introducing ATL1: DigitalOcean’s new AI-optimized data center in Atlanta

  • DigitalOcean is making a serious play for the AI infrastructure market with their new ATL1 data center in Atlanta, which is their largest facility to date with 9 megawatts of total power capacity across two data halls.  
  • It’s specifically designed for high-density GPU deployments that AI and machine learning workloads demand.
  • This marks a significant shift in DigitalOcean’s strategy from being primarily known as a developer-friendly cloud provider for smaller workloads to now competing in the GPU infrastructure space, deploying over 300 GPUs including top-tier NVIDIA H200 and AMD Instinct MI300X clusters in just the first data hall.
  • The timing of this expansion is particularly interesting as we’re seeing massive demand for GPU resources driven by the AI boom, and DigitalOcean is positioning themselves as a more accessible alternative to the hyperscalers for startups and growing tech companies that need GPU compute but don’t want the complexity or cost structure of AWS, Azure, or GCP.
  • By choosing Atlanta as their location and partnering with Flexential for the facility, DigitalOcean is strategically serving the Southern U.S. market where there’s been significant tech growth, offering lower latency for regional customers while maintaining their promise of simplicity and cost-effectiveness that made them popular with developers in the first place.
  • The integration of GPU infrastructure alongside their existing services like Droplets, Kubernetes, and managed databases creates an interesting one-stop-shop proposition for companies building AI applications, allowing them to keep their entire stack within DigitalOcean’s ecosystem rather than mixing providers.
  • With a second data hall planned for 2025 with even more GPU capacity, this represents a multi-year commitment to AI infrastructure, suggesting DigitalOcean sees this as core to their future rather than just riding the current AI hype wave.
  • This expansion brings DigitalOcean to 16 data centers across 10 global regions, which while still small compared to the hyperscalers, shows they’re serious about geographic distribution and reducing latency for their growing customer base.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at theCloudPod.net – or tweet at us with the hashtag #theCloudPod.
