302: It’s So Hot, Even Windows is Hotpatching


Welcome to episode 302 of The Cloud Pod – where the forecast is always cloudy! This week Justin and Ryan are on hand to bring you all the latest in Cloud (and AI) news. We’ve got hotpatching, Project Greenland, and a rollback of GPT-4o, which sort of makes us sad – and our egos are definitely less stroked. Plus SaaS, containers, and Outposts – all of this and more. Thanks for joining us in the cloud!

Titles we almost went with this week:

  • 🌥️The Cloud Pod was never accused of being sycophantic
  • 📮2nd Gen outposts!?! I didn’t even know anyone was using Gen 1
  • 🙅AWS Outposts 2nd Gen… not with AI (GASP)
  • 👷If you’re doing SaaS wrong, Google & AWS have your back this week with new Features 
  • 🔥Patching, so hot right now
  • 🔷Larger container sizes for Azure….  You don’t say
  • 🟢AWS Green reporting detects hotspots… surprisingly close to Maryland…..
  • 🪈Visual pipeline for Opensearch… I want to like this… but I just can’t

A big thanks to this week’s sponsor:

We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info. 

General News 

01:37 Sharing new DORA research for gen AI in software development

  • The DORA team at Google has released a new report, “Impact of Generative AI in Software Development.” Based on survey data and developer interviews, the report aims to move beyond the hype and offer a grounded perspective on AI’s impact on individuals, teams, and organizations. 
  • Click on the link in our show notes to access the full report. However, Google has highlighted a few key points in the blog post.
  • AI is Real – A staggering 89% of organizations are prioritizing the integration of AI into their applications, and 76% of technologists are already using AI in some part of their daily work. 
  • Productivity gains confirmed: Developers using Gen AI report significant increases in flow, productivity, and job satisfaction.  For instance, a 25% increase in AI adoption is associated with a 2.1% increase in individual productivity.
  • Organization benefits are tangible: Beyond individual gains, DORA found strong correlations between AI adoption and improvements in crucial organizational metrics. A 25% increase in AI adoption is associated with increases in documentation quality, code quality, code review speed, and approval speed. 
  • If you are looking to utilize AI in your development organization, they provide five practical approaches for both leaders and practitioners.
    • Have transparent communications
    • Empower developers with learning and experimentation
    • Establish clear policies
    • Rethink performance metrics
    • Embrace fast feedback loops

05:06 📢 Ryan – “Those are really good approaches, but really difficult to implement in practice. You know, in my day job, watching the company struggle to get a handle on AI from all the different angles you need to, from data protection, legal liability – just operationally – it’s very hard. So I think having a mature program where you’re rolling that out with intent and being very specific with your AI tasks I think will go a long way with a lot of companies.”  

AI Is Going Great – Or How ML Makes Its Money 

08:55 Introducing our latest image generation model in the API

  • You can now generate images via the OpenAI API with gpt-image-1, enabling developers and businesses to easily integrate high-quality, professional-grade image generation directly into their tools and platforms.
  • The gpt-image-1 API is priced per token, with separate pricing for text and image tokens.  
  • Text input is $5 per 1M tokens, image input is $10 per 1M tokens, and image output (generated images) is $40 per 1M tokens.  
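
Using the per-1M-token rates above, the cost of a request is simple arithmetic. A minimal sketch (the token counts in the example are made up for illustration; real image-token counts depend on resolution and quality):

```python
# Rough cost estimator for gpt-image-1, using the published per-1M-token rates.
TEXT_IN_PER_M = 5.00     # $ per 1M text input tokens
IMAGE_IN_PER_M = 10.00   # $ per 1M image input tokens
IMAGE_OUT_PER_M = 40.00  # $ per 1M image output tokens

def image_gen_cost(text_in: int, image_in: int, image_out: int) -> float:
    """Return the dollar cost of one request given its token counts."""
    return (text_in * TEXT_IN_PER_M
            + image_in * IMAGE_IN_PER_M
            + image_out * IMAGE_OUT_PER_M) / 1_000_000

# Hypothetical request: a 100-token prompt producing ~10,000 image output tokens.
print(f"${image_gen_cost(100, 0, 10_000):.4f}")
```

As Ryan notes in the quote below the pricing bullets, the hard part isn’t the math per request – it’s forecasting how many requests (and output tokens) a product will actually generate.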

09:47 📢 Ryan – “It’s still tricky pricing these things out…forecasting these things in a way that you can coordinate as a business is really challenging.”

12:03 OpenAI rolls back update that made ChatGPT a sycophantic mess

  • ChatGPT is becoming less of a suck-up, apparently. 
  • ChatGPT users have grown frustrated with the overly positive and complimentary output generated by the model.  
  • This rollback will occur on the GPT-4o model, which is the default model you get access to via ChatGPT.  
  • OpenAI says that as you interact with the chatbot, OpenAI gathers data on the responses people like more, then the engineers revise the production model using a technique called reinforcement learning from human feedback.  
  • However, that’s where things went off the rails – turning ChatGPT into the world’s biggest suck up. 
  • Users could present ChatGPT with completely terrible ideas or misguided claims, and it might respond “Wow, you’re a genius” or “This is on a whole different level.” Which, to be fair, “on a whole different level” doesn’t necessarily mean GOOD. 
  • Designing the model’s tone is important to make them something you want to chat with, and this sycophantic response process results in a toxic feedback loop. 
  • Claude is a little more realistic, but honestly – it’s sort of a let down. 

Cloud Tools 

14:30 Targeted by 20.5 million DDoS attacks, up 358% year-over-year: Cloudflare’s 2025 Q1 DDoS Threat Report 

  • Cloudflare has released their Q1 2025 DDoS threat report, and it isn’t great news if you’re trying to protect internet resources. 
  • They even touched on late-breaking DDoS attacks observed in April 2025 that are among the largest ever publicly disclosed.  
  • Cloudflare says they blocked an intense packet-rate attack peaking at 4.8 billion packets per second, 52% higher than their previous benchmark, and also defended against a 6.5 Tbps flood, matching the largest bandwidth attack ever publicly reported. 
  • In the first quarter alone:
    • Cloudflare blocked 20.5 million DDoS attacks, representing a 358% year-over-year increase and a 198% quarter-over-quarter increase.
    • One third of the attacks (6.6 million) targeted the Cloudflare network infrastructure directly, as part of an 18-day multi-vector attack campaign.
    • Cloudflare also blocked approximately 700 hyper-volumetric DDoS attacks that exceeded 1 Tbps or 1 Bpps (billion packets per second), or about eight attacks per day.
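
The “about eight attacks per day” figure checks out against a 90-day quarter:

```python
# Sanity-check Cloudflare's "about eight per day" claim for Q1 2025.
hyper_volumetric_attacks = 700
days_in_q1 = 31 + 28 + 31  # Jan + Feb + Mar 2025 (not a leap year)

attacks_per_day = hyper_volumetric_attacks / days_in_q1
print(f"{attacks_per_day:.1f} hyper-volumetric attacks per day")
```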

15:57 📢 Justin – “I was thinking about this earlier, actually. Typically DDoS attacks are compromised computers that are then used in these massive attacks, and they’re all controlled by botnets and this has been going on for over a decade now – and it just keeps getting worse… I mean, I’m a computer guy, so all my shit’s locked down and secure, and I have firewalls, but do normal people just go raw dogging on the internet and their computers get hacked and compromised all the time?” 

AWS

20:09 In the works – New Availability Zone in Maryland for US East (Northern Virginia) Region

  • This might explain the recent update to the API to include location in the response: Amazon is announcing a new Availability Zone in Maryland for the US East (Northern Virginia) region. 
  • Today the US East region has six Availability Zones; with this new Maryland zone opening in 2026, it will have seven AZs connected by high-bandwidth, low-latency network connections over dedicated, fully redundant fiber. 
  • With this new AZ joining an ever-growing list of new regions, including New Zealand, KSA, Taiwan, and the AWS European Sovereign Cloud, AWS is investing heavily in datacenter capacity. 

25:40 Enhance real-time applications with AWS AppSync Events data source integrations  

  • AWS AppSync events now support data source integrations for channel namespaces, enabling developers to create more sophisticated real-time applications. 
  • With the new capabilities, you can associate AWS Lambda functions, Amazon DynamoDB tables, Amazon Aurora databases and other data sources with channel namespace handlers. 
  • Leveraging AppSync events you can build rich, real-time applications with features like data validation, event transformation and persistent storage of events. 
  • You can integrate these into event-driven workflows by transforming and filtering events with Lambda functions, or by saving batches of events to DynamoDB using the new AppSync_JS batch utilities. 
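
As a rough illustration of the kind of validate-and-transform step this enables, here is a minimal Lambda-style handler. Note the event shape (an `events` list of `{id, payload}` objects) is an assumption for illustration, not the documented AppSync Events contract; check the AppSync docs for the real payload format before building on this.

```python
import json

def handler(event, context):
    """Hypothetical AppSync Events channel-namespace handler: drop events
    missing a 'userId' field and stamp the rest with a processed flag.
    The {'id', 'payload'} shape here is an illustrative assumption."""
    results = []
    for item in event.get("events", []):
        payload = item.get("payload", {})
        if "userId" not in payload:
            continue  # filter out events that fail validation
        payload["processed"] = True  # simple transformation
        results.append({"id": item["id"], "payload": payload})
    return {"events": results}

# Local smoke test with a fake event
fake = {"events": [
    {"id": "1", "payload": {"userId": "u-123", "msg": "hi"}},
    {"id": "2", "payload": {"msg": "no user"}},
]}
print(json.dumps(handler(fake, None)))
```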

26:45 📢 Ryan – “I kind of like this thing because it’s a little bit of putting a Band-Aid on your around your managed application, but sure is powerful when you can use it.”

29:49 Amazon EKS introduces node monitoring and auto repair capabilities

  • EKS now provides node monitoring and auto repair capabilities. 
  • This new feature enables automatic detection and remediation of node-level issues in EKS clusters, improving the availability and reliability of your Kubernetes applications.
  • There are two components responsible for detecting node failures:
    • The Node Monitoring Agent, which detects a wide range of issues. 
      • It is bundled into the container image that runs as a DaemonSet on all worker nodes.  
      • The agent communicates any issue it finds by updating the status of the Kubernetes node object in the cluster and by emitting Kubernetes events. 
      • Detects GPU Failures related to Hardware Issues, Driver Issues, Memory Problems or unexpected performance drops
      • Kubelet Health
      • ContainerD issues
      • Networking CNI problems, missing route table entries and packet drop issues
      • Disk Space and I/O errors
      • CPU throttling, memory pressure and overall system load
      • Kernel panics
    • Node Repair System: This is a backend component that collects health information and repairs worker nodes. 
      • The system either replaces or reboots nodes in response to the detected conditions within, at most, 30 minutes.
      • If a GPU failure is detected, it will replace or reboot that node within, at most, 10 minutes.
      • Repair actions are logged and can be audited.
      • The repair system respects user-specified disruption controls, such as Pod Disruption Budgets. If zonal shift is activated in your EKS cluster, node auto repair actions are halted.

32:29 📢 Ryan – “I do like that it’s built in to the existing agent, you know, in terms of those health checks. And hopefully that the thresholds and the tuning of this is, you know, tunable where you can set it. Or it’s just completely like hands off running and it just works like magic. That would also be acceptable.”

33:42 Prompt Optimization in Amazon Bedrock now generally available

  • Prompt Optimization in Bedrock is now GA.  
  • Prompt engineering is the process of designing prompts to guide FMs to generate relevant responses. 
  • These prompts must be customized for each FM according to its best practices and guidelines, which is a time-consuming process that delays application development. 
  • Prompt optimization can now automatically rewrite prompts for better performance and more concise responses on Anthropic, Llama, Nova, DeepSeek, Mistral, and Titan models.
  • You can compare optimized prompts against original versions without deployment and save them in Amazon Bedrock Prompt Management for prompt lifecycle management. 
  • Prompt Optimization costs $0.030 per 1,000 tokens. Want more info on pricing? You can find that here. 
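
At $0.030 per 1,000 tokens, the optimization cost is easy to estimate up front. A quick sketch (the prompt counts and sizes are made-up examples):

```python
# Estimate Bedrock Prompt Optimization cost at $0.030 per 1,000 tokens.
RATE_PER_1K_TOKENS = 0.030

def optimization_cost(tokens: int) -> float:
    """Return the dollar cost of optimizing the given number of tokens."""
    return tokens / 1_000 * RATE_PER_1K_TOKENS

# Hypothetical batch: optimizing 200 prompts averaging 1,500 tokens each.
total_tokens = 200 * 1_500
print(f"${optimization_cost(total_tokens):.2f}")
```

As Justin notes below, this tends to be a one-time cost per prompt per model rather than a recurring one.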

34:22 📢 Justin – “This is one of those things you create the prompts, you optimize them once for each of the models, and they don’t really change all that often. That’s the guidelines that change.”

36:20 AWS announces upgrades to Amazon Q Business integrations for M365 Word and Outlook

  • AWS announced upgrades to its Amazon Q Business integrations for M365 Word and Outlook to enhance their utility when performing document- and email-centered tasks. 
  • The upgrade includes company knowledge access, image file attachment support, and expanded prompt context windows. 
  • With company knowledge support, users can now ask questions about their company’s indexed data directly through the Word and Outlook integrations, allowing them to instantly find relevant information when drafting their documents and emails without needing to switch context. 
  • We are *shocked* that you’re not locked into Microsoft’s AI capabilities.

38:42 Announcing Serverless Reservations, a new discounted pricing option for Amazon Redshift Serverless

  • Amazon Redshift Serverless now offers Serverless Reservations, a new discounted pricing option that helps you save up to 24% and gain greater cost predictability for your analytics workloads. 
  • With Serverless Reservations, you commit to a specific number of Redshift Processing Units (RPUs) for a one-year term and choose between two payment options: a no-upfront option that provides a 20% discount off on-demand rates, or an all-upfront option that provides a 24% discount. 
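
The discount math in rough numbers; note the $0.375/RPU-hour on-demand rate below is an illustrative assumption (check the Redshift Serverless pricing page for your region), and a real workload won’t run its committed RPUs 24/7:

```python
# Compare Redshift Serverless on-demand vs. Serverless Reservation pricing.
ON_DEMAND_PER_RPU_HOUR = 0.375  # illustrative assumption, not a quoted price
HOURS_PER_YEAR = 8_760

def annual_cost(rpus: int, discount: float) -> float:
    """Annual cost of running `rpus` continuously at the given discount."""
    return rpus * HOURS_PER_YEAR * ON_DEMAND_PER_RPU_HOUR * (1 - discount)

rpus = 32  # hypothetical steady-state commitment
for label, discount in [("on-demand", 0.0), ("no-upfront", 0.20), ("all-upfront", 0.24)]:
    print(f"{label:>11}: ${annual_cost(rpus, discount):,.2f}/year")
```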

39:06 📢 Justin – “Save all the monies!” 

39:37 AWS Transfer Family introduces Terraform module for deploying SFTP server endpoints

  • AWS Transfer Family introduces a Terraform module for deploying managed file transfer (MFT) server endpoints backed by Amazon S3.
  • This enables you to leverage IaC to automate and streamline centralized provisioning of MFT servers and users at scale. 
  • AWS Transfer Family provides fully managed file transfers over SFTP, AS2, FTPS, FTP, and web browser-based interfaces, directly into and out of AWS storage services. 

39:57 📢 Justin – “If you’re using FTP you should stop immediately.” 

42:10 Introducing a guided visual pipeline builder for Amazon OpenSearch Ingestion      

  • Amazon is releasing a new visual user interface for creating and editing Amazon OpenSearch Ingestion pipelines in the AWS console.
  • This new capability gives you a guided visual workflow, automatic permission creations, and enhanced real-time validations to streamline the pipeline development process. 
  • The new workflow simplifies pipeline development, reducing setup time and minimizing errors, making it easier to ingest, transform, and route data to Amazon OpenSearch Service. 

43:02 📢 Justin – “All of Ryan’s grey hair in his goatee and the reason why I have no color in my goatee is because of ElasticSearch.” 

44:35 Announcing second-generation AWS Outposts racks with breakthrough performance and scalability on-premises

  • Amazon is announcing the second generation of AWS Outpost Racks, which marks the latest innovation from AWS for edge computing.
  • The new generation includes support for the latest x86-powered EC2 instances, simplified network scaling and configuration, and accelerated networking instances designed specifically for ultra-low-latency and high-throughput workloads. 
  • The enhancements deliver greater performance for a broad range of on-premises workloads, such as core trading systems for financial services and telecom 5G core networks. 
  • Multiple customers have taken advantage of Outposts, including AthenaHealth, FanDuel, Riot Games, etc. 
  • The second-generation Outposts racks can support low-latency, local data processing, or data residency needs, such as game servers for multiplayer online games, customer transaction data, medical records, industrial and manufacturing control systems, telecom BSS, and edge inference for a variety of ML models. 
  • Justin is impressed that they didn’t slather AI all over this. Missed opportunity! 
  • You can get 7th-generation x86 processors on Outposts racks (C7i, M7i, and R7i instances).
  • They note that support for more latest-generation EC2 and GPU-enabled instances is coming soon (which we guess explains the lack of AI).

45:40 📢 Justin – “You know what this announcement doesn’t say a thousand times? No AI. Not a single mention of it. They did mention inference for a variety of ML models, and they do specifically call out CPU based ML models, and that’s because none of these instances support GPUs yet…but they do promise that they are coming soon – both the latest generation EC2 and GPU enabled instances.”

48:16 Reduce your operational overhead today with Amazon CloudFront SaaS Manager

  • Amazon is announcing the GA of Amazon CloudFront SaaS Manager, a new feature that helps SaaS providers, web development platform providers, and companies with multiple brands and websites efficiently manage delivery across multiple domains.  
  • CloudFront SaaS Manager addresses a critical challenge organizations face: managing tenant websites at scale, each requiring TLS certificates, distributed denial of service (DDoS) protection, and performance monitoring.
  • With CloudFront SaaS Manager, web development platform providers and enterprise SaaS providers who manage a large number of domains can use simple APIs and reusable configurations that leverage CloudFront edge locations worldwide, AWS WAF, and AWS Certificate Manager.
  • Multi-tenant SaaS deployment is a strategy where a single CloudFront distribution serves content for multiple distinct tenants (users or organizations). CloudFront SaaS Manager uses a new template-based distribution model, known as a multi-tenant distribution, to serve content across multiple domains while sharing configuration and infrastructure. For single websites or applications, a standard distribution is still recommended. 
  • A template distribution defines the base configuration used across domains, such as origin configurations, cache behaviors, and security settings.  
  • Each template distribution has distribution tenants that represent domain-specific origin paths or origin domain names, including web access control list overrides and custom TLS certificates. 

50:05 📢 Justin – “So now you have a very complicated set of CloudFront configurations because every one of them has to have its own CloudFront configuration – because you did custom URL vanity URLs. But now you can use this to help you make that less toil, which is appreciated, but it’s also a *terrible* model. And I don’t recommend it for a SaaS application if you can help it.”

52:22 Amazon Route 53 Profiles now supports VPC endpoints

  • AWS announced support for VPC endpoints in Route 53 Profiles, allowing you to create, manage, and share private hosted zones for interface VPC endpoints across multiple VPCs and AWS accounts within your organization.  
  • This enhancement simplifies the management of VPC endpoints by streamlining the process of creating and associating interface VPC endpoint private hosted zones (PHZs) with VPCs and AWS accounts, without requiring manual association.

GCP

53:56 Introducing SaaS Runtime

  • We missed this announcement at Google Next, but they unveiled the preview of SaaS Runtime, a fully managed Google Cloud service management platform designed to simplify and automate the complexities of infrastructure operations, enabling SaaS providers to focus on their core business. 
  • Based on their internal platform for serving millions of users across multiple tenants, SaaS Runtime leverages Google’s extensive experience managing services at Google scale.  
  • SaaS Runtime helps you model your SaaS environment, accelerate deployments, and streamline operations with a rich set of tools to manage at scale, with automation at its core. 
  • SaaS Runtime vision includes:
    • Launch quickly, customize and iterate:  SaaS Runtime empowers you with pre-built customizable blueprints, allowing for rapid iteration and deployment. You can easily integrate AI architecture blueprints into existing systems through simple data model abstractions.
    • Automate operations, observe and scale tenants: As a fully managed service, SaaS Runtime allows automation at scale. Starting from your current continuous integration/continuous delivery (CI/CD) pipeline, onboard to SaaS Runtime and then scale it to simplify service management, tenant observability, and operations across both cloud and edge environments. 
    • Integrate, optimize, and expand rapidly: SaaS Runtime is integrated into Google Cloud, allowing developers to design applications using the new Application Design Center.
    • These applications can then be deployed via the Google Cloud Marketplace. Once deployed across tenants, their performance can be monitored with Cloud Observability and App Hub.

55:33 📢 Justin – “This is for a SaaS company that literally deploys an instance for each customer. It’s an expensive pattern number one, but sometimes customers like this, because it makes it very easy to say, well, these are your direct costs, and so you should pay for them. This is a model that Jira uses. This is the model that ServiceNow uses – where you’re getting a dedicated app server in addition to a dedicated database server. And so yeah – this is to manage all of that at scale… But this really isn’t how you should do it.” 

1:03:49 Google Cloud Database and LangChain integrations support Go, Java, and JavaScript

  • Three new language-support integrations for LangChain are available: Go, Java, and JavaScript.
  • Each package supports vector stores for semantic search of databases, chat message history to enable chains to recall previous conversations, and document loaders for loading documents from your enterprise data. 

Azure

1:04:20 Unveiling GPT-image-1: Rising to new heights with image generation in Azure AI Foundry

  • We get it. You’re excited. 
  • Microsoft is thrilled to announce the launch of GPT-image-1, the latest and most advanced image generation model.  
  • The API is available now to gated customers (via the limited-access model application), and playground support is coming early next week.  
  • This groundbreaking model sets a new standard in generating high-quality images, solving complex prompts and offering zero-shot capabilities in various scenarios. 
    • Granular Instruction Response
    • Text Rendering
    • Image Input Acceptance
  • GPT-image-1 supports multiple modalities:
    • Text-to-image
    • Image-to-image
    • Text transformation
    • Inpainting

1:06:16 Tired of all the restarts? Get hotpatching for Windows Server

  • Hotpatching for Windows Server 2025, made available in preview in 2024, will become generally available as a subscription service on July 1st, 2025 (because you’re not already paying for the Microsoft licensing.)  
  • One of the key updates in the latest release of Windows Server 2025 is the addition of hybrid and multicloud capabilities, aligned with Azure’s adaptive cloud approach. 
  • With hotpatching, Microsoft is taking what was previously an Azure-only capability and making it available to Windows Server machines outside of Azure through Azure Arc.  
  • Hotpatching is a new way to install Windows Server 2025 updates that does not require a reboot after installation, by patching the in-memory code of running processes without needing to restart them.
  • Some of the benefits of hotpatching include the following:
    • Higher availability with fewer reboots
    • Faster deployment of updates as the packages are smaller, install faster, and have easier patch orchestration with Azure Update Manager
    • Hotpatch packages install without the need to schedule a reboot, so they can happen sooner. This can decrease the window of vulnerability that results when an administrator delays an update and restart after a Windows security update is released. 
  • Hotpatching is available at no charge in preview now, but starting in July with the subscription launch, hotpatching for Windows Server 2025 will be offered at $1.50 per CPU core per month. 
  • To make this work, though, the service must be connected to Azure Arc.
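
At $1.50 per core per month, the subscription cost scales linearly with fleet size. A quick sketch (the fleet numbers are hypothetical):

```python
# Estimate the Windows Server 2025 hotpatching subscription cost
# at $1.50 per CPU core per month.
RATE_PER_CORE_MONTH = 1.50

def hotpatch_annual_cost(servers: int, cores_per_server: int) -> float:
    """Annual subscription cost for a fleet of Arc-connected servers."""
    return servers * cores_per_server * RATE_PER_CORE_MONTH * 12

# Hypothetical fleet: 50 Arc-connected servers with 16 cores each.
print(f"${hotpatch_annual_cost(50, 16):,.2f}/year")
```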

1:07:57 📢 Ryan – “I hope that there’s a technical reason, because it feels like a cash grab. On one hand, I get it – they’re solving operational problems they have by managing their workloads on Azure, and this is an enhancement that comes directly out of managing servers with that scale, which is fantastic. The fact that they put it as a subscription on Arc makes me feel a little dirty about it.”

1:13:53 Announcing preview for the next generation of Azure Intel® TDX Confidential VMs

  • Azure is announcing the preview of their next generation of confidential VMs, powered by the 5th Gen Intel Xeon processor (Emerald Rapids) with Intel Trust Domain Extensions (TDX).  
  • This enables organizations to bring confidential workloads to the cloud without code changes to their applications. The supported SKUs include the general-purpose DCesv6-series and the memory-optimized ECesv6-series.  
  • Confidential VMs are designed for tenants with high security and confidentiality requirements, providing a strong, attestable, hardware-enforced boundary. 
  • They ensure that your data and applications stay private and encrypted even while in use, keeping your sensitive code and other data encrypted in memory during processing.

1:17:09 Announcing Public Preview of Larger Container Sizes on Azure Container Instances

  • Azure is announcing the public preview of larger container sizes for Azure Container Instances (ACI).  
  • Customers can now deploy workloads with higher vCPU and memory for standard containers, confidential containers, containers with virtual networks, and containers utilizing virtual nodes to connect to AKS.  
  • ACI now supports vCPU counts greater than 4 and memory capacities greater than 16 GB, with the new maximums being 32 vCPUs and 256 GB for standard containers, and 32 vCPUs and 192 GB for confidential containers. 
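
For scale, the jump from the old 4 vCPU / 16 GB ceiling is substantial:

```python
# Compare old vs. new ACI per-container limits (standard containers).
old_vcpu, old_mem_gb = 4, 16
new_vcpu, new_mem_gb = 32, 256

print(f"vCPU ceiling: {new_vcpu // old_vcpu}x higher")
print(f"Memory ceiling: {new_mem_gb // old_mem_gb}x higher")
```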

1:18:09 📢 Ryan – “I’m just surprised they got away with it for as long as they did. Because I went on the same journey you did, which was to point and laugh – they only have four? Cause I’ve never seen a workload need more than four CPUs, but everyone asked for more than four.”

Other Clouds

1:19:47 Introducing DigitalOcean Managed Caching for Valkey, The New Evolution of Managed Caching 

  • DigitalOcean has launched Managed Caching for Valkey, a managed database offering that seamlessly replaces Managed Caching (previously Managed Redis). 
  • The offering is compatible with Valkey 8.0 and Redis 7.2.4 and is meant to be a drop-in replacement for their managed caching database service, while offering enhanced functionality for fast and efficient data storage. 

1:20:11 📢 Ryan – “I like to hear DigitalOcean coming up with these managed services. And so if you have a workload on DigitalOcean you don’t have to manage your own service offering on compute. You can take advantage of these things. It’s great. I’d like to see more competition in this marketplace.”

Cloud Journey

1:20:50 ‘Project Greenland’: How Amazon Overcame a GPU Crunch

  • Interesting project Amazon is working on related to the AI chip crunch.
  • Amazon’s retail business had a big problem: it couldn’t get enough GPUs to power its crucial inference/training workloads. 
  • With projects hitting delays, Amazon revamped internal processes and technology to solve the problem.
  • The solution was Project Greenland, a centralized GPU capacity pool to better manage and allocate its limited GPU supply. 
  • “GPUs are too valuable to be given out on a first-come, first-served basis. Instead, distribution should be determined based on ROI layered with common sense considerations and provide for long-term growth of the company’s free cash flow,” per internal guidelines.
  • Two years since the shortage began, GPUs remain scarce, but Amazon’s efforts to tackle the problem may be paying off, with internal forecasts suggesting the crunch will ease this year as chip availability improves.
  • “Amazon has ample GPU capacity to continue innovating for our retail business and other customers across the company,” the spokesperson said. “AWS recognized early on that generative AI innovations are fueling rapid adoption of cloud computing services for all our customers, including Amazon, and we quickly evaluated our customers’ growing GPU needs and took steps to deliver the capacity they need to drive innovation.”
  • Amazon demands hard data and return on investment proof for all internal GPU requests. 
  • Initiatives are prioritized and ranked for GPU allocation based on several factors, including the completeness of data provided and the financial benefit per GPU.  Projects must be shovel-ready, or approved for development, and prove they are competitive in the race to market.  They also must provide a timeline for when benefits are expected to be realized. 
  • If your system doesn’t provide the return on investment, the GPUs are redistributed to the next project/program.
  • They codified this process into official “tenets” or internal guidelines that individual teams or projects create for faster decision making.  The tenets emphasize a strong return on investment, selective approvals and push for speed and efficiency. 
    1. ROI + High Judgment thinking is required for GPU usage prioritization. GPUs are too valuable to be given out on a first-come, first-served basis. Instead, distribution should be determined based on ROI layered with common sense considerations, and provide for the long-term growth of the Company’s free cash flow. Distribution can happen in bespoke infrastructure or in hours of a sharing/pooling tool.
    2. Continuously learn, assess, and improve: We solicit new ideas based on continuous review and are willing to improve our approach as we learn more.
    3. Avoid silo decisions: Avoid making decisions in isolation; instead, centralize the tracking of GPUs and GPU related initiatives in one place.
    4. Time is critical: Scalable tooling is a key to moving fast when making distribution decisions which, in turn, allows more time for innovation and learning from our experiences.
    5. Efficiency feeds innovation: Efficiency paves the way for innovation by encouraging optimal resource utilization, fostering collaboration and resource sharing.
    6. Embrace risk in the pursuit of innovation: An acceptable level of risk tolerance will allow us to embrace the idea of ‘failing fast’ and maintain an environment conducive to Research and Development.
    7. Transparency and confidentiality: We encourage transparency around the GPU allocation methodology through education and updates on the wikis, while applying confidentiality around sensitive information on R&D and ROI, shareable with only limited stakeholders. We celebrate wins and share lessons learned broadly.
    8. GPUs previously given to fleets may be recalled if other initiatives show more value. Having a GPU doesn’t mean you’ll get to keep it.
  • To manage all of this, they built Project Greenland, described as a centralized GPU orchestration platform to share GPU capacity across teams and maximize utilization. 
  • It can track GPU usage per initiative, share idle servers and implement clawbacks to reallocate chips to more urgent projects. 
  • The system also simplifies networking setup and security updates, while alerting employees and leaders to projects with low GPU usage. 

Closing

And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with the hashtag #theCloudPod
