347: The CloudPod is Only Recording this Week “Because of AI”

March 26, 2026 01:02:32

Welcome to episode 347 of The Cloud Pod, where the forecast is always cloudy! Justin, Jonathan, and Ryan are in the studio recording today, and thankfully, Jonathan hasn’t replaced us all with Skynet – yet. This week, we’re discussing how old our tools (and we) are (hint: really old), whether the SaaSpocalypse is upon us, and whether the business or AI is responsible for the latest round of layoffs. 

Titles we almost went with this week

  • 🪣 S3 Bucket Names Finally Stop Being a Global Hunger Games
  • 🪟 One Million Tokens Walk Into a Context Window
  • 👃 SLO Down and Smell the Reliability Metrics
  • ☁️ CloudWatch Finally Watches Your Whole Cloud Organization
  • 🎂 S3 Turns 20 and Still Buckets the Competition
  • 🔓 Azure SRE Agent Goes GA So You Don’t Have To
  • 🛑 Twenty Years of S3 and No Signs of Object Permanence
  • 🖥️ One Rule to Monitor Them All Across AWS
  • 🎌 One Flag to Secure Them All on Cloud Run
  • 🌋 SaaSpocalypse Now Atlassian Layoffs Hit the Jira
  • 🗺️ No More Bucket Name Bingo with S3 Regional Namespaces
  • 🖼️ A Picture Is Worth a Thousand Claude Tokens
  • 🪖 One Command to Rule Your Autonomous AI Agents
  • 🕴️ AI Fixes Your Incidents Before Your Boss Notices
  • 🤖 The CloudPod is only recording this week “Because of AI”
  • 🙏 Amazon begs users to leave SimpleDB with another migration tool

Follow Up

00:54 Microsoft’s brief in Anthropic case shows new alliance and willingness to challenge Trump administration

  • Microsoft filed an amicus brief in Anthropic’s lawsuit against the U.S. Department of War, urging a federal judge to temporarily block the Pentagon’s designation of Anthropic as a supply chain risk, citing substantial costs to government contractors that rely on Anthropic models.
  • The brief arrived one day after Microsoft launched Copilot Cowork, built on Anthropic’s Claude, and four months after Microsoft committed up to $5 billion to Anthropic as part of a deal requiring Anthropic to spend at least $30 billion on Azure, making the legal filing directly tied to concrete commercial dependencies.
  • Microsoft highlighted a procedural inconsistency in the government’s approach: the Pentagon gave itself six months to transition off Anthropic’s models while making the supply chain designation effective immediately for contractors, creating an unequal compliance burden.
  • Amazon, which has invested $8 billion in Anthropic, has not publicly responded to the lawsuit or the designation, creating a notable contrast in how two major cloud providers with similar financial exposure are handling the situation.
  • OpenAI announced its own Pentagon deal on the same day the Anthropic designation was issued, and 37 researchers from OpenAI and Google separately filed an amicus brief supporting Anthropic, indicating the case is drawing broad attention across the AI and cloud industry with potential implications for how AI guardrails are treated in government contracts.

01:37 📢 Justin – “Oh, yeah, there’s a vested interest in the lawsuit which we did not mention last week, so I wanted to follow up on that, because that explains very clearly why Microsoft is throwing in with Anthropic on this.” 

General News 

02:37 Atlassian to shed ten percent of staff, because of AI 

  • Atlassian is cutting roughly 1,600 employees, about 10 percent of its workforce, citing AI-driven changes to required skill sets and a need to self-fund further AI and enterprise sales investment.
  • The company’s market cap has dropped from a peak of around 112 billion dollars in 2021 to approximately 20 billion dollars today, providing financial context for why cost restructuring is happening alongside the AI narrative.
  • The SaaSpocalypse concept is worth discussing here, as Atlassian is among the SaaS vendors analysts flag as potentially vulnerable to organizations replacing traditional tools with AI-generated or vibe-coded alternatives.
  • Atlassian points to 25 percent cloud revenue growth, 600 customers spending over 1 million dollars annually, and 5 million users on its Rovo AI suite as indicators that the business is still growing, which creates an interesting tension with the layoff announcement.
  • For cloud practitioners, this is a concrete example of how AI adoption is beginning to visibly reshape headcount decisions at established SaaS vendors, not just startups, which has implications for how enterprises evaluate vendor stability and long-term support commitments.

03:18 📢 Justin – “I’ve seen Rovo, which is Atlassian’s AI suite, and if that’s the best they can do… I have fears for the long-term health and viability of Jira in general. I’m kind of over the whole let’s blame AI for our bad business decisions. That’s going to get old real quick.” 

AI Is Going Great – Or How ML Makes Money 

06:18 Claude builds interactive visuals right in your conversation

  • Anthropic has launched in beta a new inline visualization feature for Claude that generates interactive charts, diagrams, and other visuals directly within chat conversations, available across all plan tiers at no additional cost.
  • These visuals are distinct from Claude’s existing artifacts system in a notable way: they are temporary and contextual, appearing inline rather than in a side panel, and they update or disappear as the conversation evolves rather than serving as persistent shareable documents.
  • Claude determines autonomously when a visual would aid comprehension, but users can also prompt it directly with natural language requests like “draw this as a diagram” or “visualize how this might change over time,” and can request adjustments iteratively within the same conversation.
  • The feature is part of a broader set of response format improvements Anthropic has been rolling out, including purpose-built layouts for recipes and weather queries, as well as direct in-conversation integrations with third-party tools like Figma, Canva, and Slack.
  • For developers and enterprise users, the practical implication is that Claude can now serve as a lightweight data visualization layer within workflows without requiring users to export data to separate charting tools, which could reduce friction in analytical and educational use cases.

07:27 📢 Ryan – “Kind of excited when Claude decides that the monkey making the queries needs bigger pictures because the text isn’t working out, so it’s like, I get you, Claude. I see what you’re doing.”

07:38 📢 Jonathan – “Anthropic’s Claude: Now with crayons.” 

08:50 Introducing Genie Code 

  • Databricks has launched Genie Code as a generally available product, positioning it as an agentic AI system built specifically for data teams rather than general software development. 
  • It handles end-to-end tasks, including pipeline building, dashboard creation, ML model training, and production monitoring, directly within Databricks notebooks, SQL editor, and Lakeflow Pipelines.
  • The system claims to outperform a leading coding agent by more than 2x on real-world data science tasks, with the key differentiator being deep Unity Catalog integration that gives it access to data lineage, usage patterns, governance policies, and business semantics rather than just reading raw code.
  • Genie Code routes tasks across multiple models automatically, selecting from frontier LLMs, open source models, or custom Databricks-hosted models depending on the job, removing the need for users to manually choose models for different tasks.
  • A notable upcoming capability is background agents, which will proactively monitor Lakeflow pipelines and AI models, triage failures, handle routine Databricks Runtime upgrades, and auto-fix issues like schema mismatches in a sandboxed environment before alerting the team.
  • The governance angle is worth discussing for enterprise cloud users: Genie Code enforces Unity Catalog access controls during all operations, meaning it only surfaces data assets a user is authorized to see and respects existing lineage rules when building pipelines, which addresses a common concern with agentic systems operating on sensitive production data.

10:05 📢 Ryan – “I don’t think it will kill Glue or any of the ETL things, but hopefully it will just do it for you, and then I don’t think I care anymore.”  

11:19 1M context is now generally available for Opus 4.6 and Sonnet 4.6

  • Anthropic has moved 1M context windows to general availability for Claude Opus 4.6 and Sonnet 4.6, with standard pricing applying across the full window and no long-context premium. 
  • Opus 4.6 is priced at $5/$25 per million input/output tokens, and Sonnet 4.6 at $3/$15, meaning a 900K-token request costs the same per-token rate as a 9K one.
  • On the performance side, Opus 4.6 scores 78.3% on MRCR v2, a benchmark measuring recall and reasoning across long contexts, which Anthropic claims is the highest among frontier models at that context length.
  • Practical use cases include loading entire codebases, thousands of pages of contracts, or full agent traces with tool calls and intermediate reasoning, eliminating the need for lossy summarization or manual context management that long-context workflows previously required.
  • Claude Code users on Max, Team, and Enterprise plans now get 1M context automatically with Opus 4.6, meaning fewer session compactions and more conversation history retained without consuming extra usage credits.
  • The 1M context window is available natively on the Claude Platform and through Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, making it accessible across the major cloud provider ecosystems that developers are already using.
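Because there is no long-context premium, the cost of a large request is plain per-token arithmetic. A quick sketch using the announced rates (the token counts are hypothetical examples):

```python
# Announced per-million-token rates (USD): Opus 4.6 $5/$25, Sonnet 4.6 $3/$15.
RATES = {
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request; the rate is flat across the full 1M window."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# A 900K-token prompt is exactly 100x the cost of a 9K one -- same per-token rate.
print(request_cost("opus-4.6", 900_000, 0))  # 4.5
print(request_cost("opus-4.6", 9_000, 0))    # 0.045
```

So loading a full codebase into Opus 4.6 runs about $4.50 of input tokens per request, which is worth knowing before wiring 1M-context calls into a loop.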

19:46 Introducing GPT-5.4 mini and nano

  • OpenAI released GPT-5.4 mini and nano, two small models positioned for high-volume, latency-sensitive workloads. 
  • GPT-5.4 mini runs more than 2x faster than GPT-5 mini while approaching GPT-5.4 performance on benchmarks like SWE-Bench Pro and OSWorld-Verified.
  • Pricing is notably lower than larger models: GPT-5.4 mini costs $0.75 per 1M input tokens and $4.50 per 1M output tokens, while GPT-5.4 nano comes in at $0.20 input and $1.25 output per 1M tokens, with a 400k context window on mini.
  • The models are designed for multi-model orchestration patterns where a larger model like GPT-5.4 handles planning and coordination while GPT-5.4 mini subagents execute narrower parallel tasks, a pattern OpenAI has built directly into their Codex product.
  • In Codex specifically, GPT-5.4 mini uses only 30% of the GPT-5.4 quota, giving developers a cost-effective path for simpler coding tasks like codebase navigation, targeted edits, and debugging loops without sacrificing too much capability.
  • GPT-5.4 nano is API-only and recommended for classification, data extraction, ranking, and simpler subagent tasks, making it a practical option for cloud workloads where cost and throughput matter more than deep reasoning.
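At these rates, the orchestration math favors pushing bulk subtasks to the small models. A sketch with the announced prices (the call counts and token sizes are made-up examples):

```python
# Announced per-1M-token prices (USD): mini $0.75 in / $4.50 out, nano $0.20 / $1.25.
PRICES = {
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def batch_cost(model: str, calls: int, in_tokens: int, out_tokens: int) -> float:
    """Total dollars for `calls` identical requests of the given token sizes."""
    rate_in, rate_out = PRICES[model]
    return calls * (in_tokens * rate_in + out_tokens * rate_out) / 1_000_000

# 10,000 data-extraction calls at 2K input / 100 output tokens each:
print(batch_cost("gpt-5.4-mini", 10_000, 2_000, 100))  # 19.5
print(batch_cost("gpt-5.4-nano", 10_000, 2_000, 100))  # 5.25
```

For classification-style fan-out work, nano comes in at roughly a quarter of mini's cost, which is exactly the subagent split OpenAI is recommending.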

21:00 📢 Ryan – “I’m a fan of these little models for certain things; as part of that tuning, my agent definitions have gotten a lot more complex. A lot of times, I’m breaking out agent definitions so that I can specifically use one of the smaller models for certain types of tasks. Data extraction being a big one.” 

AWS

22:53 Twenty years of Amazon S3 and building what’s next 

  • S3 turns 20 years old this month, growing from 1 petabyte of capacity and 15 cents per gigabyte in 2006 to hundreds of exabytes storing over 500 trillion objects at just over 2 cents per gigabyte today, representing roughly an 85% price reduction over two decades.
  • A notable engineering detail is that code written for S3 in 2006 still works today unchanged, with AWS maintaining complete API backward compatibility through multiple infrastructure generations, which is why the S3 API has become a de facto standard across the storage industry.
  • On the technical side, AWS has spent 8 years progressively rewriting performance-critical S3 components in Rust for memory safety and performance, and uses formal methods with automated proofs to mathematically verify consistency in the index subsystem and cross-region replication.
  • AWS is positioning S3 as a universal data foundation with three newer capabilities worth noting: S3 Tables for managed Apache Iceberg analytics, S3 Vectors for native vector storage supporting up to 2 billion vectors per index at sub-100ms latency, and S3 Metadata for centralized object cataloging, all priced at standard S3 cost structures rather than specialized database pricing.
  • The maximum object size has grown from 5 GB to 50 TB, and AWS reports customers have collectively saved over $6 billion in storage costs through S3 Intelligent-Tiering compared to S3 Standard storage class pricing.

24:08 📢 Justin – “I am a big fan of the S3 vectors because we use it for Bolt.” 

25:39 Introducing account regional namespaces for Amazon S3 general-purpose buckets

  • AWS S3 now supports account regional namespaces for general-purpose buckets, where bucket names automatically include your account ID and region as a suffix, such as mybucket-123456789012-us-east-1-an. 
  • This solves the long-standing problem of bucket name collisions in the global namespace, particularly useful for large organizations managing buckets at scale across multiple regions.
  • The feature integrates with IAM and AWS Organizations service control policies via the new s3:x-amz-bucket-namespace condition key, allowing security teams to enforce that employees only create buckets within their account’s namespace. 
    • This gives enterprises a straightforward governance mechanism to prevent naming conflicts and unauthorized bucket creation.
  • Existing global namespace buckets cannot be renamed to use the account regional namespace, so this is a forward-looking change for new bucket creation only. S3 table buckets, vector buckets, and directory buckets already operate in account-level or zonal namespaces, so this update brings general-purpose buckets in line with those patterns.
  • CloudFormation support is included via the BucketNamespace property and pseudo parameters AWS::AccountId and AWS::Region, making it straightforward to update existing IaC templates. CLI and Boto3 support is also available using the x-amz-bucket-namespace header or BucketNamespace parameter.
  • The feature is available across 37 AWS regions, including AWS China and GovCloud, at no additional cost, making it a low-friction adoption for teams looking to simplify bucket naming conventions without budget impact.
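Based on the single example name in the announcement (mybucket-123456789012-us-east-1-an), the suffix looks mechanical enough to precompute in tooling. A small sketch, assuming that format holds generally:

```python
def namespaced_bucket_name(base: str, account_id: str, region: str) -> str:
    """Derive the account-regional bucket name, assuming the
    <base>-<account-id>-<region>-an format shown in the announcement."""
    return f"{base}-{account_id}-{region}-an"

def in_account_namespace(name: str, account_id: str, region: str) -> bool:
    """Check whether a bucket name falls within a given account/region namespace."""
    return name.endswith(f"-{account_id}-{region}-an")

print(namespaced_bucket_name("mybucket", "123456789012", "us-east-1"))
# mybucket-123456789012-us-east-1-an
```

When actually creating buckets you would pass the announced BucketNamespace parameter (or x-amz-bucket-namespace header) and let S3 apply the suffix; a helper like this is only useful for predicting and validating names in IaC templates.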

27:17 📢 Jonathan – “What’s really annoying is your account number is part of the public S3 bucket name! I wish a security person had been in the room there.” 

28:17 Amazon CloudWatch Application Signals adds new SLO capabilities

  • Amazon CloudWatch Application Signals now includes three new SLO capabilities: SLO Recommendations, Service-Level SLOs, and SLO Performance Report, addressing longstanding gaps in data-driven reliability management for AWS customers.
  • SLO Recommendations analyzes 30 days of historical P99 latency and error rate data to suggest appropriate reliability targets, reducing the manual guesswork that previously led to misconfigured thresholds and alert fatigue.
  • Service-Level SLOs give teams a consolidated view of reliability across all operations within a service, making it easier to align technical monitoring with business objectives without stitching together multiple dashboards.
  • The SLO Performance Report adds calendar-aligned historical reporting at daily, weekly, and monthly intervals, which is useful for teams that need to present reliability data to stakeholders in business-friendly formats.
  • Pricing is usage-based, tied to inbound and outbound application requests plus SLO charges, with each SLO generating 2 application signals per service level indicator metric period. The features are available in all regions where CloudWatch Application Signals is supported.
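AWS doesn't publish the recommendation algorithm, but the idea of deriving a target from historical data is easy to illustrate. A purely hypothetical sketch (the headroom factor and the max-of-daily-P99 logic are invented for illustration, not AWS's actual method):

```python
def recommend_latency_slo(daily_p99_ms: list[float], headroom: float = 1.10) -> float:
    """Illustrative only: take the worst daily P99 over the lookback window
    and add ~10% headroom, so the target isn't breached by normal variance."""
    return round(max(daily_p99_ms) * headroom, 1)

# Hypothetical daily P99 latency samples (ms) from the 30-day window:
print(recommend_latency_slo([180.0, 195.5, 210.0, 188.2]))  # 231.0
```

The point is the workflow: a data-derived target replaces the guessed threshold that was causing the alert fatigue mentioned above.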

29:11 📢 Jonathan – “So instead of fixing your product, you just use a tool that tells you that you should turn down your commitments to your customers. Ok…” 

29:57 Amazon SimpleDB now supports exporting domain data to Amazon S3

  • Amazon SimpleDB, one of AWS’s oldest database services dating back to 2007, now supports exporting domain data directly to S3 in JSON format, giving long-time users a practical path to migrate away from the service or archive data for compliance purposes.
  • The export tool introduces three new APIs (StartDomainExport, GetExport, and ListExports) with background processing that avoids any performance impact on the running database, which matters for users who cannot afford downtime during data extraction.
  • Cross-region and cross-account support, along with multiple encryption options, make this useful for organizations with strict data governance requirements who need to move SimpleDB data into modern storage or database systems.
  • Rate limiting is set at 5 exports per domain and 25 per account within a 24-hour window, so teams with large numbers of domains should plan their migration timelines accordingly rather than assuming bulk exports can happen all at once.
  • The tool itself is free to use, but standard S3 data transfer charges apply, so cost planning should account for data volume when scoping a migration or archival project.
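Those rate limits translate directly into a migration schedule. A quick lower-bound calculation using the announced limits (the domain counts are hypothetical):

```python
import math

def min_export_days(exports_per_domain: list[int]) -> int:
    """Lower bound on calendar days for a SimpleDB export campaign, given the
    announced limits: 5 exports per domain and 25 per account per 24 hours."""
    account_bound = math.ceil(sum(exports_per_domain) / 25)
    domain_bound = math.ceil(max(exports_per_domain) / 5)
    return max(account_bound, domain_bound)

# 40 domains, one export each: the 25-per-account daily cap makes it a 2-day job.
print(min_export_days([1] * 40))  # 2
```

Teams with hundreds of domains should run this kind of math before promising a cutover date.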

30:53 📢 Justin – “SimpleDB gets a new feature!” 

32:19 Amazon CloudWatch introduces organization-wide EC2 detailed monitoring enablement

  • CloudWatch now supports organization-wide rules to automatically enable EC2 detailed monitoring, shifting metrics collection from a per-instance manual task to a centralized policy-driven configuration across an entire AWS Organization.
  • Rules can be scoped to the full organization, specific accounts, or individual resources using tags, so teams can target environments like production workloads without enabling the feature universally and incurring unnecessary costs.
  • The 1-minute interval metrics that detailed monitoring provides are particularly relevant for Auto Scaling groups, where faster data collection means scaling policies can respond more quickly to utilization changes rather than waiting for the default 5-minute interval.
  • The feature covers both existing and newly launched instances within the rule scope, which closes a common gap where new instances spun up after policy creation would otherwise miss monitoring configuration.
  • Detailed monitoring costs apply per instance per metric per month per CloudWatch pricing, so organizations should evaluate tag-based scoping carefully to avoid unexpected billing increases when rolling this out broadly.
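The tag-scoping semantics are worth internalizing before rollout, since they determine the billing blast radius. An illustrative client-side sketch of the matching logic (the real feature evaluates scope server-side; instance data here is invented):

```python
def instances_in_scope(instances: list[dict], required_tags: dict) -> list[str]:
    """Illustrative: an instance is in a rule's scope when it carries every
    tag key/value pair the rule requires."""
    return [i["id"] for i in instances
            if all(i.get("tags", {}).get(k) == v for k, v in required_tags.items())]

fleet = [
    {"id": "i-0a", "tags": {"env": "prod"}},
    {"id": "i-0b", "tags": {"env": "dev"}},
    {"id": "i-0c", "tags": {"env": "prod", "team": "web"}},
]
print(instances_in_scope(fleet, {"env": "prod"}))  # ['i-0a', 'i-0c']
```

Untagged or mis-tagged instances silently fall outside the rule, which cuts both ways: they avoid the extra cost but also miss the 1-minute metrics.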

33:17 📢 Ryan – “I mean, what’s wrong with the previous method of waiting until you had an outage, not having the data, and THEN turning it on for your project?” 

GCP

33:47 Why context is the missing link in AI data security 

  • Google Cloud’s Sensitive Data Protection is now generally available with new context classifiers for medical and finance data, plus image object detectors for faces and passports, moving beyond simple keyword matching to understand the semantic meaning of data.
  • For AI training workflows on Vertex AI, SDP can scan unstructured image data using OCR and object detection to find sensitive content like credit card numbers or photo IDs, then generate redacted versions rather than discarding the data entirely, preserving training dataset quality.
  • The context-aware approach addresses a practical problem with traditional regex-based detection: the same number sequence can be treated differently depending on surrounding words, so “order number” passes through while “wallet number” triggers financial context classification and redaction.
  • SDP serves as the underlying engine for several other Google Cloud products, including Model Armor, Security Command Center, and Contact Center as a Service, meaning improvements here propagate across those services automatically.
  • Organizations in regulated industries like healthcare and finance are the most direct beneficiaries, as the tool helps ensure AI agents only access data appropriate to their function during both training and live user interactions. Pricing details are not specified in the announcement, so teams should check cloud.google.com/security/products/sensitive-data-protection for current rates.

35:16 📢 Ryan – “I don’t really think that’s usually where the sensitive data is. It can be, in some workloads, but probably not the majority, so there’s so many false positives, so I really like the idea that they’re having context be a part of that decision.”

37:16 Welcoming Wiz to Google Cloud: Redefining security for the AI era

  • Google has completed its acquisition of Wiz, a cloud and AI security platform, which will retain its brand and continue supporting multicloud environments, including AWS, Azure, and Oracle Cloud Infrastructure, alongside Google Cloud.
  • Wiz connects code, cloud, and runtime into a single context, allowing security teams to map application architecture, permissions, data flows, and runtime behavior in real time to identify and prioritize exploitable attack paths before they reach production.
  • The combined offering integrates Wiz’s cloud security platform with Google Security Operations, Mandiant Consulting, and Google Threat Intelligence under the Google Unified Security umbrella, with Gemini AI assisting in threat hunting, remediation workflows, and audit documentation.
  • A notable focus of the acquisition is AI-specific security, addressing threats that target AI models and those generated by AI systems, which is increasingly relevant as organizations deploy AI agents fed with business-critical data.
  • Pricing details for the combined platform have not been announced, but Wiz products will remain available through existing partner channels, system integrators, and managed security service providers, suggesting continuity for current Wiz customers during the transition.

38:16 📢 Justin – “Typically on these acquisitions, it takes about a year for Google to figure out how to package them properly, and most likely they’ll want a separate contract for it anyways because that’s how all the integration acquisitions they’ve done are.”

39:22 IAP integration with Cloud Run

  • Google Cloud Run now supports direct Identity-Aware Proxy integration in general availability, allowing developers to enable IAP authentication with a single UI click or the --iap flag in gcloud, eliminating the previous requirement to configure load balancers manually. IAP carries no additional cost beyond standard Cloud Run charges, with limited exceptions noted in the pricing docs.
  • IAP on Cloud Run supports enterprise authentication features, including user and group identity policies, context-aware access controls based on IP, geolocation, and device status, and Workforce Identity Federation for external identity providers. This makes it practical for organizations that need to secure internal web applications without building custom authentication layers.
  • A separate change allows Cloud Run services to disable the default IAM invoker check by selecting “Allow Public access,” which resolves a long-standing friction point for teams trying to host public-facing applications while also enforcing Domain Restricted Sharing org policies.
  • The two features address different scenarios: IAP is the recommended path for internal business applications requiring user authentication, while the public access option suits public websites, store locators, or private microservices where network-level controls like Cloud Armor handle security instead.
  • Real-world adoption examples include L’Oreal using IAP across their Google Cloud application portfolio and Bilt Rewards disabling IAM invoker checks on multi-regional Cloud Run services to simplify edge routing while relying on Cloud Armor for security enforcement.

39:57 📢 Ryan – “This is a neat little feature. I don’t know how widely known it is, but it’s something that I’ve been using for a while.”  

42:09 Multi-cluster GKE Inference Gateway helps scale AI workloads

  • Google Cloud has launched a preview of multi-cluster GKE Inference Gateway, which extends the existing GKE Gateway API to enable model-aware load balancing for AI inference workloads across multiple GKE clusters and regions. 
  • This addresses practical limitations of single-cluster deployments like GPU/TPU capacity caps and regional availability risks.
  • The system introduces two core Kubernetes custom resources, InferencePool and InferenceObjective, which group model-server backends and define routing priorities, respectively. 
  • This allows the gateway to intelligently multiplex latency-sensitive and lower-priority inference requests across a distributed fleet.
  • A notable technical capability is the GCPBackendPolicy resource, which enables load balancing decisions based on real-time custom metrics such as KV cache utilization on model servers. 
  • This is more inference-specific than traditional request-count or latency-based routing approaches.
  • The architecture uses a dedicated config cluster to manage a single Gateway configuration that routes traffic to multiple target clusters, simplifying operations for teams running globally distributed AI services. Supported use cases include disaster recovery, capacity bursting, and heterogeneous hardware utilization.
  • Pricing for this feature is not separately detailed in the announcement, so costs would likely follow existing GKE and Cloud Load Balancing pricing structures. Teams evaluating this should factor in multi-cluster networking and potential cross-region data transfer costs alongside their GPU/TPU resource expenses.

43:06 📢 Ryan – “Simplify. Sure…” 

44:35 More transparency and control over Gemini API costs

  • Google AI Studio now supports Project Spend Caps, letting developers set monthly dollar limits per project directly from the Spend tab. 
  • There is a roughly 10-minute enforcement delay, so users remain responsible for any overages incurred during that window.
  • Usage Tiers have been redesigned with lower spend qualifications, automatic tier upgrades based on payment history, and system-defined billing account caps that increase as you move to higher tiers. This reduces manual intervention for developers scaling their API usage over time.
  • Three new dashboards have been added to Google AI Studio covering rate limits, costs, and usage. The rate limit dashboard tracks RPM, TPM, and RPD per project, while the cost dashboard offers a daily breakdown filterable by model and time range going back up to a full month.
  • Billing setup can now be completed entirely within Google AI Studio, including linking billing profiles to projects, removing the previous need to navigate across multiple Google Cloud console windows. 
  • This consolidation is particularly useful for teams managing several projects under one billing account.
  • Developers building with Imagen and Veo now have dedicated usage graphs alongside standard request metrics, giving multimodal workloads the same observability previously available only for text-based Gemini API calls.
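The enforcement delay matters for budgeting: the cap bounds steady-state spend, not the tail. A trivial sketch of the worst-case exposure (the spend rate is a made-up example, and the 10-minute delay is described as approximate):

```python
def worst_case_overage(spend_per_minute: float, delay_minutes: float = 10) -> float:
    """Upper-bound estimate of spend past a Project Spend Cap during the
    roughly 10-minute enforcement delay (delay figure is approximate)."""
    return spend_per_minute * delay_minutes

# Hypothetical: a runaway batch job burning $3/minute when the cap trips.
print(worst_case_overage(3.0))  # 30.0
```

In other words, a cap set at exactly your budget can still be exceeded by about ten minutes of peak burn rate, so leave that margin when setting the limit.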

45:13 📢 Justin – “If you’ve ever tried to figure out who is using what models and what they’re doing with them and how much it costs, you know that this is all terrible – and this doesn’t actually improve it all that much.”  

Azure

47:35 Generally Available: Azure SRE Agent with new capabilities 

  • Azure SRE Agent is now generally available as an AI-powered operations tool designed to help teams diagnose incidents faster and automate response workflows, reducing downtime and manual operational work.
  • The GA release introduces deep context gathering capabilities, meaning the agent can pull together relevant signals and telemetry during an incident rather than requiring engineers to manually correlate data across multiple tools.
  • This fits naturally into teams already using Azure Monitor, Application Insights, and related observability tooling, as the agent is positioned to work within existing Azure operations workflows rather than requiring a separate platform.
  • The primary target audience is operations and SRE teams managing production workloads on Azure who are looking to reduce the time between incident detection and resolution without adding headcount.
  • Pricing details were not included in the announcement, so teams evaluating this should check the Azure pricing page directly before planning adoption, as AI-powered agent services on Azure typically carry consumption-based costs.

48:25 📢 Jonathan – “All right, so they run the services, which are going to have problems. And now they want me to pay for another service so that I can use that tool to troubleshoot the problems with the other tools that I’m already paying for. OK…” 

55:59 Many agents, one team: Scaling modernization on Azure 

  • Azure announced two new public preview offerings: the Azure Copilot migration agent and the GitHub Copilot modernization agent, designed to automate discovery, assessment, planning, and deployment for organizations moving workloads to Azure. 
  • The migration agent targets servers, virtual machines, applications, and databases, while the modernization agent orchestrates code upgrades at scale across multiple applications simultaneously.
  • The two agents are designed to work together, with GitHub Copilot scanning application code to produce assessment reports that Azure Copilot’s migration agent then ingests to inform cloud infrastructure planning. This integration aims to close the historical gap between developer-level code work and infrastructure decisions around landing zones, networking, and governance.
  • Early customer results show a 70% reduction in total modernization effort using GitHub Copilot modernization capabilities, and Ahold Delhaize is cited as a customer that reduced complexity and accelerated delivery using these agentic workflows across discovery, assessment, and execution.
  • Microsoft is pairing these agentic tools with a structured delivery program called Cloud Accelerate Factory, a no-cost benefit under Azure Accelerate where Microsoft experts work alongside customers from discovery through production. Pricing for the agents themselves is not specified in the announcement, so listeners should check Azure pricing pages directly for cost details.
  • According to a Forrester Q1 2026 survey of 223 global IT leaders, 91% view application modernization as necessary for enabling AI in their business, which provides context for why Microsoft is investing in automating what has traditionally been a slow, manual planning process.

52:32 📢 Ryan – “I keep waiting for someone to tout the success of how they did it, they’ve migrated all their terrible legacy code into this new thing, and it all works – but I haven’t seen it…”  

53:28 Announcing Fireworks AI on Microsoft Foundry

  • Microsoft Foundry now integrates with Fireworks AI’s inference cloud, giving customers access to models like DeepSeek v3.2, Kimi K2.5, and OpenAI’s gpt-oss-120b through both pay-per-token and provisioned throughput deployment options. 
  • This is currently in public preview and requires an opt-in through the Azure portal’s Preview features panel.
  • Pricing follows a per-million-token model for serverless deployments covering input, cached input, and output tokens, with US Data Zone availability across six regions, including East US and West US. 
  • Default quota limits start at either 250K or 25K tokens per minute, depending on subscription type, with additional quota available via a request form.
  • A notable addition is custom model support, allowing teams who have fine-tuned models from families like Qwen3-14B, DeepSeek v3, or Kimi K2 to import and deploy those weights directly into Foundry projects. 
  • The Azure Developer CLI has been updated with an azd ai models create command to facilitate the weight transfer process.
  • Fireworks-hosted models are distinct from Azure Direct models in that they skip Microsoft’s Responsible AI safety assessments, so teams needing safety evaluations will need to use Foundry’s built-in risk and safety evaluator tools separately.
  • Model retirement for serverless deployments comes with at least 30 days’ notice, and customers can extend usage past retirement dates by switching to provisioned throughput deployments, which use existing Global PTU quota and reservation commitments.
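Those per-minute token quotas are enforced on Foundry's side, but teams often add a client-side guard so they back off before hitting 429s. A minimal sketch of that idea, assuming a rolling 60-second window (the `TokenBudget` class and its behavior are illustrative, not part of any Azure SDK):

```python
from collections import deque
import time

class TokenBudget:
    """Hypothetical client-side guard for a tokens-per-minute quota.
    Illustrative only -- Foundry enforces its own limits server-side."""

    def __init__(self, tokens_per_minute, window_s=60.0, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.window_s = window_s
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) pairs

    def try_consume(self, tokens):
        now = self.clock()
        # Drop usage that has aged out of the rolling window.
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if used + tokens > self.limit:
            return False  # caller should queue or back off
        self.events.append((now, tokens))
        return True

# Example against the 250K tokens-per-minute default quota:
budget = TokenBudget(250_000)
budget.try_consume(200_000)  # accepted
budget.try_consume(60_000)   # rejected -- would exceed 250K in the window
```

The injectable `clock` makes the window testable without real waits; in production you would likely pair this with retry-after handling rather than a hard reject.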

54:30 📢 Justin – “Sounds like it’s a cross-connect that they’ve done to Fireworks’ cloud basically, to provide this to you, so it’s sort of interesting.” 

56:02 Announcing Copilot leadership update 

  • Microsoft is reorganizing its Copilot efforts by merging consumer and commercial teams into a single unified org, structured around four pillars: Copilot experience, Copilot platform, Microsoft 365 apps, and AI models. 
  • Jacob Andreou will lead the combined Copilot experience as EVP, reporting directly to Satya Nadella.
  • Mustafa Suleyman is shifting focus exclusively to what Microsoft calls its “superintelligence” effort, concentrating on frontier model development, enterprise-tuned model lineages, and reducing inference costs at scale over the next five years.
  • The restructuring reflects a product direction where Copilot moves from individual features toward an integrated system connecting agents, apps, and workflows, with recent announcements like Copilot Tasks, Copilot Cowork, and Agent 365 representing early examples of this approach.
  • For enterprise customers, the key practical implication is that commercial and consumer Copilot capabilities will converge, meaning IT and governance controls will need to account for a more unified product surface rather than separate consumer and business tracks.
  • The Copilot Leadership Team now includes Suleyman, Andreou, Charles Lamanna, Perry Clarke, and Ryan Roslansky, signaling that Microsoft 365 app development and platform infrastructure will be tightly coordinated with model development rather than operating independently.

57:23 📢 Ryan – “Noticeably missing is GitHub’s Copilot…” 

After Show 

55:59 Washington state hotline callers hear AI voice with Spanish accent 

  • Washington state’s Department of Licensing accidentally routed Spanish-language callers to an AI voice speaking English with a Spanish accent for several months, the direct result of DOL staff misconfiguring Amazon Web Services’ Polly text-to-speech service.
  • AP journalists were able to replicate the issue by selecting the AWS Polly voice named “Lucia,” which is designed to mimic Castilian Spanish, highlighting how easy it is to misconfigure AI voice services when teams lack familiarity with the underlying platform options.
  • The incident is a practical reminder that deploying AI-driven customer service tools across multiple languages requires thorough testing and quality assurance, particularly for government agencies serving diverse populations with real accessibility needs.
  • Amazon provided the platform but declined interview requests, raising a recurring question in cloud deployments about where vendor responsibility ends and customer configuration responsibility begins when things go wrong in production.
  • The story went viral with around 2 million TikTok views, which illustrates how public-facing AI failures in government services can quickly become reputational issues, adding pressure on agencies to treat AI deployment with the same rigor as other critical infrastructure.
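The failure mode here is picking a Polly voice whose language doesn't match the line it serves. A tiny pre-deployment check could catch that; the sketch below uses an illustrative subset of voice-to-language mappings (Lucia is Polly's Castilian Spanish voice, Joanna a US English one), and the `check_voice_language` helper is hypothetical rather than anything AWS provides:

```python
# Illustrative subset of Amazon Polly voice IDs and their languages.
POLLY_VOICE_LANGUAGES = {
    "Lucia": "es-ES",   # Castilian Spanish
    "Joanna": "en-US",  # US English
}

def check_voice_language(voice_id, expected_language):
    """Hypothetical deployment guard: fail fast when the selected voice
    does not speak the language the hotline menu advertises."""
    actual = POLLY_VOICE_LANGUAGES.get(voice_id)
    if actual is None:
        raise ValueError(f"Unknown voice: {voice_id}")
    if actual != expected_language:
        raise ValueError(
            f"Voice {voice_id} speaks {actual}, "
            f"but this line expects {expected_language}"
        )
    return True

# The DOL-style mistake: Lucia on a line that should be Spanish is fine,
# but Lucia reading English prompts would fail this check.
check_voice_language("Lucia", "es-ES")  # passes
```

A check like this in the QA pipeline, run once per language menu option, would have flagged the mismatch months before callers did.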

Closing

And that is the week in the cloud! Visit our website at theCloudPod.net, the home of the Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions – or tweet at us with the hashtag #theCloudPod.

