314: Vector? I Hardly Know Her! S3’s New AI Storage Play

Welcome to episode 314 of The Cloud Pod, where your hosts, Matt and Ryan, are holding down the fort in Justin’s absence and bringing what’s left of our audience (those of you still here after the last time they were left in charge) the latest and greatest in cloud and tech news. We’ve got undersea cables, vector storage, and even some hobos – but not the kind on trains. Plus, AWS S3 gets its Vector Victor. Let’s get started! 

Titles we almost went with this week:

  • ↖️S3 Gets Direction: AWS Points to Vector Storage
  • 🧕Vector? I Hardly Know Her! S3’s New AI Storage Play
  • 🧭S3 Finds Its Magnitude and Direction
  • 🧱Claude Goes to Wall Street
  • 💲Anthropic’s Bull Run Into Financial Services
  • 🪪AI Assistant Gets Its Series 7 License
  • 🍁Nova Scotia: AWS Brings Regional Flavor to AI Models
  • 🐁The Fine-Tuning of the Shrew: Teaching Nova Models New Tricks
  • 💉Nova-caine: Numbing the Pain of Model Customization
  • 🕵️AgentCore Blimey: AWS Gives AI Agents Their License to Scale
  • 🏗️The Agent Infrastructure: Mission Deployable
  • 💪From Zero to Agent Hero: AWS Tackles the Production Problem
  • 📊SageMaker Gets Its Data Act Together
  • 💓From Catalog to QuickSight: A Data Love Story
  • 📈The Great Data Unification of 2024
  • 🆓AWS Free Tier Gets a $200 Makeover
  • 💄EKS-treme Makeover: Cluster Edition
  • #️⃣100K Nodes Walk Into a Cluster…
  • ➡️S3 Gets Direction: Amazon Points to Vector Storage
  • 💴Amazon S3: Now with 90% Less Vector Bills and 100% More Dimensions

Follow Up

01:03 SoftBank and OpenAI’s $500 Billion AI Project Struggles to Get Off Ground

  • The $500 billion AI effort unveiled at the White House has struggled to get off the ground and has scaled back its near-term plans. 
  • It’s been six months since the announcement, where they said they would spend $100B almost immediately, but now they have a more modest goal of building a small data center by the end of the year in Ohio.
  • SoftBank committed $30 billion earlier this year, one of its largest-ever startup investments, which required it to take on new debt and sell assets.
  • This investment was made alongside Stargate, giving them a role in the physical infrastructure needed for AI. 
  • Altman, though, has been eager to secure computing power as quickly as possible and has proceeded without SoftBank. 
  • Publicly, both companies say it’s a great partnership and that they look forward to advancing projects in multiple states.
  • Oracle was part of Stargate, but the recent $30B deal OpenAI just signed with Oracle includes a commitment of 4.5 gigawatts of capacity, which would consume the equivalent output of more than two Hoover Dams, or enough power for about 4 million homes. 
  • Oracle was also named a Stargate partner alongside UAE firm MGX, but Oracle CEO Safra Catz said last month that Stargate hadn’t actually been formed yet.

02:31 📢 Matthew – “…everyone’s like, how hard can it be to build a data center? But it’s city zoning, power consumption, grid improvements, water for cooling… getting communities to approve – and these things end up being a massive undertaking. And it takes the hyperscalers a long time to get these things up and operational. So it doesn’t surprise me that a small data center by the end of the year is probably something that was already in the works beforehand; they’re just taking over other plans. Most data centers take a couple of years to really get up and operational.”

General News

04:55 A Transatlantic Communications Cable Does Double Duty – Eos

  • You know how much we love a good undersea cable story, and this one is especially nerdy. Strap in! (Thanks, Matt)
  • Scientists have developed a new instrument that transforms existing undersea fiber-optic telecommunications cables into ocean sensors by measuring variations in light signals between repeaters, enabling monitoring of water temperature, pressure, and tide patterns without disrupting internet or phone service.
  • The technology uses fiber Bragg gratings at cable repeaters (positioned every 50-100km) to reflect light signals, allowing researchers to measure changes in travel time that indicate how surrounding water conditions affect cable shape and properties.
  • This distributed sensing approach is more cost-effective than previous methods as it uses standard, nonstabilized lasers rather than expensive ultrastable ones, and can monitor individual cable subsections rather than treating the entire cable as a single sensor.
  • The 77-day test on the EllaLink cable between Portugal and Brazil successfully measured daily and weekly temperature variations and tide patterns across 82 subsections, demonstrating the potential for the global submarine cable network to serve dual purposes.
  • The technology could enable early tsunami warning systems and long-term climate monitoring by leveraging millions of kilometers of existing infrastructure, providing valuable ocean data without requiring new sensor deployments.

06:30 📢 Ryan – “It feels like our version of like getting into World War Two or something.”

AI Is Going Great – or How ML Makes Its Money 

08:55 Amazon-backed Anthropic rolls out Claude AI for financial services

  • Anthropic launched Claude Financial Analysis Solution, a tailored version of Claude for Enterprise specifically designed for financial professionals to analyze markets, make investment decisions, and conduct research using Claude 4 models with expanded usage limits.
  • The solution integrates with major financial data providers, including Box, PitchBook, Databricks, S&P Global, and Snowflake, for real-time financial information access, with availability through AWS Marketplace and Google Cloud Marketplace coming soon.
  • This represents Anthropic’s strategic push into enterprise AI following their $61.5 billion valuation in March, targeting financial services as businesses increasingly adopt generative AI for customer-facing functions.
  • The offering includes Claude Code capabilities and implementation support, positioning it as a specialized alternative to general-purpose AI assistants for complex financial analysis tasks requiring domain-specific accuracy and reasoning.
  • Cloud providers benefit from this vertical-specific AI approach as it drives compute consumption through AWS and Google Cloud marketplaces while demonstrating how foundation models can be packaged for specific industry needs.

10:22 📢 Matt – “It’s literally why we named this section this! AI is how ML makes money!”  

14:35 TwelveLabs video understanding models are now available on Amazon Bedrock | AWS News Blog

  • TwelveLabs brings two specialized video understanding models to Amazon Bedrock: Marengo for video embeddings and search, and Pegasus for generating text from video content. These models enable natural language queries like “find the scene where the main characters first meet” to locate specific moments in video libraries.
  • The models were trained on Amazon SageMaker HyperPod and support both synchronous and asynchronous inference patterns. 
  • Pegasus uses the standard Invoke API while Marengo requires the AsyncInvoke API for processing video embeddings.
  • Key technical capabilities include video-to-text summarization with timeline descriptions, automatic metadata generation (titles, hashtags, chapters), and vector embeddings for similarity search. The models accept video input via S3 URIs or Base64-encoded strings.
  • Practical applications span multiple industries: media teams can search dialogue across footage libraries, marketing can personalize content at scale, and security teams can identify patterns across multiple video feeds. This transforms previously unsearchable video archives into queryable knowledge bases.
  • Pricing follows Amazon Bedrock‘s standard model, with Marengo available in US East, Europe, and Asia Pacific regions, while Pegasus operates in US West and Europe with cross-region inference support. 
  • Integration requires minimal code changes using existing Bedrock SDKs; a hedged invocation sketch follows after this list.
  • I’m extra proud of Matt for getting through this particularly dense block of text. Gold star! 
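
For the curious, here is roughly what calling Pegasus through the Bedrock runtime looks like in Python. This is a minimal sketch: the model ID and request fields are assumptions based on the launch post, so check the Bedrock model reference before copying it anywhere.

```python
import json
import boto3

# Assumed preview model ID and request schema -- verify both against the Bedrock docs.
bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.invoke_model(
    modelId="us.twelvelabs.pegasus-1-2-v1:0",  # assumed model ID
    body=json.dumps({
        "inputPrompt": "Summarize this video and list chapter timestamps.",
        "mediaSource": {
            # Video can also be passed as a Base64-encoded string instead of an S3 URI.
            "s3Location": {"uri": "s3://my-bucket/clips/demo.mp4", "bucketOwner": "111122223333"}
        },
    }),
)
print(json.loads(response["body"].read()))
```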

16:27 📢 Matt – “I feel like this is definitely something that came out of like Amazon video, so that they were able to find stuff a lot faster. And this is like, hey – let’s productize it. This is the next evolution.”

Cloud Tools 

17:48 Harness AI Unveils Advanced DevOps Automation: Smarter Pipelines, Faster Delivery, and Enterprise-Ready Compliance

  • Harness AI brings context-aware automation to DevOps pipelines by understanding your organization’s existing templates, tool configurations, and governance policies to generate production-ready CI/CD pipelines that match internal standards from day one.
  • The platform uses large language models combined with a proprietary knowledge graph to provide AI-driven troubleshooting, natural language pipeline generation, and automated policy enforcement directly integrated into the Harness Platform rather than as a separate add-on.
  • This addresses the growing challenge of faster AI-generated code outpacing traditional pipeline capabilities while managing increasingly fragmented toolchains and mounting compliance requirements across enterprise environments.
  • Key capabilities include automatic pipeline generation that adapts to organizational standards, intelligent troubleshooting that understands your specific environment context, and built-in governance guardrails for enterprise-ready compliance without added complexity.
  • The solution is positioned as having an AI DevOps engineer on call 24/7 who already knows your system, helping teams move from idea to production faster while reducing manual toil in the software delivery process.

19:59 📢 Ryan – “I do like that it’s built into the existing tooling as an InfoSec professional. I’m like, how is this compliance really put in? Because if I have to prompt it as the software engineer, that’s not okay. But then how do I, from a central organization, provide that sort of governance at a level that’s not actually just dragging everything to a screaming halt.”

AWS 

20:48 Introducing Amazon S3 Vectors: First cloud storage with native vector support at scale | AWS News Blog

  • Amazon S3 Vectors introduces native vector storage in S3 with a new bucket type that can reduce vector storage costs by up to 90% compared to traditional vector databases. 
  • This addresses the growing need for affordable vector storage as organizations scale their AI applications.
  • The service provides sub-second query performance for similarity searches across tens of millions of vectors per index, with support for up to 10,000 indexes per bucket. Each vector can include metadata for filtered queries, making it practical for recommendation engines and semantic search applications.
  • Native integrations with Amazon Bedrock Knowledge Bases and SageMaker Unified Studio simplify building RAG applications, while the OpenSearch Service export feature enables a tiered storage strategy. 
  • Organizations can keep infrequently accessed vectors in S3 Vectors and move high-priority data to OpenSearch for real-time performance.
  • The preview is available in five regions (US East Virginia/Ohio, US West Oregon, Europe Frankfurt, Asia Pacific Sydney) with dedicated APIs for vector operations, sketched after this list. 
  • Pricing details aren’t specified (so hold on to your butts), but the 90% cost reduction claim suggests significant savings for large-scale vector workloads.
  • This positions AWS as the first cloud provider with native vector support in object storage, potentially disrupting the vector database market. 
  • The ability to store embeddings for images, videos, documents, and audio files directly in S3 removes infrastructure management overhead for AI teams.
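
Here is a rough feel for the new vector APIs using boto3. Treat this as a sketch of the preview: the `s3vectors` client name and parameter shapes are taken from the announcement as we read it and may shift before GA.

```python
import boto3

s3v = boto3.client("s3vectors", region_name="us-east-1")  # preview client; names below are placeholders

# Create a vector bucket and an index sized for 1024-dimensional embeddings.
s3v.create_vector_bucket(vectorBucketName="media-embeddings")
s3v.create_index(
    vectorBucketName="media-embeddings",
    indexName="docs",
    dataType="float32",
    dimension=1024,
    distanceMetric="cosine",
)

# Write a vector with filterable metadata, then run a similarity query against the index.
s3v.put_vectors(
    vectorBucketName="media-embeddings",
    indexName="docs",
    vectors=[{"key": "doc-001", "data": {"float32": [0.1] * 1024}, "metadata": {"team": "ml"}}],
)
hits = s3v.query_vectors(
    vectorBucketName="media-embeddings",
    indexName="docs",
    queryVector={"float32": [0.1] * 1024},
    topK=5,
    returnMetadata=True,
    returnDistance=True,
)
print(hits["vectors"])
```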

25:21 📢 Ryan – “So expensive. It’s going to be ALL the money. All the new stuff on S3 is expensive.”

25:39 Announcing Amazon Nova customization in Amazon SageMaker AI | AWS News Blog

  • AWS introduces customization capabilities for Amazon Nova models (Micro, Lite, Pro) through SageMaker AI, supporting supervised fine-tuning, alignment techniques (DPO/PPO), continued pre-training, and knowledge distillation with seamless deployment to Amazon Bedrock for inference.
  • The service offers both parameter-efficient fine-tuning (PEFT) using LoRA adapters for smaller datasets with on-demand inference, and full fine-tuning (FFT) for extensive datasets requiring provisioned throughput, giving customers flexibility based on data volume and cost requirements. A toy illustration of what a LoRA adapter does follows after this list.
  • Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) enable alignment of model outputs to company-specific requirements like brand voice and customer experience preferences, addressing the limitations of prompt engineering and RAG for business-critical workflows.
  • Knowledge distillation allows customers to create smaller, cost-efficient models that maintain the accuracy of larger teacher models, particularly useful when lacking adequate training data samples for specific use cases.
  • Early adopters, including MIT CSAIL, Volkswagen, and Amazon’s internal teams, are already using these capabilities, with recipes currently available in US East (N. Virginia) through SageMaker Studio’s JumpStart interface.
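
If “LoRA adapter” is new vocabulary, the toy PyTorch layer below shows the core idea: freeze the pretrained weights and train only a small low-rank correction. This is purely illustrative and is not the SageMaker recipe API.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank correction (the LoRA adapter)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight and bias
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Output = frozen projection + low-rank learned correction.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale


layer = LoRALinear(nn.Linear(1024, 1024))
# Only the two small low-rank matrices are trainable -- a tiny fraction of the base layer.
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```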

27:13 📢 Ryan – “It’s such a fast field that, you know, I barely understand these things, and that’s only because I’ve been working on a project in my day job to sort of get information based on all of our internal IT data sets, right? Like, and have a custom bot that simplifies our employee day-to-day and onboarding.”

28:38 Introducing Amazon Bedrock AgentCore: Securely deploy and operate AI agents at any scale (preview) | AWS News Blog

  • Amazon Bedrock AgentCore provides enterprise-grade infrastructure services for deploying AI agents at scale, addressing the gap between proof-of-concept agents built with frameworks like CrewAI or LangGraph and production-ready systems. 
  • The preview includes seven modular services: Runtime for serverless deployment, Memory for session management, Observability for monitoring, Identity for secure access controls, Gateway for API integration, Browser for web automation, and Code Interpreter for sandboxed code execution.
  • AgentCore Runtime offers isolated serverless environments with three network configurations (Sandbox, Public, and upcoming VPC-only), enabling developers to deploy agents with just three lines of code while maintaining session isolation and preventing data leakage. The service works with any agent framework and supports both Amazon Bedrock models and external models, with free usage until September 16, 2025. A hedged sketch of that minimal wrapper follows after this list.
  • AgentCore Identity implements a secure token vault that stores user OAuth tokens and API keys, allowing agents to act on behalf of users with proper authorization across AWS services and third-party platforms like Salesforce, Slack, and GitHub. 
  • This eliminates the need for developers to build custom authentication infrastructure while maintaining enterprise security requirements.
  • AgentCore Gateway transforms existing APIs and Lambda functions into agent-ready tools using Model Context Protocol (MCP), providing unified access with built-in authentication, throttling, and request transformation capabilities. Combined with AgentCore Memory’s short-term and long-term storage strategies, agents can maintain context across sessions and extract semantic facts from conversations.
  • The preview is available in US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Frankfurt), with integration support for AWS Marketplace pre-built agents and tools. 
  • After the free preview period ends on September 17, 2025, standard AWS pricing will apply based on service usage.
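
The “three lines of code” claim refers to wrapping your existing agent in the AgentCore SDK. A hedged sketch, assuming the preview `bedrock-agentcore` package and its `BedrockAgentCoreApp` wrapper (package, class, and CLI names may change before GA):

```python
# pip install bedrock-agentcore  (preview SDK; package and class names are assumptions)
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()


@app.entrypoint
def invoke(payload):
    # Call whatever framework you already use here (LangGraph, CrewAI, Strands, plain boto3...).
    prompt = payload.get("prompt", "")
    return {"result": f"You said: {prompt}"}


if __name__ == "__main__":
    app.run()  # serve locally; the starter toolkit handles packaging and deployment to the Runtime
```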

35:07 Streamline the path from data to insights with new Amazon SageMaker Catalog capabilities | AWS News Blog

  • Welcome to writing copy, Ryan. Your headline WAS better. 
  • Amazon SageMaker now integrates QuickSight directly into Unified Studio, allowing users to build dashboards from project data and publish them to SageMaker Catalog for organization-wide discovery and sharing. 
    • This eliminates the need to switch between platforms and maintains consistent governance across analytics workflows.
  • SageMaker Catalog adds support for S3 general-purpose buckets with S3 Access Grants, enabling teams to discover and access unstructured data like documents and images alongside structured data. The integration automatically handles permissions when users subscribe to S3 assets, simplifying cross-team collaboration on diverse data types.
  • Automatic onboarding from AWS Glue Data Catalog brings existing lakehouse datasets into SageMaker Catalog without manual setup, unifying technical and business metadata management. 
    • This allows organizations to immediately explore and govern their existing data investments through a single interface.
  • The integrations require IAM Identity Center setup for QuickSight and appropriate S3 permissions, with standard pricing for each service applying. 
  • Available in all commercial AWS regions where SageMaker is supported, these features address the complete data lifecycle from ingestion to visualization.
  • Real-world applications include medical imaging analysis in notebooks, combining unstructured documents with structured data for comprehensive analytics, and building executive dashboards that automatically stay synchronized with project permissions. This unified approach reduces the time from data discovery to actionable insights.

48:25 📢 Ryan – “Once you get the ability to query and generate insights from a very large data set, like it’s just super neat. But then when you want to share that, it is super hard. If you want to productionize it at all, it’s just very complicated.”

39:23 AWS Free Tier update: New customers can get started and explore AWS with up to $200 in credits | AWS News Blog

  • Do you love surprise credit card bills? Do you love complicated pricing structures? We’ve got some great news. 
  • AWS introduces a new Free Tier structure with up to $200 in credits for new customers – $100 upon signup plus $20 each for completing activities in EC2, RDS, Lambda, Amazon Bedrock, and AWS Budgets within the first 6 months.
  • New customers now choose between a free account plan (no charges for 6 months or until credits expire) with limited service access, or a paid account plan with full AWS access, where credits are automatically applied to bills.
  • The free account plan restricts access to enterprise-focused services but includes over 30 always-free tier services, with automatic email alerts at 50%, 25%, and 10% credit remaining and timeline notifications at 15, 7, and 2 days before expiration.
  • This replaces the previous 12-month free tier model for accounts created after July 15, 2025, while existing accounts remain on the legacy program – a notable shift in AWS’s customer acquisition strategy.
  • The required activities expose new users to core AWS services and cost management tools, teaching proper instance sizing and budget monitoring from day one rather than discovering these concepts after unexpected bills. A sketch of creating one of those budgets follows after this list.
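
Since one of the $20 activities is setting up AWS Budgets anyway, here is a minimal boto3 version of the budget-plus-alert most new accounts should have. The account ID, dollar amount, threshold, and email address are placeholders.

```python
import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="111122223333",  # placeholder account ID
    Budget={
        "BudgetName": "free-tier-guardrail",
        "BudgetLimit": {"Amount": "10", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            # Email me when actual spend crosses 80% of the monthly limit.
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}],
        }
    ],
)
```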

41:43 📢 Matt – “I know we talked about cutting it, but I think it’s kind of fun the way they gamified it a little bit and forced you to go play with the things, and with the key one here being Budgets. I feel like that should have been like, in order to use EC2 RDS, and especially Bedrock, you had to set up that budget, and it kind of forces people to fix, you know, a lot of those… hey, I’ve actually caused a $300 bill.” 

44:20 Monitor and debug event-driven applications with new Amazon EventBridge logging | AWS News Blog

  • EventBridge now provides comprehensive logging for event-driven applications, tracking the complete event lifecycle from receipt through delivery with detailed success/failure information and status codes. 
  • This addresses a major pain point in debugging microservice architectures where event flows were previously opaque.
  • The feature supports three log destinations – CloudWatch Logs, Kinesis Data Firehose, and S3 – with configurable log levels (Error, Info, Trace) and optional payload logging. Logs are encrypted in transit with TLS and at rest when using customer-managed keys.
  • The logs include valuable performance metrics like ingestion-to-start latency, target duration, and HTTP status codes, making it straightforward to identify bottlenecks between EventBridge processing time and target service performance. What previously took hours of trial-and-error debugging can now be diagnosed in minutes; a sample Logs Insights query follows after this list.
  • API destination debugging becomes significantly easier as the logs clearly show authentication failures, credential issues, and endpoint errors with specific error messages. This is particularly useful for troubleshooting integrations with external HTTPS endpoints and SaaS applications.
  • There’s no additional EventBridge charge for logging – customers only pay standard S3, CloudWatch Logs, or Kinesis Data Firehose pricing for storage and delivery. The feature operates asynchronously with no impact on event processing latency or throughput.
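
Once the logs land in CloudWatch Logs, a quick Logs Insights query pulls out failed deliveries. A sketch using boto3; the log group path is a placeholder, so point it at whatever destination you configured on the bus.

```python
import time
import boto3

logs = boto3.client("logs")

# Simple query over the delivered lifecycle logs; refine with the structured fields you see emitted.
query = """
fields @timestamp, @message
| filter @message like /FAILED/
| sort @timestamp desc
| limit 20
"""
q = logs.start_query(
    logGroupName="/aws/vendedlogs/events/event-bus/my-bus",  # placeholder log group name
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=query,
)
# Poll until the query finishes, then print the matching failure records.
while (res := logs.get_query_results(queryId=q["queryId"]))["status"] in ("Scheduled", "Running"):
    time.sleep(1)
print(res["results"])
```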

46:07 📢 Ryan – “Where have you been all my life?” 

48:35 Amazon S3 Metadata now supports metadata for all your S3 objects | AWS News Blog

  • S3 Metadata now provides complete visibility into all existing objects in S3 buckets through Apache Iceberg tables, eliminating the need for custom scanning systems and expanding beyond just tracking new objects and changes.
  • The service introduces two table types: live inventory tables that provide a complete snapshot of all objects refreshed within an hour, and journal tables that track near real-time object changes for auditing and lifecycle tracking.
  • Pricing includes a one-time backfill cost of $0.30 per million objects, with no additional monthly fees for buckets under one billion objects, and journal tables cost $0.30 per million updates (a 33% price reduction).
  • The tables enable SQL queries through Athena for use cases like finding unencrypted objects, tracking deletions, analyzing storage costs by tags, and optimizing ML pipeline scheduling by pre-discovering metadata. An example query follows after this list.
  • Currently available only in US East (Ohio, N. Virginia) and US West (N. California), with tables automatically created and maintained by S3 Tables service without requiring manual compaction or garbage collection.
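
The Athena angle looks something like the query below, run here with boto3. Everything after FROM, plus the column names, is an assumption for illustration; pull the real catalog/namespace/table path from the S3 Metadata configuration on your bucket.

```python
import boto3

athena = boto3.client("athena")

# The table path and column names below are illustrative assumptions, not the documented schema.
sql = """
SELECT key, size, last_modified_date
FROM "s3tablescatalog/aws-s3"."b_my-bucket"."inventory"
WHERE encryption_status IS NULL OR encryption_status = 'NOT-SSE'
ORDER BY size DESC
LIMIT 50
"""

resp = athena.start_query_execution(
    QueryString=sql,
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder results bucket
)
print("QueryExecutionId:", resp["QueryExecutionId"])
```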

51:44 📢 Matt – “It’s amazing how much fractions of cents add up real fast.” 

54:23 Simplify serverless development with console to IDE and remote debugging for AWS Lambda | AWS News Blog

  • AWS Lambda now offers direct console-to-IDE integration with VS Code, adding an “Open in Visual Studio Code” button that automatically handles setup and opens functions locally, eliminating manual environment configuration and enabling developers to use full IDE features like integrated terminals and package management.
  • Remote debugging capability allows developers to debug Lambda functions running in their AWS account directly from VS Code with full access to VPC resources and IAM roles, solving the long-standing problem of debugging cloud functions that interact with production AWS services.
  • The remote debugging feature supports Python, Node.js, and Java runtimes at launch and automatically handles secure connection setup, breakpoint management, and cleanup after debugging sessions to prevent production impact.
  • Both features are available at no additional cost beyond standard Lambda execution charges during debugging sessions, making it more cost-effective for developers to troubleshoot issues in actual cloud environments rather than maintaining complex local emulation setups.
  • This addresses a key serverless development pain point where functions work locally but fail in production due to differences in permissions, network access, or service integrations, potentially reducing debugging time from hours to minutes for complex AWS service interactions.

57:03 📢 Matt – “I have bad news for Peter. It only supports Python, Node.js, and Java. It does not support Ruby.”

59:15 Accelerate safe software releases with new built-in blue/green deployments in Amazon ECS | AWS News Blog

  • In things we thought they already had…
  • Amazon ECS now includes built-in blue/green deployments at no additional charge, eliminating the need for teams to build custom deployment tooling while providing automated rollback capabilities for safer container deployments.
  • The feature introduces deployment lifecycle hooks that integrate with Lambda functions, allowing teams to run validation tests at specific stages like pre-scale up, post-scale up, and traffic shift phases before committing to new versions. A sketch of such a hook follows after this list.
  • Blue/green deployments maintain both environments simultaneously during deployment, enabling near-instantaneous rollbacks without end-user impact since production traffic only shifts after successful validation of the green environment.
  • The implementation requires configuring IAM roles, a load balancer or Service Connect, and target groups through the ECS console, with each service revision maintaining an immutable configuration for consistent rollback behavior.
  • This addresses a significant operational challenge where development teams previously spent cycles building undifferentiated deployment tools instead of focusing on business innovation, particularly important for organizations running containerized workloads at scale.
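
A lifecycle hook is just a Lambda that blesses or rejects the green revision. A minimal sketch, assuming the hook responds with a `hookStatus` field as described in the ECS docs; confirm the exact request and response contract before wiring it into a real deployment.

```python
import urllib.request


def handler(event, context):
    # ECS invokes this Lambda at the configured lifecycle stage (e.g. after test traffic shift)
    # with details about the new (green) service revision in `event`.
    test_endpoint = "https://green.internal.example.com/health"  # placeholder test-listener URL
    try:
        with urllib.request.urlopen(test_endpoint, timeout=5) as resp:
            healthy = resp.status == 200
    except Exception:
        healthy = False

    # Assumed response shape: SUCCEEDED lets the deployment shift production traffic,
    # FAILED triggers an automatic rollback to the blue revision.
    return {"hookStatus": "SUCCEEDED" if healthy else "FAILED"}
```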

1:02:45 Amazon Braket adds new 54-qubit quantum processor from IQM – AWS

  • Amazon Braket now offers access to IQM’s Emerald, a 54-qubit superconducting quantum processor with square-lattice topology, expanding the quantum computing options available to AWS customers alongside existing trapped-ion and neutral atom devices.
  • The Emerald QPU features state-of-the-art gate fidelities and dynamic circuit support, enabling researchers to experiment with more complex quantum algorithms using familiar tools like the Braket SDK, NVIDIA CUDA-Q, Qiskit, and Pennylane. A Braket SDK example follows after this list.
  • Hosted in Munich and accessible via the Europe (Stockholm) Region, this addition strengthens AWS’s quantum computing presence in Europe while providing on-demand access to the latest-generation quantum hardware without requiring direct hardware investment.
  • Amazon Braket Hybrid Jobs offers priority access to Emerald for running fully managed quantum-classical algorithms, addressing the practical need for combining quantum and classical computing resources in real-world applications.
  • AWS Cloud Credits for Research program supports accredited institutions experimenting with quantum computing, reducing the barrier to entry for academic research, while standard Braket pricing applies for commercial users.
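
Running on Emerald looks like any other Braket device. The device ARN below is our guess based on how IQM’s other QPUs are named on Braket; list devices with `AwsDevice.get_devices()` to get the real one.

```python
from braket.aws import AwsDevice
from braket.circuits import Circuit

# The Emerald ARN is an assumption; confirm with AwsDevice.get_devices(provider_names=["IQM"]).
device = AwsDevice("arn:aws:braket:eu-north-1::device/qpu/iqm/Emerald")

# Prepare and sample a 2-qubit Bell state on the 54-qubit superconducting QPU.
bell = Circuit().h(0).cnot(0, 1)
task = device.run(bell, shots=1000)
print(task.result().measurement_counts)
```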

GCP

1:05:44 New monitoring library to optimize Google Cloud TPU resources | Google Cloud Blog

  • Google released a new monitoring library for Cloud TPUs that provides real-time metrics like tensor core utilization, HBM usage, and buffer transfer latency sampled at 1Hz, enabling developers to dynamically optimize their AI workloads directly in code.
  • The library integrates with JAX and PyTorch installations through libtpu and allows programmatic adjustments – for example, automatically increasing batch sizes when duty_cycle_pct is low or triggering memory-saving strategies when HBM capacity approaches limits. A hedged sketch of that pattern follows after this list.
  • This addresses a key gap in TPU observability compared to AWS’s CloudWatch for EC2 GPU instances and Azure’s GPU monitoring, giving Google customers similar granular performance insights specifically designed for TPU architectures.
  • The monitoring capabilities are particularly valuable for large-scale AI training where even small efficiency improvements can translate to significant cost savings, with metrics like hlo_exec_timing helping identify bottlenecks in distributed workloads.
  • While the library is free to use, it requires shell access to TPU VMs and is limited to snapshot-mode access rather than continuous streaming, which may impact real-time monitoring use cases compared to traditional APM solutions.
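
The “adjust in code” idea looks roughly like this. The import path, metric names, and return shapes are assumptions pieced together from the announcement, so verify them against the libtpu documentation on an actual TPU VM.

```python
# Hedged sketch of adaptive batching based on the new TPU metrics; all API names below are assumptions.
from libtpu.sdk import tpumonitoring  # assumed import path on recent libtpu builds


def pick_batch_size(current_batch: int) -> int:
    """Bump the batch size when the TensorCores are sitting idle and HBM has headroom."""
    duty_cycle = tpumonitoring.get_metric("duty_cycle_pct").data()    # snapshot, sampled at 1 Hz
    hbm_usage = tpumonitoring.get_metric("hbm_capacity_usage").data()

    # Assume both calls return per-chip values; be conservative and look at the busiest/fullest chip.
    if max(duty_cycle) < 50.0 and max(hbm_usage) < 0.8:
        return current_batch * 2  # plenty of idle compute and HBM headroom
    return current_batch
```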

1:07:45 📢 Ryan – “I mean, it is an SDK that they’re releasing in addition to the existing services, right? It’s not a service by itself, but it is a neat little easy, you know, like, like any library, it’s just an easy button instrument for my code, to make it visible, right? So I do like that.”

1:08:28 Get to know Cloud Observability Application Monitoring | Google Cloud Blog

  • Google Cloud introduces Application Monitoring, an out-of-the-box observability solution that automatically generates dashboards for applications defined in App Hub, eliminating hours of manual dashboard configuration and providing immediate visibility into the Four Golden Signals (traffic, latency, error rate, saturation).
  • The service automatically propagates application labels across logs, metrics, and traces in Google Cloud, enabling consistent filtering and correlation across all telemetry data without manual tagging effort.
  • Integration with Gemini Cloud Assist Investigations (currently in private preview) provides AI-powered troubleshooting that understands application boundaries and relationships, offering contextual analysis based on the automatically collected application data.
  • This positions Google Cloud competitively against AWS CloudWatch Application Insights and Azure Application Insights by reducing the upfront investment typically required for application monitoring setup while incorporating Google SRE best practices.
  • Organizations can start using Application Monitoring immediately by defining applications in App Hub and navigating to Cloud Observability, with Gemini features requiring a separate SKU and trusted tester program enrollment.

1:12:06 Deepseek R1 is available for everyone in Vertex AI Model Garden | Google Cloud Blog

  • Google adds DeepSeek R1 to Vertex AI Model Garden as a managed service, eliminating the need for customers to provision 8 H200 GPUs typically required to run this large language model, with pay-as-you-go pricing and serverless API access.
  • The Model-as-a-Service offering provides enterprise-grade security and compliance while supporting both REST API and OpenAI Python client integration, positioning GCP alongside AWS Bedrock and Azure’s model marketplace in the managed LLM space. A hedged OpenAI-client sketch follows after this list.
  • DeepSeek R1 joins Llama 4 models in Vertex AI’s expanding open model catalog, giving customers more flexibility to choose models for specific use cases without infrastructure management overhead.
  • The service operates without outbound internet access for data security, making it suitable for enterprises with strict compliance requirements who need advanced AI capabilities without compromising data privacy.
  • This release strengthens Google’s open AI ecosystem strategy by providing access to non-Google models through its platform, competing directly with proprietary offerings while maintaining the convenience of fully managed deployment.
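
“OpenAI Python client integration” means pointing the client at Vertex AI’s OpenAI-compatible endpoint. A hedged sketch; the base URL pattern and model ID are assumptions, so copy the exact values from the DeepSeek R1 card in Model Garden.

```python
import google.auth
import google.auth.transport.requests
from openai import OpenAI

# Use a short-lived OAuth token in place of an API key.
creds, project = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(google.auth.transport.requests.Request())

location = "us-central1"  # placeholder region
client = OpenAI(
    # Assumed Vertex MaaS OpenAI-compatible endpoint pattern -- verify in the Model Garden docs.
    base_url=f"https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project}/locations/{location}/endpoints/openapi",
    api_key=creds.token,
)

resp = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1-0528-maas",  # assumed MaaS model ID
    messages=[{"role": "user", "content": "Explain vector embeddings in two sentences."}],
)
print(resp.choices[0].message.content)
```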

1:13:14 📢 Ryan – “I mean, this is really the power of using those public models in something like a model garden. Instead of like, you know, running a server, installing all the models and getting it all in place, and hooking it all together, you can now just basically provision this within your Vertex AI environment and have a web endpoint that you can then send prompts to. And it makes that much, much easier to do. So the fact that it’s DeepSeek. Like everyone’s always concerned about China’s going to steal our data.”

Azure

01:15:36 Unified by design: mirroring Azure Databricks Unity Catalog to Microsoft OneLake in Fabric (Generally Available) | Microsoft Fabric Blog | Microsoft Fabric 

  • Microsoft Fabric now offers general availability of mirroring for Azure Databricks Unity Catalog, enabling direct access to Databricks tables in OneLake without data duplication or ETL pipelines. 
  • This integration allows organizations to query Databricks data through Fabric workloads and Power BI Direct Lake mode while maintaining a single copy of data.
  • The feature addresses a key enterprise challenge of bridging Azure Databricks and Microsoft Fabric ecosystems, as demonstrated by The Adecco Group, which uses it to expose Databricks datasets for Power BI semantic models and GraphQL APIs. Setup requires only a few clicks to connect catalogs, schemas, or individual tables through the Fabric portal.
  • Technical improvements in the GA release include support for ADLS with firewalls enabled, public APIs for CI/CD automation, and full integration with OneLake security framework for enterprise-grade access controls. Data automatically syncs as tables are updated or modified in Azure Databricks.
  • This positions Microsoft against AWS and GCP by leveraging their unique combination of Databricks partnership and Fabric platform, though competitors offer similar lakehouse integrations through services like AWS Glue Data Catalog and BigQuery external tables. The open Delta Parquet format ensures vendor neutrality while reducing storage costs.
  • Target customers include enterprises already using both Azure Databricks and Microsoft Fabric who need unified analytics without maintaining duplicate data pipelines. The future roadmap may include support for RLS/CLM policies, federated tables, Delta Sharing, and streaming data.

01:16:37 Announcing Cosmos DB in Microsoft Fabric Featuring New Capabilities! | Microsoft Fabric Blog | Microsoft Fabric 

  • Microsoft brings Cosmos DB natively into Fabric as a preview, combining NoSQL database capabilities with Fabric’s analytics platform to create a unified data environment for both operational and analytical workloads without managing separate services.
  • The service automatically mirrors operational data to OneLake in Delta format for real-time analytics, enabling T-SQL queries, Spark notebooks, and Power BI reporting on the same data without ETL pipelines or manual replication steps.
  • New vector and full-text search capabilities support AI workloads with multiple indexing options, including Microsoft’s DiskANN for large-scale scenarios, positioning this as a direct competitor to AWS DocumentDB‘s vector search and GCP’s AlloyDB vector capabilities.
  • Billing uses Fabric capacity units rather than separate Cosmos DB pricing, which could simplify cost management for organizations already invested in Fabric but may require careful capacity planning to avoid unexpected charges.
  • CI/CD support through deployment pipelines and Git integration addresses enterprise DevOps requirements, though the preview status suggests production workloads should wait for general availability.

1:17:36 📢 Matt – “They just continue to shove everything into Fabric.” 

1:18:48 Public Preview: CLI command for migration from Availability Sets and Basic load balancer on AKS 

  • Azure introduces a CLI command to migrate AKS clusters from deprecated Availability Sets and Basic load balancers to Virtual Machine Scale Sets before the September 30, 2025, retirement deadline, simplifying what would otherwise be a complex manual migration process.
  • The automated migration tool addresses a critical need as Basic load balancers lack features like availability zones and SLA guarantees that production workloads require, while Availability Sets are being replaced by the more resilient Virtual Machine Scale Sets architecture.
  • This positions Azure competitively with AWS EKS and GCP GKE, which already use modern infrastructure patterns by default, though Azure’s migration tool provides a smoother transition path for existing customers compared to manual rebuilds.
  • Organizations running production AKS workloads should prioritize testing this migration in non-production environments first, as the shift to Standard load balancers will increase costs but provide essential enterprise features like cross-zone load balancing.
  • The preview availability gives customers a window to plan and execute migrations ahead of the deadline, and early adoption leaves time to address any edge cases before the deprecation forces the change.

1:20:15 📢 Matt – “There’s a bunch of deprecations coming up, and it is extremely nice that Azure is attempting to help you migrate away from some of these things. But definitely test these in your lower-level environments.” 

1:21:27 Generally Available: Microsoft Azure Cloud HSM 

  • Azure Cloud HSM delivers FIPS 140-3 Level 3 certified hardware security modules as a single-tenant service, giving customers full administrative control over their cryptographic operations and key management infrastructure.
  • This positions Azure competitively against AWS CloudHSM and Google Cloud HSM, offering similar dedicated hardware security capabilities for organizations with strict compliance requirements in financial services, healthcare, and government sectors.
  • The single-tenant architecture ensures complete isolation of cryptographic operations, making it suitable for workloads requiring the highest levels of security assurance and regulatory compliance.
  • Key use cases include protecting certificate authorities, database encryption keys, code signing certificates, and meeting specific regulatory mandates that require hardware-based key storage.
  • While pricing details aren’t provided in the announcement, organizations should expect premium costs typical of dedicated HSM services, with deployment considerations around high availability configurations and integration with existing Azure Key Vault implementations.

1:23:56 Generally Available: Hosted-On-Behalf-Of (HOBO) Public IP model for ExpressRoute Gateways

  • Azure’s new Hosted-On-Behalf-Of (HOBO) model for ExpressRoute Gateways eliminates the need to manually assign public IP addresses, with Microsoft now managing this infrastructure component automatically for all new deployments.
  • This simplification reduces configuration complexity and potential misconfigurations for enterprises connecting their on-premises networks to Azure via ExpressRoute, particularly benefiting organizations with limited networking expertise.
  • The HOBO model aligns Azure more closely with AWS Direct Connect Gateway’s approach, where public IPs are abstracted away, though Azure still requires customers to manage more networking components overall compared to AWS’s implementation.
  • While this improves the deployment experience, existing ExpressRoute gateways won’t automatically migrate to HOBO, creating a mixed environment that IT teams will need to manage during their transition period.

01:26:00 Public Preview: Orchestration versioning for Durable Functions and Durable task SDKs / Generally Available: Durable Functions PowerShell SDK as a standalone module

  • Azure introduces orchestration versioning for Durable Functions, addressing a critical challenge where modifying orchestration logic could break existing in-flight workflows – this allows developers to safely update orchestration code without disrupting running instances.
  • The feature enables side-by-side deployment of multiple orchestration versions, letting new instances use updated logic while existing instances complete with their original code – similar to AWS Step Functions versioning but with tighter integration into Azure’s serverless ecosystem.
  • Target customers include enterprises running long-running workflows, event-driven architectures, and complex business processes where orchestration changes are frequent but downtime is unacceptable – particularly valuable for financial services and e-commerce scenarios.
  • This positions Azure competitively against AWS Step Functions and Google Cloud Workflows by solving the “orchestration evolution” problem that has plagued serverless workflow engines since their inception.
  • The preview status suggests Microsoft is gathering feedback before GA, with pricing likely to follow the standard Durable Functions consumption model where you pay for execution time and storage of orchestration state.
  • Microsoft has released the Durable Functions PowerShell SDK as a standalone module in the PowerShell Gallery, making it easier for developers to build stateful serverless applications using PowerShell without bundling it with the Azure Functions runtime.
  • This GA release provides PowerShell developers with native support for orchestration patterns like function chaining, fan-out/fan-in, and human interaction workflows, bringing PowerShell to parity with C# and JavaScript for Durable Functions development.
  • The standalone module approach simplifies dependency management and version control, allowing teams to update the SDK independently of their Azure Functions runtime version and reducing potential compatibility issues.
  • While AWS Step Functions and GCP Workflows offer similar orchestration capabilities, Azure’s approach uniquely integrates with PowerShell’s automation heritage, targeting IT operations teams who already use PowerShell for infrastructure management.
  • Organizations can now build complex workflows that combine traditional PowerShell automation scripts with serverless orchestration, enabling scenarios like multi-step deployment pipelines or approval workflows without managing state infrastructure.

1:28:10 📢 Matt – “I mean, any of these improvements are just good. You know, durable functions are designed for that consistency, and having that consistency and allocation of the time, you know, but potentially breaking the things in flight kind of wasn’t a good look for them. So having that kind of a little bit more robustness with the versioning and making sure that different, you know, you’re able to control that a lot better. It’s just, you know, beneficial. A general quality of life improvement.” 

1:29:24 Public Preview: Web Application Firewall (WAF) running on Application Gateway for Containers

  • Azure brings WAF capabilities to Application Gateway for Containers, extending layer 7 security to Kubernetes workloads with protection against common web exploits like SQL injection and cross-site scripting.
  • This positions Azure competitively against AWS WAF on ALB and Google Cloud Armor, offering native integration with AKS and other Azure container services for simplified security management.
  • The preview enables organizations to implement consistent security policies across containerized applications without deploying separate WAF instances, reducing operational overhead and complexity.
  • Target customers include enterprises migrating microservices to Kubernetes who need enterprise-grade application security without sacrificing the agility of container deployments.
  • Pricing details aren’t specified in the preview announcement, but expect consumption-based billing similar to standard Application Gateway WAF tiers when it reaches general availability.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
