248: A Public Service Announcement on Shared VPCs in AWS: Don’t!

Welcome to episode 248 of The Cloud Pod podcast – where the forecast is always cloudy! It’s the return of our Cloud Journey Series! Plus, today we’re talking shared VPCs and why you should avoid them, Amazon’s new data centers (we think they forgot about the sustainability pledge), new threats to and from AI, and a quick preview of Next ‘24 programs – plus much more! 

Titles we almost went with this week:

  • 👠The Cloud Pod Isn’t a Basic Bitch
  • 🤑New AWS Data Solutions Framework – or – How You Accidentally Spent $100k’s
  • 📢A PSA on Shared VPCs in AWS
  • ☁️Amazon Doesn’t Even Pay Attention to Climate When it’s on a Building
  • 🔍Vector Search I Hardly Know Her 
  • 🚀Google Migs are Less Fun than Russian Migs
  • 💣AI Can Now Attack Us; Who Didn’t See That Coming
  • 🗺️Who is Surprised That AWS is Using More Power Than the Rest of the State of Oregon
  • 🥘Spend all the Dinero in Spain

A big thanks to this week’s sponsor:

We’re sponsorless this week! Interested in sponsoring us and having access to a specialized and targeted market? We’d love to talk to you. Send us an email or hit us up on our Slack Channel. 

AI is Going Great (or how ML Makes all Its Money)

01:24 Disrupting malicious uses of AI by state-affiliated threat actors

  • In this week’s chapter of AI nightmares, OpenAI tells us how they are blocking state-affiliated threat actors from using ChatGPT. Awesome; things went from bad to worse in one week. Cool. Cool cool cool. 
  • In partnership with Microsoft Threat Intelligence, they have disrupted five state-affiliated actors that sought to use their AI services in support of malicious cyber activities.
  • These actors generally sought to use OpenAI services for querying open-source information, translating, finding coding errors, and running basic coding tasks. 
    • Charcoal Typhoon (China affiliated) researched various companies and cybersecurity tools, debugged code and generated scripts, and created content likely for use in phishing campaigns.
    • Salmon Typhoon (China affiliated) translated technical papers, retrieved publicly available information on multiple intelligence agencies and regional threat actors, assisted with coding, and researched common ways processes could be hidden on a system.
    • Crimson Sandstorm (Iran affiliated) used OpenAI services for scripting support related to app and web development, generating content likely for spear-phishing campaigns, and researching common ways malware could evade detection.
    • Emerald Sleet (North Korea affiliated) identified experts and organizations focused on defense issues in the Asia-Pacific region, researched publicly available vulnerabilities, and used OpenAI services for help with basic scripting tasks and for drafting content that could be used in phishing campaigns.
    • Forest Blizzard (Russia affiliated) used OpenAI services primarily for research on open-source data about satellite communication protocols and radar imaging technology, as well as for support with scripting tasks. 
  • OpenAI says that while the capabilities of current models are limited, they believe it’s important to stay ahead of significant and evolving threats. 
  • To continue making sure their platform is used for good they have a multi-pronged approach:
    • Monitoring and disrupting malicious state-affiliated actors
    • Working together with the AI ecosystem
    • Iterating on safety mitigations
    • Public transparency 

03:59  📢 Ryan – “I do like that all these state-sponsored actors, they’re just like us, asking basic scripting questions.”

AWS

06:46 Announcing the Data Solutions Framework on AWS

  • AWS is announcing the release of Data Solutions Framework on AWS (DSF), an opinionated open source framework that accelerates building data solutions on AWS.  
    • It can take days or weeks to build end-to-end solutions on AWS with infrastructure as code (IaC) and following best practices, but with DSF it takes hours and you can focus on your use case. Cool!
  • DSF is built using the CDK to package infrastructure components into L3 CDK constructs atop AWS services.  
  • L3 constructs are opinionated implementations of common technical patterns, and generally create multiple resources that are configured to work with each other. 
  • One of the constructs creates a complete data lake storage solution with three different S3 buckets, encryption, data lifecycle policies, and more.  
    • This means that you can create a data lake in your CDK application with just a few lines of code, and be sure it follows best practices and is thoroughly tested.  
  • DSF is available as TypeScript and Python packages through npm and PyPI.
  • As a data engineer, DSF gives you access to constructs like Spark EMR tooling, DataLakeStorage, and DataLakeCatalog. 
  • The tooling leverages CodeCommit, CodePipeline, CodeBuild, S3, KMS, EMR Serverless (Spark), CloudFormation, and the Glue Data Catalog.
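
To make the “few lines of code” claim concrete, here’s a minimal sketch of a CDK app using the DataLakeStorage construct. The package import path and constructor arguments are our assumptions based on DSF’s packaging, so double-check the DSF docs:

```python
# Minimal sketch of a CDK app using DSF's DataLakeStorage construct.
# Assumption: the Python package installs as
# `cdklabs.aws-data-solutions-framework` and exposes the construct under
# a `storage` module; verify the exact import path and parameters in the
# DSF documentation.
from aws_cdk import App, Stack
from cdklabs import aws_data_solutions_framework as dsf

app = App()
stack = Stack(app, "DataLakeStack")

# One construct stands up the data lake storage described above: multiple
# S3 buckets plus encryption and lifecycle policies, wired together.
dsf.storage.DataLakeStorage(stack, "MyDataLakeStorage")

app.synth()
```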

17:43 📢  Ryan – “…none of the things in this are new, but it’s the packaging of it all together, where you just sort of – with a few simple lines of SDK code – really have the full infrastructure for a full DataLake. And so, it’s super cool, because this is always what, as a customer, you’re sort of wanting, like give me the easy button.”

10:25 API Gateway now supports TLS 1.3 

  • API Gateway finally supports TLS 1.3 on its regional REST, HTTP and WebSocket endpoints.  TLS 1.3 on APIGW works by offloading encryption and decryption of TLS traffic from your application servers to API Gateway. 
  • TLS 1.3 optimizes for performance and security through the use of one round trip TLS handshake, while exclusively supporting ciphers that offer perfect forward secrecy.  
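
As far as we can tell there’s nothing to configure on the API side; a TLS 1.3-capable client just negotiates it. A quick way to verify from Python (the endpoint hostname below is a placeholder):

```python
# Check the negotiated TLS version against a regional API Gateway
# endpoint. The hostname is hypothetical; substitute your own API.
import socket
import ssl

host = "abc123.execute-api.us-east-1.amazonaws.com"  # placeholder

ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse anything older

with socket.create_connection((host, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        print(tls.version())  # expect "TLSv1.3"
        print(tls.cipher())   # TLS 1.3 suites all offer forward secrecy
```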

10:45 📢  Justin- “I assume this is also a prerequisite for them to be able to support mutual TLS on API Gateway, which would be probably the last big feature I think they’re missing on the API Gateway.”

11:36 📢  Matthew – “…the one piece of it I don’t like is that they didn’t do it, which makes sense across the board on all the different flavors of API gateway, but that’s because CloudFront would need to actually handle 1.3 also. So I get why they’re slowly rolling it out, but you know, just doing a regional means someone’s going to go in there and try to do it on global and not understand why. And then you’re going to go bang your head on the wall until you really sit down and figure out, oh yeah, this is actually a CloudFront API gateway under the hood.”

12:40 Amazon GuardDuty Runtime Monitoring protects clusters running in shared VPC

  • GuardDuty Runtime Monitoring now protects workloads running in shared VPCs across all supported compute services.
  • VPC sharing allows multiple AWS accounts to launch their application resources, such as Amazon EC2 instances, into shared, centrally-managed VPCs.  
  • Customers use shared VPCs to simplify network management across different accounts in the organization, providing cost benefits and reduced operational overhead with fewer VPCs to manage. 
  • It’s at this point that Justin, Ryan, and Matthew would like to point out that we **really discourage** the use of shared VPCs, and warn that there be “SHARP EDGES THAT WILL CUT YOU DOWN THIS PATH.”
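
If you’re running shared VPCs anyway (sharp edges and all), enabling the feature is a small change. A hedged boto3 sketch; the RUNTIME_MONITORING feature name is our assumption about the GuardDuty API, so verify it against the current API reference:

```python
# Hedged sketch: turn on Runtime Monitoring for an existing GuardDuty
# detector via boto3. The feature name "RUNTIME_MONITORING" is an
# assumption; confirm it in the GuardDuty API reference first.
import boto3

guardduty = boto3.client("guardduty", region_name="us-east-1")

# Each account has at most one detector per region.
detector_id = guardduty.list_detectors()["DetectorIds"][0]

guardduty.update_detector(
    DetectorId=detector_id,
    Features=[{"Name": "RUNTIME_MONITORING", "Status": "ENABLED"}],
)
```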

13:57 📢  Ryan – “I mean, even I’m glad they’re fixing this with GuardDuty. I hope that they’re not implementing too much complexity on the backend, making it either very complicated to run or changing the results. But like, today I learned that previously you couldn’t run GuardDuty and inspect those workloads, right? And so I’m sure that GuardDuty is one of many.”

18:18 One of Oregon’s smallest utilities is suddenly among the state’s biggest polluters. Why? Amazon data centers 

  • Amazon needs to look really hard at their Climate Pledge Arena to remember that they’re supposed to be making their data centers sustainable! Right? Did we just imagine that? 
  • The Oregonian has a report that one of Oregon’s smallest utilities is now one of the state’s biggest polluters, and it’s all because of Amazon’s data centers. 
  • This isn’t a recent development per the report; the increases started in 2018, by 2020 its carbon emissions had doubled, and in 2021 they doubled again.
  • The Umatilla Electric Cooperative is responsible for 1.8 million tons of carbon emissions annually, even though it only has 16,000 customers.  
    • It is now the third largest emitter of greenhouse gasses among all Oregon utilities.
  • Amazon capitalized on hundreds of millions of dollars in local tax breaks to subsidize multiple data centers in the cities of Boardman and Hermiston, areas where the regional power grid has little access to renewable energy. Super sustainable, right? 
  • Oregon is many years away from expanding its transmission capacity, and with Amazon planning at least 10 more data centers in the region, eastern Oregon’s carbon footprint will only get worse. ULTRA sustainable. 
  • Both Amazon and Umatilla Electric say they’re committed to fighting climate change and to finding clean energy to power the data centers. 
  • Umatilla used to rely on renewable hydropower from federal dams to meet its modest energy needs.  
    • Since that federal hydropower had already been largely allocated to Umatilla Electric’s existing customers, the co-op has had to buy power on the open market, and nearly all the power available to buy comes from fossil fuel plants.
  • Amazon isn’t the only company drawing a tremendous amount of power from data centers in Oregon; Apple, Facebook, Google, and even X run large data centers in Prineville, The Dalles, and Hillsboro.
  • The Bonneville Power Administration estimates that data center electricity use in Oregon and Washington will more than double by 2041, requiring power equivalent to a third of all homes in the two states. 
  • Amazon is committed to net-zero carbon emissions by 2040, but when Oregon lawmakers considered a bill to make data centers subject to the state’s clean energy rules, Amazon mounted a furious lobbying campaign to kill it. Weird, huh? 
  • Amazon is supporting legislation to support offshore wind power, battery storage and clean energy incentives. 
  • But those efforts are not enough when the transmission infrastructure isn’t big enough to address all that power. 

22:13 📢  Matthew – “There was a podcast I listened to at one point where saying there’s a lot of these green initiatives. Everybody wants to do it. The problem comes down to the way all this works is like it has to come down to like transmission lines. And like, if you say, I’m going to build a wind farm over here, you got to pay for all the infrastructure after that to trans to do it. So you end up like, Hey, this 50,000, this hundred thousand dollar project now, it costs you a million dollars. You got to redo it. So like, it almost feels like we have to look at the way we kind of handle our electrical grid to support these. So a simple wind farm over here doesn’t end up costing billions of dollars.”

GCP

23:43  Google Cloud expands access to Gemini models for Vertex AI customers

  • Gemini has been moving at a blazing pace since it was announced in December. 
  • Now Gemini 1.0 Pro and Ultra are available via Vertex AI, and the first Gemini 1.5 model has been released for early testing in a private preview on Vertex. 
  • Gemini 1.5 Pro is a mid-size multi-modal model optimized for scaling across a wide range of tasks, and it performs at a similar level to 1.0 Ultra, their largest model to date.  
    • 1.5 Pro introduces a breakthrough experimental feature in long-context understanding: the longest context window of any large-scale foundation model yet.  
  • Apps can now run up to 1 million tokens in production. 
    • This means 1.5 Pro can process vast amounts of information in one go, including 1 hour of video, 11 hours of audio, and codebases with over 30,000 lines of code or over 700,000 words.
  • Larger context models allow you to reference more information, grasp narrative flow, maintain coherence over longer passages and generate more contextually rich responses. Some use cases:
    • Accurately analyze an entire code library in a single prompt without the need to fine-tune the model, including understanding and reasoning over small details that a developer might easily miss, such as errors, inefficiencies and inconsistencies in code.
    • Reason across very long documents, from comparing details across contracts to synthesizing and analyzing themes and opinions across analyst reports, research studies  or even a series of books. 
    • Analyze and compare content across hours of video, such as finding specific details in sports footage or getting caught up on detailed information from video meeting summaries that support precise question-answers. 
    • Enable chatbots to hold long conversations without forgetting details, even over complex tasks or many follow-up interactions
    • Enable hyper-personalized experiences by pulling relevant user information into the prompt without the complexity of fine-tuning a model. 
  • Vertex will allow you to customize your models, augment the Gemini models via Grounding, and manage and scale Gemini in production.  
  • You can also use Gemini to build search and conversational agents via Vertex AI Search and Conversation. 
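
If you want to kick the tires, here’s a minimal sketch of calling Gemini through the Vertex AI Python SDK. The project and location are placeholders, and since 1.5 Pro was still in private preview, the sketch uses 1.0 Pro:

```python
# Minimal sketch: calling Gemini 1.0 Pro via the Vertex AI SDK
# (`pip install google-cloud-aiplatform`). Project/region are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-gcp-project", location="us-central1")

model = GenerativeModel("gemini-1.0-pro")
response = model.generate_content(
    "Summarize the trade-offs of very long context windows in 3 bullets."
)
print(response.text)
```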

25:29 📢  Ryan – “This is moving way too fast for me. Like, Gemini 1.0 Pro used something like 32,000 tokens, right? Or maxed out at that? I can’t. I think that’s it. And now they’re scaling up to a million, like a week later, like it feels like. Like it’s crazy. You’re going to be able to run this against so many things.”

27:13 Feel the Next ‘24 love: Full session library is now live

33:21 Introducing vector search in BigQuery       

  • Advanced AI and ML technologies are revolutionizing the way organizations use their data, offering new opportunities to unlock its potential. Google is announcing the public preview of vector search in BigQuery, which enables vector similarity search on BigQuery data. 
  • This functionality, commonly referred to as approximate nearest-neighbor search, is key to empowering numerous new data and AI use cases such as semantic search, similarity detection, and retrieval-augmented generation with an LLM.
  • Vector search is often performed on high-dimensional numeric vectors, aka embeddings, which incorporate a semantic representation for an entity and can be generated from numerous sources, including text, image or video.  
  • BigQuery vector search relies on an index to optimize the lookups and distance computations required to identify closely matching embeddings. 
  • A couple of use cases for this type of solution (a hedged sketch of the first one follows the list):
    • Given a new (batch of) support case(s), find ten closely-related previous cases, and pass them to an LLM as context to summarize and propose resolution suggestions.
    • Given an audit log entry, find the most closely matching entries in the past 30 days.
    • Given a picture, find the most closely-related images in the customer’s BigQuery object table, and pass them to a model to generate captions.
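
Here’s roughly what the first use case could look like. A hedged sketch: the dataset, table, and column names are hypothetical, so check the BigQuery docs for the authoritative CREATE VECTOR INDEX and VECTOR_SEARCH syntax:

```python
# Hedged sketch of the support-case use case: index the embeddings, then
# pull the ten nearest historical cases for each new case's embedding.
# All dataset/table/column names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# One-time: index the embeddings column for fast approximate search.
client.query("""
CREATE VECTOR INDEX IF NOT EXISTS case_embedding_idx
ON support.cases(embedding)
OPTIONS (index_type = 'IVF', distance_type = 'COSINE')
""").result()

# Per batch: the ten closest historical cases for each new case.
rows = client.query("""
SELECT base.case_id, base.summary, distance
FROM VECTOR_SEARCH(
  TABLE support.cases, 'embedding',
  (SELECT embedding FROM support.new_cases),
  query_column_to_search => 'embedding',
  top_k => 10)
""").result()

for row in rows:
    print(row.case_id, row.distance)
```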

35:15 📢  Justin- “I definitely see the advantage of it. Like, you know, I, my trick is I just like, I select SQL server row one. And if that’s the data I want, then I assume that row two is also similar to it. That’s how I do it. It’s not really the right way to do it. Yeah. There’s a wildcard. I just, yeah. Select all put it in an elastic search cluster, do a search, see what I come up with. All kinds of, all kinds of ways to solve this problem.”

35:40 Introducing Managed Instance Groups standby pool: Stop and suspend idle VMs

  • At Google Cloud they are constantly working on providing cost-efficiency improvements to your infrastructure (which they then negate with price increases in other areas…)
  • *They say* that one of the best ways to save money is to stop or suspend your compute engine VMs, to avoid compute charges for idle instances. 
  • Now with Standby Pool for managed instance groups you can pause and resume VMs, manually or as part of a MIG automation. 
  • This is a new way for MIGs to reduce costs when pausing applications, or enable a MIG to respond faster to increased load with pre-initialized VMs. 
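
Under the hood the cost mechanism is plain suspend/resume. A minimal sketch suspending an idle VM directly with the google-cloud-compute client (the MIG standby-pool automation wraps the same lifecycle; all names are placeholders):

```python
# Suspend an idle VM, then resume it later with memory state intact.
# Suspended VMs stop accruing core/RAM charges (you still pay to store
# the instance's memory image and disks). Names are placeholders.
from google.cloud import compute_v1

client = compute_v1.InstancesClient()

op = client.suspend(
    project="my-project", zone="us-central1-a", instance="idle-worker-1"
)
op.result()  # block until the VM is suspended

# Later: resume with pre-initialized state for a fast ramp-up.
client.resume(
    project="my-project", zone="us-central1-a", instance="idle-worker-1"
).result()
```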

36:23 📢  Matthew – “So this is the AWS auto scaling cold or whatever they called it, which is like a server that you can have on the side that like you can just boot up. And the only reason I ever saw to use this feature, was if you were auto scaling or MIG scaling in this case, I guess, Windows servers, just because they take so long to boot up. Because Windows…It was a nice feature. We set this up for one person at one point and it did dramatically help it, you know, Windows by default, I think it takes like 15 minutes to boot up. So just having the server there and essentially stopping it off hours; kind of lets you do fake auto scaling without actually doing auto scaling. So you’re stopping/starting servers. It’s a significant savings.”

Azure

37:54 Microsoft to invest $2.1B in Spain to expand AI and cloud infrastructure 

  • MS is expanding its footprint in Spain with a $2.1 billion investment over the next two years. 
  • The investment goes beyond just building a data center, and is a testament to their “37 year commitment to Spain, its security, development, and digital transformation of its government, business and people”. 
  • This comes just after they announced a 3.44 billion dollar investment in Germany.

38:31 📢  Justin – “I mean, if I were to be a betting man, they’re all trying to get ahead of the EU data center moratorium because they can only build for so long before they hit the moratorium limit. Because they also have power transmission problems in Europe, if you didn’t know.”

40:00 General Availability: Azure NetApp Files Standard Network Features – Edit Volumes

  • Azure is announcing the general availability of standard network features and the ability to edit network features for Azure NetApp Files. 
  • Standard network features provide an enhanced virtual networking experience for a seamless and consistent experience, along with an improved security posture, for Azure NetApp Files.
  • You can now edit an existing Azure NetApp Files volume and upgrade basic network features to standard. 
  • Standard networking gets you new features like:
    • Increased IP limits for the VNets with Azure NetApp Files volumes, on par with VMs, to enable customers to provision volumes in their existing topologies/architectures. 
    • Enhanced network security with support for Network Security Groups. 
    • Enhanced network control with support for User-Defined Routes to and from the Azure NetApp Files delegated subnet.
    • Connectivity over active/active VPN gateway setups for highly available connectivity to ANF from on-prem.
    • ExpressRoute FastPath connectivity to Azure NetApp Files. FastPath is designed to improve the data path performance between on-premises networks and Azure virtual networks. 

42:22 📢  Matthew – “I just feel like it’s kind of also the way Azure is, is security and reliability. You always have to go to the higher tiers, which just drives me a little bit crazy. Like it’s not built in day one whereas with AWS, I feel like, you know, their motto is designed for failure. So like most of the managed services are by default and you don’t have an option. Like you can’t launch a load balancer without two subnets. You can’t launch, you know, your database without multiple subnets; they’re just there where Azure feels like you always have to think about it. I’m like, I don’t want to think about it. This is why I’m paying for a service – do it for me.”

Continuing our Cloud Journey Series Talks

50:35 Five key things to consider when building a cloud FinOps team 

  • You may be under tremendous pressure to save costs, and one of the things we’re a big fan of is building out a FinOps capability to help drive optimization processes in your company. 
  • Google has a blog post that talks about five key things to consider when building your FinOps team.
  • 1. Define your goals and document them in a cloud FinOps charter
    • Set a clear set of goals and objectives. Without a well-defined purpose, your team may struggle to align efforts and demonstrate value to the organization. The FinOps charter outlines the team’s mission, goals, strategies, and responsibilities.
    • The charter should have several benefits:
      • Guidance in uncertainty
      • Executive Buy In
      • Prioritization
      • Efficiency
  • 2. Develop a Cloud FinOps Lexicon
    • A lexicon, or shared glossary of terms, ensures that everyone on the team speaks the same language, reduces misunderstandings, and promotes clarity.
  • 3. Establish a cloud FinOps culture
    • Key things are Cross-functional Collaboration, Continuous Improvement, and Democratizing cost visibility
  • 4. Define a set of KPIs and metrics to measure progress (What gets measured gets managed)
    • To gauge the success of your cloud FinOps team and its cost optimization efforts, it’s crucial to define a clear set of KPIs and success metrics that can accurately measure progress and drive financial accountability and value realization in your org. 
    • As you think about these metrics, make sure they are readily measurable and can evolve as your team matures; over time you can begin to track unit-economics metrics.  
    • Some metrics to consider (a toy calculation follows this list):
      • Cloud Enablement: measured by dividing the number of business leaders trained or certified by the total number of cloud learners across the org.
      • Cloud allocation: the amount of costs tagged to responsible business owners. This metric supports both showback and chargeback models and reflects the underlying effectiveness and accuracy of resource tagging and cost attribution.
      • Cloud optimization realized savings: allows the team to keep a pulse on inefficiencies in the organization and lets the business focus on achieving cost savings, thereby capturing the true value of running workloads on cloud.
      • Forecast Accuracy: measuring forecast accuracy enables companies to understand whether actual spend matches plan, and allows for better control of cloud spend allocations.
      • FinOps Automation: measured by the number of implemented automated recommendations that result in cost savings, as a percentage of the total automated recommendations generated. 
  • 5. Choose your tooling strategy carefully and reevaluate it frequently.
    • Selecting the right tools for your cloud FinOps is critical. However, the cloud technology landscape is constantly evolving and new tools and services are introduced regularly. 
    • Consider the following factors when evaluating tools:
      • Scalability
      • Integration
      • Cost and ROI
      • Customization
      • Technology
      • User-Friendliness
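
As a toy illustration of two of the KPIs above, here’s a sketch computed over a hypothetical cost export; the field names are illustrative and don’t match any provider’s billing schema:

```python
# Toy illustration of two FinOps KPIs over a hypothetical cost export.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CostRecord:
    cost: float
    owner_tag: Optional[str]  # None when the resource is untagged

records = [
    CostRecord(120.0, "team-data"),
    CostRecord(45.5, None),        # untagged spend hurts allocation
    CostRecord(300.0, "team-web"),
]

total = sum(r.cost for r in records)

# Cloud allocation: share of spend attributable to a responsible owner.
tagged = sum(r.cost for r in records if r.owner_tag)
allocation_pct = 100 * tagged / total

# Forecast accuracy: how close actual spend landed to the forecast.
forecast = 450.0
accuracy_pct = 100 * (1 - abs(total - forecast) / forecast)

print(f"allocation: {allocation_pct:.1f}%, "
      f"forecast accuracy: {accuracy_pct:.1f}%")
```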

52:17 📢  Matthew – “…make sure you have executive buy in probably you’re starting this whole fin op started because your CFO is freaking out about the bill, but you know, making sure that not just the CFO, you know, your CTO and other organization members all agreed. This is something you’re going to do so you don’t have one side of the house fighting the other and you’re sitting there in the middle just going cool. We’re here.”

57:35 📢  Ryan – “…it’s super fun to watch that transformation happen, from taking a dev team who hasn’t had any visibility into their costs to the initial stages of bewilderment of why is everything expensive, to actually making architecture choices based off of cost-driven data. And it’s not always comfortable, but almost every team that I’ve seen do that transformation, they’re excited by the end, right? It’s not like, oh, I had to do this and they’re jaded about it. And so like, it is one of those things where it allows some really cool decisions and it’s, you know, when you have the visibility, when you have that insight and, you know, like I said, access is clear and transparency is part of your culture.”

Closing

And that is the week in the cloud! Just a reminder – if you’re interested in joining us as a sponsor, let us know! Check out our website, the home of The Cloud Pod, where you can join our newsletter or Slack team, send feedback, or ask questions at theCloudPod.net – or tweet at us with the hashtag #thecloudpod.
