246: The CloudPod Will Never Type localllm Correctly


Welcome to episode 246 of The CloudPod podcast, where the forecast is always cloudy! This week we’re discussing localllm and just why they’ve saddled us all with that name, saying goodbye to Bard and hello to Gemini Pro, and discussing the pros and cons of helping Skynet to eradicate us all. All that and more cloud and AI news, now available for your listening nightmares. 

Titles we almost went with this week:

🌍Oracle says hold my beer on Africa

🧓The Cloud Pod Thinks the LLM Maturity Model has More Maturing To Do

🐦There is a Finch Windows Canary in Fargate

😨New LLM Nightmares

⌨️The Cloud Pod Will Never Type localllm Correctly

A big thanks to this week’s sponsor:

We’re sponsorless this week! Interested in sponsoring us and having access to a very specialized and targeted market? We’d love to talk to you. Send us an email or hit us up on our Slack Channel. 

General News

It’s Earnings Time! 

01:42 Microsoft issues light guidance even as Azure growth drives earnings beat 

  • Microsoft shares were up after the company reported earnings of $2.93 per share vs. expectations of $2.73 per share.  
    • Revenue was $62.02 billion vs. $61.12 billion expected. 
  • This represents 17.6% year-over-year revenue growth in the quarter. 
  • The Intelligent Cloud segment produced $25.88 billion in revenue, up 20% and above the $25.29 billion consensus among analysts surveyed by StreetAccount. 
  • Revenue from Azure and other cloud services grew 30%, when analysts only expected 27.7%.  
    • Six points of that growth are tied to AI, as Microsoft now has 53,000 Azure AI customers, one-third of whom are new in the past year (per Microsoft). 

02:46 📢 Justin- “I don’t think they count the OpenAI customers, do you? Because there’s way more people that have OpenAI usage than 53,000. So I think this is legitimately Azure AI – which is OpenAI under the hood – but specifically paying for that subscription.”

04:19 Alphabet shares slide on disappointing Google ad revenue  

  • Alphabet reported better-than-expected revenue and profit for the fourth quarter, but ad revenue trailed analysts’ projections. 
  • Earnings per share were $1.64 vs. $1.59 expected. 
    • Revenue was $86.31 billion vs. $85.33 billion expected. 
  • Google Cloud revenue was $9.19 billion vs. $8.94 billion expected, according to StreetAccount. 
  • That represents 26% year-over-year growth in the fourth quarter. 

04:51 📢 Justin- “…which is interesting, because you would expect that they’d have similar growth being tied to Bard and Gemini to be close to what Microsoft is doing.”

12:02 Amazon reports better-than-expected results as revenue jumps 14% 

  • Amazon also exceeded analysts’ expectations.  
    • Earnings per share were $1.00 vs. 80 cents expected.  
  • Revenue was $170 billion vs. $166.2 billion expected.
  • AWS revenue came in at $24.2 billion, in line with expectations. 
  • AWS growth was 13% in the fourth quarter, a slight uptick from 12% in the prior quarter. 

14:19 📢 Jonathan – “I think AI is great for tinkering right now, but I think the cloud that’s going to win – and I suspect it’s going to be Amazon despite Google’s early lead – will be the cloud that provides the best tooling around SDLC.”

AI is Going Great (or how ML Makes all Its Money)

17:22 Building an early warning system for LLM-aided biological threat creation 

  • And now, for your nightmare fuel – we have a new horror scenario we hadn’t thought of
  • OpenAI is getting ahead of people using LLMs to aid in the creation of biological threats. Good times! 
  • They have built a preparedness framework and are working with researchers and policymakers to ensure that these systems aren’t used for ill will. 
    • And if you’ll excuse us, we’ll be spending the rest of the episode installing air filters and sealing our homes. 

22:15 📢 Justin- “We assumed Skynet takes us out with nuclear weapons; but we’re teaching it how to make biological weapons. That’ll work even better!”


22:44 Finch Container Development Tool: Now for Windows 

  • Finch, the Podman/Lima alternative, is now available on Windows.  
  • Finch is a local developer tool that lets container developers work with Linux containers on non-Linux operating systems; it was initially available only for macOS.  
  • They built out Windows support by contributing Windows Subsystem for Linux (WSL2) support to Lima, one of the core components of Finch.  
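For readers who want to try it, the Finch CLI intentionally mirrors familiar Docker commands; a minimal sketch (the image name is illustrative):

```shell
# One-time setup: initialize Finch's virtual machine
# (backed by WSL2 on Windows, Lima on macOS)
finch vm init

# Pull and run a Linux container, publishing a port (image is illustrative)
finch run -d -p 8080:80 public.ecr.aws/nginx/nginx:latest

# List running containers, Docker-style
finch ps
```

If you already have Docker muscle memory, most commands translate by swapping `docker` for `finch`.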

24:50 AWS Free Tier now includes 750 hours of free Public IPv4 addresses, as charges for Public IPv4 begin

  • The AWS Free Tier for Amazon EC2 now includes 750 hours per month of free public IPv4 address usage, as AWS has now officially begun charging for public IPv4 addresses. 

24:58 📢  Justin – “So, thank you for the free ones, but also, I just got a really big increase in my bill for all the IPv4 addresses I have that I can’t turn off because you don’t support IPv6 on those services yet…I really don’t appreciate it. And those 750 free hours? Amazon – you can shove them somewhere.”

27:40 Amazon FSx for OpenZFS now supports up to 400,000 IOPS  

  • Amazon FSx for OpenZFS, which provides fully managed file storage powered by the OpenZFS file system, now delivers 14% higher levels of I/O operations per second at no additional cost, bringing the new maximum IOPS level to 400,000. 
  • The increased IOPS level enables you to improve price-performance for IOPS-intensive workloads like Oracle databases and optimize costs for workloads like periodic reporting jobs with IOPS requirements that vary over time. 

29:00 Announcing CDK Migrate: A single command to migrate to the AWS CDK 

  • AWS is announcing CDK Migrate, a component of the AWS Cloud Development Kit (CDK).  
  • This feature enables users to migrate AWS CloudFormation templates, previously deployed CloudFormation stacks, or resources created outside of IaC into a CDK application.  
  • This feature is being launched alongside the CloudFormation IaC Generator, which helps customers import resources created outside of CloudFormation into a template and then into a newly generated, fully managed CloudFormation stack. 
  • While it’s a good and recommended practice to manage the lifecycle of resources using IaC, there can be an on-ramp to getting started.
    • For those that aren’t ready to use IaC, it’s likely they use the console to create resources and update them accordingly.
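As a rough sketch of what the single command looks like in practice (stack and file names are illustrative):

```shell
# Generate a CDK app in TypeScript from a stack already deployed
# in your account (stack name is illustrative)
cdk migrate --stack-name MyExistingStack --from-stack --language typescript

# Alternatively, start from a local CloudFormation template file
cdk migrate --stack-name MyNewStack --from-path ./template.yaml --language python
```

Either way, the output is a new CDK project directory you can iterate on with the usual `cdk diff` and `cdk deploy` workflow.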

29:51 📢  Ryan – “I like features like this, just because anything where you’re taking your resources where you’ve deployed and being able to configure them into a stateful representation I think is a neat tool. It’s super powerful for development.”

40:14 AWS Fargate announces a price reduction for Windows containers on Amazon ECS 

  • We are excited to announce that AWS Fargate for Windows containers on Amazon ECS has reduced infrastructure pricing by up to 49%.  
  • Fargate simplifies the adoption of modern container technology for ECS customers by making it easier to run their Windows containers on AWS.  
  • With Fargate, customers no longer need to set up Auto Scaling groups or manage host instances for their application.  
  • You can get more information on pricing here.

40:44 📢  Justin – “If you HAVE to run Windows containers, this is the *only* way I’d recommend…which, I guess having a price cut is pretty nice. But if this is your model of deployment – try something else. Please.” 


42:33 Firestore Multiple Databases is now generally available

  • Today, we are announcing the general availability of Firestore Multiple Databases, which lets you manage multiple Firestore databases within a single Google Cloud project, enhancing data separation, security, resource management, and cost tracking. With this milestone, multiple databases are now fully supported in the Google Cloud console, Terraform resources, and all of Firestore’s SDKs. 
  • Each Firestore database operates with independent isolation, ensuring robust data separation and performance. 
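A quick sketch of creating a second, named database with the gcloud CLI (the database ID and location here are illustrative):

```shell
# Create an additional named Firestore database in the current project
gcloud firestore databases create \
    --database=orders-db \
    --location=nam5 \
    --type=firestore-native

# Confirm both the (default) database and the new one exist
gcloud firestore databases list
```

Client SDKs then select a database by ID at initialization instead of always using `(default)`.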

44:14 📢  Ryan – “We were laughing before the show because we all learned that this was a limitation, and it’s crazy… don’t get me started on how Google provides their managed services; I’m sure that’s what this is. The way they implemented it required these backend connections into your project through your network.”

45:31 Heita South Africa! The new Google Cloud region is now open in Johannesburg   

  • The Johannesburg cloud region in South Africa is now ready for customer use.  
  • Google will hold an official launch event later this year to celebrate the opening. 
  • We’re looking for invites to the launch party. Someone hit us up. 

46:12 Bard’s latest updates: Access Gemini Pro globally and generate images 

  • Gemini Pro is now available in Bard.  
    • This includes support for more languages and places as well as image generation. 
  • Bard with Gemini Pro now supports over 40 languages in more than 230 countries and territories.
  • The Large Model Systems Organization (LMSYS), which runs a leading evaluation of language models and chatbots across languages, recently shared that Bard with Gemini Pro is one of the most preferred chatbots available. 
  • After the cutoff for the show, they announced they’re killing the Bard name, and it’s now all Gemini. RIP, Bard. 

48:17 📢 Jonathan – “I think this just confirms our suspicions that Bard was rushed out the door in response to ChatGPT.” 

48:51 No GPU? No problem. localllm lets you develop gen AI apps on local CPUs

  • In today’s fast-paced AI landscape, developers face numerous challenges when it comes to building applications that use LLMs.  
    • In particular, the scarcity of GPUs. In this blog post, Google introduces a novel solution that allows developers to harness the power of LLMs locally on CPU and memory, right within Cloud Workstations, Google Cloud’s fully managed development environment. So now Google gets to make money on you! Huzzah! 
  • By using a combination of quantized models, Cloud Workstations, and a new open-source tool named localllm, you can develop AI-based applications on a well-equipped development workstation, leveraging existing processes and workflows. 
  • Quantized models are AI models that have been optimized to run on local devices with limited computational resources.  
  • localllm is a set of tools and libraries that provides easy access to quantized models from Hugging Face through a command-line utility. localllm can be a game changer for developers seeking to leverage LLMs without the constraints of GPU availability.
    • GPU-free LLM execution:  lets you execute LLMs on CPU and memory, removing the need for scarce GPU resources, so you can integrate LLMs into your application development workflows, without compromising performance or productivity.
    • Enhanced productivity: With localllm, you use LLMs directly within the Google Cloud ecosystem. This integration streamlines the development process, reducing the complexities associated with remote server setups or reliance on external services. Now, you can focus on building innovative applications without managing GPUs.
    • Cost efficiency: By leveraging localllm, you can significantly reduce infrastructure costs associated with GPU provisioning. The ability to run LLMs on CPU and memory within the Google Cloud environment lets you optimize resource utilization, resulting in cost savings and improved return on investment.
    • Improved data security: Running LLMs locally on CPU and memory helps keep sensitive data within your control. With localllm, you can mitigate the risks associated with data transfer and third-party access, enhancing data security and privacy.
    • Seamless integration with Google Cloud services: localllm integrates with various Google Cloud services, including data storage, machine learning APIs, or other Google Cloud services, so you can leverage the full potential of the Google Cloud ecosystem. 
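Going by the launch blog post, getting started looks roughly like this (the model and port are examples from the post; treat the exact commands as subject to change):

```shell
# Install the localllm tooling from its GitHub repository
git clone https://github.com/GoogleCloudPlatform/localllm
cd localllm
pip install ./llm-tool/.

# Serve a quantized model from Hugging Face on local CPU (port is arbitrary)
llm run TheBloke/Llama-2-13B-chat-GGUF 8000

# See which models have been downloaded locally
llm list
```

The served model exposes an HTTP endpoint on the chosen port, so your application code talks to it the same way it would talk to a remote inference service.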

50:12 📢  Jonathan – “I’m pretty sure they’ve chosen that name just for SEO. This is named purely for SEO, because everyone is searching for Local Llama right now, and that’s Meta’s tool, and you can already run those models locally with the same technology and techniques to quantize the model…this is totally a hack on people who are already running Local Llama.”


56:36 Achieve generative AI operational excellence with the LLMOps maturity model

  • Microsoft is defining an LLMOps maturity model for a brand-new industry. 
  • Isn’t it brand new? So the maturity model is definitely going to be wrong. You heard it here first. 

58:05 📢  Justin – “This is so junior at this moment in time. It’s just covering LLM usage; it’s not covering LLM development or any other LLM use cases. And I expect that in a year it’s just laughed at.”


1:01:24 OCI announces plans to expand in Africa 

  • Oracle is announcing its intent to open a new public cloud region in Kenya. As part of Oracle’s broader strategy for Africa, this region will expand OCI’s footprint on the continent beyond the existing Johannesburg region, which is also an Azure Interconnect location.


And that is the week in the cloud! Just a reminder – if you’re interested in joining us as a sponsor, let us know! Check out our website, the home of The Cloud Pod, where you can join our newsletter or Slack team, send feedback, or ask questions at thecloudpod.net, or tweet at us with the hashtag #thecloudpod.
