AI_Distilled #21: MLAgentBench as AI Research Agents, OpenAI’s Python SDK and AI Chip, AMD Acquires Nod.ai, IBM Enhances PyTorch for AI Inference, Microsoft to Tackle GPU Shortage

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

👋 Hello,

“Scientific experimentation involves an iterative process of creating hypotheses, designing experiments, running experiments, and analyzing the results. Can we build AI research agents to perform these long-horizon tasks? To take a step towards building and evaluating research agents on such open-ended decision-making tasks -- we propose MLAgentBench, a suite of ML tasks for benchmarking AI research agents.”

- from the paper Benchmarking Large Language Models as AI Research Agents (arXiv, Oct 2023) by Qian Huang, Jian Vora, Percy Liang, and Jure Leskovec.

Stanford University researchers are tackling the challenge of evaluating AI research agents that make free-form decisions with MLAgentBench, a new benchmark. Each task comes with a task description and the required files, and agents act the way human researchers do: reading and writing files and running code. The evaluation assesses the agents' proficiency, reasoning, research process, and efficiency.
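To make that action space concrete, here is a minimal Python sketch of a sandboxed workspace exposing the kinds of actions described above: listing, reading, and writing files, and executing scripts. The class name `ResearchWorkspace` and its methods are hypothetical illustrations, not MLAgentBench's actual API; the action log only hints at how an efficiency measure could count an agent's steps.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class ResearchWorkspace:
    """Hypothetical sandbox offering research-agent actions (not MLAgentBench's API)."""

    def __init__(self, root: Path):
        self.root = root
        # Every action is recorded, e.g. so the number of steps can be scored later.
        self.action_log: list[tuple[str, str]] = []

    def _log(self, action: str, arg: str) -> None:
        self.action_log.append((action, arg))

    def list_files(self) -> list[str]:
        self._log("list_files", ".")
        return sorted(p.name for p in self.root.iterdir())

    def read_file(self, name: str) -> str:
        self._log("read_file", name)
        return (self.root / name).read_text()

    def write_file(self, name: str, content: str) -> None:
        self._log("write_file", name)
        (self.root / name).write_text(content)

    def execute_script(self, name: str, timeout: float = 60.0) -> str:
        """Run a Python script inside the workspace and return its combined output."""
        self._log("execute_script", name)
        result = subprocess.run(
            [sys.executable, name],
            cwd=self.root,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout + result.stderr


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = ResearchWorkspace(Path(tmp))
        ws.write_file("train.py", "print('training done')\n")
        print(ws.list_files())
        print(ws.execute_script("train.py"))
        print(len(ws.action_log), "actions taken")
```

An agent built on an LLM would choose which of these methods to call next based on the task description and the outputs it has seen so far.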

Welcome to AI_Distilled #21, your weekly source for the latest breakthroughs in AI, ML, GPT, and LLMs. In this edition, we'll talk about new AI initiatives for healthcare from Microsoft and Google, OpenAI's beta Python SDK for enhanced API access, IBM's enhancements to PyTorch for enterprise AI inference, and AMD's acquisition of Nod.ai to strengthen its AI capabilities. We'll also take a quick look at OpenAI's and Microsoft's ambitious moves into AI chipmaking amid the global GPU shortage.

We know how much you love our curated collection of AI tutorials and secret knowledge. We’ve packed some great knowledge resources in this issue covering recent advances in enhancing content safety with Azure ML, understanding autonomous agents for problem solving with LLMs, and enhancing code quality and security with Generative AI, Amazon Bedrock, and CodeGuru.

📥 Feedback on the Weekly Edition

What do you think of this issue and our newsletter?

Please consider taking the short survey below to share your thoughts. Upon completion, you will get a free PDF of the eBook “The Applied Artificial Intelligence Workshop”.

Complete the Survey. Get a Packt eBook for Free!

Writer’s Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week’s newsletter content!

Cheers,

Merlyn Shelley

Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis

Microsoft and Google Introduce New Gen AI Initiatives for Healthcare: Microsoft and Alphabet's Google have unveiled separate AI initiatives to assist healthcare organizations in improving data access and information management. Google's project, powered by Google Cloud, aims to simplify the retrieval of patient data, including test results and prescriptions, in one central location. It also intends to help healthcare professionals with administrative tasks that often lead to work overload and burnout. Meanwhile, Microsoft's initiative is focused on enabling healthcare entities to efficiently aggregate data from various doctors and hospitals, eliminating the time-consuming search for information.

OpenAI Mulls Chip Independence Due to Rising Costs: OpenAI, the company behind ChatGPT, is considering developing its own AI chips because of the rising cost of running on Nvidia's hardware. Each ChatGPT query reportedly costs OpenAI around 4 cents, and the company reportedly spends $700,000 a day to run ChatGPT. Nvidia accounts for over 70% of AI chip sales, and its hardware is becoming expensive for OpenAI. The company has discussed making its own chips but has not made a final decision, so it may remain dependent on Nvidia for the time being. Microsoft is also exploring in-house chip development that could compete with Nvidia's H100 GPU.

Microsoft May Unveil AI Chip at Ignite 2023 to Tackle GPU Shortage: Microsoft is considering debuting its own AI chip at the upcoming Ignite 2023 conference, as demand for GPUs outstrips what Nvidia can supply. The chip would be used in Microsoft's data center servers and to power AI capabilities in its productivity apps. The move reflects Microsoft's commitment to advancing AI following its substantial investment in OpenAI. While Microsoft plans to keep buying Nvidia GPUs, its own AI chip could improve profitability and make it more competitive with tech giants such as Amazon and Google, which already use their own custom AI chips.

OpenAI Unveils Beta Version of Python SDK for Enhanced API Access: OpenAI has released a beta version of its Python SDK to make the OpenAI API easier to use from Python applications, giving developers a chance to test it early and provide feedback ahead of the official version 1.0 launch. The SDK streamlines integration by offering predefined classes for API resources and maintaining compatibility across API versions. OpenAI encourages developers to try the beta and share feedback to help shape the final release. The library supports chat completions, text completions, embeddings, fine-tuning, moderation, image generation, and audio functions.

IBM Enhances PyTorch for AI Inference, Targeting Enterprise Deployment: IBM is extending the PyTorch machine learning framework beyond model training to AI inference. The goal is a robust, open-source option for inference that runs on technology from multiple vendors and on both GPUs and CPUs. To speed up inference, IBM combines three techniques within PyTorch: graph fusion, kernel optimizations, and parallel tensors. With these optimizations, IBM achieved an inference latency of 29 milliseconds per token for a large language model with 70 billion parameters. The work is not yet production-ready, but IBM aims to contribute these improvements to the PyTorch project, making PyTorch more enterprise-ready.

AMD Enhances AI Capabilities with Acquisition of Nod.ai: AMD has announced its intention to acquire Nod.ai, a startup focused on optimizing AI software for high-performance hardware. This acquisition underlines AMD's commitment to the rapidly expanding AI chip market, which is projected to reach $383.7 billion by 2032. Nod.ai's software, including the SHARK Machine Learning Distribution, will accelerate the deployment of AI models on platforms utilizing AMD's architecture. By integrating Nod.ai's technology, AMD aims to offer open software solutions to facilitate the deployment of highly performant AI models, thereby enhancing its presence in the AI industry.

🔮 Expert Insights from Packt Community

Machine Learning Engineering with MLflow - By Natu Lauchande

Developing your first model with MLflow

For simplicity, in this section we will use the built-in sample datasets in sklearn, the ML library that we will use initially to explore MLflow features. Specifically, we will use the famous Iris dataset to train a multi-class classifier with MLflow.

The Iris dataset (one of sklearn's built-in datasets, available from https://scikit-learn.org/stable/datasets/toy_dataset.html) contains the following features: sepal length, sepal width, petal length, and petal width. The target variable is the class of the iris: Iris Setosa, Iris Versicolor, or Iris Virginica:

Load the sample dataset:

from sklearn import datasets
from sklearn.model_selection import train_test_split
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.4)

Next, let's train the model.

Training a simple machine learning model with a framework such as scikit-learn involves instantiating an estimator such as LogisticRegression and calling its fit method to train it on the Iris dataset built into scikit-learn:

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X_train, y_train)

The preceding lines of code are just a small portion of the ML Engineering process. As will be demonstrated, a non-trivial amount of code needs to be created in order to productionize and make sure that the preceding training code is usable and reliable. One of the main objectives of MLflow is to aid in the process of setting up ML systems and projects. In the following sections, we will demonstrate how MLflow can be used to make your solutions robust and reliable.

Then, we will add MLflow. With a few more lines of code, you should be able to start your first MLflow interaction. In the following code listing, we start by importing the mlflow module, followed by the LogisticRegression class in scikit-learn. You can use the accompanying Jupyter notebook to run the next section:

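import mlflow
import mlflow.sklearn  # explicitly import the scikit-learn flavor
from sklearn.linear_model import LogisticRegression
# Automatically log parameters, metrics, and the fitted model for scikit-learn estimators
mlflow.sklearn.autolog()
# Everything trained inside this block is recorded as one MLflow run
with mlflow.start_run():
    clf = LogisticRegression()
    clf.fit(X_train, y_train)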

The mlflow.sklearn.autolog() instruction enables you to automatically log the experiment in the local directory. It captures the metrics produced by the underlying ML library in use. MLflow Tracking is the module responsible for handling metrics and logs. By default, the metadata of an MLflow run is stored in the local filesystem.

The above content is extracted from the book Machine Learning Engineering with MLflow written by Natu Lauchande and published in Aug 2021. To get a glimpse of the book's contents, make sure to read the free chapter provided here, or if you want to unlock the full Packt digital library free for 7 days, try signing up now! To learn more, click on the button below.

Read Chapter 1, unlocked here...

🌟 Secret Knowledge: AI/LLM Resources

Boosting Model Inference Speed with Quantization: In the realm of deploying deep learning models, efficiency is key. This post offers a primer on quantization, a technique that significantly enhances the inference speed of hosted language models. Quantization involves reducing the precision of data types used for weights and activations, such as moving from 32-bit floating point to 8-bit integers. While this may slightly affect model accuracy, the benefits are substantial: reduced memory usage, faster inference times, lower energy consumption, and the ability to deploy models on edge devices. The post explains two common approaches for quantization: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), helping you understand how to implement them effectively.
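As a rough illustration of post-training quantization, the sketch below maps float weights to 8-bit integers with a single symmetric scale factor and then maps them back. It assumes per-tensor quantization with no zero point; production toolkits (for example, PyTorch's quantization APIs) add calibration data, per-channel scales, and quantized kernels, so treat this only as a picture of the arithmetic. Each weight w becomes q = round(w / s), clipped to [-127, 127], where s = max|w| / 127.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor post-training quantization of floats to the int8 range."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map int8 values back to approximate float values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.42, -1.27, 0.003, 0.9, -0.5]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_error = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 values:", q)
    print("scale:", scale, "max reconstruction error:", max_error)
```

Because every weight lies inside the range, the reconstruction error per weight is at most half a quantization step (s / 2); that is the small accuracy cost, while the memory saving comes from storing 1-byte integers instead of 4-byte floats.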

Unlocking Database Queries with Text2SQL: A Historical Perspective and Current Advancements: In this post, you'll explore the evolution of Text2SQL, a technology that converts natural language questions into SQL queries for interacting with databases. Text2SQL began with rule-based approaches in the 1960s, moved to machine learning-based models, and has now been transformed by language models such as BERT and GPT. Discover how LLMs enhance Text2SQL, the challenges it faces, and prominent projects and products such as Microsoft LayoutLM, Google TAPAS, Yale's Spider benchmark, and GuruSQL. Despite these challenges, Text2SQL holds great promise for making database querying more convenient and intelligent in practical applications.
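To show what the early rule-based approach looked like in spirit, here is a toy Python sketch that maps question patterns to SQL templates and runs the result against an in-memory SQLite database. The rules, table, and function names are hypothetical; modern LLM-based systems generate SQL from the schema and the question rather than from hand-written patterns.

```python
import re
import sqlite3

# Hypothetical rule set: each regex maps a question pattern to a SQL template.
RULES = [
    (re.compile(r"how many (\w+) are there\??", re.IGNORECASE), "SELECT COUNT(*) FROM {0}"),
    (re.compile(r"list all (\w+)\??", re.IGNORECASE), "SELECT * FROM {0}"),
]


def text_to_sql(question: str, known_tables: set[str]) -> str:
    """Translate a question into SQL using hand-written rules (a toy rule-based Text2SQL)."""
    for pattern, template in RULES:
        match = pattern.fullmatch(question.strip())
        # Only accept table names we know, so user text never flows into SQL unchecked.
        if match and match.group(1).lower() in known_tables:
            return template.format(match.group(1).lower())
    raise ValueError(f"No rule matches: {question!r}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", [("Ada", "R&D"), ("Lin", "Ops")])
    sql = text_to_sql("How many employees are there?", {"employees"})
    print(sql, "->", conn.execute(sql).fetchone())
```

Any phrasing the rules did not anticipate raises an error, which illustrates the brittleness that pushed Text2SQL toward learned models.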

Enhancing Content Safety with Azure ML: Learn how to ensure content safety in Azure ML when using LLMs. By setting up Azure AI Content Safety and establishing a connection to it within Prompt Flow, you'll screen user input before passing it to the LLM. The article guides you through constructing the flow: sending the input to content safety, analyzing the results, invoking the LLM, and consolidating the final output. With this approach, you can prevent unwanted responses from the LLM and keep the whole interaction safe.
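The flow described above (check the input, analyze the result, call the LLM only if the input is safe, then consolidate the output) can be sketched in plain Python. Both `check_content_safety` and `call_llm` are hypothetical stand-ins: the keyword check is not how Azure AI Content Safety works (that service scores content by harm category and severity), and no Prompt Flow or Azure SDK calls are used here.

```python
from dataclasses import dataclass

# Hypothetical blocklist standing in for a real content-safety service.
BLOCKED_TERMS = {"make a weapon", "steal a password"}


@dataclass
class SafetyResult:
    is_safe: bool
    reason: str = ""


def check_content_safety(text: str) -> SafetyResult:
    """Hypothetical safety check: flags input containing a blocked phrase."""
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return SafetyResult(False, f"matched blocked term: {term!r}")
    return SafetyResult(True)


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; a real flow would call a model endpoint here."""
    return f"LLM answer to: {prompt}"


def run_flow(user_input: str) -> str:
    """Check input, analyze the result, invoke the LLM only if safe, consolidate output."""
    safety = check_content_safety(user_input)
    if not safety.is_safe:
        return f"Request blocked ({safety.reason})."
    return call_llm(user_input)


if __name__ == "__main__":
    for text in ["Summarize our refund policy", "How do I make a weapon at home?"]:
        print(run_flow(text))
```

The key design choice is that the LLM is never called for flagged input, so an unsafe request cannot produce an unwanted response in the first place.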

💡 Masterclass: AI/LLM Tutorials

Understanding Autonomous Agents for Problem Solving with LLMs: In this post, you'll explore the concept of autonomous LLM-based agents, how they interact with their environment, and the key modules that make up these agents, including the Planner, Reasoner, Actioner, Executor, Evaluator, and more. Learn how these agents utilize LLMs' inherent reasoning abilities and external tools to efficiently solve intricate problems while avoiding the limitations of fine-tuning.
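The module names below come from the post, but everything else is a hypothetical, rule-based stand-in for decisions an LLM would make. This minimal Python sketch shows how a Planner, Reasoner, Actioner, Executor, and Evaluator might hand work to one another on a toy task: computing a sum of squares with two small tools.

```python
from typing import Callable

Tool = Callable[[list[float]], list[float]]

# External tools the agent can call (a real agent might use search, code execution, etc.).
TOOLS: dict[str, Tool] = {
    "square": lambda xs: [x * x for x in xs],
    "sum": lambda xs: [sum(xs)],
}


def planner(goal: str) -> list[str]:
    """Decompose the goal into sub-tasks (an LLM would do this; here it is hard-coded)."""
    if goal == "sum of squares":
        return ["square each number", "add them up"]
    raise ValueError(f"cannot plan for goal: {goal!r}")


def reasoner(subtask: str) -> str:
    """Decide which operation a sub-task needs."""
    return "square" if "square" in subtask else "sum"


def actioner(operation: str) -> Tool:
    """Select the tool that implements the chosen operation."""
    return TOOLS[operation]


def executor(tool: Tool, state: list[float]) -> list[float]:
    """Run the selected tool on the current state."""
    return tool(state)


def evaluator(state: list[float], expected: float) -> bool:
    """Check whether the final state solves the task."""
    return len(state) == 1 and abs(state[0] - expected) < 1e-9


def run_agent(goal: str, numbers: list[float], expected: float) -> tuple[list[float], bool]:
    state = numbers
    for subtask in planner(goal):
        tool = actioner(reasoner(subtask))
        state = executor(tool, state)
    return state, evaluator(state, expected)


if __name__ == "__main__":
    print(run_agent("sum of squares", [1, 2, 3], expected=14))
```

For the inputs in the demo, the agent squares [1, 2, 3] into [1, 4, 9], sums them to [14], and the Evaluator confirms the result, so `run_agent` returns `([14], True)`.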

Determining the Optimal Chunk Size for a RAG System with LlamaIndex: When working with retrieval-augmented generation (RAG) systems, the chunk size you choose strongly affects both efficiency and accuracy. This post introduces LlamaIndex's Response Evaluation module and provides a step-by-step guide to finding the ideal chunk size for your RAG system. Weighing factors such as relevance, granularity, and response generation time, the post finds that a chunk size of around 1024 offers the best balance in its experiments.
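The sketch below covers only the part of the procedure that changes between candidates: splitting a document into chunks of a given size, with optional overlap, and sweeping several sizes. It counts whitespace-separated words as a stand-in for tokens and does not reproduce LlamaIndex's Response Evaluation module; in the post, each candidate size would then be indexed, queried, and scored on response quality and generation time.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size words; consecutive chunks share
    `overlap` words. Words stand in for tokens in this sketch."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop once a new chunk would contain only words already covered by the overlap.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(5000))
    for size in (128, 256, 512, 1024, 2048):
        chunks = chunk_words(document, size, overlap=size // 10)
        print(f"chunk_size={size}: {len(chunks)} chunks")
```

Smaller chunks give finer-grained retrieval but more pieces to search and less context per piece; larger chunks carry more context but may dilute relevance and slow generation, which is the trade-off the evaluation measures.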

Understanding the Power of Rouge Score in Model Evaluation: Evaluating fine-tuned language models such as the Mistral 7B Instruct model requires a reliable metric, and the ROUGE score is a valuable tool. This article provides a step-by-step guide to using ROUGE to compare fine-tuned and base language models. ROUGE measures how closely text generated by a model matches human-written reference text by counting overlapping n-grams, such as unigrams and bigrams. Once you master this metric, you'll be able to make informed decisions when choosing between model versions for specific tasks.
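ROUGE-N itself is simple to compute. Here is a minimal from-scratch Python sketch that lowercases, splits on whitespace, and counts clipped n-gram overlap between a model output and a reference. Libraries such as Google's `rouge-score` package additionally offer stemming and ROUGE-L (longest common subsequence), which this sketch omits.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """ROUGE-N precision, recall, and F1 from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps min(count in candidate, count in reference) per n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat is on the mat"
    print("ROUGE-1:", rouge_n(candidate, reference, n=1))
    print("ROUGE-2:", rouge_n(candidate, reference, n=2))
```

In the demo, 5 of the 6 unigrams match on each side (ROUGE-1 F1 of 5/6), while only 3 of the 5 bigrams match (ROUGE-2 F1 of 0.6), showing how higher-order n-grams penalize changes in word order and wording more heavily.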

Enhancing Code Quality and Security with Generative AI, Amazon Bedrock, and CodeGuru: In this post, you'll learn how to use Amazon CodeGuru Reviewer, Amazon Bedrock, and generative AI to improve the quality and security of your code. Amazon CodeGuru Reviewer provides automated code analysis and recommendations, while Amazon Bedrock adds generative AI insights and suggested code remediations. The post outlines a detailed solution that combines AWS CodeCommit, CodeGuru Reviewer, and Bedrock.

Exploring Generative AI with LangChain and OpenAI: Enhancing Amazon SageMaker Knowledge: In this post, the author walks through hosting a machine learning model within the generative AI ecosystem, using LangChain, a Python framework that simplifies building generative AI applications, together with OpenAI's LLMs. The goal is to see how well this solution answers SageMaker-related questions, addressing the problem that LLMs lack access to specific and recent data sources.

🚀 HackHub: Trending AI Tools

leptonai/leptonai: Python library for simplifying AI service creation, offering a Pythonic abstraction (Photon) for converting research code into a service, simplified model launching, prebuilt examples, and AI-specific features.

okuvshynov/slowllama: Enables developers to fine-tune Llama 2 and Code Llama models, including the 70B and 34B variants, on Apple M1/M2 devices or Nvidia GPUs, with an emphasis on fine-tuning without quantization.

yaohui-wyh/ctoc: A lightweight tool for analyzing codebases at the token level; token counts matter because they determine how much code fits into an LLM's memory and conversation history.

eric-ai-lab/minigpt-5: A model for interleaved vision-and-language generation using generative vokens to enable the simultaneous generation of images and textual narratives, particularly in the context of multimodal applications.