Generative AI | 2 articles | Tech News, Tutorials & Expert Insights

How to Face a Critical RAG-driven Generative AI Challenge

Mr. Denis Rothman

06 Nov 2024

15 min read

This article is an excerpt from the book, "RAG-Driven Generative AI", by Denis Rothman. Explore the transformative potential of RAG-driven LLMs, computer vision, and generative AI with this comprehensive guide, from basics to building a complex RAG pipeline.IntroductionOn a bright Monday morning, Dakota sits down to get to work and is called by the CEO of their software company, who looks quite worried. An important fire department needs a conversational AI agent to train hundreds of rookie firefighters nationwide on drone technology. The CEO looks dismayed because the data provided is spread over many websites around the country. Worse, the management of the fire department is coming over at 2 PM to see a demonstration to decide whether to work with Dakata’s company or not. Dakota is smiling. The CEO is puzzled. Dakota explains that the AI team can put a prototype together in a few hours and be more than ready by 2 PM and get to work. The strategy is to divide the AI team into three sub-teams that will work in parallel on three pipelines based on the reference Deep Lake, LlamaIndex and OpenAI RAG program* they had tested and approved a few weeks back. Pipeline 1: Collecting and preparing the documents provided by the fire department for this Proof of Concept(POC). Pipeline 2: Creating and populating a Deep Lake vector store with the first batch of documents while the Pipeline 1 team continues to retrieve and prepare the documents. Pipeline 3: Indexed-based RAG with LlamaIndex’s integrated OpenAI LLM performed on the first batch of vectorized documents. The team gets to work at around 9:30 AM after devising their strategy. The Pipeline 1 team begins by fetching and cleaning a batch of documents. They run Python functions to remove punctuation except for periods and noisy references within the content. Leveraging the automated functions they already have through the educational program, the result is satisfactory. By 10 AM, the Pipeline 2 team sees the first batch of documents appear on their file server. They run the code they got from the RAG program* to create a Deep Lake vector store and seamlessly populate it with an OpenAI embedding model, as shown in the following excerpt: from llama_index.core import StorageContext vector_store_path = "hub://denis76/drone_v2" dataset_path = "hub://denis76/drone_v2" # overwrite=True will overwrite dataset, False will append it vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True) Note that the path of the dataset points to the online Deep Lake vector store. The fact that the vector store is serverless is a huge advantage because there is no need to manage its size, storage process and just begin to populate it in a few seconds! Also, to process the first batch of documents, overwrite=True, will force the system to write the initial data. Then, starting the second batch, the Pipeline 2 team can run overwrite=False, to append the following documents. Finally, LlamaIndex automatically creates a vector store index: storage_context = StorageContext.from_defaults(vector_store=vector_store) # Create an index over the documents index = VectorStoreIndex.from_documents(documents, storage_context=storage_context) By 10:30 AM, the Pipeline 3 team can visualize the vectorized(embedded) dataset in their Deep Lake vector store. They create a LlamaIndex query engine on the dataset: from llama_index.core import VectorStoreIndex vector_store_index = VectorStoreIndex.from_documents(documents) … vector_query_engine = vector_store_index.as_query_engine(similarity_top_k=k, temperature=temp, num_output=mt) Note that the OpenAI Large Language Model is seamlessly integrated into LlamaIndex with the following parameters: k, in this case, k=3, specifies the number of documents to retrieve from the vector store. The retrieval is based on the similarity of embedded user inputs and embedded vectors within the dataset. temp, in this case temp=0.1, determines the randomness of the output. A low value such as 0.1 forces the similarity search to be precise. A higher value would allow for more diverse responses, which we do not want for this technological conversational agent. mt, in this case, mt=1024, determines the maximum number of tokens in the output. A cosine similarity function was added to make sure that the outputs matched the sample user inputs: from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2') def calculate_cosine_similarity_with_embeddings(text1, text2):     embeddings1 = model.encode(text1)     embeddings2 = model.encode(text2)     similarity = cosine_similarity([embeddings1], [embeddings2])     return similarity[0][0] By 11:00 AM, all three pipeline teams are warmed up and ready to go full throttle! While the Pipeline 2 team was creating the vector store and populating it with the first batch of documents, the Pipeline 1 team prepared the next several batches. At 11:00 AM, Dakota gave the green light to run all three pipelines simultaneously. Within a few minutes, the whole RAG-driven generative AI system was humming like a beehive! By 1:00 PM, Dakota and the three pipeline teams were working on a PowerPoint slideshow with a copilot. Within a few minutes, it was automatically generated based on their scenario. At 1:30 PM, they had time to grab a quick lunch. At 2:00 pm, the fire department management, Dakota’s team, and the CEO of their software company were in the meeting room. Dakota’s team ran the PowerPoint slide show and began the demonstration with a simple input: user_input="Explain how drones employ real-time image processing and machine learning algorithms to accurately detect events in various environmental conditions." The response displayed was satisfactory: Drones utilize real-time image processing and machine learning algorithms to accurately detect events in various environmental conditions by analyzing data captured by their sensors and cameras. This technology allows drones to process visual information quickly and efficiently, enabling them to identify specific objects, patterns, or changes in the environment in real-time. By employing these advanced algorithms, drones can effectively monitor and respond to different situations, such as wildfires, wildlife surveys, disaster relief efforts, and agricultural monitoring with precision and accuracy. Dakota’s team then showed that the program could track and display the original documents the response was based on. At one point, the fire department’s top manager, Taylor, exclaimed, “Wow, this is impressive! It’s exactly what we were looking for! " Of course, Dakato’s CEO began discussing the number of users, cost, and timelines with Taylor. In the meantime, Dakota and the rest of the fire department’s team went out to drink some coffee and get to know each other. Fire departments intervene at short notice efficiently for emergencies. So can expert-level AI teams! https://github.com/Denis2054/RAG-Driven-Generative-AI/blob/main/Chapter03/Deep_Lake_LlamaIndex_OpenAI_RAG.ipynb ConclusionIn facing a high-stakes, time-sensitive challenge, Dakota and their AI team demonstrated the power and efficiency of RAG-driven generative AI. By leveraging a structured, multi-pipeline approach with tools like Deep Lake, LlamaIndex, and OpenAI’s advanced models, the team was able to integrate scattered data sources quickly and effectively, delivering a sophisticated, real-time conversational AI prototype tailored for firefighter training on drone technology. Their success showcases how expert planning, resourceful use of AI tools, and teamwork can transform a complex project into a streamlined solution that meets client needs. This case underscores the potential of generative AI to create responsive, practical solutions for critical industries, setting a new standard for rapid, high-quality AI deployment in real-world applications.Author Bio Denis Rothman graduated from Sorbonne University and Paris-Diderot University, and as a student, he wrote and registered a patent for one of the earliest word2vector embeddings and word piece tokenization solutions. He started a company focused on deploying AI and went on to author one of the first AI cognitive NLP chatbots, applied as a language teaching tool for Mo�t et Chandon (part of LVMH) and more. Denis rapidly became an expert in explainable AI, incorporating interpretable, acceptance-based explanation data and interfaces into solutions implemented for major corporate projects in the aerospace, apparel, and supply chain sectors. His core belief is that you only really know something once you have taught somebody how to do it.

0
0
1178

Mastering Machine Learning: Best Practices and the Future of Generative AI for Software Engineers

Miroslaw Staron

25 Oct 2024

10 min read

IntroductionThe field of machine learning (ML) and generative AI has rapidly evolved from its foundational concepts, such as Alan Turing's pioneering work on intelligence, to the sophisticated models and applications we see today. While Turing’s ideas centered on defining and detecting intelligence, modern applications stretch the definition and utility of intelligence in the realm of artificial neural networks, language models, and generative adversarial networks. For software engineers, this evolution presents both opportunities and challenges, from creating intelligent models to harnessing tools that streamline development and deployment processes. This article explores the best practices in machine learning, insights on deploying generative AI in real-world applications, and the emerging tools that software engineers can utilize to maximize efficiency and innovation.Exploring Machine Learning and Generative AI: From Turing’s Legacy to Today's Best Practices When Alan Turing developed his famous Turing test for intelligence, computers, and software were completely different from what we are used to now. I’m certain that Turing did not think about Large Language Models (LLMs), Generative AI (GenAI), Generative Adversarial Networks, or Diffusers. Yet, this test for intelligence is equally useful today as it was at the time when it was developed. Perhaps our understanding of intelligence has evolved since then. We consider intelligence on different levels, for example, at the philosophical level and the computer science level. At the philosophical level, we still try to understand what intelligence really is, how to measure it, and how to replicate it. At the computer science level, we develop new algorithms that can tackle increasingly complex problems, utilize increasingly complex datasets, and provide more complex output. In the following figure, we can see two different solutions to the same problem. On the left-hand side, the solution to the Fibonacci problem uses good old-fashioned programming where the programmer translates the solution into a program. On the right-hand side, we see a machine learning solution – the programmer provides example data and uses an algorithm to find the pattern just to replicate it later. Figure 1. Fibonacci problem solved with a traditional algorithm (left-hand side) and machine learning’s linear regression (right-hand side). Although the traditional way is slow, it can be mathematically proven to be correct for all numbers, whereas the machine learning algorithm is fast, but we do not know if it renders correct results for all numbers. Although the above is a simple example, it illustrates that the difference between a model and an algorithm is not that great. Essentially, the machine learning model on the right is a complex function that takes an input and produces an output. The same is true for the generative AI models. Generative AI Generative AI is much more complex than the algorithms used for Fibonacci, but it works in the same way – based on the data it creates new output. Instead of predicting the next Fibonacci number, LLMs predict the next token, and diffusers predict values of new pixels. Whether that is intelligence, I am not qualified to judge. What I am qualified to say is how to use these kinds of models in modern software engineering. When I wrote the book Machine Learning Infrastructure and Best Practices for Software Engineers1, we could see how powerful ChatGPT 3.5 is. In my profession, software engineers use it to write programs, debug them and even to improve the performance of the programs. I call it being a superprogrammer. Suddenly, when software engineers get these tools, they become team leader for their bots, who support them – these bots are the copilots for the software engineers. But using these tools and models is just the beginning. Harnessing NPUs and Mini LLMs for Efficient AI Deployment Neural Processing Units (NPUs) have started to become more popular in modern computers, which addresses the challenges with running language models locally, without the access to internet. The local execution reduces latency and reduces security risks of hijacking information when it is sent between the model and the client. However, the NPUs are significantly less powerful than data centers, and therefore we can only use them with small language models – so-called mini-LLMs. An example of such a model is Phi-3-mini model developed by Microsoft2. In addition to NPUs, frameworks like ONNX appeared, which made it possible to quickly interchange models between GPUs and NPUs – you could train the model on a powerful GPU and use it on a small NPU thanks to these frameworks. Since AI take so much space in modern hardware and software, GeekbenchAI3 is a benchmark suite that allows us to quantify and compare AI capabilities of modern hardware. I strongly recommend to take it for a spin to check what we can do with the hardware that we have at hands. Now, hardware is only as good as the software, and there, we also saw a lot of important improvements. Ollama and LLM frameworks In my book, I presented the methods and tools to work with generative AI (as well as the classical ML). It’s a solid foundation for designing, developing, testing and deploying AI systems. However, if we want to utilize LLMs without the hassle of setting up the entire environment, we can use frameworks like Ollama4. The Ollama framework seamlessly downloads and deploys LLMs on a local machine if we have enough resources. Once installing the framework, we can type ollama run phi-3 to start a conversation with the model. The framework provides a set of user interfaces, web services and other types of mechanisms needed to construct a fully-fledged machine learning software5. We can use it locally for all kinds of tasks, e.g., in finance6 . What’s Next: Embracing the Future of AI in Software Engineering As generative AI continues to evolve, its role in software engineering is set to expand in exciting ways. Here are key trends and opportunities that software engineers should focus on to stay ahead of the curve: Mastering AI-Driven Automation: AI will increasingly take over repetitive programming and testing tasks, allowing engineers to focus on more creative and complex problems. Engineers should leverage AI tools like GitHub Copilot and Ollama to automate mundane tasks such as bug fixing, code refactoring, and even performance optimization. Actionable Step: Start integrating AI-driven tools into your development workflow. Experiment with automating unit tests, continuous integration pipelines, or even deployment processes using AI. AI-Enhanced Collaboration: Collaboration with AI systems, or "AI copilots," will be a crucial skill. The future of software engineering will involve not just individual developers using AI tools but entire teams working alongside AI agents that facilitate communication, project management, and code integration. Actionable Step: Learn to delegate tasks to AI copilots and explore collaborative platforms that integrate AI to streamline team efforts. Tools like Microsoft Teams and Github Copilot integrated with AI assistants are a good start. On-device AI and Edge Computing: The rise of NPUs and mini-LLMs signals a shift towards on-device AI processing. This opens opportunities for real-time AI applications in areas with limited connectivity or stringent privacy requirements. Software engineers should explore how to optimize and deploy AI models on edge devices. Actionable Step: Experiment with deploying AI models on edge devices using frameworks like ONNX and test how well they perform on NPUs or embedded systems. To stay competitive and relevant, software engineers need to continuously adapt by learning new AI technologies, refining their workflows with AI assistance, and staying attuned to emerging ethical challenges. Whether by mastering AI automation, optimizing edge deployments, or championing ethical practices, the future belongs to those who embrace AI as both a powerful tool and a collaborative partner. For software engineers ready to dive deeper into the transformative world of machine learning and generative AI, Machine Learning Infrastructure and Best Practices for Software Engineers offers a comprehensive guide packed with practical insights, best practices, and hands-on techniques.ConclusionAs generative AI technologies continue to advance, software engineers are at the forefront of a new era of intelligent and automated development. By understanding and implementing best practices, engineers can leverage these tools to streamline workflows, enhance collaborative capabilities, and push the boundaries of what is possible in software development. Emerging hardware solutions like NPUs, edge computing capabilities, and advanced frameworks are opening new pathways for deploying efficient AI solutions. To remain competitive and innovative, software engineers must adapt to these evolving technologies, integrating AI-driven automation and collaboration into their practices and embracing the future with curiosity and responsibility. This journey not only enhances technical skills but also invites engineers to become leaders in shaping the responsible and creative applications of AI in software engineering.Author BioMiroslaw Staron is a professor of Applied IT at the University of Gothenburg in Sweden with a focus on empirical software engineering, measurement, and machine learning. He is currently editor-in-chief of Information and Software Technology and co-editor of the regular Practitioner’s Digest column of IEEE Software. He has authored books on automotive software architectures, software measurement, and action research. He also leads several projects in AI for software engineering and leads an AI and digitalization theme at Software Center. He has written over 200 journal and conference articles.

0
0
1301

Author Posts - Generative AI

How to Face a Critical RAG-driven Generative AI Challenge

Mastering Machine Learning: Best Practices and the Future of Generative AI for Software Engineers

Trending Topics