LLM | 81 articles | Tech News, Tutorials & Expert Insights

article-image-exploring-token-generation-strategies

28 Aug 2023

8 min read

Exploring Token Generation Strategies

28 Aug 2023

0
0
3152

article-image-text-classification-with-transformers

Saeed Dehqan

28 Aug 2023

9 min read

Text Classification with Transformers

Saeed Dehqan

28 Aug 2023

9 min read

IntroductionThis blog aims to implement binary text classification using a transformer architecture. If you're new to transformers, the "Transformer Building Blocks" blog explains the architecture and its text generation implementation. Beyond text generation and translation, transformers serve classification, sentiment analysis, and speech recognition. The transformer model comprises two parts: an encoder and a decoder. The encoder extracts features, while the decoder processes them. Just as a painter with tree features can draw, describe, visualize, categorize, or write about a tree, transformers encode knowledge (encoder) and apply it (decoder). This dual-part process is pivotal for text classification with transformers, allowing them to excel in diverse tasks like sentiment analysis, illustrating their transformative role in NLP.Deep Dive into Text Classification with TransformersWe train the model on the IMDB dataset. The dataset is ready and there’s no preprocessing needed. The model is vocab-based instead of character-based so that the model can converge faster. I limited the dataset vocabs to the 20000 most frequent vocabs. I also reduced the sequence to 200 so we can train faster. I tried to simplify the model and use torch.nn.MultiheadAttention it instead of writing the Multihead-attention ourselves. It makes the model faster since the nn.MultiheadAttention uses scaled_dot_product_attention under the hood. But if you want to know how MultiheadAttention works you can study the transformer building blocks blog or see the code here.Okay, now, let us add the feature extractor part:class transformer_block(nn.Module): def __init__(self): super(block, self).__init__() self.attention = nn.MultiheadAttention(embeds_size, num_heads, batch_first=True) self.ffn = nn.Sequential( nn.Linear(embeds_size, 4 * embeds_size), nn.LeakyReLU(), nn.Linear(4 * embeds_size, embeds_size), ) self.drop1 = nn.Dropout(drop_prob) self.drop2 = nn.Dropout(drop_prob) self.ln1 = nn.LayerNorm(embeds_size, eps=1e-6) self.ln2 = nn.LayerNorm(embeds_size, eps=1e-6) def forward(self, hidden_state): attn, _ = self.attention(hidden_state, hidden_state, hidden_state, need_weights=False) attn = self.drop1(attn) out = self.ln1(hidden_state + attn) observed = self.ffn(out) observed = self.drop2(observed) return self.ln2(out + observed)● hidden_state: A tensor with a shape (batch_size, block_size, embeds_size) goes to the transformer_block and a tensor with the same shape goes out of it.● self.attention: The transformer block tries to combine the information of tokens so that each token is aware of its neighbors or other tokens in the context. We may call this part the communication part. That’s what the nn.MultiheadAttention does. nn.MultiheadAttention is a ready multihead attention layer that can be faster than implementing it from scratch, just like what we did in the “Transformer Building Blocks” blog. The parameters of nn.MultiheadAttention are as follows: ○ embeds_size: token embedding size ○ num_heads: multihead, as the name suggests, consists of multiple heads and each head works on different parts of token embeddings. Suppose, your input data has shape (B,T,C) = (10, 32, 16). The token embedding size for this data is 16. If we specify the num_heads parameter to 2(divisible by 16), the multi-head splits data into two parts with shape (10, 32, 8). The first head works on the first part and the second head works on the second part. This is because transforming data into different subspaces can help the model to see different aspects of the data. Please note that the num_heads should be divisible by the embedding size so that at the end we can concatenate the split parts. ○ batch_first: True means the first dimension is batch.● Dropout: After the attention layer, the communication between tokens is closed and computations on tokens are done individually. We run a dropout on tokens. Dropout is a method of regularization. Regularization helps the training process to be based on generalization, not memorization. Without regularization, the model tries to memorize the training set and has poor performance on the test set. The dropout method turns off features with a probability of drop_prob.● self.ln1: Layer normalization normalizes embeddings so that they have zero mean and standard deviation one.● Residual connection: hidden_state + attn: Observe that before normalization, we added the input to the output of multihead attention, named residual connection. It has two benefits: ○ It helps the model to have the unchanged embedding information. ○ It helps to prevent gradient vanishing, which is common in deep networks where we stack multiple transformer layers.● self.ffn: After dropout, residual connection, and normalization, we forward data into a simple non-linear neural network to adjust the tokens one by one for better representation.● self.ln2(out + observed): Finally, another dropout, residual connection, and layer normalization.The transformer block is ready. And here is the final piece:class transformer(nn.Module): def __init__(self): super(transformer, self).__init__() self.tok_embs = nn.Embedding(vocab_size, embeds_size) self.pos_embs = nn.Embedding(block_size, embeds_size) self.block = block() self.ln1 = nn.LayerNorm(embeds_size) self.ln2 = nn.LayerNorm(embeds_size) self.classifier_head = nn.Sequential( nn.Linear(embeds_size, embeds_size), nn.LeakyReLU(), nn.Dropout(drop_prob), nn.Linear(embeds_size, embeds_size), nn.LeakyReLU(), nn.Linear(embeds_size, num_classes), nn.Softmax(dim=1), ) print("number of parameters: %.2fM" % (self.num_params()/1e6,)) def num_params(self): n_params = sum(p.numel() for p in self.parameters()) return n_params def forward(self, seq): B,T = seq.shape embedded = self.tok_embs(seq) embedded = embedded + self.pos_embs(torch.arange(T, device=device)) output = self.block(embedded) output = output.mean(dim=1) output = self.classifier_head(output) return output● self.tok_embs: nn.Embedding is like a lookup table that receives a sequence of indices, and returns their corresponding embeddings. These embeddings will receive gradients so that the model can update them to make better predictions.● self.tok_embs: To comprehend a sentence, you not only need words, you also need to have the order of words. Here, we embed positions and add them to the token embeddings. In this way, the model has both words and their order.● self.block: In this model, we only use one transformer block, but you can stack more blocks to get better results.● self.classifier_head: This is where we put the extracted information into action to classify the sequence. We call it the transformer head. It receives a fixed-size vector and classifies the sequence. The softmax as the final activation function returns a probability distribution for each class.● self.tok_embs(seq): Given a sequence of indices (batch_size, block_size), it returns (batch_size, block_size, embeds_size).● self.pos_embs(torch.arange(T, device=device)): Given a sequence of positions, i.e. [0,1,2], it returns embeddings of each position. Then, we add them to the token embeddings.● self.block(embedded): The embedding goes to the transformer block to extract features. Given the embedded shape (batch_size, block_size, embeds_size), the output has the same shape (batch_size, block_size, embeds_size).● output.mean(dim=1): The purpose of using mean is to aggregate the information from the sequence into a compact representation before feeding it into self.classifier_head. It helps in reducing the spatial dimensionality and extracting the most important features from the sequence. Given the input shape (batch_size, block_size, embeds_size), the output shape is (batch_size, embeds_size). So, one fixed-size vector for each batch.● self.classifier_head(output): And here we classify.The final code can be found here. The remaining code consists of downstream tasks such as the training loop, loading the dataset, setting the hyperparameters, and optimizer. I used RMSprop instead of Adam and AdamW. I also used BCEWithLogitsLoss instead of cross-entropy loss. BCE(Binary Cross Entropy) is for binary classification models and it combines sigmoid with cross entropy and it is numerically more stable. I also empirically got better accuracy. After 30 epochs, the final accuracy is ~84%.ConclusionThis exploration of text classification using transformers reveals their revolutionary potential. Beyond text generation, transformers excel in sentiment analysis. The encoder-decoder model, analogous to a painter interpreting tree feature, propels efficient text classification. A streamlined practical approach and the meticulously crafted transformer block enhance the architecture's robustness. Through optimization methods and loss functions, the model is honed, yielding an empirically validated 84% accuracy after 30 epochs. This journey highlights transformers' disruptive impact on reshaping AI-driven language comprehension, fundamentally altering the landscape of Natural Language Processing.Author BioSaeed Dehqan trains language models from scratch. Currently, his work is centered around Language Models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) by using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc.

0
0
2798

article-image-detecting-anomalies-using-llm-sentence-embeddings

Alan Bernardo Palacio

21 Aug 2023

18 min read

Detecting Anomalies Using LLM Sentence Embeddings

Alan Bernardo Palacio

21 Aug 2023

18 min read

0
0
1055

article-image-deploying-llm-models-in-kubernetes-with-kfserving

Alan Bernardo Palacio

21 Aug 2023

14 min read

Deploying LLM Models in Kubernetes with KFServing

Alan Bernardo Palacio

21 Aug 2023

14 min read

0
0
3232

article-image-building-a-containerized-llm-chatbot-application

Alan Bernardo Palacio

21 Aug 2023

19 min read

Building a Containerized LLM Chatbot Application

Alan Bernardo Palacio

21 Aug 2023

19 min read

0
0
1290

article-image-hands-on-tutorial-on-how-to-use-pinecone-with-langchain

Alan Bernardo Palacio

21 Aug 2023

17 min read

Hands-On tutorial on how to use Pinecone with LangChain

Alan Bernardo Palacio

21 Aug 2023

17 min read

A vector database stores high-dimensional vectors and mathematical representations of attributes. Each vector holds dimensions ranging from tens to thousands, enhancing data richness. It operationalizes embedding models, aiding application development with resource management, security, scalability, and query efficiency. Pinecone, a vector database, enables a quick semantic search of vectors. Integrating OpenAI’s LLMs with Pinecone merges deep learning-based embedding generation with efficient storage and retrieval, facilitating real-time recommendation and search systems. Pinecone acts as long-term memory for large language models like OpenAI’s GPT-4.IntroductionThis tutorial will guide you through the process of integrating Pinecone, a high-performance vector database, with LangChain, a framework for building applications powered by large language models (LLMs). Pinecone enables developers to build scalable, real-time recommendation and search systems based on vector similarity search.PrerequisitesBefore you begin this tutorial, you should have the following:A Pinecone accountA LangChain accountA basic understanding of PythonPinecone basicsAs a starter, we will get familiarized with the use of Pinecone by exploring its basic functionalities of it. Remember to get the Pinecone access key.Here is a step-by-step guide on how to set up and use Pinecone, a cloud-native vector database that provides long-term memory for AI applications, especially those involving large language models, generative AI, and semantic search.Initialize Pinecone clientWe will use the Pinecone client, so this step is only necessary if you don’t have it installed already.pip install pinecone-clientTo use Pinecone, you must have an API key. You can find your API key in the Pinecone console under the "API Keys" section. Note both your API key and your environment. To verify that your Pinecone API key works, use the following command:import pinecone pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")If you don't receive an error message, then your API key is valid. This will also initialize the Pinecone session.Creating and retrieving indexesThe commands below create an index named "quickstart" that performs an approximate nearest-neighbor search using the Euclidean distance metric for 8-dimensional vectors.pinecone.create_index("quickstart", dimension=8, metric="euclidean")The Index creation takes roughly a minute.Once your index is created, its name appears in the index list. Use the following command to return a list of your indexes.pinecone.list_indexes()Before you can query your index, you must connect to the index.index = pinecone.Index("quickstart")Now that you have created your index, you can start to insert data into it.Insert the dataTo ingest vectors into your index, use the upsert operation, which inserts a new vector into the index or updates the vector if a vector with the same ID is already present. The following commands upsert 5 8-dimensional vectors into your index.index.upsert([ ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]), ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]), ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]), ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]), ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]) ])You can get statistics about your index, like the dimensions, the usage, and the vector count. To do this, you can use the following command to return statistics about the contents of your index.index.describe_index_stats()This will return a dictionary with information about your index:Now that you have created an index and inserted data into it, we can query the database to retrieve vectors based on their similarity.Query the index and get similar vectorsThe following example queries the index for the three vectors that are most similar to an example 8-dimensional vector using the Euclidean distance metric specified above.index.query( vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], top_k=3, include_values=True )This command will return the first 3 vectors stored in this index that have the lowest Euclidian distance:Once you no longer need the index, use the delete_index operation to delete it.pinecone.delete_index("quickstart")By following these steps, you can set up a Pinecone vector database in just a few minutes. This will help you provide long-term memory for your high-performance AI applications without any infrastructure hassles.Now, let’s take a look at a bit more complex example, in which we embed text data and insert it into Pinecone.Preparing and Processing the DataIn this section, we will create a context for large language models (LLMs) using the OpenAI API. We will walk through the different parts of a Python script, understanding the purpose and function of each code block. The ultimate aim is to transform data into larger chunks of around 500 tokens, ensuring that the dataset is ordered sequentially.SetupFirst, we install the necessary libraries for our script. We're going to use OpenAI for AI models, pandas for data manipulation, and transformers for tokenization.!pip install openai pandas transformersAfter the installations, we import the necessary modules for our script.import pandas as pd import openaiBefore you can interact with OpenAI, you need to provide your API key. Make sure to replace <<YOUR_API_KEY>> with your actual API key.openai.api_key = ('<<YOUR_API_KEY>>')Now we are ready to start processing the data to be embedded and stored in Pinecone.Data transformationWe use pandas to load JSON data files related to different technologies (HuggingFace, PyTorch, TensorFlow, Streamlit). These files seem to contain questions and answers related to their respective topics and are based on the data in the Pinecone documentation. First, we will concatenate these data frames into one for easier manipulation.hf = pd.read_json('data/huggingface-qa.jsonl', lines=True) pt = pd.read_json('data/pytorch-qa.jsonl', lines=True) tf = pd.read_json('data/tensorflow-qa.jsonl', lines=True) sl = pd.read_json('data/streamlit-qa.jsonl', lines=True) df = pd.concat([hf, pt, tf, sl], ignore_index=True) df.head()We can see the data here:Next, we define a function to remove new lines and unnecessary spaces in our text data. The function remove_newlines takes a pandas Series object and performs several replace operations to clean the text.def remove_newlines(serie): serie = serie.str.replace('\\\\n', ' ', regex=False) serie = serie.str.replace('\\\\\\\\n', ' ', regex=False) serie = serie.str.replace(' ',' ', regex=False) serie = serie.str.replace(' ',' ', regex=False) return serieWe transform the text in our dataframe into a single string format combining the 'docs', 'category', 'thread', 'question', and 'context' columns.df['text'] = "Topic: " + df.docs + " - " + df.category + "; Question: " + df.thread + " - " + df.question + "; Answer: " + df.context df['text'] = remove_newlines(df.text)TokenizationWe use the HuggingFace transformers library to tokenize our text. The GPT2 tokenizer is used, and the number of tokens for each text string is stored in a new column 'n_tokens'.from transformers import GPT2TokenizerFast tokenizer = GPT2TokenizerFast.from_pretrained("gpt2") df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))We filter out rows in our data frame where the number of tokens exceeds 2000.df = df[df.n_tokens < 2000]Now we can finally embed the data using the OpenAI API.from openai.embeddings_utils import get_embedding size = 'curie' df['embeddings'] = df.text.apply(lambda x: get_embedding(x, engine=f'text-search-{size}-doc-001')) df.head()We will be using the text-search-curie-doc-001' Open AI engine to create the embeddings, which is very capable, faster, and lower cost than Davinci:So far, we've prepared our data for subsequent processing. In the next parts of the tutorial, we will cover obtaining embeddings from the OpenAI API and using them with the Pinecone vector database.Next, we will initialize the Pinecone index, create text embeddings using the OpenAI API and insert them into Pinecone.Initializing the Index and Uploading Data to PineconeThe second part of the tutorial aims to take the data that was prepared previously and upload them to the Pinecone vector database. This would allow these embeddings to be queried for similarity, providing a means to use contextual information from a larger set of data than what an LLM can handle at once.Checking for Large Text DataThe maximum size limit for metadata in Pinecone is 5KB, so we check if any 'text' field items are larger than this.from sys import getsizeof too_big = [] for text in df['text'].tolist(): if getsizeof(text) > 5000: too_big.append((text, getsizeof(text))) print(f"{len(too_big)} / {len(df)} records are too big")This will filter out the entries whose metadata is larger than the one Pinecone can manage. The next step is to create a unique identifier for the records.There are several records with text data larger than the Pinecone limit, so we assign a unique ID to each record in the DataFrame.df['id'] = [str(i) for i in range(len(df))] df.head()This ID can be used to retrieve the original text later:Now we can start with the initialization of the index in Pinecone and insert the data.Pinecone Initialization and Index CreationNext, Pinecone is initialized with the API key, and an index is created if it doesn't already exist. The name of the index is 'beyond-search-openai', and its dimension matches the length of the embeddings. The metric used for similarity search is cosine.import pinecone pinecone.init( api_key='PINECONE_API_KEY', environment="YOUR_ENV" ) index_name = 'beyond-search-openai' if not index_name in pinecone.list_indexes(): pinecone.create_index( index_name, dimension=len(df['embeddings'].tolist()[0]), metric='cosine' ) index = pinecone.Index(index_name)Now that we have created the index, we can proceed to insert the data. The index will be populated in batches of 32. Relevant metadata (like 'docs', 'category', 'thread', and 'href') is also included with each item. We will use tqdm to create a progress bar for the progress of the insertion.from tqdm.auto import tqdm batch_size = 32 for i in tqdm(range(0, len(df), batch_size)): i_end = min(i+batch_size, len(df)) df_slice = df.iloc[i:i_end] to_upsert = [ ( row['id'], row['embeddings'], { 'docs': row['docs'], 'category': row['category'], 'thread': row['thread'], 'href': row['href'], 'n_tokens': row['n_tokens'] } ) for _, row in df_slice.iterrows() ] index.upsert(vectors=to_upsert)This will insert the records into the database to be used later on in the process:Finally, the ID-to-text mappings are saved into a JSON file. This would allow us to retrieve the original text associated with an ID later on.mappings = {row['id']: row['text'] for _, row in df[['id', 'text']].iterrows()} import json with open('data/mapping.json', 'w') as fp: json.dump(mappings, fp)Now the Pinecone vector database should now be populated and ready for querying. Next, we will use this information to provide context to a question answering LLM.Querying and Answering QuestionsThe final part of the tutorial involves querying the Pinecone vector database with questions, retrieving the most relevant context embeddings, and using OpenAI's API to generate an answer to the question based on the retrieved contexts.OpenAI Embedding GenerationThe OpenAI API is used to create embeddings for the question.from openai.embeddings_utils import get_embedding q_embeddings = get_embedding( 'how to use gradient tape in tensorflow', engine=f'text-search-curie-query-001' )A function create_context is defined to use the OpenAI API to create a query embedding, retrieve the most relevant context embeddings from Pinecone, and append these contexts into a larger string ready for feeding into OpenAI's next generation step.from openai.embeddings_utils import get_embedding def create_context(question, index, max_len=3750, size="curie"): q_embed = get_embedding(question, engine=f'text-search-{size}-query-001') res = index.query(q_embed, top_k=5, include_metadata=True) cur_len = 0 contexts = [] for row in res['matches']: text = mappings[row['id']] cur_len += row['metadata']['n_tokens'] + 4 if cur_len < max_len: contexts.append(text) else: cur_len -= row['metadata']['n_tokens'] + 4 if max_len - cur_len < 200: break return "\\\\n\\\\n###\\\\n\\\\n".join(contexts) We can now use this function to retrieve the context necessary based on a given question, as the question is embedded and the relevant context is retrieved from the Pinecone database:Now we are ready to start passing the context to a question-answering model.Querying and AnsweringWe start by defining the parameters that will take during the query, specifically the model we will be using, the maximum token length and other parameters. We can also define given instructions to the model which will be used to constrain the results we can get..fine_tuned_qa_model="text-davinci-002" instruction=""" Answer the question based on the context below, and if the question can't be answered based on the context, say \\"I don't know\\"\\n\\nContext:\\n{0}\\n\\n---\\n\\nQuestion: {1}\\nAnswer:""" max_len=3550 size="curie" max_tokens=400 stop_sequence=None domains=["huggingface", "tensorflow", "streamlit", "pytorch"]Different instruction formats can be defined. We will start now making some simple questions and seeing what the results look like.question="What is Tensorflow" context = create_context( question, index, max_len=max_len, size=size, ) try: # fine-tuned models requires model parameter, whereas other models require engine parameter model_param = ( {"model": fine_tuned_qa_model} if ":" in fine_tuned_qa_model and fine_tuned_qa_model.split(":")[1].startswith("ft") else {"engine": fine_tuned_qa_model} ) #print(instruction.format(context, question)) response = openai.Completion.create( prompt=instruction.format(context, question), temperature=0, max_tokens=max_tokens, top_p=1, frequency_penalty=0, presence_penalty=0, stop=stop_sequence, **model_param, ) print( response["choices"][0]["text"].strip()) except Exception as e: print(e)We can see that it's giving us the proper results using the context that it's retrieving from Pinecone:We can also inquire about Pytorch:question="What is Pytorch" context = create_context( question, index, max_len=max_len, size=size, ) try: # fine-tuned models requires model parameter, whereas other models require engine parameter model_param = ( {"model": fine_tuned_qa_model} if ":" in fine_tuned_qa_model and fine_tuned_qa_model.split(":")[1].startswith("ft") else {"engine": fine_tuned_qa_model} ) #print(instruction.format(context, question)) response = openai.Completion.create( prompt=instruction.format(context, question), temperature=0, max_tokens=max_tokens, top_p=1, frequency_penalty=0, presence_penalty=0, stop=stop_sequence, **model_param, ) print( response["choices"][0]["text"].strip()) except Exception as e: print(e)The results keep being consistent with the context provided:Now we can try to go beyond the capabilities of the context by pushing the boundaries a bit more.question="Am I allowed to publish model outputs to Twitter, without a human review?" context = create_context( question, index, max_len=max_len, size=size, ) try: # fine-tuned models requires model parameter, whereas other models require engine parameter model_param = ( {"model": fine_tuned_qa_model} if ":" in fine_tuned_qa_model and fine_tuned_qa_model.split(":")[1].startswith("ft") else {"engine": fine_tuned_qa_model} ) #print(instruction.format(context, question)) response = openai.Completion.create( prompt=instruction.format(context, question), temperature=0, max_tokens=max_tokens, top_p=1, frequency_penalty=0, presence_penalty=0, stop=stop_sequence, **model_param, ) print( response["choices"][0]["text"].strip()) except Exception as e: print(e)We can see in the results that the model is working according to the instructions provided as we don’t have any context on Twitter:Lastly, the Pinecone index is deleted to free up resources.pinecone.delete_index(index_name)ConclusionThis tutorial provided a comprehensive guide to harnessing Pinecone, OpenAI's language models, and HuggingFace's library for advanced question-answering. We introduced Pinecone's vector search engine, explored data preparation, embedding generation, and data uploading. Creating a question-answering model using OpenAI's API concluded the process. The tutorial showcased how the synergy of vector search engines, language models, and text processing can revolutionize information retrieval. This holistic approach holds potential for developing AI-powered applications in various domains, from customer service chatbots to research assistants and beyond.Author Bio:Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn

0
0
1940

article-image-how-open-source-language-models-could-reshape-the-tech-industry

Julian Melanson

30 Jun 2023

5 min read

How Open-Source Language Models Could Reshape the Tech Industry

Julian Melanson

30 Jun 2023

5 min read

The world of technology, characterized by an incessant and rapid pace of evolution, is on the cusp of a seismic shift. Historically, the development and control of large language models—a key component in modern artificial intelligence systems—have been dominated by tech industry giants. However, emerging developments show that this might not be the status quo for much longer. The burgeoning field of open-source LLMs presents a potential disruption to the current balance of power in the tech industry, signaling a shift towards a more democratic and inclusive AI landscape.Major tech firms like Microsoft and Google, armed with vast financial resources, have long held the reins of the LLM market. Their position seemed unassailable as recent earnings calls indicated a thriving business built around their AI services. Yet, a leaked internal document from Google has cast a shadow of uncertainty over this seemingly secure stronghold. The central idea gleaned from this document? No company has an unassailable fortress against competition in the realm of LLMs, not even the mighty OpenAI, the organization responsible for the groundbreaking GPT-3.The story of GPT-3 is a pivotal chapter in the annals of AI history. Its 2020 release ignited a spark in the research community, illuminating the tantalizing promise of scale. With 175 billion parameters, GPT-3 showed capabilities that stretched beyond its initial training data. The success of this LLM prompted a surge of interest in the creation of larger, more complex models. This development led to an arms race among AI research labs, producing increasingly massive models such as Gopher, LaMDA, PaLM, and Megatron-Turing.However, this race towards larger LLMs engendered a substantial increase in research and development costs. The staggering financial demands associated with training and running models like GPT-3 created an environment where LLM innovation was essentially confined to the wealthiest entities in tech. With this economic pressure to recoup their considerable investment, these companies began to commercialize their technology, leading to the erection of protective "moats" around their products. These mechanisms of defensibility safeguarded their investments against the competition, obscuring their research and constraining the sharing of intellectual resources.Key elements of these moats included the proprietary control over training data, model weights, and the costs associated with training and inference. With their deep pockets, big tech companies kept the upper hand in managing the expenses tied to training and running large LLMs. This dominance rendered even open-source alternatives such as BLOOM and OPT175-B largely inaccessible to organizations without the fiscal means to support the hefty demands of these advanced models.The Coming of Open-Source Language ModelsFor a time, this state of affairs painted a bleak picture for the democratization of LLMs, with the field becoming increasingly exclusive and secretive. However, the ebb and flow of innovation and competition that define the tech industry were bound to respond. The open-source community rose to the challenge, their endeavors intensifying following the release of OpenAI's ChatGPT, an instruction-following language model that illustrated the vast potential of LLMs in a multitude of applications.These open-source alternatives are changing the game by proving that performance is not solely a function of scale. Small, nimble LLMs trained on expansive datasets have proven the ability to compete head-to-head with their larger counterparts. Moreover, the open-source models, often consisting of 7-13 billion parameters, can be fine-tuned to remarkable degrees on a modest budget and can run on consumer-grade GPUs.One such example, the open-source LLM developed by Meta, known as LLaMA, sparked a wave of similar models like Alpaca and Vicuna. These models, constructed on top of LLaMA, displayed an impressive capability for instruction-following akin to ChatGPT. The subsequent release of Dolly 2.0 by Databricks and Open Assistant further enriched the field by providing commercially usable, instruction-following LLMs that organizations can tailor to their specific needs.The impact of these open-source models is profound. They potentially democratize access to advanced AI systems, reducing the cost of training by using techniques like low-rank adaptation (LoRA) and allowing businesses to incorporate LLMs into their operations at an affordable price. This development poses a significant challenge to the established order, undermining the monopoly of tech giants on LLMs.Nonetheless, the rise of open-source models does not spell the end of cloud-based language models. Despite the democratization they promise, open-source LLMs face significant hurdles, including the prohibitive costs of pre-training. Furthermore, they may not be the best choice for all businesses. Companies without in-house machine learning expertise may still prefer the convenience of out-of-the-box, serverless solutions provided by the likes of Microsoft and Google. The entrenched distribution channels of these tech behemoths also present a formidable barrier for open-source LLMs to overcome.However, the broader implications of the open-source movement in LLMs are unmistakable. It expands the market, opens up novel applications, and puts pressure on tech giants to offer more competitive pricing. By democratizing access to advanced AI, it allows for broader participation in the AI revolution, reducing the concentration of power and innovation within a few wealthy tech companies. As the LLM landscape continues to evolve rapidly, the rise of open-source models will leave an indelible mark on the tech industry.Author BioJulian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom and our interactive lessons help students reinforce their learning with hands-on activities.No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!

0
0
388

article-image-bloomberggpt-putting-finance-to-work-using-large-language-models

Julian Melanson

28 Jun 2023

7 min read

BloombergGPT: Putting Finance to Work using Large Language Models

Julian Melanson

28 Jun 2023

7 min read

In recent years, the financial industry has experienced a significant surge in the amount and complexity of data. This exponential growth has underscored the need for advanced artificial intelligence models capable of comprehending and processing the specialized language used in finance. Addressing this demand, Bloomberg unveiled BloombergGPT, a revolutionary language model trained on a diverse range of financial data.The Rise of BloombergGPTReleased on March 30th, BloombergGPT represents a groundbreaking development in the financial sector's application of AI technology. By focusing specifically on finance-related tasks, BloombergGPT aims to enhance existing NLP applications employed by Bloomberg, including sentiment analysis, named entity recognition, news classification, and question answering. Furthermore, this sophisticated model holds the promise of unlocking new possibilities for leveraging the vast amounts of data accessible through the Bloomberg Terminal, thereby empowering the firm's customers and fully harnessing the potential of AI in the financial domain.Unleashing the Power of BloombergGPTBloombergGPT boasts two notable capabilities that propel it beyond generic language models. First, it possesses the ability to generate Bloomberg Query Language (BQL), which serves as a query language for accessing and analyzing financial data on the Bloomberg platform. BQL, a powerful and intricate tool, enables various financial tasks such as data searching, analysis, report creation, and insight generation. BloombergGPT's proficiency in transforming natural language queries into valid BQL fosters more intuitive interactions with financial data, streamlining the querying process and enhancing user experience.The second noteworthy feature of BloombergGPT is its capability to provide suggestions for news headlines. This functionality proves invaluable for news applications and aids journalists in constructing compelling and informative newsletters. By inputting paragraphs, BloombergGPT can generate relevant and engaging titles, saving time and enhancing the efficiency of content creation.Training BloombergGPT: A Domain-Specific ApproachTo train BloombergGPT, Bloomberg employed a domain-specific approach, combining their own financial data with augmented online text data. This strategy demonstrates the value of developing language models tailored to specific industries, surpassing the utility of generic models. The training process involved building a dataset of English-language financial documents, incorporating 363 billion financial-specific tokens from Bloomberg's proprietary data assets and an additional 345 billion generic tokens from online text datasets, including The Pile, C4, and Wikipedia.The resulting domain-specific language model, BloombergGPT, comprises an impressive 50 billion parameters and is optimized for financial tasks. Notably, BloombergGPT outperforms popular open-source language models such as GPT-NeoX, OPT, and Bloom in finance-specific tasks. Furthermore, it exhibits remarkable performance in generic language tasks, including summarization, often rivaling the performance of GPT-3 based on Bloomberg's benchmarks.Applications and Advantages:BloombergGPT's introduction opens up a wealth of possibilities for employing language models in the financial technology realm. One such application is sentiment analysis, which enables the assessment of sentiments in articles, particularly those related to individual companies. Automatic entity recognition is another area where BloombergGPT excels, offering the potential for streamlined data extraction and analysis. Additionally, the model is adept at answering financial questions, providing prompt and accurate responses to user inquiries.Bloomberg's news division can leverage BloombergGPT to automatically generate compelling headlines for newsletters, reducing manual effort and improving efficiency. The model's capability to formulate queries in Bloomberg's proprietary query language (BQL) with minimal examples further augments its versatility. Users can interact with BloombergGPT using natural language, specifying their data requirements, and allowing the model to generate the appropriate BQL, expediting data extraction from databases.Shawn Edwards, Bloomberg's Chief Technology Officer, emphasizes the immense value of developing the first language model focused on the financial domain. The domain-specific approach not only allows for the creation of diverse applications but also yields superior performance compared to developing custom models for each specific task. This advantage, coupled with a faster time-to-market, positions BloombergGPT as a game-changer in the finance industry.The Future of BloombergGPT:BloombergGPT's potential extends beyond its current capabilities. As the model continues to train and optimize on financial data, further progress, and advancements are expected. Its application can be broadened to encompass a wider range of financial tasks, ultimately facilitating more accurate and efficient decision-making in the financial industry.BloombergGPT represents a significant milestone in the advancement of financial natural language processing. By addressing the unique language intricacies of the financial industry, this domain-specific language model holds immense potential for revolutionizing how financial data is analyzed, queried, and leveraged. With its impressive 50 billion parameters and exceptional performance in financial NLP tasks, BloombergGPT positions itself as a powerful tool that will shape the future of the finance industry.Use-casesAutomating research tasks: BloombergGPT is being used by researchers at the University of Oxford to automate the task of summarizing large medical datasets. This has allowed the researchers to save a significant amount of time and effort, and it has also allowed them to identify new insights that they would not have been able to find otherwise.Creating content: BloombergGPT is being used by businesses such as Nike and Coca-Cola to create content for their websites and social media channels. This has allowed these businesses to produce high-quality content more quickly and easily, and it has also helped them to reach a wider audience.Improving customer service: BloombergGPT is being used by customer service teams at companies such as Amazon and PayPal to provide customers with more personalized and informative responses. This has helped these companies to improve their customer satisfaction ratings.Generating code: BloombergGPT is being used by developers at companies such as Google and Facebook to generate code for new applications. This has helped these developers to save time and effort, and it has also allowed them to create more complex and sophisticated applications.Translating languages: BloombergGPT is being used by businesses such as Airbnb and Uber to translate their websites and apps into multiple languages. This has helped these businesses to expand into new markets and to reach a wider audience.These are just a few examples of how BloombergGPT is being used in the real world. As it continues to develop, it is likely that even more use cases will be discovered.SummaryIn recent years, the financial industry has faced a surge in data complexity, necessitating advanced artificial intelligence models. BloombergGPT, a language model trained on financial data, represents a groundbreaking development in the application of AI in finance. It aims to enhance Bloomberg's NLP applications, providing improved sentiment analysis, named entity recognition, news classification, and question answering. Notably, BloombergGPT can generate Bloomberg Query Language (BQL) and suggest news headlines, streamlining financial data querying and content creation. By training the model on domain-specific data, BloombergGPT outperforms generic models and offers various applications, including sentiment analysis, entity recognition, and prompt financial question answering. With further advancements expected, BloombergGPT has the potential to revolutionize financial NLP, enabling more accurate decision-making. The model's versatility and superior performance position it as a game-changer in the finance industry, with applications ranging from automating research tasks to improving customer service and code generation.Author BioJulian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom and our interactive lessons help students reinforce their learning with hands-on activities.No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!

0
0
1701

article-image-building-and-deploying-web-app-using-langchain

Avratanu Biswas

26 Jun 2023

12 min read

Building and deploying Web App using LangChain

Avratanu Biswas

26 Jun 2023

12 min read

So far, we've explored the LangChain modules and how to use them (refer to the earlier blog post on LangChain Modules here). In this section, we'll focus on the LangChain Indexes and Agent module and also walk through the process of creating and launching a web application that everyone can access. To make things easier, we'll be using Databutton, an all-in-one online workspace to build and deploy web apps, integrated with Streamlit, a Python web- development framework known for its support in building interactive web applications.What are LangChain Agents?In simpler terms, LangChain Agents are tools that enable Large Language Models (LLMs) to perform various actions, such as accessing Google search, executing Python calculations, or making SQL queries, thereby empowering LLMs to make informed decisions and interact with users by using tools and observing their outputs. The official documentation of LangChain describes Agents as:" …there is an agent which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call… In building agents, there are several abstractions involved. The Agent abstraction contains the application logic, receiving user input and previous steps to return either an AgentAction (tool and input) or AgentFinish (completion information). Agent covers another aspect, called Tools, which represents the actions agents can take, while Toolkits group tools for specific use cases (e.g., SQL querying). Lastly, the Agent Executor manages the iterative execution of the agent with the available tools. Thus, in this section, we will briefly explore such abstractions while using the Agent functionality to integrate tools and primarily focus on building a real-world easily deployable web application.IndexesThis module provides utility functions for structuring documents using indexes and allowing LLMs to interact with them effectively. We will focus on one of the most commonly used retrieval systems, where indexes are used to find the most relevant documents based on a user's query. Additionally, LangChain supports various index and retrieval types, with a focus on vector databases for unstructured data. We will explore this component in detail as it can be leveraged in a wide number of real-world applications.Image 1 Langchain workflow by AuthorWorkflow of a question & answer generation interface using Retrieval index, where we leverage all types of Indexes which LangChain provides. Indexes are primarily of four types, namely : Document Loaders, Text Splitters, VectorStores, and Retrievers. Briefly, (a) the documents fetched from any datasource is split into chunks using text splitter modules (b) Embeddings are created (c)Stored over a vector store index ( vector databases such as chromadb / pinecone / weaviate, etc ) (d) Queries from the user is retrieved via retrieval QA chain We will use the WikipediaLoader to load Wikipedia documents related to our query "LangChain" and retrieve the metadata and a portion of the page content of the first document.from langchain.document_loaders import WikipediaLoader docs = WikipediaLoader(query='LangChain', load_max_docs=2).load() docs[0].metadata docs[0].page_content[:400]CharacterTextSplitter is used to split the loaded documents into smaller chunks for further processing.from langchain.text_splitter import CharacterTextSplitter text_splitter = CharacterTextSplitter(chunk_size=4000, chunk_overlap=0) texts = text_splitter.split_documents(docs)The OpenAIEmbeddings the module is then employed to generate embeddings for the text chunks.from langchain.embeddings import OpenAIEmbeddings embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)We will use Chroma vector store, which is created from the generated text chunks and embeddings, allowing for efficient storage and retrieval of vectorized data.Next, the RetrievalQA module is instantiated with an OpenAI LLM and the created retriever, setting up a question-answering system.from langchain.vectorstores import Chroma db = Chroma.from_documents(texts, embeddings) retriever = db.as_retriever() from langchain.chains import RetrievalQA from langchain.llms import OpenAI Qa = RetrievalQA.from_chain_type(llm=OpenAI(openai_api_key=OPENAI_API_KEY), chain_type="stuff", retriever=retriever)At this stage, we can easily seek answers from the stored indexed data. For instance, query = "What is LangChain?" qa.run(query)LangChain is a framework designed to simplify the creation of applications using large language models (LLMs).query = "When was LangChain founded?" qa.run(query)LangChain was founded in October 2022.query = "When was LangChain founded?" qa.run(query)LangChain was founded in October 2022.query = "Who is the founder?" qa.run(query) The founder of LangChain is Harrison Chase.The Q&A functionality implemented using the retrieval chain provides reasonable answers to most of our queries. Different types of indexes provided by LangChain, can be leveraged for various real-world use cases involving data structuring and retrieval. Moving forward, we will delve into the next section, where we will focus on the final component called the "Agent." During this section, we will not only gain a hands-on understanding of its usage but also build and deploy a web app using an online workspace called Databutton.Building Web App using DatabuttonPrerequisitesTo begin using Databutton, all that is required is to sign up through their official website. Once logged in, we can either create a blank template app from scratch or choose from the pre-existing templates provided by Databutton.Image by Author | Screen grasp showing on how to start working with a new blank appOnce the blank app is created, we generate our online workspace consisting of several features for building and deploying the app. We can immediately begin writing our code within the online editor. The only requirement at this stage is to include the necessary packages or dependencies that our app requires.Image by Author | Screen grasp showing the different components available within the Databutton App's online workspace. Databutton's workspace initialization includes some essential packages by default. However, for our specific app, we need to add two additional packages - openai and langchain. This can be easily accomplished within the "configuration" workspace of Databutton.Image by Author | Screen grasp of the configuration options within Databutton's online workspace. Here we can add the additional packages which we need for working with our app. The workspace is generated with few pre-installed dependencies.Writing the codeNow that we have a basic understanding of Agents and their abstraction methods, let's put them to use, alongside incorporating some basic Streamlit syntax for the front end.Importing the required modules: For building the web app, we will require the Streamlit library and several LangChain modules. Additionally, we will utilise a helper function that relies on the sys and io libraries for capturing and displaying function outputs. We will discuss the significance of this helper function towards the end to better understand its purpose.# Modules to Import import streamlit as st import sys import io import re from typing import Callable, Any from langchain.agents.tools import Tool from langchain.agents import initialize_agent from langchain.llms import OpenAI from langchain.chains import LLMChain from langchain import LLMMathChain from langchain import PromptTemplateUsing the LangChain modules and building the main user interface: We set the title of the app using st.title() syntax and also enables the user to enter their OpenAI API key using the st.text_input() widget.# Set the title of the app st.title("LangChain `Agent` Module Usage Demo App") # Get the OpenAI API key from the user OPENAI_API_KEY = st.text_input( "Enter your OpenAI API Key to get started", type="password" )As we discussed in the previous sections, we need to define a template for the prompt that incorporates a placeholder for the user's query.# Define a template for the prompt template = """You are a friendly and polite AI Chat Assistant. You must try to provide accurate but concise answers. If you don't know the answer, just say "I don't know." Question: {query} Answer: """ # Create a prompt template object with the template prompt = PromptTemplate(template=template, input_variables=["query"])Next, we implement a conditional loop. If the user has provided an OpenAI API key, we proceed with the flow of the app. The user is asked to enter their query using the st.text_input() widget.# Check if the user has entered an OpenAI API key if OPENAI_API_KEY: # Get the user's query query = st.text_input("Ask me anything")Once the user has the correct API keys inserted, from this point onward, we will proceed with the implementation of LangChain modules. Some of these modules may be new to us, while others may have already been covered in our previous sections.Next, we create instances of the OpenAI language model, OpenAI, the LLMMathChain for maths-related queries, and the LLMChain for general-purpose queries.# Check if the user has entered a query if query: # Create an instance of the OpenAI language model llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY) # Create an instance of the LLMMathChain for math-related queries llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True) # Create an instance of the LLMChain for general-purpose queries llm_chain = LLMChain(llm=llm, prompt=prompt)Following that, we create a list of tools that the agent will utilize. Each tool comprises a name, a corresponding function to handle the query, and a brief description.# Create a list of tools for the agent tools = [ Tool( name="Search", func=llm_chain.run, description="Useful for when you need to answer general purpose questions", ), Tool( name="Calculator", func=llm_math_chain.run, description="Useful for when you need to answer questions about math", ), ]Further, we need to initialize a zero-shot agent with the tools and other parameters. This agent employs the ReAct framework to determine which tool to utilize based solely on the description associated with each tool. It is essential to provide a description of each tool.# Initialize the zero-shot agent with the tools and parameters zero_shot_agent = initialize_agent( agent="zero-shot-react-description", tools=tools, llm=llm, verbose=True, max_iterations=3, ) And now finally, we can easily call the zero-shot agent with the user's query using the run(query) method.# st.write(zero_shot_agent.run(query))However, this would only yield the final outcome of the result within our Streamlit UI, without providing access to the underlying LangChain thought process (i.e. the verbose ) that we typically observe in a Notebook environment. This information is crucial to understand which tools our agent is opting for based on the user query. To address this, a helper function called capture_and_display_output was created.# Helper function to dump LangChain Verbose / Though Process # Function to capture and display the output of a function def capture_and_display_output(func: Callable[..., Any], args, **kwargs) -> Any: # Redirect stdout to a string buffer original_stdout = sys.stdout sys.stdout = output_catcher = io.StringIO() # Call the function and capture the response response = func(args, *kwargs) # Restore the original stdout and get the captured output sys.stdout = original_stdout output_text = output_catcher.getvalue() # Clean the output text by removing escape sequences cleaned_text = re.sub(r"\x1b\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]", "", output_text) # Split the cleaned text into lines and concatenate them with line breaks lines = cleaned_text.split("\n") concatenated_string = "\n".join([s if s else "\n" for s in lines]) # Display the captured output in an expander with st.expander("Thoughts", expanded=True): st.write(concatenated_string)This function allows users to monitor the actions undertaken by the agent. Consequently, the response from the agent is displayed within the UI.# Call the zero-shot agent with the user's query and capture the output response = capture_and_display_output(zero_shot_agent.run, query) Image by Author | Screen grasp of the app in local deployment displays the entire verbose or rather the thought process Deploy and Testing of the AppThe app can now be easily deployed by clicking the "Deploy" button on the top left-hand side. The deployed app will provide us with a unique URL that can be shared with everyone!Image by Author | Screen grasp of the Databutton online workspace showing the Deploy options. Yay! We have successfully built and deployed a LangChain-based web app from scratch. Here's the link to the app ! The app also consists of a view code page , which can be accessed via this link.To test the web app, we will employ two different types of prompts. One will be a general question that can be answered by any LLMs, while the other will be a maths-related question. Our hypothesis is that the LangChain agents will intelligently determine which agents to execute and provide the most appropriate response. Let's proceed with the testing to validate our assumption.Image by Author | Screen grasped from the deployed web app.Two different prompts were used to validate our assumptions. Based on the thought process ( displayed in the UI under the thoughts expander ), we can easily interpret which Tool has been chosen by the Agent. (Left) Usage of LLMMath chain incorporating Tool (Right) Usage of a simple LLM Chain incorporating Tool.ConclusionTo summarise, we have not only explored various aspects of working with LangChain and LLMs but have also successfully built and deployed a web app powered by LangChain. This demonstrates the versatility and capabilities of LangChain in enabling the development of powerful applications.ReferencesLangChain Agents official documentation : https://python.langchain.com/en/latest/modules/agents.htmlDatabutton : https://www.databutton.io/Streamlit : https://streamlit.io/ Build a Personal Search Engine Web App using Open AI Text Embeddings : https://medium.com/@avra42/build-a-personal-search-engine-web-app-using-open-ai-text-embeddings-d6541f32892dPart 1: Using LangChain for Large Language Model — powered Applications: https://www.packtpub.com/article-hub/using-langchain-for-large-language-model-powered-applicationsDeployed Web app - https://databutton.com/v/23ks6sem Source code for the app - https://databutton.com/v/23ks6sem/View_CodeAuthor BioAvratanu Biswas, Ph.D. Student ( Biophysics ), Educator, and Content Creator, (Data Science, ML & AI ).Twitter YouTube Medium GitHub

0
0
807

article-image-how-to-work-with-langchain-python-modules

Avratanu Biswas

22 Jun 2023

13 min read

How to work with LangChain Python modules

Avratanu Biswas

22 Jun 2023

13 min read

This article is the second part of a series of articles, please refer to Part 1 for learning how to Get to grips with LangChain framework and how to utilize it for building LLM-powered AppsIntroductionIn this section, we dive into the practical usage of LangChain modules. Building upon the previous overview of LangChain components, we will work within a Python environment to gain hands-on coding experience. However, it is important to note that this overview is not a substitute for the official documentation, and it is recommended to refer to the documentation for a more comprehensive understanding.Choosing the Right Python EnvironmentWhen working with Python, Jupyter Notebook and Google Colab are popular choices for quickly getting started in the Python environment. Additionally, Visual Studio Code (VSCode) Atom, PyCharm, or Sublime Text integrated with a conda environment are also excellent options. While many of these can be used, Google Colab is used here for its convenience in quick testing and code sharing. Find the code link here.PrerequisitesBefore we begin, make sure to install the necessary Python libraries. Use the pip command within a notebook cell to install them.Installing LangChain: In order to install the "LangChain" library, which is essential for this section, you can conveniently use the following command:!pip install langchainRegular Updates: Personally, I would recommend taking advantage of LangChain’s frequent releases by frequently upgrading the packages. Use the following command for this purpose:!pip install langchain - - upgradeIntegrating LangChain with LLMs: Previously, we discussed how the LangChain library facilitates interaction with Large Language Models (LLMs) provided by platforms such as OpenAI, Cohere, or HuggingFace. To integrate LangChain with these models, we need to follow these steps:Obtain API Keys: In this tutorial, we will use OpenAI. We need to sign up; to easily access the API keys for the various endpoints which Open AI provides. The key must be confidential. You can obtain the API via this link.Install Python Package: Install the required Python package associated with your chosen LLM provider. For OpenAI language models, execute the command:!pip install openaiConfiguring the API Key for OpenAI: To initialize the API key for the OpenAI library, we will use the getpass Python Library. Alternatively, you can set the API key as an environment variable.# Importing the library OPENAI_API_KEY = getpass.getpass() import getpass # In order to double check # print(OPENAI_API_KEY) # not recommendedRunning the above lines of code will create a secure text input widget where we can enter the API key, obtained for accessing OpenAI LLMs endpoints. After hitting enter, the inputted value will be stored as the assigned variable OPENAI_API_KEY, allowing it to be used for subsequent operations throughout our notebook.We will explore different LangChain modules in the section below:Prompt TemplateWe need to import the necessary module, PromptTemplate, from the langchain library. A multi-line string variable named template is created - representing the structure of the prompt and containing placeholders for the context, question, and answer which are the crucial aspects of any prompt template.Image by Author | Key components of a prompt template is shown in the figure. A PromptTemplate the object is instantiated using the template variable. The input_variables parameter is provided with a list containing the variable names used in the template, in this case, only the query.:from langchain import PromptTemplate template = """ You are a Scientific Chat Assistant. Your job is to answer scientific facts and evidence, in a bullet point wise. Context: Scientific evidence is necessary to validate claims, establish credibility, and make informed decisions based on objective and rigorous investigation. Question: {query} Answer: """ prompt = PromptTemplate(template=template, input_variables=["query"])The generated prompt structure can be further utilized to dynamically fill in the question placeholder and obtain responses within the specified template format. Let's print our entire prompt! print(prompt) lc_kwargs={'template': ' You are an Scientific Chat Assistant.\nYour job is to reply scientific facts and evidence in a bullet point wise.\n\nContext: Scientific evidence is necessary to validate claims, establish credibility, \nand make informed decisions based on objective and rigorous investigation.\n\nQuestion: {query}\n\nAnswer: \n', 'input_variables': ['query']} input_variables=['query'] output_parser=None partial_variables={} template=' You are an Scientific Chat Assistant.\nYour job is to reply scientific facts and evidence in a bullet point wise.\n\nContext: Scientific evidence is necessary to validate claims, establish credibility, \nand make informed decisions based on objective and rigorous investigation.\n\nQuestion: {query}\n\nAnswer: \n' template_format='f-string' validate_template=TrueChainsThe LangChain documentation covers various types of LLM chains, which can be effectively categorized into two main groups: Generic chains and Utility chains.Image 2: ChainsChains can be broadly classified into Generic Chains and Utility Chains. (a) Generic chains are designed to provide general-purpose language capabilities, such as generating text, answering questions, and engaging in natural language conversations by leveraging LLMs. On the other contrary, (b) Utility Chains: are specialized to perform specific tasks or provide targeted functionalities. These chains are fine-tuned and optimized for specific use cases. Note, although Index-related chains can be classified into a sub-group, here we keep such chains under the banner of utility chains. They are often considered to be very useful while working with Vector databases.Since this is the very first time we are running the LLM chain, we will walk through the code in detail.We need to import the OpenAI LLM module from langchain.llms and the LLMChain module from langchain Python package.Then, an instance of the OpenAI LLM is created, using the arguments such as temperature (affects the randomness of the generated responses), openai_api_key (the API key for OpenAI which we just assigned before), model (the specific OpenAI language model to be used - other models are available here), and streaming. Note the verbose argument is pretty useful to understand the abstraction that LangChain provides under the hood, while executing our query.Next, an instance of LLMChain is created, providing the prompt (the previously defined prompt template) and the LLM (the OpenAI LLM instance).The query or question is defined as the variable query.Finally, the llm_chain.run(query) line executes the LLMChain with the specified query, generating the response based on the defined prompt and the OpenAI LLM:# Importing the OpenAI LLM module from langchain.llms import OpenAI # Importing the LLMChain module from langchain import LLMChain # Creating an instance of the OpenAI LLM llm = OpenAI(temperature=0.9, openai_api_key=OPENAI_API_KEY, model="text-davinci-003", streaming=True) # Creating an instance of the LLMChain with the provided prompt and OpenAI LLM llm_chain = LLMChain(prompt=prompt,llm=llm, verbose=True) # Defining the query or question to be asked query = "What is photosynthesis?" # Running the LLMChain with the specified query print(llm_chain.run(query)) Let's have a look at the response that is generated after running the chain with and without verbose,a) with verbose = True;Prompt after formatting:You are an Scientific Chat Assistant. Your job is to reply scientific facts and evidence in a bullet point wise.Context: Scientific evidence is necessary to validate claims, establish credibility, and make informed decisions based on objective and rigorous investigation. Question: What is photosynthesis?Answer:> Finished chain.• Photosynthesis is the process used by plants, algae and certain bacteria to convert light energy from the sun into chemical energy in the form of sugars.• Photosynthesis occurs in two stages: the light reactions and the Calvin cycle. • During the light reactions, light energy is converted into ATP and NADPH molecules.• During the Calvin cycle, ATP and NADPH molecules are used to convert carbon dioxide into sugar molecules. b ) with verbose = False;• Photosynthesis is a process used by plants and other organisms to convert light energy, normally from the sun, into chemical energy which can later be released to fuel the organisms' activities.• During photosynthesis, light energy is converted into chemical energy and stored in sugars.• Photosynthesis occurs in two stages: light reactions and the Calvin cycle. The light reactions trap light energy and convert it into chemical energy in the form of the energy-storage molecule ATP. The Calvin cycle uses ATP and other molecules to create glucose.Seems like our general-purpose LLMChain has done a pretty decent job and given a reasonable output by leveraging the LLM.Now let's move onto the utility chain and understand it, using a simple code snippet:from langchain import OpenAI from langchain import LLMMathChain llm = OpenAI(temperature=0.9,openai_api_key= OPENAI_API_KEY) # Using the LLMMath Chain / LLM defined in Prompt Template section llm_math = LLMMathChain.from_llm(llm = llm, verbose = True) question = "What is 4 times 5" llm_math.run(question) # You know what the response would be 🎈Here the utility chain serves a specific function, i.e. to solve a fundamental maths question using the LLMMathChain. It's crucial to look at the prompt used under the hood for such chains. However , in addition, a few more notable utility chains are there as well,BashChain: A utility chain designed to execute Bash commands and scripts.SQLDatabaseChain: This utility chain enables interaction with SQL databasesSummarizationChain: The SummarizationChain is designed specifically for text summarization tasks.Such utility chains, along with other available chains in the LangChain framework, provide specialized functionalities and ready-to-use tools that can be utilized to expedite and enhance various aspects of the language processing pipeline.MemoryUntil now, we have seen, each incoming query or input to the LLMs or to its subsequent chain is treated as an independent interaction, meaning it is "stateless" (in simpler terms, information IN, information OUT). This can be considered as one of the major drawbacks, as it hinders the ability to provide a seamless and natural conversational experience for users who are seeking reasonable responses further on. To overcome this limitation and enable better context retention, LangChain offers a broad spectrum of memory components that are extremely helpful.Image by Author | The various types of Memory modules that LangChain provides.By utilizing the memory components supported, it becomes possible to remember the context of the conversation, making it more coherent and intuitive. These memory components allow for the storage and retrieval of information, enabling the LLMs to have a sense of continuity. This means they can refer back to previous relevant contexts, which greatly enhances the conversational experience for users. A typical example of such memory-based interaction is the very popular chatbot - ChatGPT, which remembers the context of our conversations.Let's have a look at how we can leverage such a possibility using LangChain:from langchain.llms import OpenAI from langchain.chains import ConversationChain from langchain.memory import ConversationBufferMemory llm = OpenAI(temperature=0, openai_api_key= OPENAI_API_KEY) conversation = ConversationChain( llm=llm, verbose=True, memory = ConversationBufferMemory() ) In the above code, we have initialized an instance of the ConversationChain class, configuring it with the OpenAI language model, enabling verbose mode for detailed output, and utilizing a ConversationBufferMemory for memory management during conversations. Now, let's begin our conversation,conversation.predict(input="Hi there!I'm Avra") Prompt after formatting:The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.Current conversation:Human: Hi there! I'm AvraAI:> Finished chain.' Hi, Avra! It's nice to meet you. My name is AI. What can I do for you today?Let's add a few more contexts to the chain, so that later we can test the context memory of the chain.conversation.predict(input="I'm interested in soccer and building AI web apps.")Prompt after formatting:The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.Current conversation:Human: Hi there!I'm AvraAI: Hi Avra! It's nice to meet you. My name is AI. What can I do for you today?Human: I'm interested in soccer and building AI web apps.AI:> Finished chain.' That's great! Soccer is a great sport and AI web apps are a great way to explore the possibilities of artificial intelligence. Do you have any specific questions about either of those topics?Now, we make a query, which requires the chain to trace back to its memory storage and provide a reasonable response based on it.conversation.predict(input="Who am I and what's my interest ?")Prompt after formatting:The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know. Current conversation:Human: Hi there!I'm AvraAI: Hi Avra! It's nice to meet you. My name is AI. What can I do for you today?Human: I'm interested in soccer and building AI web apps.AI: That's great! Soccer is a great sport and AI web apps are a great way to explore the possibilities of artificial intelligence. Do you have any specific questions about either of those topics?Human: Who am I and what's my interest ?AI:> Finished chain.' That's a difficult question to answer. I don't have enough information to answer that question. However, based on what you've told me, it seems like you are Avra and your interests are soccer and building AI web apps.The above response highlights the significance of the ConversationBufferMemory chain in retaining the context of the conversation. It would be worthwhile to try out the above example without a buffer memory to get a clear perspective of the importance of the memory module. Additionally, LangChain provides several memory modules that can enhance our understanding of memory management in different ways, to handle conversational contexts.Moving forward, we will delve into the next section, where we will focus on the final two components called the “Indexes” and the "Agent." During this section, we will not only gain a hands-on understanding of its usage but also build and deploy a web app using an online workspace called Databutton.ReferencesLangChain Official Docs - https://python.langchain.com/en/latest/index.htmlCode available for this section here (Google Collab) - https://colab.research.google.com/drive/1_SpAvehzfbYYdDRnhU6v9-KHwIHMC1yj?usp=sharingPart 1: Using LangChain for Large Language Model — powered Applications : https://www.packtpub.com/article-hub/using-langchain-for-large-language-model-powered-applicationsPart 3 : Building and deploying Web App using LangChain <Insert Link>How to build a Chatbot with ChatGPT API and a Conversational Memory in Python: https://medium.com/@avra42/how-to-build-a-chatbot-with-chatgpt-api-and-a-conversational-memory-in-python-8d856cda4542Databutton - https://www.databutton.io/Author BioAvratanu Biswas, Ph.D. Student ( Biophysics ), Educator, and Content Creator, ( Data Science, ML & AI ).Twitter YouTube Medium GitHub

0
0
2037

article-image-democratizing-ai-with-stability-ais-initiative-stablelm

Julian Melanson

22 Jun 2023

6 min read

Democratizing AI with Stability AI’s Initiative, StableLM

Julian Melanson

22 Jun 2023

6 min read

Artificial Intelligence is becoming a cornerstone of modern technology, transforming our work, lives, and communication. However, its development has largely remained in the domain of a handful of tech giants, limiting accessibility for smaller developers or independent researchers. A potential shift in this paradigm is visible in Stability AI's initiative - StableLM, an open-source language model aspiring to democratize AI. Developed by Stability AI, StableLM leverages a colossal dataset, "The Pile," comprising 1.5 trillion tokens of content. It encompasses models with parameters from 3 billion to 175 billion, facilitating diverse research and commercial applications. Furthermore, this open-source language model employs an assortment of datasets from recent models like Alpaca, GPT4All, Dolly, ShareGPT, and HH for fine-tuning.StableLM represents a paradigm shift towards a more inclusive and universally accessible AI technology. In a bid to challenge dominant AI players and foster innovation, Stability AI plans to launch StableChat, a chat model devised to compete with OpenAI's ChatGPT. The democratization of AI isn't a novel endeavor for Stability AI. Their earlier project, Stable Diffusion, an open-source alternative to OpenAI’s DALL-E 2, rejuvenated the generative content market and spurred the conception of new business ideas. This accomplishment set the stage for the launch of StableLM in a market rife with competition.Comparing StableLM with models like ChatGPT and LLama reveals unique advantages. While both ChatGPT and StableLM are designed for natural language processing (NLP) tasks, StableLM emphasizes transparency and accessibility. ChatGPT, developed by OpenAI, boasts a parameter count of 1 trillion, far exceeding StableLM's highest count of 175 billion. Furthermore, using ChatGPT entails costs, unlike the open-source StableLM. On the other hand, LLama, another open-source language model, relies on a different training dataset than StableLM's "The Pile." Regardless of the disparities, all three models present valuable alternatives for AI practitioners.A potential partnership with AWS Bedrock, a platform providing a standard approach to building, training, and deploying machine learning models, could bolster StableLM's utility. Integrating StableLM with AWS Bedrock's infrastructure could allow developers to leverage StableLM's performance and AWS Bedrock's robust tools.Enterprises favor open-source models like StableLM for their transparency, flexibility, and cost-effectiveness. These models promote rapid innovation, offer technology control, and lead to superior performance and targeted results. They are maintained by large developer communities, ensuring regular updates and continual innovation. StableLM demonstrates Stability AI's commitment to democratizing AI, and fostering diversity in the AI market. It brings forth a multitude of options, refined applications, and tools for users. The core of StableLM's value proposition lies in its dedication to transparency, accessibility, and user support.Following the 2022 public release of the Stable Diffusion model, Stability AI continued its mission to democratize AI with the introduction of the StableLM set of models. Trained on an experimental dataset three times larger than "The Pile," StableLM shows excellent performance in conversational and coding tasks, despite having fewer parameters than GPT-3. In addition to this, Stability AI has introduced research models optimized for academic research. These models utilize data from recently released open-source conversational agent datasets such as Alpaca, GPT4All, Dolly, ShareGPT, and HH.StableLM's vision revolves around fostering transparency, accessibility, and supportiveness. By focusing on enhancing AI's effectiveness in real-world tasks rather than chasing superhuman intelligence, Stability AI opens up innovative and practical applications of AI. This approach augments AI's potential to drive innovation, boost productivity, and expand economic prospects.A Guide to Installing StableLMStableLM can be installed using two different methods: one with a text generation web UI and the other with llama.cpp. Both of these methods provide a straightforward process for setting up StableLM on various operating systems including Windows, Linux, and macOS.Installing StableLM with Text Generation Web UIThe installation process with the one-click installer involves a simple three-step procedure that works across Windows, Linux, and macOS. First, download the zip file and extract it. Then double-click on "start". These zip files are provided directly by the web UI's developer. Following this, the model can be downloaded from Hugging Face, completing the installation process.Installing StableLM with llama.cppThe installation procedure with llama.cpp varies slightly between Windows and Linux/macOS. For Windows, start by downloading the latest release and extracting the zip file. Next, create a "models" folder inside the extracted folder. After this, download the model and place it inside the model's folder. Lastly, run the following command, replacing 'path\to' with the actual directory path of your files: 'path\to\main.exe -m models\7B\ggml-model-stablelm-tuned-alpha-7b-q4_0.bin -n 128'.For Linux and macOS, the procedure involves a series of commands run through the Terminal. Start by installing the necessary libraries with the'python3 -m pip install torch numpy sentencepiece'. Next, clone the llama.cpp repository from GitHub with 'git clone https://github.com/ggerganov/llama.cpp' and navigate to the llama.cpp directory with 'cd llama.cpp'. Compile the program with the 'make' command. Finally, download the pre-quantized model, or convert the original following the documentation provided in the llama.cpp GitHub page. To run StableLM, use the command './main -m ./models/7B/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin -n 128'.In sum, StableLM's introduction signifies a considerable leap in democratizing AI. Stability AI is at the forefront of a new AI era characterized by openness, scalability, and transparency, widening AI's economic benefits and making it more inclusive and accessible.SummaryIn this article, we have introduced StabilityLM, a new language model that is specifically designed to be more stable and robust than previous models. We have shown how to install StabilityLM using the Text Generation Web UI, as well as by compiling the llama.cpp code. We have also discussed some of the benefits of using StabilityLM, such as its improved stability and its ability to generate more creative and informative text. StabilityLM can be used for a variety of tasks, including text generation, translation, and summarization.Overall, StabilityLM is a promising new language model that offers a number of advantages over previous models. If you are looking for a language model that is stable, robust, and creative, then StabilityLM is a good option to consider.Author BioJulian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom and our interactive lessons help students reinforce their learning with hands-on activities.No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!

0
0
306

article-image-making-the-best-out-of-hugging-face-hub-using-langchain

Ms. Valentina Alto

17 Jun 2023

6 min read

Making the best out of Hugging Face Hub using LangChain

Ms. Valentina Alto

17 Jun 2023

6 min read

Since the launch of ChatGPT in November 2022, everyone is talking about GPT models and OpenAI. There is no doubt that the Generative Pre-trained Transformers (GPT) architecture developed by OpenAI has demonstrated incredible results, also given the investments in training (almost 500 billion tokens) and complexity of the model (175 billion parameters for the GPT-3).Nevertheless, there is an incredible number of open-source Large Language Models (LLMs) that have been widespread in the last months. Below are some examples:Dolly: 12 billion parameters LLM developed by Databricks and trained on their ML platform. Source codeàhttps://github.com/databrickslabs/dollyStableML: This is a series of LLM developed by StabilityAI, the company behind the popular image generation model Stable Diffusion. The series encompasses a variety of LLMs, some of which are fine-tuned on specific use cases. Source codeàhttps://github.com/Stability-AI/StableLMFalcon LLM: A 40 billion parameters LLM developed by the Technology Innovation Institute and trained on a particularly high-quality dataset called RefinedWeb. Plus, as for now (June 2023) ranks 1 globally in the latest Hugging Face independent verification of open-source AI models. Source codeà https://huggingface.co/tiiuaeGPT NeoX and GPT-J: An open-source reproduction of the OpenAI’s GPT series developed by Eulether AI, with respectively 40 and 6 billion parameters. Source codeà https://huggingface.co/EleutherAI/gpt-neox-20b and https://huggingface.co/EleutherAI/gpt-j-6bOpenLLaMa: As for the previous class of models, also this one is an open-source reproduction of Meta AI’s LLaMA and has 3.7 billion parameters. Source codeàhttps://github.com/openlm-research/open_llamaIf you are interested in getting deeper into those models and their performance, you can reference the Hugging Face leaderboard here.Image1: Hugging Face Leaderboard Now, LLMs are great, yet to unlock their real power we need them to be positioned within an applicative logic. In other words, we want our LLMs to infuse intelligence within our applications.For this purpose, we will be using LangChain, a powerful lightweight SDK which makes it easier to integrate and orchestrate LLMs within applications. LangChain is one of the most popular LLMs orchestrators, yet if you want to explore further packages I encourage you to read about Semantic Kernel and Jarvis.One of the nice things about LangChain is its integration with external tools: those might be OpenAI (and other LLMs vendors), data sources, search APIs, and so on. In this article, we are going to explore how LangChain makes it easier to leverage open-source LLMs by leveraging its integration with the Hugging Face Hub.Welcome to the realm of open source LLMsThe Hugging Face Hub serves as a comprehensive platform comprising more than 120k models, 20kdatasets, and 50k demo apps (Spaces), all of which are openly accessible and shared as open-source projects. It provides an online environment where developers can effortlessly collaborate and collectively develop machine learning solutions. Thanks to LangChain, it is way easier to start interacting with open-source LLMs. Plus, you can also surround those models with all the libraries provided by LangChain in terms of prompt design, memory retention, chain management, and so on.Let’s see an implementation with Python. To reproduce the code, make sure to have: Python 3.7.1 or higheràyou can check your Python version running python --version in your terminalLangChain installedàyou can install it via pip install langchainThe huggingface_hub Python package installedà you can install it via pip install huggingface_butHugging Face Hub API keyàto get the API key, you can register into the portal here and then generate your secret key.For this example, I’m going to use the lightest version of Dolly, developed by Databricks and available in three sizes: 3, 7, and 12 billion parameters.from langchain import HuggingFaceHub from getpass import getpass HUGGINGFACEHUB_API_TOKEN = "your-api-key" import os os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN repo_id = " databricks/dolly-v2-3b" llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={"temperature":0, "max_length":64})As you can see from the above code, the only information we need are our Hugging Face Hub API key and the model’s repo ID; then, LangChain will take care of initializing our model thanks to the direct integration with Hugging Face Hub.Now that we have initialized our model, it is time to define the structure of the prompt:from langchain import PromptTemplate, LLMChain template = """Question: {question} Answer: Let's think step by step.""" prompt = PromptTemplate(template=template, input_variables=["question"]) llm_chain = LLMChain(prompt=prompt, llm=llm)Finally, we can feed our model with a first question:question = "In the first movie of Harry Potter, what is the name of the three-headed dog? “ print(llm_chain.run(question)) Output: The name of the three-headed dog in Harry Potter and the Philosopher Stone is Fuffy.Even though I tested the light version of Dolly with “only” 3 billion parameters, it came with pretty accurate results. Of course, for more complex tasks or real-world projects, heavier models might be taken into consideration, like the one emerging as top performers in the Hugging Face leaderboard mentioned at the beginning of this article.ConclusionThe realm of open-source LLM is growing exponentially, and this creates a vibrant environment of experimentation and tuning from which anyone can benefit. Plus, some interesting trends are rising, like the reduction of the number of models’ parameters in favour of an increase in quality of the training dataset. In fact, we saw that the current top performer among the open source model is Falcon LLM, with “only” 40 billion parameters, which gained its strength from the high-quality training dataset. Finally, with the development of orchestration frameworks like LangChain and similar, it’s getting easier and easier to leverage open source LLMs and integrate them into our applications. Referenceshttps://huggingface.co/docs/hub/indexOpen LLM Leaderboard — a Hugging Face Space by HuggingFaceH4Hugging Face Hub — 🦜🔗 LangChain 0.0.189Overview (huggingface.co)stabilityai (Stability AI) (huggingface.co)Stability-AI/StableLM: StableLM: Stability AI Language Models (github.com)Author BioValentina Alto graduated in 2021 in data science. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022, she has been focusing on data and AI workloads within the manufacturing and pharmaceutical industries. She has been working closely with system integrators on customer projects to deploy cloud architecture with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since commencing her academic journey, she has been writing tech articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.Author of the book: Modern Generative AI with ChatGPT and OpenAI ModelsLink - Medium LinkedIn

0
0
2023

article-image-using-langchain-for-large-language-model-powered-applications

Avratanu Biswas

15 Jun 2023

5 min read

Using LangChain for Large Language Model — Powered Applications

Avratanu Biswas

15 Jun 2023

5 min read

This article is the second part of a series of articles, please refer to Part 2 for learning how to Get to grips with LangChain framework and how to utilize it for building LLM-powered AppsIntroductionLangChain is a powerful and open-source Python library specifically designed to enhance the usability, accessibility, and versatility of Large Language Models (LLMs) such as GPT-3 (Generative Pre-trained Transformer 3), BERT(Bidirectional Encoder Representations from Transformers), BLOOM (BigScience Large Open-science Open-access Multilingual Language Model). It provides developers with a comprehensive set of tools to seamlessly combine multiple prompts, creating a harmonious orchestra for working with LLMs effortlessly. The project was initiated by Harrison Chase, with the first commit made in late October 2022. In just a few weeks, LangChain gained immense popularity within the open-source community. Image 1: The popularity of the LangChain Python libraryLangChain for LLMsTo fully grasp the fundamentals of LangChain and utilize it effectively — understanding the fundamentals of LLMs is essential. In simple terms, LLMs are sophisticated language models or AI systems that have been extensively trained on massive amounts of text data to comprehend and generate human-like language. Albeit their powerful capabilities, LLMs are generic in nature i.e. lacking domain-specific knowledge or expertise. For instance, when addressing queries in fields like medicine or law, while an LLM can provide general insights, it, however, may struggle to offer in-depth or nuanced responses that require specialized expertise. Alongside such limitations, LLMs are susceptible to biases and inaccuracies present in training data which can yield contextually plausible, yet incorrect outputs. This is where LangChain shines — serving as an open-source library that leverages the power of LLMs and mitigates their drawbacks by providing abstractions and a diverse range of modules, akin to Lego blocks, thus facilitating intuitive integration with other tools and knowledge bases.In brief, LangChain presents a useful approach for handling text data, wherein the initial step involves preprocessing of the large corpus by segmenting it into smaller chunks or summaries. These chunks are then transformed into vector representations, enabling efficient comparisons and retrieval of similar chunks when questions are posed. This approach of preprocessing, real-time data collection, and interaction with the LLM is not only applicable to the specific context but can also be effectively utilized in other scenarios like code and semantic search.Image 2 - Typical workflow of Langchain ( Image created by Author)A typical workflow of LangChain involves several steps that enable efficient interaction between the user, the preprocessed text corpus, and the LLM. Notably, the strengths of LangChain lie in its provision of an abstraction layer, streamlining the intricate process of composing and integrating these text components, thereby enhancing overall efficiency and effectiveness.Key Attributes offered by LangChainThe core concept behind LangChain is its ability to connect a “Chain of thoughts” around LLMs, as evident from its name. However, LangChain is not limited to just a few LLMs — it provides a wide range of components that work together as building blocks for advanced use cases involving LLMs. Now, let’s delve into the various components that the LangChain library offers, making our work with LLMs easier and more efficient.Image 3: LangChain features at a glance. (Image created by Author)Prompts and Prompt Templates: Prompts refer to the inputs or queries we send to LLMs. As we have experienced with ChatGPT, the quality of the response depends heavily on the prompt. LangChain provides several functionalities to simplify the construction and handling of prompts. A prompt template consists of multiple parts, including instructions, content, and queries.Models: While LangChain itself does not provide LLMs, it leverages various Language Models (such as GPT3 and BLOOM, discussed earlier), Chat Models (like get-3.5-turbo), and Text Embedding Models (offered by CohereAI, HuggingFace, OpenAI).Chains: Chains are an end-to-end wrapper around multiple individual components, playing a major role in LangChain. The two most common types of chains are LLM chains and vector index chains.Memory: By default, Chains in LangChain are stateless, treating each incoming query or input independently without retaining context (i.e., lacking memory). To overcome this limitation, LangChain assists in both short-term memory (using previous conversational messages or summarised messages) and long-term memory (managing the retrieval and updating of information between conversations).Indexes: Index modules provide various document loaders to connect with different data resources and utility functions to seamlessly integrate with external vector databases like Pinecone, ChromoDB, and Weaviate, enabling smooth handling of large arrays of vector embeddings. The types of vector indexes include Document Loaders, Text Splitters, Retriever, and Vectorstore.Agents: While the sequence of chains is often deterministic, in certain applications, the sequence of calls may not be deterministic, with the next step depending on the user input and previous responses. Agents utilize LLMs to determine the appropriate actions and their orders. Agents perform these tasks using a suite of tools.Limitations on LangChain usageAbstraction challenge for debugging: The comprehensive abstraction provided by LangChain poses challenges for debugging as it becomes difficult to comprehend the underlying processes.Higher token consumption due to prompt coupling: Coupling a chain of prompts when executing multiple chains for a specific task often leads to higher token consumption, making it less cost-effective. Increased latency and slower performance: The latency period experienced when using LangChain in applications with agents or tools is higher, resulting in slower performance.Overall, LangChain provides a broad spectrum of features and modules that greatly enhance our interaction with LLMs. In the subsequent sections, we will explore the practical usage of LangChain and demonstrate how to build simple demo web applications using its capabilities.Referenceshttps://docs.langchain.com/docs/https://github.com/hwchase17/langchain https://medium.com/databutton/getting-started-with-langchain-a-powerful-tool-for-working-with-large-language-models-286419ba0842https://medium.com/@avra42/how-to-build-a-personalized-pdf-chat-bot-with-conversational-memory-965280c160f8AuthorAvratanu Biswas, Ph.D. Student ( Biophysics ), Educator, and Content Creator, ( Data Science, ML & AI ).Twitter YouTube Medium GitHub

0
0
2365

article-image-creating-a-langchain-agent-azure-openai-python-with-the-react-approach

Valentina Alto

11 Jun 2023

17 min read

Creating a LangChain Agent: Azure OpenAI & Python with the ReAct Approach

Valentina Alto

11 Jun 2023

17 min read

0
0
2106

article-image-falcon-llm-the-dark-horse-in-open-source-llm-race

Valentina Alto

07 Jun 2023

6 min read

Falcon LLM: The Dark Horse in Open Source LLM Race

Valentina Alto

07 Jun 2023

6 min read

Discover the ground-breaking capabilities of the Falcon Language Model (LLM) in natural language processing. This article presents an architectural overview of Falcon LLM, highlighting its transformer-based design and distinctive features. Gain practical guidance on leveraging Falcon LLM's power effectively, including fine-tuning techniques and optimization strategies. We also address ethical considerations and responsible AI deployment. Whether you're a researcher, developer, or simply curious about cutting-edge language models, this article provides valuable insights to harness the full potential of Falcon LLM.Foundation models and LLMsWhen we talk about Generative AI models, we are talking about a new generation of deep learning models called Foundation models. Foundation models are pre-trained AI models that can be fine-tuned for specific tasks.Foundational Models In the specific case of ChatGPT and similar models, we talk about Large language models (LLMs), a subset of Foundation models specifically designed for natural language processing tasks. Models like GPT-4 are examples of LLMs that can generate human-like text, answer questions, translate languages, and more.LLMs are characterized by huge training sets and a number of parameters of the network. To make an example, GPT-3 has been trained on almost 500 billion tokens and has 175 billion parameters. However, models with such a high number of parameters are heavy, both in the training phase and inference phase. This also implies a high computational cost, being needed GPU-powered hardware, and a lot of training time. That’s why a new trend has emerged lately, that is the one of building lighter models (with fewer parameters) focusing rather on the quality of the training dataset.Introducing Falcon LLMOne of the latest models of this new trend is Falcon LLM, an open-source model launched by Abu Dhabi’s Technology Innovation Institute (TII) that as of now (June 2023) ranks 1 globally in the latest Hugging Face independent verification of open-source AI models: Open LLM Leaderboard — a Hugging Face Space by HuggingFaceH4Falcon LLM has been trained on 1 trillion tokens and has 40 billion parameters (even though it has also been released a lighter version with 7 billion parameters). So the question might be: how can a model with “only” 40 billion parameters perform so well? In fact, the answer is in the quality of the dataset.Falcon was developed using specialized tools and incorporates a unique data pipeline capable of extracting valuable content from web data. The pipeline was designed to extract high-quality content by employing extensive filtering and deduplication techniques.The resulting dataset, called RefinedWeb, has been released by TII under the Apache-2.0 license and can be found here →https://huggingface.co/datasets/tiiuae/falcon-refinedweb.Plus, the architecture of Falcon was meticulously fine-tuned for optimal performance and efficiency. By combining superior data quality with these optimizations, Falcon achieves remarkable performance while utilizing around 75% of the training compute budget of the GPT-3. Furthermore, it requires only a fifth of the computing resources during inference.A decoder-only (Falcon LLM) architectureFalcon LLM is a decoder-only model, but what does it mean?Source: https://arxiv.org/abs/1706.03762 The Encoder-Decoder architecture was the original transformer architecture introduced in the Attention Is All You Need (https://arxiv.org/abs/1706.03762) paper in 2017. We have the “encoder”, which has the task to represent the input into a lower-dimensional space; on the right-hand side, we have the “decoder”, which has the task to translate back to the original data format the lower-dimensional data provided by the encoder.While the original transformer architecture was made of both the components — encoder and decoder, in last years, AI labs and companies shifted towards a new architecture made of a decoder-only framework. To name one example, the OpenAI’s GPT-3 is made of a decoder-only architecture.The key distinction between the Decoder-only architecture and the Encoder-Decoder architecture lies in the absence of a separate encoder responsible for summarizing the input information. Instead, in the Decoder-only architecture, the decoder’s hidden state implicitly encodes the relevant information and is continually updated at each step of the generation process.How to use Falcon LLMAs it is an open-source model, you can try Falcon LLM directly from the frontend provided on the Hugging Face site:Hugging face frontendPlus, you can download the model using Python:!pip install torch from transformers import AutoTokenizer, AutoModelForCausalLM import transformers import torch #model = "tiiuae/falcon-40b" model = "tiiuae/falcon-7b" tokenizer = AutoTokenizer.from_pretrained(model) pipeline = transformers.pipeline( "text-generation", model=model, tokenizer=tokenizer, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", ) sequences = pipeline( "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:", max_length=200, do_sample=True, top_k=10, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id, ) for seq in sequences: print(f"Result: {seq['generated_text']}")Depending on your hardware capacity, you can decide to use either the 40b or the 7b parameters model. Also, note that the 7b version of the model is trained in English and French only.ConclusionsLLMs are extremely powerful, and they have seen an exponential growth in their number of parameters in the last few years. Nevertheless, we are quickly approaching towards a hard cap that is the computational capacity needed. Henceforth, it is pivotal to start exploring new ways of making LLMs less “large” yet more accurate, as TII is achieving with Falcon LLM. This implies a major focus on the quality of the training set, which massively impacts on the performance of the model.Falcon LLM paper will be released soon, so stay tuned to learn more about this amazing model!Referenceshttps://huggingface.co/datasets/tiiuae/falcon-refinedwebhttps://falconllm.tii.ae/Open LLM Leaderboard — a Hugging Face Space by HuggingFaceH4Author BioValentina Alto graduated in 2021 in data science. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022, she has been focusing on data and AI workloads within the manufacturing and pharmaceutical industries. She has been working closely with system integrators on customer projects to deploy cloud architecture with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since commencing her academic journey, she has been writing tech articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.Valentina is also the author of the book: Modern Generative AI with ChatGPT and OpenAI ModelsLinks - Medium LinkedIn

0
0
1199

How-To Tutorials - LLM

Exploring Token Generation Strategies

Text Classification with Transformers

Detecting Anomalies Using LLM Sentence Embeddings

Deploying LLM Models in Kubernetes with KFServing

Building a Containerized LLM Chatbot Application

Hands-On tutorial on how to use Pinecone with LangChain

How Open-Source Language Models Could Reshape the Tech Industry

BloombergGPT: Putting Finance to Work using Large Language Models

Building and deploying Web App using LangChain

How to work with LangChain Python modules

Trending Topics

Democratizing AI with Stability AI’s Initiative, StableLM

Making the best out of Hugging Face Hub using LangChain

Using LangChain for Large Language Model — Powered Applications

Creating a LangChain Agent: Azure OpenAI & Python with the ReAct Approach

Falcon LLM: The Dark Horse in Open Source LLM Race