Databricks Dolly for Future AI Adoption

Introduction

Artificial intelligence plays an increasingly important role in helping businesses and organizations process the huge volumes of data the world produces. One of the biggest challenges in AI research is building large language models that can analyze enormous amounts of text. Databricks Dolly is Databricks' open-source contribution to this effort, opening the door to more accessible NLP models and wider adoption of AI technology.

Databricks Dolly for AI

Before we deep dive into Databricks Dolly and its impact on the future of AI adoption, let’s understand the basics of Large Language Models and their current challenges.

Large Language Models & Databricks Dolly

A large language model (LLM) is an artificial intelligence system that understands natural language and produces human-like text. These models are built with deep learning methods: a neural network is trained on a large amount of text. The main objective is to produce meaningful, coherent text from a given prompt or input. LLMs have many uses, including chatbots, language translation, and summarization.

They have gained significant popularity because of the following capabilities:

Text Generation
Language Translation
Classification and Categorization
Conversational AI

Recently, ChatGPT from OpenAI, Google Bard, and Bing have brought large-scale training and fine-tuning of such models into the mainstream. The issue with these hosted LLMs is that user data is stored on external servers, which exposes it to potential unauthorized access and increases the risk of sensitive data being compromised. Additionally, they may produce irrelevant or inaccurate information that could harm users and lead to poor decisions, as well as offensive or discriminatory content.

To overcome these challenges, there is a need for open-source alternatives that improve the accuracy and security of large language models. After carefully examining these user concerns, the Databricks team built Databricks Dolly, an open-source chatbot designed to meet these criteria across a variety of use cases.

Databricks Dolly can produce text by answering questions, summarizing ideas, and following other natural language instructions. The original Dolly was built on an open-source, 6-billion-parameter model from EleutherAI, and Dolly 2.0 is fine-tuned on the databricks-dolly-15k dataset of human-written instructions. Because Dolly is open source and licensed for commercial use, anyone can use it to build interactive applications without paying for API access or sharing their data with third parties. Databricks reports that the original Dolly could be trained for less than $30, which keeps the cost of building a model low. When Dolly generates an answer, the data can be saved in the DBFS root or in another cloud object storage location that we specify. Using Dolly, we can design, build, and customize an LLM without sharing any data.
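As a rough illustration of keeping generated answers in storage that we control, the sketch below writes a piece of text to a file path. The helper name save_answer and the example path are hypothetical. The sketch assumes that, on a Databricks cluster, DBFS is reachable through local file APIs under /dbfs.

```python
from pathlib import Path


def save_answer(answer, path):
    """Write generated text to a file, creating parent directories first."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(answer, encoding="utf-8")
    return target


# On a Databricks cluster a call might look like this (hypothetical path):
# save_answer(answer_text, "/dbfs/tmp/dolly/answer.txt")
```

Outside Databricks the same helper works with any local path, which makes it easy to try before pointing it at DBFS.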

Image 1 - Databricks Dolly Differentiators

Democratizing the magic of Databricks Dolly

With Databricks Dolly, we can handle the following types of tasks:

1. Open- and closed-ended question answering

2. Information extraction from the web

3. Detailed answers based on the input

4. Creative writing

Now, let’s see in detail how we can use Databricks Dolly.

Step 1: Install Required Libraries

Run the command below in a Databricks notebook to install the required packages. Outside a notebook, run the same command from the command line with pip install instead of %pip install.

%pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"

Image 2 - Databricks Dolly Package Installation

As the image shows, running this command in Databricks installs the required packages:

Accelerate: Simplifies running PyTorch training and inference across different hardware setups
Transformers: Collection of pre-trained models for NLP tasks
Torch: Framework for building and training deep learning models
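To confirm that the installation worked, a quick check such as the following can help. It is a minimal sketch using only the Python standard library; the helper name installed_versions is hypothetical, and the package names are the ones from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the install command in this step
REQUIRED_PACKAGES = ("accelerate", "transformers", "torch")


def installed_versions(packages=REQUIRED_PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

If any package is reported as not installed, rerun the install command before moving on.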

Step 2: Input to Databricks Dolly

The next step is to load the model and create the generate_text function, which is used to generate text.

Image 3 - Databricks Dolly - Create Pipeline for remote code execution

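Here, the pipeline function from the Transformers library is used to run NLP tasks such as text generation and sentiment analysis. Setting trust_remote_code to True allows Transformers to download and run the custom pipeline code stored in the model repository, so it should only be enabled for repositories you trust.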
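Since the code in the image is not reproduced in the text, here is a sketch of what creating the pipeline might look like. The model name databricks/dolly-v2-3b, the bfloat16 dtype, and device_map="auto" are assumptions chosen for illustration, and the helper names are hypothetical. Only the pipeline function and the trust_remote_code option come from the article.

```python
def dolly_pipeline_kwargs(model_name="databricks/dolly-v2-3b"):
    """Keyword arguments for transformers.pipeline (model name is an assumption)."""
    return {
        "model": model_name,
        # Lets Transformers run the custom pipeline code shipped in the model repo
        "trust_remote_code": True,
        # Assumption: let Accelerate place the model on the available devices
        "device_map": "auto",
    }


def load_dolly(model_name="databricks/dolly-v2-3b"):
    """Build the text-generation pipeline; downloads the weights on first use."""
    # Imported here so that defining the helper does not require the heavy libraries
    import torch
    from transformers import pipeline

    return pipeline(torch_dtype=torch.bfloat16, **dolly_pipeline_kwargs(model_name))


# generate_text = load_dolly()  # uncomment on a cluster with enough memory
```

Keeping the arguments in a separate function makes it easy to swap in a different model size without touching the loading code.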

Step 3: Pipeline reference to parse the output

Image 4 - Databricks Dolly - Create a Pipeline for remote code execution

The final step is to pass the text input to the generate_text function, which uses the language model to generate the response.
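A minimal sketch of this step is shown below. It assumes that generate_text is the pipeline object created in Step 2 and that it returns a list of dictionaries with a "generated_text" key, as Hugging Face text-generation pipelines commonly do. The helper ask_dolly and the stand-in fake_generate_text are hypothetical names used only for illustration.

```python
def ask_dolly(generate_text, instruction):
    """Send one instruction to the pipeline and return the generated text."""
    results = generate_text(instruction)
    # Assumption: the pipeline returns a list of dicts with a "generated_text" key
    return results[0]["generated_text"]


# With the real pipeline from Step 2 the call would look like:
# print(ask_dolly(generate_text, "Explain the difference between a data lake and a data warehouse."))


# Stand-in with the same output shape, so the helper can be tried without a model
def fake_generate_text(instruction):
    return [{"generated_text": f"(stub answer to: {instruction})"}]


print(ask_dolly(fake_generate_text, "Summarize what Databricks Dolly is."))
```

Passing the pipeline in as an argument keeps the helper easy to test and lets the same code work with any Dolly model size.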

Best Practices for Using Databricks Dolly

Be specific and clear in your instructions to Dolly
Use Databricks Machine Learning to train and deploy Dolly for scalable, faster execution
Use the Hugging Face library and model repository, which offer multiple tutorials and examples

Conclusion

This article described the difficulties organizations face when adopting large language models and how Databricks Dolly can help overcome them. Dolly lets businesses create a customized LLM that meets their specific requirements, with the added benefit that its code is open source. The article also highlighted best practices that help get the most out of the model.

Author Bio:

Sagar Lad is a Cloud Data Solution Architect with a leading organization and has deep expertise in designing and building Enterprise-grade Intelligent Azure Data and Analytics Solutions. He is a published author, content writer, Microsoft Certified Trainer, and C# Corner MVP.

Link - Medium , Amazon , LinkedIn