Getting Started with Microsoft Guidance

  • 8 min read
  • 28 Feb 2024

Introduction

The emergence of large language models is a watershed moment in the field of artificial intelligence (AI) and natural language processing (NLP). Because of their extraordinary capacity to write human-like text and perform a range of language-related tasks, these models, which are based on deep learning techniques, have attracted considerable interest and adoption. The field has seen significant scientific developments in recent years, and researchers all over the world have been developing better and more domain-specific LLMs to meet the needs of various use cases.

Large Language Models (LLMs) such as GPT-3 and its descendants, like any technology, have downsides and limits, and using them properly, ethically, and to their full capacity requires understanding those limits. Very large models such as GPT-4 can follow the majority of instructions. Smaller language models (such as GPT-2, LLaMa, and its derivatives), however, frequently fail to follow instructions adequately, particularly the part of an instruction that asks for output in a specific structure. This becomes a bottleneck when constructing a pipeline in which the output of an LLM is fed to other downstream functions.

Introducing Guidance - an effective and efficient means of controlling modern language models compared to conventional prompting methods. It supports both open (LLaMa, GPT-2, Alpaca, and so on) and closed LLMs (ChatGPT, GPT-4, and so on). It can be considered as a part of a larger ecosystem of tools for expanding the capabilities of language models.

Guidance uses a templating syntax based on Handlebars, a templating language. Handlebars lets us build semantic templates effectively by compiling them into JavaScript functions, which makes its execution faster than that of most other templating engines. Guidance also integrates well with Jsonformer, a bulletproof way to generate structured JSON from language models. Here’s a detailed notebook on the same. Also, if you use OpenAI models through Azure AI, Guidance has you covered - notebook.

Let’s move on to some of the outstanding features that Guidance offers. Feel free to check out the entire list of features.

Features

1. Guidance Acceleration - This feature significantly improves inference performance by keeping a session state with the LLM during inference and reusing the Key/Value caches as we proceed through the prompt. Benchmarking revealed a 50% reduction in runtime compared to standard prompting approaches. Here’s the link to one of the benchmarking exercises. The image below shows an example of generating a character profile for an RPG game in JSON format. The green highlights are the tokens generated by the model, whereas the blue and unhighlighted parts are copied as-is from the input prompt, unlike the traditional method, which tries to generate every bit of it.

[Figure: an RPG character profile generated in JSON format, with model-generated tokens highlighted in green. Source]

Note: As of now, the Guidance Acceleration feature is implemented for open LLMs. We can expect to see it working with closed LLMs soon as well.
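
To make the copied-versus-generated distinction concrete, here is a minimal Python sketch that splits a Handlebars-style template into fixed text and generation slots. The `SLOT` regex, the `split_template` helper, and the sample template are illustrative assumptions and not part of Guidance. The sketch does not implement Key/Value caching; it only shows which parts of a prompt a model would actually have to generate.

```python
import re

# Hypothetical matcher for simple slots such as {{gen 'name' stop=','}}.
# It assumes the slot arguments contain no closing brace.
SLOT = re.compile(r"\{\{gen\s+'(\w+)'[^}]*\}\}")

# Hypothetical template: JSON scaffolding with two generation slots.
TEMPLATE = """{
  "id": "{{gen 'id' stop=','}}",
  "name": "{{gen 'name' stop=','}}"
}"""


def split_template(template):
    """Split a template into ('text', str) and ('gen', slot_name) segments."""
    segments = []
    pos = 0
    for match in SLOT.finditer(template):
        if match.start() > pos:
            # Fixed text between slots is copied, never generated.
            segments.append(("text", template[pos:match.start()]))
        segments.append(("gen", match.group(1)))
        pos = match.end()
    if pos < len(template):
        segments.append(("text", template[pos:]))
    return segments


segments = split_template(TEMPLATE)
copied = sum(len(value) for kind, value in segments if kind == "text")
slots = [value for kind, value in segments if kind == "gen"]
print(f"{copied} characters copied verbatim; slots to generate: {slots}")
```

Only the two `gen` slots would need model calls here; the surrounding JSON scaffolding is reused verbatim.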

2.  Token Healing - This feature attempts to correct tokenization artifacts that commonly occur at the border between the end of a prompt and the start of a group of generated tokens.

For example, if we ask an LLM to auto-complete a URL with the input below, it is likely to produce the output shown. Apart from the obvious limitation that the URL might not be valid, I'd like to draw your attention to the extra space it inserts after "http:". Such artifacts make it difficult to construct a dependable parsing function and robustly pass its result into subsequent phases.

Input:

“The link is <a href=http:”

Actual Output:

“The link is <a href=http: //www.google.com/search?q”

Expected Output:

“The link is <a href=http://www.google.com/search?q”

This is exactly the class of problems that Token Healing tries to solve, using backtracking. Feel free to check out this Jupyter notebook for more examples.
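
The backtracking idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Guidance's implementation: the vocabulary, the greedy `tokenize` helper, and the hard-coded `TOY_SCORES` that stand in for a model are all hypothetical. The sketch backs up over the last prompt token and only allows next tokens that start with the removed text.

```python
# Toy vocabulary (assumption); a real tokenizer has tens of thousands of entries.
VOCAB = ["The", " link", " is", " <a", " href", "=", "http", ":", "://", " //"]

# Hypothetical stand-in for model preferences over next tokens.
TOY_SCORES = {"://": 0.9, " //": 0.6, ":": 0.1}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        match = max((t for t in vocab if text.startswith(t, i)), key=len, default=None)
        if match is None:
            raise ValueError(f"cannot tokenize at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens


def heal(prompt):
    """Drop the last prompt token and list the tokens that may replace it."""
    tokens = tokenize(prompt)
    removed = tokens[-1]
    trimmed = "".join(tokens[:-1])
    # Any replacement must begin with the text we removed.
    allowed = [t for t in VOCAB if t.startswith(removed)]
    return trimmed, removed, allowed


def pick_next(allowed):
    """Choose the highest-scoring allowed token (toy model)."""
    return max(allowed, key=lambda t: TOY_SCORES.get(t, 0.0))


trimmed, removed, allowed = heal("The link is <a href=http:")
print(trimmed + pick_next(allowed))
```

Without healing, the prompt already ends in the token ":" and the toy model's best continuation is " //", which reproduces the stray space. After backing up, the single token "://" becomes available and the space disappears.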

3. Guaranteed Output Structure - Large language models are fantastic at producing useful outputs, but not so much at producing outputs in a specified format (especially open-source ones like LLaMa, GPT-2, and so on). This is frequently an issue when we want to use the output of a language model as input to another system. With its Handlebars-style templates, Guidance guarantees that the output format is the same as what was asked for.
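
The principle behind a guaranteed structure can be illustrated with a small Python sketch: the program, not the model, writes the keys and punctuation, and each value is accepted only if it fully matches its pattern. Note the swap in technique: this sketch uses simple rejection sampling over a hypothetical `CANDIDATES` table that stands in for a model, whereas Guidance constrains generation itself. The field names and patterns are borrowed from the product description example later in this article.

```python
import json
import random
import re

# Field names and the patterns their values must match.
FIELDS = [
    ("prod_id", r"[0-9]{5}"),
    ("prod_name", r"p_[A-Za-z]+"),
    ("prod_price", r"([1-9]|1[0-6])\$"),
]

# Hypothetical stand-in for a language model: some proposals are invalid.
CANDIDATES = {
    "prod_id": ["12", "abcde", "11231"],
    "prod_name": ["pizzas", "p_pizzas"],
    "prod_price": ["20$", "eleven", "11$"],
}


def propose(name, rng):
    """Return one candidate value for a field (toy model call)."""
    return rng.choice(CANDIDATES[name])


def fill(fields, rng, max_tries=100):
    """Build JSON whose values are only ever pattern-valid candidates."""
    values = {}
    for name, pattern in fields:
        for _ in range(max_tries):
            candidate = propose(name, rng)
            if re.fullmatch(pattern, candidate):
                values[name] = candidate
                break
        else:
            raise RuntimeError(f"no valid value found for {name}")
    # The JSON structure itself is produced by the program.
    return json.dumps(values)


result = fill(FIELDS, random.Random(0))
print(result)
```

Because invalid proposals are never written into the result, whatever reaches the downstream system always has the requested shape.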

Let’s now see Guidance in action -

Installation

Installing Guidance is a breeze; just use pip:

$ pip install guidance

Assume we are now creating a product description for an e-commerce website. Here's how traditional generation compares to Guidance generation. Feel free to play with this Colab notebook, which contains both of the examples below.

Traditional Generation

Input:

Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently.
{ prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>,
prod_price: <should be an integer between 1 and 16. Should end with
suffix '$'> }
The product description is

Output:

Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently. {   prod_id: <numeric value of 5 digits>,     prod_name: <name starts with the prefix 'p_'>, prod_price: <should be an integer between 1 and
16. Should end with suffix '$'> } The product description is { resentprod_id:
<numeric value of 5 digits>, resentprod_name: <name begins with the prefix 'p_'>, resentprod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } In the above example, the product description has 5 constraint fields and 5 attribute fields. The constraints are as follows: resentprod_id: - value of 5 digits, resentprod_name: - name of the product, resentprod_price: - price of the product, resentprod_price_suffix: - suffix of the product price, resentprod_id: - the product id, resentpro diabetic_id: value of 4 digits, resentprod_ astronomer_id: - value of 4 digits, resentprod_ star_id: - value of 4 digits, resentprod_is_generic: - if the product is generic and not the generic type, resentprod_type: - the type of the product, resentprod_is_generic_type

Here’s the code for the above example with the GPT-2 language model -

```
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the GPT-2 Large tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")

# Input is a string holding the prompt shown under "Input:" above
inputs = tokenizer(Input, return_tensors="pt")
tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
)

# Decode the sampled tokens; this is the text shown under "Output:" above
Output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(Output)
```

Guidance Generation

Input w/ code:

import guidance

# load GPT-2 Large through the Transformers backend
guidance.llm = guidance.llms.Transformers("gpt2-large")

# define the prompt (raw string so the regex escapes \b and \$ reach Guidance unchanged)
program = guidance(r"""Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON.
The following is the format
```json {
"prod_id": "{{gen 'id' pattern='[0-9]{5}' stop=','}}",
"prod_name": "{{gen 'name' pattern='p_[A-Za-z]+' stop=','}}",
"prod_price": "{{gen 'price' pattern='\b([1-9]|1[0-6])\b\$' stop=','}}"
}```""")

# execute the prompt
Output = program()

Output:

Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of a fixed set of fields to be filled in the JSON. The following is the format

```json {
"prod_id": "11231",
"prod_name": "p_pizzas",
"prod_price": "11$"
}```

As seen in the preceding examples, with Guidance we can be certain that the output format will be followed within the given constraints, no matter how many times we execute the identical prompt. This capability makes it an excellent choice for constructing any dependable and robust multi-step LLM pipeline.
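
As an illustration of the downstream side of such a pipeline, the following Python sketch parses a generated product description and validates each field before handing it on. The `parse_description` helper, the `FENCE` constant, and the two sample strings are assumptions made for this example (the samples are abbreviated from the outputs shown above); only the standard library is used.

```python
import json
import re

# Three backticks, built at runtime so this example contains no literal fence.
FENCE = "`" * 3

# Same constraints as in the prompt above.
PATTERNS = {
    "prod_id": r"[0-9]{5}",
    "prod_name": r"p_[A-Za-z]+",
    "prod_price": r"([1-9]|1[0-6])\$",
}

# Captures the JSON object inside a json-tagged fenced block.
BLOCK = re.compile(re.escape(FENCE) + r"json\s*(\{.*?\})\s*" + re.escape(FENCE), re.DOTALL)


def parse_description(text):
    """Return the validated product dict, or None if the text is unusable."""
    match = BLOCK.search(text)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(PATTERNS):
        return None
    for name, pattern in PATTERNS.items():
        value = data[name]
        if not isinstance(value, str) or not re.fullmatch(pattern, value):
            return None
    return data


guided = (
    "The following is the format\n"
    + FENCE
    + 'json {\n"prod_id": "11231",\n"prod_name": "p_pizzas",\n"prod_price": "11$"\n}'
    + FENCE
)
unguided = "The product description is { resentprod_id: <numeric value of 5 digits> }"

print(parse_description(guided))
print(parse_description(unguided))
```

The constrained output parses cleanly, while the free-form output is rejected, which is the practical difference for any step that consumes the model's result.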

I hope this overview of Guidance has helped you see the value it can bring to your daily prompt development cycle. Also, here’s a consolidated notebook showcasing all the features of Guidance; feel free to check it out.

Author Bio

Prakhar has a Master’s in Data Science and over 4 years of industry experience across sectors such as Retail, Healthcare, and Consumer Analytics. His research interests include Natural Language Understanding and Generation, and he has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn.