In the world of natural language processing, Large Language Models (LLMs) have proven to be highly successful at a variety of language-based tasks, such as machine translation, text summarization, question answering, reasoning, and code generation. LLMs like ChatGPT, GPT-4, and others have demonstrated outstanding performance by predicting the next token in a sequence based on input prompts. Users interact with these models by providing language instructions or examples to perform various downstream tasks. However, to achieve optimal results or adapt LLMs for specific tasks, complex and task-specific programs must be implemented, often requiring ad-hoc interactions and deep knowledge of the model's internals.
In this article, we discuss LMQL, a framework for Language Model Programming (LMP) that lets users specify complex interactions, control flow, and constraints through a declarative programming language similar to SQL, without needing deep knowledge of the LLM's internals. LMQL supports high-level, logical constraints, and users can express a wide range of prompting techniques concisely. This reduces the need for ad-hoc interactions and manual work to steer model generation, avoids costly re-querying, and guides the text generation process according to the user's specific criteria. Let's start.
Language models (LMs) operate on sequences of tokens, discrete elements that represent words or sub-words in a text. A tokenizer maps input words to tokens, and the language model then predicts the probabilities of possible next tokens based on the input sequence. Various decoding methods are used to turn these next-token predictions into an output sequence, among which we can name:
- Argmax (greedy) decoding, which always picks the single most likely next token.
- Sampling, which draws the next token at random from the model's predicted distribution.
- Beam search, which keeps the most likely partial sequences at each step and returns the best complete one.
It is also important to note that, for beam search and sampling, there is a parameter called temperature that can be used to control the diversity of the output.
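To make these ideas concrete, here is a minimal sketch of the three decoding methods using the Hugging Face transformers library with GPT-2; the library, model, and parameter values are our own illustrative choices, not something prescribed by LMQL.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The weather today is", return_tensors="pt")

# Argmax (greedy): always pick the single most likely next token
greedy = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Sampling: draw from the next-token distribution; temperature controls diversity
sampled = model.generate(**inputs, max_new_tokens=20, do_sample=True, temperature=0.8)

# Beam search: keep the most likely partial sequences at each step
beam = model.generate(**inputs, max_new_tokens=20, num_beams=5)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))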
These techniques enable LMs to be versatile and perform a wide range of tasks without requiring task-specific training, making them powerful multi-task reasoners.
While LLMs can be prompted with examples or instructions, using them effectively and adapting to new models often demands a deep understanding of their internal workings, along with the use of vendor-specific libraries and implementations. Constrained decoding, which limits text generation to legal words or phrases, can be challenging. Many advanced prompting methods require complex interactions and control flows between the LLM and the user, leading to manual work and restricting the generality of implementations. Additionally, generating complete sequences from LLMs may require multiple calls and become computationally expensive, resulting in high usage costs per query in pay-to-use APIs. In general, the challenges associated with creating proper prompts for LLMs are:
- Interaction: multi-part prompting requires manual back-and-forth between the user and the model.
- Constraints and token representation: restricting the output to valid words, phrases, or formats is difficult to express at the token level.
- Efficiency and cost: repeated model calls for querying and validation are computationally expensive, especially with pay-to-use APIs.
Addressing these challenges, Language Model Programming and constraints offer new optimization opportunities. By defining behavior and limiting the search space, the number of LM invocations can be reduced. In this context, the cost of validation, parsing, and mask generation becomes negligible compared to the significant cost of a single LM call.
So the question arises, how can we overcome the challenges of implementing complex interactions and constraints with LLMs while reducing computational costs and retaining or improving accuracy on downstream tasks?
To address these challenges and enhance language model programming, a team of researchers has introduced LMQL (Language Model Query Language). LMQL is an open-source programming language and platform for LLM interaction that combines prompts, constraints, and scripting. It is designed to elevate the capabilities of LLMs like ChatGPT, GPT-4, and any future models, offering a declarative, SQL-like approach based on Python.
LMQL enables Language Model Programming (LMP), a novel paradigm that extends traditional natural language prompting by allowing lightweight scripting and output constraining. This separation of front-end and back-end interaction allows users to specify complex interactions, control flow, and constraints without needing deep knowledge of the LLM's internals. This approach abstracts away tokenization, implementation, and architecture details, making it more portable and easier to use across different LLMs.
With LMQL, users can express a wide range of prompting techniques concisely, reducing the need for ad-hoc interactions and manual work. The language supports high-level, logical constraints, enabling users to steer model generation and avoid costly re-querying and validation. By guiding the text generation process according to specific criteria, users can achieve the desired output with fewer iterations and improved efficiency.
Moreover, LMQL leverages evaluation semantics to automatically generate token masks for LM decoding based on user-specified constraints. This optimization reduces inference cost by up to 80%, resulting in significant latency reduction and lower computational expenses, particularly beneficial for pay-to-use APIs.
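To illustrate the idea behind token masking (a toy sketch of the general technique, not LMQL's actual implementation), constrained decoding can be pictured as setting the logits of disallowed tokens to negative infinity, so the decoder can only pick tokens that keep the constraint satisfiable:

import torch

def apply_token_mask(logits, allowed_ids):
    # Disallow every token except those in allowed_ids by masking with -inf
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_ids] = 0.0
    return logits + mask

vocab_logits = torch.randn(50257)   # hypothetical next-token scores over the vocabulary
allowed = [345, 262, 11]            # token ids permitted by the constraint (illustrative)
masked = apply_token_mask(vocab_logits, allowed)
next_token = int(torch.argmax(masked))
assert next_token in allowed        # the constraint holds by construction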
LMQL addresses certain challenges in LM interaction and usage, namely:
- Interaction: multi-part prompting no longer requires manual back-and-forth; control flow and placeholder variables are scripted directly in the query.
- Constraints on the variable parts of the output: these are enforced eagerly during decoding rather than through generate-then-validate loops.
- Generality: tokenization, implementation, and architecture details are abstracted away, so queries port across models.
- Efficiency: constraints shrink the search space, reducing the number of LM invocations and the resulting cost.
The LMQL syntax involves components such as the decoder, the actual query, the model to query, and the constraints. The decoder specifies the decoding procedure, which can include argmax, sample, or beam search. LMQL allows for constraints on the generated text using Python syntax, making it more user-friendly and easily understandable. Additionally, the distribution instruction allows users to augment the returned result with probability distributions, which is useful for tasks like sentiment analysis.
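Putting these pieces together, a full LMQL query has the following shape (a sketch adapted from the examples in the LMQL paper; exact syntax may vary between versions):

argmax                            # decoder clause: argmax, sample, or beam search
    """Review: We had a great stay!
    Q: What is the underlying sentiment of this review?
    A:[ANALYSIS]
    Based on this, the overall sentiment is[CLS]"""
from
    "openai/text-davinci-003"     # the model to query
where
    not "\n" in ANALYSIS          # constraint on a generated variable
distribution
    CLS in [" positive", " neutral", " negative"]   # augment the result with a probability distribution

Note that the distribution clause does not generate CLS freely; it scores each listed alternative, which is exactly what a classification task like sentiment analysis needs.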
LMQL can be utilized in various ways: as a standalone language, in the Playground, or as a Python library, the last of which we will demonstrate now. Integrating LMQL into Python projects allows users to streamline their code and incorporate LMQL queries seamlessly. Let's explore how to use LMQL as a Python library and look at some examples.
To begin, make sure you have LMQL and LangChain installed by running the following command:
!pip install lmql==0.0.6.6 langchain==0.0.225
You can then define and execute LMQL queries within Python using a simple approach. Decorate a Python function with the lmql.query decorator, providing the query code as the function's docstring (a multi-line string). The decorated function is automatically compiled into an LMQL query, and calling it returns the result of that query.
Here's an example code snippet demonstrating this:
import lmql
import aiohttp
import os

os.environ['OPENAI_API_KEY'] = '<your-openai-key>'

@lmql.query
async def hello():
    '''lmql
    argmax
        "Hello[WHO]"
    from
        "openai/text-ada-001"
    where
        len(TOKENS(WHO)) < 10
    '''

print(await hello())
LMQL provides a fully asynchronous API that enables running multiple LMQL queries in parallel. By declaring functions as async with @lmql.query, you can use await to execute the queries concurrently.
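For instance, assuming the hello query defined above, several queries can be awaited concurrently (a minimal sketch):

import asyncio

async def run_parallel():
    # Launch three independent LMQL queries and wait for all of them
    results = await asyncio.gather(hello(), hello(), hello())
    for r in results:
        print(r)

await run_parallel()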
The code below demonstrates how to look up information from Wikipedia and incorporate it into an LMQL prompt dynamically:
async def look_up(term):
    # Look up `term` on Wikipedia and return the first sentence of the first page
    url = f"https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles={term}&origin=*"
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            page = (await response.json())["query"]["pages"]
            return list(page.values())[0]["extract"].split(".")[0]
@lmql.query
async def greet(term):
    '''
    argmax
        """Greet {term} ({await look_up(term)}):
        Hello[WHO]
        """
    from
        "openai/text-davinci-003"
    where
        STOPS_AT(WHO, "\\n")
    '''

print((await greet("Earth"))[0].prompt)
As an alternative to @lmql.query, you can use lmql.query(...) as a function that compiles a provided string of LMQL code into a Python function.
q = lmql.query('argmax "Hello[WHO]" from "openai/text-ada-001" where len(TOKENS(WHO)) < 10')
await q()
LMQL queries can also be easily integrated into LangChain's Chain components, allowing for sequential prompting across multiple queries.
from langchain import LLMChain, PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (ChatPromptTemplate, HumanMessagePromptTemplate)

# Set up the prompt template for the chain
human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="What is a good name for a company that makes {product}?",
        input_variables=["product"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])

# Set up the LM to be used by LangChain
chat = ChatOpenAI(temperature=0.9)
chain = LLMChain(llm=chat, prompt=chat_prompt_template)

# Run the chain
chain.run("colorful socks")
Lastly, by treating LMQL queries as Python functions, you can easily build pipelines by chaining functions together. Furthermore, the guaranteed output format of LMQL queries ensures ease of processing the returned values using data processing libraries like Pandas.
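As a sketch of such a pipeline (our own illustration, not an official LMQL-LangChain integration API), the LangChain chain above can be composed with the greet query defined earlier; note that greet expects a term that resolves on Wikipedia, so this is purely illustrative:

async def name_and_greet(product):
    company = chain.run(product)             # LangChain step: propose a company name
    greeting = await greet(company.strip())  # LMQL step: greet the proposed name
    return company, greeting[0].prompt

await name_and_greet("colorful socks")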
Here's an example of processing the output of an LMQL query with Pandas:
import pandas as pd

@lmql.query
async def generate_dogs(n: int):
    '''lmql
    sample(n=n)
        """Generate a dog with the following characteristics:
        Name:[NAME]
        Age: [AGE]
        Breed:[BREED]
        Quirky Move:[MOVE]
        """
    from
        "openai/text-davinci-003"
    where
        STOPS_BEFORE(NAME, "\\n") and STOPS_BEFORE(BREED, "\\n") and
        STOPS_BEFORE(MOVE, "\\n") and INT(AGE) and len(AGE) < 3
    '''

result = await generate_dogs(8)
df = pd.DataFrame([r.variables for r in result])
df
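Because the INT(AGE) constraint guarantees that every AGE value parses as an integer, numeric analysis on the resulting DataFrame is straightforward (an illustrative follow-up):

df["AGE"] = df["AGE"].astype(int)   # safe thanks to the INT(AGE) constraint
print(df["AGE"].mean())             # average age of the sampled dogs
print(df["BREED"].value_counts())   # how often each breed was generated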
By employing LMQL as a Python library, users can make their code more efficient and structured, allowing for easier integration with other Python libraries and tools.
LMQL introduces an efficient and powerful approach to interact with language models, revolutionizing language model programming. By combining prompts, constraints, and scripting, LMQL offers a user-friendly interface for working with large language models, significantly improving efficiency and accuracy across diverse tasks. Its capabilities allow developers to leverage the full potential of language models without the burden of complex implementations, making language model interaction more accessible and cost-effective.
With LMQL, users can overcome challenges in LM interaction, including manual interactions, constraints on variable parts of the output, and generalization of multi-part prompting. By automating the selection process and applying constraints eagerly during decoding, LMQL reduces the number of LM invocations, resulting in substantial time and cost savings. Moreover, LMQL's declarative, SQL-like approach simplifies the development process and abstracts away tokenization and implementation details, making it more portable and user-friendly.
In conclusion, LMQL represents a promising advancement in the realm of large language models and language model programming. Its efficiency, flexibility, and ease of use open up new possibilities for creating complex interactions and steering model generation without deep knowledge of the model's internals. By embracing LMQL, developers can make the most of language models, unleashing their potential across a wide range of language-based tasks with heightened efficiency and reduced computational costs.
Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst & Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucumán in 2015, founded startups, and later earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.