Discover the ground-breaking capabilities of the Falcon Large Language Model (LLM) in natural language processing. This article presents an architectural overview of Falcon LLM, highlighting its transformer-based design and distinctive features. It offers practical guidance on using Falcon LLM effectively, including fine-tuning techniques and optimization strategies, and addresses ethical considerations and responsible AI deployment. Whether you're a researcher, a developer, or simply curious about cutting-edge language models, this article provides valuable insights to help you harness the full potential of Falcon LLM.
When we talk about Generative AI models, we are talking about a new generation of deep learning models called Foundation models. Foundation models are pre-trained AI models that can be fine-tuned for specific tasks.
Foundation Models
In the specific case of ChatGPT and similar models, we talk about Large language models (LLMs), a subset of Foundation models specifically designed for natural language processing tasks. Models like GPT-4 are examples of LLMs that can generate human-like text, answer questions, translate languages, and more.
LLMs are characterized by huge training sets and a large number of network parameters. To give an example, GPT-3 was trained on almost 500 billion tokens and has 175 billion parameters. However, models with such a high number of parameters are heavy, in both the training and inference phases, which implies a high computational cost: GPU-powered hardware and a lot of training time are needed. That’s why a new trend has emerged lately: building lighter models (with fewer parameters) that focus instead on the quality of the training dataset.
One of the latest models in this new trend is Falcon LLM, an open-source model launched by Abu Dhabi’s Technology Innovation Institute (TII) that, as of this writing (June 2023), ranks first globally in Hugging Face’s latest independent verification of open-source AI models:
Open LLM Leaderboard, a Hugging Face Space by HuggingFaceH4: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Falcon LLM has been trained on 1 trillion tokens and has 40 billion parameters (a lighter version with 7 billion parameters has also been released). So the question might be: how can a model with “only” 40 billion parameters perform so well? The answer lies in the quality of the dataset.
Falcon was developed using specialized tools and incorporates a unique data pipeline capable of extracting high-quality content from web data, employing extensive filtering and deduplication techniques.
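To give an intuition of what filtering and deduplication mean in practice, here is a minimal, purely illustrative sketch. The function name, the word-count threshold, and the hash-based exact deduplication are my own simplifications; the actual Falcon pipeline uses far more extensive heuristics and fuzzy deduplication:

import hashlib

def clean_corpus(documents, min_words=50):
    """Toy filtering + exact deduplication, loosely in the spirit of
    web-data pipelines (the real RefinedWeb pipeline is far more elaborate)."""
    seen_hashes = set()
    for doc in documents:
        text = doc.strip()
        # Filtering: drop documents that are too short to be useful
        if len(text.split()) < min_words:
            continue
        # Deduplication: drop exact duplicates via a content hash
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        yield text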
The resulting dataset, called RefinedWeb, has been released by TII under the Apache-2.0 license and can be found here: https://huggingface.co/datasets/tiiuae/falcon-refinedweb.
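If you want to inspect the data yourself, you can load it with the Hugging Face datasets library; streaming mode is advisable given its size. A short sketch, assuming datasets is installed (the "content" column name matches the dataset card at the time of writing):

from itertools import islice
from datasets import load_dataset

# Stream RefinedWeb instead of downloading the full multi-terabyte dataset
refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
for sample in islice(refinedweb, 3):
    print(sample["content"][:200])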
Plus, the architecture of Falcon was meticulously tuned for performance and efficiency. By combining superior data quality with these optimizations, Falcon achieves remarkable performance while using around 75% of GPT-3’s training compute budget. Furthermore, it requires only a fifth of GPT-3’s computing resources at inference time.
Falcon LLM is a decoder-only model, but what does that mean?
The transformer architecture. Source: https://arxiv.org/abs/1706.03762
The encoder-decoder architecture was the original transformer architecture, introduced in the 2017 paper Attention Is All You Need (https://arxiv.org/abs/1706.03762). On the left-hand side, we have the encoder, whose task is to represent the input in a lower-dimensional space; on the right-hand side, we have the decoder, whose task is to translate the lower-dimensional data provided by the encoder back into the original data format.
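As a concrete example, a classic encoder-decoder model such as T5 first encodes the full input sequence and then decodes the output from that representation. A minimal sketch using the transformers pipeline, assuming transformers and sentencepiece are installed (t5-small is chosen here purely for illustration):

from transformers import pipeline

# t5-small is an encoder-decoder model: the encoder reads the whole input,
# and the decoder generates the output from the encoded representation
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Attention is all you need.")[0]["translation_text"])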
While the original transformer architecture included both components, the encoder and the decoder, in recent years AI labs and companies have shifted towards a decoder-only framework. To name one example, OpenAI’s GPT-3 uses a decoder-only architecture.
The key distinction between the decoder-only architecture and the encoder-decoder architecture lies in the absence of a separate encoder responsible for summarizing the input information. Instead, in the decoder-only architecture, the decoder’s hidden state implicitly encodes the relevant information and is continually updated at each step of the generation process.
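You can see this step-by-step generation process explicitly by running a decoder-only model one token at a time. A sketch using GPT-2 as a small, freely available decoder-only model, with greedy decoding for simplicity:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The falcon is", return_tensors="pt").input_ids
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits
    # The decoder re-reads everything generated so far and predicts the
    # next token from the last position's logits (greedy choice here)
    next_token = logits[0, -1].argmax()
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))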
As it is an open-source model, you can try Falcon LLM directly from the frontend provided on the Hugging Face site:
The Hugging Face frontend
Plus, you can download the model using Python:
# transformers is needed for the model, accelerate for device_map="auto"
!pip install torch transformers accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# Use the 40-billion-parameter model instead if your hardware allows it
# model = "tiiuae/falcon-40b"
model = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # half-precision weights to save memory
    trust_remote_code=True,      # Falcon ships custom modeling code
    device_map="auto",           # spread the model across available devices
)

sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,              # cap on prompt + generated tokens
    do_sample=True,              # sample instead of greedy decoding
    top_k=10,                    # restrict sampling to the 10 most likely tokens
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
Depending on your hardware capacity, you can decide to use either the 40-billion- or the 7-billion-parameter model. Also, note that the 7b version of the model is trained on English and French only.
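As a rough rule of thumb (my own back-of-envelope estimate, counting only the weights in bfloat16 at 2 bytes per parameter), the 40b model needs on the order of 80 GB of GPU memory, while the 7b model needs around 14 GB. If that is still too much for your hardware, recent versions of transformers can load the weights quantized through the bitsandbytes integration; a sketch, assuming transformers and bitsandbytes are installed:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load Falcon-7B with 4-bit quantized weights (roughly 4x smaller than bfloat16)
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)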
LLMs are extremely powerful, and they have seen exponential growth in their number of parameters in the last few years. Nevertheless, we are quickly approaching a hard cap: the computational capacity needed. Hence, it is pivotal to start exploring new ways of making LLMs less “large” yet more accurate, as TII is doing with Falcon LLM. This implies a major focus on the quality of the training set, which massively impacts the performance of the model.
The Falcon LLM paper will be released soon, so stay tuned to learn more about this amazing model!
Valentina Alto graduated in 2021 in data science. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022, she has been focusing on data and AI workloads within the manufacturing and pharmaceutical industries. She has been working closely with system integrators on customer projects to deploy cloud architecture with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since commencing her academic journey, she has been writing tech articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.
Valentina is also the author of the book Modern Generative AI with ChatGPT and OpenAI Models.