RAG with Llama
We initialized meta-llama/Llama-2-7b-chat-hf in the Installing the environment section. We must now create a function to configure Llama 2's behavior:
def LLaMA2(prompt):
    sequences = pipeline(
        prompt,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=100,      # Control the output length more granularly
        temperature=0.5,         # Slightly higher for more diversity
        repetition_penalty=2.0,  # Adjust based on experimentation
        truncation=True
    )
    return sequences
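A minimal sketch of how you might call this function, assuming pipeline and tokenizer were created as shown in the Installing the environment section; the prompt below is only a hypothetical example:

# Hypothetical example prompt
prompt = "Summarize the main benefits of retrieval-augmented generation in two sentences."
sequences = LLaMA2(prompt)

# A Hugging Face text-generation pipeline returns a list of dictionaries,
# one per returned sequence; by default, generated_text contains the prompt
# followed by the model's completion.
for seq in sequences:
    print(seq["generated_text"])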
You can tweak each parameter to fit your requirements:
prompt: The input text that the model uses to generate the output. It's the starting point for the model's response.
do_sample: A Boolean value (True or False). When set to True, it enables stochastic sampling, meaning the model will pick tokens randomly based on their probability distribution, allowing for more varied outputs.