Pipeline 2: Creating and populating the Deep Lake vector store
The pipeline in this section of Deep_Lake_LlamaIndex_OpenAI_RAG.ipynb builds on the code of Pipeline 2 from Chapter 3. This shows that by building pipelines as components, we can rapidly repurpose and adapt them for other applications. In addition, Activeloop Deep Lake provides built-in default chunking, embedding, and upserting functions, making it seamless to integrate various types of unstructured data, such as the Wikipedia documents we are upserting here.
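A minimal sketch of how such a pipeline can be assembled with the standard LlamaIndex Deep Lake integration is shown below; the dataset path and data directory are illustrative placeholders rather than the notebook's exact values:

# Sketch of Pipeline 2: create and populate a Deep Lake vector store.
# Assumes llama-index, llama-index-vector-stores-deeplake, and deeplake are installed,
# and that OPENAI_API_KEY and ACTIVELOOP_TOKEN are set in the environment.
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.deeplake import DeepLakeVectorStore

vector_store_path = "hub://<your_org>/<your_dataset>"  # hypothetical Deep Lake dataset path
documents = SimpleDirectoryReader("/content/data/").load_data()  # the downloaded Wikipedia .txt files

# Building the index chunks the documents, embeds them (OpenAI embeddings by default
# in LlamaIndex), and upserts the resulting vectors and metadata into the Deep Lake dataset.
vector_store = DeepLakeVectorStore(dataset_path=vector_store_path, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)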
The output of the display_record(record_number) function shows how seamless the process is: it displays the record's ID and metadata, including the file information, the data collected, the text, and the embedded vector (a sketch of such a helper follows the output below):
ID:
['a61734be-fe23-421e-9a8b-db6593c48e08']
Metadata:
file_path: /content/data/24-hour_news_cycle.txt
file_name: 24-hour_news_cycle.txt
file_type: text/plain
file_size: 2763
creation_date: 2024-07-05
last_modified_date: 2024-07...
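The display_record() helper itself is defined in the notebook; the following is only a minimal sketch of how such a function might be written, assuming the Deep Lake Python API and the default tensors (id, metadata, text, and embedding) created by the integration, with a hypothetical dataset path:

import deeplake

def display_record(record_number, dataset_path="hub://<your_org>/<your_dataset>"):
    # Open the existing Deep Lake dataset (the path is a placeholder).
    ds = deeplake.load(dataset_path)
    # Print the ID and metadata of one record, as in the output above.
    print("ID:")
    print(ds.id[record_number].data()["value"])
    print("Metadata:")
    for key, value in ds.metadata[record_number].data()["value"].items():
        print(f"{key}: {value}")
    # Print the chunk text and its embedded vector.
    print("Text:")
    print(ds.text[record_number].data()["value"])
    print("Embedding:")
    print(ds.embedding[record_number].numpy())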