Using the LlamaHub data loaders to ingest content
In addition to the Wikipedia reader discussed in the previous chapter, let’s look at a few more LlamaHub readers that we can use to ingest data. Working through them will give us a better understanding of how data readers work.
Ingesting data from a web page
SimpleWebPageReader can extract text content from web pages.
To use it, we must first install the corresponding integration:
pip install llama-index-readers-web
Once installed, it’s really easy to use:
from llama_index.readers.web import SimpleWebPageReader

urls = ["https://docs.llamaindex.ai"]
documents = SimpleWebPageReader().load_data(urls)
for doc in documents:
    print(doc.text)
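This loads the content of each specified web page into a document and then prints the text of every document. Note that, with its default settings, the reader keeps the raw HTML of the page; passing html_to_text=True to the constructor (which requires the html2text package) converts it to plain text instead.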
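To make concrete what turning a web page into a document involves, here is a minimal sketch that uses only the Python standard library. It is not how SimpleWebPageReader is implemented: the SimpleDocument class and the html_to_document helper are hypothetical names introduced purely for illustration, and the network fetch is left out so that the example runs offline on a hard-coded HTML string.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LlamaIndex Document: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_document(html, url):
    """Hypothetical helper: strip the markup and record the source URL."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return SimpleDocument(text=parser.text(), metadata={"url": url})


sample_html = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><h1>Hello</h1><p>LlamaIndex readers</p></body></html>"
)
doc = html_to_document(sample_html, "https://example.com/demo")
print(doc.text)
```

The point of the sketch is the shape of the result: unstructured markup goes in, and an object with a text field and some metadata about its origin comes out, which is the form the rest of an ingestion pipeline expects.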
At its core, SimpleWebPageReader serves as a bridge between the vast, unstructured world of the internet and the structured environment of the LlamaIndex RAG pipeline...