The amount of text you vectorize matters!
The vector we showed earlier came from the text "What are the advantages of using RAG?" That is a relatively short amount of text, which means a 1,536-dimension vector is going to do a very thorough job of representing the context within that text. But if we go back to the code, the content that we vectorize to represent our data comes from here:
loader = WebBaseLoader(
    web_paths=("https://kbourne.github.io/chapter1.html",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()...
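To get a feel for the difference in scale, here is a minimal sketch that compares the size of the short query with the size of a much longer document. It makes some assumptions: `text_size` is a hypothetical helper, not part of LangChain, and `sample_page` is made-up filler standing in for the loaded page text (which you would normally read from `docs[0].page_content`), so the sketch runs without a network call.

```python
EMBEDDING_DIMENSIONS = 1536  # vector size mentioned in the text


def text_size(text: str) -> dict:
    """Return simple size measures (characters and words) for a piece of text."""
    return {"characters": len(text), "words": len(text.split())}


query = "What are the advantages of using RAG?"

# Hypothetical stand-in for docs[0].page_content; the real page will differ.
sample_sentence = "Retrieval-augmented generation combines search with generation."
sample_page = " ".join([sample_sentence] * 200)

query_size = text_size(query)
page_size = text_size(sample_page)

for label, size in (("query", query_size), ("page", page_size)):
    print(
        f"{label}: {size['words']} words, {size['characters']} characters "
        f"represented by one {EMBEDDING_DIMENSIONS}-dimension vector"
    )
```

Both texts would end up in a vector of the same size, so the longer the text, the more content each vector has to summarize.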