Questions
Answer the following questions with Yes or No:
- Does multimodal modular RAG handle different types of data, such as text and images?
- Are drones used solely for agricultural monitoring and aerial photography?
- Is the Deep Lake VisDrone dataset used in this chapter for textual data only?
- Can bounding boxes be added to drone images to identify objects such as trucks and pedestrians?
- Does the modular system retrieve both text and image data for query responses?
- Is building a vector index necessary for querying the multimodal VisDrone dataset?
- Are the retrieved images processed without adding any labels or bounding boxes?
- Is the multimodal modular RAG performance metric based only on textual responses?
- Can a multimodal system such as the one described in this chapter handle only drone-related data?
- Is evaluating images as easy as evaluating text in multimodal RAG?