Applications that are beyond the current state of the art
This section discusses several applications that are not yet possible but are theoretically feasible. Some could probably be achieved if the right training data and computing resources were available. Others might require new algorithmic insights. In every case, it is worth thinking about how these and other futuristic applications might be accomplished.
Processing very long documents
Current LLMs place relatively low limits on the length of the documents (or prompts) they can process. For example, GPT-4 can only handle texts of up to 8,192 tokens (https://platform.openai.com/docs/models/gpt-4), which is around 16 single-spaced pages. This means that many existing documents can’t be fully analyzed with these cloud systems. If you are doing a typical classification task, you can train your own model, for example, with a Term frequency-inverse document...