Prompt engineering – the art of getting more with less
At this point in the book, and in your project, you should have a lot invested in your new foundation model. Between compute costs, datasets, custom code, and the research papers you’ve read, you might easily have spent 50 to 100 hours or more of your own time eking out performance gains. Kudos to you! It’s a great life to live.
After you’ve done this, however, and especially after you’ve learned how to build a complete application around your model, it’s time to maximize your model’s performance at inference. In the last chapter, we learned multiple ways to optimize your model’s runtime, from compilation and quantization to distillation and distribution, each of which helps speed up inference. This entire chapter, however, is dedicated to getting the most accurate response you can. Here, I use the word “accurate” heuristically to...
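To make the idea concrete before we dive in, here is a minimal sketch of the kind of prompt construction this chapter explores: packing an instruction and a few labeled examples around a new query, so the model can infer the task format without any retraining. The function name, task, and examples are hypothetical placeholders, not a specific library API.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, a few labeled examples, and a new
    query into a single few-shot prompt string.

    The 'Review:'/'Sentiment:' template is an illustrative choice;
    any consistent input/output labeling works the same way.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    # The final query is left unanswered; the model completes it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("I loved every minute of it.", "positive"),
     ("The battery died within a day.", "negative")],
    "The screen is gorgeous and setup was easy.",
)
print(prompt)
```

Nothing about the model changes here; only the text you send it does. That asymmetry, large accuracy gains from cheap edits to the input, is exactly what the rest of this chapter is about.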