From AI predictions to generative AI
This section provides a brief overview of artificial intelligence, highlighting our earliest consumer-facing experiences with it. In the early 2000s, AI started to become more tangible for consumers. For example, in 2001, Google introduced the “Did you mean?” feature (https://blog.google/intl/en-mena/product-updates/explore-get-answers/25-biggest-moments-in-search-from-helpful-images-to-ai/), which suggests spelling corrections. This was one of Google’s first applications of machine learning and one of the early AI features that the general public experienced at scale.
Over the following years, AI systems became more sophisticated, especially in areas like computer vision, speech-to-text conversion, and text-to-speech synthesis. Working in the telecom industry, I witnessed firsthand the innovation driven by speech-to-text in particular. Integrating speech-to-text capabilities into interactive voice response (IVR) systems led to better user experiences by allowing people to speak their requests rather than punch numbers into a keypad. For example, when calling a bank, you might be greeted by a message asking you to say “balance” to check your balance or “open account” to open an account. Today, AI implementations keep multiplying, simplifying ever more complex and time-consuming tasks.
The exponential increase in available computing power, paired with the massive datasets needed to train machine learning models, unleashed new AI capabilities. In the 2010s, AI started matching and even surpassing human performance on certain tightly defined tasks like image classification.
The advent of generative AI has reignited interest and innovation in the AI field, introducing new approaches for exploring use cases and system integration. Models like Gemini, PaLM, Claude, DALL-E, OpenAI GPT, and Stable Diffusion showcase the ability of AI systems to generate synthetic text, images, and other media. The outputs exhibit creativity and imagination that capture the public’s attention. However, the powerful capabilities of generative models also highlight new challenges around system design and responsible deployment. There is a need to rethink integration patterns and architecture to support safe, robust, and cost-effective implementations. Specifically, issues around security, bias, toxicity, and misinformation must be addressed through techniques like dataset filtering, human-in-the-loop systems, enhanced monitoring, and immediate remediation.
As generative AI continues maturing, best practices and governance frameworks must evolve in tandem. Industry leaders have formed partnerships like the Content Authenticity Initiative to develop technical standards and policy guidance around the responsible development of the next generation of AI. This technology’s incredible potential, from accelerating drug discovery to envisioning new products, can only be realized through a commitment to transparency, ethics, and human rights. Constructive collaboration that balances innovation with caution is imperative.
Generative AI marks an inflection point for the field. The ripples from this groundswell of creative possibility are just beginning to reach organizations and communities. Maintaining an open, evidence-driven dialogue around not just capabilities but also challenges lays a foundation for AI deployment that empowers people, unlocks new utility, and earns widespread trust.
We are witnessing an unprecedented democratization of generative AI capabilities through publicly accessible APIs from established companies like Google, Meta, and Amazon, and startups such as Anthropic, Mistral AI, Stability AI, and OpenAI. Table 1.1, at the end of this section, summarizes several leading models that provide versatile foundations for natural language and image generation.
Just a few years ago, developing with generative AI required specialized expertise in deep learning and access to vast computational resources. Now, models like Gemini, Claude, GPT-4, DALL-E, and Stable Diffusion can be accessed via simple API calls at near-zero cost. The bar for experimentation has never been lower.
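To illustrate how simple these API calls have become, the following is a minimal sketch of a text-generation request using the google-generativeai Python SDK; the model name, environment variable, and prompt are assumptions chosen for illustration, and the other providers mentioned above expose similarly compact client libraries.

# Minimal sketch: generating text from a hosted model in a few lines of Python.
# Assumes `pip install google-generativeai` and an API key exported as GOOGLE_API_KEY.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# The model name is an assumption; consult the provider's documentation for current versions.
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Explain the difference between speech-to-text and text-to-speech in two sentences."
)
print(response.text)

A comparable request against Anthropic’s or OpenAI’s hosted models follows the same pattern: configure credentials, select a model, and submit a prompt.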
This commoditization has sparked an explosion of new applications leveraging these pre-trained models – from creative tools for content generation to process automation solutions infused with AI. Expect integrations with generative foundations across all industries in the coming months and years.
Models are becoming more knowledgeable, with broader capabilities and improved reasoning that reduce hallucinations and increase the accuracy of model responses. Multimodality is also gaining traction, with models able to ingest and generate content across text, images, audio, video, and 3D scenes. In terms of scalability, model sizes and context windows continue expanding exponentially; for example, Google’s Gemini 1.5 now supports a context window of 1 million tokens.
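To make the multimodality point concrete, here is a brief sketch that sends an image together with a text instruction in a single request, reusing the setup from the previous example; the file name and model version are assumptions.

# Sketch: a multimodal prompt that mixes an image with a text instruction.
# Assumes the same google-generativeai configuration as the earlier example,
# `pip install pillow`, and a local image file named chart.png (hypothetical).
import google.generativeai as genai
from PIL import Image

model = genai.GenerativeModel("gemini-1.5-pro")
image = Image.open("chart.png")

response = model.generate_content(
    [image, "Describe the trend shown in this chart in one paragraph."]
)
print(response.text)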
Overall, the outlook points to a future where generative AI will become deeply integrated into most technologies. These models introduce new efficiencies and automation potential and inspire creativity across nearly every industry imaginable.
The table below highlights some of the most popular generative AI models and their providers. Its purpose is to show the vast number of options available on the market at the time of writing. We expect the table to become outdated quickly, and we encourage readers to consult the model providers’ websites to stay up to date with new launches.
Model | Provider | Landing Page
Gemini | Google |
Claude | Anthropic |
ChatGPT | OpenAI |
Stable Diffusion | Stability AI |
Mistral | Mistral AI |
LLaMA | Meta |
Table 1.1: Overview of popular generative AI models and their providers