Limitations of ML
ML is very powerful, but it’s not the answer to every single problem. There are problems that ML is just not suitable for, and there are some cases where ML can’t be applied due to technical or business constraints. As an ML practitioner, it is important to develop the ability to find relevant business problems where ML can provide significant value instead of applying it blindly everywhere. Additionally, there are algorithm-specific limitations that can render an ML solution not applicable in some business applications. In this section, we will learn about some common limitations of ML that should be kept in mind while finding relevant use cases.
Keep in mind that the limitations we are discussing in this section are very general. In real-world applications, there are more limitations possible due to the nature of the problem we are solving. Some common limitations that we will discuss in detail are as follows:
- Data-related concerns
- Deterministic nature of problems
- Lack of interpretability and reproducibility
- Concerns related to cost and customizations
- Ethical concerns and bias
Let’s now deep dive into each of these common limitations.
Data-related concerns
The quality of an ML model highly depends upon the quality of the training data it is provided with. Data present in the real world is often noisy, incomplete, unlabeled, and sometimes unusable. Moreover, most supervised learning algorithms require large amounts of properly labeled training data to produce good results. The training data requirements of some algorithms (e.g., deep learning) are so high that even manually labeling data is not an option. And even if we manage to label the data manually, it is often error-prone due to human bias.
Another major issue is incompleteness or missing data. For example, consider the problem of automatic speech recognition. In this case, model results are highly biased toward the accent present in the training dataset. A model that is trained on the American accent doesn’t produce good results on other accented speech. Since accents change significantly as we travel to different parts of the world, it is hard to gather and label relevant amounts of training data for every possible accent. For this reason, developing a single speech recognition model that works for everyone is not yet feasible, and thus, the tech giants providing speech recognition solutions often develop accent-specific models. Developing a new model for each new accent is not very scalable.
Deterministic nature of problems
ML has achieved great success in solving some highly complex problems, such as numerical weather prediction. One problem with most of the current ML algorithms is that they are stochastic in nature and thus cannot be trusted blindly when the problem is deterministic. Considering the case of numerical weather prediction, today we have ML models that can predict rain, wind speed, air pressure, and so on, with acceptable accuracy, but they completely fail to understand the physics behind real weather systems. For example, an ML model might provide negative value estimations of parameters such as density.
However, it is very likely that these kinds of limitations can be overcome in the near future. Future research in the field of ML might discover new algorithms that are smart enough to understand the physics of our world. Such models will open infinite possibilities in the future.
Lack of interpretability and reproducibility
One major issue with many ML algorithms (and often with neural networks) is the lack of interpretability of results. Many business applications, such as fraud detection and disease prediction, require a justification for model results. If an ML model classifies a financial transaction as fraud, it should also provide solid evidence for the decision; otherwise, this output may not be useful for the business. Deep learning or neural network models often lack interpretability, and the explainability of such models is an active area of research. Multiple methods have been developed for model interpretability or explainability purposes. Though these methods can provide some insights into the results, they are still far from the actual requirements.
Reproducibility, on the other hand, is another complex and growing issue with ML solutions. Some of the latest research papers might show us great improvements in results using some technological advancements on a fixed set of datasets, but the same method may not work in real-world scenarios. Secondly, ML models are often unstable, which means that they produce different results when trained on different partitions of the dataset. This is a challenging situation because models developed for one business segment may be completely useless for another business segment, even though the underlying problem statement is similar. This makes them less reusable.
Concerns related to cost and customizations
Developing and maintaining ML solutions is often expensive, more so in the case of deep learning algorithms. Development costs may come from employing highly skilled developers as well as the infrastructure needed for data analytics and ML experimentation. Deep learning models usually require high-compute resources such as GPUs and TPUs for training and experimentation. Running a hyperparameter tuning job with such models is even more costly and time-consuming. Once the model is ready for production, it requires dedicated resources for deployment, monitoring, and maintenance. This cost further increases as you scale your deployments to serve a large number of customers, and even more if there are very low latency concerns. Thus, it is very important to understand the value that our solution is going to bring before jumping into the development phase and check whether it is worth the investment.
Another concern with the ML solutions is their lack of customizations. ML models are often very difficult to customize, meaning it can be hard to change their parameters or make them adapt to new datasets. Pre-built general-purpose ML solutions often do not work well on specific business use cases, and this leaves them with two choices – either to develop the solution from scratch or customize the prebuilt general-purpose solutions. Though the customization of prebuilt models seems like a better choice here, even the customization is not easy in the case of ML models. ML model customization requires a skilled set of data engineers and ML specialists with a deep understanding of technical concepts such as deep learning, predictive modeling, and transfer learning.
Ethical concerns and bias
ML is quite powerful and is adopted today by many organizations to guide their business strategy and decisions. As we know, some of these ML algorithms are black boxes; they may not provide reasons behind their decisions. ML systems are trained on a finite set of datasets, and they may not apply to some real-world scenarios; if those scenarios are encountered in the future, we can’t tell what decision the ML system will take. There might be ethical concerns related to such black-box decisions. For example, if a self-driving car is involved in a road accident, whom should you blame – the driver, the team that developed the AI system, or the car manufacturer? Thus, it is clear that the current advancements in ML and AI are not suitable for ethical or moral decision-making. Also, we need a framework to solve ethical concerns involving ML and AI systems.
The accuracy and speed of ML solutions are often commendable, but these solutions cannot always be trusted to be fair and unbiased. Consider AI software that recognizes faces or objects in a given image; this system could go wrong on photos where the camera is not able to capture racial sensitivity properly, or it may classify a certain type of dog (that is somewhat similar to a cat) as a cat. This kind of bias may come from a biased set of training or testing datasets used for developing AI systems. Data present in the real world is often collected and labeled by humans; thus, the bias that exists in humans is transferred into AI systems. Avoiding bias completely is impossible as we all are humans and are thus biased, but there are measures that can be taken to reduce it. Establishing a culture of ethics and building teams from diverse backgrounds can be a good step to reduce bias to a certain extent.