In their paper Hidden Technical Debt in Machine Learning Systems, Google engineers point out the various costs of maintaining a machine learning system, focusing on forms of technical debt that are specific to ML and often hidden or hard to detect.
They found that it is common to incur massive maintenance costs in real-world machine learning systems, and they identified several ML-specific risk factors to account for in system design. These factors include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a number of system-level anti-patterns.
In traditional software engineering, setting strict abstraction boundaries helps enforce logical consistency between the inputs and outputs of a given component. It is difficult to set such boundaries in machine learning systems, because machine learning is needed precisely where the desired behavior cannot be effectively expressed in software logic without depending on data. This results in boundary erosion in a couple of areas.
Machine learning systems mix signals together and entangle them, making isolated improvements effectively impossible. Changing one input feature can change the weights and use of all the other features, so no improvement is ever truly isolated. This is referred to as the CACE principle: Changing Anything Changes Everything.
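As an illustration, here is a minimal sketch of entanglement using synthetic data and scikit-learn (the features, data, and model choice are all hypothetical): rescaling a single feature shifts the learned weights of every other feature, not just its own.

```python
# Hypothetical illustration of CACE: "improving" only the scaling of feature 0
# still shifts the learned weights of the untouched features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                              # three entangled input signals
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)) > 0

base = LogisticRegression().fit(X, y)

X_changed = X.copy()
X_changed[:, 0] *= 10.0                                     # change only feature 0
changed = LogisticRegression().fit(X_changed, y)

print("weights before:", base.coef_[0])
print("weights after: ", changed.coef_[0])                  # features 1 and 2 move too
```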
The paper suggests two possible ways to mitigate this: isolating models and serving ensembles, or focusing on detecting changes in prediction behavior as they occur.
There are cases where a problem is only slightly different from another that already has a solution. It can be tempting to reuse the existing model and learn a small correction on top of it as a fast way to solve the newer problem. But this correction model creates a new system dependency on the original model, making it significantly more expensive to analyze improvements to either model in the future. The cost increases when correction models are cascaded, and a correction cascade can create an improvement deadlock.
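A minimal sketch, with hypothetical synthetic data and scikit-learn, of how a correction cascade is wired: a second model is trained on top of the first model's scores, so the first model can no longer be changed in isolation.

```python
# Hypothetical sketch of a correction cascade: model_b is trained on top of
# model_a's scores for a slightly different problem, so any future change to
# model_a silently changes model_b's inputs as well.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y_a = (X[:, 0] + X[:, 1] > 0).astype(int)                   # original problem
y_b = (X[:, 0] + X[:, 1] + 0.3 * X[:, 2] > 0).astype(int)   # slightly different problem

model_a = LogisticRegression().fit(X, y_a)

# The "correction" model consumes model_a's score as one of its features.
scores_a = model_a.decision_function(X).reshape(-1, 1)
X_correction = np.hstack([scores_a, X[:, 2:3]])
model_b = LogisticRegression().fit(X_correction, y_b)

# Improving or retraining model_a now requires re-validating model_b as well.
```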
Often a model is made widely accessible and is later consumed by other systems. Without access controls, these consumers may be undeclared, silently using the output of a given model as an input to another system. This is referred to as visibility debt. Undeclared consumers may also create hidden feedback loops.
Data dependencies carry a similar capacity for building debt as code dependencies, but they are more difficult to detect. Without proper tooling to identify them, data dependencies can form large chains that are difficult to untangle. They come in two forms: unstable and underutilized data dependencies.
To move quickly, it is often convenient to consume signals from other systems as inputs to your own. But some input signals are unstable: they can qualitatively or quantitatively change behavior over time, either implicitly as the other system is updated or through explicit changes. A mitigation strategy is to create a versioned copy of the signal.
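A minimal sketch of the versioned-copy mitigation, with a hypothetical upstream "sentiment cluster" signal: the model consumes a pinned snapshot rather than the live output, so upstream changes cannot silently alter its inputs.

```python
# Hypothetical sketch of the versioned-copy mitigation: frozen snapshots of an
# upstream signal are kept, and production pins one version explicitly.
SENTIMENT_CLUSTERS = {
    "v1": {"great": "positive", "awful": "negative"},
    "v2": {"great": "positive", "awful": "negative", "meh": "neutral"},
}

# Bumping the pinned version is an explicit, reviewable change instead of a
# silent behavioral shift caused by the upstream system updating itself.
PINNED_VERSION = "v1"

def cluster_for(word: str) -> str:
    """Look up a word in the pinned snapshot of the upstream signal."""
    return SENTIMENT_CLUSTERS[PINNED_VERSION].get(word, "unknown")

print(cluster_for("awful"))   # stable even if the upstream model changes tomorrow
```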
Underutilized data dependencies are input signals that provide little incremental modeling benefit. They make an ML system vulnerable to change where change is not necessary. Underutilized dependencies can creep into a model in several ways: through legacy features, bundled features, epsilon features, or correlated features.
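The paper mentions leave-one-feature-out evaluations as a way to detect such dependencies; below is a minimal sketch using synthetic data and scikit-learn (the features are hypothetical).

```python
# Leave-one-feature-out sketch on synthetic data: feature 4 is nearly a copy of
# feature 0, so dropping it barely hurts accuracy and it is a removal candidate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))
X[:, 4] = X[:, 0] + rng.normal(scale=0.01, size=1000)       # correlated, near-redundant
y = (X[:, 0] - X[:, 1] > 0).astype(int)

full_score = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
for i in range(X.shape[1]):
    reduced = np.delete(X, i, axis=1)
    score = cross_val_score(LogisticRegression(), reduced, y, cv=5).mean()
    print(f"without feature {i}: accuracy change = {score - full_score:+.4f}")
```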
Live ML systems that are updated over time often end up influencing their own behavior. This leads to analysis debt: it becomes difficult to predict how a given model will behave before it is released. These feedback loops are especially difficult to detect and address when they build up gradually, which may be the case if the model is not updated frequently.
A direct feedback loop is one in which a model directly influences the selection of its own future training data. In a hidden feedback loop, two systems influence each other indirectly through the world.
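A minimal sketch of a direct feedback loop with made-up data: because only items the deployed model chooses to show ever receive labels, the model's current behavior biases its own future training data.

```python
# Hypothetical sketch of a direct feedback loop: only items the current model
# chooses to show ever receive click labels, so future training data is biased
# toward whatever the model already does.
import numpy as np

rng = np.random.default_rng(3)
candidates = rng.normal(size=(1000, 2))

def deployed_model(x):
    return x[0] > 0.0          # current policy: show the item if its score is positive

shown = [x for x in candidates if deployed_model(x)]

# Suppressed items never generate labels, so the next training set cannot
# teach the model that it was wrong to suppress them.
print(f"fraction of candidates that can ever produce labels: {len(shown) / len(candidates):.2f}")
```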
It is common for systems that incorporate machine learning methods to end up with high-debt design patterns.
Debt can also accumulate when configuring a machine learning system. A large system has a wide range of configuration options covering features, data selection, verification methods, and so on. It is common for configuration to be treated as an afterthought. In a mature system, the number of configuration lines can exceed the number of lines of code, and each configuration line has the potential for mistakes.
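One way to keep configuration debt in check is to treat configuration as code that can be declared and validated; the sketch below assumes a hypothetical training configuration and is not taken from the paper.

```python
# Hypothetical sketch of validated configuration: fields are declared and
# checked up front instead of being scattered, unverified flags.
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    feature_columns: tuple
    label_column: str
    learning_rate: float
    train_start_date: str        # ISO dates, so string comparison orders them
    train_end_date: str

    def validate(self) -> None:
        assert 0.0 < self.learning_rate < 1.0, "learning rate out of range"
        assert self.label_column not in self.feature_columns, "label leaked into features"
        assert self.train_start_date < self.train_end_date, "empty training window"

config = TrainingConfig(
    feature_columns=("clicks_7d", "impressions_7d"),
    label_column="converted",
    learning_rate=0.05,
    train_start_date="2015-01-01",
    train_end_date="2015-06-30",
)
config.validate()
```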
ML systems interact directly with the external world, and the external world is rarely stable. Some measures that can be taken to deal with this instability are described below.
It is necessary to pick a decision threshold for a given model to perform some action: to predict true or false, to mark an email as spam or not spam, to show or not show a given advertisement. A threshold that is tuned by hand can become stale once the model is updated on new data, so one mitigation is to learn the threshold from held-out validation data.
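A minimal sketch of re-deriving a threshold from held-out data, using synthetic data, scikit-learn, and a hypothetical precision target of 0.9:

```python
# Hypothetical sketch: derive the spam threshold from held-out data to hit a
# target precision, rather than hard-coding a cut-off that goes stale.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

precision, recall, thresholds = precision_recall_curve(y_val, model.predict_proba(X_val)[:, 1])
target_precision = 0.9
# Pick the first (lowest) threshold whose precision reaches the target.
threshold = thresholds[np.argmax(precision[:-1] >= target_precision)]
print(f"threshold re-derived from validation data: {threshold:.3f}")
```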
Unit tests and end-to-end tests cannot ensure that an ML system is functioning properly. For long-term system reliability, comprehensive live monitoring and automated response are critical. The question then is what to monitor. The authors of the paper point out three areas as starting points: prediction bias, limits for actions, and upstream producers.
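As a sketch of prediction-bias monitoring, assuming a hypothetical tolerance and alerting mechanism: in a healthy system the distribution of predicted labels usually matches the distribution of observed labels, so a sustained gap is worth alerting on.

```python
# Hypothetical monitoring sketch: compare the rate of predicted positives with
# the rate of observed positives and alert if they drift apart.
def check_prediction_bias(predicted_positive_rate: float,
                          observed_positive_rate: float,
                          tolerance: float = 0.02) -> None:
    gap = abs(predicted_positive_rate - observed_positive_rate)
    if gap > tolerance:
        # A real system would page an on-call engineer or trigger an automated
        # response here rather than just printing.
        print(f"ALERT: prediction bias of {gap:.3f} exceeds tolerance of {tolerance}")

# Example: the model flags 12% of mail as spam this hour, but users mark only 7%.
check_prediction_bias(predicted_positive_rate=0.12, observed_positive_rate=0.07)
```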
In addition to the mentioned areas, an ML system may also face debts from other areas. These include data testing debt, reproducibility debt, process management debt, and cultural debt.
Moving quickly often introduces technical debt. The most important insight from the paper, according to the authors, is that technical debt is an issue that both engineers and researchers need to be aware of.
Paying down machine-learning-related technical debt requires commitment, which can often only be achieved by a shift in team culture. Recognizing, prioritizing, and rewarding this effort is important for the long-term health of successful machine learning teams.
For more details, you can read the full paper on the NIPS website.