Principle 1 – data should be the center of ML development
As we discussed in Chapter 2, From Model-Centric to Data-Centric – ML’s Evolution, the predominant model-centric approach is limited in several ways: computing and storage have been commoditized, algorithms have become largely automated and highly data-dependent, models are accessible but less malleable, and deep learning and AutoML tools are widely available. But the data? Well, that’s still the wildcard.
Rather than relying on powerful computing and storage environments and sophisticated algorithms that demand excessive amounts of data for an incremental uplift in model accuracy, a better approach is to be driven by data – specifically, by the data that is available and relevant to the problem at hand.
Data is unique to every company, problem, and situation, and the data-centric paradigm recognizes this by putting the spotlight and development efforts on the data before...