Although raw data can come from a variety of sources and in a wide range of formats, it will help us to think of all data fundamentally as arrays of numbers. For example, images can be thought of as simply 2D arrays of numbers representing pixel brightness across an area. Sound clips can be thought of 1D arrays of intensity over time. For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental to machine learning.
If you have mostly been using OpenCV's C++ application programming interface (API) and plan on continuing to do so, you might find that dealing with data in C++ can be a bit of a pain. Not only will you have to deal with the syntactic overhead of the C++ language, but you will also have to wrestle with different data types and cross-platform compatibility issues.
This process is radically...