You're reading from Go Machine Learning Projects Eight projects demonstrating end-to-end machine learning and predictive analytics applications in Go

Product type Paperback

Published in Nov 2018

Publisher Packt

ISBN-13 9781788993401

Length 348 pages

Edition 1st Edition

Tools

OpenCV

Concepts

Machine Learning

Author (1):

Xuanyi Chew

All models are wrong; but some are useful.

Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an "ideal" gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules.

For such a model there is no need to ask the question "Is the model true?". If "truth" is to be the "whole truth" the answer must be "No". The only question of interest is "Is the model illuminating and useful?".

- George Box (1978)

Model trains are a fairly common hobby, despite being lampooned by the likes of The Big Bang Theory. A model train is not a real train. For one, the sizes are different. Model trains do not work exactly the same way a real train does. There are gradations of model trains, each being more similar to actual trains than the previous.

A model is in that sense a representation of reality. What do we represent it with? By and large, numbers. A model is a bunch of numbers that describes reality, and a bit more.

Every time I try to explain what a model is, I inevitably get responses along the lines of "You can't just reduce us to a bunch of numbers!". So what do I mean by "numbers"?

Consider the following right angle triangle:

How do we describe all right angle triangles? We might say something like this:

A + B + C = 180°, and there exists an angle among A, B and C that equals 90°

This says that the sum of all angles in the triangle adds up to 180 degrees, and there exists an angle that is 90 degrees. This is sufficient to describe all right angle triangles in Cartesian space.
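To make the point that such a description really is just numbers plus a rule, here is a minimal Go sketch. The `Triangle` type, the `IsRightAngled` method and the `tolerance` constant are names I made up for illustration. The check also assumes every angle is positive, which the description leaves implicit.

```go
package main

import (
	"fmt"
	"math"
)

// Triangle is described by three numbers: its interior angles in degrees.
// (Hypothetical type, introduced only for this illustration.)
type Triangle struct{ A, B, C float64 }

// tolerance absorbs floating point rounding when comparing angles.
const tolerance = 1e-9

// IsRightAngled reports whether the three numbers satisfy the description:
// the angles sum to 180 degrees and one of them is 90 degrees.
func (t Triangle) IsRightAngled() bool {
	angles := []float64{t.A, t.B, t.C}
	hasRight := false
	for _, a := range angles {
		if a <= 0 {
			return false // not a triangle at all
		}
		if math.Abs(a-90) < tolerance {
			hasRight = true
		}
	}
	sumOK := math.Abs(t.A+t.B+t.C-180) < tolerance
	return sumOK && hasRight
}

func main() {
	fmt.Println(Triangle{90, 60, 30}.IsRightAngled()) // true
	fmt.Println(Triangle{60, 60, 60}.IsRightAngled()) // false
}
```

The numbers say nothing about side lengths or position, so many different triangles match the same description. That is the gap between a description and the thing described.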

But is the description the triangle itself? It is not. This issue has plagued philosophers ever since the days of Aristotle. We shall not enter into a philosophical discussion for such a discussion will only serve to prolong the agony of this chapter.

So for our purposes in this chapter, we'll say that a model is the values that describe reality, and the algorithm that produces those values. These values are typically numbers, though they may be of other types as well.
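A minimal sketch of this definition in Go may help. Here the values are a slope and an intercept, and the algorithm that produces them is ordinary least squares on a single input variable. The choice of a linear model, and the names `LinearModel`, `Fit` and `Predict`, are assumptions made for illustration only.

```go
package main

import "fmt"

// LinearModel holds the values that describe the data.
// (Hypothetical type, introduced only for this illustration.)
type LinearModel struct {
	Slope, Intercept float64
}

// Fit is the algorithm that produces the values: ordinary least squares
// for one input variable. It assumes xs and ys have the same length and
// that xs contains at least two distinct values.
func Fit(xs, ys []float64) LinearModel {
	n := float64(len(xs))
	var sumX, sumY, sumXY, sumXX float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
		sumXY += xs[i] * ys[i]
		sumXX += xs[i] * xs[i]
	}
	slope := (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	intercept := (sumY - slope*sumX) / n
	return LinearModel{Slope: slope, Intercept: intercept}
}

// Predict uses the fitted values to describe an input.
func (m LinearModel) Predict(x float64) float64 {
	return m.Slope*x + m.Intercept
}

func main() {
	xs := []float64{1, 2, 3, 4}
	ys := []float64{3, 5, 7, 9} // generated by y = 2x + 1
	m := Fit(xs, ys)
	fmt.Printf("slope=%.2f intercept=%.2f prediction for 5: %.2f\n",
		m.Slope, m.Intercept, m.Predict(5))
}
```

The struct alone is only a pair of numbers. It becomes a model in the sense used here once it is paired with the procedure that derived those numbers from data.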

A model is a bunch of values that describe the world, alongside the program that produces these values. That much, we have concluded from the previous section. Now we have to pass some value judgments on models - whether a model is good or bad.

A good model needs to describe the world accurately. This is said in the most generic way possible. Described thus, this statement about a good model encompasses many notions. We shall have to make this abstract idea a bit more concrete to proceed.

A machine learning algorithm is trained on a bunch of data. To the machine, this bunch of data is the world. But to us, the data that we feed into the machine for training is not the world. To us humans, there is much more to the world than what the machine may know about. So when I say "a good model needs to describe the world accurately", two senses of the word "world" apply: the world as the machine knows it, and the world as we know it.

The machine has only seen portions of the world as we know it. There are parts of the world the machine has not seen. A machine learning model is therefore good when it is able to provide the correct outputs for inputs it has not yet seen.

As a concrete example, let's once again suppose that we have a machine learning algorithm that determines if an image is that of a hot dog or not. We feed the model images of hot dogs and hamburgers. To the machine, the world is simply images of hot dogs and hamburgers. What happens when we pass in an image of vegetables as input? A good model would be able to generalize and say it's not a hot dog. A poor model would simply crash.

And thus with this analogy, we have defined a good model to be one that generalizes well to unknown situations.

Often, as part of the process of building machine learning systems, we would want to put this notion to the test. So we would have to split our dataset into testing and training datasets. The machine would be trained on the training dataset, and to test how good the model is once the training has completed, we will then feed the testing dataset to the machine. It's assumed, of course, that the machine has never seen the testing dataset. A good machine learning model would hence be able to generate the correct output for the testing dataset, despite never having seen it.
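The split-train-test procedure described above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the `Example` type, the helpers `Split`, `FitThreshold` and `Accuracy`, the 80/20 split and the toy "learner" (a single threshold on features in the range 0 to 1) are all made up for the example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Example is one labelled data point (hypothetical type for this sketch).
type Example struct {
	Feature float64
	Label   bool
}

// Split shuffles a copy of data using the given seed, then holds out
// testFraction of the examples for testing and returns the rest for training.
func Split(data []Example, testFraction float64, seed int64) (train, test []Example) {
	shuffled := make([]Example, len(data))
	copy(shuffled, data)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	nTest := int(float64(len(shuffled)) * testFraction)
	return shuffled[nTest:], shuffled[:nTest]
}

// FitThreshold is a toy training step: it places a threshold midway between
// the largest negative and the smallest positive feature it has seen.
// It assumes features lie in [0, 1] and that the classes are separable.
func FitThreshold(train []Example) float64 {
	lo, hi := 0.0, 1.0
	for _, ex := range train {
		if ex.Label && ex.Feature < hi {
			hi = ex.Feature
		}
		if !ex.Label && ex.Feature > lo {
			lo = ex.Feature
		}
	}
	return (lo + hi) / 2
}

// Accuracy returns the fraction of examples whose label predict gets right.
func Accuracy(predict func(float64) bool, data []Example) float64 {
	if len(data) == 0 {
		return 0
	}
	correct := 0
	for _, ex := range data {
		if predict(ex.Feature) == ex.Label {
			correct++
		}
	}
	return float64(correct) / float64(len(data))
}

func main() {
	// Toy world: a label is true exactly when the feature exceeds 0.5.
	rng := rand.New(rand.NewSource(7))
	data := make([]Example, 100)
	for i := range data {
		f := rng.Float64()
		data[i] = Example{Feature: f, Label: f > 0.5}
	}

	train, test := Split(data, 0.2, 42)
	threshold := FitThreshold(train) // the machine only ever sees train
	predict := func(f float64) bool { return f > threshold }

	fmt.Printf("train=%d test=%d\n", len(train), len(test))
	fmt.Printf("accuracy on unseen test data: %.2f\n", Accuracy(predict, test))
}
```

The important design choice is that `FitThreshold` never receives the testing subset, so the accuracy it reports is measured on data the model has not seen.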
