Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Machine Learning With Go

You're reading from   Machine Learning With Go Implement Regression, Classification, Clustering, Time-series Models, Neural Networks, and More using the Go Programming Language

Arrow left icon
Product type Paperback
Published in Sep 2017
Publisher Packt
ISBN-13 9781785882104
Length 304 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Joseph Langstaff Whitenack Joseph Langstaff Whitenack
Author Profile Icon Joseph Langstaff Whitenack
Joseph Langstaff Whitenack
Arrow right icon
View More author details
Toc

Table of Contents (11) Chapters Close

Preface 1. Gathering and Organizing Data 2. Matrices, Probability, and Statistics FREE CHAPTER 3. Evaluation and Validation 4. Regression 5. Classification 6. Clustering 7. Time Series and Anomaly Detection 8. Neural Networks and Deep Learning 9. Deploying and Distributing Analyses and Models 10. Algorithms/Techniques Related to Machine Learning

Handling data - Gopher style

In comparison to many other languages that are used for data science/analysis, Go provides a very strong foundation for data manipulation and parsing. Although other languages (for example, Python or R) may allow users to quickly explore data interactively, they often promote integrity-breaking convenience, that is, dynamic and interactive data exploration often results in code that behaves strangely when applied more generally.

Take, for instance, this simple CSV file:

1,blah1
2,blah2
3,blah3

It is true that, very quickly, we can write some Python code to parse this CSV and output the maximum value from the integer column without even knowing what types are in the data:

import pandas as pd

# Define column names.
cols = [
'integercolumn',
'stringcolumn'
]

# Read in the CSV with pandas.
data = pd.read_csv('myfile.csv', names=cols)

# Print out the maximum value in the integer column.
print(data['integercolumn'].max())

This simple program will print the correct result:

$ python myprogram.py
3

We now remove one of the integer values to produce a missing value, as shown here:

1,blah1
2,blah2
,blah3

The Python program consequently has a complete breakdown in integrity; specifically, the program still runs, doesn't tell us that anything went differently, still produces a value, and produces a value of a different type:

$ python myprogram.py
2.0

This is unacceptable. All but one of our integer values could disappear, and we wouldn't have any insight into the changes. This could produce profound changes in our modeling, but they would be extremely hard to track down. Generally, when we opt for the conveniences of dynamic types and abstraction, we are accepting this sort of variability in behavior.

The important thing here is not that you cannot handle such behavior in Python, because Python, experts will quickly recognize that you can properly handle such behavior. The point is that such conveniences do not promote integrity by default, and thus, it is very easy to shoot yourself in the foot.

On the other hand, we can leverage Go's static typing and explicit error handling to ensure that our data is parsed as expected. In this small example, we can also write some Go code, without too much trouble, to parse our CSV (don't worry about the details right now):

// Open the CSV.
f, err := os.Open("myfile.csv")
if err != nil {
log.Fatal(err)
}

// Read in the CSV records.
r := csv.NewReader(f)
records, err := r.ReadAll()
if err != nil {
log.Fatal(err)
}

// Get the maximum value in the integer column.
var intMax int
for _, record := range records {

// Parse the integer value.
intVal, err := strconv.Atoi(record[0])
if err != nil {
log.Fatal(err)
}

// Replace the maximum value if appropriate.
if intVal > intMax {
intMax = intVal
}
}

// Print the maximum value.
fmt.Println(intMax)

This will produce the same correct result for the CSV file with all the integer values present:

$ go build
$ ./myprogram
3

But in contrast to our previous Python code, our Go code will inform us when we encounter something that we don't expect in the input CSV (for the case when we remove the value 3):

$ go build
$ ./myprogram
2017/04/29 12:29:45 strconv.ParseInt: parsing "": invalid syntax

Here, we have maintained integrity, and we can ensure that we can handle missing values in a manner that is appropriate for our use case.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image