Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Machine Learning With Go

You're reading from   Machine Learning With Go Implement Regression, Classification, Clustering, Time-series Models, Neural Networks, and More using the Go Programming Language

Arrow left icon
Product type Paperback
Published in Sep 2017
Publisher Packt
ISBN-13 9781785882104
Length 304 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Joseph Langstaff Whitenack Joseph Langstaff Whitenack
Author Profile Icon Joseph Langstaff Whitenack
Joseph Langstaff Whitenack
Arrow right icon
View More author details
Toc

Table of Contents (11) Chapters Close

Preface 1. Gathering and Organizing Data 2. Matrices, Probability, and Statistics FREE CHAPTER 3. Evaluation and Validation 4. Regression 5. Classification 6. Clustering 7. Time Series and Anomaly Detection 8. Neural Networks and Deep Learning 9. Deploying and Distributing Analyses and Models 10. Algorithms/Techniques Related to Machine Learning

Best practices for gathering and organizing data with Go

As you can see in the preceding section, Go itself provides us with an opportunity to maintain high levels of integrity in our data gathering, parsing, and organization. We want to ensure that we leverage Go's unique properties whenever we are preparing our data for machine learning workflows.

Generally, Go data scientists/analysts should follow the following best practices when gathering and organizing data. These best practices are meant to help you maintain integrity in your applications, and been able you to reproduce any analysis:

  1. Check for and enforce expected types: This might seem obvious, but it is too often overlooked when using dynamically typed languages. Although it is slightly verbose, explicitly parsing data into expected types and handling related errors can save you big headaches down the road.
  2. Standardize and simplify your data ingress/egress: There are many third-party packages for handling certain types of data or interactions with certain sources of data (some of which we will cover in this book). However, if you standardize the ways you are interacting with data sources, particularly centered around the use of stdlib, you can develop predictable patterns and maintain consistency within your team. A good example of this is a choice to utilize database/sql for database interactions rather than using various third-party APIs and DSLs.
  3. Version your data: Machine learning models produce extremely different results depending on the training data you use, your choice of parameters, and input data. Thus, it is impossible to reproduce results without versioning both your code and data. We will discuss the appropriate techniques for data versioning later in this chapter.
If you start to stray from these general principles, you should stop immediately. You are likely to sacrifice integrity for the sake of convenience, which is a dangerous road. We will let these principles guide us through the book and as we consider various data formats/sources in the following section.
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime