Practical Machine Learning

Practical Machine Learning: Learn how to build Machine Learning applications to solve real-world data analysis challenges with this Machine Learning book – packed with practical tutorials

eBook
$9.99 $40.99
Paperback
$50.99

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Practical Machine Learning

Chapter 2. Machine learning and Large-scale datasets

The advent of big data has dramatically changed the way data is handled. Machine learning has had to adopt scaling-up strategies to meet these new data requirements, which means that some traditional Machine learning implementations are no longer adequate in a big data context. The challenges now lie in infrastructure and tuning: large-scale data must be stored and processed, and the complexity of data formats compounds the problem.

Hardware architectures have evolved, cheaper hardware with distributed architectures has become accessible, and new programming paradigms offer simplified parallel processing that can be applied to many learning algorithms. Together, these developments have driven a rising interest in scaling up Machine learning systems.

The topics listed next are covered in depth in this chapter:

  • An introduction to big data and typical...

Big data and the context of large-scale Machine learning

I have covered some of the core aspects of big data in my previous Packt book titled Getting Started with Greenplum for Big Data Analytics. In this section, we will quickly recap some of the core aspects of big data and its impact in the field of Machine learning:

  • Large-scale refers to data volumes of terabytes, petabytes, exabytes, or higher. This is typically a volume that traditional database engines cannot handle. The following chart lists the orders of magnitude that represent data volumes:

    Multiples of bytes

    Name (Symbol)     SI decimal value   Binary usage
    Kilobyte (KB)     10^3               2^10
    Megabyte (MB)     10^6               2^20
    Gigabyte (GB)     10^9               2^30
    Terabyte (TB)     10^12              2^40
    Petabyte (PB)     10^15              2^50
    Exabyte (EB)      10^18              2^60
    Zettabyte (ZB)    10^21              2^70
    Yottabyte (YB)    10^24              2^80

  • Data formats that are referred to in this context are distinct; they are generated and consumed, and need not be structured (for example,...
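The decimal and binary columns of the byte-multiples chart drift apart as the prefix grows. The following is a minimal sketch, not taken from the book, that recomputes both columns; the names PREFIXES and byte_multiples are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: recompute the chart of byte multiples.
# PREFIXES and byte_multiples are made-up names, not from the book.
PREFIXES = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def byte_multiples():
    """Return (symbol, decimal_value, binary_value) for each prefix."""
    rows = []
    for k, symbol in enumerate(PREFIXES, start=1):
        decimal_value = 10 ** (3 * k)   # SI decimal reading: 10^3, 10^6, ...
        binary_value = 2 ** (10 * k)    # binary usage: 2^10, 2^20, ...
        rows.append((symbol, decimal_value, binary_value))
    return rows


if __name__ == "__main__":
    for symbol, decimal_value, binary_value in byte_multiples():
        # Relative difference between the two readings of the same prefix
        gap_percent = (binary_value - decimal_value) / decimal_value * 100
        print(f"{symbol}: decimal={decimal_value} binary={binary_value} "
              f"gap={gap_percent:.1f}%")
```

The gap is about 2.4% at the kilobyte level and grows to roughly 20.9% at the yottabyte level, which is why it matters which convention a storage estimate uses.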

Algorithms and Concurrency

Before we discuss building concurrency into the execution of algorithms, let's look at some basics of algorithms in general: time complexity and order-of-magnitude measurements. We will then explore approaches to parallelizing algorithms.

An algorithm can be defined as a sequence of steps that takes an input and produces the desired output. Algorithms are technology-agnostic representations; let's look at a sorting algorithm as an example:

Input: A sequence of n numbers: a1, a2, …, an
Output: A permutation (reordering) a1', a2', …, an' such that a1' <= a2' <= … <= an'

The following algorithm is an insertion-sort algorithm:

INSERTION-SORT(A)
for j = 2 to length[A]
    do key <- A[j]
       // insert A[j] into the sorted sequence A[1..j-1]
       i <- j-1
       while i > 0 and A[i] > key
           do A[i+1] <- A[i]   // move A[i] one position right
              i <- i-1
       A[i+1] <- key
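As a runnable counterpart to the pseudocode, here is a minimal Python sketch. It is an illustration rather than the book's own listing: the function name insertion_sort is an assumption, the indices are shifted from the 1-based pseudocode to Python's 0-based lists, and the function sorts a copy instead of sorting in place.

```python
def insertion_sort(values):
    """Return a new list with the items of `values` in non-decreasing order."""
    a = list(values)  # work on a copy so the caller's sequence is untouched
    # The pseudocode runs j = 2..length[A] (1-based); here j = 1..len(a)-1.
    for j in range(1, len(a)):
        key = a[j]
        # Insert a[j] into the already sorted prefix a[0..j-1].
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]  # move a[i] one position to the right
            i -= 1
        a[i + 1] = key
    return a


if __name__ == "__main__":
    print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

Because the inner loop stops at the first element that is not greater than key, equal elements keep their relative order, so the sort is stable.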

For measuring the time and space complexity...

Technology and implementation options for scaling-up Machine learning

In this section, we will explore some parallel programming techniques and distributed platform options that Machine learning implementations can adopt. The Hadoop platform is introduced in the next chapter, Chapter 3, An Introduction to Hadoop's Architecture and Ecosystem, where we start looking at practical, real-world examples.

MapReduce programming paradigm

MapReduce is a parallel programming paradigm that abstracts away the complexities of parallel computation and data handling in a distributed computing environment. It works on the principle of taking the compute function to the data rather than taking the data to the compute function.

MapReduce is better thought of as a programming framework: it comes with many built-in functions that the developer need not build, and it alleviates many implementation complexities such as data partitioning, scheduling, managing exceptions, and intersystem...
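To make the map, shuffle, and reduce roles concrete, the following is a minimal single-process sketch of a word count in Python. It is an assumption-laden illustration, not the Hadoop API: the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical, and everything a real framework would distribute (partitioning, scheduling, fault handling) is reduced here to plain function calls.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this across machines; here it is one dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big models big data"]))
```

Each call to map_phase depends only on its own line and each call to reduce_phase only on its own key, which is the property that lets a framework run these steps on the machines where the data already resides.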

Summary

In this chapter, we explored the qualifiers of large datasets, their common characteristics, the problems of repetition, and the reasons for the hyper-growth in volumes; in short, the big data context.

The need for applying conventional Machine learning algorithms to large datasets has given rise to new challenges for Machine learning practitioners. Traditional Machine learning libraries do not adequately support processing huge datasets. Parallelization using modern parallel computing frameworks, such as MapReduce, has gained popularity and adoption; this has resulted in the birth of new libraries that are built over these frameworks.

The focus was on methods that are suitable for massive data and have potential for parallel implementation. The landscape of Machine learning applications has changed dramatically in the last decade. Throwing more machines at a problem doesn't always prove to be a solution. There is a need to revisit traditional algorithms and models in the way...


Key benefits

  • Fully-coded working examples using a wide range of machine learning libraries and tools, including Python, R, Julia, and Spark
  • Comprehensive practical solutions taking you into the future of machine learning
  • Go a step further and integrate your machine learning projects with Hadoop

Description

This book explores an extensive range of machine learning techniques, uncovering hidden tricks and tips for several types of data using practical and real-world examples. While machine learning can be highly theoretical, this book offers a refreshing hands-on approach without losing sight of the underlying principles. Inside, a full exploration of the various algorithms gives you high-quality guidance so you can begin to see just how effective machine learning is at tackling contemporary challenges of big data.

This is the only book you need to implement a whole suite of open source tools, frameworks, and languages in machine learning. We will cover the leading data science languages, Python and R, and the underrated but powerful Julia, as well as a range of other big data platforms including Spark, Hadoop, and Mahout. Practical Machine Learning is an essential resource for modern data scientists who want to get to grips with its real-world application.

With this book, you will not only learn the fundamentals of machine learning but dive deep into the complexities of real-world data before moving on to using Hadoop and its wider ecosystem of tools to process and manage your structured and unstructured data. You will explore different machine learning techniques for both supervised and unsupervised learning; from decision trees to Naïve Bayes classifiers and linear and clustering methods, you will learn strategies for a truly advanced approach to the statistical analysis of data. The book also explores the cutting-edge advancements in machine learning, with worked examples and guidance on deep learning and reinforcement learning, providing you with practical demonstrations and samples that help take the theory–and mystery–out of even the most advanced machine learning methodologies.

Who is this book for?

This book has been created for data scientists who want to see machine learning in action and explore its real-world application. With guidance on everything from the fundamentals of machine learning and predictive analytics to the latest innovations set to lead the big data revolution into the future, this is an unmissable resource for anyone dedicated to tackling current big data challenges. Knowledge of programming (Python and R) and mathematics is advisable if you want to get started immediately.

What you will learn

  • Implement a wide range of algorithms and techniques for tackling complex data
  • Get to grips with some of the most powerful languages in data science, including R, Python, and Julia
  • Harness the capabilities of Spark and Hadoop to manage and process data successfully
  • Apply the appropriate machine learning technique to address real-world problems
  • Get acquainted with Deep learning and find out how neural networks are being used at the cutting-edge of machine learning
  • Explore the future of machine learning and dive deeper into polyglot persistence, semantic data, and more

Product Details

Publication date: Jan 30, 2016
Length: 468 pages
Edition: 1st
Language: English
ISBN-13: 9781784394011



Table of Contents

15 Chapters
1. Introduction to Machine learning
2. Machine learning and Large-scale datasets
3. An Introduction to Hadoop's Architecture and Ecosystem
4. Machine Learning Tools, Libraries, and Frameworks
5. Decision Tree based learning
6. Instance and Kernel Methods Based Learning
7. Association Rules based learning
8. Clustering based learning
9. Bayesian learning
10. Regression based learning
11. Deep learning
12. Reinforcement learning
13. Ensemble learning
14. New generation data architectures for Machine learning
Index

Customer reviews

Rating distribution
3.9 out of 5 (19 Ratings)
5 star 52.6%
4 star 15.8%
3 star 10.5%
2 star 10.5%
1 star 10.5%

Top Reviews




Amazon buyer Sep 21, 2018
5 out of 5 stars
Very informative.
Amazon Verified review
Santosh Korrapati Dec 19, 2017
5 out of 5 stars
This is Superhuman effort to explain machine learrning so lucidly and interestingly. I simply loved it.
Amazon Verified review
Grypho Mar 01, 2016
5 out of 5 stars
This book covers most of the basic concepts and a wide range of technologies and methods in the field of Machine Learning. Theory is explained clearly but it's far too synthetic to be considered a textbook; instead it's perfect for a quick rehearsal of theory or just an introduction to the ML algorithms and tools. As implied in the "Practical" title, the code examples are not optional but should be considered as integral part of the book: they are clear (although sometimes too naive for real application) and reasonably well commented. To get the most out of this book the reader should already have a basic knowledge of machine learning concepts and possibly some confidence with the technology she/he wants to use (Java/Scala/Python/Julia/R...). DISCLOSURE: I received a complimentary copy by the editor in order to read it and write my genuine and sincere review
Amazon Verified review
Brad Aug 14, 2017
5 out of 5 stars
This is a really outstanding book that gives you an explanation of what machine learning is in the most direct way. So many machine learning books are bogged down in boring dry theory that really doesnt give you an intuitive and high level view of what you are doing. This book cuts through the clutter and explains in very practical terms what you are trying to do. The other reviews criticize the command of english of the author, I did not find anything that was vague or hard to understand and only came across a few minor errors in word choice that would indicate a foriegn speaker. Also the other reviewers criticized the coverage of so many different tools and programming langauges. All I can say is that this is a great addition to this book that all the major machine learning tools are covered...including Hadoop! Most machine learning books just use R because that is what has been used in statistics for years so there is a lot of support for it. But R is very limiting and obsolete. Python is currently being used as a stopgap replacement for R because Python is a very intuitive and versitile scripting language, but the Python is really not made for statistics or machine learning and has no built-in support for it and requires libraries like Scikit to function. Julia is a new language that is expressly designed for data processing that will eventually take over machine learning and eliminate Python and R. So all of these languages are covered in the book so this book will prepare someone do deal with all the technologies currently in use to process machine learning data. No other machine learning book that I am aware of can say that. I do have a criticism for this book, I do wish the author went into a little more detail in describing the machine learning algorithms, by using diagrams and more practical examples. 
I was almost tempted to give this book 4 stars for this lacking, but overall the book is so far ahead of all the other machine learning books out there that I have to give it 5 stars.
Amazon Verified review
Ravi Mar 09, 2016
5 out of 5 stars
I was the technical reviewer for this book. This book will help the readers in understanding machine learning, implement algorithms and also play around with technology stack using R, Python, Mahout, Hadoop. It covers major aspects in practical machine learning techniques with real-world examples across multiple verticals. This book will immensely help a rookie to be a star in machine learning and can also compete in kaggle competition. For the ones who are good at machine learning, this book will expose you to the multiple approaches you can use to solve a problem.
Amazon Verified review