Practical Machine Learning
Practical Machine Learning: Learn how to build Machine Learning applications to solve real-world data analysis challenges with this Machine Learning book – packed with practical tutorials

Sunila Gollapudi
3.9 (19 Ratings)
Paperback Jan 2016 468 pages 1st Edition

Practical Machine Learning

Chapter 2. Machine learning and Large-scale datasets

The advent of big data has dramatically changed how data is handled. Machine learning has had to adopt scaling-up strategies to meet these new data requirements, which means that some traditional Machine learning implementations are no longer relevant in a big data context. The challenges now lie in infrastructure and tuning: large-scale data must be stored and processed, and the complexity of data formats compounds the problem.

Interest in scaling up Machine learning systems is rising for three reasons: hardware architectures have evolved, cheaper hardware with distributed architectures has become accessible, and new programming paradigms simplify parallel processing and can now be applied to many learning algorithms.

The topics listed next are covered in-depth in this chapter:

  • An introduction to big data and typical...

Big data and the context of large-scale Machine learning

I have covered some of the core aspects of big data in my previous Packt book titled Getting Started with Greenplum for Big Data Analytics. In this section, we will quickly recap some of the core aspects of big data and its impact in the field of Machine learning:

  • Large-scale refers to data volumes of terabytes, petabytes, exabytes, or more, which is typically more than traditional database engines can handle. The following chart lists the orders of magnitude that represent data volumes:

    Multiples of bytes

    Name (Symbol)      SI decimal value    Binary usage
    Kilobyte (KB)      10^3                2^10
    Megabyte (MB)      10^6                2^20
    Gigabyte (GB)      10^9                2^30
    Terabyte (TB)      10^12               2^40
    Petabyte (PB)      10^15               2^50
    Exabyte (EB)       10^18               2^60
    Zettabyte (ZB)     10^21               2^70
    Yottabyte (YB)     10^24               2^80

  • Data formats that are referred to in this context are distinct; they are generated and consumed, and need not be structured (for example,...
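
The chart of byte multiples above can be reproduced with a few lines of arithmetic. The following is only an illustrative sketch: the function name byte_multiples and the printed layout are assumptions made for this example and do not come from the book. It shows how the gap between the SI decimal value and the binary usage grows with each prefix.

```python
# Symbols taken from the chart of byte multiples.
PREFIXES = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def byte_multiples():
    """Return (symbol, decimal_bytes, binary_bytes) for each prefix.

    Hypothetical helper for illustration only.
    """
    rows = []
    for n, symbol in enumerate(PREFIXES, start=1):
        decimal = 10 ** (3 * n)   # SI decimal value: 10^3, 10^6, ...
        binary = 2 ** (10 * n)    # binary usage: 2^10, 2^20, ...
        rows.append((symbol, decimal, binary))
    return rows


for symbol, decimal, binary in byte_multiples():
    # Percentage by which the binary value exceeds the decimal one.
    gap = (binary / decimal - 1) * 100
    print(f"{symbol}: decimal={decimal}, binary={binary}, binary is {gap:.1f}% larger")
```

The two conventions differ by only a few percent at the kilobyte level, but the difference compounds with every step up the chart, which matters when sizing storage for large-scale datasets.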

Algorithms and Concurrency

Before we talk about building concurrency into the execution of algorithms, let's look at some basics of algorithms in general: time complexity and order-of-magnitude measurements. We will then explore approaches to parallelizing algorithms.

An algorithm can be defined as a sequence of steps that takes an input and produces the desired output. Algorithms are technology-agnostic representations; let's look at a sorting algorithm example:

Input: A sequence of n numbers: a1, a2, …, an
Output: A permutation (reordering) a1', a2', …, an' such that a1' <= a2' <= … <= an'

The following algorithm is an insertion-sort algorithm:

INSERTION-SORT(A)
for j = 2 to length[A]
    do key <- A[j]
       // insert A[j] into the sorted sequence A[1..j-1]
       i <- j-1
       while i > 0 and A[i] > key
           do A[i+1] <- A[i]   // move A[i] one position right
              i <- i-1
       A[i+1] <- key
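
The pseudocode above can be translated almost line for line into a runnable program. The following is a minimal Python sketch, not code from the book; the function name insertion_sort is an assumption, and because Python lists are 0-indexed, the loop bounds are shifted by one relative to the 1-indexed pseudocode.

```python
def insertion_sort(values):
    """Sort a list in place in non-decreasing order and return it."""
    # The pseudocode runs j from 2 to length[A] (1-indexed);
    # with 0-indexed lists that becomes 1 to len(values) - 1.
    for j in range(1, len(values)):
        key = values[j]
        # Insert values[j] into the already sorted prefix values[0..j-1].
        i = j - 1
        while i >= 0 and values[i] > key:
            values[i + 1] = values[i]  # move values[i] one position right
            i -= 1
        values[i + 1] = key
    return values


print(insertion_sort([5, 2, 4, 6, 1, 3]))  # prints [1, 2, 3, 4, 5, 6]
```

The inner while loop runs at most j times for each j, so the worst case (a reverse-sorted input) takes on the order of n^2 steps, while an already sorted input takes on the order of n steps.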

For measuring the time and space complexity...

Technology and implementation options for scaling-up Machine learning

In this section, we will explore some parallel programming techniques and distributed platform options that Machine learning implementations can adopt. The Hadoop platform is introduced in the next chapter, Chapter 3, An Introduction to Hadoop's Architecture and Ecosystem, where we will also start looking at practical, real-world examples.

MapReduce programming paradigm

MapReduce is a parallel programming paradigm that abstracts away the complexities of parallel computation and data distribution in a distributed computing environment. It works on the principle of taking the compute function to the data rather than taking the data to the compute function.

MapReduce is better thought of as a programming framework: it comes with many built-in functions that the developer need not build, and it alleviates many implementation complexities such as data partitioning, scheduling, managing exceptions, and intersystem...
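
To make the paradigm concrete, here is a single-process Python sketch of the classic word-count example. It only imitates the map, shuffle, and reduce steps in memory; it is not the Hadoop API, and every function name in it (map_phase, shuffle, reduce_phase, word_count) is a hypothetical name chosen for illustration. In a real framework, the partitioning, scheduling, and grouping shown here by hand are handled for the developer.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially over a list of text splits."""
    mapped = []
    # In a distributed setting each split would be mapped on the node
    # that stores it; here the splits are simply processed in a loop.
    for document in documents:
        mapped.extend(map_phase(document))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data big models", "data at scale"]))
```

Because each call to map_phase depends only on its own split, and each call to reduce_phase depends only on one key, both phases can in principle run in parallel across machines, which is the property the paradigm exploits.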

Summary

In this chapter, we explored the qualifiers of large datasets, their common characteristics, the problems of repetition, and the reasons for the hyper-growth in volumes; in short, the big data context.

The need to apply conventional Machine learning algorithms to large datasets has given rise to new challenges for Machine learning practitioners. Traditional Machine learning libraries do not fully support processing huge datasets. Parallelization using modern parallel computing frameworks, such as MapReduce, has gained popularity and adoption; this has resulted in the birth of new libraries that are built on top of these frameworks.

The focus was on methods that are suitable for massive data and have potential for parallel implementation. The landscape of Machine learning applications has changed dramatically in the last decade. Throwing more machines at a problem doesn't always prove to be a solution. There is a need to revisit traditional algorithms and models in the way...

Key benefits

  • Fully-coded working examples using a wide range of machine learning libraries and tools, including Python, R, Julia, and Spark
  • Comprehensive practical solutions taking you into the future of machine learning
  • Go a step further and integrate your machine learning projects with Hadoop

Description

This book explores an extensive range of machine learning techniques, uncovering hidden tricks and tips for several types of data using practical, real-world examples. While machine learning can be highly theoretical, this book offers a refreshing hands-on approach without losing sight of the underlying principles. Inside, a full exploration of the various algorithms gives you high-quality guidance so you can begin to see just how effective machine learning is at tackling contemporary challenges of big data.

This is the only book you need to implement a whole suite of open source tools, frameworks, and languages in machine learning. We will cover the leading data science languages, Python and R, and the underrated but powerful Julia, as well as a range of other big data platforms including Spark, Hadoop, and Mahout. Practical Machine Learning is an essential resource for modern data scientists who want to get to grips with its real-world application.

With this book, you will not only learn the fundamentals of machine learning but dive deep into the complexities of real-world data before moving on to using Hadoop and its wider ecosystem of tools to process and manage your structured and unstructured data. You will explore different machine learning techniques for both supervised and unsupervised learning; from decision trees to Naïve Bayes classifiers and linear and clustering methods, you will learn strategies for a truly advanced approach to the statistical analysis of data. The book also explores the cutting-edge advancements in machine learning, with worked examples and guidance on deep learning and reinforcement learning, providing you with practical demonstrations and samples that help take the theory, and the mystery, out of even the most advanced machine learning methodologies.

Who is this book for?

This book has been created for data scientists who want to see machine learning in action and explore its real-world application. With guidance on everything from the fundamentals of machine learning and predictive analytics to the latest innovations set to lead the big data revolution into the future, this is an unmissable resource for anyone dedicated to tackling current big data challenges. Knowledge of programming (Python and R) and mathematics is advisable if you want to get started immediately.

What you will learn

  • Implement a wide range of algorithms and techniques for tackling complex data
  • Get to grips with some of the most powerful languages in data science, including R, Python, and Julia
  • Harness the capabilities of Spark and Hadoop to manage and process data successfully
  • Apply the appropriate machine learning technique to address real-world problems
  • Get acquainted with Deep learning and find out how neural networks are being used at the cutting-edge of machine learning
  • Explore the future of machine learning and dive deeper into polyglot persistence, semantic data, and more

Product Details

Publication date: Jan 30, 2016
Length: 468 pages
Edition: 1st
Language: English
ISBN-13: 9781784399689

Table of Contents

15 Chapters
1. Introduction to Machine learning
2. Machine learning and Large-scale datasets
3. An Introduction to Hadoop's Architecture and Ecosystem
4. Machine Learning Tools, Libraries, and Frameworks
5. Decision Tree based learning
6. Instance and Kernel Methods Based Learning
7. Association Rules based learning
8. Clustering based learning
9. Bayesian learning
10. Regression based learning
11. Deep learning
12. Reinforcement learning
13. Ensemble learning
14. New generation data architectures for Machine learning
Index

Customer reviews

Rating distribution
3.9 (19 Ratings)
5 star 52.6%
4 star 15.8%
3 star 10.5%
2 star 10.5%
1 star 10.5%

Top Reviews

Amazon buyer Sep 21, 2018
5 stars
Very informative.
Amazon Verified review
Santosh Korrapati Dec 19, 2017
5 stars
This is a superhuman effort to explain machine learning so lucidly and interestingly. I simply loved it.
Amazon Verified review
Grypho Mar 01, 2016
5 stars
This book covers most of the basic concepts and a wide range of technologies and methods in the field of Machine Learning. Theory is explained clearly but it's far too synthetic to be considered a textbook; instead it's perfect for a quick rehearsal of theory or just an introduction to the ML algorithms and tools. As implied in the "Practical" title, the code examples are not optional but should be considered as integral part of the book: they are clear (although sometimes too naive for real application) and reasonably well commented. To get the most out of this book the reader should already have a basic knowledge of machine learning concepts and possibly some confidence with the technology she/he wants to use (Java/Scala/Python/Julia/R...). DISCLOSURE: I received a complimentary copy by the editor in order to read it and write my genuine and sincere review.
Amazon Verified review
Brad Aug 14, 2017
5 stars
This is a really outstanding book that gives you an explanation of what machine learning is in the most direct way. So many machine learning books are bogged down in boring dry theory that really doesn't give you an intuitive and high level view of what you are doing. This book cuts through the clutter and explains in very practical terms what you are trying to do. The other reviews criticize the command of English of the author; I did not find anything that was vague or hard to understand and only came across a few minor errors in word choice that would indicate a foreign speaker. Also the other reviewers criticized the coverage of so many different tools and programming languages. All I can say is that this is a great addition to this book that all the major machine learning tools are covered...including Hadoop! Most machine learning books just use R because that is what has been used in statistics for years so there is a lot of support for it. But R is very limiting and obsolete. Python is currently being used as a stopgap replacement for R because Python is a very intuitive and versatile scripting language, but Python is really not made for statistics or machine learning and has no built-in support for it and requires libraries like Scikit to function. Julia is a new language that is expressly designed for data processing that will eventually take over machine learning and eliminate Python and R. So all of these languages are covered in the book so this book will prepare someone to deal with all the technologies currently in use to process machine learning data. No other machine learning book that I am aware of can say that. I do have a criticism for this book: I do wish the author went into a little more detail in describing the machine learning algorithms, by using diagrams and more practical examples.
I was almost tempted to give this book 4 stars for this lacking, but overall the book is so far ahead of all the other machine learning books out there that I have to give it 5 stars.
Amazon Verified review
Ravi Mar 09, 2016
5 stars
I was the technical reviewer for this book. This book will help the readers in understanding machine learning, implement algorithms and also play around with technology stack using R, Python, Mahout, Hadoop. It covers major aspects in practical machine learning techniques with real-world examples across multiple verticals. This book will immensely help a rookie to be a star in machine learning and can also compete in kaggle competition. For the ones who are good at machine learning, this book will expose you to the multiple approaches you can use to solve a problem.
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online; this includes exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription with us, simply go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there you will see the ‘cancel subscription’ button in the grey box that contains your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage, subscription.packtpub.com, by clicking the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; there are always new versions, new frameworks, and new techniques. This feature gives you a head start on our content as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.