What do you get with eBook?

Instant access to your Digital eBook purchase

Download this book in EPUB and PDF formats

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

Clojure Data Analysis Cookbook - Second Edition

Chapter 2. Cleaning and Validating Data

In this chapter, we will cover the following recipes:

Cleaning data with regular expressions
Maintaining consistency with synonym maps
Identifying and removing duplicate data
Regularizing numbers
Calculating relative values
Parsing dates and times
Lazily processing very large data sets
Sampling from very large data sets
Fixing spelling errors
Parsing custom data formats
Validating data with Valip

What you will learn

Read data from a variety of data formats

Transform data to make it more useful and easier to analyze

Process data concurrently and in parallel for faster performance

Harness multiple computers to analyze big data

Use powerful data analysis libraries such as Incanter, Hadoop, and Weka to get things done quickly

Apply powerful clustering and data mining techniques to better understand your data

Fabio Mancinelli Jun 03, 2015

This is a very interesting book with plenty of practical recipes about data analysis. The author goes through the different phases of data analysis, starting from the low-level details of how to read data from actual sources, continuing with how to clean it up to obtain meaningful results, and presenting different algorithms for actually performing data analysis.

The recipes go from querying, aggregating, and displaying data to statistical analysis and machine learning (clustering and classification). Recipes are presented in a very clear way, and they give the reader a clear context for where to apply them, how they work, actual working code to experiment with, and some additional references for getting more in-depth information.

I really appreciated the fact that the author is well versed both in the theoretical aspects and in Clojure programming. He presents very important details about advanced topics like parallelism, concurrency, and laziness, and warns the reader about the pitfalls to be aware of. For example, when talking about lazy data reading, he clearly explains how to correctly handle underlying resources, explicitly showing the source of potential issues.

I also read the first edition, and I've found that almost all the common recipes have been updated, and a new chapter about unstructured and textual data has been added.

Even though I am not a data analyst, this book was very clear and gave me a lot of insights about how to deal with data. I will for sure apply some of these recipes in my daily work to make more sense of what happens in what I manage.

Amazon Verified review

armel esnault Apr 10, 2015

Clojure Data Analysis Cookbook, Second Edition by Eric Rochester. Format: ebook PDF.

Two years after the first version, Eric Rochester has published an updated version of his book "Clojure Data Analysis Cookbook". The book gives a nice overview of data analysis in the Clojure programming language. It provides hundreds of useful tips on various software such as Incanter, the Clojure statistics platform, or Weka, a Java platform for machine learning.

The examples provided are easy to test, assuming you have a basic knowledge of Clojure (especially regarding REPL interaction). As most examples are independent of each other, it is easy to pick recipes without having to follow the whole chapter from the beginning.

The first two chapters deal with the importation and validation of data using common formats such as XML, JSON, CSV, or RDF. This appears to be very useful, as it is a mandatory step (usually the first) of data analysis. While it is not very complicated to do everything by yourself, those tips may save you some time.

Clojure has a very good concurrency and parallelism model by default, which is usually covered in every Clojure introduction book, but you will still find information in this book that you don't find in others. I particularly like the Monte Carlo and simulated annealing methods that find the optimal partition size for parallel processing.

Chapter 8 is perhaps less interesting because it covers Clojure interaction with Mathematica and the R language, and the only reason to use them is when a big library or framework is not available in Clojure and you have some constraints (e.g. time or performance) that prevent you from implementing it in Clojure.

As the title implies, the book will not make you autonomous in data analysis, but it would be good to have some tips and examples on how to design and build a full-scale data-analysis-oriented application.

A good book for discovering and playing with data in Clojure.