What this book covers
Chapter 1, Importing Data for Analysis, covers how to read data from a variety of sources, including CSV files, web pages, and linked semantic web data.
Chapter 2, Cleaning and Validating Data, presents strategies and implementations to normalize dates, fix spelling, and work with large datasets. Getting data into a useable shape is an important, but often overlooked, stage of data analysis.
Chapter 3, Managing Complexity with Concurrent Programming, covers Clojure's concurrency features and how you can use them to simplify your programs.
Chapter 4, Improving Performance with Parallel Programming, covers how to use Clojure's parallel processing capabilities to speed up the processing of data.
Chapter 5, Distributed Data Processing with Cascalog, covers how to use Cascalog as a wrapper over Hadoop and the Cascading library to process large amounts of data distributed over multiple computers.
Chapter 6, Working with Incanter Datasets, covers the basics of working with Incanter datasets. Datasets are the core data structures used by Incanter, and understanding them is necessary in order to use Incanter effectively.
Chapter 7, Statistical Data Analysis with Incanter, covers a variety of statistical processes and tests used in data analysis. Some of these are quite simple, such as generating summary statistics. Others are more complex, such as performing linear regressions and auditing data with Benford's Law.
Chapter 8, Working with Mathematica and R, talks about how to set up Clojure in order to talk to Mathematica or R. These are powerful data analysis systems, and we might want to use them sometimes. This chapter will show you how to get these systems to work together, as well as some tasks that you can perform once they are communicating.
Chapter 9, Clustering, Classifying, and Working with Weka, covers more advanced machine learning techniques. In this chapter, we'll primarily use the Weka machine learning library. Some recipes will discuss how to use it and the data structures its built on, while other recipes will demonstrate machine learning algorithms.
Chapter 10, Working with Unstructured and Textual Data, looks at tools and techniques used to extract information from the reams of unstructured, textual data.
Chapter 11, Graphing in Incanter, shows you how to generate graphs and other visualizations in Incanter. These can be important for exploring and learning about your data and also for publishing and presenting your results.
Chapter 12, Creating Charts for the Web, shows you how to set up a simple web application in order to present findings from data analysis. It will include a number of recipes that leverage the powerful D3 visualization library.