To get the most out of this book
Let me share a few tips that will help you along the path you are about to begin:
- The book comes with multiple step-by-step, hands-on tutorials that are integral to the development path I have designed for you. Some of the most subtle and fascinating aspects of data analytics (like the need to interact with business partners and go back and forth during the setup process) can only be understood through real examples, and tutorials do a great job of explaining them. I strongly suggest you set aside some quality time to complete each tutorial. Working through a tutorial from start to finish will take up to two hours, so make sure you have access to a computer on which you can install all the required software.
- At the end of the book, you will find a short series of useful resources organized by chapter. They offer an opportunity to complement your learning experience with selected additional readings. So don't forget to skim through them after you complete a chapter and see if any of them intrigues you.
- Depending on your background, some parts of the book might feel less "natural" to you. This is normal. Don't get discouraged if any portion of the book is less clear: the chapters are closely interconnected, and you might find the answers to your doubts in later pages, so just keep going until the end.
- Use the book's GitHub and KNIME Hub pages to download all the data you need to complete the tutorials. There, you will also find the final result of each tutorial (like the complete KNIME workflow or the resulting Power BI dashboard). If you feel lost, you can refer to them to find your way forward.
- Software improves and updates continuously. This book relies on the latest versions of KNIME and Power BI available at the time of its launch (specifically, KNIME 4.4 and Power BI 2.93). Although the bulk of the content will stay valid for a while, some of the steps in a tutorial might change slightly, making the windows look a bit different from what you find in the figures. This is unavoidable and will not jeopardize your learning. Keep an eye on the book's web page for any errata or addenda I post in case of significant divergences due to a new version of the software. You can also get in touch with me using the contact details you find in the And Now? section.
- This book is targeted toward the "business application" of data analytics. For this reason, whenever I had to choose between a rigorous mathematical treatment and a pragmatic, intuitive explanation of an analytical method, I went for the latter. The rationale is that you can always learn later how to make statistical learning more accurate and dive deeper into the math behind the algorithms covered here. Thus, the focus will stay on empowering you to use analytics in your work rather than on giving you the formal description behind each mathematical concept.
Download the data files
All the data and the supporting files related to the tutorials presented in the book are hosted on GitHub at https://github.com/PacktPublishing/Data-Analytics-Made-Easy. The data and the completed KNIME workflows are also available on KNIME Hub at http://tiny.cc/knimehub.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801074155_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates user input, code words in text, highlighted keywords in code, paths, and file names. For example: "In the configuration window, type the expression $Quantity$*$Price$, to calculate revenues."
In this book, you will find only a handful of blocks of example code in Chapter 9, Extending Your Toolbox. They will look like this:
predictions = model.predict(test_set)
print('R2 score is',r2_score(test_set.Rent,predictions))
print('Root Mean Squared Error is', \
np.sqrt(mean_squared_error(test_set.Rent,predictions)))
Bold: Indicates a new term, an important word, a KNIME node, or words that you see on the screen, like in menus or dialog boxes. For example: "Click OK to close the window and execute the CSV Reader node."
Italic: Is used to emphasize specific words in the context of a sentence and when referring to columns in a dataset, like in: "Neighborhood is the single most useful column when predicting the rent, followed by the Surface of the property."
Warnings, important notes, or interesting facts appear like this.
Tips and tricks appear like this.