You're reading from R Data Mining Implement data mining techniques through practical use cases and real-world datasets

Product type Paperback

Published in Nov 2017

Publisher Packt

ISBN-13 9781787124462

Length 442 pages

Edition 1st Edition

Author (1):

Andrea Cirillo

Developing wordclouds from text

We can take a first look at these words using the wordcloud package, which does exactly what its name suggests: it draws wordclouds.

To create a wordcloud, we just have to call the wordcloud() function, which requires two arguments:

words: The words to be plotted
freq: The number of occurrences of each word

Let's do it:

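library(dplyr)      # provides %>% and count()
library(wordcloud)  # provides wordcloud()

comments_tidy %>%
  count(word) %>%            # one row per distinct word, frequency in column n
  with(wordcloud(word, n))   # word passed as words, n passed as freq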

Reproduced in the plot are the words stored within the comments_tidy object, each drawn at a size proportional to its frequency. You should also be aware that the position of each word has no particular meaning here.
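If you want to try the call without the comments_tidy object at hand, the following is a minimal sketch on a toy dataset. The toy_tidy data frame and its contents are invented for illustration, and the counting step uses base R's table() rather than count(), so only the final call depends on the wordcloud package. Setting a seed is optional: it simply makes the random word placement repeatable.

```r
# Toy stand-in for comments_tidy: one row per word occurrence (hypothetical data).
toy_tidy <- data.frame(
  word = c("risk", "we", "credit", "we", "risk", "with", "we", "loan"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of count(word): frequency of each distinct word, stored in n.
word_counts <- as.data.frame(table(word = toy_tidy$word),
                             responseName = "n",
                             stringsAsFactors = FALSE)

# Draw the cloud only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(42)  # word placement is random; a fixed seed gives a repeatable layout
  wordcloud::wordcloud(words = word_counts$word,
                       freq = word_counts$n,
                       min.freq = 1)  # keep rare words too, since the toy data is tiny
}
```

Naming the words and freq arguments explicitly, as done here, makes it easier to see which column drives the labels and which one drives the sizes.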

What do you think about it? Not too bad, is it? Nevertheless, I can see too many irrelevant words, such as we and with. These words do not actually convey any useful information about the content...
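One possible way to drop such words before plotting is to filter the counts against a list of stop words. The sketch below is only an illustration and not necessarily the approach the chapter goes on to use: both word_counts and the short stop_list are hypothetical, hand-made objects. In practice a ready-made list, such as the stop_words dataset shipped with the tidytext package, is a common choice.

```r
# Hypothetical word counts, shaped like the output of count(word).
word_counts <- data.frame(
  word = c("risk", "we", "credit", "with", "loan"),
  n    = c(2, 3, 1, 1, 1),
  stringsAsFactors = FALSE
)

# A tiny, hand-made stop word list (an assumption, for illustration only).
stop_list <- c("we", "with", "the", "and")

# Keep only the rows whose word is not in the stop list.
relevant <- word_counts[!word_counts$word %in% stop_list, ]
```

The relevant data frame can then be passed to wordcloud() in the same way as before, using its word and n columns.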

Andrea Cirillo is currently working as an audit quantitative analyst at Intesa Sanpaolo Banking Group. He gained financial and external audit experience at Deloitte Touche Tohmatsu and internal audit experience at FNM, a listed Italian company. His main responsibilities involve the evaluation of credit risk management models and their enhancement, mainly within the field of the Basel III capital agreement. He is married to Francesca and is the father of Tommaso, Gianna, Zaccaria, and Filippo. Andrea has written and contributed to a few useful R packages such as updateR, ramazon, and paletteR, and regularly shares insightful advice and tutorials on R programming. His research and work mainly focus on the use of R in the fields of risk management and fraud detection, largely by modeling custom algorithms and developing interactive applications. Andrea has previously authored RStudio for R Statistical Computing Cookbook for Packt Publishing.