Introduction to R for Business Intelligence

Profit optimization using data mining, data analysis, and Business Intelligence

Author: Jay Gendron
Product type: Paperback
Published: Aug 2016
Publisher: Packt
ISBN-13: 9781785280252
Length: 228 pages
Edition: 1st Edition

Understanding big data in BI analytics

Before we begin describing the ETL process, consider its importance in business intelligence. CIO Magazine provides a popular and useful definition of BI (Mulcahy, 2007):

"Business intelligence, or BI, is an umbrella term that refers to a variety of software applications used to analyze an organization's raw data. BI as a discipline is made up of several related activities, including data mining, online analytical processing, querying and reporting."

Mulcahy captures the essence of this book, which presents solutions in R to walk you through the steps from data analytic techniques to communicating your results. The purpose of BI applications has changed over the last decade as big data challenges affect the business world in ways first experienced in the sciences decades ago.

You can find the term big data in many business settings. It appears in advertisements for boot camps, draws attendees to conferences, and perplexes business leaders. Arguably, the term is ill-defined. A 1998 presentation given by John Mashey, then the Chief Scientist of Silicon Graphics, is often cited as the document that introduced the term (Press, 2013). The impact of big data on business is undeniable, despite its elusive meaning. There is a general agreement on the following three characteristics of big data, called the 3Vs:

  • Volume: The size of datasets has grown from megabytes to petabytes
  • Velocity: The speed of data arrival has changed to near real time
  • Variety: The sources of data have grown from structured databases to unstructured ones, such as social media, websites, audio, and video

Together, these three characteristics pose a growing challenge to the business community. Data is stored in facilities across a vast network of local servers or relational databases. Virtual software accesses it with cloud-based applications. BI applications have typically included static dashboards based on fixed measures using structured data. Big data changes the business by affording a competitive advantage to those who can extract value from the large and rapidly changing sources of diverse data.

Today, people ask business analysts, "What is going to happen?" To answer this type of question, a business needs tools and processes to tap into the growing stream of data. Often this data will not fit into the existing databases without transformation. The continual need to acquire data requires a structured ETL approach to wrangle the unstructured nature of modern data. As you read this chapter, think about how companies may benefit from using the techniques presented, even when their data is less complex than big data.

Note

Use case: Bike Sharing, LLC

You will begin your exploration of BI and analytics through the lens of a fictional business called Bike Sharing, LLC. The company operates and maintains a fleet of publicly rentable bikes in the Washington, D.C. metropolitan area. Their customers are typically from the urban area, including people from business, government, and universities. Customers enjoy the convenience of finding bikes easily within a network of bike-sharing stations throughout the city. Renters may rent a bicycle at one location and leave it at another station.

Bike Sharing, LLC started operations in 2011 and has enjoyed continued growth. They quickly established a BI group to keep track of the data collected about transactions, customers, and factors related to rentals, such as weather, holidays, and times of day. In 2014, they began to understand how they might use open source datasets to guide decisions regarding sales, operations, and advertising. In 2015, they expanded their BI talent pool with business analysts experienced with R and statistical methods who could use Bike Sharing data in new ways.

You joined Bike Sharing just a few months ago. You have a basic understanding of R from the many courses and tutorials that you used to expand your skills. You are working with a good group that has a diverse skillset, including programming, databases, and business knowledge. The first data you have been given is bike rental data covering the two-year period from Jan 1, 2011 to Dec 31, 2012 (Kaggle, 2014). You can download this same Ch1_bike_sharing_data.csv file from the book's website at http://jgendron.github.io/com.packtpub.intro.r.bi/.

Data sources often include a data dictionary to help new users understand the contents and coding of the data. Data Dictionary for Bike Sharing Data (Kaggle, 2014):

  • datetime: Hourly date + timestamp
  • season: 1 = spring, 2 = summer, 3 = fall, 4 = winter
  • holiday: Whether the day is considered a holiday
  • workingday: Whether the day is neither a weekend nor holiday
  • weather:
    • 1: Clear, Few clouds, Partly cloudy
    • 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    • 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    • 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

  • temp: Temperature in Celsius
  • atemp: Feels like temperature in Celsius
  • humidity: Relative humidity
  • windspeed: Wind speed
  • casual: Number of non-registered user rentals initiated
  • registered: Number of registered user rentals initiated
  • count: Number of total rentals
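
The coded fields in this dictionary can be turned into labeled values once the data is in R. The following is a minimal sketch rather than the book's own code. The two sample rows are invented so that the block runs without the CSV file, the short weather labels are abbreviations chosen here for illustration, and the timestamp format is an assumption about how the file stores `datetime`.

```r
# Invented sample rows (not real rental data); the real data would come from
# read.csv("Ch1_bike_sharing_data.csv") after downloading the file.
sample_csv <- "datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-03-25 08:00:00,1,0,1,1,10.5,12.0,60,8.0,5,20,25
2012-09-25 17:00:00,3,0,1,2,24.0,27.5,55,11.0,30,70,100"

bike <- read.csv(text = sample_csv, stringsAsFactors = FALSE)

# Decode the integer codes using the labels from the data dictionary
bike$season <- factor(bike$season, levels = 1:4,
                      labels = c("spring", "summer", "fall", "winter"))

# Abbreviated weather labels (hypothetical names summarizing codes 1 to 4)
bike$weather <- factor(bike$weather, levels = 1:4,
                       labels = c("clear", "mist", "light precipitation",
                                  "heavy precipitation"))

# Parse the hourly timestamp; the format string is an assumption
bike$datetime <- as.POSIXct(bike$datetime,
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Check implied by the dictionary: total rentals = casual + registered
stopifnot(all(bike$casual + bike$registered == bike$count))

str(bike)
```

Storing `season` and `weather` as factors keeps the dictionary's meaning attached to the data, so summaries and plots show labels instead of bare integers.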

One of your goals is to strengthen your ETL skills. In this use case, you will learn common extraction, transformation, and loading skills to store a dataset in a file for analysis. Welcome to the Bike Sharing team.
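
As a rough preview of what extract, transform, and load can look like in base R, here is a minimal sketch under stated assumptions. The inline records are invented, the cleaning rule (dropping incomplete rows) is only one possible transformation, and the temporary output file is a hypothetical destination, not a location used by the book.

```r
# Extract: read raw records (inline text stands in for a downloaded CSV file)
raw_text <- "datetime,casual,registered
2011-03-25 08:00:00,5,20
2011-03-25 09:00:00,NA,31
2011-03-25 10:00:00,12,18"
raw <- read.csv(text = raw_text, stringsAsFactors = FALSE)

# Transform: drop rows with missing values and derive a total column
clean <- raw[complete.cases(raw), ]
clean$count <- clean$casual + clean$registered

# Load: store the cleaned dataset in a file for later analysis
out_file <- tempfile(fileext = ".csv")  # hypothetical destination
write.csv(clean, out_file, row.names = FALSE)

# Read the file back to confirm the round trip
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)
print(reloaded)
```

Writing to a temporary file keeps the sketch free of side effects. In practice you would choose a stable path that the rest of your analysis can find.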
