Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Hands-On Data Science with R

You're reading from   Hands-On Data Science with R Techniques to perform data manipulation and mining to build smart analytical models using R

Arrow left icon
Product type Paperback
Published in Nov 2018
Publisher Packt
ISBN-13 9781789139402
Length 420 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Authors (4):
Arrow left icon
Nataraj Dasgupta Nataraj Dasgupta
Author Profile Icon Nataraj Dasgupta
Nataraj Dasgupta
Vitor Bianchi Lanzetta Vitor Bianchi Lanzetta
Author Profile Icon Vitor Bianchi Lanzetta
Vitor Bianchi Lanzetta
Doug Ortiz Doug Ortiz
Author Profile Icon Doug Ortiz
Doug Ortiz
Ricardo Anjoleto Farias Ricardo Anjoleto Farias
Author Profile Icon Ricardo Anjoleto Farias
Ricardo Anjoleto Farias
Arrow right icon
View More author details
Toc

Table of Contents (16) Chapters Close

Preface 1. Getting Started with Data Science and R FREE CHAPTER 2. Descriptive and Inferential Statistics 3. Data Wrangling with R 4. KDD, Data Mining, and Text Mining 5. Data Analysis with R 6. Machine Learning with R 7. Forecasting and ML App with R 8. Neural Networks and Deep Learning 9. Markovian in R 10. Visualizing Data 11. Going to Production with R 12. Large Scale Data Analytics with Hadoop 13. R on Cloud 14. The Road Ahead 15. Other Books You May Enjoy

Active domains of data science

Data science plays a role in virtually all aspects of our day-to-day lives and is used across nearly all industries. The adoption of data science was largely spurred by the successes of start-ups such as Uber, Airbnb, and Facebook that rose rapidly and earned valuations of billions of dollars in a very short span of time.

Data generated by social media networks such as Facebook and Twitter, search engines such as Google and Yahoo!, and various other networks, such as Pinterest and Instagram led to a deluge of information about personal tastes, preferences, and habits of individuals. Companies leveraged the information using various machine learning techniques to gain insights.

For example, Natural Language Processing (NLP) is a machine learning technique used to analyse textual data on comments posted on public forums to extract users' interests. The users are then shown ads relevant to their interests generating sales from which companies earn ad revenue. Image recognition algorithms are utilized to automatically identify objects in an image and serve the relevant images when users search for those objects on search engines.

The use of data science as a means to not only increase user engagement but also increase revenue, has become a widespread phenomenon. Some of the domains in which data science is prevalent is given as follows. The list is not all-inclusive, but highlights some of the key industries in which data science plays an important role today:

A few of these domains have been discussed in the following sections.

Finance

Data science has been used in finance, especially in trading for many decades. Investment banks, especially trading desks, have employed complex models to analyse and make trading decisions. Some examples of data science as used in finance include:

  • Credit risk management: Analyse the creditworthiness of a user by analyzing the historical financial records, assets, and transactions of the user
  • Loan fraud: Identifying applications for credit or loans that may be fraudulent by analyzing the loan and applicant's characteristics
  • Market Basket Analysis: Understanding the correlation among stocks and other securities and formulating trading and hedging strategies
  • High-frequency trading: Analyzing trades and quotes to discover pricing inefficiencies and arbitrage opportunities

Healthcare

Healthcare and related fields such as pharmaceuticals and life sciences, have also seen a gradual rise in the adoption and use of machine learning. A leading example has been IBM Watson. Developed in late 2000s, IBM Watson rose to popularity after it won the Double Jeopardy, a popular quiz contest in the US in 2011. Today, IBM Watson is being used for clinical research and several institutions have published preliminary results of success. (Source: http://www.ascopost.com/issues/june-25-2017/how-watson-for-oncology-is-advancing-personalized-patient-care/). The primary impediment to wider adoption has been the extremely high cost of using the system with usually an uncertain return on investment. Companies that are generally well capitalized can invest in the technology.

More common uses of data science in healthcare include:

  • Epidemiology: Preventing the spread of diseases and other epidemiology related use cases are being solved with various machine learning techniques. A recent example of the use of clustering to detect the Ebola outbreak received attention, being one of the first times that machine learning was used in a medical use case very effectively. (Source: https://spectrum.ieee.org/tech-talk/biomedical/diagnostics/healthmap-algorithm-ebola-outbreak).
  • Health insurance fraud detection: The health insurance industry loses billions each year in the US due to fraudulent claims for insurance. Machine learning, and more generally, data science is being used to detect cases of fraud and reduce the loss incurred by leading health insurance firms. (Source: https://www.sciencedirect.com/science/article/pii/S1877042812036099).
  • Recommender engines: Algorithms that match patients with physicians are used to provide recommendations based on the patients' symptoms and doctor specialties.
  • Image recognition: Arguably, the most common use of data science in healthcare, image recognition algorithms are used for a variety of cases ranging from segmentation of malignant and non-malignant tumours to cell segmentation. (Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3159221/).

Pharmaceuticals

Although closely linked to the data science use cases in healthcare, data science use cases in pharma are geared toward the development of drugs, physician marketing, and treatment-related analysis. Examples of data science in pharma include the following:

Government

Data science is used by state and national governments for a wide range of uses. These include topics in cyber security, voter benefits, climate change, social causes, and other similar use cases that are geared toward public policy and public benefits.

Some examples include the following:

  • Climate change: One of the most popular topics among climate change proponents, there is extensive machine learning related work that is being conducted around the globe to detect and understand the causes of climate change. (Source: https://toolkit.climate.gov).
  • Cyber security: The use of extremely advanced machine learning techniques for national cyber security is evident and well known all over the world, ever since such practices were disclosed by consultants at security firms a few years back. Security-related organizations employ some of the most advanced hardware and software stacks for detecting cyber threats and prevent hacking attempts. (Source: https://www.csoonline.com/article/2942083/big-data-security/cybersecurity-is-the-killer-app-for-big-data-analytics.html).
  • Social causes: The use of data science for a wide range of use cases geared toward social good is well known due to several conferences and papers that have been organized and released respectively on the topic. Examples include topics in urban analytics, power grids utilizing smart meters, criminal justice. (Source: https://dssg.uchicago.edu/data-science-for-social-good-conference-2017/agenda/).

Manufacturing and retail

The manufacturing and retail industry has used data science to designing better products, optimize pricing, and design strategic marketing techniques. Some examples include the following:

Web industry

One of the earliest beneficiaries of data science was the web industry. Empowered by the collection of user-specific data from social networks, firms around the world employ algorithms to understand user behavior and generate targeted ads. Google, one of the earliest proponents of targeted ad marketing today, earns most of its revenue from ads, more than $95 billion in 2017. (Source: https://www.statista.com/statistics/266249/advertising-revenue-of-google/). The use of data science for web-related businesses is ubiquitous today and companies such as Uber, Airbnb, Netflix, and Amazon have successfully navigated and made full use of this complex ecosystem, generating not only huge profits but also added millions of new jobs directly or indirectly as a result.

  • Targeted ads: Click through ads have been one of the prime areas of machine learning. By reading cookies saved on users' computers from various sites, other sites can assess the users interests and accordingly decide which ads to serve when they visit new sites. As per online sources, the value of internet advertising is over $1 trillion and has generated over 10 million jobs in 2017 alone. (Source: https://www.iab.com/insights/economic-value-advertising-supported-internet-ecosystem/).
  • Recommender engines: Netflix, Pandora, and other movies and audio streaming services utilize recommender engines to understand which movies or music the viewer or listener would be interested in and make recommendations. The recommendations are often based on what other users with similar tastes might have already seen and leverage recommender algorithms such as collaborative, content-based, and hybrid filtering.
  • Web design: Using A/B testing, mouse tracking, and other sophisticated techniques, web developers leverage data science to design better web pages such as landing pages and in general websites. A/B testing for instance allows developers to decide between different versions of the same web page and deploy accordingly.

Other industries

There are various other industries today that benefit from data science and as such, it has become so common that it would be impractical to list all, but at a high level, some of the others include the following:

  • Oil and natural gas for oil production
  • Meteorology for understanding weather patterns
  • Space research for detecting and/or analyzing stars and galaxies
  • Utilities for energy production and energy savings
  • Biotechnology for research and finding new cures for diseases

In general, since data science, or machine learning algorithms are not specific to any particular industry, it is entirely possible to apply algorithms to creative use cases and derive business benefits.

You have been reading a chapter from
Hands-On Data Science with R
Published in: Nov 2018
Publisher: Packt
ISBN-13: 9781789139402
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime