If Spark is already installed on the machine and SPARK_HOME is set, the findspark pip package can locate that installation and make it available to Jupyter. Install the package as follows:
pip install findspark
Otherwise, pip does not install the PySpark package by default, so to use PySpark from Jupyter, it must be installed with the following command:
pip install pyspark
For example, a business wants to know the total number of orders placed by each user. Since Cassandra's aggregation capabilities are very limited, Spark lets us perform all of the required transformations, along with sorting, to produce a cleaner report. A custom Spark and Cassandra configuration can be set after startup in a Jupyter Notebook as follows:
import os
import sys
import findspark
findspark.init()  # locate the Spark installation pointed to by SPARK_HOME
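The rest of the setup depends on how Spark is launched; a minimal sketch of one common approach follows. It assumes the DataStax spark-cassandra-connector package (the coordinates and version shown must match your Spark and Scala versions) and a Cassandra node reachable at 127.0.0.1; adjust these values for your environment.

# Pull in the Cassandra connector when the PySpark driver starts
# (assumed connector coordinates; match them to your Spark/Scala versions)
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1 '
    'pyspark-shell'
)

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('orders-report')
         .config('spark.cassandra.connection.host', '127.0.0.1')
         .getOrCreate())

With the session in place, the orders-per-user report described above comes down to a single groupBy. The keyspace, table, and column names used here (store, orders, user_id) are placeholders for illustration:

# Read the orders table from Cassandra into a DataFrame
orders = (spark.read
          .format('org.apache.spark.sql.cassandra')
          .options(keyspace='store', table='orders')
          .load())

# Count orders per user and sort so the heaviest users appear first
orders_per_user = (orders.groupBy('user_id')
                   .count()
                   .orderBy('count', ascending=False))
orders_per_user.show()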