Getting Started with Hazelcast, Second Edition

Getting Started with Hazelcast, Second Edition: Get acquainted with the highly scalable data grid, Hazelcast, and learn how to bring its powerful in-memory features into your application


Chapter 1. What is Hazelcast?

Most, if not all, applications need to store some data—some applications far more than others. By holding this book in your hands and eagerly flipping through its pages, it might be safe to assume that you have previously architected, developed, or supported applications more towards the latter end of that scale. We can also take it as given that you are all too painfully familiar with the common pitfalls and issues that tend to crop up when scaling or distributing a data layer. However, to make sure that we are all up to speed, in this chapter, we shall examine the following topics:

  • Traditional approaches towards data persistence
  • How caches help improve performance but come with their own problems
  • Hazelcast's fresh approach towards the problem
  • A brief overview of the generic capabilities of Hazelcast
  • The type of problems that we can solve using Hazelcast

Starting out as usual

In most modern software systems, data is key. In traditional architectures, the role of persisting and providing access to your system's data tends to fall to a relational database. Typically, this is a monolithic beast, perhaps with a degree of replication; however, that replication tends to be there for resilience rather than for performance or load distribution.

For example, here is what a traditional architecture might look like (which hopefully looks rather familiar):

[Figure: a traditional architecture]

This presents us with a scalability problem. It is relatively easy to scale the application layer by throwing more hardware at it to increase processing capacity. However, the monolithic constraints of the data layer will only allow us to go so far before diminishing returns or resource saturation stunt further performance gains. So, what can we do to address this?

In the past, and in legacy architectures, the only solution to this issue would have been to increase the performance capability of the database infrastructure, either by buying a bigger, faster server or by further tweaking and fettling the utilization of the available resources. Both options are drastic, whether in financial cost, manpower, or both. So, what else could we do?

Data deciding to hang around

In order to improve the performance of the existing setup, we can hold copies of our data away from the primary database and use them in preference wherever possible. There are a number of different strategies that we can adopt, from transparent second-level caching layers to external key-value object storage. The details and exact use of each vary significantly, depending on the technology and its place in the architecture, but the main aim of these systems is to sit alongside the primary database infrastructure and protect it from excessive load. By reducing the overall dependency on the primary database, this is likely to improve its performance.

However, this strategy tends to be valuable only as a short-term solution, effectively buying us a little more time before the database once again starts to reach its saturation point. The other downside is that it only protects the database from read-based load. If an application is predominantly write-heavy, this strategy has very little to offer.
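The read/write asymmetry described above can be sketched as a minimal cache-aside layer in plain Java. This is an illustration under stated assumptions, not any particular product's API: `SlowDatabase` and `CacheAside` are hypothetical names, and the "database" is just an in-memory map that counts the requests it receives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the primary database; counts the load it receives.
class SlowDatabase {
    private final Map<String, String> rows = new HashMap<>();
    int reads = 0;
    int writes = 0;

    String read(String key) {
        reads++;
        return rows.get(key);
    }

    void write(String key, String value) {
        writes++;
        rows.put(key, value);
    }
}

// Cache-aside layer (hypothetical): reads prefer the local copy,
// but every write still goes to the database.
class CacheAside {
    private final SlowDatabase db;
    private final Map<String, String> cache = new HashMap<>();

    CacheAside(SlowDatabase db) {
        this.db = db;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value == null) {            // cache miss: fall back to the database
            value = db.read(key);
            if (value != null) {
                cache.put(key, value);  // keep a copy for subsequent reads
            }
        }
        return value;
    }

    void put(String key, String value) {
        db.write(key, value);           // writes are not absorbed by the cache
        cache.put(key, value);
    }
}
```

Repeated reads of one key cost the database a single query, while every write still reaches it, which is why this strategy does little for a write-heavy application.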

So, the expanded architecture looks like the following figure:

[Figure: the architecture expanded with a caching layer alongside the primary database]

Therein lies the problem

Insulating the database from the read load introduces a new problem: cache consistency. How does the local data cache deal with data changing underneath it in the primary database? The answer is rather disappointing: it can't! How badly this shows up will largely depend on the data needs of the application and how frequently the data changes, but typically, caching systems operate in one of the following two modes to combat the problem:

  • Time-bound cache: This holds entries for a defined period (time-to-live, popularly abbreviated as TTL)
  • Write-through cache: This holds entries until they are invalidated by subsequent updates

Time-bound caches almost always have consistency issues, but at least the amount of time that an issue can persist is limited to the expiry time of each entry. However, we must also consider how the application accesses this data: if a particular entry is read less often than once per expiry period, it will usually have expired before it is needed again, so the cache provides no real benefit.
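A time-bound cache can be sketched in a few lines. In this hypothetical `TtlCache`, each entry records its expiry time, and an injectable clock (an assumption made so expiry can be simulated deterministically) exposes the access-frequency problem: an entry read less often than once per TTL is always a miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Time-bound (TTL) cache: each entry is served only until its expiry time.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;   // injectable clock so expiry can be simulated
    int hits = 0;
    int misses = 0;

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns the cached value, or null if the entry is absent or has expired.
    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null || clock.getAsLong() >= e.expiresAt) {
            entries.remove(key);
            misses++;
            return null;
        }
        hits++;
        return e.value;
    }
}
```

For example, with a 1-second TTL and a read every 2 seconds, every lookup misses and falls through to the primary store.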

Write-through caches are consistent in isolation and can be configured to offer strict consistency, but if multiple write-through caches exist within the overall architecture, then there will be consistency issues between them. We can avoid this by having an intelligent cache that features a communication mechanism between the nodes that can propagate entry invalidations to the other nodes.

In practice, an ideal cache will feature a combination of both features, so that entries are held for a known maximum time but invalidations are also passed around as changes are made.
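Combining the two modes might look like the following sketch: each hypothetical `CacheNode` writes through to a shared store (simulated here by a plain `Map`), bounds every entry by a TTL, and broadcasts invalidations to its peers when it writes. Peers are wired up directly in-process as a simplifying assumption; a real system would need a network protocol for this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// One caching node: write-through to a shared database, entries bounded by a TTL,
// and invalidations broadcast to peer nodes whenever this node writes.
class CacheNode {
    private final Map<String, String> database;          // shared primary store (simulated)
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    private final List<CacheNode> peers = new ArrayList<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    CacheNode(Map<String, String> database, long ttlMillis, LongSupplier clock) {
        this.database = database;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void connect(CacheNode peer) {
        peers.add(peer);
    }

    String get(String key) {
        Long expiry = expiresAt.get(key);
        if (expiry == null || clock.getAsLong() >= expiry) {
            // Missing or expired: reload from the database and restart the TTL.
            String fresh = database.get(key);
            cacheLocally(key, fresh);
            return fresh;
        }
        return values.get(key);
    }

    void put(String key, String value) {
        database.put(key, value);     // write-through to the primary store
        cacheLocally(key, value);
        for (CacheNode peer : peers) {
            peer.invalidate(key);     // tell other nodes their copy is stale
        }
    }

    void invalidate(String key) {
        values.remove(key);
        expiresAt.remove(key);
    }

    private void cacheLocally(String key, String value) {
        values.put(key, value);
        expiresAt.put(key, clock.getAsLong() + ttlMillis);
    }
}
```

A write made through any node is seen by the others on their next read, while a change made directly to the database, bypassing the caches, is only picked up once the TTL expires.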

So, the evolved architecture will look like the following figure:

[Figure: the evolved architecture, with caches passing invalidations between nodes]

So far, we have looked at the general issues in scaling a data layer and introduced strategies to help combat the trade-offs that we will encounter along the way. However, the real world isn't this simple. There are various cache servers and in-memory database products in this area (such as memcached or Ehcache). Most of these, though, are stand-alone single instances, perhaps with some degree of distribution bolted on or provided by other supporting technologies. This tends to bring back the same issues that we experienced with the primary database: if the product is a single instance, we may encounter resource saturation or capacity issues; if its distribution doesn't provide consistency control, we may end up with inconsistent data, which might harm our application.

Breaking the mould

Hazelcast is a radical, new approach towards data that was designed from the ground up around distribution. It embraces a new, scalable way of thinking in that data should be shared for resilience and performance while allowing us to configure the trade-offs surrounding consistency, as the data requirements dictate.

The first major feature of Hazelcast is its masterless nature. Each node is configured to be functionally the same and operates in a peer-to-peer manner. The oldest node in the cluster is the de facto leader. This node manages the membership by automatically deciding which node is responsible for which data. As new nodes join or existing nodes drop out, this process is repeated and the cluster rebalances accordingly. This makes it incredibly simple to get Hazelcast up and running, as the system is self-discovering, self-clustering, and works straight out of the box.

The second feature of Hazelcast to remember is that it holds data entirely in memory. This makes it incredibly fast, but this speed comes at a price: when a node is shut down, all the data held by it is lost. We combat this risk to resilience through replication, by holding a number of copies of each piece of data across multiple nodes, so that if a single node fails, the overall cluster does not suffer any data loss. By default, the backup count is 1, so we immediately enjoy basic resilience. However, don't pull the plug on more than one node at a time until the cluster has reacted to the change in membership and re-established the appropriate number of backup copies of data.
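Why pulling two nodes at once is risky with a backup count of 1 can be shown with a simplified replica-placement model. `ReplicaPlacement` is a hypothetical class, and placing backups on the next nodes in a ring is an assumption for illustration, not Hazelcast's actual assignment algorithm; the model also ignores the re-replication a real cluster performs after each failure.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified replica placement: partition p is owned by node p % n and backed up
// on the next backupCount nodes in the ring (illustrative scheme only).
class ReplicaPlacement {
    static List<Set<Integer>> place(int partitions, int nodes, int backupCount) {
        List<Set<Integer>> replicas = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            Set<Integer> holders = new HashSet<>();
            for (int r = 0; r <= backupCount; r++) {   // r == 0 is the owner
                holders.add((p + r) % nodes);
            }
            replicas.add(holders);
        }
        return replicas;
    }

    // Counts partitions whose every replica lived on one of the failed nodes.
    static int lostPartitions(List<Set<Integer>> replicas, Set<Integer> failed) {
        int lost = 0;
        for (Set<Integer> holders : replicas) {
            if (failed.containsAll(holders)) {
                lost++;
            }
        }
        return lost;
    }
}
```

With 12 partitions on 4 nodes and one backup, losing any single node loses nothing, but losing nodes 0 and 1 together loses every partition whose owner and backup were both on them; raising the backup count to 2 covers that case.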

So, when we introduce our new peer-to-peer distributed cluster, we get something that looks like the following figure:

[Figure: a peer-to-peer distributed Hazelcast cluster]

Note

A distributed cache is by far the most powerful option, as it can scale up in response to changes in the application's needs.

We previously identified that multi-node caches tend to suffer from either saturation or consistency issues. In the case of Hazelcast, each node is the owner of, and hence responsible for, a subset of the partitions of the overall data, so the load is spread fairly across the cluster. Therefore, any saturation that exists will be at the cluster level rather than in any individual node, and we can address it simply by adding more nodes. In terms of consistency, the backup copies of the data are internal to Hazelcast by default and are not directly used, so we enjoy strict consistency. This does mean that we have to interact with a specific node to retrieve or update a particular piece of data. However, exactly which node that is remains an internal operational detail that can vary over time; we, as developers, never actually need to know.

Hazelcast is clearly not trying to replace the primary database entirely. Its focus and feature set differ from those of the primary database (which has more transactionally stateful capabilities, long-term persistent storage, and so on). However, the more data and processes we master within Hazelcast, the less dependent we become on this constrained resource. Thus, we remove the potential need to change the underlying database systems.

If you imagine the scenario where the data is split into a number of partitions, and each partition slice is owned by a node and backed up on another, the interactions will look like the following figure:

[Figure: data partitions, each owned by one node and backed up on another]

This means that for data belonging to Partition 1, our application will have to communicate with Node 1; for data belonging to Partition 2, with Node 2; and so on. The allocation of partitions to nodes is dynamic. In practice, there are typically more partitions than nodes, so each node owns a number of different partitions and holds backups for a number of others. As mentioned before, this is an internal operational detail and our application does not need to know it. However, it is important that we understand what is going on behind the scenes.
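The two-level lookup (key to partition, partition to owning node) can be sketched with a hypothetical `PartitionTable`. The key-to-partition step never changes, while ownership is recomputed as members join or leave. Hazelcast defaults to 271 partitions, which the usage below borrows; hashing with `hashCode()` and round-robin ownership are simplifying assumptions, and this naive rebalance moves far more partitions than a real migration would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Two-level lookup: key -> partition (fixed) -> owning node (changes with membership).
class PartitionTable {
    private final int partitionCount;
    private final List<String> members = new ArrayList<>();

    PartitionTable(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    void join(String member) {
        members.add(member);
    }

    void leave(String member) {
        members.remove(member);
    }

    // A key always maps to the same partition, regardless of cluster size.
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Ownership is recomputed from the current member list (a crude rebalance).
    String ownerOf(Object key) {
        return members.get(partitionOf(key) % members.size());
    }

    // How many partitions each member currently owns.
    Map<String, Integer> ownershipCounts() {
        Map<String, Integer> counts = new HashMap<>();
        for (int p = 0; p < partitionCount; p++) {
            counts.merge(members.get(p % members.size()), 1, Integer::sum);
        }
        return counts;
    }
}
```

Adding a fourth node changes which node owns each partition but not which partition a key belongs to, which is why the application never needs to track ownership itself.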

Moving to new ground

So far, we have talked mostly about simple persisted data and caches, but in reality, we should not think of Hazelcast as purely a cache. It is much more powerful than just that. It is an in-memory data grid that supports a number of distributed collections, processors, and features. We can load the data from various sources into differing structures, send messages across the cluster, perform analytical processing on the stored data, take out locks to guard against concurrent activity, and listen to the goings-on inside the workings of the cluster. Most of these implementations correspond to a standard Java collection or function in a manner that is comparable to other similar technologies. However, in Hazelcast, the distribution and resilience capabilities are already built in.

  • Standard utility collections:
    • Map: Key-value pairs
    • List: A collection of objects
    • Set: Non-duplicated collection
    • Queue: Offer/poll FIFO collection
  • Specialized collection:
    • Multi-Map: Key–collection pairs
  • Lock: Cluster-wide mutex
  • Topic: Publish and subscribe messaging
  • Concurrency utilities:
    • AtomicNumber: Cluster-wide atomic counter
    • IdGenerator: Cluster-wide unique identifier generation
    • Semaphore: Concurrency limitation
    • CountdownLatch: Concurrent activity gatekeeping
  • Listeners: Notify the application as things happen

Playing around with our data

In addition to data storage collections, Hazelcast also features a distributed executor service that allows us to create runnable tasks. These tasks can be run anywhere on the cluster to obtain, manipulate, and store results. We can have a number of collections that contain source data, spin up tasks to process the disparate data (for example, averaging or aggregating), and output the results into another collection for consumption.
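The executor pattern described above, where tasks summarise source data and the combined result is stored in another collection, can be sketched locally with the JDK's `ExecutorService`. Hazelcast's distributed executor service is modelled on that standard interface, but this sketch runs in a single JVM, and `DistributedAverage`, the `List<List<Integer>>` partition layout, and the `"average"` result key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Each task summarises one slice of the source data; the caller combines the
// partial results and writes the answer into an output collection.
class DistributedAverage {
    static double average(List<List<Integer>> partitions, Map<String, Double> results) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Future<long[]>> partials = new ArrayList<>();
            for (List<Integer> slice : partitions) {
                // Each task returns {sum, count} for its slice rather than the raw data.
                partials.add(executor.submit(() -> {
                    long sliceSum = 0;
                    for (int v : slice) {
                        sliceSum += v;
                    }
                    return new long[] {sliceSum, slice.size()};
                }));
            }
            long sum = 0;
            long count = 0;
            for (Future<long[]> f : partials) {
                long[] part = f.get();
                sum += part[0];
                count += part[1];
            }
            double avg = count == 0 ? 0.0 : (double) sum / count;
            results.put("average", avg);   // publish to the output collection
            return avg;
        } catch (InterruptedException | ExecutionException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            throw new IllegalStateException("averaging task failed", e);
        } finally {
            executor.shutdown();
        }
    }
}
```

Each task returns only a `{sum, count}` pair rather than its raw data, so the amount of data sent back to the caller stays small however large each slice grows.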

More recently, alongside this general-purpose capability, Hazelcast has introduced a few extra ways to interact directly with data. The MapReduce functionality allows us to build data-centric tasks to search, filter, and process held data to find potential insights within it. You may have heard of this approach before; extracting value from raw data in this way is at the heart of what big data is all about (forgive the excessive buzzword cliché). While MapReduce focuses more on generating additional information, the EntryProcessor interface enables us to quickly and safely manipulate data in place throughout the cluster: on single entries, on whole collections, or selectively based on search criteria.
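The difference between the two styles can be illustrated with a single-process sketch. `DataGridOps` is hypothetical and is not Hazelcast's MapReduce or `EntryProcessor` API: `wordCount` shows a map phase producing per-partition partial counts that a reduce phase merges, and `processEntries` shows the entry-processing idea of updating matching entries in place rather than reading them out, modifying them, and writing them back.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Predicate;

class DataGridOps {
    // MapReduce-style count: the map phase runs per partition and emits partial
    // counts; the reduce phase merges the partials by key.
    static Map<String, Integer> wordCount(List<List<String>> partitions) {
        Map<String, Integer> reduced = new HashMap<>();
        for (List<String> partition : partitions) {
            Map<String, Integer> partial = new HashMap<>();   // map (with local combine)
            for (String line : partition) {
                for (String word : line.toLowerCase().split("\\s+")) {
                    if (!word.isEmpty()) {
                        partial.merge(word, 1, Integer::sum);
                    }
                }
            }
            partial.forEach((w, n) -> reduced.merge(w, n, Integer::sum));   // reduce
        }
        return reduced;
    }

    // EntryProcessor-style update: modify matching entries in place and
    // return how many were changed.
    static <K, V> int processEntries(Map<K, V> map, Predicate<V> filter,
                                     BiFunction<K, V, V> processor) {
        int updated = 0;
        for (Map.Entry<K, V> e : map.entrySet()) {
            if (filter.test(e.getValue())) {
                e.setValue(processor.apply(e.getKey(), e.getValue()));
                updated++;
            }
        }
        return updated;
    }
}
```

In Hazelcast itself, an entry processor runs on the member that owns each entry, so the data does not have to travel to the caller and back.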

Again, just as we can scale up data capacity by adding more nodes, we can increase processing capacity in exactly the same way. This means that if we build our data layer around Hazelcast and our application's needs rapidly increase, we can keep adding nodes to satisfy seemingly ever-growing demands, all without having to redesign or rearchitect the actual application.

Summary

With Hazelcast, we are dealing more with a technology than a server product; it is a distributed core library to build a system around, rather than trying to retrospectively bolt on distribution or blindly connecting to an off-the-shelf commercial system. While it is possible (and in some simple cases, quite practical) to run Hazelcast as a separate server-like cluster and connect to it remotely from our application, some of the greatest benefits come when we develop our own classes and tasks to run within and alongside it.

With such a large range of generic capabilities, there is an entire world of problems that Hazelcast can help us solve. We can use the technology in many ways. We can use it in isolation to hold data such as user sessions, run it alongside a more long-term persistent data store to increase capacity, or shift towards performing high performance and scalable operations on our data. By moving more and more responsibility from monolithic systems to such a generic, scalable one, there is no limit to the performance that we can unlock.

This technology not only helps us to keep the application and data layers separate, but also enables us to scale them up independently as our application grows. This will prevent our application from becoming a victim of its own success while taking the world by storm.

In the next chapter, we shall start using the technology and investigate the data collections that we discovered in this chapter.


Description

This book is a great introduction for Java developers, software architects, or DevOps looking to enable scalable and agile data within their applications. Providing in-memory object storage, cluster-wide state and messaging, or even scalable task execution, Hazelcast helps solve a number of issues that have troubled technologists for years.

What you will learn

  • Store numerous data types in different distributed collections
  • Set up a cluster from the ground up
  • Work with truly distributed queues and topics for cluster-wide messaging
  • Make your application more resilient by listening to cluster internals
  • Run tasks within and alongside your stored data
  • Filter and search your data using MapReduce jobs
  • Discover the new JCache standard and one of its first implementations

Product Details

Publication date : Jul 30, 2015
Length: 162 pages
Edition : 1st
Language : English
ISBN-13 : 9781783554058

Table of Contents

13 Chapters
1. What is Hazelcast?
2. Getting off the Ground
3. Going Concurrent
4. Divide and Conquer
5. Listening Out
6. Spreading the Load
7. Gathering Results
8. Typical Deployments
9. From the Outside Looking In
10. Going Global
11. Playing Well with Others
A. Configuration Summary
Index

Customer reviews

Rating distribution: 3.3 out of 5 (3 ratings)
5 star 33.3%
4 star 33.3%
3 star 0%
2 star 0%
1 star 33.3%

Chris Everett, Sep 08, 2015, rated 5/5
This book is a great introduction for Java developers, software architects, or DevOps looking to enable scalable and agile data within their applications. Hands-on examples to progressively walk you through all the powerful features and capabilities on offer. Book also covers distributed task execution, in-place data manipulations and big data analytical processing using MapReduce.
Amazon Verified review

Dustin Marx, Aug 22, 2015, rated 4/5
The second edition of "Getting Started with Hazelcast" (2015) adds three new chapters that cover trendy topics such as Big Data; Cloud; use of Hazelcast with popular frameworks such as Spring and Hibernate, and JMX, and how to use Hazelcast as an implementation of JCache (JSR 107). The book's title is apropos as the book is focused on "getting started with Hazelcast." The book serves as a nice complement for the Hazelcast Documentation (Reference Manual) and "Mastering Hazelcast" (a download available from Hazelcast for those who provide their name and email address). "Getting Started with Hazelcast" provides a narrative of what makes a technology like Hazelcast desirable and how Hazelcast meets different needs. In this way, the book helps familiarize the reader with concepts and vernacular of Hazelcast. Greater details on Hazelcast can be found online in forums and blogs and in "Mastering Hazelcast." "Getting Started with Hazelcast" does not cover the differences between standard Hazelcast and Hazelcast Enterprise.
Amazon Verified review

kalyan, Nov 17, 2017, rated 1/5
This book does not cover all topics, neither any topic in detail.
Amazon Verified review