HyperLogLog
HyperLogLog is a probabilistic algorithm for estimating the cardinality (the number of distinct elements) of a set with very low memory usage. It was introduced by Philippe Flajolet and his collaborators and is particularly useful when dealing with large datasets or when memory efficiency is a concern. The HyperLogLog algorithm approximates the cardinality of a set using a fixed amount of memory, regardless of the size of the set. It achieves this by exploiting the properties of hash functions and probabilistic counting.
The basic idea behind HyperLogLog is to hash each element of the set and determine the longest run of zeros in the binary representation of the hash values. The length of the longest run of zeros is used as an estimate of the cardinality. By averaging these estimates over multiple buckets, a more accurate cardinality estimate can be obtained.
Let’s understand this by considering an example. The problem statement is, “We need to find the count of unique visitors for a website.”
What are some naive solutions here? We can maintain a hashmap with the unique user ID as the key and, as the value, a count of how many times that particular user visited the website. This works fine for a low-scale website, but as the website scales up in terms of the number of visitors, the memory footprint grows linearly. So, for one billion visitors, we need 1 GB even if we represent each visitor by just 1 byte. Let’s explore how HyperLogLog comes to the rescue here.
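For reference, here’s a minimal sketch of this naive approach in Python (the user IDs and function names are made up for illustration):

```python
# Naive exact counting: a hashmap keyed by user ID.
# Memory grows linearly with the number of distinct visitors.
visit_counts = {}

def record_visit(user_id: str) -> None:
    visit_counts[user_id] = visit_counts.get(user_id, 0) + 1

def unique_visitors() -> int:
    return len(visit_counts)

record_visit("John_1275")
record_visit("Sarah1978")
record_visit("John_1275")  # repeat visit; doesn't add a new unique visitor
print(unique_visitors())   # 2
```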
We need to use randomness here. Let’s assume we have a perfect hash function that converts a username into a completely random binary number, with the guarantee that the same username always yields the same hash value.
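As a concrete stand-in for that perfect hash, here’s a sketch in Python that uses MD5 and keeps the low 30 bits (the 30-bit width and the helper name hash_to_bits are choices for illustration, not part of the algorithm):

```python
import hashlib

def hash_to_bits(username: str, bits: int = 30) -> str:
    """Hash a username and return its low `bits` bits as a binary string."""
    digest = int(hashlib.md5(username.encode()).hexdigest(), 16)
    return format(digest % (1 << bits), f"0{bits}b")

# The same username always yields the same bit pattern.
print(hash_to_bits("John_1275"))
```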
Let’s say we have 1 billion users. We need at least 30 bits to represent them:

u1(1,000,000,000) -> 111011100110101100101000000000
Now, let’s say we have the following:

hash("John_1275") = 111011100110101100101000001100
hash("David.raymond23") = 111011100110101100001000000010
hash("Sarah1978") = 100011100110101100101000000001
hash("John") = 111011100110101100101000001100
Let’s reframe the problem: “We need to count the unique number of random binary numbers.”
Let’s understand how we can do this by using an analogy of flipping a coin: we flip the coin until we get a T. Think about getting the sequence H, H, H, T. Getting exactly this sequence is quite unlikely, right? It has a ½ * ½ * ½ * ½ = 1/16 probability. So, on average, we must repeat the experiment 16 times before we see H, H, H, T.
To extend this, if someone tells us that the largest streak of leading heads (H) they got was L, this means that, approximately, they flipped the coin 2^(L+1) times. In the preceding example, L = 3, so 2^(3+1) = 16.
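We can check this relationship with a quick simulation (a rough sketch; the exact outcome varies from run to run):

```python
import random

def leading_heads() -> int:
    """Flip a fair coin until T appears; return the streak of leading Hs."""
    streak = 0
    while random.random() < 0.5:  # heads
        streak += 1
    return streak

# After roughly 2^(L+1) experiments, the longest observed streak
# of leading heads tends to be around L.
longest = max(leading_heads() for _ in range(16))
print(longest)  # typically around 3
```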
Let’s get back to our binary numbers (hashes of usernames):

hash("John_1275") = 111011100110101100101000001100
hash("David.raymond23") = 111011100110101100001000000010
hash("Sarah1978") = 100011100110101100101000000001
hash("John") = 111011100110101100101000001100
Instead of leading Hs, we will use ending 0s. In the preceding sample of four usernames, the longest run of 0s at the end of a hash is 2, so we likely have 2^(2+1) = 8 visitors.
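Here’s that estimate reproduced in Python (the hash values are copied from the sample above):

```python
def trailing_zeros(bits: str) -> int:
    """Count the 0s at the end of a binary string."""
    return len(bits) - len(bits.rstrip("0"))

hashes = [
    "111011100110101100101000001100",  # hash("John_1275")
    "111011100110101100001000000010",  # hash("David.raymond23")
    "100011100110101100101000000001",  # hash("Sarah1978")
    "111011100110101100101000001100",  # hash("John")
]

L = max(trailing_zeros(h) for h in hashes)  # 2
print(2 ** (L + 1))                         # 8 estimated visitors
```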
In a small sample set, this isn’t accurate, but does it become accurate at a large scale? Even at a high scale, the accuracy problem persists because there can be outliers: a single unlucky hash with a long run of ending 0s will skew the whole estimate.
To increase the accuracy, we can follow these steps (a code sketch follows the list):
- Split the incoming hash(usernames) into k buckets (in practice, based on a few bits of the hash itself).
- Calculate the maximum number of ending 0s in each of these buckets.
- Store these "ending 0" counters in the k buckets.
- Calculate L = the average of these k counters. Instead of taking the arithmetic mean, we take the harmonic mean (this is why it’s called HyperLogLog). The harmonic mean is better at ignoring outliers than the arithmetic mean:
  - The arithmetic mean of N numbers (n1, n2, …) is (n1 + n2 + n3 + …)/N.
  - The harmonic mean of N numbers (n1, n2, …) is N/(1/n1 + 1/n2 + 1/n3 + …).
- Estimate the final number of unique visitors as 2^(L+1).
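Putting the steps together, here’s a minimal sketch of this bucketed estimator in Python. It reuses the hypothetical hash_to_bits and trailing_zeros helpers from earlier, buckets each hash by its value modulo k, and skips empty buckets when averaging; a production HyperLogLog also applies bias-correction constants that are omitted here:

```python
import hashlib
from statistics import harmonic_mean

def hash_to_bits(username: str, bits: int = 30) -> str:
    digest = int(hashlib.md5(username.encode()).hexdigest(), 16)
    return format(digest % (1 << bits), f"0{bits}b")

def trailing_zeros(bits: str) -> int:
    return len(bits) - len(bits.rstrip("0"))

def estimate_unique(usernames, k: int = 10) -> float:
    """Simplified HyperLogLog-style estimate following the steps above."""
    counters = [0] * k
    for name in usernames:
        bits = hash_to_bits(name)
        bucket = int(bits, 2) % k              # route each hash to one of k buckets
        counters[bucket] = max(counters[bucket], trailing_zeros(bits))
    nonzero = [c for c in counters if c > 0]   # ignore empty buckets
    L = harmonic_mean(nonzero) if nonzero else 0
    return 2 ** (L + 1)

# A rough, order-of-magnitude estimate; this simplified formula is far
# coarser than a real HyperLogLog implementation.
print(estimate_unique(f"user{i}" for i in range(1000)))
```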
Calculating the exact cardinality of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large datasets. HyperLogLog uses significantly less memory than this, at the cost of obtaining only an approximation of the cardinality.
HyperLogLog provides a relatively small memory footprint compared to traditional methods for exact counting, such as storing each element in a set.
We can estimate the space needed by the HyperLogLog counters with a quick calculation:

2^(L+1) = 1 billion visitors
L = log2(1,000,000,000), so at most 30 ending 0's
log2(30) ≈ 5 bits

5 bits can represent the number of ending 0s, so let’s round up to a byte. Even if we use k = 10 counters, the total is just 10 bytes.
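The same arithmetic in Python, as a back-of-the-envelope check:

```python
import math

visitors = 1_000_000_000
max_ending_zeros = math.ceil(math.log2(visitors))           # 30
bits_per_counter = math.ceil(math.log2(max_ending_zeros))   # 5
k = 10                                                      # number of buckets
print(max_ending_zeros, bits_per_counter)                   # 30 5
print(f"{k} counters x 1 byte = {k} bytes")                 # 10 bytes total
```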
HyperLogLog has found applications in various domains where cardinality estimation is important, such as database systems, network traffic analysis, web analytics, and big data processing. It is widely used in distributed systems and data streaming scenarios where memory is limited and fast approximate cardinality estimation is required.