
Excerpt from Learning Apache Apex: Real-time streaming applications with Apex

Product type: Paperback
Published: Nov 2017
Publisher: Packt
ISBN-13: 9781788296403
Length: 290 pages
Edition: 1st Edition


Authors: Munagala V. Ramanath, David Yan, Ananth Gundabattula, Thomas Weise, Kenneth Knowles


Table of Contents

Preface

1. Introduction to Apex

2. Getting Started with Application Development

3. The Apex Library

4. Scalability, Low Latency, and Performance

5. Fault Tolerance and Reliability

6. Example Project – Real-Time Aggregation and Visualization

7. Example Project – Real-Time Ride Service Data Processing

8. Example Project – ETL Using SQL

9. Introduction to Apache Beam

10. The Future of Stream Processing

Summary

This chapter has been a whirlwind tour of the core concepts of Apache Beam and of how to run a basic WordCount pipeline using Apache Apex as the backend. Specifically, we looked at the following topics:

The technical vision of Beam—any language on any data processing engine
The main parallel processing patterns of Beam—ParDo and GroupByKey
The features of the Beam model that support unbounded data—windowing, watermarks, and triggers
A basic Beam pipeline to count occurrences of words
Launching a Beam pipeline using Apache Apex on a YARN cluster
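The core transforms recapped above can be illustrated without a cluster. The following Python sketch is a hypothetical, in-memory model of the Beam semantics, not the Beam API: `par_do` applies a per-element function that may emit zero or more outputs (the role a `DoFn` plays in `ParDo`), and `group_by_key` collects all values that share a key. WordCount is then a ParDo that emits `(word, 1)` pairs, a GroupByKey, and a second ParDo that sums each group. The `windowed_word_count` function adds fixed windows and assumes a single, externally supplied watermark value: a window's counts are emitted only once the watermark has reached the window's end, roughly like Beam's default trigger, which fires when the watermark passes the end of the window. All function names and the tokenizing regex are illustrative assumptions; in the Java SDK the corresponding building blocks are `ParDo.of(...)`, `GroupByKey.create()`, and `Window.into(FixedWindows.of(...))`.

```python
import re
from collections import defaultdict


def par_do(elements, fn):
    """Apply fn to every element; fn may yield zero or more outputs, like a DoFn."""
    outputs = []
    for element in elements:
        outputs.extend(fn(element))
    return outputs


def group_by_key(pairs):
    """Collect all values that share a key (a bounded, in-memory GroupByKey)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def extract_words(line):
    # Hypothetical tokenizer: runs of lowercase letters and apostrophes are words.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield (word, 1)


def sum_counts(key_and_ones):
    # Emit one (key, total) pair for each grouped key.
    key, ones = key_and_ones
    yield (key, sum(ones))


def word_count(lines):
    # ParDo -> GroupByKey -> ParDo: the shape of the classic WordCount pipeline.
    grouped = group_by_key(par_do(lines, extract_words))
    return dict(par_do(grouped.items(), sum_counts))


def fixed_window(timestamp, size):
    """Return the [start, end) fixed window that contains timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def windowed_word_count(events, size, watermark):
    """Count words per fixed window; events are (timestamp, line) pairs.

    A window's counts are emitted only when the watermark has reached the
    window's end, meaning no more on-time data is expected for it.
    """
    def key_by_window(event):
        timestamp, line = event
        window = fixed_window(timestamp, size)
        return [((window, word), one) for word, one in extract_words(line)]

    grouped = group_by_key(par_do(events, key_by_window))
    return {
        key: sum(ones)
        for key, ones in grouped.items()
        if key[0][1] <= watermark  # key[0] is the window, key[0][1] its end
    }


print(word_count(["the cat", "The dog"]))
# {'the': 2, 'cat': 1, 'dog': 1}
print(windowed_word_count([(0, "a b"), (5, "a"), (12, "b")], size=10, watermark=10))
# {((0, 10), 'a'): 2, ((0, 10), 'b'): 1}
```

Because the watermark in this sketch is a fixed input rather than an estimate that advances as data arrives, it models only on-time results; late data, allowed lateness, and non-default triggers are not represented.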

For more details on both Beam and the Apex runner for Beam, visit the Beam website at https://beam.apache.org. Also, follow @ApacheBeam on Twitter and join our user mailing list at user@beam.apache.org by following the instructions at https://beam.apache.org/get-started/support/.


Authors (5)

Ananth Gundabattula

Ananth is a senior application architect in the Decisioning and Advanced Analytics architecture team at the Commonwealth Bank of Australia (CBA). Ananth holds a PhD in computer science security and is interested in all things data, including low-latency distributed processing systems, machine learning, and data engineering. He holds three patents granted by the USPTO and has one application pending. Prior to joining CBA, he was an architect at Threatmetrix and a member of the core team that scaled the Threatmetrix architecture to 100 million transactions per day at very low latency using Cassandra, ZooKeeper, and Kafka. He also migrated the Threatmetrix data warehouse to a next-generation architecture based on Hadoop and Impala. Prior to Threatmetrix, he worked for IBM Software Labs and IBM CIO Labs, enabling some of the first IBM CIO projects to onboard the HBase, Hadoop, and Mahout stack. Ananth is a committer for Apache Apex and is currently working on next-generation architectures for the CBA fraud platform and the Advanced Analytics Omnia platform.


Thomas Weise

Thomas Weise is the Apache Apex PMC Chair and a cofounder at Atrato. Earlier, he worked at a number of other technology companies in the San Francisco Bay Area, including DataTorrent, where he was a cofounder of the Apex project. Thomas is also a committer to Apache Beam and has contributed to several other ecosystem projects. He has been working on distributed systems for 20 years and has spoken at international big data conferences. Thomas received the degree of Diplom-Informatiker (MSc in computer science) from TU Dresden, Germany. He can be reached on Twitter at @thweise.


Munagala V. Ramanath

Dr. Munagala V. Ramanath received his PhD in Computer Science from the University of Wisconsin, USA, and an MSc in Mathematics from Carleton University, Ottawa, Canada. He then taught Computer Science courses as an Assistant/Associate Professor at the University of Western Ontario in Canada for a few years before moving to industry. Since then, he has worked as a senior software engineer at a number of technology companies in California, including SeeBeyond, EMC, Sun Microsystems, DataTorrent, and Cloudera. He has published papers in peer-reviewed journals in several areas, including code optimization, graph theory, and image processing.


David Yan

David Yan is based in Silicon Valley, California. He is a senior software engineer at Google. Prior to Google, he worked at DataTorrent, Yahoo!, and the Jet Propulsion Laboratory. David holds a Master of Science in Computer Science from Stanford University and a Bachelor of Science in Electrical Engineering and Computer Science from the University of California, Berkeley.


Kenneth Knowles

Kenneth Knowles is a founding PMC member of Apache Beam. Kenn has been working on Google Cloud Dataflow—Google's Beam backend—since 2014. Prior to that, he built backends for startups such as Cityspan, Inkling, and Dimagi. Kenn holds a PhD in Programming Language Theory from the University of California, Santa Cruz.
