Search Results for 'Data Engineering' (182 products)

Data Engineering with Databricks Cookbook
Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake
Description
Written by a Senior Solutions Architect at Databricks, Data Engineering with Databricks Cookbook will show you how to effectively use Apache Spark, Delta Lake, and Databricks for data engineering, starting with a comprehensive introduction to data ingestion and loading with Apache Spark. What makes this book unique is its recipe-based approach, which will help you put your knowledge to use straight away and tackle common problems. You’ll be introduced to various data manipulation and data transformation solutions, find out how to manage and optimize Delta tables, and get to grips with ingesting and processing streaming data. The book will also show you how to troubleshoot and improve the performance of Apache Spark apps and Delta Lake. Advanced recipes later in the book will teach you how to use Databricks to implement DataOps and DevOps practices, as well as how to orchestrate and schedule data pipelines using Databricks Workflows. You’ll also go through the full process of setting up and configuring Unity Catalog for data governance. By the end of this book, you’ll be well-versed in building reliable and scalable data pipelines using modern data engineering technologies.
May 2024 438 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Pulkit Chadha
Purchase Options
eBook $27.98 $39.99
Paperback $49.99
Data Engineering with Google Cloud Platform
A guide to leveling up as a data engineer by building a scalable data platform with Google Cloud
Description
The second edition of Data Engineering with Google Cloud builds upon the success of the first edition by offering enhanced clarity and depth to data professionals navigating the intricate landscape of data engineering. Beyond its foundational lessons, this new edition delves into the essential realm of data governance within Google Cloud, providing you with invaluable insights into managing and optimizing data resources effectively. Written by a Data Strategic Cloud Engineer at Google, this book helps you stay ahead of the curve by guiding you through the latest technological advancements in the Google Cloud ecosystem. You’ll cover essential aspects, from exploring Cloud Composer 2 to the evolution of Airflow 2.5. Additionally, you’ll explore how to work with cutting-edge tools like Dataform, DLP, Dataplex, Dataproc Serverless, and Datastream to perform data governance on datasets. By the end of this book, you'll be equipped to navigate the ever-evolving world of data engineering on Google Cloud, from foundational principles to cutting-edge practices.
Apr 2024 476 pages
Authors
Author Adi Wijaya
Purchase Options
eBook $22.99 $33.99
Paperback $41.99
Data Engineering with AWS
Acquire the skills to design and build AWS-based data transformation pipelines like a pro
Description
This book, authored by a seasoned Senior Data Architect with 25 years of experience, aims to help you achieve proficiency in using the AWS ecosystem for data engineering. This revised edition provides updates in every chapter to cover the latest AWS services and features, takes a refreshed look at data governance, and includes a brand-new section on building modern data platforms, which covers implementing a data mesh approach, using open-table formats (such as Apache Iceberg), and applying DataOps for automation and observability. You'll begin by reviewing the key concepts and essential AWS tools in a data engineer's toolkit and getting acquainted with modern data management approaches. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how that transformed data is used by various data consumers. You’ll learn how to ensure strong data governance and how to populate data marts and data warehouses, along with how a data lakehouse fits into the picture. After that, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. Then, you'll explore how the power of machine learning and artificial intelligence can be used to draw new insights from data. In the final chapters, you'll discover transactional data lakes, data meshes, and how to build a cutting-edge data platform on AWS. By the end of this AWS book, you'll be able to execute data engineering tasks and implement a data pipeline on AWS like a pro!
Oct 2023 636 pages
Authors
Author Gareth Eagar
Purchase Options
eBook $28.99 $41.99
Paperback $51.99
Data Engineering Best Practices
Architect robust and cost-effective data solutions in the cloud era
Description
Revolutionize your approach to data processing in the fast-paced business landscape with this essential guide to data engineering. Discover the power of scalable, efficient, and secure data solutions through expert guidance on data engineering principles and techniques. Written by two industry experts with over 60 years of combined experience, it offers deep insights into best practices, architecture, agile processes, and cloud-based pipelines. You’ll start by defining the challenges data engineers face and understand how this agile and future-proof comprehensive data solution architecture addresses them. As you explore the extensive toolkit, mastering the capabilities of various instruments, you’ll gain the knowledge needed for independent research. Covering everything you need, right from data engineering fundamentals, the guide uses real-world examples to illustrate potential solutions. It elevates your skills to architect scalable data systems, implement agile development processes, and design cloud-based data pipelines. The book further equips you with the knowledge to harness serverless computing and microservices to build resilient data applications. By the end, you'll be armed with the expertise to design and deliver high-performance data engineering solutions that are not only robust, efficient, and secure but also future-ready.
Oct 2024 550 pages
Authors
Author Richard J. Schiller
Purchase Options
eBook $27.98 $39.99
Paperback $49.99
Data Engineering with dbt
A practical guide to building a cloud-based, pragmatic, and dependable data platform with SQL
Description
dbt Cloud helps professional analytics engineers automate the application of powerful and proven patterns to transform data from ingestion to delivery, enabling real DataOps. This book begins by introducing you to dbt and its role in the data stack, along with how it uses simple SQL to build your data platform, helping you and your team work better together. You’ll find out how to leverage data modeling, data quality, master data management, and more to build a simple-to-understand and future-proof solution. As you advance, you’ll explore the modern data stack, understand how data-related careers are changing, and see how dbt enables this transition into the emerging role of an analytics engineer. The chapters help you build a sample project using the free version of dbt Cloud, Snowflake, and GitHub to create a professional DevOps setup with continuous integration, automated deployment, ELT run, scheduling, and monitoring, solving practical cases you encounter in your daily work. By the end of this dbt book, you’ll be able to build an end-to-end pragmatic data platform by ingesting data exported from your source systems, coding the needed transformations, including master data and the desired business rules, and building well-formed dimensional models or wide tables that’ll enable you to build reports with the BI tool of your choice.
Jun 2023 578 pages
Authors
Author Zagni
Purchase Options
eBook $27.98 $39.99
Paperback $49.99
Data Engineering with Python
Work with massive datasets to design data models and automate data pipelines using Python
Description
Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book will help you to explore various tools and methods that are used for understanding the data engineering process using Python. The book will show you how to tackle challenges commonly faced in different aspects of data engineering. You’ll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines to work with large datasets. You’ll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you'll discover how to work with big data of varying complexity and production databases, and build data pipelines. Using real-world examples, you’ll build architectures on which you’ll learn how to deploy data pipelines. By the end of this Python book, you’ll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production.
Oct 2020 356 pages
Authors
Author Paul Crickard
Purchase Options
eBook $28.99 $41.99
Paperback $51.99
Cracking the Data Engineering Interview
Land your dream job with the help of resume-building tips, over 100 mock questions, and a unique portfolio
Description
Preparing for a data engineering interview can often get overwhelming due to the abundance of tools and technologies, leaving you struggling to prioritize which ones to focus on. This hands-on guide provides you with the essential foundational and advanced knowledge needed to simplify your learning journey. The book begins by helping you gain a clear understanding of the nature of data engineering and how it differs from organization to organization. As you progress through the chapters, you’ll receive expert advice, practical tips, and real-world insights on everything from creating a resume and cover letter to networking and negotiating your salary. The chapters also offer refresher training on data engineering essentials, including data modeling, database architecture, ETL processes, data warehousing, cloud computing, big data, and machine learning. As you advance, you’ll gain a holistic view by exploring continuous integration/continuous delivery (CI/CD), data security, and privacy. Finally, the book will help you practice with case studies, mock interviews, and behavioral questions. By the end of this book, you will have a clear understanding of what is required to succeed in an interview for a data engineering role.
Nov 2023 196 pages
Authors
Author Bryan
Purchase Options
eBook $15.99 $23.99
Paperback $29.99
Data Engineering with Apache Spark, Delta Lake, and Lakehouse
Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way
Description
In the world of ever-changing data and schemas, it is important to build data pipelines that can auto-adjust to changes. This book will help you build scalable data platforms that managers, data scientists, and data analysts can rely on. Starting with an introduction to data engineering, along with its key concepts and architectures, this book will show you how to use Microsoft Azure Cloud services effectively for data engineering. You'll cover data lake design patterns and the different stages through which the data needs to flow in a typical data lake. Once you've explored the main features of Delta Lake to build data lakes with fast performance and governance in mind, you'll advance to implementing the lambda architecture using Delta Lake. Packed with practical examples and code snippets, this book takes you through real-world examples based on production scenarios faced by the author in his 10 years of experience working with big data. Finally, you'll cover data lake deployment strategies that play an important role in provisioning the cloud resources and deploying the data pipelines in a repeatable and continuous way. By the end of this data engineering book, you'll know how to effectively deal with ever-changing data and create scalable data pipelines to streamline data science, ML, and artificial intelligence (AI) tasks.
Oct 2021 480 pages
Authors
Author Kukreja
Purchase Options
eBook $27.98 $39.99
Paperback $48.99
Data Engineering with Scala and Spark
Build streaming and batch pipelines that process massive amounts of data using Scala
Description
Most data engineers know that performance issues in a distributed computing environment can easily reduce the overall efficiency and effectiveness of data engineering tasks. While Python remains a popular choice for data engineering due to its ease of use, Scala shines in scenarios where the performance of distributed data processing is paramount. This book will teach you how to leverage the Scala programming language on the Spark framework and use the latest cloud technologies to build continuous and triggered data pipelines. You’ll do this by setting up a data engineering environment for local development and scalable distributed cloud deployments using data engineering best practices, test-driven development, and CI/CD. You’ll also get to grips with the DataFrame API, Dataset API, and Spark SQL API and how to use them. Data profiling and quality in Scala will also be covered, alongside techniques for orchestrating and performance tuning your end-to-end pipelines to deliver data to your end users. By the end of this book, you will be able to build streaming and batch data pipelines using Scala while following software engineering best practices.
Jan 2024 300 pages
Authors
Author Eric Tome
Purchase Options
eBook $20.98 $29.99
Paperback $36.99
Simplifying Data Engineering and Analytics with Delta
Create analytics-ready data that fuels artificial intelligence and business intelligence
Description
Delta helps you generate reliable insights at scale and simplifies architecture around data pipelines, allowing you to focus primarily on refining the use cases being worked on. This is especially important when you consider that existing architecture is frequently reused for new use cases. In this book, you’ll learn about the principles of distributed computing, data modeling techniques, and big data design patterns and templates that help solve end-to-end data flow problems for common scenarios and are reusable across use cases and industry verticals. You’ll also learn how to recover from errors and the best practices around handling structured, semi-structured, and unstructured data using Delta. After that, you’ll get to grips with features such as ACID transactions on big data, disciplined schema evolution, time travel to help rewind a dataset to a different time or version, and unified batch and streaming capabilities that will help you build agile and robust data products. By the end of this Delta book, you’ll be able to use Delta as the foundational block for creating analytics-ready data that fuels all AI/BI use cases.
Jul 2022 334 pages
Authors
Author Anindita Mahapatra
Purchase Options
eBook $25.99 $37.99
Paperback $46.99
Data Engineering with Google Cloud Platform
A practical guide to operationalizing scalable data analytics systems on GCP
Description
With this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines, from storing and processing data and orchestrating workflows to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engineering, you'll learn the various responsibilities of a data engineer and how GCP plays a vital role in fulfilling those responsibilities. As you progress through the chapters, you'll be able to leverage GCP products to build a sample data warehouse using Cloud Storage and BigQuery and a data lake using Dataproc. The book gradually takes you through operations such as data ingestion, data cleansing, transformation, and integrating data with other sources. You'll learn how to design IAM for data governance, deploy ML pipelines with Vertex AI, leverage pre-built GCP models as a service, and visualize data with Google Data Studio to build compelling reports. Finally, you'll find tips on how to boost your career as a data engineer, take the Professional Data Engineer certification exam, and get ready to become an expert in data engineering with GCP. By the end of this data engineering book, you'll have developed the skills to perform core data engineering tasks and build efficient ETL data pipelines with GCP.
Mar 2022 440 pages
Authors
Author Adi Wijaya
Purchase Options
eBook $38.99 $55.99
Paperback $69.99
Data Observability for Data Engineering
Proactive strategies for ensuring data accuracy and addressing broken data pipelines
Description
In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with the mastery of data observability intricacies, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.
Dec 2023 228 pages
Authors
Author Michele Pinto
Purchase Options
eBook $20.98 $29.99
Paperback $36.99
Data Engineering with AWS
Learn how to design and build cloud-based data transformation pipelines using AWS
Description
Written by a Senior Data Architect with over twenty-five years of experience in the business, Data Engineering with AWS is a book whose sole aim is to make you proficient in using the AWS ecosystem. Using a thorough and hands-on approach to data, this book will give aspiring and new data engineers a solid theoretical and practical foundation to succeed with AWS. As you progress, you’ll be taken through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. You’ll also learn about populating data marts and data warehouses along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data. By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.
Dec 2021 482 pages
Authors
Author Eagar
Purchase Options
eBook $35.99 $51.99
Paperback $64.99
Data Engineering with Alteryx
Helping data engineers apply DataOps practices with Alteryx
Description
Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects which increase development speed while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this Alteryx book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.
Jun 2022 366 pages
Authors
Author Paul Houghton
Purchase Options
eBook $25.99 $37.99
Paperback $46.99
Azure Data Engineering Cookbook
Get well versed in various data engineering techniques in Azure using this recipe-based guide
Description
The famous quote 'Data is the new oil' seems more true every day as the key to most organizations' long-term success lies in extracting insights from raw data. One of the major challenges organizations face in extracting value from data is building performant data engineering pipelines for data visualization, ingestion, storage, and processing. This second edition of the immensely successful book by Ahmad Osama brings you several recent enhancements in Azure data engineering and shares approximately 80 useful recipes covering common scenarios in building data engineering pipelines in Microsoft Azure. You’ll explore recipes from Azure Synapse Analytics workspaces Gen 2 and get to grips with Synapse Spark pools, SQL Serverless pools, Synapse integration pipelines, and Synapse data flows. You’ll also understand Synapse SQL Pool optimization techniques in this second edition. Besides Synapse enhancements, you’ll discover helpful tips on managing Azure SQL Database and learn about security, high availability, and performance monitoring. Finally, the book takes you through overall data engineering pipeline management, focusing on monitoring using Log Analytics and tracking data lineage using Azure Purview. By the end of this book, you’ll be able to build superior data engineering pipelines and will have an invaluable go-to guide.
Sep 2022 608 pages
Authors
Author Nagaraj Venkatesan
Purchase Options
eBook $28.99 $41.99
Paperback $51.99
Apache Spark 3 for Data Engineering and Analytics with Python
Learn how to use Python and PySpark 3.0.1 for Data Engineering/Analytics (Databricks) - Beginner to Ninja
Description
Apache Spark 3 is an open-source distributed engine for querying and processing data. This course will provide you with a detailed understanding of PySpark and its stack, and is designed to guide you through the process of data analytics using Python and Spark. The author uses an interactive approach to explain key concepts of PySpark such as the Spark architecture, Spark execution, transformations and actions using the structured API, and much more. You will be able to leverage the power of Python, Java, and SQL and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Apache Spark architecture and how to set up a Python environment for Spark. You will then cover techniques for collecting, cleaning, and visualizing data by creating dashboards in Databricks. You will learn how to use SQL to interact with DataFrames. The author provides an in-depth review of RDDs and contrasts them with DataFrames. Multiple challenge problems are provided at intervals throughout the course so that you get a firm grasp of the concepts taught. The code bundle for this course is available here: https://github.com/PacktPublishing/Apache-Spark-3-for-Data-Engineering-and-Analytics-with-Python-
Aug 2021 8 hrs 30 mins
Authors
Author David Mngadi
Purchase Options
Video $54.99
Data Engineering with AWS Cookbook
A recipe-based approach to help you tackle data engineering problems with AWS services
Description
Performing data engineering with Amazon Web Services (AWS) combines AWS's scalable infrastructure with robust data processing tools, enabling efficient data pipelines and analytics workflows. This comprehensive guide to AWS data engineering will teach you all you need to know about data lake management, pipeline orchestration, and serving layer construction. Through clear explanations and hands-on exercises, you’ll master essential AWS services such as Glue, EMR, Redshift, QuickSight, and Athena. Additionally, you’ll explore various data platform topics such as data governance, data quality, DevOps, CI/CD, planning and performing data migration, and creating Infrastructure as Code. As you progress, you will gain insights into how to enrich your platform and use various AWS cloud services such as AWS EventBridge, AWS DataZone, and AWS SCT and DMS to solve data platform challenges. Each recipe in this book is tailored to a daily challenge that a data engineer team faces while building a cloud platform. By the end of this book, you will be well-versed in AWS data engineering and have gained proficiency in key AWS services and data processing techniques. You will develop the necessary skills to tackle large-scale data challenges with confidence.
Nov 2024 528 pages
Authors
Author Trâm Ngọc Phạm
Purchase Options
eBook $27.98 $39.99
Paperback $49.99
Azure Data Engineering Cookbook
Design and implement batch and streaming analytics using Azure Cloud Services
Description
Data engineering is one of the fastest-growing job areas, as data engineers are the ones who ensure that data is extracted and provisioned and that it is of the highest quality for data analysis. This book uses various Azure services to implement and maintain infrastructure to extract data from multiple sources, and then transform and load it for data analysis. It takes you through different techniques for performing big data engineering using Microsoft Azure Data services. It begins by showing you how Azure Blob storage can be used for storing large amounts of unstructured data and how to use it for orchestrating a data workflow. You'll then work with different Cosmos DB APIs and Azure SQL Database. Moving on, you'll discover how to provision an Azure Synapse database and find out how to ingest and analyze data in Azure Synapse. As you advance, you'll cover the design and implementation of batch processing solutions using Azure Data Factory, and understand how to manage, maintain, and secure Azure Data Factory pipelines. You’ll also design and implement batch processing solutions using Azure Databricks and then manage and secure Azure Databricks clusters and jobs. In the concluding chapters, you'll learn how to process streaming data using Azure Stream Analytics and Data Explorer. By the end of this Azure book, you'll have gained the knowledge you need to be able to orchestrate batch and real-time ETL workflows in Microsoft Azure.
Apr 2021 454 pages
Authors
Author Nagaraj Venkatesan
Author Ahmad Osama
Purchase Options
eBook $24.99 $35.99
Paperback $48.99
Azure Data Engineer Associate Certification Guide
Ace the DP-203 exam with advanced data engineering skills
Description
One of the top global cloud providers, Azure offers extensive data hosting and processing services, driving widespread cloud adoption and creating a high demand for skilled data engineers. The Azure Data Engineer Associate (DP-203) certification is a vital credential, demonstrating your proficiency as an Azure data engineer to prospective employers. This comprehensive exam guide is designed for both beginners and seasoned professionals, aligned with the latest DP-203 certification exam, to help you pass the exam on your first try. The book provides a foundational understanding of IaaS, PaaS, and SaaS, starting with core concepts like virtual machines (VMs), VNETS, and App Services and progressing to advanced topics such as data storage, processing, and security. What sets this exam guide apart is its hands-on approach, seamlessly integrating theory with practice through real-world examples, practical exercises, and insights into Azure's evolving ecosystem. Additionally, you'll unlock lifetime access to supplementary practice material on an online platform, including mock exams, interactive flashcards, and exam tips, ensuring a comprehensive exam prep experience. By the end of this book, you’ll not only be ready to excel in the DP-203 exam, but also be equipped to tackle complex challenges as an Azure data engineer.
May 2024
Rating: 5 out of 5
548 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Palmieri
Purchase Options
eBook $27.98 $39.99
Paperback $49.99
Getting Started with DuckDB
A practical guide for accelerating your data science, data analytics, and data engineering workflows
Description
DuckDB is a fast in-process analytical database. Getting Started with DuckDB offers a practical overview of its usage. You'll learn to load, transform, and query various data formats, including CSV, JSON, and Parquet. The book covers DuckDB's optimizations, SQL enhancements, and extensions for specialized applications. Working with examples in SQL, Python, and R, you'll explore analyzing public datasets and discover tools enhancing DuckDB workflows. This guide suits both experienced and new data practitioners, quickly equipping you to apply DuckDB's capabilities in analytical projects. You'll gain proficiency in using DuckDB for diverse tasks, enabling effective integration into your data workflows.
Jun 2024 382 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Simon Aubury
Purchase Options
eBook $29.99 $43.99
Paperback $54.99
Azure Data Engineer Associate Certification Guide
A hands-on reference guide to developing your data engineering skills and preparing for the DP-203 exam
Description
Azure is one of the leading cloud providers in the world, providing numerous services for data hosting and data processing. Most companies today are either cloud-native or migrating to the cloud faster than ever. This has led to an explosion of data engineering jobs, with aspiring and experienced data engineers trying to outshine each other. Gaining the DP-203: Azure Data Engineer Associate certification is a sure-fire way of showing future employers that you have what it takes to become an Azure Data Engineer. This book will help you prepare for the DP-203 examination in a structured way, covering all the topics specified in the syllabus with detailed explanations and exam tips. The book starts by covering the fundamentals of Azure, and then takes the example of a hypothetical company and walks you through the various stages of building data engineering solutions. Throughout the chapters, you'll learn about the various Azure components involved in building the data systems and will explore them using a wide range of real-world use cases. Finally, you’ll work on sample questions and answers to familiarize yourself with the pattern of the exam. By the end of this Azure book, you'll have gained the confidence you need to pass the DP-203 exam with ease and land your dream job in data engineering.
Feb 2022 574 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Alex
Purchase Options
eBook $43.99 $63.99
Paperback $79.99
Snowflake - Build and Architect Data Pipelines Using AWS
Data engineering and architecting pipelines using Snowflake and AWS cloud
Description
Snowflake is the next big thing, and it is becoming a full-blown data ecosystem. Given its scalability and efficiency in handling massive volumes of data, along with the several new concepts it introduces, this is the right time to wrap your head around Snowflake and add it to your toolkit. This course not only covers the core features of Snowflake but also teaches you how to deploy Python/PySpark jobs in AWS Glue and Airflow that communicate with Snowflake, which is one of the most important aspects of building pipelines. In this course, you will start with an overview of Snowflake and then move efficiently through its most crucial aspects. You will write Python/Spark jobs in AWS Glue for data transformation and see real-time streaming using Kafka and Snowflake. You will work with external functions and their use cases, and explore the security features in Snowflake. Finally, you will look at Snowpark and explore how it can be used for data pipelines and data science. By the end of this course, you will have learned about Snowflake and Snowpark, and how to build and architect data pipelines using AWS. You need an active AWS account to complete the sections related to Python and PySpark. For the rest of the course, a free trial Snowflake account should suffice.
8hrs 39mins
Last Updated: May 2023 Published: Sep 2022
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Siddharth Raghunath
Purchase Options
Video $69.99
Spark Programming in Scala for Beginners with Apache Spark 3
Data Engineering Using Spark Structured API
Description
Apache Spark is a lightning-fast unified analytics engine for big data and machine learning. Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at a massive scale. It has quickly become the largest open-source community in big data. So, mastering Apache Spark opens a wide range of professional opportunities. This course starts with a brief introduction to Apache Spark. Then, you will install and use Apache Spark. After that, you will look at the Spark execution model and architecture in detail. Next, you will learn the Spark programming model and developer experience. Following that, you will look at the Spark Structured API foundation, and Spark data sources and sinks. Then, you will explore Spark DataFrame and Dataset transformations along with aggregations in Apache Spark. Finally, you will look at Spark DataFrame joins in detail. By the end of this course, you will understand Spark programming and be able to apply that knowledge to build data engineering solutions. All the resource files are uploaded on the GitHub repository at https://github.com/PacktPublishing/Spark-Programming-in-Scala-for-Beginners-with-Apache-Spark-3
6hrs 47mins
Published: Mar 2022
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author ScholarNest
Purchase Options
Video $19.99
Spark Programming in Python for Beginners with Apache Spark 3
Learn Data Engineering using Spark Structured API
Description
If you are looking to expand your knowledge in data engineering or want to level up your portfolio by adding Spark programming to your skillset, then you are in the right place. This course will help you understand Spark programming and apply that knowledge to build data engineering solutions. This course is example-driven and follows a working session-like approach. We will be taking a live coding approach and explaining all the concepts needed along the way. In this course, we will start with a quick introduction to Apache Spark, then set up our environment by installing and using Apache Spark. Next, we will learn about the Spark execution model and architecture, and about the Spark programming model and developer experience. Next, we will cover the Spark Structured API foundation and then move towards Spark data sources and sinks. Then we will cover Spark DataFrame and Dataset transformations. We will also cover aggregations in Apache Spark and finally, we will cover Spark DataFrame joins. By the end of this course, you will be able to build data engineering solutions using the Spark Structured API in Python. All the resources for the course are available at https://github.com/PacktPublishing/Spark-Programming-in-Python-for-Beginners-with-Apache-Spark-3
6hrs 35mins
Published: Feb 2022
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author ScholarNest
Purchase Options
Video $49.99
LLM Engineer's Handbook
Master the art of engineering large language models from concept to production
Description
Artificial intelligence has undergone rapid advancements, and Large Language Models (LLMs) are at the forefront of this revolution. This LLM book offers insights into designing, training, and deploying LLMs in real-world scenarios by leveraging MLOps best practices. The guide walks you through building an LLM-powered twin that’s cost-effective, scalable, and modular. It moves beyond isolated Jupyter notebooks, focusing on how to build production-grade end-to-end LLM systems. Throughout this book, you will learn data engineering, supervised fine-tuning, and deployment. The hands-on approach to building the LLM Twin use case will help you implement MLOps components in your own projects. You will also explore cutting-edge advancements in the field, including inference optimization, preference alignment, and real-time data processing, making this a vital resource for those looking to apply LLMs in their projects. By the end of this book, you will be proficient in deploying LLMs that solve practical problems while maintaining low-latency and high-availability inference capabilities. Whether you are new to artificial intelligence or an experienced practitioner, this book delivers guidance and practical techniques that will deepen your understanding of LLMs and sharpen your ability to implement them effectively.
Oct 2024 522 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Paul Iusztin
Purchase Options
eBook $47.99
Paperback $59.99
Solutions Architect's Handbook
Kick-start your career with architecture design principles, strategies, and generative AI techniques
Description
Master the art of solution architecture and excel as a Solutions Architect with the Solutions Architect's Handbook. Authored by seasoned AWS technology leaders Saurabh Shrivastava and Neelanjali Srivastav, this book goes beyond traditional certification guides, offering in-depth insights and advanced techniques to meet the specific needs and challenges of solutions architects today. This edition introduces exciting new features that keep you at the forefront of this evolving field. Large language models, generative AI, and innovations in deep learning are cutting-edge advancements shaping the future of technology. Topics such as cloud-native architecture, data engineering architecture, cloud optimization, mainframe modernization, and building cost-efficient and secure architectures remain important in today's landscape. This book provides coverage of these emerging and key technologies and walks you through solution architecture design from key principles, providing you with the knowledge you need to succeed as a Solutions Architect. It will also level up your soft skills, providing career-accelerating techniques to help you get ahead. Unlock the potential of cutting-edge technologies, gain practical insights from real-world scenarios, and enhance your solution architecture skills with the Solutions Architect's Handbook.
Mar 2024
Rating: 4.7 out of 5
578 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Saurabh Shrivastava
Purchase Options
eBook $32.99 $47.99
Paperback $59.99
Polars Cookbook
Over 60 practical recipes to transform, manipulate, and analyze your data using Python Polars 1.x
Description
The Polars Cookbook is a comprehensive, hands-on guide to Python Polars, one of the first resources dedicated to this powerful data processing library. Written by Yuki Kakegawa, a seasoned data analytics consultant who has worked with industry leaders like Microsoft and Stanford Health Care, this book offers targeted, real-world solutions to data processing, manipulation, and analysis challenges. The book also includes a foreword by Marco Gorelli, a core contributor to Polars, ensuring expert insights into Polars' applications. From installation to advanced data operations, you’ll be guided through data manipulation, advanced querying, and performance optimization techniques. You’ll learn to work with large datasets, conduct sophisticated transformations, leverage powerful features like chaining, and understand its caveats. This book also shows you how to integrate Polars with other Python libraries such as pandas, numpy, and PyArrow, and explore deployment strategies for both on-premises and cloud environments like AWS, BigQuery, GCS, Snowflake, and S3. With use cases spanning data engineering, time series analysis, statistical analysis, and machine learning, Polars Cookbook provides essential techniques for optimizing and securing your workflows. By the end of this book, you'll possess the skills to design scalable, efficient, and reliable data processing solutions with Polars.
Aug 2024 394 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Yuki Kakegawa
Purchase Options
eBook $27.98 $39.99
Paperback $49.99
Learn Microsoft Fabric
A practical guide to performing data analytics in the era of artificial intelligence
Description
Discover the capabilities of Microsoft Fabric, the premier unified solution designed for the AI era, seamlessly combining data integration, OneLake, transformation, visualization, universal security, and a unified business model. This book provides an overview of Microsoft Fabric, its components, and the wider analytics landscape. In this book, you'll explore workloads such as Data Factory, Synapse Data Engineering, data science, data warehouse, real-time analytics, and Power BI. You’ll learn how to build end-to-end lakehouse and data warehouse solutions using the medallion architecture, unlock real-time analytics, and implement machine learning and AI models. As you progress, you’ll build expertise in monitoring workloads and administering Fabric across tenants, capacities, and workspaces. The book also guides you step by step through enhancing security and governance practices in Microsoft Fabric and implementing CI/CD workflows with Azure DevOps or GitHub. Finally, you’ll discover the power of Copilot, an AI-driven assistant that accelerates your analytics journey. By the end of this book, you’ll have unlocked the full potential of AI-driven data analytics, gaining a comprehensive understanding of the analytics landscape and mastery over the essential concepts and principles of Microsoft Fabric.
Feb 2024
Rating: 4.5 out of 5
338 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Arshad Ali
Purchase Options
eBook $24.99 $35.99
Paperback $35.98 $44.99
Building ETL Pipelines with Python
Create and deploy enterprise-ready ETL pipelines by employing modern methods
Description
Modern extract, transform, and load (ETL) pipelines for data engineering have favored the Python language for its broad range of uses and a large assortment of tools, applications, and open source components. With its simplicity and extensive library support, Python has emerged as the undisputed choice for data processing. In this book, you’ll walk through the end-to-end process of ETL data pipeline development, starting with an introduction to the fundamentals of data pipelines and establishing a Python development environment to create pipelines. Once you've explored the ETL pipeline design principles and ETL development process, you'll be equipped to design custom ETL pipelines. Next, you'll get to grips with the steps in the ETL process, which involves extracting valuable data; performing transformations, through cleaning, manipulation, and ensuring data integrity; and ultimately loading the processed data into storage systems. You’ll also review several ETL modules in Python, comparing their pros and cons when building data pipelines and leveraging cloud tools, such as AWS, to create scalable data pipelines. Lastly, you’ll learn about the concept of test-driven development for ETL pipelines to ensure safe deployments. By the end of this book, you’ll have worked on several hands-on examples to create high-performance ETL pipelines to develop robust, scalable, and resilient environments using Python.
Sep 2023 246 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Brij Kishore Pandey
Purchase Options
eBook $18.99 $27.99
Paperback $34.99
Databricks Certified Associate Developer for Apache Spark Using Python
The ultimate guide to getting certified in Apache Spark using practical examples with Python
Description
Spark has become a de facto standard for big data processing. Migrating data processing to Spark saves resources, streamlines your business focus, and modernizes workloads, creating new business opportunities through Spark’s advanced capabilities. Written by a senior solutions architect at Databricks, with experience in leading data science and data engineering teams in Fortune 500s as well as startups, this book is your exhaustive guide to achieving the Databricks Certified Associate Developer for Apache Spark certification on your first attempt. You’ll explore the core components of Apache Spark, its architecture, and its optimization, while familiarizing yourself with the Spark DataFrame API and its components needed for data manipulation. You’ll also find out what Spark streaming is and why it’s important for modern data stacks, before learning about machine learning in Spark and its different use cases. What’s more, you’ll discover sample questions at the end of each section along with two mock exams to help you prepare for the certification exam. By the end of this book, you’ll know what to expect in the exam and gain enough understanding of Spark and its tools to pass the exam. You’ll also be able to apply this knowledge in a real-world setting and take your skillset to the next level.
Jun 2024
Rating: 5 out of 5
274 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Saba Shah
Purchase Options
eBook $18.99 $27.99
Paperback $34.99
Solutions Architect's Handbook
Kick-start your career as a solutions architect by learning architecture design principles and strategies
Description
Becoming a solutions architect requires a hands-on approach, and this edition of the Solutions Architect's Handbook brings exactly that. This handbook will teach you how to create robust, scalable, and fault-tolerant solutions and next-generation architecture designs in a cloud environment. It will also help you build effective product strategies for your business and implement them from start to finish. This new edition features additional chapters on disruptive technologies, such as Internet of Things (IoT), quantum computing, data engineering, and machine learning. It also includes updated discussions on cloud-native architecture, blockchain data storage, and mainframe modernization with public cloud. The Solutions Architect's Handbook provides an understanding of solution architecture and how it fits into an agile enterprise environment. It will take you through the journey of solution architecture design by providing detailed knowledge of design pillars, advanced design patterns, anti-patterns, and the cloud-native aspects of modern software design. By the end of this handbook, you'll have learned the techniques needed to create efficient architecture designs that meet your business requirements.
Jan 2022
Rating: 4 out of 5
17hrs 41mins
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Saurabh Shrivastava
Purchase Options
eBook $49.99 $71.99
Paperback $89.99
Audiobook $64.99
Azure Data and AI Architect Handbook
Adopt a structured approach to designing data and AI solutions at scale on Microsoft Azure
Description
With data’s growing importance in businesses, the need for cloud data and AI architects has never been higher. The Azure Data and AI Architect Handbook is designed to assist any data professional or academic looking to advance their cloud data platform design skills. This book will help you understand all the individual components of an end-to-end data architecture and how to piece them together into a scalable and robust solution. You’ll begin by getting to grips with core data architecture design concepts and Azure Data & AI services, before exploring cloud landing zones and best practices for building up an enterprise-scale data platform from scratch. Next, you’ll take a deep dive into various data domains such as data engineering, business intelligence, data science, and data governance. As you advance, you’ll cover topics ranging from learning different methods of ingesting data into the cloud to designing the right data warehousing solution, managing large-scale data transformations, extracting valuable insights, and learning how to leverage cloud computing to drive advanced analytical workloads. Finally, you’ll discover how to add data governance, compliance, and security to solutions. By the end of this book, you’ll have gained the expertise needed to become a well-rounded Azure Data & AI architect.
Jul 2023 284 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Olivier Mertens
Purchase Options
eBook $27.98 $39.99
Paperback $49.99
Geospatial Analysis with SQL
A hands-on guide to performing geospatial analysis by unlocking the syntax of spatial SQL
Description
Geospatial analysis is industry agnostic and a powerful tool for answering location questions. Combined with the power of SQL, developers and analysts worldwide rely on database integration to solve real-world spatial problems. This book introduces skills to help you detect and quantify patterns in datasets through data exploration, visualization, data engineering, and the application of analysis and spatial techniques. You will begin by exploring the fundamentals of geospatial analysis where you’ll learn about the importance of geospatial analysis and how location information enhances data exploration. Waldo Tobler’s second law of geography states, “the phenomenon external to a geographic area of interest affects what goes on inside.” This quote will be the framework of the geospatial questions we will explore. You’ll then observe the framework of geospatial analysis using SQL while learning to create spatial databases and SQL queries and functions. By the end of this book, you will have an expanded toolbox of analytic skills and tools such as PostGIS and QGIS to explore data questions and analyze spatial information.
Oct 2023 234 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Bonny P McClain
Purchase Options
eBook $24.99 $35.99
Paperback $44.99
The Ultimate Guide to Snowpark
Design and deploy Snowflake Snowpark with Python for efficient data workloads
Description
Snowpark is a powerful framework that helps you unlock numerous possibilities within the Snowflake Data Cloud. However, without proper guidance, leveraging the full potential of Snowpark with Python can be challenging. Packed with practical examples and code snippets, this book will be your go-to guide to using Snowpark with Python successfully. The Ultimate Guide to Snowpark helps you develop an understanding of Snowflake Snowpark and how it enables you to implement workloads in data engineering, data science, and data applications within the Data Cloud. From configuration and coding styles to workloads such as data manipulation, collection, preparation, transformation, aggregation, and analysis, this guide will equip you with the right knowledge to make the most of this framework. You'll discover how to build, test, and deploy data pipelines and data science models. As you progress, you’ll deploy data applications natively in Snowflake and operate large language models (LLMs) using Snowpark container services. By the end of this book, you'll be able to leverage Snowpark's capabilities and propel your career as a Snowflake developer to new heights.
May 2024 254 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Shankar Narayanan SGS
Purchase Options
eBook $22.99 $33.99
Paperback $41.99
Python for ArcGIS Pro
Automate cartography and data analysis using ArcPy, ArcGIS API for Python, Notebooks, and pandas
Description
Integrating Python into your day-to-day ArcGIS work is highly recommended when dealing with large amounts of geospatial data. Python for ArcGIS Pro aims to help you get your work done faster, with greater repeatability and higher confidence in your results. Starting from programming basics and building in complexity, two experienced ArcGIS professionals-turned-Python programmers teach you how to incorporate scripting at each step: automating the production of maps for print, managing data between ArcGIS Pro and ArcGIS Online, creating custom script tools for sharing, and then running data analysis and visualization on top of the ArcGIS geospatial library, all using Python. You’ll use ArcGIS Pro Notebooks to explore and analyze geospatial data, and write data engineering scripts to manage ongoing data processing and data transfers. This exercise-based book also includes three rich real-world case studies, giving you an opportunity to apply and extend the concepts you studied earlier. Irrespective of your expertise level with Esri software or the Python language, you’ll benefit from this book’s hands-on approach, which takes you through the major uses of Python for ArcGIS Pro to boost your ArcGIS productivity.
Apr 2022 586 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Toms
Purchase Options
eBook $36.99 $53.99
Paperback $65.99
Essential PySpark for Scalable Data Analytics
A beginner's guide to harnessing the power and ease of PySpark 3
Description
Apache Spark is a unified data analytics engine designed to process huge volumes of data quickly and efficiently. PySpark is Apache Spark's Python language API, which offers Python developers an easy-to-use scalable data analytics framework. Essential PySpark for Scalable Data Analytics starts by exploring the distributed computing paradigm and provides a high-level overview of Apache Spark. You'll begin your analytics journey with the data engineering process, learning how to perform data ingestion, cleansing, and integration at scale. This book helps you build real-time analytics pipelines that help you gain insights faster. You'll then discover methods for building cloud-based data lakes, and explore Delta Lake, which brings reliability to data lakes. The book also covers Data Lakehouse, an emerging paradigm, which combines the structure and performance of a data warehouse with the scalability of cloud-based data lakes. Later, you'll perform scalable data science and machine learning tasks using PySpark, such as data preparation, feature engineering, and model training and productionization. Finally, you'll learn ways to scale out standard Python ML libraries along with a new pandas API on top of PySpark called Koalas. By the end of this PySpark book, you'll be able to harness the power of PySpark to solve business problems.
Oct 2021 322 pages
AI Assistant: Features our revolutionary AI Assistant technology which lets you interact with our Books to enhance your learning
Authors
Author Nudurupati
Purchase Options
eBook $27.98 $39.99
Paperback $48.99