Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!
👋 Hello,
Welcome to BI-Pro #54: Your Premier Destination for Data and Business Intelligence Insights! 🌟
In this edition, we dive deep into the cutting-edge solutions of business intelligence, data modeling, and advanced analytics. Prepare to explore an array of transformative topics and industry insights that will redefine how you interact with technology and data.
🧩 Highlights of This Issue:
Python Practice Platforms: The top 7 platforms where you can sharpen your Python skills.
Innovative Experiments: Dive into hands-on experiments with MLFlow and Microsoft Fabric to enhance your project’s efficiency.
SAP Expertise: Master the complex data models of SAP and leverage them for optimal performance.
AI-Powered Business Management: Learn how to integrate AI to streamline and enhance business management functions.
Snowflake’s Surveillance: Monitor your data pipelines effectively using Snowflake’s Data Metric Functions.
🧬 Stay Informed with Industry Highlights:
Power BI: Learn about the significant deprecation of AutoML in Power BI using Dataflows V1.
Microsoft Fabric: Get the scoop on the new code-first AutoML and hyperparameter tuning, now available in public preview.
AWS BI: Discover how to build SAP Golden AMIs with EC2 Image Builder and Ansible and explore the transformative impact of Amazon Q on business experiences.
Google Cloud Data: Catch up with the latest updates from the Google Cloud Cortex Framework.
Tableau: Uncover how Einstein Copilot for Tableau is building the next generation of AI-driven analytics.
From the Experts at Packt Community: Gain insights from industry leaders on the fundamentals of Analytics Engineering.
🧮 What’s the Latest from the BI Community?
Explore real-time AI capabilities with Datorios’ new observability tool.
Learn about Snowflake's launch of Arctic, an enterprise-grade LLM.
Discover how Qlik's AI Accelerator is integrating generative AI to deliver customer outcomes.
Witness the future of AI with Avant Technologies’ new supercomputing advancements.
Join us as we unpack these topics to keep you at the forefront of the data and BI world. Stay curious, stay informed!
📥 Feedback on the Weekly Edition
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."
📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.
We appreciate your input and hope you enjoy the book!
Share your thoughts and opinions here!
Cheers,
Merlyn Shelley
Editor-in-Chief, Packt
Packt BI-Pro is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Sign Up | Advertise | Archives
🧩 pixiedust/pixiedust: PixieDust is an open-source library enhancing Jupyter notebooks, improving data work experience, particularly for cloud-hosted notebooks without configuration access.
🧩 plotly/plotly.py: plotly.py is an interactive, open-source graphing library for Python, offering over 30 chart types, including scientific, 3D, statistical, and financial charts.
🧩 AykutSarac/jsoncrack.com: JSON Crack is a free, open-source data visualization app for JSON, YAML, XML, CSV, etc., offering interactive graphs for easy data exploration and analysis.
🧩 apexcharts/apexcharts.js: ApexCharts is a JavaScript charting library with a simple API, 100+ samples, and over a dozen chart types for beautiful, responsive visualizations in apps and dashboards.
🧩 antvis/G2: G2 is a visualization library inspired by "The Grammar of Graphics," offering an introduction, examples, tutorials, and API reference for learning and using its core concepts.
🧩 visgl/deck.gl: deck.gl simplifies high-performance, WebGL2/WebGPU-based visualization of large datasets. It offers pre-built layers for easy setup or customizable architecture for tailored needs.
Email Forwarded? Join BI-Pro Here!
🧬 7 Best Platforms to Practice Python: The article lists seven platforms—Practice Python, Edabit, CodeWars, Exercism, PYnative, LeetCode, and HackerRank—that offer various levels of programming challenges for learning and practicing Python, particularly for coding interviews and skill improvement.
🧬 Experimenting with MLFlow and Microsoft Fabric: The blog discusses the importance of systematic experimentation in machine learning (ML) to improve model performance, highlighting the use of MLFlow within Fabric for managing ML experiments. It covers setting up experiments, running them, logging results, and analyzing them, emphasizing the importance of tracking configurations and outcomes for iterative improvement in ML models.
🧬 Mastering SAP’s data models: The article discusses challenges faced in understanding SAP data models for analytics, focusing on integrating procurement data. It explains SAP's ERP software, data architecture basics, table types (master vs. transaction), and data mapping for procurement tables.
🧬 Building an AI-Powered Business Manager: The post explores the concept of consolidating business management into a single, chat-based platform powered by Large Language Models (LLMs). It discusses the advantages for small businesses, outlines project structure, sets up the database, and updates the Tool class to handle SQLModel instances.
🧬 Monitor Data Pipelines Using Snowflake’s Data Metric Functions: The author emphasizes the importance of data quality in gaining trust with stakeholders and focuses on using Google's Site Reliability Engineering principles to measure the health of data systems. It discusses defining service level indicators and objectives for data quality dimensions and provides a technical implementation example in Snowflake.
Power BI
🧮 Deprecation of AutoML in Power BI using Dataflows V1: The update announces the deprecation of Power BI Automated Machine Learning (AutoML) models for Dataflows V1 in all regions as of the third week of April. Customers are encouraged to migrate to the AutoML solution based on Synapse Data Science in Microsoft Fabric, offering a more customizable AutoML experience with advanced tools and features.
Microsoft Fabric
🧮 Introducing Code-First AutoML and Hyperparameter Tuning: Now in Public Preview for Fabric Data Science: The update introduces code-first automated machine learning (AutoML) and hyperparameter tuning in Public Preview for Fabric Data Science. Users can access both AutoML and Tune capabilities seamlessly within the Fabric 1.2 runtime, enhancing machine learning model optimization and accessibility.
🧮 Fabric Change the Game: Embracing Azure Cosmos DB for NoSQL. The post explores setting up Azure Cosmos DB for NoSQL and leveraging Vector Search capabilities of AI Search Services through Microsoft Fabric's Lakehouse features. It also discusses integrating Cosmos DB Mirror and using Python coding facilitated through Lakehouse, highlighting Fabric's integration capabilities for search or data mirroring.
🧮 Microsoft Fabric April 2024 Update: The April 2024 update brings various enhancements and previews to Microsoft Fabric, including new visuals like the 100% Stacked Area Chart, improvements to reporting, data connectivity, administration features, analytics, real-time analytics, data factory, and data pipelines. Additionally, the update includes the availability of Exam DP-600 for Fabric Analytics Engineer certification and free learning sessions.
AWS BI
🧮 Build SAP Golden AMIs with EC2 Image Builder and Ansible: This blog post guides users on building a reusable Amazon Machine Image (AMI) for deploying Amazon Elastic Compute Cloud (EC2) instances for SAP installations. It covers using Terraform and Ansible to automate the process and provides sample code.
🧮 Transforming Business Experiences: The Impact of Amazon Q and Generative BI for AWS Partners. This post highlights how advances in AI, particularly Amazon Q and generative BI, are transforming business operations. It showcases how AWS partners like ZS Associates, Tiger Analytics, and Compass UOL are leveraging these innovations for industry-specific solutions.
Google Cloud Data
🧮 What’s new with Google Cloud Cortex Framework? The article discusses Google Cloud Cortex Framework, emphasizing its role in unifying enterprise data for AI-driven insights. It highlights new solutions for marketing, sustainability management, and finance, showcasing how Cortex Framework accelerates innovation, enhances decision-making, and drives business efficiency in the AI era.
Tableau
🧮 Einstein Copilot for Tableau: Building the Next Generation of AI-Driven Analytics. The post delves into the development of Einstein Copilot for Tableau, an AI-driven tool revolutionizing data analysis. It highlights the challenges and solutions in building its infrastructure, improving accuracy and efficiency, and enhancing AI and core capabilities through collaboration and continuous improvement.
The role of dbt in analytics engineering
dbt emerged as a solution to the challenges relating to data transformation faced in data analysis. Initially crafted as an open-source Python package, dbt aimed to bring software engineering best practices to the world of analytics.
Over time, dbt matured beyond just a package, becoming a versatile cloud service. While the open-source package remains available and actively supported, dbt now offers a cloud-based version, packed with features such as an integrated development environment (IDE), scheduling tools, data lineage trackers, and hosted documentation. This is especially valuable for analysts who might not have a deep software engineering background.
For more information on dbt’s history, read https://www.getdbt.com/blog/what-exactly-is-dbt. We will use dbt Cloud, which offers a free tier for a single developer: that’s you! You can learn more about its pricing here: https://www.getdbt.com/pricing.
dbt seamlessly integrates into the ELT architecture. It does not store or process data but serves as a bridge between analysts and the data warehouse. dbt’s position in a data stack as an intermediary in the transformation layer.
This is how it works: analysts draft SQL queries, enhanced with dbt’s unique capabilities. dbt then translates this specialized SQL into the native SQL of the data warehouse and dispatches it for execution. All the transformed data and results remain within the data warehouse, making dbt a lightweight yet powerful tool in the analytics toolkit.
Because of dbt’s pivotal position in analytics engineering, we will spend more time discussing its features and zooming in on best practices.
First, we will set up dbt for our use case.
Setting up dbt Cloud
The following steps are required for dbt:
Creating a dbt Cloud account.
Setting up a connection from dbt Cloud to BigQuery.
Testing the connection by querying the data using dbt Cloud.
Follow the step-by-step instructions here: https://github.com/PacktPublishing/Fundamentals-of-Analytics-Engineering/blob/main/chapter_8/guides/setting_up_dbt_cloud.md.
Now, let’s focus on the various data layers in dbt.
Data layers in dbt
It is a widespread practice to separate the data we use for analytics into layers. This helps data practitioners communicate the distinct parts of the data transformation process. Broadly speaking, the process will fall into three layers in dbt, Raw, Preparation and Business.
Let’s take a closer look:
Raw layer: The source data is stored in the form it arrives in. Whenever you receive data, it should be stored as-is so that you have a backup in case something goes wrong during the transformations. When you copied the Excel sheets using Airbyte, they became part of the raw layer inside BigQuery.
Preparation layer: In the second layer, the raw data is cleaned, deduplicated, and transformed to conform to naming conventions and other rules. For our data, this could mean renaming fields for readability and standardizing sales figures from cents to euros.
Business layer: In the final layer, business rules are applied to the prepared data, and different data is joined and modeled into datasets that are ready for consumption by BI tools and stakeholders. In our case, we might add a business rule to disregard negative sales amounts when summing the total stroopwafels sold, as these are likely an error. The resulting data can then be served to the BI tool for dashboarding.
Discover more insights from Fundamentals of Analytics Engineering - By Dumky De Wilde, Fanny Kassapian, Jovan Gligorevic and 4 more. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!
🧠 Datorios unleashes real-time AI with the first observability tool for streaming data: Datorios introduces the first observability tool for Apache Flink, offering deep insights into streaming data processing. It enables faster AI innovation and thorough auditability, providing developers with event visualization, event search, state monitoring, window analysis, and more. Datorios is now publicly available for free.
🧠 Snowflake Launches Arctic: The Most Open, Enterprise-Grade Large Language Model: Snowflake introduces Snowflake Arctic, an open, enterprise-grade large language model (LLM) with a Mixture-of-Experts architecture, optimized for complex enterprise workloads. Arctic sets new openness standards for AI technology, offering weights under an Apache 2.0 license and enhancing AI innovation.
🧠 Introducing Qlik's AI Accelerator - Delivering Tangible Customer Outcomes in Generative AI Integration: Qlik is at the forefront of integrating generative AI, particularly Large Language Models (LLMs), into data analysis and decision-making. They address key challenges like data privacy, technical complexity, and cost, offering seamless integration of popular LLMs and an AI Accelerator program to quickly prove the benefits of AI integration with minimal barriers to entry.
🧠 Avant Technologies Launches Advanced AI Supercomputing: Avant Technologies, an AI company, introduces a supercomputing network and licensable dataset with Wired4Tech, aiming to accelerate AI adoption. The offerings include a versatile AI dataset, dynamic resource scaling, accelerated AI processing, robust security measures, and seamless integration, designed to empower developers and drive innovation across industries.
See you next time!