Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!
👋 Hello,
Welcome to BI-Pro #49, your ultimate guide to data and BI insights! 🚀
⏩ What's Inside?
Python Simplified: Master data validation with Pydantic.
Visualize Like a Pro: 30+ tools for stunning data visuals.
R for Bioinformatics: Custom visuals for bio data.
Interactive Data: JavaScript meets Handsontable.
Seaborn Stories: Craft data tales with line plots.
MetaGPT Insights: Next-gen data solutions unveiled.
🏭 Industry Scoop:
Power BI’s Latest: March's must-know features.
Fabric Innovations: Updates and new tools from Microsoft Fabric.
AWS Well-Architected Data Analytics Lens: Analytics strategies for the real world.
Google Cloud Savings: Cut costs on ETL workflows.
Tableau Journeys: From student to BI analyst.
💎 Expert Takes:
Deep Dive into Python Deep Learning: The latest from Packt.
👉 Community Buzz:
Twitch Chat Analysis, Graph Networks, LLM Data Quality, and Ethical AI: Key conversations this week!
Dive into the trends shaping data and BI today!
📥 Feedback on the Weekly Edition
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."
📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.
We appreciate your input and hope you enjoy the book!
Share your thoughts and opinions here!
Cheers,
Merlyn Shelley
Editor-in-Chief, Packt
Thanks for reading Packt BI-Pro! Subscribe for free to receive new posts and support our work.
Pledge your support
Sign Up | Advertise | Archives
🌀 man-group/ArcticDB: ArcticDB is a high-performance DataFrame database designed for Python Data Science, with a Python-centric API for Pandas DataFrames.
🌀 gradio-app/gradio: Gradio is an open-source Python package for building demos or web apps for ML models or Python functions, with easy sharing via built-in features.
🌀 Sinaptik-AI/pandas-ai: PandasAI is a Python library using generative AI to explore, clean, and analyze data with natural language queries.
🌀 OpenRefine/OpenRefine: OpenRefine is a powerful Java-based tool for loading, understanding, cleaning, reconciling, and augmenting data, accessible from a web browser.
🌀 Kanaries/pygwalker: PyGWalker simplifies Jupyter Notebook workflows by converting pandas dataframes into interactive user interfaces for data analysis and visualization.
🌀 cleanlab/cleanlab: cleanlab aids in data and label cleaning by identifying issues in ML datasets automatically, enabling better model training with real-world data.
Email Forwarded? Join BI-Pro Here!
🌀 Pydantic Tutorial: Data Validation in Python Made Simple. This blog tutorial explains how to use Pydantic, a data validation and serialization library in Python, to validate and serialize data classes, offering support for custom validators and Python's type hints for field validation.
🌀 30+ Data Visualization Libraries, Frameworks and Apps, Mastering Data Presentation: Explore over 30 data visualization tools like Metabase, Gephi, and Grafana, offering a range of features to transform raw data into meaningful visualizations for better decision-making in industries like tech, healthcare, finance, and marketing.
🌀 Mastering Data Visualization in R for Bioinformatics: The article delves into data visualization in R for bioinformatics, stressing its role in understanding complex biological data, communicating findings, hypothesis generation, and decision-making. It also discusses Anscombe's Quartet, highlighting the importance of visualizing data before analysis and the limitations of summary statistics.
🌀 Integrating JavaScript charting libraries with Handsontable: The article guides developers on integrating Highcharts, Recharts, and Chart.js with Handsontable for data visualization. It explains the features of each library and provides demos for creating a stock portfolio with interactive charts.
🌀 Data Visualization with Seaborn Line Plot: The article introduces Seaborn, a Python library for data visualization, built on top of Matplotlib. It covers installation and demonstrates creating single line plots and customizing styles for better presentation of data.
🌀 MetaGPT’s Data Interpreter: SOTA Open Source LLM-based Data Solutions. MetaGPT introduces its Data Interpreter, a new agent for streamlined data interpretation and analysis. The Data Interpreter employs advanced techniques for real-time data adaptability, tool integration, and logical inconsistency identification, showcasing superior performance in machine learning tasks.
🌀 Power BI March 2024 Feature Summary: The Power BI update introduces visual calculation editing, data model editing in the Power BI Service, and report subscription delivery to OneDrive SharePoint. A new Microsoft Fabric certification exam, DP-600, is also available, with free certification opportunities through the Fabric AI Skills Challenge.
🌀 Announcing the Public Preview of Database Mirroring in Microsoft Fabric: Mirroring, now in Public Preview, allows seamless integration of databases into Microsoft Fabric's OneLake, providing real-time insights without ETL. It simplifies data replication and warehousing, enabling easy data access and analysis across different sources, including data lakes and warehouses.
🌀 Get data with Power Query available in Power BI Report Builder (Preview): Power BI Report Builder now allows connecting to 100+ data sources like Snowflake, Databricks, and AWS Redshift. You can transform data using M-Query for paginated reports. Install the latest version and connect from the "Data" tab.
🌀 Microsoft Fabric March 2024 Update: This update brings new features like OneLake File Explorer, Autotune Query Tuning, and Test Framework for Power Query SDK in VS Code to Power BI, enhancing reporting, modeling, service, mobile, and developer experiences.
🌀 Data Factory Adds CI/CD to Fabric Data Pipelines: Fabric engineers with Azure Synapse Analytics and Azure Data Factory experience can now utilize Git integration and built-in Deployment Pipelines in Data Factory data pipelines in Fabric. This public preview offers source control, CI/CD features, and collaborative development environments, enhancing data analytics projects.
🌀 Microsoft Fabric Lifecycle Management – Getting started with Git Integration and Deployment Pipelines: Microsoft Fabric makes Lifecycle Management easy, enabling continuous releases through Git and Deployment Pipelines. Git allows reliable updates for supported items like Lakehouse, Notebooks, and Reports, while Deployment Pipelines clone content between stages like DEV, TEST, UAT, and PROD.
🌀 Announcing the AWS Well-Architected Data Analytics Lens: The Data Analytics Lens helps assess and improve analytics platforms on AWS. It offers best practices, such as building ACID-compliant data lakes and leveraging Serverless for data pipelines, aligned with the AWS Well-Architected Framework's pillars for secure, efficient, and cost-effective solutions.
🌀 Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics. The post discusses how healthcare providers can improve patient care by leveraging AWS services for real-time analytics and personalized healthcare, focusing on a zero-ETL approach to data integration.
🌀 Enrich streaming data in Bigtable with Dataflow: The post discusses the importance of event stream processing in data engineering and introduces Apache Beam's Enrichment transform, which simplifies the process of enriching streaming data with Bigtable, improving data context and enabling more meaningful analysis.
🌀 Dataflow at-least-once vs. exactly-once streaming modes: The post compares exactly-once and at-least-once processing modes in Dataflow Streaming Engine for streaming jobs. It explains the trade-offs between the two modes and provides guidance on choosing the right mode based on use case requirements.
🌀 Data is both art and science - My Tableau Story: Andy Cotgreave. The post highlights Andy Cotgreave's journey from a data analyst at Oxford to becoming a Senior Technical Evangelist at Tableau. It emphasizes the importance of community engagement, innovation, building a portfolio, and having fun in data visualization.
🌀 Student to BI Analyst, How Tableau Can Lead to a Successful Data Career: This blog discusses Karolina Grodzinska's data visualization journey, from discovering Tableau to winning Iron Viz: Student Edition and becoming a Business Intelligence Analyst at Schneider Electric. Karolina emphasizes the importance of an active Tableau Public profile in career development and shares tips for building a strong portfolio and networking with the Tableau Community.
Python Deep Learning - Third Edition - By Ivan Vasilev
Developing NN models for edge devices with TF Lite
TF Lite is a TF-derived set of tools that allows us to run models on mobile, embedded, and edge devices. Its versatility is part of TF’s appeal for industrial applications (as opposed to research applications, where PyTorch dominates).
The key paradigm of TF Lite is that the models run on-device, contrary to client-server architecture, where the model is deployed on remote, more powerful, hardware. This organization has the following implications (both good and bad):
Low-latency execution: The lack of server-round trip significantly reduces the model inference time and allows us to run real-time applications.
Privacy: The user data never leaves the device.
Internet connectivity: Internet connectivity is not required.
Small model size: The devices have limited computational ability, hence the need for small and computationally efficient models. More specifically, TF Lite models are stored in the FlatBuffers (
https://flatbuffers.dev/
) special efficient portable format, identified by the .tflite file extension. Besides its small size, it allows us to access data directly without parsing/unpacking it first.
TF Lite models support a subset of the TF Core operations and allow us to define custom ones:
Low power consumption: The devices often run on battery.
Divergent training and inference: NN training is a lot more computationally intensive compared to inference. Because of this, the model training runs on a different, more powerful, piece of hardware than the actual devices, where the models will run inference.
In addition, TF Lite has the following key features:
Multi-platform and multi-language support, including Android (Java), iOS (Objective-C and Swift) devices, web (JavaScript), and Python for all other environments. Google provides a TF Lite wrapper API called MediaPipe Solutions (https://developers.google.com/mediapipe, https://github.com/google/mediapipe/), which supersedes the previous TF Lite API.
Optimized for performance.
It has end-to-end solution pipelines. TF Lite is oriented toward practical applications, rather than research. Because of this, it includes different pipelines for common ML tasks such as image classification, object detection, text classification, and question answering among others. The computer vision pipelines use modified versions of EfficientNet or MobileNet, and the natural language processing pipelines use BERT-based models.
So, how does TF Lite model development work? First, we’ll select a model in one of the following ways:
An existing pre-trained .tflite model (https://tfhub.dev/s?deployment-format=lite).
Use MediaPipe Model Maker (https://developers.google.com/mediapipe/solutions/model_maker) to apply feature engineering transfer learning on an existing .tflite model with a custom training dataset. Model Maker only works with Python.
Convert a full-fledged TF model into .tflite format.
Discover more insights from 'Python Deep Learning - Third Edition' by Ivan Vasilev. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!
🌀 Real-Time Twitch Chat Sentiment Analysis with Apache Flink: This blog explores building a real-time sentiment analysis application for Twitch chat using Apache Flink. It covers setting up the project, reading Twitch chat messages, performing sentiment analysis, and concludes with a demo.
🌀 Entity Type Prediction with Relational Graph Convolutional Network (PyTorch): This post discusses a Python setup for predicting entity types on heterogeneous graphs using the Relational Graph Convolutional Network (R-GCN) and the RGCNConv module from PyTorch. It explains knowledge graphs, entity type prediction, and the R-GCN model.
🌀 Data Quality Error Detection powered by LLMs: This article explores automating the identification of data errors in tabular datasets using Large Language Models (LLMs). It discusses the Data Dirtiness Score, challenges in data cleaning, and the potential of LLMs in detecting data quality issues.
🌀 Building Ethical AI Starts with the Data Team — Here’s Why: This article discusses the ethical considerations of AI, focusing on model bias, AI usage, and data responsibility. It emphasizes the role of data teams in ensuring ethical AI and suggests steps for data teams to take towards a more ethical future.
See you next time!