Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!
Get the first look at Sigma's new features and functionality at our virtual product launch on May 2nd at 12pm ET/9am PT.
The virtual event will showcase talks and demos from Sigma's CEO, co-founders, and product managers about what's next in the future of analytics.
Don't miss out. See how Sigma is reinventing BI.
👋 Hello,
Welcome to BI-Pro #52: Your Premier Destination for Data and BI Insights! 🌟
In This Edition:
🔮 Data Viz with Python Libraries
Exploring causality with Python.
Meet NiceGUI: Your Soon-to-be Favorite Python UI Library.
Feature Engineering with Microsoft Fabric and PySpark.
10 GitHub Repositories to Master Python.
🔌 Power BI
On-premises data gateway April 2024 release.
Copilot in Power BI expansion.
🛠️ Microsoft Fabric
Introducing Optimistic Job Admission for Fabric Spark.
Introducing Job Queueing for Notebook in Microsoft Fabric.
☁️ AWS BI
Meet Amazon QuickSight expert Sanjeeb Mohapatra.
Handle tables without primary keys for Amazon Aurora MySQL and Amazon RDS for MySQL.
Power analytics with Amazon Redshift.
🌐 Google Cloud Data
Gemini 1.0 Pro Vision in BigQuery.
Gemini in Looker AI-powered BI.
Memorystore for Redis Cluster updates.
📊Tableau
Tableau vs Power BI: A Comparison of AI-Powered Analytics Tools.
Salesforce-Informatica Deal Could Transform Enterprise GenAI Forever.
✨ Expert Insights from Packt Community
ChatGPT for Cybersecurity Cookbook by Clint Bodungen.
💡 What's the Latest Scoop from the BI Community?
Geospatial Data Analysis with Geemap.
Microsoft Fabric Table Maintenance - Checkpoint and Statistics.
Identifying Customer Buying Pattern in Power BI - Part 1.
Full vs. Incremental Loads – Data Engineering with Fabric.
Joining Queries in Azure Data Factory on Cosmos DB Sources.
Feature Engineering with Microsoft Fabric and Dataflow Gen2.
Stay ahead in the ever-evolving landscape of business intelligence with BI-Pro. Unleash the full potential of your data today!
📥 Feedback on the Weekly Edition
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."
📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.
We appreciate your input and hope you enjoy the book!
Share your thoughts and opinions here!
Cheers,
Merlyn Shelley
Editor-in-Chief, Packt
Sign Up | Advertise | Archives
🐾 altair - Vega-Altair is a Python library for statistical visualization, offering simplicity, friendliness, and consistency for creating beautiful and effective visualizations.
🐾 bokeh - Bokeh is a Python library for creating interactive plots and data applications in web browsers, offering elegant and versatile graphics.
🐾 bqplot - bqplot is a 2-D visualization system for Jupyter, based on the Grammar of Graphics, enabling interactive plots with other Jupyter widgets.
🐾 cartopy - Cartopy simplifies map drawing in Python, offering easy projection definitions, point transformations, and integration with Matplotlib for advanced mapping.
🐾 diagrams - Diagrams simplifies cloud system architecture design in Python, supporting major providers and frameworks, allowing prototyping and visualization of existing architectures.
Email Forwarded? Join BI-Pro Here!
🐍 Exploring causality with Python. Difference-in-differences: The series dives into causal inference, crucial in modern analytics, explaining tools like difference-in-differences. It explores how events impact outcomes, using examples such as minimum wage effects on employment. The setup involves treatment and control groups to establish cause-and-effect relationships in diverse real-world scenarios.
🐍 Meet the NiceGUI: Your Soon-to-be Favorite Python UI Library. NiceGUI is a Python UI framework for web and desktop apps, offering a simple interface for small projects, dashboards, and robotics. It simplifies state management and interaction, boasting features like easy layout, visualization tools, and integration with popular libraries.
🐍 Feature Engineering with Microsoft Fabric and PySpark: The post delves into feature engineering in Microsoft Fabric, emphasizing its importance in ML development. It explores PySpark's role in handling large datasets and provides a basic overview and example of using PySpark for feature engineering.
🐍 10 GitHub Repositories to Master Python: The blog explores 10 essential GitHub repositories for mastering Python, emphasizing hands-on experience and real-world projects to enhance skills. It covers a range of topics, from beginner to advanced, including machine learning, web development, and data analysis.
zhiwehu/Python-programming-exercises
practical-tutorials/project-based-learning
avinashkranjan/Amazing-Python-Scripts
Power BI
📊 On-premises data gateway April 2024 release: This update to the on-premises data gateway aligns it with the April 2024 release of Power BI Desktop, ensuring consistency in query execution. Additionally, the gateway now supports refreshes longer than one hour, allowing tokens to be refreshed mid-stream for continuous operation.
📊 Copilot in Power BI: Soon available to more users in your organization. The update introduces changes to Copilot in Power BI, including enabling Copilot by default for all tenants starting May 20th, 2024. It also addresses features reported by customers and community, updates abuse monitoring to not store prompts, and improves geo mapping for EU data boundary customers.
Microsoft Fabric
📊 Introducing Optimistic Job Admission for Fabric Spark: The post introduces Optimistic Job Admission for Spark in Microsoft Fabric, a new feature aimed at improving concurrency and job admission experience. It explains how this feature optimizes resource allocation and increases the number of concurrent jobs that can be admitted to the cluster.
📊 Introducing Job Queueing for Notebook in Microsoft Fabric: Microsoft Fabric introduces Job Queueing for Notebook Jobs to streamline data engineering and data science processes. This feature automatically queues notebook jobs when Fabric capacity is maxed out, eliminating manual retries and improving user experience. Jobs are retried when resources become available, enhancing efficiency for enterprise users.
AWS BI
📊 Meet one of Amazon QuickSight’s Top Community Experts: Sanjeeb Mohapatra. The Amazon QuickSight Community, launched in 2022, is a hub for BI authors and developers to collaborate, ask and answer questions, and learn about QuickSight. Sanjeeb Mohapatra, the top Community Expert for 2023, exemplifies the community's spirit by providing over 1,700 replies and 235 solutions in one year.
📊 Handle tables without primary keys while creating Amazon Aurora MySQL or Amazon RDS for MySQL zero-ETL integrations with Amazon Redshift: AWS is advancing its zero-ETL vision with Amazon Aurora zero-ETL integration to Amazon Redshift, combining transactional data with analytics capabilities. This integration, along with four new ones announced at re:Invent 2023, empowers customers to implement near real-time analytics for various use cases.
📊 Power analytics as a service capabilities using Amazon Redshift: Analytics as a service (AaaS) leverages cloud-based analytic capabilities to enable cost-effective, scalable solutions for organizations. Amazon Redshift, a cloud data warehouse service, facilitates real-time insights and predictive analytics, empowering AaaS providers to embed rich data analytics capabilities. Delivery models include managed, bring-your-own-Redshift (BYOR), and hybrid options, offering flexibility to meet customer needs.
Google Cloud Data
📊 How to use Gemini 1.0 Pro Vision in BigQuery? BigQuery integrates with Vertex AI to leverage Gemini 1.0 Pro, PaLM, Vision AI, Speech AI, Doc AI, Natural Language AI, enabling analysis of unstructured data like images, audio, and documents. New integrations support multimodal generative AI, enhancing capabilities for object recognition, info seeking, captioning, digital content understanding, and structured content generation, allowing structured data output for deeper analysis.
📊 Get to know BigQuery data canvas: BigQuery Data Canvas simplifies the data-to-insights journey by offering a natural language-driven experience. It centralizes data tasks, accelerates analysis, and fosters collaboration, all within a unified workspace, enabling faster and more efficient data analytics.
📊 Gemini in Looker to bring intelligent AI-powered BI to everyone: Gemini in Looker introduces Conversational Analytics, transforming how businesses engage with data. It offers a natural language-driven experience, simplifying data analytics and fostering collaboration, all within a unified workspace.
📊 Memorystore for Redis Cluster updates at Next ‘24: The article elaborates on the rapid adoption and recent enhancements of Google Cloud's Memorystore for Redis Cluster. It features customer testimonials from companies like Statsig, Character.AI, and AXON Networks, showcasing the service's performance, scalability, and cost-effectiveness. It also highlights new features such as data persistence, new node types, and ultra-fast vector search.
📊 Firestore launches at Next ‘24: Firestore is beloved by developers for its speed in app development. Updates include improved developer productivity, AI-enabled app building, richer queries, and enterprise-level scalability. Gemini Code Assist now supports Firestore, allowing natural language queries and data model definitions, enhancing the development experience. Firestore also supports AI applications and integrations with LangChain and LlamaIndex for generative AI.
Tableau
📊 Tableau vs Power BI: A Comparison of AI-Powered Analytics Tools. The comparison delves into the unique strengths of Tableau and Power BI, showcasing how each excels in different areas of data visualization and analytics. It outlines Tableau's robust visualizations and analytics capabilities, especially for large datasets, contrasting with Power BI's integration with Microsoft services and affordability for small to medium-sized businesses.
📊 Salesforce-Informatica Deal Could Transform Enterprise GenAI Forever: Salesforce is reportedly in advanced talks to acquire Informatica, a data-management software provider, for $11 billion. This aligns with Salesforce's strategy to expand beyond CRM, bolstered by recent AI advancements like Einstein Copilot, complementing Informatica's data integration expertise and potential synergy with Tableau and MuleSoft. Additionally, it aligns with Salesforce's strategy to expand beyond CRM and become a comprehensive data journey platform.
ChatGPT for Cybersecurity Cookbook - By Clint Bodungen
Sending API Requests and Handling Responses with Python
In this recipe, we will explore how to send requests to the OpenAI GPT API and handle the responses using Python. We’ll walk through the process of constructing API requests, sending them, and processing the responses using the openai module.
Getting ready
Ensure you have Python installed on your system.
Install the OpenAI Python module by running the following command in your Terminal or command prompt:
pip install openai
How to do it…
The importance of using the API lies in its ability to communicate with and get valuable insights from ChatGPT in real time. By sending API requests and handling responses, you can harness the power of GPT to answer questions, generate content, or solve problems in a dynamic and customizable way. In the following steps, we’ll demonstrate how to construct API requests, send them, and process the responses, enabling you to effectively integrate ChatGPT into your projects or applications:
Start by importing the required modules:
import openai
from openai import OpenAI
import os
Set up your API key by retrieving it from an environment variable, as we did in the Setting the OpenAI API key as an Environment Variable recipe:
openai.api_key = os.getenv("OPENAI_API_KEY")
Define a function to send a prompt to the OpenAI API and receive a response:
client = OpenAI()
def get_chat_gpt_response(prompt):
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": prompt}],
max_tokens=2048,
temperature=0.7
)
return response.choices[0].message.content.strip()
Call the function with a prompt to send a request and receive a response:
prompt = "Explain the difference between symmetric and asymmetric encryption."
response_text = get_chat_gpt_response(prompt)
print(response_text)
How it works…
First, we import the required modules. The openai module is the OpenAI API library, and the os module helps us retrieve the API key from an environment variable.
We set up the API key by retrieving it from an environment variable using the os module.
Next, we define a function called get_chat_gpt_response() that takes a single argument: the prompt. This function sends a request to the OpenAI API using the openai.Completion.create() method. This method has several parameters:
engine: Here, we specify the engine (in this case, chat-3.5-turbo).
prompt: The input text for the model to generate a response.
max_tokens: The maximum number of tokens in the generated response. A token can be as short as one character or as long as one word.
n: The number of generated responses you want to receive from the model. In this case, we’ve set it to 1 to receive a single response.
stop: A sequence of tokens that, if encountered by the model, will stop the generation process. This can be useful for limiting the response’s length or stopping at specific points, such as the end of a sentence or paragraph.
temperature: A value that controls the randomness of the generated response. A higher temperature (for example, 1.0) will result in more random responses, while a lower temperature (for example, 0.1) will make the responses more focused and deterministic.
Discover more insights from ChatGPT for Cybersecurity Cookbook - By Clint Bodungen. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!
🧠 Geospatial Data Analysis with Geemap: This article introduces geospatial data analysis, focusing on raster data from Google Earth Engine, accessed and analyzed using the Geemap Python library. Earth Engine offers a vast catalog of geospatial datasets, and Geemap simplifies access and analysis, making it easier to work with such data in Python.
🧠 Microsoft Fabric Table Maintenance - Checkpoint and Statistics: This article discusses the maintenance requirements for warehouse tables in Microsoft Fabric, particularly focusing on tasks like updating statistics, removing fragmentation, and managing log files. While some maintenance tasks, such as data compaction and log file checkpointing, are automated, others, like managing statistics, may require manual intervention.
🧠 Identifying Customer Buying Pattern in Power BI - Part 1: This article is part 1 of a retail analytics analysis in Power BI, focusing on customer purchasing frequency for various products over the years. It includes identifying data elements, creating calculated columns, and analyzing trends to aid in business decision-making.
🧠 Full vs. Incremental Loads – Data Engineering with Fabric: This article discusses using Apache Spark in Microsoft Fabric to achieve data quality zones (bronze and silver) in a data lake. It explores loading weather data, transforming it with Spark SQL and DataFrames, and implementing full and incremental load patterns.
🧠 Joining Queries in Azure Data Factory on Cosmos DB Sources: This article provides a detailed guide on joining two queries in Azure Data Factory (ADF). It covers prerequisites, creation of data sources, defining queries for each dataset, and using the "Join" transformation in ADF to merge data. Different join types such as inner, left outer, right outer, and full outer joins are explained.
🧠 Feature Engineering with Microsoft Fabric and Dataflow Gen2: This article introduces Dataflow Gen2 as a low-code data transformation and integration engine for creating data pipelines in Microsoft Fabric. It focuses on using Dataflow Gen2 to create features needed for training a machine learning model with college basketball game data, offering different approaches from no code to all code.
See you next time!