What do we expect from a data analyst?
Every company is looking for different insights and is working with different types and sets of data. Even though you will find data analysts across several organizations, their actual day-to-day work can greatly differ. When reading this book, you will pick up whatever is useful to you, and you will probably skip over irrelevant parts. Nevertheless, it is good to go over what we expect you to know and be familiar with.
First, we will discuss what it means to be a data analyst, the assumptions we make, and why we chose to use this job title. Next, we will go over what you should already know about Power BI and where you can find this information if you feel you need to brush up on that knowledge.
What is a data analyst?
You could call yourself a business intelligence engineer, a business intelligence specialist, a database administrator, or simply a data analyst. Whatever your job title is, you picked up this book because you work with Power BI and want to learn more about it. With all these different titles nowadays, it becomes challenging to understand what your base knowledge should be. For simplicity and consistency, we will refer to a person working with Power BI as a data analyst.
Why a data analyst? Because in this book, we will assume that you are familiar with working with data in Power BI and are able to do the following:
- Prepare data
- Model data (creating a data model in Power BI, not a machine learning model)
- Visualize data
- Analyze data
- Deploy and maintain Power BI deliverables
On the other hand, we'll assume you're unfamiliar with the field of data science. We'll approach all AI features in Power BI from the point of view of this persona. In this book, we'll introduce the machine learning models behind the AI features, to make sure enough is understood to use the features correctly. However, we won't go into the full complexities of all models, as this is not a book targeted at the data scientist, who already has a lot of knowledge about mathematical and statistical methods used in AI.
There are two main skills important for when you are venturing into AI in Power BI: connecting to and visualizing data. Let's elaborate a little bit more on these two topics so that you know what is expected of you before you continue.
Connecting to data
So, assuming we are all data analysts, let's look at our core tasks. The very first thing we need to do to work with data, is get access to data. From a technical perspective, we can very easily connect Power BI to various data sources, whether we have data stored in cloud databases, Azure or otherwise, or whether we have local files we want to connect to. Power BI will allow us to do so and will even allow us to schedule an automatic refresh to visualize new data as long as we set up a gateway connection between the network within which the data resides and the Power BI service.
What kind of data can you work with? Any kind! You can connect to structured data, formatted nicely in tables, semi-structured data, often in the form of JSON, or even unstructured data to insert images into your Power BI reports. This also means data can come from a variety of sources. You can collect Twitter data (semi-structured) which contains tweet text, date of creation, number of retweets, likes, and hashtags used. You can collect sales and marketing data to understand which products you have sold, when you sold them, and which ad campaigns you were running that may have had an effect on your sales. Or maybe you are looking at the supply and demand for your warehouses and stores to make sure you plan the logistics of stocking your stores accordingly.
Since data can be generated by so many different sources, and can come in so many different formats, we also want to think about how we extract that data and get it ready to build reports on. Power BI has a lot of standard connectors to allow you to connect to data. A best practice here, however, is that you have a pipeline handling data orchestration before you even connect Power BI to it. Such a process is often called an ETL (Extract-Transform-Load) or ELT (Extract-Load-Transform) pipeline in which we connect to our sources generating data, extract the data, load it into a database, and transform it if necessary. Although similar tasks can be done by Power BI, we prefer working with ETL tools such as Azure Data Factory to handle these kind of orchestration pipelines when we are working with large amounts of data.
ETL or ELT?
ETL is an established and commonly used approach to extract data from sources. The purpose is often to transform it and load it into a structured database such as an Azure SQL Database and fit the data nicely into columns and rows. This is very well suited for transactional data, where we want to get quick results. However, with the emergence of cloud, the newer approach, ETL, is getting more traction. By extracting and loading the data into a data storage that can store unstructured data, such as an Azure Storage Account or Azure Data Lake, we can dump data in the cloud without having to worry about schemas. This also makes it easier to reuse the same data and transform it in different ways, depending on the insights you want to extract.
Long story short, a lot can and probably should already happen to your data before you even open Power BI. Be aware of the process that may have occurred before data enters Power BI. Whatever source we use for Power BI will influence the options we have within Power BI, as well as the performance of the reports we create. In this book, we will work mostly with preprocessed data, available through files stored on public websites or cloud databases. For some projects with AI, however, we will need to process data before bringing it into Power BI to make sure we can work with it.
Visualizing data
Lastly, the reason we are using Power BI is because we want to tell a story with our data. We want to convert data into valuable and intuitive insights that everyone within our organization can read for their own use. In the context of AI, this may be one of the most important skills of a data analyst, as will be explained in later sections.
So, what do we mean by telling a story? Humans are much more susceptible to stories than listening to data. If we say there is a 70% chance it is going to rain, do you bring an umbrella? It's hard to decide. If someone tells you to bring an umbrella, you will probably do it. Even though we want to make more data-driven decisions, humans are not naturally driven by data. We are driven by stories, which we find more intuitive. That also means that we cannot just give AI insights to humans. We need to translate the output of an AI model to make sure it is understandable for people. That means we need to use the data to tell stories.
The way we can do it within Power BI is by making use of the visuals it offers. We have the option to use standard visuals, import visuals from the marketplace, or create our own visuals using Python or R. Understanding when to use what and how to combine different visuals into a report is an important skill of a data analyst, and a skill we assume you have when reading this book. Throughout the book, remember that to gain people's trust, we need to speak their language and not just throw numbers at them to make them change their behavior.
In this book, we will focus on how we can use Power BI to perform AI. That means that you, as a data analyst, are already familiar with the different types of data you can get into Power BI from the different sources available to you. You should already be familiar with how to create visuals to form a report in Power BI. In later chapters, we will highlight features that are relevant when preparing your data for AI and to actually implement AI. But first, let's talk more about what AI is.