Why Microsoft Azure?
Microsoft Azure is an enterprise-grade set of cloud computing services built by Microsoft and run in its own managed data centers. Microsoft describes Azure as the only cloud with a true end-to-end analytics solution. With Azure, analysts can derive insights in seconds from all enterprise data, and Azure provides a mature, robust data flow without limits on concurrency.
Azure supports Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Many government institutions around the world, as well as 95% of Fortune 500 companies, use Azure, in industries ranging from healthcare and financial services to retail and manufacturing.
For decades, Microsoft has helped people achieve more with less through its software, tools, and platforms. Azure carries that forward by providing flexibility. Familiar Microsoft tools and infrastructure (such as SQL Server, Windows Server, Internet Information Services (IIS), and .NET), as well as MySQL, Linux, PHP, Python, Java, and other open source technologies, can all run on the Azure cloud. Gone are the days when you could only work within a walled garden of tools and technologies.
Azure provides various products and services, depending on your needs. You choose how much you manage yourself: from running your own IaaS, such as Windows Server virtual machines with SQL Server Enterprise installed, to using a fully managed PaaS offering such as Azure Synapse Analytics.
Figure 1.2 shows the wide range of data-specific Azure tools and services that can be used to create end-to-end data pipelines:
Figure 1.2: Microsoft Azure data-related services
Azure gives you the flexibility to choose the best approach to a problem, rather than forcing a less adaptable product to perform an unnatural function. You're not limited to SQL Server, either. You can also choose other types of databases or storage, whether as a service installed on a Linux server, as a containerized solution, or as a managed platform (such as Azure Cosmos DB for your Cassandra and MongoDB workloads). This matters because, in the real world, different scenarios require different solutions, tools, and products.
Microsoft Azure provides an end-to-end platform, from Azure Active Directory for managing user identity and access to Azure IoT offerings (such as IoT Hub) for gathering data from hundreds or even thousands of IoT devices. It also provides development tools and cloud hosting options to get your developers up to speed, as well as analytics and machine learning tools that help data scientists, data engineers, and data analysts be more productive (more on this in Chapter 3, Processing and visualizing data).
The full spectrum of Azure services is too wide to cover here, so instead, this book will focus on the key data warehousing and business intelligence suite of products: Azure Data Lake, Azure Synapse Analytics, Power BI, and Azure Machine Learning.
Security
Microsoft treats security as its top priority. When it comes to data, privacy and security are non-negotiable, because there will always be threats. Microsoft positions Azure as having the most advanced security and privacy features in the analytics space. Azure services support data protection through Virtual Networks (VNets): even though your resources are in the cloud, their endpoints cannot be reached from the public internet, and only resources within the same VNet can communicate with each other. For web applications, Azure Application Gateway provides a Web Application Firewall (WAF) that filters out malicious requests so that only valid traffic reaches your network.
Role-based access control (RBAC) handles authorization: it ensures that only users with the appropriate roles, such as administrators, can access specific resources and their capabilities. Authentication, on the other hand, verifies identity: if you don't have the right credentials (such as a password), you cannot access a resource at all. Both authorization and authentication are built into Microsoft Azure services and components through Azure Active Directory.
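To make the difference between the two checks concrete, here is a minimal, self-contained sketch. It is not how Azure Active Directory or Azure RBAC are implemented; the users, role names, and permission sets are hypothetical, and in-memory dictionaries stand in for the identity and role stores that Azure manages for you. Authentication answers "who are you?", and authorization answers "what may you do?".

```python
import hashlib
import hmac

# Hypothetical demo salt; real systems use a unique random salt per user.
SALT = b"demo-salt"


def _hash(password: str) -> bytes:
    # Slow, salted hash so stored credentials are not plain text.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), SALT, 100_000)


# Hypothetical stores; in Azure these live in Azure Active Directory and RBAC.
USERS = {"alice": _hash("s3cret")}
ROLE_ASSIGNMENTS = {"alice": {"Reader"}}
ROLE_PERMISSIONS = {"Reader": {"read"}, "Contributor": {"read", "write"}}


def authenticate(username: str, password: str) -> bool:
    """Authentication: does the caller hold valid credentials?"""
    expected = USERS.get(username)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, _hash(password))


def authorize(username: str, action: str) -> bool:
    """Authorization: does any of the caller's roles permit the action?"""
    roles = ROLE_ASSIGNMENTS.get(username, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def access(username: str, password: str, action: str) -> str:
    if not authenticate(username, password):
        return "401 Unauthorized"  # identity not proven
    if not authorize(username, action):
        return "403 Forbidden"  # identity proven, but the role lacks permission
    return "200 OK"
```

For example, `access("alice", "s3cret", "write")` returns `"403 Forbidden"`: Alice is who she says she is, but the Reader role does not grant write access. The HTTP status codes mirror the same split, 401 for failed authentication and 403 for failed authorization.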
Azure also provides a service called Azure Key Vault. Key Vault lets you safely store and manage secrets and passwords, create encryption keys, and manage certificates, so that applications never have direct access to private keys. With this pattern, you don't need to hardcode secrets and passwords in your source code or script repositories.
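The core of the pattern is that code asks for a secret at run time instead of containing it. The sketch below uses only the Python standard library and assumes the hosting platform has already injected the secret as an environment variable (for example, through an App Service Key Vault reference). The variable name `SQL_PASSWORD` and the connection-string layout are hypothetical. In production you might instead fetch the value directly with the Azure SDK, for example `SecretClient` from the `azure-keyvault-secrets` package.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been provided to the process."""


def get_secret(name: str) -> str:
    # Assumption: the platform injects the secret (e.g. from Key Vault) as an
    # environment variable, so the value never appears in source control.
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not configured")
    return value


def build_connection_string(server: str, database: str, user: str) -> str:
    # The password is resolved at run time rather than hardcoded in this file.
    password = get_secret("SQL_PASSWORD")  # hypothetical variable name
    return f"Server={server};Database={database};User Id={user};Password={password};"
```

Failing loudly when the secret is missing is a deliberate choice: a misconfigured deployment surfaces immediately instead of silently connecting with an empty password.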
Azure Synapse Analytics uses ML and AI to protect your data. In Azure SQL, Microsoft provides Advanced Data Security to help keep your data protected. This includes detecting whether your database has vulnerabilities, such as publicly exposed ports. These capabilities also help you comply with standards such as the General Data Protection Regulation (GDPR) by ensuring that sensitive customer data is classified. Azure SQL also supports row-level security (RLS) and column-level security (CLS), which control access to the rows and columns of a database table based on the characteristics of the user running the query.
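In Azure SQL, RLS is defined in T-SQL: a predicate function is bound to a table with `CREATE SECURITY POLICY`, and the engine applies it to every query. The Python sketch below only models that idea so the mechanism is easy to see. The `Sale` and `User` types and the "managers see everything, others see their own region" rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    region: str
    amount: float


@dataclass(frozen=True)
class User:
    name: str
    region: str
    is_manager: bool = False


def rls_predicate(user: User, row: Sale) -> bool:
    # Hypothetical rule: managers see every row; others see only their region.
    return user.is_manager or row.region == user.region


def query_sales(user: User, table: list[Sale]) -> list[Sale]:
    # The "engine" applies the predicate to every query, so a caller cannot
    # bypass it simply by writing a different SELECT statement.
    return [row for row in table if rls_predicate(user, row)]
```

The key design point is that the filter lives next to the data rather than in each application: every client that queries the table gets the same restriction.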
Microsoft invests at least $1 billion each year in cybersecurity, including in the Azure platform. Azure holds various certifications and awards from independent assessment bodies, so you can rely on Azure across every layer of security, from physical security (ensuring that no unauthorized person can gain physical access to the data centers) to application-level security.
These are just a few of the security capabilities you would have to build and maintain yourself if you ran your own data center.
Cloud scale
Azure changed the industry by making data analytics cost-efficient. Before the mass adoption of cloud computing, analytics on terabytes, or even petabytes, of data required careful planning and a capital expenditure budget to support it. That meant very high upfront infrastructure and professional services costs just to get started. With Azure, you can start small (many of the services have free tiers) and scale your cloud resources up or down, in or out, in minutes. Azure has democratized scaling by making it economically viable and accessible to everyone, as shown in Figure 1.3:
Figure 1.3: Microsoft Azure regions
Microsoft Azure currently has over 60 data center regions serving over 140 countries. Some enterprises and regulated industries require data to be hosted in the same country as their business operations. With data centers available worldwide, it is easy to expand to other regions. This multi-region approach also helps make your applications highly available.
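A quick calculation shows why multiple regions improve availability. Assuming, optimistically, that regional failures are independent and that failover is instant, an application deployed in n regions that are each available with probability a is down only when all of them are down at once, so its availability is 1 - (1 - a)^n. The 99.9% figure used below is a hypothetical input, not a published Azure number.

```python
def composite_availability(per_region: float, regions: int) -> float:
    """Probability that at least one of `regions` deployments is up.

    Assumes independent failures and instantaneous failover, which real
    systems only approximate.
    """
    if not 0.0 <= per_region <= 1.0:
        raise ValueError("per_region must be between 0 and 1")
    if regions < 1:
        raise ValueError("regions must be at least 1")
    return 1.0 - (1.0 - per_region) ** regions


single = composite_availability(0.999, 1)  # 0.999
paired = composite_availability(0.999, 2)  # ≈ 0.999999 (six nines)
```

In practice, shared dependencies and failover time reduce the real-world gain, so treat this as an upper bound rather than a promise.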
The true power of the cloud is its elasticity: you can scale resources up, and you can also scale them down when necessary. This is especially useful in data science, where workloads are variable. For instance, when data scientists and engineers are analyzing a dataset, they need more compute. Azure, through services such as Azure Machine Learning (more on this in Chapter 3, Processing and visualizing data), lets you scale according to demand. Then, during off-peak times (such as weekends, and 7 PM to 7 AM on weekdays), when the scientists and engineers don't need the processing power, you can scale down your resources so that you don't pay for them to run 24/7. In short, Azure offers a pay-as-you-go, or pay-for-what-you-use, model.
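The savings from the off-peak schedule described above are easy to estimate. Running only from 7 AM to 7 PM on the five weekdays gives 12 × 5 = 60 compute hours per week, compared with 7 × 24 = 168 hours for an always-on resource. The hourly rate below is a hypothetical placeholder, not a real Azure price, and the sketch ignores charges that continue while compute is stopped, such as storage.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours for an always-on resource


def scheduled_hours_per_week(start_hour: int = 7, end_hour: int = 19, days: int = 5) -> int:
    # Compute runs only between start_hour and end_hour on working days.
    return (end_hour - start_hour) * days


def weekly_cost(hourly_rate: float, hours: int) -> float:
    return hourly_rate * hours


rate = 2.00  # hypothetical USD per hour, for illustration only
always_on = weekly_cost(rate, HOURS_PER_WEEK)               # 336.0
scheduled = weekly_cost(rate, scheduled_hours_per_week())   # 120.0
savings = 1 - scheduled / always_on                         # ≈ 0.643, about 64%
```

Because the saving is a ratio of hours (1 - 60/168), it stays at roughly 64% regardless of the hourly rate you plug in.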
Azure also provides a Service Level Agreement (SLA) for its services as a commitment to uptime and connectivity for production customers. If downtime or an incident breaches that commitment, Microsoft applies service credits (rebates) to the affected resources. This gives you peace of mind: downtime is kept to a minimum, and you are compensated when the commitment is not met.
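It helps to translate an SLA percentage into the downtime it actually permits. The sketch below assumes a 30-day month; each Azure service defines its own SLA percentage and measurement period, so the percentages used here are examples rather than quotes from a specific SLA.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime (in minutes) that still meets the SLA over `days` days."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - sla_percent / 100)


three_nines = allowed_downtime_minutes(99.9)    # ≈ 43.2 minutes per month
four_nines = allowed_downtime_minutes(99.99)    # ≈ 4.32 minutes per month
```

Each extra "nine" cuts the permitted downtime by a factor of ten, which is why higher SLA tiers usually require redundant, multi-instance or multi-region deployments.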
Microsoft Azure supports several scaling approaches and patterns:
- Vertical scaling: This is when more resources are added to the same instance (server or service). An example is scaling a virtual machine up from 4 GB of RAM to 16 GB of RAM. This is a simple, straightforward approach when your application needs to scale. However, there is a hard limit on how far a single instance can be scaled up, and it can be the most expensive scaling approach.
- Horizontal scaling: This is when you deploy your application to multiple instances. In principle, this means you can scale your application almost without limit, because you don't rely on a single machine to perform your operations. This flexibility also introduces some complexity, which is usually addressed with various design patterns and with container and orchestration technologies such as Docker and Kubernetes.
- Geographical scaling: This is when you scale your applications to different geographical locations for two major reasons: resilience and reduced latency. Resilience means your application can keep operating in a region without depending on a single master region. Reduced latency means users in that region get faster responses to their web requests because of their proximity to the data center.
- Sharding: This is a technique for distributing large volumes of related, structured data across multiple independent databases.
- Development, Testing, Acceptance, and Production (DTAP): This is the approach of running multiple instances in separate logical environments, usually to keep development and test servers apart from staging and production servers. Azure DevTest Labs offers development and testing environments that can be governed with lab policies, such as limits on virtual machine sizes and auto-shutdown schedules.
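To illustrate sharding from the list above, here is a minimal sketch of hash-based shard routing: each customer's rows live in exactly one database, chosen deterministically from the customer ID. The shard names are hypothetical, and this is a generic technique rather than a specific Azure API. A stable hash (SHA-256) is used instead of Python's built-in `hash()`, which is randomized per process for strings and would send the same key to different shards after a restart.

```python
import hashlib

# Hypothetical shard databases, for illustration only.
SHARDS = ["sales-db-0", "sales-db-1", "sales-db-2", "sales-db-3"]


def shard_for(customer_id: str, shards: list[str] = SHARDS) -> str:
    """Deterministically map a sharding key to one database."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and pick a shard by modulo.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

All rows for one customer land on the same shard, so queries scoped to a single customer touch only one database. One caveat of plain modulo routing is that changing the number of shards remaps most keys; consistent hashing or an explicit shard map (such as the shard map manager in the Azure SQL Elastic Database client library) avoids mass data movement.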
Another advantage of running your business in the cloud is the availability of your services. With Azure, it is easier to make your infrastructure and resources geo-redundant, that is, available across multiple regions and data centers around the world. Say you want to expand your business from Australia to Canada. You can make your SQL database geo-redundant so that Canadian users don't need to query the application and database instance in Australia.
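One common way to apply this in the Australia-to-Canada scenario is read routing: writes go to the primary database, and reads go to a readable geo-replica close to the user, because geo-replication secondaries are typically read-only. The sketch below is hypothetical; the country-to-region mapping is invented for illustration, although `australiaeast` and `canadacentral` are real Azure region names.

```python
PRIMARY_REGION = "australiaeast"  # hypothetical primary (read-write) database

# Hypothetical mapping from user country to the nearest readable replica.
READ_REPLICAS = {"AU": "australiaeast", "CA": "canadacentral"}


def route(query_kind: str, user_country: str) -> str:
    """Pick the region that should serve a query."""
    if query_kind == "write":
        # Only the primary accepts writes; replicas are read-only copies.
        return PRIMARY_REGION
    if query_kind != "read":
        raise ValueError("query_kind must be 'read' or 'write'")
    # Serve reads locally when a replica exists, otherwise fall back to primary.
    return READ_REPLICAS.get(user_country, PRIMARY_REGION)
```

Canadian users then read from `canadacentral` with low latency, while writes still travel to Australia. That is acceptable for read-heavy workloads, and replicas may briefly lag behind the primary.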
Azure, despite being a broad suite of products and services, does not force you to go "all in." You can start with a hybrid architecture that combines on-premises data centers with the cloud (Azure). A hybrid solution can involve different approaches and technologies, such as Virtual Private Networks (VPNs), or Azure ExpressRoute if you need a dedicated private connection.
Through the data integration capabilities of Azure Synapse Analytics, you can take a snapshot of data from your on-premises SQL Server. The same concept applies to data sources from other cloud providers or SaaS products: you can copy that data into your Azure data lake. This flexibility is highly convenient because it avoids vendor lock-in, where you would have to perform a full migration.