How-To Tutorials

article-image-managing-ai-security-risks-with-zero-trust-a-strategic-guide

29 Nov 2024

15 min read

Managing AI Security Risks with Zero Trust: A Strategic Guide

29 Nov 2024

This article is an excerpt from the book, "Zero Trust Overview and Playbook Introduction", by Mark Simos, Nikhil Kumar. Get started on Zero Trust with this step-by-step playbook and learn everything you need to know for a successful Zero Trust journey with tailored guidance for every role, covering strategy, operations, architecture, implementation, and measuring success. This book will become an indispensable reference for everyone in your organization.IntroductionIn today’s rapidly evolving technological landscape, artificial intelligence (AI) is both a powerful tool and a significant security risk. Traditional security models focused on static perimeters are no longer sufficient to address AI-driven threats. A Zero Trust approach offers the agility and comprehensive safeguards needed to manage the unique and dynamic security risks associated with AI. This article explores how Zero Trust principles can be applied to mitigate AI risks and outlines the key priorities for effectively integrating AI into organizational security strategies.How can Zero Trust help manage AI security risk?A Zero Trust approach is required to effectively manage security risks related to AI. Classic network perimeter-centric approaches are built on more than 20-year-old assumptions of a static technology environment and are not agile enough to keep up with the rapidly evolving security requirements of AI.The following key elements of Zero Trust security enable you to manage AI risk:Data centricity: AI has dramatically elevated the importance of data security and AI requires a data-centric approach that can secure data throughout its life cycle in any location.Zero Trust provides this data-centric approach and the playbooks in this series guide the roles in your organizations through this implementation.Coordinated management of continuous dynamic risk: Like modern cybersecurity attacks, AI continuously disrupts core assumptions of business, technical, and security processes. This requires coordinated management of a complex and continuously changing security risk.Zero Trust solves this kind of problem using agile security strategies, policies, and architecture to manage the continuous changes to risks, tooling, processes, skills, and more. The playbooks in this series will help you make AI risk mitigation real by providing specific guidance on AI security risks for all impacted roles in the organization. Let’s take a look at which specific elements of Zero Trust are most important to managing AI risk.Zero Trust – the top four priorities for managing AI riskManaging AI risk requires prioritizing a few key areas of Zero Trust to address specific unique aspects of AI. The role of specific guidance in each playbook provides more detail on how each role will incorporate AI considerations into their daily work.These priorities follow the simple themes of learn it, use it, protect against it, and work as a team. This is similar to a rational approach for any major disruptive change to any other type of competition or conflict (a military organization learning about a new weapon, professional sports players learning about a new type of equipment or rule change, and so on).The top four priorities for managing AI risk are as follows:1. Learn it – educate everyone and set realistic expectations: The AI capabilities available today are very powerful, affect everyone, and are very different than what people expect them to be. It’s critical to educate every role in the organization, from board members and CEOs to individual contributors, as they all must understand what AI is, what AI really can and cannot do, as well as the AI usage policy and guidelines. Without this, people’s expectations may be wildly inaccurate and lead to highly impactful mistakes that could have easily been avoided.Education and expectation management is particularly urgent for AI because of these factors:Active use in attacks: Attackers are already using AI to impersonate voices, email writing styles, and more.Active use in business processes: AI is freely available for anyone to use. Job seekers are already submitting AI-generated resumes for your jobs that use your posted job descriptions, people are using public AI services to perform job tasks (and potentially disclosing sensitive information), and much more.Realism: The results are very realistic and convincing, especially if you don’t know how good AI is at creating fake images, videos, and text.How can Zero Trust help manage AI security risk?Confusion: Many people don’t have a good frame of reference for it because of the way AI has been portrayed in popular culture (which is very different from the current reality of AI).2. Use it – integrate AI into security: Immediately begin evaluating and integrating AI into your security tooling and processes to take advantage of their increased effectiveness and efficiency. This will allow you to quickly take advantage of this powerful technology to better manage security risk. AI will impact nearly every part of security, including the following:Security risk discovery, assessment, and management processesThreat detection and incident response processesArchitecture and engineering security defensesIntegrating security into the design and operation of systems…and many more3. Protect against it – update the security strategy, policy, and controls: Organizations must urgently update their strategy, policy, architecture, controls, and processes to account for the use of AI technology (by business units, technology teams, security teams, attackers, and more). This helps enable the organization to take full advantage of AI technology while minimizing security risk.The key focus areas should include the following:Plan for attacker use of AI: One of the first impacts most organizations will experience is rapid adoption by attackers to trick your people. Attackers are using AI to get an advantage on target organizations like yours, so you must update your security strategy, threat models, architectures, user education, and more to defend against attackers using AI or targeting you for your data. This should change the organization’s expectations and assumptions for the following aspects:Attacker techniques: Most attackers will experiment with and integrate AI capabilities into their attacks, such as imitating the voices of your colleagues on phone calls, imitating writing styles in phishing emails, creating convincing fake social media pictures and profiles, creating convincing fake company logos and profiles, and more.Attacker objectives: Attackers will target your data, AI systems, and other related assets because of their high value (directly to the attacker and/or to sell it to others). Your human-generated data is a prized high-value asset for training and grounding AI models and your innovative use of AI may be potentially valuable intellectual property, and more.Secure the organization’s AI usage: The organization must update its security strategy, plans, architecture, processes, and tooling to do the following:Secure usage of external AI: Establish clear policies and supporting processes and technology for using external AI systems safelySecure the organization’s AI and related systems: Protect the organization’s AI and related systems against attackersIn addition to protecting against traditional security attacks, the organization will also need to defend against AI-specific attack techniques that can extract source data, make the model generate unsafe or unintended results, steal the design of the AI model itself, and more. The playbooks include more details for each role to help them manage their part of this risk.Take a holistic approach: It’s important to secure the full life cycle and dependencies of the AI model, including the model itself, the data sources used by the model, the application that uses the model, the infrastructure it’s hosted on, third-party operators such as AI platforms, and other integrated components. This should also take a holistic view of the security life cycle to consider identification, protection, detection, response, recovery, and governance.Update acquisition and approval processes: This must be done quickly to ensure new AI technology (and other technology) meets the security, privacy, and ethical practices of the organization. This helps avoid extremely damaging avoidable problems such as transferring ownership of the organization’s data to vendors and other parties. You don’t want other organizations to grow and capture market share from you by using your data. You also want to avoid expensive privacy incidents and security incidents from attackers using your data against you.This should include supply chain risk considerations to mitigate both direct suppliers and Nth party risk (components of direct suppliers that have been sourced from other organizations). Finding and fixing problems later in the process is much more difficult and expensive than correcting them before or during acquisition, so it is critical to introduce these risk mitigations early.4. Work as a team – establish a coordinated AI approach: Set up an internal collaboration community or a formal Center of Excellence (CoE) team to ensure insights, learning, and best practices are being shared rapidly across teams. AI is a fast-moving space and will drive rapid continuous changes across business, technology, and security teams. You must have mechanisms in place to coordinate and collaborate across these different teams in your organization.How will AI impact Zero Trust?Each playbook describes the specific AI impacts and responsibilities for each affected role.AI shared responsibility model: Most AI technology will be a partnership with AI providers, so managing AI and AI security risk will follow a shared responsibility model between you and your AI providers. Some elements of AI security will be handled by the AI provider and some will be the responsibility of your organization (their customer).This is very similar to how cloud responsibility is managed today (and many AI providers are also cloud providers). This is also similar to a business that outsources some or all of its manufacturing, logistics, sales (for example, channel sales), or other business functions.Now, let’s take a look at how AI impacts Zero Trust.How will AI impact Zero Trust?AI will accelerate many aspects of Zero Trust because it dramatically improves the security tooling and people’s ability to use it. AI promises to reduce the burden and effort for important but tedious security tasks such as the following:Helping security analysts quickly query many data sources (without becoming an expert in query languages or tool interfaces)Helping writing incident response reportsIdentifying common follow-up actions to prevent repeat incidentSimplifying the interface between people and the complex systems they need to use for security will enable people with a broad range of skills to be more productive. Highly skilled people will be able to do more of what they are best at without repetitive and distracting tasks. People earlier in their careers will be able to quickly become more productive in a role, perform tasks at an expert level more quickly, and help them learn by answering questions and providing explanations.AI will NOT replace the need for security experts, nor the need to modernize security. AI will simplify many security processes and will allow fewer security people to do more, but it won’t replace the need for a security mindset or security expertise.Even with AI technology, people and processes will still be required for the following aspects:Ask the right security questions from AI systemsInterpret the results and evaluate their accuracyTake action on the AI results and coordinate across teamsPerform analysis and tasks that AI systems currently can’t cover:Identify, manage, and measure security risk for the organizationBuild, execute, and monitor a strategy and policyBuild and monitor relationships and processes between teamsIntegrate business, technical, and security capabilitiesEvaluate compliance requirements and ensure the organization is meeting them in good faithEvaluate the security of business and technical processesEvaluate the security posture and prioritize mitigation investmentsEvaluate the effectiveness of security processes, tools, and systemsPlan and implement security for technical systemsPlan and implement security for applications and productsRespond to and recover from attacksIn summary, AI will rapidly transform the attacks you face as well as your organization’s ability to manage security risk effectively. AI will require a Zero Trust approach and it will also help your teams do their jobs faster and more efficiently.The guidance in the Zero Trust Playbook Series will accelerate your ability to manage AI risk by guiding everyone through their part. It will help you rapidly align security to business risks and priorities and enable the security agility you need to effectively manage the changes from AI.Some of the questions that naturally come up are where to start and what to do first.ConclusionAs AI reshapes the cybersecurity landscape, adopting a Zero Trust framework is critical to effectively manage the associated risks. From securing data lifecycles to adapting to dynamic attacker strategies, Zero Trust principles provide the foundation for agile and robust AI risk management. By focusing on education, integration, protection, and collaboration, organizations can harness the benefits of AI while mitigating its risks. The Zero Trust Playbook Series offers practical guidance for all roles, ensuring security remains aligned with business priorities and prepared for the challenges AI introduces. Now is the time to embrace this transformative approach and future-proof your security strategies.Author BioMark Simos helps individuals and organizations meet cybersecurity, cloud, and digital transformation goals. Mark is the Lead Cybersecurity Architect for Microsoft where he leads the development of cybersecurity reference architectures, strategies, prescriptive planning roadmaps, best practices, and other security and Zero Trust guidance. Mark also co-chairs the Zero Trust working group at The Open Group and contributes to open standards and other publications like the Zero Trust Commandments. Mark has presented at numerous conferences including Black Hat, RSA Conference, Gartner Security & Risk Management, Microsoft Ignite and BlueHat, and Financial Executives International.Nikhil Kumar is Founder at ApTSi with prior leadership roles at Price Waterhouse and other firms. He has led setup and implementation of Digital Transformation and enterprise security initiatives (such as PCI Compliance) and built out Security Architectures. An Engineer and Computer Scientist with a passion for biology, Nikhil is an expert in Security, Information, and Computer Architecture. Known for communicating to the board and implementing with engineers and architects, he is an MIT mentor, innovator and pioneer. Nikhil has authored numerous books, standards, and articles, and presented at conferences globally. He co-chairs The Zero Trust Working Group, a global standards initiative led by the Open Group.

0
0
5

article-image-mastering-transfer-learning-fine-tuning-bert-and-vision-transformers

Sinan Ozdemir

27 Nov 2024

15 min read

Mastering Transfer Learning: Fine-Tuning BERT and Vision Transformers

Sinan Ozdemir

27 Nov 2024

15 min read

0
0
534

article-image-supabase-unleashed-advanced-features-for-typescript-frameworks-and-direct-database-connections

David Lorenz

26 Nov 2024

15 min read

Supabase Unleashed: Advanced Features for TypeScript, Frameworks, and Direct Database Connections

David Lorenz

26 Nov 2024

15 min read

This article is an excerpt from the book, "Building Production-Grade Web Applications with Supabase", by David Lorenz. Supabase supercharges web development with scalable backend solutions. With this book, you'll build secure, real-time apps of any size by leveraging Supabase's powerful Row Level Security and eliminating the need for separate backend development.IntroductionSupabase is a powerful platform that integrates Postgres databases with modern developer tools to simplify backend development. While the Supabase client is the recommended approach for interacting with its database, understanding how to establish a direct database connection can expand your options and offer greater flexibility. This section explores the scenarios in which direct access might be necessary, how to configure such connections, and their implications in different project setups. By mastering this complementary skill, you'll unlock additional possibilities for extending and optimizing your applications.Connecting directly to the databaseNote: Building a raw database connection is helpful but complementary knowledge. In this book’s project, we will use the Supabase client and not a direct database connection.At the end of the day, Supabase comes down to just being a Postgres database with additional services surrounding it like a galaxy. Hence, you can also directly access the database. But why would you ever want to do this?When you work with platforms such as Supabase that make your life easier by providing data storage, file storage, authentication, and more, you often don’t get direct access to the underlying database or your access is extremely limited. The reason is that providers of such platforms often want to safeguard you and themselves from scrapping the project in a way that will break it irrevocably.Having no or limited direct access to your database also means that you cannot extend it with additional features or use libraries of any kind that need direct access (such as sequelize, drizzle, or pg_dump). But with Supabase, you can. So, let’s have a look at how we can connect directly.On a supabase.com project, within the Dashboard (Studio) area, you’ll find the database connection URI of the Postgres database in the Project Settings | Database section. In your local instance, the complete connection URI is shown in the Terminal after running npx supabase start or, for a running instance, when calling npx supabase status. It already contains the username and password, separated with a colon (on your local instance, this is usually postgresql://postgres:postgres@localhost:54322/postgres).Then, you can connect to it with whichever tool you like – for example, via GUIs for databases such as DBeaver (https://dbeaver.io/).To test if the connection to the database works, I prefer the psql command-line tool. For my local instance, I can simply use one of the following commands:The most minimal way to test a connection is by calling plsql with the connection string in postgresql://username:password@host:port/postgres format, like so:psql 'postgresql://postgres:postgres@localhost:54322/postgres'For real connections (not just local ones), you should prefer a more verbose form that doesn’t keep the password in cleartext in the Terminal and prompts you for the password:psql -h localhost -p 54322 -d postgres -U postgresThis is equal to the longest form where the parameter meanings become self-explanatory:psql --host=localhost --port=54322 --dbname=postgres --username=postgresWith that, you know how to connect to the database if needed. Please be aware that connecting to the database directly and changing data there can be dangerous if you don’t know what you’re doing as there’s no protection layer in between.Next, you’ll learn what you need to do to get immediate TypeScript support with Supabase.Using Supabase with TypeScriptMany projects nowadays use TypeScript instead of JavaScript. In this book, we’ll focus on using Supabase with JavaScript instead of TypeScript. But still, I want to show you how easily it can be used in combination with the Supabase JavaScript clients, and which benefits it brings.Supabase’s npm library comes with TypeScript support out of the box. However, with TypeScript, Supabase can also tell you that the expected data from your database doesn’t exist or help you find the correct table name for your database via autocompletion in your editor.All you need for this is a specific TypeScript file that is generated specifically for your Supabase project. The following steps show how to trigger the Supabase CLI so that it creates such a supabase.ts file containing the needed types for TypeScript – depending on whether you want the types from a supabase.com project, a local instance, or an instance hosted somewhere else than supabase.com:If you want types for a project based on supabase.com, follow these steps to get a supabase.ts file:I. Go to https://supabase.com/dashboard/account/tokens and create an access token.II. Run npx supabase login. You’ll be asked for the access token you just generated.After pressing Enter, it will tell you that the login process has succeeded.III. Now, open your project via supabase.com; you’ll see a link in your browser that looks like https://supabase.com/dashboard/project/YOUR_PROJECT_ID/.... You’ll also find the same project ID as part of your API URL in the Settings | API section. Copy this project ID.IV. Generate your custom supabase.ts file by running npx supabase gen types typescript --schema public --project-id YOUR_PROJECT_ID > supabase.tsIf you’re running a local instance, which you should have by now, and want to grab the types from there, you don’t need an access key. You only need to run the following command in your project folder (this is where we ran npx supabase init previously in this chapter):npx supabase gen types typescript --schema public --local > supabase.tsNote that if you run it outside of the project folder, it won’t know which local instance you’re referring to and fail.If you have a n instance that’s self-hosted on a remote server or running with a provider other than supabase.com, then the previous steps won’t work and you’ll need the generalized variant of fetching types with a direct database connection. To do that, you must generate the supabase.ts file, as follows:I. Find your database URL (see the Connecting directly to the database section). For example, in your local instance, you’ll find it in the Terminal output after starting Supabase withnpx supabase start. It will be in the following format: postgresql:// USER:PASSWORD@DB_HOST:PORT/postgres.II. Run npx supabase gen types typescript --schema public --db-url postgresql://USER:PASSWORD @DB_HOST:PORT/postgres > supabase. ts. You’ll receive the file.With this supabase.ts file, it’s easy to make your client type-safe and get proper type hints – simply import the Database type from supabase.ts and pass it to the client creation process. For example, if you want to make the createReqResSupabase({req,res}) function type-safe, you just pass the <Database> type when creating the client:import type { Database } from './supabase'; export const getSupabaseReqResClient = ({ req, res }) => { return createServerClient<Database>(...); }; With that, your Supabase client is type-safe. But let’s understand what that means and what it implies. Say, for example, you’re fetching data from a specific table of your database: the Supabase client will exactly know which columns to fetch and provide proper type support for the returned data.But what happens when I change anything in my instance? Won’t it be outdated immediately as my supabase.ts fi le contains outdated types?Let me try to answer this question with another question: How can you use a new feature on your smartphone if the new feature is only available in a newer software version? The simple answer is that you update the software version.The same goes for the Supabase types. Anytime you change something in your Supabase project and it doesn’t give you the proper TypeScript hints, run npx supabase gen types typescript ... again and you’ll be all set.With this, you can use Supabase in a TypeScript-based project. Before finishing up this chapter, we’ll have a look at some samples of how a Supabase client can be used with other frameworks so that you’re familiar with Supabase’s flexibility.Connecting Supabase to other frameworksImagine that you’ve set up an awesome project with Next.js and Supabase. However, one day, you want to add another feature to your project – an extremely fast API that does complex calculations based on data from your Supabase instance. You notice that JavaScript won’t be the best choice and decide to build a small Python server for this feature that can be called from your primary project.This is what I did in one of my projects at Wahnsinn Design GmbH where the web application, with Supabase at its heart, was built with Next.js. However, a new feature was added using another project with Python. Since there is a Python library for Supabase, the connection was seamless.Since Supabase is not framework-dependent, since it’s just REST APIs, the options for integrations are endless, from C#, Swift, and Kotlin, to JavaScript-based frameworks such as Nuxt or refi ne (you’ll find the most recent list at https://supabase.com/docs).Although we will focus on JavaScript with Next.js in this book, you can use most samples, especially in the upcoming chapters, and translate them into other languages or frameworks with ease. This is because using the Supabase client for the different languages will have similar syntax (as far as the language allows).Let’s have a brief look at how to connect Supabase in Nuxt and Python.Nuxt 3Nuxt is the Vue-based full-stack competitor to Next.js. Connecting with Nuxt comes down to installing the @nuxtjs/supabase package – which, again, is just a convenient wrapper for the @supabase/ supabase-js package.Once installed with npm install @nuxtjs/supabase, add the module to your Nuxt configuration, like so:export default defineNuxtConfig({ modules: ['@nuxtjs/supabase'], })Similar to our Next.js application, add the anon key as SUPABASE_KEY and your API URL as SUPABASE_URL to the .env file of your Nuxt project.Now, you can use the client in Vue composables, like so:<script setup lang="ts"> const supabase = useSupabaseClient(); </script>Alternatively, you can use proper TypeScript types, as we’ve already learned, like so:<script setup lang="ts"> import type { Database } from '~/supabase'; const client = useSupabaseClient<Database>(); </script>You can find a detailed explanation of Nuxt 3 at https://supabase.nuxtjs.org/get-started.PythonPython is fast and has become more popular than ever with many AI applications. This is because it is convenient to use for scientific calculations.The Python Supabase package is one of the easiest to use:1. Install the Supabase package and the dotenv package with pip install supabase and pip install python-dotenv, respectively.2. Create a .env file with two lines, one being your SUPABASE_ANON_KEY=... value and the other being your SUPABASE_URL=... value.3. Initialize the Supabase client in a file such as supabase_client.py, as follows:import os from dotenv import load_dotenv from supabase import create_client, Client load_dotenv() supabase_url: str = os.getenv("SUPABASE_URL") supabase_anon_key: str = os.getenv("SUPABASE_ANON_KEY") my_supabase: Client = create_client(supabase_url, supabase_anon_ key) 4. Use it in any file via import:from supabase_client import my_supabase ...You can find the full Python documentation here: https://supabase.com/docs/reference/ python.I’d be lying if I said all frameworks and languages are equal concerning updates and support within the Supabase community. On the web, there is a general trend toward JavaScript-based environments (Vue, Next, React, Nuxt, Remix, Svelte, Deno, you name it) and at the time of writing this book, several client libraries exist, including JavaScript, Flutter, Python, C#, Swift, and Kotlin.However, it is extremely important to keep in mind that Supabase can be used in any framework or language due to its REST-based nature and that Supabase is also very keen on contributions. Lastly, you can always just use the direct database connection – but with that, you’d be bypassing all authentication and permissions.With this at hand, you are well-positioned to tackle any project with Supabase, no matter if you are using a framework-specific client, the RESTful API, or the direct database connection.ConclusionIn this article, we explored the fundamentals of connecting directly to a Supabase database and the practical use cases it enables. While the Supabase client provides a robust and secure interface, direct access empowers you to extend functionality, integrate with various libraries, and handle advanced operations. We also discussed integrating Supabase with TypeScript and other frameworks like Nuxt and Python, demonstrating its versatility across languages and ecosystems. With these tools and insights, you're equipped to harness Supabase's full potential, whether working within its client or venturing into direct database interactions.Author BioDavid Lorenz is a web software architect and lecturer who began programming at age 11. Before completing university in 2014, he had built a CRM system that automated an entire company and worked with numerous agencies through his own company. In 2015, he secured his first employment as a senior web developer, where he played a pioneering role in using cutting-edge technology and was an early adopter of progressive web apps. In 2017, he became the leading frontend architect and team lead for one of the largest projects at Mercedes-Benz.io, involving massive-scale architecture. Today, David provides valuable insights and guidance to clients across various industries, using his extensive experience and exceptional problem-solving abilities.

0
0
1202

article-image-how-to-integrate-ai-into-software-development-teams

Anderson Soares Furtado Oliveira

21 Nov 2024

15 min read

How to Integrate AI into Software Development Teams

Anderson Soares Furtado Oliveira

21 Nov 2024

15 min read

This article is an excerpt from the book, "AI Strategies for Web Development", by Anderson Soares Furtado Oliveira. Embark on an enlightening AI journey by understanding its role and its fundamentals, crafting cutting-edge applications, and navigating ethical challenges. You’ll also explore strategic tools and gain foresight into future trends.IntroductionIntegrating AI into software development teams is no longer a futuristic concept; it is a strategic necessity in today's digital era. AI has the potential to revolutionize software development by optimizing processes, solving complex problems, improving user experience, and driving business value. However, harnessing the power of AI requires more than just adopting new tools—it demands a shift in mindset, processes, skills, and team culture. In this article, we explore actionable strategies for software engineering leaders to successfully integrate AI into their teams, drawing from Gartner’s recommendations and industry best practices. From fostering collaboration and upskilling teams to implementing data pipelines and AI solutions, these steps will help organizations fully leverage AI's transformative potential.How to integrate AI into software development teamsAI is a technology that can transform the way we create and use software applications. It can help us solve complex problems, optimize processes, improve UX, and generate value for businesses. However, for us to fully leverage the potential of AI, it needs to be effectively integrated into software development teams. In this section, we will present some actions that software engineering leaders should consider so that they can achieve this goal, based on Gartner’s recommendations (https://www.gartner. com/en/articles/set-up-now-for-ai-to-augment-software-development).Let’s start:Adopt an AI mindset from the start: The first action is to adopt an AI mindset from the start of the project, encouraging the exploration of AI techniques to improve application development. This means that developers should be open to learning about the possibilities and challenges of AI and seek innovative solutions that use this technology. In addition, leaders should set clear and measurable goals for the use of AI and align expectations with project stakeholders. So, encourage teams to explore AI by initiating projects that directly involve AI technologies. For instance, a development team could be tasked with creating a chatbot to streamline customer service interactions, encouraging them to learn and apply NLP techniques.Provide a framework to identify AI opportunities: The second action is to provide a framework to identify when and where AI can yield better results. This involves analyzing the needs and requirements of the project, and assessing whether AI can offer benefits in terms of quality, efficiency, scalability, security, or other aspects. It is also important to consider the costs and risks associated with implementing AI and compare them with available alternatives. The framework should guide developers in choosing the most suitable AI techniques for each case, such as ML, NLP, and computer vision. Develop a decision matrix to help identify opportunities for AI integration that can enhance project outcomes. This matrix could evaluate factors such as potential improvements in efficiency and quality against the costs and complexity of implementing AI solutions, helping to pinpoint where tools such as ML could be most beneficial.Invest in dedicated AI solutions: The third action is to invest in dedicated AI solutions to support various roles and tasks in software engineering. These solutions can be tools, platforms, services, or libraries that use AI to facilitate or automate activities such as design, coding, testing, debugging, integration, deployment, and monitoring. These solutions can increase the productivity, quality, and creativity of developers, as well as reduce errors and rework. Some examples of AI solutions for software engineering are intelligent assistants, code generators, code analyzers, and automatic testers. For example, implementing platforms such as TensorFlow or PyTorch for ML projects can aid in tasks ranging from predictive analytics to automated testing, thus boosting productivity and reducing the likelihood of errors.Expand the data engineering pipeline: The fourth action is to expand the data engineering pipeline to leverage AI enrichment and enable intelligent applications. Th is means that developers should collect, store, process, analyze, and visualize data efficiently and securely, using AI to extract insights and value from data. In addition, developers should integrate the data with AI models, and use these models to provide intelligent features to applications, such as recommendations, customizations, predictions, and detections. Intelligent applications can improve performance, usability, and end-user satisfaction. By integrating comprehensive data management tools such as Apache Kafka for real-time data streaming and processing, teams can enhance their applications with features such as real-time analytics and dynamic UX customization.Foster collaboration between development and model-building teams: The fifth action is to foster collaboration between development teams and model-building teams to avoid overlapping responsibilities and ensure smooth deployment. This involves creating a culture of collaboration and communication, where both teams understand their roles and responsibilities, and work together to implement AI solutions. This can help avoid conflicts, reduce delays, and ensure that the AI models are correctly integrated into the soft ware applications. Establish regular sync-up meetings between software developers and AI model builders to ensure alignment and seamless integration of AI capabilities into applications. These meetings can help clarify responsibilities, share insights, and quicken the pace of development.Continuously train and upskill the team: The sixth action is to continuously train and upskill the team in AI technologies. This involves providing regular training sessions, workshops, and resources to help developers learn about the latest AI techniques and tools. It also involves creating a learning culture, where developers are encouraged to learn and share their knowledge with others. This can help to build a team of skilled AI practitioners, who can effectively use AI to improve software development. Create ongoing educational programs and provide access to courses from platforms such as Coursera or Udemy that cover advanced AI topics. Encouraging participation in hackathons or internal projects focused on AI can also foster practical experience and innovation.Effectively integrating AI into software development teams is a complex task that requires a strategic and diligent approach. It’s not just about adopting new tools or technologies but transforming the mindset, processes, skills, and culture of the team. To navigate this transformation successfully, a structured checklist can serve as a valuable guide, ensuring that every critical aspect is addressed systematically:1. Assessment and planning:Identify objectives: Define clear objectives for integrating AI into your development processes. Determine what problems you aim to solve or what improvements you want to achieve.Evaluate readiness: Assess your team’s current capabilities, infrastructure, and tools to determine readiness for AI integration. Stakeholder alignment: Ensure all stakeholders understand the benefits and implications of AI integration. Secure their support and alignment with the project goals.2. Data collection and management: Identify data sources: Determine the types of data that will be valuable for AI-driven insights (e.g., source code data, user interaction data, performance data). Set up data pipelines: Implement data pipelines using tools such as Apache Kafka for real-time data collection and streaming. Ensure data quality: Establish processes for data cleaning, normalization, and validation to maintain high data quality.3. Infrastructure and tools:Select AI tools: Choose appropriate AI-powered tools for different stages of the development process, such as GitHub Copilot for code generation, Testim for automated testing, and Dynatrace for performance monitoring.Scalable storage solutions: Implement scalable storage solutions such as Amazon S3 or Google Cloud Storage to handle large volumes of data.Processing frameworks: Utilize data processing frameworks such as Apache Spark or Flink for efficient data processing.4. Model development and integration:Build AI models: Use ML frameworks such as TensorFlow, PyTorch, and scikit-learn to develop AI models that can analyze data and generate insights.Integrate AI models: Integrate AI models into your development environment to provide intelligent features such as code suggestions, anomaly detection, and predictive analytics.5. Testing and validation:Automated testing tools: Implement AI-powered automated testing tools such as Testim to create and maintain test cases, ensuring the software remains robust and error-free.Continuous integration: Set up continuous integration (CI) pipelines to automatically run tests and validate code changes.Performance monitoring: Use tools such as New Relic AI and Dynatrace to monitor application performance and detect issues in real-time.6. Security and compliance:Vulnerability scanning: Use AI-powered security tools such as Snyk and Veracode to identify and fix vulnerabilities in the code. Compliance checks: Ensure that AI models and data processing adhere to relevant regulations and standards, such as General Data Protection Regulation (GDPR).7. Deployment and maintenance:Automated deployment: Set up automated deployment pipelines to streamline the release process.Real-time monitoring: Continuously monitor the application in production using tools such as Amazon CloudWatch and Splunk for anomaly detection.Feedback loop: Establish a feedback loop to collect user feedback and performance data, using this information to continuously improve the AI models and development processes.By following these actions, software engineering leaders can effectively integrate AI into their teams and leverage its potential to create innovative, high-quality, and intelligent software applications. This can lead to significant improvements in productivity, quality, creativity, and user satisfaction, as well as provide a competitive edge in today’s increasingly digital and data-driven market.However, it’s important to remember that AI is just a tool that can help solve problems and generate value. The ultimate success of the project depends on the team’s ability to understand user needs, create effective and innovative solutions, and deliver high-quality software. Therefore, AI should be integrated in a way that supports and enhances these goals, rather than replacing them.ConclusionIntegrating AI into software development teams is a multifaceted process that goes beyond adopting cutting-edge tools. It involves fostering a culture of collaboration, continuous learning, and innovation, as well as ensuring robust data management, security, and compliance frameworks. By following a structured approach—starting with clear objectives and readiness assessments, implementing advanced tools and frameworks, and maintaining continuous validation and feedback loops—software engineering leaders can unlock AI's full potential. This integration will not only enhance productivity and quality but also empower teams to create intelligent, high-performing applications that meet user needs and provide a competitive edge. Ultimately, AI should be a powerful enabler, complementing human creativity and expertise to deliver software solutions that truly excel.Author BioAnderson Soares Furtado Oliveira is an experienced executive, AI strategist, and machine learning engineer specializing in AI governance, risk management, and compliance. As a board member at The Global Center for Risk and Innovation (GCRI) and an AI strategy consultant at G³ AI Global, he co-authored the book PgM Canvas: Transforming Vision into Real Benefits - A Program Management Guide for Leaders and Managers. With over a decade of experience in IT governance (CGEIT) and a focus on integrating AI technologies to drive business growth, he has led numerous AI projects and developed AI governance frameworks. His expertise in digital transformation and national development has equipped him to create innovative solutions and ethical AI applications. Anderson is a PhD student in Computer Science and Computational Mathematics at the University of São Paulo and holds an MBA in Software Engineering Project Management.

0
0
1034

article-image-airflow-ops-best-practices-observation-and-monitoring

Dylan Intorf, Kendrick van Doorn, Dylan Storey

12 Nov 2024

15 min read

Airflow Ops Best Practices: Observation and Monitoring

Dylan Intorf, Kendrick van Doorn, Dylan Storey

12 Nov 2024

15 min read

This article is an excerpt from the book, "Apache Airflow Best Practices", by Dylan Intorf, Kendrick van Doorn, Dylan Storey. With practical approach and detailed examples, this book covers newest features of Apache Airflow 2.x and it's potential for workflow orchestration, operational best practices, and data engineering.IntroductionIn this article, we will continue to explore the application of modern “ops” practices within Apache Airflow, focusing on the observation and monitoring of your systems and DAGs after they’ve been deployed.We’ll divide this observation into two segments – the core Airflow system and individual DAGs. Each segment will cover specific metrics and measurements you should be monitoring for alerting and potential intervention.When we discuss monitoring in this section, we will consider two types of monitoring – active and suppressive.In an active monitoring scenario, a process will actively check a service’s health state, recording its state and potentially taking action directly on the return value.In a suppressive monitoring scenario, the absence of a state (or state change) is usually meaningful. In these scenarios, the monitored application sends an active schedule to a process to inform it that it is OK, usually suppressing an action (such as an alert) from occurring.This chapter covers the following topics:Monitoring core Airflow componentsMonitoring your DAGsTechnical requirementsBy now, we expect you to have a good understanding of Airflow and its core components, along with functional knowledge in the deployment and operation of Airflow and Airflow DAGs.We will not be covering specific observability aggregators or telemetry tools; instead, we will focus on the activities you should be keeping an eye on. We strongly recommend that you work closely with your ops teams to understand what tools exist in your stack and how to configure them for capture and alerting your deployments.Monitoring core Airflow componentsAll of the components we will discuss here are critical to ensuring a functioning Airflow deployment. Generally, all of them should be monitored with a bare minimum check of Is it on? and if a component is not, an alert should surface to your team for investigation. The easiest way to check this is to query the REST API on the web server at `/health/`; this will return a JSON object that can be parsed to determine whether components are healthy and, if not, when they were last seen.SchedulerThis component needs to be running and working effectively in order for tasks to be scheduled for execution.When the scheduler service is started, it also starts a `/health` endpoint that can be checked by an external process with an active monitoring approach.The returned signal does not always indicate that the scheduler is working properly, as its state is simply indicative that the service is up and running. There are many scenarios where the scheduler may be operating but unable to schedule jobs; as a result, many deployments will include a canary dag to their deployment that has a single task, acting to suppress an external alert from going off.Import metrics that airflow exposes for you include the following:scheduler.scheduler_loop_duration: This should be monitored to ensure that your scheduler is able to loop and schedule tasks for execution. As this metric increases, you will see tasks beginning to schedule more slowly, to the point where you may begin missing SLAs because tasks fail to reach a schedulable state.scheduler.tasks.starving: This indicates how many tasks cannot be scheduled because there are no slots available. Pools are a mechanism that Airflow uses to balance large numbers of submitted task executions versus a finite amount of execution throughput. It is likely that this number will not be zero, but being high for extended periods of time may point to an issue in how DAGs are being written to schedule work.scheduler.tasks.executable: This indicates how many tasks are ready for execution (i.e., queued). This number will sometimes not be zero, and that is OK, but if the number increases and stays high for extended periods of time, it indicates that you may need additional computer resources to handle the load. Look at your executor to increase the number of workers it can run. Metadata databaseThe metadata database is used to store and track all of the metadata for your Airflow deployments’ previous DAG/task executions, along with information about your environment’s roles and permissions. Losing data from this database can interrupt normal operations and cause unintended consequences, with DAG runs being repeated.While critical, because it is architecturally ubiquitous, the database is also least likely to encounter issues, and if it does, they are absolutely catastrophic in nature.We generally suggest you utilize a managed service for provisioning and operating your backing database, ensuring that a disaster recovery plan for your metadata database is in place at all times.Some active areas to monitor on your database include the following:Connection pool size/usage: Monitor both the connection pool size and usage over time to ensure appropriate configuration, and identify potential bottlenecks or resource contention arising from Airflow components’ concurrent connections.Query performance: Measure query latency to detect inefficient queries or performance issues, while monitoring query throughput to ensure effective workload handling by the database.Storage metrics: Monitor the disk space utilization of the metadata database to ensure that it has sufficient storage capacity. Set up alerts for low disk space conditions to prevent database outages due to storage constraints.Backup status: Monitor the status of database backups to ensure that they are performed regularly and successfully. Verify backup integrity and retention policies to mitigate the risk of data loss if there is a database failure.TriggererThe Triggerer instance manages all of the asynchronous operations of deferrable operators in a deferred state. As such, major operational concerns generally relate to ensuring that individual deferred operators don’t cause major blocking calls to the event loop. If this occurs, your deferrable tasks will not be able to check their state changes as frequently, and this will impact scheduling performance.Import metrics that airflow exposes for you include the following:triggers.blocked_main_thread: The number of triggers that have blocked the main thread. This is a counter and should monotonically increase over time; pay attention to large differences between recording (or quick acceleration) counts, as it’s indicative of a larger problem.triggers.running: The number of triggers currently on a triggerer instance. This metric should be monitored to determine whether you need to increase the number of triggerer instances you are running. While the official documentation claims that up to tens of thousands of triggers can be on an instance, the common operational number is much lower. Tune at your discretion, but depending on the complexity of your triggers, you may need to add a new instance for every few hundred consistent triggers you run.Executors/workersDepending on the executor you use, you will need to monitor your executors and workers a bit differently.The Kubernetes executor will utilize the Kubernetes API to schedule tasks for execution; as such, you should utilize the Kubernetes events and metrics servers to gather logs and metrics for your task instances. Common metrics to collect on an individual task are CPU and memory usage. This is crucial for tuning requests or mutating individual task resource requests to ensure that they execute safely.The Celery worker has additional components and long-lived processes that you need to metricize. You should monitor an individual Celery worker’s memory and CPU utilization to ensure that it is not over- or under-provisioned, tuning allocated resources accordingly. You also need to monitor the message broker (usually Redis or RabbitMQ) to ensure that it is appropriately sized. Finally, it is critical to measure the queue length of your message broker and ensure that too much “back pressure” isn’t being created in the system. If you find that your tasks are sitting in a queued state for a long period of time and the queue length is consistently growing, it’s a sign that you should start an additional Celery worker to execute on scheduled tasks. You should also investigate using the native Celery monitoring tool Flower (https://flower.readthedocs.io/en/latest/) for additional, more nuanced methods of monitoring.Web serverThe Airflow web server is the UI for not just your Airflow deployment but also the RESTful interface. Especially if you happen to be controlling Airflow scheduling behavior with API calls, you should keep an eye on the following metrics:Response time: Measure the time taken for the API to respond to requests. This metric indicates the overall performance of the API and can help identify potential bottlenecks.Error rate: Monitor the rate of errors returned by the API, such as 4xx and 5xx HTTP status codes. High error rates may indicate issues with the API implementation or underlying systems.Request rate: Track the rate of incoming requests to the API over time. Sudden spikes or drops in request rates can impact performance and indicate changes in usage patterns.System resource utilization: Monitor resource utilization metrics such as CPU, memory, disk I/O, and network bandwidth on the servers hosting the API. High resource utilization can indicate potential performance bottlenecks or capacity limits.Throughput: Measure the number of successful requests processed by the API per unit of time. Throughput metrics provide insights into the API’s capacity to handle incoming traffic.Now that you have some basic metrics to collect from your core architectural components and can monitor the overall health of an application, we need to monitor the actual DAGs themselves to ensure that they function as intended.Monitoring your DAGsThere are multiple aspects to monitoring your DAGs, and while they’re all valuable, they may not all be necessary. Take care to ensure that your monitoring and alerting stack match your organizational needs with regard to operational parameters for resiliency and, if there is a failure, recovery times. No matter how much or how little you choose to implement, knowing that your DAGs work and if and how they fail is the first step in fixing problems that will arise.LoggingAirflow writes logs for tasks in a hierarchical structure that allows you to see each task’s logs in the Airflow UI. The community also provides a number of providers to utilize other services for backing log storage and retrieval. A complete list of supported providers is available at https://airflow.apache.org/docs/apache-airflow-providers/core-extensions/logging.html.Airflow uses the standard Python logging framework to write logs. If you’re writing custom operators or executing Python functions with a PythonOperator, just make sure that you instantiate a Python logger instance, and then the associated methods will handle everything for you.AlertingAirflow provides mechanisms for alerting on operational aspects of your executing workloads that can be configured within your DAG:Email notifications: Email notifications can be sent if a task is put into a marked or retry state with the `email_on_failure` or `email_on_retry` state, respectively. These arguments can be provided to all tasks in the DAG with the `default_args` key work in the DAG, or individual tasks by setting the keyword argument individually.Callbacks: Callbacks are special actions that are executed if a specific state change occurs. Generally, these callbacks should be thoughtfully leveraged to send alerts that are critical operationally:on_success_callback: This callback will be executed at both the task and DAG levels when entering a successful state. Unless it is critical that you know whether something succeeds, we generally suggest not using this for alerting.on_failure_callback: This callback is invoked when a task enters a failed state. Generally, this callback should always be set and, in critical scenarios, alert on failures that require intervention and support.on_execute_callback: This is invoked right before a task executes and only exists at the task level. Use sparingly for alerting, as it can quickly become a noisy alert when overused.on_retry_callback: This is invoked when a task is placed in a retry state. This is another callback to be cautious about as an alert, as it can become noisy and cause false alarms.sla_miss_callback: This is invoked when a DAG misses its defined SLA. This callback is only executed at the end of a DAG’s execution cycle so tends to be a very reactive notification that something has gone wrong.SLA monitoringAs awesome of a tool as Airflow is, it is a well-known fact in the community that SLAs, while largely functional, have some unfortunate details with regard to implementation that can make them problematic at best, and they are generally regarded as a broken feature in Airflow. We suggest that if you require SLA monitoring on your workflows, you deploy a CRON job monitoring tool such as healthchecks (https://github.com/healthchecks/healthchecks) that allows you to create suppressive alerts for your services through its rest API to manage SLAs. By pairing this third- party service with either HTTP operators or simple requests from callbacks, you can ensure that your most critical workflows achieve dynamic and resilient SLA alerting.Performance profilingThe Airflow UI is a great tool for profiling the performance of individual DAGs:The Gannt chart view: This is a great visualization for understanding the amount of time spent on individual tasks and the relative order of execution. If you’re worried about bottlenecks in your workflow, start here.Task duration: This allows you to profile the run characteristics of tasks within your DAG over a historical period. This tool is great at helping you understand temporal patterns in execution time and finding outliers in execution. Especially if you find that a DAG slows down over time, this view can help you understand whether it is a systemic issue and which tasks might need additional development.Landing times: This shows the delta between task completion and the start of the DAG run. This is an un-intuitive but powerful metric, as increases in it, when paired with stable task durations in upstream tasks, can help identify whether a scheduler is under heavy load and may need tuning.Additional metrics that have proven to be useful (but may need to be calculated) include the following:Task startup time: This is an especially useful metric when operating with a Kubernetes executor. To calculate this, you will need to calculate the difference between `start_date` and `execution_date` on each task instance. This metric will especially help you identify bottlenecks outside of Airflow that may impact task run times.Task failure and retry counts: Monitoring the frequency of task failures and retries can help identify information about the stability and robustness of your environment. Especially if these types of failure can be linked back to patterns in time or execution, it can help debug interactions with other services.DAG parsing time: Monitoring the amount of time a DAG takes to parse is very important to understand scheduler load and bottlenecks. If an individual DAG takes a long time to load (either due to heavy imports or long blocking calls being executed during parsing), it can have a material impact on the timeliness of scheduling tasks.ConclusionIn this article, we covered some essential strategies to effectively monitor both the core Airflow system and individual DAGs post-deployment. We highlighted the importance of active and suppressive monitoring techniques and provided insights into the critical metrics to track for each component, including the scheduler, metadata database, triggerer, executors/workers, and web server. Additionally, we discussed logging, alerting mechanisms, SLA monitoring, and performance profiling techniques to ensure the reliability, scalability, and efficiency of Airflow workflows. By implementing these monitoring practices and leveraging the insights gained, operators can proactively manage and optimize their Airflow deployments for optimal performance and reliability.Author BioDylan Intorf is a solutions architect and data engineer with a BS from Arizona State University in Computer Science. He has 10+ years of experience in the software and data engineering space, delivering custom tailored solutions to Tech, Financial, and Insurance industries.Kendrick van Doorn is an engineering and business leader with a background in software development, with over 10 years of developing tech and data strategies at Fortune 100 companies. In his spare time, he enjoys taking classes at different universities and is currently an MBA candidate at Columbia University.Dylan Storey has a B.Sc. and M.Sc. from California State University, Fresno in Biology and a Ph.D. from University of Tennessee, Knoxville in Life Sciences where he leveraged computational methods to study a variety of biological systems. He has over 15 years of experience in building, growing, and leading teams; solving problems in developing and operating data products at a variety of scales and industries.

2
0
1824

article-image-mastering-threat-detection-with-virustotal-a-guide-for-soc-analysts

Mostafa Yahia

11 Nov 2024

15 min read

Mastering Threat Detection with VirusTotal: A Guide for SOC Analysts

Mostafa Yahia

11 Nov 2024

15 min read

0
0
1973

article-image-mastering-promql-a-comprehensive-guide-to-prometheus-query-language

Rob Chapman, Peter Holmes

07 Nov 2024

15 min read

Mastering PromQL: A Comprehensive Guide to Prometheus Query Language

Rob Chapman, Peter Holmes

07 Nov 2024

15 min read

This article is an excerpt from the book, "Observability with Grafana", by Rob Chapman, Peter Holmes. This book provides a holistic understanding of observability concepts using the Grafana Labs tools, teaching you how to fully leverage the LGTM stack.Introduction PromQL, or Prometheus Query Language, is a powerful tool designed to work with Prometheus, an open-source systems monitoring and alerting toolkit. Initially developed by SoundCloud in 2012 and later accepted by the Cloud Native Computing Foundation in 2016, Prometheus has become a crucial component of modern infrastructure monitoring. PromQL allows users to query data stored in Prometheus, enabling the creation of insightful dashboards and setting up alerts based on the performance metrics of applications and systems. This article will explore the core functionalities of PromQL, including how it interacts with metrics data and how it can be used to effectively monitor and analyze system performance. Introducing PromQL Prometheus was initially developed by SoundCloud in 2012; the project was accepted by the Cloud Native Computing Foundation in 2016 as the second incubated project (after Kubernetes), and version 1.0 was released shortly after. PromQL is an integral part of Prometheus, which is used to query stored data and produce dashboards and alerts. Before we delve into the details of the language, let’s briefly look at the following ways in which Prometheus-compatible systems interact with metrics data: Ingesting metrics: Prometheus-compatible systems accept a timestamp, key-value labels, and a sample value. As the details of the Prometheus Time Series Database (TSDB) are quite complicated, the following diagram shows a simplified example of how an individual sample for a metric is stored once it has been ingested: Figure 5.1 – A simplified view of metric data stored in the TSDB The labels or dimensions of a metric: Prometheus labels provide metadata to identify data of interest. These labels create metrics, time series, and samples: * Each unique __name__ value creates a metric. In the preceding figure, the metric is app_ frontend_requests. * Each unique set of labels creates a time series. In the preceding figure, the set of all labels is the time series. * A time series will contain multiple samples, each with a unique timestamp. The preceding figure shows a single sample, but over time, multiple samples will be collected for each time series. * The number of unique values for a metric label is referred to as the cardinality of the l abel. Highly cardinal labels should be avoided, as they signifi cantly increase the storage costs of the metric. The following diagram shows a single metric containing two time series and five samples: Figure 5.2 – An example of samples from multiple time series In Grafana, we can see a representation of the time series and samples from a metric. To do this, follow these steps: 1. In your Grafana instance, select Explore in the menu. 2. Choose your Prometheus data source, which will be labeled as grafanacloud-<team>prom (default). 3. In the Metric dropdown, choose app_frontend_requests_total, and under Options, set Format to Table, and then click on Run query. Th is will show you all the samples and time series in the metric over the selected time range. You should see data like this: Figure 5.3 – Visualizing the samples and time series that make up a metric Now that we understand the data structure, let’s explore PromQL. An overview of PromQL features In this section, we will take you through the features that PromQL has. We will start with an explanation of the data types, and then we will look at how to select data, how to work on multiple datasets, and how to use functions. As PromQL is a query language, it’s important to know how to manipulate data to produce alerts and dashboards. Data types PromQL offers three data types, which are important, as the functions and operators in PromQL will work diff erently depending on the data types presented: Instant vectors are a data type that stores a set of time series containing a single sample, all sharing the same timestamp – that is, it presents values at a specifi c instant in time: Figure 5.4 – An instant vector Range vectors store a set of time series, each containing a range of samples with different timestamps: Figure 5.5 – Range vectors Scalars are simple numeric values, with no labels or timestamps involved. Selecting data PromQL offers several tools for you to select data to show in a dashboard or a list, or just to understand a system’s state. Some of these are described in the following table: Table 5.1 – The selection operators available in PromQL In addition to the operators that allow us to select data, PromQL offers a selection of operators to compare multiple sets of data. Operators between two datasets Some data is easily provided by a single metric, while other useful information needs to be created from multiple metrics. The following operators allow you to combine datasets. Table 5.2 – The comparison operators available in PromQL Vector matching is an initially confusing topic; to clarify it, let’s consider examples for the three cases of vector matching – one-to-one, one-to-many/many-to-one, and many-to-many. By default, when combining vectors, all label names and values are matched. This means that for each element of the vector, the operator will try to find a single matching element from the second vector. Let’s consider a simple example: Vector A: 10{color=blue,smell=ocean} 31{color=red,smell=cinnamon} 27{color=green,smell=grass} Vector B: 19{color=blue,smell=ocean} 8{color=red,smell=cinnamon} 14{color=green,smell=jungle} A{} + B{}: 29{color=blue,smell=ocean} 39 {color=red,smell=cinnamon} A{} + on (color) B{} or A{} + ignoring (smell) B{}: 29{color=blue} 39{color=red} 41{color=green} When color=blue and smell=ocean, A{} + B{} gives 10 + 19 = 29, and when color=red and smell=cinnamon, A{} + B{} gives 31 + 8 = 29. The other elements do not match the two vectors so are ignored. When we sum the vectors using on (color), we will only match on the color label; so now, the two green elements match and are summed. This example works when there is a one-to-one relationship of labels between vector A and vector B. However, sometimes there may be a many-to-one or one-to-many relationship – that is, vector A or vector B may have more than one element that matches the other vector. In these cases, Prometheus will give an error, and grouping syntax must be used. Let’s look at another example to illustrate this: Vector A: 7{color=blue,smell=ocean} 5{color=red,smell=cinamon} 2{color=blue,smell=powder} Vector B: 20{color=blue,smell=ocean} 8{color=red,smell=cinamon} 14{color=green,smell=jungle} A{} + on (color) group_left B{}: 27{color=blue,smell=ocean} 13{color=red,smell=cinamon} 22{color=blue,smell=powder} Now, we have two different elements in vector A with color=blue. The group_left command will use the labels from vector A but only match on color. This leads to the third element of the combined vector having a value of 22, when the item matching in vector B has a different smell. The group_right operator will behave in the opposite direction. The final option is a many-to-many vector match. These matches use the logical operators and, unless, and or to combine parts of vectors A and B. Let’s see some examples: Vector A: 10{color=blue,smell=ocean} 31{color=red,smell=cinamon} 27{color=green,smell=grass} Vector B: 19{color=blue,smell=ocean} 8{color=red,smell=cinamon} 14{color=green,smell=jungle} A{} and B{}: 10{color=blue,smell=ocean} 31{color=red,smell=cinamon} A{} unless B{}: 27{color=green,smell=grass} A{} or B{}: 10{color=blue,smell=ocean} 31{color=red,smell=cinamon} 27{color=green,smell=grass} 14{color=green,smell=jungle} Unlike the previous examples, mathematical operators are not being used here, so the values of the elements are the values from vector A, but only the elements of A that match the logical condition in B are returned. ConclusionPromQL is an essential component of Prometheus, offering users a flexible and powerful means of querying and analyzing time-series data. By understanding its data types and operators, users can craft complex queries that provide deep insights into system performance. The language supports a variety of data selection and comparison operations, allowing for precise monitoring and alerting. Whether working with instant vectors, range vectors, or scalars, PromQL enables developers and operators to optimize their use of Prometheus for monitoring and alerting, ensuring systems remain performant and reliable. As organizations continue to embrace cloud-native architectures, mastering PromQL becomes increasingly vital for maintaining robust and efficient systems. Author BioRob Chapman is a creative IT engineer and founder at The Melt Cafe, with two decades of experience in the full application life cycle. Working over the years for companies such as the Environment Agency, BT Global Services, Microsoft, and Grafana, Rob has built a wealth of experience on large complex systems. More than anything, Rob loves saving energy, time, and money and has a track record for bringing production-related concerns forward so that they are addressed earlier in the development cycle, when they are cheaper and easier to solve. In his spare time, Rob is a Scout leader, and he enjoys hiking, climbing, and, most of all, spending time with his family and six children.Peter Holmes is a senior engineer with a deep interest in digital systems and how to use them to solve problems. With over 16 years of experience, he has worked in various roles in operations. Working at organizations such as Boots UK, Fujitsu Services, Anaplan, Thomson Reuters, and the NHS, he has experience in complex transformational projects, site reliability engineering, platform engineering, and leadership. Peter has a history of taking time to understand the customer and ensuring Day-2+ operations are as smooth and cost-effective as possible.

0
0
1212

article-image-mastering-the-api-life-cycle-a-comprehensive-guide-to-design-implementation-release-and-maintenance

Bruno Pedro

06 Nov 2024

15 min read

Mastering the API Life Cycle: A Comprehensive Guide to Design, Implementation, Release, and Maintenance

Bruno Pedro

06 Nov 2024

15 min read

0
0
907

article-image-automating-ocr-and-translation-with-google-cloud-functions-a-step-by-step-guide

Agnieszka Koziorowska, Wojciech Marusiak

05 Nov 2024

15 min read

Automating OCR and Translation with Google Cloud Functions: A Step-by-Step Guide

Agnieszka Koziorowska, Wojciech Marusiak

05 Nov 2024

15 min read

This article is an excerpt from the book, "Google Cloud Associate Cloud Engineer Certification and Implementation Guide", by Agnieszka Koziorowska, Wojciech Marusiak. This book serves as a guide for students preparing for ACE certification, offering invaluable practical knowledge and hands-on experience in implementing various Google Cloud Platform services. By actively engaging with the content, you’ll gain the confidence and expertise needed to excel in your certification journey.Introduction In this article, we will walk you through an example of implementing Google Cloud Functions for optical character recognition (OCR) on Google Cloud Platform. This tutorial will demonstrate how to automate the process of extracting text from an image, translating the text, and storing the results using Cloud Functions, Pub/Sub, and Cloud Storage. By leveraging Google Cloud Vision and Translation APIs, we can create a workflow that efficiently handles image processing and text translation. The article provides detailed steps to set up and deploy Cloud Functions using Golang, covering everything from creating storage buckets to deploying and running your function to translate text. Google Cloud Functions Example Now that you’ve learned what Cloud Functions is, I’d like to show you how to implement a sample Cloud Function. We will guide you through optical character recognition (OCR) on Google Cloud Platform with Cloud Functions. Our use case is as follows: 1. An image with text is uploaded to Cloud Storage. 2. A triggered Cloud Function utilizes the Google Cloud Vision API to extract the text and identify the source language. 3. The text is queued for translation by publishing a message to a Pub/Sub topic. 4. A Cloud Function employs the Translation API to translate the text and stores the result in the translation queue. 5. Another Cloud Function saves the translated text from the translation queue to Cloud Storage. 6. The translated results are available in Cloud Storage as individual text files for each translation. We need to download the samples first; we will use Golang as the programming language. Source files can be downloaded from – https://github.com/GoogleCloudPlatform/golangsamples. Before working with the OCR function sample, we recommend enabling the Cloud Translation API and the Cloud Vision API. If they are not enabled, your function will throw errors, and the process will not be completed. Let’s start with deploying the function: 1. We need to create a Cloud Storage bucket. Create your own bucket with unique name – please refer to documentation on bucket naming under following link: https://cloud.google.com/storage/docs/buckets We will use the following code: gsutil mb gs://wojciech_image_ocr_bucket 2. We also need to create a second bucket to store the results: gsutil mb gs://wojciech_image_ocr_bucket_results 3. We must create a Pub/Sub topic to publish the finished translation results. We can do so with the following code: gcloud pubsub topics create YOUR_TOPIC_NAME. We used the following command to create it: gcloud pubsub topics create wojciech_translate_topic 4. Creating a second Pub/Sub topic to publish translation results is necessary. We can use the following code to do so: gcloud pubsub topics create wojciech_translate_topic_results 5. Next, we will clone the Google Cloud GitHub repository with some Python sample code: git clone https://github.com/GoogleCloudPlatform/golang-samples 6. From the repository, we need to go to the golang-samples/functions/ocr/app/ file to be able to deploy the desired Cloud Function. 7. We recommend reviewing the included go files to review the code and understand it in more detail. Please change the values of your storage buckets and Pub/Sub topic names. 8. We will deploy the first function to process images. We will use the following command: gcloud functions deploy ocr-extract-go --runtime go119 --trigger-bucket wojciech_image_ocr_bucket --entry-point ProcessImage --set-env-vars "^:^GCP_PROJECT=wmarusiak-book- 351718:TRANSLATE_TOPIC=wojciech_translate_topic:RESULT_ TOPIC=wojciech_translate_topic_results:TO_LANG=es,en,fr,ja" 9. After deploying the first Cloud Function, we must deploy the second one to translate the text. We can use the following code snippet: gcloud functions deploy ocr-translate-go --runtime go119 --trigger-topic wojciech_translate_topic --entry-point TranslateText --set-env-vars "GCP_PROJECT=wmarusiak-book- 351718,RESULT_TOPIC=wojciech_translate_topic_results" 10. The last part of the complete solution is a third Cloud Function that saves results to Cloud Storage. We will use the following snippet of code to do so: gcloud functions deploy ocr-save-go --runtime go119 --triggertopic wojciech_translate_topic_results --entry-point SaveResult --set-env-vars "GCP_PROJECT=wmarusiak-book-351718,RESULT_ BUCKET=wojciech_image_ocr_bucket_results" 11. We are now free to upload any image containing text. It will be processed first, then translated and saved into our Cloud Storage bucket. 12. We uploaded four sample images that we downloaded from the Internet that contain some text. We can see many entries in the ocr-extract-go Cloud Function’s logs. Some Cloud Function log entries show us the detected language in the image and the other extracted text: Figure 7.22 – Cloud Function logs from the ocr-extract-go function 13. ocr-translate-go translates detected text in the previous function: Figure 7.23 – Cloud Function logs from the ocr-translate-go function 14. Finally, ocr-save-go saves the translated text into the Cloud Storage bucket: Figure 7.24 – Cloud Function logs from the ocr-save-go function 15. If we go to the Cloud Storage bucket, we’ll see the saved translated files: Figure 7.25 – Translated images saved in the Cloud Storage bucket 16. We can view the content directly from the Cloud Storage bucket by clicking Download next to the file, as shown in the following screenshot: Figure 7.26 – Translated text from Polish to English stored in the Cloud Storage bucket Cloud Functions is a powerful and fast way to code, deploy, and use advanced features. We encourage you to try out and deploy Cloud Functions to understand the process of using them better. At the time of writing, Google Cloud Free Tier offers a generous number of free resources we can use. Cloud Functions offers the following with its free tier: 2 million invocations per month (this includes both background and HTTP invocations) 400,000 GB-seconds, 200,000 GHz-seconds of compute time 5 GB network egress per month Google Cloud has comprehensive tutorials that you can try to deploy. Go to https://cloud.google.com/functions/docs/tutorials to follow one. Conclusion In conclusion, Google Cloud Functions offer a powerful and scalable solution for automating tasks like optical character recognition and translation. Through this example, we have demonstrated how to use Cloud Functions, Pub/Sub, and the Google Cloud Vision and Translation APIs to build an end-to-end OCR and translation pipeline. By following the provided steps and code snippets, you can easily replicate this process for your own use cases. Google Cloud's generous Free Tier resources make it accessible to get started with Cloud Functions. We encourage you to explore more by deploying your own Cloud Functions and leveraging the full potential of Google Cloud Platform for serverless computing. Author BioAgnieszka is an experienced Systems Engineer who has been in the IT industry for 15 years. She is dedicated to supporting enterprise customers in the EMEA region with their transition to the cloud and hybrid cloud infrastructure by designing and architecting solutions that meet both business and technical requirements. Agnieszka is highly skilled in AWS, Google Cloud, and VMware solutions and holds certifications as a specialist in all three platforms. She strongly believes in the importance of knowledge sharing and learning from others to keep up with the ever-changing IT industry.With over 16 years in the IT industry, Wojciech is a seasoned and innovative IT professional with a proven track record of success. Leveraging extensive work experience in large and complex enterprise environments, Wojciech brings valuable knowledge to help customers and businesses achieve their goals with precision, professionalism, and cost-effectiveness. Holding leading certifications from AWS, Alibaba Cloud, Google Cloud, VMware, and Microsoft, Wojciech is dedicated to continuous learning and sharing knowledge, staying abreast of the latest industry trends and developments.

0
0
432

article-image-vertex-ai-workbench-your-complete-guide-to-scaling-machine-learning-with-google-cloud

Jasmeet Bhatia, Kartik Chaudhary

04 Nov 2024

15 min read

Vertex AI Workbench: Your Complete Guide to Scaling Machine Learning with Google Cloud

Jasmeet Bhatia, Kartik Chaudhary

04 Nov 2024

15 min read

0
0
936

article-image-essential-sql-for-data-engineers

Kedeisha Bryan, Taamir Ransome

31 Oct 2024

10 min read

Essential SQL for Data Engineers

Kedeisha Bryan, Taamir Ransome

31 Oct 2024

10 min read

This article is an excerpt from the book, Cracking the Data Engineering Interview, by Kedeisha Bryan, Taamir Ransome. The book is a practical guide that’ll help you prepare to successfully break into the data engineering role. The chapters cover technical concepts as well as tips for resume, portfolio, and brand building to catch the employer's attention, while also focusing on case studies and real-world interview questions.Introduction In the world of data engineering, SQL is the unsung hero that empowers us to store, manipulate, transform, and migrate data easily. It is the language that enables data engineers to communicate with databases, extract valuable insights, and shape data to meet their needs. Regardless of the nature of the organization or the data infrastructure in use, a data engineer will invariably need to use SQL for creating, querying, updating, and managing databases. As such, proficiency in SQL can often the difference between a good data engineer and a great one. Whether you are new to SQL or looking to brush up your skills, this chapter will serve as a comprehensive guide. By the end of this chapter, you will have a solid understanding of SQL as a data engineer and be prepared to showcase your knowledge and skills in an interview setting. In this article, we will cover the following topics: Must-know foundational SQL concepts Must-know advanced SQL concepts Technical interview questions Must-know foundational SQL concepts In this section, we will delve into the foundational SQL concepts that form the building blocks of data engineering. Mastering these fundamental concepts is crucial for acing SQL-related interviews and effectively working with databases. Let’s explore the critical foundational SQL concepts every data engineer should be comfortable with, as follows: SQL syntax: SQL syntax is the set of rules governing how SQL statements should be written. As a data engineer, understanding SQL syntax is fundamental because you’ll be writing and reviewing SQL queries regularly. These queries enable you to extract, manipulate, and analyze data stored in relational databases. SQL order of operations: The order of operations dictates the sequence in which each of the following operators is executed in a query: FROM and JOIN WHERE GROUP BY HAVING SELECT DISTINCT ORDER BY LIMIT/OFFSET Data types: SQL supports a variety of data types, such as INT, VARCHAR, DATE, and so on. Understanding these types is crucial because they determine the kind of data that can be stored in a column, impacting storage considerations, query performance, and data integrity. As a data engineer, you might also need to convert data types or handle mismatches. SQL operators: SQL operators are used to perform operations on data. They include arithmetic operators (+, -, *, /), comparison operators (>, <, =, and so on), and logical operators (AND, OR, and NOT). Knowing these operators helps you construct complex queries to solve intricate data-related problems. Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL) commands: DML commands such as SELECT, INSERT, UPDATE, and DELETE allow you to manipulate data stored in the database. DDL commands such as CREATE, ALTER, and DROP enable you to manage database schemas. DCL commands such as GRANT and REVOKE are used for managing permissions. As a data engineer, you will frequently use these commands to interact with databases. Basic queries: Writing queries to select, filter, sort, and join data is an essential skill for any data engineer. These operations form the basis of data extraction and manipulation. Aggregation functions: Functions such as COUNT, SUM, AVG, MAX, MIN, and GROUP BY are used to perform calculations on multiple rows of data. They are essential for generating reports and deriving statistical insights, which are critical aspects of a data engineer’s role. The following section will dive deeper into must-know advanced SQL concepts, exploring advanced techniques to elevate your SQL proficiency. Get ready to level up your SQL game and unlock new possibilities in data engineering! Must-know advanced SQL concepts This section will explore advanced SQL concepts that will elevate your data engineering skills to the next level. These concepts will empower you to tackle complex data analysis, perform advanced data transformations, and optimize your SQL queries. Let’s delve into must-know advanced SQL concepts, as follows: Window functions: These do a calculation on a group of rows that are related to the current row. They are needed for more complex analyses, such as figuring out running totals or moving averages, which are common tasks in data engineering. Subqueries: Queries nested within other queries. They provide a powerful way to perform complex data extraction, transformation, and analysis, often making your code more efficient and readable. Common Table Expressions (CTEs): CTEs can simplify complex queries and make your code more maintainable. They are also essential for recursive queries, which are sometimes necessary for problems involving hierarchical data. Stored procedures and triggers: Stored procedures help encapsulate frequently performed tasks, improving efficiency and maintainability. Triggers can automate certain operations, improving data integrity. Both are important tools in a data engineer’s toolkit. Indexes and optimization: Indexes speed up query performance by enabling the database to locate data more quickly. Understanding how and when to use indexes is key for a data engineer, as it affects the efficiency and speed of data retrieval. Views: Views simplify access to data by encapsulating complex queries. They can also enhance security by restricting access to certain columns. As a data engineer, you’ll create and manage views to facilitate data access and manipulation. By mastering these advanced SQL concepts, you will have the tools and knowledge to handle complex data scenarios, optimize your SQL queries, and derive meaningful insights from your datasets. The following section will prepare you for technical interview questions on SQL. We will equip you with example answers and strategies to excel in SQL-related interview discussions. Let’s further enhance your SQL expertise and be well prepared for the next phase of your data engineering journey. Technical interview questions This section will address technical interview questions specifically focused on SQL for data engineers. These questions will help you demonstrate your SQL proficiency and problem-solving abilities. Let’s explore a combination of primary and advanced SQL interview questions and the best methods to approach and answer them, as follows: Question 1: What is the difference between the WHERE and HAVING clauses? Answer: The WHERE clause filters data based on conditions applied to individual rows, while the HAVING clause filters data based on grouped results. Use WHERE for filtering before aggregating data and HAVING for filtering after aggregating data. Question 2: How do you eliminate duplicate records from a result set? Answer: Use the DISTINCT keyword in the SELECT statement to eliminate duplicate records and retrieve unique values from a column or combination of columns. Question 3: What are primary keys and foreign keys in SQL? Answer: A primary key uniquely identifies each record in a table and ensures data integrity. A foreign key establishes a link between two tables, referencing the primary key of another table to enforce referential integrity and maintain relationships. Question 4: How can you sort data in SQL? Answer: Use the ORDER BY clause in a SELECT statement to sort data based on one or more columns. The ASC (ascending) keyword sorts data in ascending order, while the DESC (descending) keyword sorts it in descending order. Question 5: Explain the difference between UNION and UNION ALL in SQL. Answer: UNION combines and removes duplicate records from the result set, while UNION ALL combines all records without eliminating duplicates. UNION ALL is faster than UNION because it does not involve the duplicate elimination process. Question 6: Can you explain what a self join is in SQL? Answer: A self join is a regular join where a table is joined to itself. This is often useful when the data is related within the same table. To perform a self join, we have to use table aliases to help SQL distinguish the left from the right table. Question 7: How do you optimize a slow-performing SQL query? Answer: Analyze the query execution plan, identify bottlenecks, and consider strategies such as creating appropriate indexes, rewriting the query, or using query optimization techniques such as JOIN order optimization or subquery optimization. Question 8: What are CTEs, and how do you use them? Answer: CTEs are temporarily named result sets that can be referenced within a query. They enhance query readability, simplify complex queries, and enable recursive queries. Use the WITH keyword to define CTEs in SQL. Question 9: Explain the ACID properties in the context of SQL databases. Answer: ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These are basic properties that make sure database operations are reliable and transactional. Atomicity makes sure that a transaction is handled as a single unit, whether it is fully done or not. Consistency makes sure that a transaction moves the database from one valid state to another. Isolation makes sure that transactions that are happening at the same time don’t mess with each other. Durability makes sure that once a transaction is committed, its changes are permanent and can survive system failures. Question 10: How can you handle NULL values in SQL? Answer: Use the IS NULL or IS NOT NULL operator to check for NULL values. Additionally, you can use the COALESCE function to replace NULL values with alternative non-null values. Question 11: What is the purpose of stored procedures and functions in SQL? Answer: Stored procedures and functions are reusable pieces of SQL code encapsulating a set of SQL statements. They promote code modularity, improve performance, enhance security, and simplify database maintenance. Question 12: Explain the difference between a clustered and a non-clustered index. Answer: The physical order of the data in a table is set by a clustered index. This means that a table can only have one clustered index. The data rows of a table are stored in the leaf nodes of a clustered index. A non-clustered index, on the other hand, doesn’t change the order of the data in the table. After sorting the pointers, it keeps a separate object in a table that points back to the original table rows. There can be more than one non-clustered index for a table. Prepare for these interview questions by understanding the underlying concepts, practicing SQL queries, and being able to explain your answers. ConclusionThis article explored the foundational and advanced principles of SQL that empower data engineers to store, manipulate, transform, and migrate data confidently. Understanding these concepts has unlocked the door to seamless data operations, optimized query performance, and insightful data analysis. SQL is the language that bridges the gap between raw data and valuable insights. With a solid grasp of SQL, you possess the skills to navigate databases, write powerful queries, and design efficient data models. Whether preparing for interviews or tackling real-world data engineering challenges, the knowledge you have gained in this chapter will propel you toward success. Remember to continue exploring and honing your SQL skills. Stay updated with emerging SQL technologies, best practices, and optimization techniques to stay at the forefront of the ever-evolving data engineering landscape. Embrace the power of SQL as a critical tool in your data engineering arsenal, and let it empower you to unlock the full potential of your data. Author BioKedeisha Bryan is a data professional with experience in data analytics, science, and engineering. She has prior experience combining both Six Sigma and analytics to provide data solutions that have impacted policy changes and leadership decisions. She is fluent in tools such as SQL, Python, and Tableau.She is the founder and leader at the Data in Motion Academy, providing personalized skill development, resources, and training at scale to aspiring data professionals across the globe. Her other works include another Packt book in the works and an SQL course for LinkedIn Learning.Taamir Ransome is a Data Scientist and Software Engineer. He has experience in building machine learning and artificial intelligence solutions for the US Army. He is also the founder of the Vet Dev Institute, where he currently provides cloud-based data solutions for clients. He holds a master's degree in Analytics from Western Governors University.

1
0
704

article-image-how-to-create-and-connect-a-virtual-network-in-azure-for-windows-365

Christiaan Brinkhoff, Sandeep Patnaik, Morten Pedholt

31 Oct 2024

15 min read

How to Create and Connect a Virtual Network in Azure for Windows 365

Christiaan Brinkhoff, Sandeep Patnaik, Morten Pedholt

31 Oct 2024

15 min read

This article is an excerpt from the book, Mastering Windows 365, by Jonathan R. Danylko. Mastering Windows 365 provides you with detailed knowledge of cloud PCs by exploring its designing model and analyzing its security environment. This book will help you extend your existing skillset with Windows 365 effectively.Introduction In today's cloud-centric world, establishing a secure and efficient network infrastructure is crucial for businesses of all sizes. Microsoft Azure, with its robust set of networking tools, provides a seamless way to connect various environments, including Windows 365. In this guide, we will walk you through the process of creating a virtual network in Azure, and how to connect it to a Windows 365 environment. Whether you're setting up a new network or integrating an existing one, this step-by-step tutorial will ensure you have the foundation necessary for a successful deployment. Creating a virtual network in Azure Start by going to https://portal.azure.com/ and create a new virtual network. It's quite straightforward. You can use all the default settings, but take care that you aren't overlapping the address space with an existing one you are already using: 1. Start by logging in to https://portal.azure.com. 2. Start the creation of a new virtual network. From here, choose the Resource group option and the name of the virtual network. When these have been defi ned, choose Next. Figure 3.5 – Virtual network creation basic information 3. There are some security features you can enable on the virtual network. Th ese features are optional, but Azure Firewall should be considered if no other fi rewall solution is deployed. When you are ready, click on Next. Figure 3.6 – Virtual network creation security 4. Now the IP address range and subnets must be defined. Once these have been defi ned, click on Next. Figure 3.7 – Virtual network creation | IP addresses 5. Next, we can add any Azure tags that might be required for your organization. We will leave it as is in this case. Click on Next. Figure 3.8 – Virtual network | Azure tags selection 6. We are now able to see an overview of the entire configuration of the new virtual network. When you have reviewed this, click on Create. Figure 3.9 – Virtual network creation | settings review Now that the virtual network has been created, we can start looking at how we create an ANC in Intune. We will look at the confi guration for both an AADJ and HAADJ network connection. Setting up an AADJ ANC Let's have a look at how to configure an ANC for AADJ Cloud PC device : 1. Start by going to Microsoft Intune | Devices | Windows 365 | Azure network connection. From here, click on + Create and select Azure AD Join: Figure 3.10 – Creating an ANC in Windows 365 overview 2. Fill out the required information such as the display name of the connection, the virtual network, and the subnet you would like to integrate with Windows 365. Once that is done, click on Next. Figure 3.11 – Creating an AADJ ANC | network details 3. Review the information you have filled in. When you are ready, click Review + create: Figure 3.12 – Creating an AADJ ANC | settings review Once the ANC has been created, you are now done and should be able to view the connection in the ANC overview. You can now use that virtual network in your provisioning policy. Figure 3.13 – Windows 365 ANC network overview Setting up a HAADJ ANC A HAADJ network connection is a bit trickier to set up than the previous one. We must ensure the virtual network we are using has a connection with the domain we are trying to join. Once we are sure about that, let's go ahead and create a connection: 1. Visit Microsoft Intune | Windows 365 | Azure network connection. From here, click on + Create and select Hybrid Azure AD Join. Figure 3.14 – Creating a HAADJ ANC in Windows 365 | Overview 2. Provide the required information such as the display name of the connection, the virtual network, and the subnet you would like to integrate with Windows 365. Click Next. Figure 3.15 – Creating a HAADJ ANC | network details 3. Type the domain name you want the Cloud PCs to join. The Organization Unit field is optional. Type in the AD username and password for your domain-joined service account. Once done, click Next: Figure 3.16 – Creating a HAADJ ANC | domain details 4. Review the settings provided and click on Review + create. The connection will now be established: Figure 3.17 – Creating a HAADJ ANC | settings details Once the creation is done, you can view the connection in the ANC overview. You will now be able to use that virtual network in your provisioning policy. Figure 3.18 – Windows 365 ANC network overview ConclusionCreating a virtual network in Azure and connecting it to your Windows 365 environment is a fundamental step towards leveraging the full potential of cloud-based services. By following the outlined procedures, you can ensure a secure and efficient network connection, whether you're dealing with Azure AD Join (AADJ) or Hybrid Azure AD Join (HAADJ) scenarios. With the virtual network and ANC now configured, you are well-equipped to manage and monitor your network connections, enhancing the overall performance and reliability of your cloud infrastructure. Author BioChristiaan works as a Principal Program Manager and Community Lead on the Windows Cloud Experiences (Windows 365 + AVD) Engineering team at Microsoft, bringing his expertise to help customers imagine new virtualization experiences. A former Global Black Belt for Azure Virtual Desktop, Christiaan joined Microsoft in 2018 as part of the FSLogix acquisition. In his role at Microsoft, he worked on features such as Windows 365 app, Switch, and Boot. His mission is to drive innovation while bringing Windows 365, Windows, and Microsoft Endpoint Manager (MEM) closer together, and drive community efforts around virtualization to empower Microsoft customers in leveraging new cloud virtualization scenarios.Sandeep is a virtualization veteran with nearly two decades of experience in the industry. He has shipped multiple billion-dollar products and cloud services for Microsoft to a global user base including Windows, Azure Virtual Desktop, and Windows 365. His contributions have earned him multiple patents in this field.Currently, he leads a stellar team that is responsible for building the product strategy for Windows 365 and Azure Virtual Desktop services and shaping the future of end-user experiences for these services.Morten works as a Cloud Architect for a consultant company in Denmark where he advises and implements Microsoft virtual desktop solutions to customers around the world, Morten started his journey as a consultant over 8 years ago where he started with managing client devices but quickly found a passion for virtual device management. Today Windows 365 and Azure Virtual Desktop are the main areas that are being focused on alongside Microsoft Intune. Based on all the community activities Morten has done in the past years, he got rewarded with the Microsoft MVP award in the category of Windows 365 in March 2022.

0
0
409

article-image-building-efficient-web-apis-with-net-8-and-visual-studio-2022

Jonathan R. Danylko

30 Oct 2024

15 min read

Building Efficient Web APIs with .NET 8 and Visual Studio 2022

Jonathan R. Danylko

30 Oct 2024

15 min read

This article is an excerpt from the book, ASP.NET 8 Best Practices, by Jonathan R. Danylko. With the latest version of .NET 8.0 Core in LTS (Long-Term-Support), best practices are becoming harder to find as the technology continues to evolve. This book will guide you through coding practices and various aspects of software development.Introduction In the ever-evolving landscape of web development, .NET 8 has emerged as a game-changer, especially in the realm of Web APIs. With new features and enhancements, .NET 8 prioritizes the ease and efficiency of building Web APIs, supported by robust tools in Visual Studio 2022. This chapter explores the innovations in .NET 8, focusing on creating and testing Web APIs seamlessly. From leveraging minimal APIs to utilizing Visual Studio's new features, developers can now build powerful REST-based services with simplicity and speed. We'll guide you through the process, demonstrating how to create a minimal API and highlighting the benefits of this approach. Technical requirements In .NET 8, Web APIs take a front seat. Visual Studio has added new features to make Web APIs easier to build and test. For this chapter, we recommend using Visual Studio 2022, but the only requirement to view the GitHub repository is a simple text editor. The code for Chapter 09 is located in Packt Publishing’s GitHub repository, found at https:// github.com/PacktPublishing/ASP.NET-Core-8-Best-Practices. Creating APIs quickly With .NET 8, APIs are integrated into the framework, making it easier to create, test, and document. In this section, we’ll learn a quick and easy way to create a minimal API using Visual Studio 2022 and walk through the code it generates. We’ll also learn why minimal APIs are the best approach to building REST-based services. Using Visual Studio One of the features of .NET 8 is the ability to create minimal R EST APIs extremely fast. One way is to use the dotnet command-line tool and the other way is to use Visual Studio. To do so, follow these steps: 1. Open Visual Studio 2022 and create an ASP.NET Core Web API project. 2. After selecting the directory for the project, click Next. 3. Under the project options, make the following changes: Uncheck the Use Controllers option to use minimal APIs Check Enable OpenAPI support to include support for API documentation using Swagger: Figure 9.1 – Options for a web minimal API project 4. Click Create. That’s it – we have a simple API! It may not be much of one, but it’s still a complete API with Swagger documentation. Swagger is a tool for creating documentation for APIs and implementing the OpenAPI specification, whereas Swashbuckle is a NuGet package that uses Swagger for implementing Microsoft APIs. If we look at the project, there’s a single file called Program.cs. Opening Program.cs will show the entire application. This is one of the strong points of .NET – the ability to create a scaffolded REST API relatively quickly: var builder = WebApplication.CreateBuilder(args); // Add services to the container. // Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle builder.Services.AddEndpointsApiExplorer(); builder.Services.AddSwaggerGen(); var app = builder.Build(); // Configure the HTTP request pipeline. if (app.Environment.IsDevelopment()) { app.UseSwagger(); app.UseSwaggerUI(); } app.UseHttpsRedirection(); var summaries = new[] { "Freezing", "Bracing", "Chilly", "Cool", "Mild", "Warm", "Balmy", "Hot", "Sweltering", "Scorching" }; app.MapGet("/weatherforecast", () => { var forecast = Enumerable.Range(1, 5).Select(index => new WeatherForecast ( DateOnly.FromDateTime(DateTime.Now.AddDays (index)), Random.Shared.Next(-20, 55), summaries[Random.Shared.Next( summaries.Length)] )) .ToArray(); return forecast; }) .WithName("GetWeatherForecast") .WithOpenApi(); app.Run(); internal record WeatherForecast(DateOnly Date, int TemperatureC, string? Summary) { public int TemperatureF => 32 + (int)(TemperatureC / 0.5556); } In the preceding code, we created our “application” through the .CreateBuilder() method. We also added the EndpointsAPIExplorer and SwaggerGen services. EndpointsAPIExplorer enables the developer to view all endpoints in Visual Studio, which we’ll cover later. The SwaggerGen service, on the other hand, creates the documentation for the API when accessed through the browser. The next line creates our application instance using the .Build() method. Once we have our app instance and we are in development mode, we can add Swagger and the Swagger UI. .UseHttpsRedirection() is meant to redirect to HTTPS when the protocol of a web page is HTTP to make the API secure. The next line creates our GET weatherforecast route using .MapGet(). We added the .WithName() and .WithOpenApi() methods to identify the primary method to call and let .NET know it uses the OpenAPI standard, respectively. Finally, we called app.Run(). If we run the application, we will see the documented API on how to use our API and what’s available. Running the application produces the following output: Figure 9.2 – Screenshot of our documented Web API If we call the /weatherforecast API, we see that we receive JSON back with a 200 HTTP status. Figure 9.3 – Results of our /weatherforecast API Think of this small API as middleware with API controllers combined into one compact file (Program. cs). Why minimal APIs? I consider minimal APIs to be a feature in .NET 8, even though it’s a language concept. If the application is extremely large, adding minimal APIs should be an appealing feature in four ways: Self-contained: Simple API functionality inside one file is easy to follow for other developers Performance: Since we aren’t using controllers, the MVC overhead isn’t necessary when using these APIs Cross-platform: With .NET, APIs can now be deployed on any platform Self-documenting: While we can add Swashbuckle to other APIs, it also builds the documentation for minimal APIs Moving forward, we’ll take these minimal APIs and start looking at Visual Studio’s testing capabilities. Conclusion In conclusion, .NET 8 has revolutionized the process of building Web APIs by integrating them more deeply into the framework, making it easier than ever to create, test, and document APIs. By harnessing the power of Visual Studio 2022, developers can quickly set up minimal APIs, offering a streamlined and efficient approach to building REST-based services. The advantages of minimal APIs—being self-contained, performant, cross-platform, and self-documenting—make them an invaluable tool in a developer's arsenal. As we continue to explore the capabilities of .NET 8, the potential for creating robust and scalable web applications is limitless, paving the way for innovative and efficient software solutions. Author BioJonathan "JD" Danylko is an award-winning, full-stack ASP.NET architect. He's used ASP.NET as his primary way to build websites since 2002 and before that, Classic ASP.Jonathan contributes to his blog (DanylkoWeb.com) on a weekly basis, has built a custom CMS, is a founder of Tuxboard (an open-source ASP.NET dashboard library), has been on various podcasts, and guest posted on the C# Advent Calendar for 6 years. Jonathan has worked in various industries for small, medium, and Fortune 100 companies, but currently works as an Architect at Insight Enterprise. The best way to contact Jonathan is through GitHub, LinkedIn, Twitter, email, or through the website.

0
0
912

article-image-effortless-web-deployment-a-guide-to-deploying-your-application-on-netlify

Ekene Eze

30 Oct 2024

10 min read

Effortless Web Deployment: A Guide to Deploying Your Application on Netlify

Ekene Eze

30 Oct 2024

10 min read

This article is an excerpt from the book, Web Development on Netlify, by Ekene Eze. This book is a comprehensive guide to deploying and scaling frontend web applications on Netlify. With hands-on instructions and real-world examples, this book takes you from setting up a Netlify account and deploying web apps to optimizing performance.Introduction Deploying a web application can sometimes be a daunting task, especially with the various methods and tools available. In this article, we'll explore two straightforward deployment methods offered by Netlify: the drag-and-drop method, which is beginner-friendly and ideal for static sites, and the Netlify CLI (Netlify Dev) method, which provides greater control for developers who prefer using the command line. Deploying your web application on Netlify We will discuss two deployment methods in this chapter: the drag-and-drop method and the Netlify CLI (Netlify Dev) m ethod. A third method, the Git-based method, was covered in the Connecting to a Git repository section in Chapter 1. Netlify drag-and-drop deployment The drag-and-drop deployment method is the most straightforward and beginner-friendly way to deploy a web application on Netlify. Th is method is suitable for static websites or applications that do not require complex build processes. To deploy your web application on Netlify using the drag-and-drop method, follow these steps: 1. Organize your project files and ensure your project’s index.html file is in the root folder so that Netlify can easily find it and build your site from there: Figure 2.1 – Netlify drop sample structure 2. Visit netlify.com and sign in or create an account. 3. On your Netlify dashboard, locate the Sites section. Drag and drop your project folder into the designated area. Netlify will automatically upload your files, create a new site, deploy it, and assign a randomly generated URL. You can click on the generated URL to view your live site. 4. Optionally, configure your site. To configure your site’s settings, such as adding a custom domain or enabling SSL, click the Site settings button. We will discuss these configuration options in greater detail later, in the Configuring settings and o options section. Netlify CLI (Netlify Dev) deployment The Netlify CLI deployment method offers greater control over the deployment process for developers who prefer using the command line. Follow these steps to deploy your web applications to Netlify using the Netlify CLI: 1. Install the Netlify CLI globally on your computer using npm: npm install -g netlify-cli 2. Run the following command to authenticate your Netlify account: netlify login Your browser will open so that you can authorize access to your Netlify account. 3. Navigate to your project folder in the command line and run the following command to initialize a new Netlify site: netlify init 4. You will be prompted to choose between connecting an existing Git repository or creating a new site without a Git repository. Choose the option that best suits your needs. Connecting to a Git repository enables continuous deployment. 4. If your project requires specifi c build settings, open the automatically created netlify.toml fi le in your project’s root directory and confi gure the settings accordingly. Here’s an example: toml [build] command = "npm run build" publish = "dist" This configuration would run the npm run build command and deploy the dist folder as the publish directory. Run the following command in your project directory to deploy your site: netlify deploy By default, this command creates a draft deployment. Preview the draft by visiting the generated URL. 7. If you are satisfied with the draft deployment, run the following command for a production deployment: netlify deploy --prod This will create a production deployment with a randomly generated URL. 8. Visit your Netlify dashboard to view your live site or configure your site’s settings, such as adding a custom domain or enabling SSL. This step will be covered in more detail in the Configuring settings and options section of this chapter. Git-based deployment Refer to Chapter 1 for the Git-based deployment process. Choosing a deployment pattern Need help choosing a pattern for your needs? Here’s a tabular comparison of the three deployment patterns offered by Netlify: Git-based deployments, CLI deployments, and drag-and-drop: Deployment PatternWhen to ChooseKey BenefitsGit-based deployments Ideal for collaborative development Version control, automated builds, code reviewCLI deployments Ideal for advanced automation scenarios Scripted deployments, custom workflowsDrag-and-drop deployments Ideal for simple, non-technical usersUser-friendly, visual interface, quick deploymentsTable – Choosing a deployment pattern Now, let’s discuss when each deployment pattern is ideal and why: Git-based deployments: Git-based deployments are suitable for collaborative development environments where multiple team members contribute to the code base. It is ideal when you want to leverage the power of version control systems such as Git. Git-based deployments offer version control, which allows you to track changes, collaborate with others, and roll back to previous versions if needed. They also enable automated builds triggered by changes to the repository, facilitating continuous integration and deployment workflows. Code review processes can be integrated into the deployment pipeline, ensuring code quality. CLI deployments: CLI deployments are ideal for advanced automation scenarios, where you require fine-grained control over the deployment process and want to integrate it with custom scripts or workflows. CLI deployments off er fl exibility and programmability. They allow you to script deployments using command-line tools, which can be useful for automating complex deployment scenarios. You can customize and extend the deployment process to fit your requirements while integrating with other tools or services. Drag-and-drop deployments: Drag-and-drop deployments are ideal for non-technical users or individuals who prefer a simple, user-friendly interface for deploying static sites or applications quickly. Drag-and-drop deployments provide a visual interface that simplifies the deployment process. Users can simply drag and drop their site files or assets onto the Netlify web interface, and the platform takes care of the deployment and hosting. This pattern eliminates the need for technical knowledge or command-line usage, making it accessible to a wider range of users. The choice of deployment pattern depends on your specific needs and your technical expertise. Git-based deployments are suitable for collaborative development, CLI deployments offer advanced automation capabilities, and drag-and-drop deployments are ideal for non-technical users seeking a simple interface. Understanding the strengths and trade-offs of each pattern will help you select the most appropriate deployment approach for your project. ConclusionChoosing the right deployment method is crucial for the success and efficiency of your web application. Whether you opt for the simplicity of the drag-and-drop method, the command-line control of the Netlify CLI, or the collaborative advantages of Git-based deployments, each approach has its unique strengths. The drag-and-drop method offers a quick and easy solution for non-technical users, while the CLI method provides advanced automation capabilities for more complex scenarios. Git-based deployments, on the other hand, are perfect for teams working in a collaborative environment with a need for version control. By understanding these methods and their respective benefits, you can confidently deploy your web application on Netlify using the approach that best aligns with your goals and expertise. Author BioEkene Eze is a highly experienced Developer Advocate with over five years of professional experience in leading DevRel teams across multiple organizations. As a former member of the Developer Experience team at Netlify, he played a key role in helping numerous companies integrate and effectively utilize the Netlify platform. As a well-regarded speaker, he is dedicated to sharing his knowledge and expertise with the wider development community through a variety of mediums, including blog posts, video tutorials, live streams, and podcasts. Currently serving as the Director of Developer Relations at Abridged Inc, the author brings a wealth of experience and expertise to this comprehensive guide on scaling web applications with Netlify.

0
0
617

article-image-mastering-prometheus-sharding-boost-scalability-with-efficient-data-management

William Hegedus

28 Oct 2024

15 min read

Mastering Prometheus Sharding: Boost Scalability with Efficient Data Management

William Hegedus

28 Oct 2024

15 min read

0
0
584

Managing AI Security Risks with Zero Trust: A Strategic Guide

Mastering Transfer Learning: Fine-Tuning BERT and Vision Transformers

Supabase Unleashed: Advanced Features for TypeScript, Frameworks, and Direct Database Connections

How to Integrate AI into Software Development Teams

Airflow Ops Best Practices: Observation and Monitoring

Mastering Threat Detection with VirusTotal: A Guide for SOC Analysts

Mastering PromQL: A Comprehensive Guide to Prometheus Query Language

Mastering the API Life Cycle: A Comprehensive Guide to Design, Implementation, Release, and Maintenance

Automating OCR and Translation with Google Cloud Functions: A Step-by-Step Guide

Vertex AI Workbench: Your Complete Guide to Scaling Machine Learning with Google Cloud

Trending Topics

Essential SQL for Data Engineers

How to Create and Connect a Virtual Network in Azure for Windows 365

Building Efficient Web APIs with .NET 8 and Visual Studio 2022

Effortless Web Deployment: A Guide to Deploying Your Application on Netlify

Mastering Prometheus Sharding: Boost Scalability with Efficient Data Management