
How-To Tutorials

Anderson Soares Furtado Oliveira
21 Nov 2024
15 min read

How to Integrate AI into Software Development Teams

This article is an excerpt from the book, "AI Strategies for Web Development", by Anderson Soares Furtado Oliveira. Embark on an enlightening AI journey by understanding its role and its fundamentals, crafting cutting-edge applications, and navigating ethical challenges. You’ll also explore strategic tools and gain foresight into future trends.

Introduction

Integrating AI into software development teams is no longer a futuristic concept; it is a strategic necessity in today's digital era. AI has the potential to revolutionize software development by optimizing processes, solving complex problems, improving user experience, and driving business value. However, harnessing the power of AI requires more than just adopting new tools: it demands a shift in mindset, processes, skills, and team culture. In this article, we explore actionable strategies for software engineering leaders to successfully integrate AI into their teams, drawing from Gartner’s recommendations and industry best practices. From fostering collaboration and upskilling teams to implementing data pipelines and AI solutions, these steps will help organizations fully leverage AI's transformative potential.

How to integrate AI into software development teams

AI is a technology that can transform the way we create and use software applications. It can help us solve complex problems, optimize processes, improve UX, and generate value for businesses. However, for us to fully leverage the potential of AI, it needs to be effectively integrated into software development teams. In this section, we will present some actions that software engineering leaders should consider so that they can achieve this goal, based on Gartner’s recommendations (https://www.gartner.com/en/articles/set-up-now-for-ai-to-augment-software-development).

Let’s start:

Adopt an AI mindset from the start: The first action is to adopt an AI mindset from the start of the project, encouraging the exploration of AI techniques to improve application development. This means that developers should be open to learning about the possibilities and challenges of AI and seek innovative solutions that use this technology. In addition, leaders should set clear and measurable goals for the use of AI and align expectations with project stakeholders. Encourage teams to explore AI by initiating projects that directly involve AI technologies. For instance, a development team could be tasked with creating a chatbot to streamline customer service interactions, encouraging them to learn and apply NLP techniques.

Provide a framework to identify AI opportunities: The second action is to provide a framework to identify when and where AI can yield better results. This involves analyzing the needs and requirements of the project, and assessing whether AI can offer benefits in terms of quality, efficiency, scalability, security, or other aspects. It is also important to consider the costs and risks associated with implementing AI and compare them with available alternatives. The framework should guide developers in choosing the most suitable AI techniques for each case, such as ML, NLP, and computer vision. Develop a decision matrix to help identify opportunities for AI integration that can enhance project outcomes. This matrix could evaluate factors such as potential improvements in efficiency and quality against the costs and complexity of implementing AI solutions, helping to pinpoint where tools such as ML could be most beneficial.

Invest in dedicated AI solutions: The third action is to invest in dedicated AI solutions to support various roles and tasks in software engineering.
These solutions can be tools, platforms, services, or libraries that use AI to facilitate or automate activities such as design, coding, testing, debugging, integration, deployment, and monitoring. They can increase the productivity, quality, and creativity of developers, as well as reduce errors and rework. Some examples of AI solutions for software engineering are intelligent assistants, code generators, code analyzers, and automatic testers. For example, implementing platforms such as TensorFlow or PyTorch for ML projects can aid in tasks ranging from predictive analytics to automated testing, thus boosting productivity and reducing the likelihood of errors.

Expand the data engineering pipeline: The fourth action is to expand the data engineering pipeline to leverage AI enrichment and enable intelligent applications. This means that developers should collect, store, process, analyze, and visualize data efficiently and securely, using AI to extract insights and value from data. In addition, developers should integrate the data with AI models, and use these models to provide intelligent features to applications, such as recommendations, customizations, predictions, and detections. Intelligent applications can improve performance, usability, and end-user satisfaction. By integrating data streaming tools such as Apache Kafka for real-time data streaming and processing, teams can enhance their applications with features such as real-time analytics and dynamic UX customization.

Foster collaboration between development and model-building teams: The fifth action is to foster collaboration between development teams and model-building teams to avoid overlapping responsibilities and ensure smooth deployment. This involves creating a culture of collaboration and communication, where both teams understand their roles and responsibilities, and work together to implement AI solutions.
This can help avoid conflicts, reduce delays, and ensure that the AI models are correctly integrated into the software applications. Establish regular sync-up meetings between software developers and AI model builders to ensure alignment and seamless integration of AI capabilities into applications. These meetings can help clarify responsibilities, share insights, and quicken the pace of development.

Continuously train and upskill the team: The sixth action is to continuously train and upskill the team in AI technologies. This involves providing regular training sessions, workshops, and resources to help developers learn about the latest AI techniques and tools. It also involves creating a learning culture, where developers are encouraged to learn and share their knowledge with others. This can help to build a team of skilled AI practitioners who can effectively use AI to improve software development. Create ongoing educational programs and provide access to courses from platforms such as Coursera or Udemy that cover advanced AI topics. Encouraging participation in hackathons or internal projects focused on AI can also foster practical experience and innovation.

Effectively integrating AI into software development teams is a complex task that requires a strategic and diligent approach. It’s not just about adopting new tools or technologies but transforming the mindset, processes, skills, and culture of the team. To navigate this transformation successfully, a structured checklist can serve as a valuable guide, ensuring that every critical aspect is addressed systematically:

1. Assessment and planning:
   Identify objectives: Define clear objectives for integrating AI into your development processes. Determine what problems you aim to solve or what improvements you want to achieve.
   Evaluate readiness: Assess your team’s current capabilities, infrastructure, and tools to determine readiness for AI integration.
   Stakeholder alignment: Ensure all stakeholders understand the benefits and implications of AI integration. Secure their support and alignment with the project goals.

2. Data collection and management:
   Identify data sources: Determine the types of data that will be valuable for AI-driven insights (e.g., source code data, user interaction data, performance data).
   Set up data pipelines: Implement data pipelines using tools such as Apache Kafka for real-time data collection and streaming.
   Ensure data quality: Establish processes for data cleaning, normalization, and validation to maintain high data quality.

3. Infrastructure and tools:
   Select AI tools: Choose appropriate AI-powered tools for different stages of the development process, such as GitHub Copilot for code generation, Testim for automated testing, and Dynatrace for performance monitoring.
   Scalable storage solutions: Implement scalable storage solutions such as Amazon S3 or Google Cloud Storage to handle large volumes of data.
   Processing frameworks: Utilize data processing frameworks such as Apache Spark or Flink for efficient data processing.

4. Model development and integration:
   Build AI models: Use ML frameworks such as TensorFlow, PyTorch, and scikit-learn to develop AI models that can analyze data and generate insights.
   Integrate AI models: Integrate AI models into your development environment to provide intelligent features such as code suggestions, anomaly detection, and predictive analytics.

5. Testing and validation:
   Automated testing tools: Implement AI-powered automated testing tools such as Testim to create and maintain test cases, ensuring the software remains robust and error-free.
   Continuous integration: Set up continuous integration (CI) pipelines to automatically run tests and validate code changes.
   Performance monitoring: Use tools such as New Relic AI and Dynatrace to monitor application performance and detect issues in real time.

6. Security and compliance:
   Vulnerability scanning: Use AI-powered security tools such as Snyk and Veracode to identify and fix vulnerabilities in the code.
   Compliance checks: Ensure that AI models and data processing adhere to relevant regulations and standards, such as the General Data Protection Regulation (GDPR).

7. Deployment and maintenance:
   Automated deployment: Set up automated deployment pipelines to streamline the release process.
   Real-time monitoring: Continuously monitor the application in production using tools such as Amazon CloudWatch and Splunk for anomaly detection.
   Feedback loop: Establish a feedback loop to collect user feedback and performance data, using this information to continuously improve the AI models and development processes.

By following these actions, software engineering leaders can effectively integrate AI into their teams and leverage its potential to create innovative, high-quality, and intelligent software applications. This can lead to significant improvements in productivity, quality, creativity, and user satisfaction, as well as provide a competitive edge in today’s increasingly digital and data-driven market.

However, it’s important to remember that AI is just a tool that can help solve problems and generate value. The ultimate success of the project depends on the team’s ability to understand user needs, create effective and innovative solutions, and deliver high-quality software. Therefore, AI should be integrated in a way that supports and enhances these goals, rather than replacing them.

Conclusion

Integrating AI into software development teams is a multifaceted process that goes beyond adopting cutting-edge tools. It involves fostering a culture of collaboration, continuous learning, and innovation, as well as ensuring robust data management, security, and compliance frameworks.
By following a structured approach (starting with clear objectives and readiness assessments, implementing advanced tools and frameworks, and maintaining continuous validation and feedback loops), software engineering leaders can unlock AI's full potential. This integration will not only enhance productivity and quality but also empower teams to create intelligent, high-performing applications that meet user needs and provide a competitive edge. Ultimately, AI should be a powerful enabler, complementing human creativity and expertise to deliver software solutions that truly excel.

Author Bio

Anderson Soares Furtado Oliveira is an experienced executive, AI strategist, and machine learning engineer specializing in AI governance, risk management, and compliance. As a board member at The Global Center for Risk and Innovation (GCRI) and an AI strategy consultant at G³ AI Global, he co-authored the book PgM Canvas: Transforming Vision into Real Benefits - A Program Management Guide for Leaders and Managers. With over a decade of experience in IT governance (CGEIT) and a focus on integrating AI technologies to drive business growth, he has led numerous AI projects and developed AI governance frameworks. His expertise in digital transformation and national development has equipped him to create innovative solutions and ethical AI applications. Anderson is a PhD student in Computer Science and Computational Mathematics at the University of São Paulo and holds an MBA in Software Engineering Project Management.

Dylan Intorf, Kendrick van Doorn, Dylan Storey
12 Nov 2024
15 min read

Airflow Ops Best Practices: Observation and Monitoring

This article is an excerpt from the book, "Apache Airflow Best Practices", by Dylan Intorf, Kendrick van Doorn, Dylan Storey. With a practical approach and detailed examples, this book covers the newest features of Apache Airflow 2.x and its potential for workflow orchestration, operational best practices, and data engineering.

Introduction

In this article, we will continue to explore the application of modern “ops” practices within Apache Airflow, focusing on the observation and monitoring of your systems and DAGs after they’ve been deployed.

We’ll divide this observation into two segments: the core Airflow system and individual DAGs. Each segment will cover specific metrics and measurements you should be monitoring for alerting and potential intervention.

When we discuss monitoring in this section, we will consider two types of monitoring: active and suppressive.

In an active monitoring scenario, a process will actively check a service’s health state, recording its state and potentially taking action directly on the return value.

In a suppressive monitoring scenario, the absence of a state (or state change) is usually meaningful. In these scenarios, the monitored application sends a signal on a schedule to a process to inform it that it is OK, usually suppressing an action (such as an alert) from occurring.

This chapter covers the following topics:

Monitoring core Airflow components
Monitoring your DAGs

Technical requirements

By now, we expect you to have a good understanding of Airflow and its core components, along with functional knowledge in the deployment and operation of Airflow and Airflow DAGs.

We will not be covering specific observability aggregators or telemetry tools; instead, we will focus on the activities you should be keeping an eye on.
We strongly recommend that you work closely with your ops teams to understand what tools exist in your stack and how to configure them for capturing metrics from, and alerting on, your deployments.

Monitoring core Airflow components

All of the components we will discuss here are critical to ensuring a functioning Airflow deployment. Generally, all of them should be monitored with a bare minimum check of "Is it on?", and if a component is not, an alert should surface to your team for investigation. The easiest way to check this is to query the REST API on the web server at `/health/`; this will return a JSON object that can be parsed to determine whether components are healthy and, if not, when they were last seen.

Scheduler

This component needs to be running and working effectively in order for tasks to be scheduled for execution.

When the scheduler service is started, it also starts a `/health` endpoint that can be checked by an external process with an active monitoring approach.

The returned signal does not always indicate that the scheduler is working properly, as its state is simply indicative that the service is up and running. There are many scenarios where the scheduler may be operating but unable to schedule jobs; as a result, many deployments will include a canary DAG that has a single task, acting to suppress an external alert from going off.

Important metrics that Airflow exposes for you include the following:

scheduler.scheduler_loop_duration: This should be monitored to ensure that your scheduler is able to loop and schedule tasks for execution. As this metric increases, you will see tasks beginning to schedule more slowly, to the point where you may begin missing SLAs because tasks fail to reach a schedulable state.

scheduler.tasks.starving: This indicates how many tasks cannot be scheduled because there are no slots available.
Pools are a mechanism that Airflow uses to balance large numbers of submitted task executions versus a finite amount of execution throughput. It is likely that this number will not be zero, but being high for extended periods of time may point to an issue in how DAGs are being written to schedule work.

scheduler.tasks.executable: This indicates how many tasks are ready for execution (i.e., queued). This number will sometimes not be zero, and that is OK, but if the number increases and stays high for extended periods of time, it indicates that you may need additional compute resources to handle the load. Look at your executor to increase the number of workers it can run.

Metadata database

The metadata database is used to store and track all of the metadata for your Airflow deployment’s previous DAG/task executions, along with information about your environment’s roles and permissions. Losing data from this database can interrupt normal operations and cause unintended consequences, with DAG runs being repeated.

Although the database is critical because it is architecturally ubiquitous, it is also the component least likely to encounter issues; when it does, they tend to be catastrophic in nature.

We generally suggest you utilize a managed service for provisioning and operating your backing database, ensuring that a disaster recovery plan for your metadata database is in place at all times.

Some active areas to monitor on your database include the following:

Connection pool size/usage: Monitor both the connection pool size and usage over time to ensure appropriate configuration, and identify potential bottlenecks or resource contention arising from Airflow components’ concurrent connections.

Query performance: Measure query latency to detect inefficient queries or performance issues, while monitoring query throughput to ensure effective workload handling by the database.

Storage metrics: Monitor the disk space utilization of the metadata database to ensure that it has sufficient storage
capacity. Set up alerts for low disk space conditions to prevent database outages due to storage constraints.

Backup status: Monitor the status of database backups to ensure that they are performed regularly and successfully. Verify backup integrity and retention policies to mitigate the risk of data loss if there is a database failure.

Triggerer

The Triggerer instance manages all of the asynchronous operations of deferrable operators in a deferred state. As such, major operational concerns generally relate to ensuring that individual deferred operators don’t cause major blocking calls to the event loop. If this occurs, your deferrable tasks will not be able to check their state changes as frequently, and this will impact scheduling performance.

Important metrics that Airflow exposes for you include the following:

triggers.blocked_main_thread: The number of triggers that have blocked the main thread. This is a counter and should monotonically increase over time; pay attention to large differences between recordings (or a quick acceleration in counts), as they are indicative of a larger problem.

triggers.running: The number of triggers currently on a triggerer instance. This metric should be monitored to determine whether you need to increase the number of triggerer instances you are running. While the official documentation claims that up to tens of thousands of triggers can be on an instance, the common operational number is much lower. Tune at your discretion, but depending on the complexity of your triggers, you may need to add a new instance for every few hundred consistent triggers you run.

Executors/workers

Depending on the executor you use, you will need to monitor your executors and workers a bit differently.

The Kubernetes executor will utilize the Kubernetes API to schedule tasks for execution; as such, you should utilize the Kubernetes events and metrics servers to gather logs and metrics for your task instances.
Common metrics to collect on an individual task are CPU and memory usage. This is crucial for tuning requests or mutating individual task resource requests to ensure that they execute safely.

The Celery worker has additional components and long-lived processes that you need to collect metrics on. You should monitor an individual Celery worker’s memory and CPU utilization to ensure that it is not over- or under-provisioned, tuning allocated resources accordingly. You also need to monitor the message broker (usually Redis or RabbitMQ) to ensure that it is appropriately sized. Finally, it is critical to measure the queue length of your message broker and ensure that too much “back pressure” isn’t being created in the system. If you find that your tasks are sitting in a queued state for a long period of time and the queue length is consistently growing, it’s a sign that you should start an additional Celery worker to execute scheduled tasks. You should also investigate using the native Celery monitoring tool Flower (https://flower.readthedocs.io/en/latest/) for additional, more nuanced methods of monitoring.

Web server

The Airflow web server hosts not just the UI for your Airflow deployment but also the RESTful interface. Especially if you happen to be controlling Airflow scheduling behavior with API calls, you should keep an eye on the following metrics:

Response time: Measure the time taken for the API to respond to requests. This metric indicates the overall performance of the API and can help identify potential bottlenecks.

Error rate: Monitor the rate of errors returned by the API, such as 4xx and 5xx HTTP status codes. High error rates may indicate issues with the API implementation or underlying systems.

Request rate: Track the rate of incoming requests to the API over time.
Sudden spikes or drops in request rates can impact performance and indicate changes in usage patterns.

System resource utilization: Monitor resource utilization metrics such as CPU, memory, disk I/O, and network bandwidth on the servers hosting the API. High resource utilization can indicate potential performance bottlenecks or capacity limits.

Throughput: Measure the number of successful requests processed by the API per unit of time. Throughput metrics provide insights into the API’s capacity to handle incoming traffic.

Now that you have some basic metrics to collect from your core architectural components and can monitor the overall health of an application, we need to monitor the actual DAGs themselves to ensure that they function as intended.

Monitoring your DAGs

There are multiple aspects to monitoring your DAGs, and while they’re all valuable, they may not all be necessary. Take care to ensure that your monitoring and alerting stack matches your organizational needs with regard to operational parameters for resiliency and, if there is a failure, recovery times. No matter how much or how little you choose to implement, knowing that your DAGs work, and if and how they fail, is the first step in fixing problems that will arise.

Logging

Airflow writes logs for tasks in a hierarchical structure that allows you to see each task’s logs in the Airflow UI. The community also provides a number of providers to utilize other services for backing log storage and retrieval. A complete list of supported providers is available at https://airflow.apache.org/docs/apache-airflow-providers/core-extensions/logging.html.

Airflow uses the standard Python logging framework to write logs.
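As a minimal sketch of what logging through the standard framework looks like in task code, the function below writes its messages through a module-level logger from Python's `logging` module. The function name `summarize_batch` and its record-counting logic are hypothetical, chosen only for illustration; in a real DAG, a callable like this would typically be handed to a PythonOperator.

```python
import logging

# Module-level logger from the standard library; no Airflow-specific
# logging calls are assumed here.
logger = logging.getLogger(__name__)


def summarize_batch(records):
    """Hypothetical task callable: count usable records and log progress."""
    logger.info("Received %d records", len(records))
    valid = [r for r in records if r is not None]
    skipped = len(records) - len(valid)
    if skipped:
        # Warnings stand out when scanning task logs for problems.
        logger.warning("Skipped %d empty records", skipped)
    return len(valid)
```

Using lazy `%d` formatting rather than f-strings keeps the message template constant, which tends to make log lines easier to group in whatever aggregation tool your ops team runs.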
If you’re writing custom operators or executing Python functions with a PythonOperator, just make sure that you instantiate a Python logger instance, and then the associated methods will handle everything for you.

Alerting

Airflow provides mechanisms for alerting on operational aspects of your executing workloads that can be configured within your DAG:

Email notifications: Email notifications can be sent if a task is put into a failed or retry state with the `email_on_failure` or `email_on_retry` arguments, respectively. These arguments can be provided to all tasks in the DAG with the `default_args` keyword in the DAG, or to individual tasks by setting the keyword argument individually.

Callbacks: Callbacks are special actions that are executed if a specific state change occurs. Generally, these callbacks should be thoughtfully leveraged to send alerts that are critical operationally:

on_success_callback: This callback will be executed at both the task and DAG levels when entering a successful state. Unless it is critical that you know whether something succeeds, we generally suggest not using this for alerting.

on_failure_callback: This callback is invoked when a task enters a failed state. Generally, this callback should always be set and, in critical scenarios, alert on failures that require intervention and support.

on_execute_callback: This is invoked right before a task executes and only exists at the task level. Use sparingly for alerting, as it can quickly become a noisy alert when overused.

on_retry_callback: This is invoked when a task is placed in a retry state. This is another callback to be cautious about as an alert, as it can become noisy and cause false alarms.

sla_miss_callback: This is invoked when a DAG misses its defined SLA.
This callback is only executed at the end of a DAG’s execution cycle, so it tends to be a very reactive notification that something has gone wrong.

SLA monitoring

As great a tool as Airflow is, it is well known in the community that SLAs, while largely functional, have some unfortunate implementation details that can make them problematic at best, and they are generally regarded as a broken feature in Airflow. We suggest that if you require SLA monitoring on your workflows, you deploy a CRON job monitoring tool such as healthchecks (https://github.com/healthchecks/healthchecks) that allows you to create suppressive alerts for your services through its REST API to manage SLAs. By pairing this third-party service with either HTTP operators or simple requests from callbacks, you can ensure that your most critical workflows achieve dynamic and resilient SLA alerting.

Performance profiling

The Airflow UI is a great tool for profiling the performance of individual DAGs:

The Gantt chart view: This is a great visualization for understanding the amount of time spent on individual tasks and the relative order of execution. If you’re worried about bottlenecks in your workflow, start here.

Task duration: This allows you to profile the run characteristics of tasks within your DAG over a historical period. This tool is great at helping you understand temporal patterns in execution time and finding outliers in execution. Especially if you find that a DAG slows down over time, this view can help you understand whether it is a systemic issue and which tasks might need additional development.

Landing times: This shows the delta between task completion and the start of the DAG run.
This is an unintuitive but powerful metric, as increases in it, when paired with stable task durations in upstream tasks, can help identify whether a scheduler is under heavy load and may need tuning.

Additional metrics that have proven to be useful (but may need to be calculated) include the following:

Task startup time: This is an especially useful metric when operating with a Kubernetes executor. To calculate this, you will need to calculate the difference between `start_date` and `execution_date` on each task instance. This metric will especially help you identify bottlenecks outside of Airflow that may impact task run times.

Task failure and retry counts: Monitoring the frequency of task failures and retries can help identify information about the stability and robustness of your environment. Especially if these types of failure can be linked back to patterns in time or execution, it can help debug interactions with other services.

DAG parsing time: Monitoring the amount of time a DAG takes to parse is very important to understand scheduler load and bottlenecks. If an individual DAG takes a long time to load (either due to heavy imports or long blocking calls being executed during parsing), it can have a material impact on the timeliness of scheduling tasks.

Conclusion

In this article, we covered some essential strategies to effectively monitor both the core Airflow system and individual DAGs post-deployment. We highlighted the importance of active and suppressive monitoring techniques and provided insights into the critical metrics to track for each component, including the scheduler, metadata database, triggerer, executors/workers, and web server. Additionally, we discussed logging, alerting mechanisms, SLA monitoring, and performance profiling techniques to ensure the reliability, scalability, and efficiency of Airflow workflows.
By implementing these monitoring practices and leveraging the insights gained, operators can proactively manage and optimize their Airflow deployments for optimal performance and reliability.

Author Bio

Dylan Intorf is a solutions architect and data engineer with a BS from Arizona State University in Computer Science. He has 10+ years of experience in the software and data engineering space, delivering custom-tailored solutions to the Tech, Financial, and Insurance industries.

Kendrick van Doorn is an engineering and business leader with a background in software development, with over 10 years of developing tech and data strategies at Fortune 100 companies. In his spare time, he enjoys taking classes at different universities and is currently an MBA candidate at Columbia University.

Dylan Storey has a B.Sc. and M.Sc. from California State University, Fresno in Biology and a Ph.D. from University of Tennessee, Knoxville in Life Sciences, where he leveraged computational methods to study a variety of biological systems. He has over 15 years of experience in building, growing, and leading teams, and in solving problems in developing and operating data products at a variety of scales and industries.

Mostafa Yahia
11 Nov 2024
15 min read

Mastering Threat Detection with VirusTotal: A Guide for SOC Analysts

This article is an excerpt from the book, "Effective Threat Investigation for SOC Analysts", by Mostafa Yahia. This is a practical guide that enables SOC professionals to analyze the most common security appliance logs that exist in any environment.

Introduction

In today’s cybersecurity landscape, threat detection and investigation are essential for defending against sophisticated attacks. VirusTotal, a powerful Threat Intelligence Platform (TIP), provides security analysts with robust tools to analyze suspicious files, domains, URLs, and IP addresses. Leveraging VirusTotal’s extensive security database and community-driven insights, SOC analysts can efficiently detect potential malware and other cyber threats. This article delves into the ways VirusTotal empowers analysts to investigate suspicious digital artifacts and enhance their organization’s security posture, focusing on critical features such as file analysis, domain reputation checks, and URL scanning.

Investigating threats using VirusTotal

VirusTotal is a Threat Intelligence Platform (TIP) that allows security analysts to analyze suspicious files, hashes, domains, IPs, and URLs to detect and investigate malware and other cyber threats. Moreover, VirusTotal is known for its robust automation capabilities, which allow for the automatic sharing of this intelligence with the broader security community. See Figure 14.1:

Figure 14.1 – The VirusTotal platform main web page

VirusTotal scans submitted artifacts, such as hashes, domains, URLs, and IPs, against more than 88 security solution signatures and intelligence databases. As a SOC analyst, you should use the VirusTotal platform to investigate the following:

Suspicious files
Suspicious domains and URLs
Suspicious outbound IPs

Investigating suspicious files

VirusTotal allows cybersecurity analysts to analyze suspicious files either by uploading the file or searching for the file hash’s reputation.
After you upload a file or submit a file hash for analysis, VirusTotal scans it against multiple antivirus signature databases and predefined YARA rules, and analyzes the file’s behavior by using different sandboxes.

After the analysis of the submitted file is completed, VirusTotal provides analysts with general information about the analyzed file in five tabs; each tab contains a wealth of information. See Figure 14.2:

Figure 14.2 – The details and tabs provided by analyzing a file on VirusTotal

As you can see in the preceding figure, after the file was submitted to the VirusTotal platform for analysis, it was analyzed against multiple vendors’ antivirus signature databases, Sigma detection rules, IDS detection rules, and several sandboxes for dynamic analysis.

The preceding figure is the first page provided by VirusTotal after submitting the file. As you can see, the first section shows the most common name of the submitted file hash, the file hash, the number of antivirus vendors and sandboxes that flagged the submitted hash as malicious, and tags for the suspicious activities performed by the file when analyzed in the sandboxes, such as the persistence tag, which means that the executable file tried to maintain persistence. See Figure 14.3:

Figure 14.3 – The first section of the first page from VirusTotal when analyzing a file

The first of the five tabs provided by the VirusTotal platform is the DETECTION tab. The first parts of the DETECTION tab include the matched Sigma rules, IDS rules, and dynamic analysis results from the sandboxes. See Figure 14.4:

Figure 14.4 – The first parts of the DETECTION tab

Sigma rules are threat detection rules designed to analyze system logs. Sigma was built to allow collaboration between SOC teams: it lets them share standardized detection rules, regardless of the SIEM in place, to detect various threats by using event logs.
VirusTotal sandboxes store all event logs that are generated during the file detonation; these logs are later tested against the Sigma rules collected from different repositories. VirusTotal users will find the list of Sigma rules matching a submitted file in the DETECTION tab. As you can see in the preceding figure, running the Sigma rules against the sandbox logs identified several actions performed by the executed file. Specifically, it disabled the Defender service, created an Auto-Start Extensibility Point (ASEP) entry to maintain persistence, and created another executable.

Next, VirusTotal shows that the Intrusion Detection System (IDS) rules detected the Command and Control (C&C) communication of the Redline info-stealer malware, which matched four IDS rules.

Important note

Both Sigma and IDS rules are assigned a severity level, and analysts can easily view the matched rule as well as the number of matches.

After the matched IDS rules, you will find the dynamic sandboxes’ detections of the submitted file. In this case, the sandboxes categorized the submitted file/hash as info-stealer malware.

Finally, the last part of the DETECTION tab is Security vendors’ analysis. See Figure 14.5:

Figure 14.5 – The Security vendors’ analysis section

As you can see in the preceding figure, the submitted file or hash is flagged as malicious by several security vendors, and most of them label the given file as Redline info-stealer malware.

The second tab is the DETAILS tab. Its Basic properties section lists the file hashes, file type, and file size of the given file. The tab also includes timestamps such as the file creation, first submission on the platform, last submission on the platform, and last analysis times. Additionally, this tab provides analysts with all the filenames associated with previous submissions of the same file.
See Figure 14.6:

Figure 14.6 – The first three sections of the DETAILS tab

Moreover, the DETAILS tab provides analysts with useful information such as signature verification, which shows whether the file is digitally signed, a key indicator of its authenticity and trustworthiness. Additionally, the tab lists the imported Dynamic Link Libraries (DLLs) and called libraries, allowing analysts to understand the file’s intent.

The third tab is the RELATIONS tab, which includes the IoCs of the analyzed file, such as the domains and IPs that the file connected to, the files bundled with the executable, and the files dropped by the executable. See Figure 14.7:

Figure 14.7 – The RELATIONS tab

Important note

When analyzing a malicious file, you can use the connected IPs and domains to scope the infection in your environment by using network security system logs such as the firewall and the proxy logs. However, not all the connected IPs and domains are necessarily malicious; some may be legitimate domains or IPs that the malware uses for malicious purposes.

At the bottom of the RELATIONS tab, VirusTotal provides a graph that ties the given file and all its relations together, which should facilitate your investigations. To maximize the graph in a new tab, click on it. See Figure 14.8:

Figure 14.8 – VT Relations graph

The fourth tab is the BEHAVIOR tab, which contains the detailed sandbox analysis of the submitted file. This report is presented in a structured format and includes the tags, the MITRE ATT&CK Tactics and Techniques used by the executed file, matched IDS and Sigma rules, dropped files, network activities, and process tree information observed during the analysis of the given file.
See Figure 14.9:

Figure 14.9 – The BEHAVIOR tab

Regardless of the matched signatures of security vendors, Sigma rules, and IDS rules, the BEHAVIOR tab allows analysts to examine the file’s actions and behavior to determine whether it is malicious or not. This feature is especially critical in the investigation of zero-day malware, where traditional signature-based detection methods may not be effective, and in-depth behavior analysis is required to identify and respond to potential threats.

The fifth tab is the COMMUNITY tab, which allows analysts to contribute to the VirusTotal community with their thoughts and to read community members’ thoughts regarding the given file. See Figure 14.10:

Figure 14.10 – The COMMUNITY tab

As you can see, we have two comments from two sandbox vendors indicating that the file is malicious and belongs to the Redline info-stealer family according to its behavior during the dynamic analysis of the file.

Investigating suspicious domains and URLs

A SOC analyst may depend on the VirusTotal platform to investigate suspicious domains and URLs. You can analyze a suspicious domain or URL on the VirusTotal platform by entering it into either the URL form or the Search form.

In the Investigating suspicious files section, we noticed while navigating the RELATIONS tab that the file had established communication with the hueref[.]eu domain. In this section, we will investigate the hueref[.]eu domain by using the VirusTotal platform. See Figure 14.11:

Figure 14.11 – The DETECTION tab

Upon submitting the suspicious domain to the Search form in VirusTotal, we found that the domain had several tags indicating potential security risks. These tags refer to the web domain category. As you can see in the preceding screenshot, there are two tags indicating that the domain is malicious.

The first provided tab is the DETECTION tab, which includes the Security vendors’ analysis.
In this case, several security vendors labeled the domain as Malware or a Malicious domain.

The second tab is the DETAILS tab, which includes information about the given domain such as the web domain categories from different sources, the last DNS records of the domain, and the domain Whois lookup results. See Figure 14.12:

Figure 14.12 – The DETAILS tab

The third tab is the RELATIONS tab, which provides analysts with all domain relations, such as the IP(s) that the given domain resolves to in DNS, along with their reputations, and the files that communicated with the given domain when previously analyzed in the VirusTotal sandboxes, along with their reputations. See Figure 14.13:

Figure 14.13 – The RELATIONS tab

The RELATIONS tab is very useful, especially when investigating potential zero-day malicious domains that have not yet been detected and flagged by security vendors. By analyzing the domain’s resolving IP(s) and their reputation, as well as any connections between the domain and previously analyzed malicious files on the VT platform, SOC analysts can quickly identify signs that the domain may be a C&C server domain.

At the bottom of the RELATIONS tab, you will find the same VirusTotal graph discussed in the previous section.

The fourth tab is the COMMUNITY tab, which allows you to contribute to the VirusTotal community with your thoughts and read community members’ thoughts regarding the given domain.

Investigating suspicious outbound IPs

As a security analyst, you may depend on the VirusTotal platform to investigate suspicious outbound IPs that your internal systems may have communicated with. When you enter the IP into the search form, the VirusTotal platform shows you nearly the same tab details provided when analyzing domains in the last section.

In this section, we will investigate the IP of the hueref[.]eu domain.
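Once an investigation yields indicators such as the hueref[.]eu domain or its resolving IP, the scoping step mentioned earlier (searching firewall and proxy logs for the connected domains and IPs) can be scripted. The following is a minimal sketch rather than a production tool. The space-separated log format (timestamp, source IP, destination, action), the sample log lines, the IP 203.0.113.50, and the function names are all assumptions made for illustration; real firewall and proxy logs will need their own parsing.

```python
from typing import Iterable, Set

# Hypothetical proxy log lines: timestamp, internal source IP, destination, action.
SAMPLE_PROXY_LOG = [
    "2024-11-11T10:02:11Z 10.1.4.23 hueref.eu ALLOW",
    "2024-11-11T10:02:15Z 10.1.7.40 example.com ALLOW",
    "2024-11-11T10:05:42Z 10.1.9.8 203.0.113.50 DENY",
]

# Indicators are often shared "defanged" so that they are not clickable.
SAMPLE_INDICATORS = ["hueref[.]eu", "203.0.113[.]50"]


def refang(indicator: str) -> str:
    """Turn a defanged indicator such as 'hueref[.]eu' back into 'hueref.eu'."""
    return indicator.replace("[.]", ".").replace("hxxp", "http")


def find_affected_hosts(log_lines: Iterable[str], indicators: Iterable[str]) -> Set[str]:
    """Return the internal source IPs that contacted any of the indicators."""
    wanted = {refang(item).lower() for item in indicators}
    affected = set()
    for line in log_lines:
        fields = line.lower().split()
        if len(fields) < 3:
            continue  # skip lines that do not fit the assumed format
        source_ip, destination = fields[1], fields[2]
        # Exact comparison avoids substring false positives (e.g. 'nothueref.eu').
        if destination in wanted:
            affected.add(source_ip)
    return affected
```

Calling find_affected_hosts(SAMPLE_PROXY_LOG, SAMPLE_INDICATORS) on the sample data returns the two internal hosts that contacted the indicators. As the earlier note points out, a hit is a lead to investigate, not proof of compromise, because a connected domain or IP may be legitimate.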
As we mentioned, the tabs and details provided by VirusTotal when analyzing an IP are the same as those provided when analyzing a domain. Moreover, the RELATIONS tab in VirusTotal provides all domains hosted on this IP and their reputations. See Figure 14.14:

Figure 14.14 – Domains hosted on the same IP and their reputations

Important note

It’s not advisable to depend on the VirusTotal platform to investigate suspicious inbound IPs such as port-scanning IPs and vulnerability-scanning IPs. This is because VirusTotal relies on the reputation assessments provided by security vendors, which are particularly effective in detecting outbound IPs such as those associated with C&C servers or phishing activities.

By the end of this section, you should have learned how to investigate suspicious files, domains, and outbound IPs by using the VirusTotal platform.

Conclusion

In conclusion, VirusTotal is an invaluable resource for SOC analysts, enabling them to streamline threat investigations by analyzing artifacts through multiple detection engines and sandbox environments. From identifying malicious file behavior to assessing suspicious domains and URLs, VirusTotal’s capabilities offer comprehensive insights into potential threats. By integrating this tool into daily workflows, security professionals can make data-driven decisions that enhance response times and threat mitigation strategies. Ultimately, VirusTotal not only assists in pinpointing immediate risks but also contributes to a collaborative, community-driven approach to cybersecurity.

Author Bio

Mostafa Yahia is a passionate threat investigator and hunter who has hunted and investigated several cyber incidents. His experience includes building and leading cyber security managed services such as SOC and threat hunting services. He earned a bachelor's degree in computer science in 2016. Additionally, Mostafa has the following certifications: GCFA, GCIH, CCNA, IBM Qradar, and FireEye System engineer.
Mostafa also provides free courses and lessons through his YouTube channel. Currently, he is the cyber defense services senior leader for SOC, threat hunting, DFIR, and compromise assessment services in an MSSP company.
Rob Chapman, Peter Holmes
07 Nov 2024
15 min read

Mastering PromQL: A Comprehensive Guide to Prometheus Query Language

This article is an excerpt from the book, "Observability with Grafana", by Rob Chapman, Peter Holmes. This book provides a holistic understanding of observability concepts using the Grafana Labs tools, teaching you how to fully leverage the LGTM stack.

Introduction

PromQL, or Prometheus Query Language, is a powerful tool designed to work with Prometheus, an open-source systems monitoring and alerting toolkit. Initially developed by SoundCloud in 2012 and later accepted by the Cloud Native Computing Foundation in 2016, Prometheus has become a crucial component of modern infrastructure monitoring. PromQL allows users to query data stored in Prometheus, enabling the creation of insightful dashboards and setting up alerts based on the performance metrics of applications and systems. This article will explore the core functionalities of PromQL, including how it interacts with metrics data and how it can be used to effectively monitor and analyze system performance.

Introducing PromQL

Prometheus was initially developed by SoundCloud in 2012; the project was accepted by the Cloud Native Computing Foundation in 2016 as the second incubated project (after Kubernetes), and version 1.0 was released shortly after. PromQL is an integral part of Prometheus, which is used to query stored data and produce dashboards and alerts.

Before we delve into the details of the language, let’s briefly look at the following ways in which Prometheus-compatible systems interact with metrics data:

Ingesting metrics: Prometheus-compatible systems accept a timestamp, key-value labels, and a sample value. As the details of the Prometheus Time Series Database (TSDB) are quite complicated, the following diagram shows a simplified example of how an individual sample for a metric is stored once it has been ingested:

Figure 5.1 – A simplified view of metric data stored in the TSDB

The labels or dimensions of a metric: Prometheus labels provide metadata to identify data of interest.
These labels create metrics, time series, and samples:

* Each unique __name__ value creates a metric. In the preceding figure, the metric is app_frontend_requests.
* Each unique set of labels creates a time series. In the preceding figure, the set of all labels is the time series.
* A time series will contain multiple samples, each with a unique timestamp. The preceding figure shows a single sample, but over time, multiple samples will be collected for each time series.
* The number of unique values for a metric label is referred to as the cardinality of the label. High-cardinality labels should be avoided, as they significantly increase the storage costs of the metric.

The following diagram shows a single metric containing two time series and five samples:

Figure 5.2 – An example of samples from multiple time series

In Grafana, we can see a representation of the time series and samples from a metric. To do this, follow these steps:

1. In your Grafana instance, select Explore in the menu.
2. Choose your Prometheus data source, which will be labeled as grafanacloud-<team>prom (default).
3. In the Metric dropdown, choose app_frontend_requests_total, and under Options, set Format to Table, and then click on Run query. This will show you all the samples and time series in the metric over the selected time range. You should see data like this:

Figure 5.3 – Visualizing the samples and time series that make up a metric

Now that we understand the data structure, let’s explore PromQL.

An overview of PromQL features

In this section, we will take you through the features that PromQL has. We will start with an explanation of the data types, and then we will look at how to select data, how to work on multiple datasets, and how to use functions. As PromQL is a query language, it’s important to know how to manipulate data to produce alerts and dashboards.
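The relationship between metrics, time series, samples, and cardinality described above can be sketched in a few lines of Python. This is an illustrative model only, not how the Prometheus TSDB stores data: the label names (method, status), the sample values, and the helper names are assumptions. Like Figure 5.2, the sample data holds one metric with two time series and five samples.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

LabelSet = FrozenSet[Tuple[str, str]]

# (metric name, labels, timestamp, value); all values are made up.
RAW_SAMPLES = [
    ("app_frontend_requests", {"method": "GET", "status": "200"}, 1700000000, 12.0),
    ("app_frontend_requests", {"method": "GET", "status": "200"}, 1700000015, 15.0),
    ("app_frontend_requests", {"method": "GET", "status": "200"}, 1700000030, 19.0),
    ("app_frontend_requests", {"method": "GET", "status": "500"}, 1700000000, 1.0),
    ("app_frontend_requests", {"method": "GET", "status": "500"}, 1700000015, 2.0),
]


def build_series(raw_samples) -> Dict[LabelSet, List[Tuple[int, float]]]:
    """Group samples into time series keyed by their full label set."""
    series: Dict[LabelSet, List[Tuple[int, float]]] = defaultdict(list)
    for name, labels, timestamp, value in raw_samples:
        # The metric name is itself stored as the special __name__ label.
        key = frozenset({"__name__": name, **labels}.items())
        series[key].append((timestamp, value))
    return dict(series)


def metric_names(series) -> set:
    """Each unique __name__ value is one metric."""
    return {dict(key)["__name__"] for key in series}


def label_cardinality(series, label: str) -> int:
    """Count the unique values seen for one label across all time series."""
    return len({dict(key)[label] for key in series if label in dict(key)})
```

With this data, build_series(RAW_SAMPLES) yields two time series because only the status label differs, so status has a cardinality of 2 and method a cardinality of 1. A label such as a user ID would create one time series per user, which is why high-cardinality labels are costly.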
Data types

PromQL offers three data types, which are important, as the functions and operators in PromQL will work differently depending on the data types presented:

Instant vectors are a data type that stores a set of time series containing a single sample, all sharing the same timestamp – that is, it presents values at a specific instant in time:

Figure 5.4 – An instant vector

Range vectors store a set of time series, each containing a range of samples with different timestamps:

Figure 5.5 – Range vectors

Scalars are simple numeric values, with no labels or timestamps involved.

Selecting data

PromQL offers several tools for you to select data to show in a dashboard or a list, or just to understand a system’s state. Some of these are described in the following table:

Table 5.1 – The selection operators available in PromQL

In addition to the operators that allow us to select data, PromQL offers a selection of operators to compare multiple sets of data.

Operators between two datasets

Some data is easily provided by a single metric, while other useful information needs to be created from multiple metrics. The following operators allow you to combine datasets.

Table 5.2 – The comparison operators available in PromQL

Vector matching is an initially confusing topic; to clarify it, let’s consider examples for the three cases of vector matching – one-to-one, one-to-many/many-to-one, and many-to-many.

By default, when combining vectors, all label names and values are matched. This means that for each element of the vector, the operator will try to find a single matching element from the second vector.
Let’s consider a simple example:

Vector A:
    10{color=blue,smell=ocean}
    31{color=red,smell=cinnamon}
    27{color=green,smell=grass}

Vector B:
    19{color=blue,smell=ocean}
    8{color=red,smell=cinnamon}
    14{color=green,smell=jungle}

A{} + B{}:
    29{color=blue,smell=ocean}
    39{color=red,smell=cinnamon}

A{} + on (color) B{} or A{} + ignoring (smell) B{}:
    29{color=blue}
    39{color=red}
    41{color=green}

When color=blue and smell=ocean, A{} + B{} gives 10 + 19 = 29, and when color=red and smell=cinnamon, A{} + B{} gives 31 + 8 = 39. The remaining elements have no match in the other vector, so they are ignored. When we sum the vectors using on (color), we match only on the color label; so now, the two green elements match and are summed (27 + 14 = 41).

This example works when there is a one-to-one relationship of labels between vector A and vector B. However, sometimes there may be a many-to-one or one-to-many relationship – that is, vector A or vector B may have more than one element that matches the other vector. In these cases, Prometheus will give an error, and grouping syntax must be used. Let’s look at another example to illustrate this:

Vector A:
    7{color=blue,smell=ocean}
    5{color=red,smell=cinnamon}
    2{color=blue,smell=powder}

Vector B:
    20{color=blue,smell=ocean}
    8{color=red,smell=cinnamon}
    14{color=green,smell=jungle}

A{} + on (color) group_left B{}:
    27{color=blue,smell=ocean}
    13{color=red,smell=cinnamon}
    22{color=blue,smell=powder}

Now, we have two different elements in vector A with color=blue. The group_left modifier uses the labels from vector A but matches only on color. This leads to the third element of the combined vector having a value of 22 (2 + 20), even though the matching item in vector B has a different smell. The group_right modifier behaves the same way in the opposite direction, using the labels from vector B.

The final option is a many-to-many vector match. These matches use the logical operators and, unless, and or to combine parts of vectors A and B.
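Before turning to those logical operators, the arithmetic matching rules described above can be checked with a small simulation. The following Python sketch is a simplified model written for this article, not Prometheus code: it represents an instant vector as a list of (labels, value) pairs and reproduces one-to-one matching (default, on, and ignoring) and group_left for the + operator. The function names and the error messages are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Labels = Dict[str, str]
Vector = List[Tuple[Labels, float]]
Signature = FrozenSet[Tuple[str, str]]

# The first example from the text.
VECTOR_A: Vector = [
    ({"color": "blue", "smell": "ocean"}, 10),
    ({"color": "red", "smell": "cinnamon"}, 31),
    ({"color": "green", "smell": "grass"}, 27),
]
VECTOR_B: Vector = [
    ({"color": "blue", "smell": "ocean"}, 19),
    ({"color": "red", "smell": "cinnamon"}, 8),
    ({"color": "green", "smell": "jungle"}, 14),
]


def signature(labels: Labels, on: Optional[Iterable[str]] = None,
              ignoring: Iterable[str] = ()) -> Signature:
    """Reduce a label set to the labels that take part in matching."""
    if on is not None:
        return frozenset((k, v) for k, v in labels.items() if k in on)
    return frozenset((k, v) for k, v in labels.items() if k not in ignoring)


def _index_unique(vector: Vector, on, ignoring) -> Dict[Signature, float]:
    """Index the 'one' side by signature; duplicates there are an error."""
    index: Dict[Signature, float] = {}
    for labels, value in vector:
        key = signature(labels, on, ignoring)
        if key in index:
            raise ValueError("duplicate series on the 'one' side of the match")
        index[key] = value
    return index


def add_one_to_one(left: Vector, right: Vector, on=None, ignoring=()) -> Vector:
    """A + B with one-to-one matching; unmatched elements are dropped."""
    right_index = _index_unique(right, on, ignoring)
    matched = set()
    result: Vector = []
    for labels, value in left:
        key = signature(labels, on, ignoring)
        if key not in right_index:
            continue
        if key in matched:
            raise ValueError("many-to-one match: group_left or group_right is required")
        matched.add(key)
        # The result keeps only the labels that were used for matching.
        result.append((dict(key), value + right_index[key]))
    return result


def add_group_left(many: Vector, one: Vector, on: Iterable[str]) -> Vector:
    """A + on(...) group_left B: keep the labels of the 'many' (left) side."""
    one_index = _index_unique(one, on, ())
    return [
        (dict(labels), value + one_index[signature(labels, on)])
        for labels, value in many
        if signature(labels, on) in one_index
    ]
```

Running add_one_to_one(VECTOR_A, VECTOR_B) returns the blue and red elements only, while add_one_to_one(VECTOR_A, VECTOR_B, on=("color",)) also sums the two green elements. Passing the second example's vectors to add_one_to_one with on=("color",) raises an error because two elements of vector A have color=blue, which is the situation that add_group_left handles.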
Let’s see some examples:

Vector A:
    10{color=blue,smell=ocean}
    31{color=red,smell=cinnamon}
    27{color=green,smell=grass}

Vector B:
    19{color=blue,smell=ocean}
    8{color=red,smell=cinnamon}
    14{color=green,smell=jungle}

A{} and B{}:
    10{color=blue,smell=ocean}
    31{color=red,smell=cinnamon}

A{} unless B{}:
    27{color=green,smell=grass}

A{} or B{}:
    10{color=blue,smell=ocean}
    31{color=red,smell=cinnamon}
    27{color=green,smell=grass}
    14{color=green,smell=jungle}

Unlike the previous examples, no mathematical operators are used here, so the returned elements keep their original values. The and and unless operators return only elements of vector A, depending on whether a matching element exists in vector B, while or returns all elements of A plus the elements of B that have no match in A.

Conclusion

PromQL is an essential component of Prometheus, offering users a flexible and powerful means of querying and analyzing time-series data. By understanding its data types and operators, users can craft complex queries that provide deep insights into system performance. The language supports a variety of data selection and comparison operations, allowing for precise monitoring and alerting. Whether working with instant vectors, range vectors, or scalars, PromQL enables developers and operators to optimize their use of Prometheus for monitoring and alerting, ensuring systems remain performant and reliable. As organizations continue to embrace cloud-native architectures, mastering PromQL becomes increasingly vital for maintaining robust and efficient systems.

Author Bio

Rob Chapman is a creative IT engineer and founder at The Melt Cafe, with two decades of experience in the full application life cycle. Working over the years for companies such as the Environment Agency, BT Global Services, Microsoft, and Grafana, Rob has built a wealth of experience on large complex systems. More than anything, Rob loves saving energy, time, and money and has a track record for bringing production-related concerns forward so that they are addressed earlier in the development cycle, when they are cheaper and easier to solve.
In his spare time, Rob is a Scout leader, and he enjoys hiking, climbing, and, most of all, spending time with his family and six children.

Peter Holmes is a senior engineer with a deep interest in digital systems and how to use them to solve problems. With over 16 years of experience, he has worked in various roles in operations. Working at organizations such as Boots UK, Fujitsu Services, Anaplan, Thomson Reuters, and the NHS, he has experience in complex transformational projects, site reliability engineering, platform engineering, and leadership. Peter has a history of taking time to understand the customer and ensuring Day-2+ operations are as smooth and cost-effective as possible.
Bruno Pedro
06 Nov 2024
15 min read

Mastering the API Life Cycle: A Comprehensive Guide to Design, Implementation, Release, and Maintenance

This article is an excerpt from the book, "Building an API Product", by Bruno Pedro. Build cutting-edge API products confidently, excelling in today's competitive market with this comprehensive guide on API fundamentals, inner workings, and steps for successful API product development.

Introduction

The life of an API product consists of a series of stages. Those stages form a cycle that starts with the initial conception of the API product and ends with the retirement of the API. This sequence of stages is called a life cycle. The term started to gain popularity in software and product development in the 1980s. It’s used as a common framework to align the different participants during the life of a software application or product. Each stage of the API life cycle has specific goals, deliverables, and activities that must be completed before advancing to the next stage.

There are many variations on the concept of API life cycles. I use my own version to simplify learning and focus on what is essential. Over the years, I have distilled the API life cycle into four easy-to-understand stages. They are the design, implementation, release, and maintenance stages. Keep reading to gain an overview of what each of the stages looks like.

Figure 4.1 – The API life cycle

The goal of this chapter is to provide you with a global overview of what an API life cycle is. You will see each one of the stages of the API life cycle as a transition and not simply an isolated step. You will first learn about the design stage and understand how it’s foundational to the success of an API product. Then, you’ll continue on to the implementation stage, where you’ll learn that a big part of an API server can be generated. After that, the chapter explores the release stage, where you’ll learn the importance of finding the right distribution model. Finally, you’ll understand the importance of versioning and sunsetting your API in the maintenance stage.
After reading the chapter, you will understand and be able to recognize the API life cycle’s different stages. You will understand how each API life cycle stage connects to the others. You will also know the participants and stakeholders of each stage of the API life cycle. Finally, you will know the most critical aspects of each stage of the API life cycle.

In this article, you’ll learn about the four stages of the API life cycle:

Design
Implement
Release
Maintain

Design

The first stage of the API life cycle is where you decide what you will build. You can view the design stage as a series of steps where your view of what your API will become gets more refined and validated. At the end of the design stage, you will be able to confidently implement your API, knowing that it’s aligned with the needs of your business and your customers. The steps I take in the design stage are as follows:

Ideation
Strategy
Definition
Validation
Specification

These steps help me advance in holistically designing the API, involving as many different stakeholders as possible so I get a complete alignment. I usually start with a rough idea of what the ideal API would look like. Then I start asking different stakeholders as many questions as possible to understand whether my initial assumptions were correct. Something I always ask is why an API should be built. Even though it looks like a simple question, its answer can reveal the real intentions behind building the API. Also, the answer is different depending on whom you ask the question. Your job is to synthesize the information you gather and document pieces of evidence that back up the decisions you make about the API design.

You will, at this stage, interview as many stakeholders as possible. They can include potential API users, engineers who work with you, and your company’s leadership team. The goal is to find out why you’re building the API and to document it.
Once you know why you’re building the API, you’ll learn what the API will look like to fit the needs of potential users. To learn what API users need, identify the personas you want to serve and then put yourself in their shoes. You’ve already seen a few proto-personas in Chapter 2. In this API life cycle stage, you draw from those generic personas and identify your API users. You then contact people representing your API user personas and interview them. During the interviews, you should understand their jobs to be done (JTBDs), the challenges they face during their work, and the tools they use. From the information you obtain, you can infer the benefits they would get from the API you’re building and how they would use the API. This last piece of information is critical because it lets you define the architectural style of the API.

By knowing what tools your user personas use daily, you can make an informed decision about the architectural style of your API. Architectural styles are how you identify the technology and type of communication that the API will use. For example, REST is one architectural style that lets API consumers interact with remote resources by executing one of the HTTP verbs. Among those verbs, there’s one that’s natively supported by web browsers: HTTP GET. So, if you identify that a user persona wants to use a web browser to consume your API, then you will want to follow the REST architectural style and limit it to HTTP GET. Otherwise, that user persona won’t be able to use your API directly from their tool of choice.

Something else you’ll want to define is the capabilities your API will offer users. Defining capabilities is an exercise that combines the information you gathered from interviews. You translate JTBDs, benefits, and behaviors into a set of capabilities that your API will have. Ideally, those capabilities will cover all the needs of the users whom you interviewed.
However, you might want to prioritize the capabilities according to their degree of urgency and the cost of implementation. In any case, you want to validate your assumptions before investing in actually implementing the API.

Validation of your API design happens first at a high level, and after a positive review, you attempt a low-level validation. High-level validation involves sharing the definition of the architectural style and capabilities that you have created with the API stakeholders. You present your findings to the stakeholders, explain how you came up with the definitions, and then ask for their review. Sometimes the feedback will make you question your assumptions, and you must refine your definitions. Eventually, you will get to a point where the stakeholders are all aligned with what you think the API should be. At that point, you’re ready to attempt a low-level validation.

The difference between a high-level and a low-level validation is the amount of detail you share with your stakeholders and how technical the feedback you expect needs to be. While in high-level validation, you mostly expect an opinion about the design of the API, in low-level validation, you actually want the stakeholders to test the API before you start building it. You do that by creating what is called an API mock server. It allows anyone to make real API requests to a server as if they were making requests to the real API. The mock server responds with data that is not real but has the same shape that the responses of the real API would have. Stakeholders can then test making requests to the mock server from their tools of choice to see how the API would work. You might need to make changes during this low-level validation process until the stakeholders are comfortable with how your API will work. After that, you’re ready to translate the API design into a machine-readable definition document that will be used during the implementation stage of the API life cycle.
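To make the idea of an API mock server concrete, here is a minimal sketch built only on the Python standard library. It is one possible illustration, not the tooling the book prescribes: the /products/42 path, the canned response, and the helper names are hypothetical. The server answers HTTP GET requests with fixed data that has the shape the real API would return.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned responses: request path -> fake but realistically shaped body.
MOCK_RESPONSES = {
    "/products/42": {"id": 42, "name": "Example product", "price": 9.99},
}


class MockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = MOCK_RESPONSES.get(self.path)
        status = 200 if body is not None else 404
        payload = json.dumps(body if body is not None else {"error": "not found"}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the console quiet while stakeholders experiment


def start_mock_server(port: int = 0) -> HTTPServer:
    """Start the mock server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MockHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_json(port: int, path: str):
    """Make a real HTTP GET request to the mock server and decode the JSON body."""
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request("GET", path)
        response = connection.getresponse()
        return response.status, json.loads(response.read())
    finally:
        connection.close()
```

After calling start_mock_server(), the chosen port is available as server.server_address[1], and any HTTP client (including a web browser, since only GET is used) can request the mocked paths. Changing the design during validation then means editing MOCK_RESPONSES rather than rewriting an implementation.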
The type of machine-readable definition depends on the architectural style identified earlier. If, for example, the architectural style is REST, then you’ll create an OpenAPI document. Otherwise, you will work with the type of machine-readable definition most appropriate for the architectural style of the API. Once you have a machine-readable API definition, you’re ready to advance to the implementation stage of the API life cycle.

Implementation

Having a machine-readable API definition is halfway to getting an entire API server up and running. I won’t focus on any particular architectural style, so you can keep all options open at this point. The goal of the machine-readable definition is to make it easy to generate server code and configuration and give your API consumers a simple way to interact with your API. Some API server solutions require almost no coding as long as you have a machine-readable definition.

One type of coding you’ll need to do, or ask an engineer to do, is the code responsible for the business logic behind each API capability. While the API itself can be almost entirely generated, the logic behind each capability must be programmed and linked to the API. Usually, you’ll start with a first version of your API server that can run locally and will be used to iteratively implement all the business logic behind each of the capabilities. Later, you’ll make your API server publicly available to your API consumers. When I say publicly available, I mean that your API consumers should be able to securely make requests.

One of the elements of security that you should think about is authentication. Many APIs are fully open to the public without requiring any type of authentication. However, when building an API product, you want to identify who your users are. Monetization is only possible if you know who is making requests to your API. Other security factors to consider have already been covered in Chapter 3.
They include things such as logging, monitoring, and rate limiting. In any case, you should always test your API thoroughly during the implementation stage to make sure that everything is working according to plan. One type of test that is particularly useful at this stage is contract testing. This type of test aims to verify whether the API responses include the expected information in the expected format. The word contract is used to describe the API definition as something that both you, the API producers, and your consumers agree to. By performing a contract test, you’ll verify whether the implementation of the API has been done according to what has been designed and defined in the machine-readable document. For example, you can verify whether a particular capability is responding with the type of data that you defined. Before deploying your API to production, though, you want to be more thorough with your testing. Other types of tests that are well suited to be performed at this stage are functional and performance testing. Functional tests, in particular, can help you identify areas of the API that are not behaving as intended. Testing different elements of your API helps you increase its quality. Nevertheless, there’s another activity that focuses on API quality and relies on tests to obtain insights. Quality assurance, or QA, is one type of activity where you test your API capabilities using different inputs and check whether the responses are the expected ones. QA can be performed manually or automatically by following a programmable script. Performing API QA has the advantage of improving the quality of your API, its overall user experience, and even the security of the product. Since a QA process can identify defects early on during the implementation stage of an API product, it can reduce the cost of fixing those defects compared with finding them when consumers are already using the API.
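To make the idea of a contract test more concrete, here is a minimal sketch that checks a response against the fields and types a definition promises. The contract and the field names are hypothetical; a real contract test would usually derive them from the machine-readable definition instead of hard-coding them.

```python
# Hypothetical contract for one capability: field name -> expected type.
ORDER_CONTRACT = {"id": int, "status": str, "total": float}


def contract_violations(response, contract):
    """Return a list of problems; an empty list means the contract is honored."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

A response that matches the contract yields an empty list. A response with a string id and no total yields one problem per broken promise, which is the kind of signal that could fail an automated test run.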
While contract and functional tests provide information on how an API works, QA offers a broader perspective on how consumers experience the API. A QA process can be a part of the release process of your API and can determine whether the proposed changes have production quality.

Release
In software development, you can say that a release happens whenever you make your software available to users. Different release environments target different kinds of users. You can have a development environment that is mostly used to share your software with other developers and to make testing easy. There can also be a staging environment where the software is available to a broader audience, and QA testing can happen. Finally, there is a production environment where the software is made available generally to your customers. Releasing software, and API products, can be done manually or automatically. While manual releases work well for small projects, things can get more complicated if you have a large code base and a growing team working on the project. In those situations, you want to automate the release as much as possible with something called a build process. During implementation, you focus on developing your API and ensuring you have all tests in place. If those tests are all fully automated, you can make them run every time you try to release your API. Each build process can automatically run a series of steps, including packaging the software, making it available on a mock server, and running tests. If any of the build steps fail, you can consider that the whole build process failed, and the API isn’t released. If the build process succeeds, you have a packaged API ready to be deployed into your environment of choice. Deploying the API means it will become available to any users with access to the environment where you’re doing the release.
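The all-or-nothing behavior of a build process described above can be sketched in a few lines. The step names and the lambdas that stand in for real work are hypothetical; an actual build would package the code and invoke test runners at those points.

```python
def run_build(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Each callable returns True on success.
    Returns (succeeded, names_of_completed_steps).
    """
    completed = []
    for name, step in steps:
        if not step():
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical steps for illustration only.
steps = [
    ("package", lambda: True),
    ("deploy to mock server", lambda: True),
    ("contract tests", lambda: False),  # simulate a failing test run
    ("publish artifact", lambda: True),
]

succeeded, completed = run_build(steps)
```

In this run the build fails at the contract tests, so the artifact is never published and nothing is released.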
You can either manage the deployment process yourself, including the servers where your API will run, or use one of the many available API gateway products. Either way, you’ll want to have a layer of control between your users and your API. If controlling how users interact with your API is important, knowing how your API is behaving is also fundamental. If you know how your API behaves, you can understand whether its behavior is affecting your users’ experience. By anticipating how users can be negatively affected, you can proactively take measures and improve the quality of your API. Using an API monitor lets you periodically receive information about the behavior and quality of your API. You can understand whether any part of your API is not working as expected by using a solution such as a Postman Monitor. Different solutions let you gather information about API availability, response times, and error rates. If you want to go deeper and understand how the API server is performing, you can also use an Application Performance Monitor (APM). Services such as New Relic give you information about the performance and error rate of the server and the code that is running your API. Another area that you want to pay attention to during the release stage of the API life cycle is documentation. While you can have an API reference automatically built from your machine-readable definition, you’ll want to pay attention to other aspects of documentation. As you’ve seen in Chapter 2, good API documentation is fundamental to obtaining a good user experience. In Chapter 3, you learned how documentation can enhance support and help users get answers to their questions when interacting with your API. Documentation also involves tutorials covering the JTBDs of the API user personas and clearly showing how consumers can interact with each API feature. To promote the whole API and the features you’re releasing, you can make an announcement to your customers and the community.
Announcing a release is a good idea because it raises the general public’s awareness and helps users understand what has changed since the last release. Depending on the size of your company, your available marketing budget, and the importance of the release, you choose the media where you make the announcement. You could simply share the news on your blog, or go all the way and promote the new version of your API with a marketing campaign. Your goal is always to reach the existing users of your API and to make the news available to other potential users. Sharing news about your release is a way to increase the reach of your API. Another way is to distribute your API reference in existing API marketplaces that already have their own audience. Online marketplaces let you list your API so potential users can find it and start using it. There are vertical marketplaces that focus on specific sectors, such as healthcare or education. Other marketplaces are more generic and let you list any API. The elements you make available are usually your API reference, documentation, and pointers on signing up and starting to use the API. You can pick as many marketplaces as you like. Keep in mind that some of the existing solutions charge you for listing your API, so measure each marketplace as a distribution channel. You can measure how many users sign up and use your API across the marketplaces where your API is listed. Over time, you’ll understand which marketplaces aren’t worth keeping, and you can remove your API from those. This measurement is part of API analytics, one of the activities of the maintenance stage of the API life cycle. Keep reading to learn more about it.

Maintenance
You’re now in the last stage of the API life cycle. This is the stage where you make sure that your API is continuously running without disturbances. Of all the activities at this stage, the one where you’ll spend the most time will be analyzing how users interact with your API.
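As a small example of the usage analysis mentioned above, the marketplace measurement can be reduced to a listing cost per active user. The marketplace names and all figures below are invented for illustration, and the metric itself is only one possible way to compare channels.

```python
def cost_per_active_user(listings):
    """Map marketplace name -> monthly fee per active user (None if none)."""
    result = {}
    for name, data in listings.items():
        active = data["active_users"]
        result[name] = data["monthly_fee"] / active if active else None
    return result


# Hypothetical figures, purely for illustration.
listings = {
    "generic-market": {"monthly_fee": 100.0, "signups": 80, "active_users": 40},
    "health-market": {"monthly_fee": 50.0, "signups": 5, "active_users": 0},
}
costs = cost_per_active_user(listings)
```

A marketplace that charges a fee but brings no active users (None here) would be a candidate for removal.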
Analytics is where you understand who your users are, what they’re doing, whether they’re being successful, and if not, how you can help them succeed. The information you gather will help you identify features that you should keep, the ones that you should improve, and the ones that you should shut down. But analytics is not limited to usage. You can also obtain performance, security, and even business metrics. For example, with analytics, you can identify the customers who interact with the top features of your API and understand how much revenue is being generated. That information can tell you whether the investment in those top features is paying off. You can also understand what errors are the most common and which customers are having the most difficulties. Being able to do that allows you to proactively fix problems before users get in touch with your support team. Something to keep in mind is that there will be times when users will have difficulties working with your API. The issues can be related to your API server being slow or not working at all. There can be problems related to connectivity between some users and your API. Alternatively, individual users can have issues that only affect them. All these situations usually lead to customers contacting your support team. Having a support system in place is important because it increases the satisfaction of your users and their trust in your product. Without support, users will feel lost when they have difficulties. Worse, they’ll share their problems publicly without you having a chance to help. One situation where support is particularly requested is when you need to release a new version of your API. Versioning happens whenever you introduce new features, fix existing ones, or deprecate some part of your API. Having a version helps your users know what they should expect when interacting with your API. Versioning also enables you to communicate and identify those changes in different categories. 
You can have minor bug fixes, new features, or breaking changes. All those can affect how customers use your API, and communicating them is essential to maintaining a good experience. Another aspect of versioning is the ability to keep several versions running. As the API producer, running more than one version can be helpful but can increase your costs. The advantage of having at least two versions is that you can roll back to the previous version if the current one is having issues. This is often considered a good practice. Knowing when to end the life of your entire API or some of its features is not a simple task, especially when there are customers using your API regularly. First of all, it’s essential that you have a communication plan so your customers know in advance when your API will stop working. Things to mention in the communication plan include a timeline of the shutdown and any alternative options, if available, even from a competitor of yours. A second aspect to account for is ensuring the API sunset is done according to existing laws and regulations. Other elements include handling the retention of data processed or generated by usage of the API and continuing to monitor accesses to the API even after you shut it down.

Conclusion
At this point, you know how to identify the different stages of the API life cycle and how they’re all interconnected. You also understand which stakeholders participate at each stage of the API life cycle. You can describe the most important elements of each stage of the API life cycle and know why they must be considered to build a successful API product. You first learned about my simplified version of the API life cycle and its four stages. You then went into each of them, starting with the design stage. You learned how designing an API can affect its success. You understood the connection between user personas, their attributes, and the architectural type of the API that you’re building.
After that, you got to know what high-level and low-level design validations are and how they can help you reach a product-market fit. You then learned that having a machine-readable definition enables you to document your API but is also a shortcut to implementing its server and infrastructure. Afterward, you learned about contract testing and QA and how they connect to the implementation and release stages. You acquired knowledge about the different release environments and learned how they’re used. You learned about distribution and API marketplaces and how to measure API usage and performance. Finally, you learned how to version and eventually shut down your API.

Author Bio
Bruno Pedro is a computer science professional with over 25 years of experience in the industry. Throughout his career, he has worked on a variety of projects, including Internet traffic analysis, API backends and integrations, and Web applications. He has also managed teams of developers and founded several companies, including tarpipe, an iPaaS, in 2008, and the API Changelog in 2015. In addition to his work experience, Bruno has also made contributions to the API industry through his written work, including two published books on API-related topics and numerous technical magazine and web articles. He has also been a speaker at numerous API industry conferences and events from 2013 to 2018.

Agnieszka Koziorowska, Wojciech Marusiak
05 Nov 2024
15 min read

Automating OCR and Translation with Google Cloud Functions: A Step-by-Step Guide

This article is an excerpt from the book, "Google Cloud Associate Cloud Engineer Certification and Implementation Guide", by Agnieszka Koziorowska and Wojciech Marusiak. This book serves as a guide for students preparing for ACE certification, offering invaluable practical knowledge and hands-on experience in implementing various Google Cloud Platform services. By actively engaging with the content, you’ll gain the confidence and expertise needed to excel in your certification journey.

Introduction
In this article, we will walk you through an example of implementing Google Cloud Functions for optical character recognition (OCR) on Google Cloud Platform. This tutorial will demonstrate how to automate the process of extracting text from an image, translating the text, and storing the results using Cloud Functions, Pub/Sub, and Cloud Storage. By leveraging Google Cloud Vision and Translation APIs, we can create a workflow that efficiently handles image processing and text translation. The article provides detailed steps to set up and deploy Cloud Functions using Golang, covering everything from creating storage buckets to deploying and running your function to translate text.

Google Cloud Functions Example
Now that you’ve learned what Cloud Functions is, we’d like to show you how to implement a sample Cloud Function. We will guide you through optical character recognition (OCR) on Google Cloud Platform with Cloud Functions. Our use case is as follows:
1. An image with text is uploaded to Cloud Storage.
2. A triggered Cloud Function utilizes the Google Cloud Vision API to extract the text and identify the source language.
3. The text is queued for translation by publishing a message to a Pub/Sub topic.
4. A Cloud Function employs the Translation API to translate the text and stores the result in the translation queue.
5. Another Cloud Function saves the translated text from the translation queue to Cloud Storage.
6. The translated results are available in Cloud Storage as individual text files for each translation.

We need to download the samples first; we will use Golang as the programming language. Source files can be downloaded from https://github.com/GoogleCloudPlatform/golang-samples. Before working with the OCR function sample, we recommend enabling the Cloud Translation API and the Cloud Vision API. If they are not enabled, your function will throw errors, and the process will not be completed.

Let’s start with deploying the function:
1. We need to create a Cloud Storage bucket. Create your own bucket with a unique name; please refer to the documentation on bucket naming at the following link: https://cloud.google.com/storage/docs/buckets. We will use the following command:
   gsutil mb gs://wojciech_image_ocr_bucket
2. We also need to create a second bucket to store the results:
   gsutil mb gs://wojciech_image_ocr_bucket_results
3. We must create a Pub/Sub topic to publish translation requests. The general form of the command is gcloud pubsub topics create YOUR_TOPIC_NAME. We used the following command to create it:
   gcloud pubsub topics create wojciech_translate_topic
4. We also need a second Pub/Sub topic to publish the finished translation results. We can use the following command to create it:
   gcloud pubsub topics create wojciech_translate_topic_results
5. Next, we will clone the Google Cloud GitHub repository with the Go sample code:
   git clone https://github.com/GoogleCloudPlatform/golang-samples
6. From the repository, we need to go to the golang-samples/functions/ocr/app/ directory to be able to deploy the desired Cloud Function.
7. We recommend reviewing the included Go files to understand the code in more detail. Please change the storage bucket and Pub/Sub topic names to your own values.
8. We will deploy the first function to process images. The ^:^ prefix in the --set-env-vars value changes the delimiter between variables from a comma to a colon, which is needed because the TO_LANG value itself contains commas. We will use the following command:
   gcloud functions deploy ocr-extract-go --runtime go119 --trigger-bucket wojciech_image_ocr_bucket --entry-point ProcessImage --set-env-vars "^:^GCP_PROJECT=wmarusiak-book-351718:TRANSLATE_TOPIC=wojciech_translate_topic:RESULT_TOPIC=wojciech_translate_topic_results:TO_LANG=es,en,fr,ja"
9. After deploying the first Cloud Function, we must deploy the second one to translate the text. We can use the following command:
   gcloud functions deploy ocr-translate-go --runtime go119 --trigger-topic wojciech_translate_topic --entry-point TranslateText --set-env-vars "GCP_PROJECT=wmarusiak-book-351718,RESULT_TOPIC=wojciech_translate_topic_results"
10. The last part of the complete solution is a third Cloud Function that saves results to Cloud Storage. We will use the following command to deploy it:
   gcloud functions deploy ocr-save-go --runtime go119 --trigger-topic wojciech_translate_topic_results --entry-point SaveResult --set-env-vars "GCP_PROJECT=wmarusiak-book-351718,RESULT_BUCKET=wojciech_image_ocr_bucket_results"
11. We are now free to upload any image containing text. It will be processed first, then translated and saved into our Cloud Storage bucket.
12. We uploaded four sample images containing text that we downloaded from the Internet. We can see many entries in the ocr-extract-go Cloud Function’s logs. Some log entries show the language detected in the image, and others show the extracted text:
   Figure 7.22 – Cloud Function logs from the ocr-extract-go function
13. ocr-translate-go translates the text detected by the previous function:
   Figure 7.23 – Cloud Function logs from the ocr-translate-go function
14. Finally, ocr-save-go saves the translated text into the Cloud Storage bucket:
   Figure 7.24 – Cloud Function logs from the ocr-save-go function
15. If we go to the Cloud Storage bucket, we’ll see the saved translated files:
   Figure 7.25 – Translated images saved in the Cloud Storage bucket
16. We can view the content directly from the Cloud Storage bucket by clicking Download next to the file, as shown in the following screenshot:
   Figure 7.26 – Translated text from Polish to English stored in the Cloud Storage bucket

Cloud Functions is a powerful and fast way to code, deploy, and use advanced features. We encourage you to try out and deploy Cloud Functions to understand the process of using them better. At the time of writing, Google Cloud Free Tier offers a generous number of free resources we can use. Cloud Functions offers the following with its free tier:
• 2 million invocations per month (this includes both background and HTTP invocations)
• 400,000 GB-seconds, 200,000 GHz-seconds of compute time
• 5 GB network egress per month

Google Cloud has comprehensive tutorials that you can try to deploy. Go to https://cloud.google.com/functions/docs/tutorials to follow one.

Conclusion
In conclusion, Google Cloud Functions offer a powerful and scalable solution for automating tasks like optical character recognition and translation. Through this example, we have demonstrated how to use Cloud Functions, Pub/Sub, and the Google Cloud Vision and Translation APIs to build an end-to-end OCR and translation pipeline. By following the provided steps and code snippets, you can easily replicate this process for your own use cases. Google Cloud's generous Free Tier resources make it accessible to get started with Cloud Functions. We encourage you to explore more by deploying your own Cloud Functions and leveraging the full potential of Google Cloud Platform for serverless computing.

Author Bio
Agnieszka is an experienced Systems Engineer who has been in the IT industry for 15 years. She is dedicated to supporting enterprise customers in the EMEA region with their transition to the cloud and hybrid cloud infrastructure by designing and architecting solutions that meet both business and technical requirements.
Agnieszka is highly skilled in AWS, Google Cloud, and VMware solutions and holds certifications as a specialist in all three platforms. She strongly believes in the importance of knowledge sharing and learning from others to keep up with the ever-changing IT industry.

With over 16 years in the IT industry, Wojciech is a seasoned and innovative IT professional with a proven track record of success. Leveraging extensive work experience in large and complex enterprise environments, Wojciech brings valuable knowledge to help customers and businesses achieve their goals with precision, professionalism, and cost-effectiveness. Holding leading certifications from AWS, Alibaba Cloud, Google Cloud, VMware, and Microsoft, Wojciech is dedicated to continuous learning and sharing knowledge, staying abreast of the latest industry trends and developments.
Jasmeet Bhatia, Kartik Chaudhary
04 Nov 2024
15 min read

Vertex AI Workbench: Your Complete Guide to Scaling Machine Learning with Google Cloud

This article is an excerpt from the book, "The Definitive Guide to Google Vertex AI", by Jasmeet Bhatia and Kartik Chaudhary. The Definitive Guide to Google Vertex AI is for ML practitioners who want to learn Google best practices, MLOps tooling, and turnkey AI solutions for solving large-scale real-world AI/ML problems. This book takes a hands-on approach to help you become an ML rockstar on Google Cloud Platform in no time.

Introduction
While working on an ML project, if we are running a Jupyter Notebook in a local environment, or using a web-based Colab- or Kaggle-like kernel, we can perform some quick experiments and get some initial accuracy or results from ML algorithms very fast. But we hit a wall when it comes to performing large-scale experiments, launching long-running jobs, hosting a model, and also in the case of model monitoring. Additionally, if the data related to a project requires some more granular permissions on security and privacy (fine-grained control over who can view/access the data), it’s not feasible in local or Colab-like environments. All these challenges can be solved just by moving to the cloud. Vertex AI Workbench within Google Cloud is a JupyterLab-based environment that can be leveraged for all kinds of development needs of a typical data science project. The JupyterLab environment is very similar to the Jupyter Notebook environment, and thus we will be using these terms interchangeably throughout the book. Vertex AI Workbench has options for creating managed notebook instances as well as user-managed notebook instances. User-managed notebook instances give more control to the user, while managed notebooks come with some key extra features. We will discuss more about these later in this section.
Some key features of the Vertex AI Workbench notebook suite include the following:
• Fully managed: Vertex AI Workbench provides a Jupyter Notebook-based fully managed environment that provides enterprise-level scale without the need to manage infrastructure, security, and user-management capabilities.
• Interactive experience: Data exploration and model experiments are easier as managed notebooks can easily interact with other Google Cloud services such as storage systems, big data solutions, and so on.
• Prototype to production AI: Vertex AI notebooks can easily interact with other Vertex AI tools and Google Cloud services and thus provide an environment to run end-to-end ML projects from development to deployment with minimal transition.
• Multi-kernel support: Workbench provides multi-kernel support in a single managed notebook instance including kernels for tools such as TensorFlow, PyTorch, Spark, and R. Each of these kernels comes with pre-installed useful ML libraries and lets us install additional libraries as required.
• Scheduling notebooks: Vertex AI Workbench lets us schedule notebook runs on an ad hoc and recurring basis. This functionality is quite useful in setting up and running large-scale experiments quickly. This feature is available through managed notebook instances. More information will be provided on this in the coming sections.

With this background, we can now start working with Jupyter Notebooks on Vertex AI Workbench. The next section provides basic guidelines for getting started with notebooks on Vertex AI.

Getting started with Vertex AI Workbench
Go to the Google Cloud console and open Vertex AI from the products menu on the left pane or by using the search bar on the top. Inside Vertex AI, click on Workbench, and it will open a page very similar to the one shown in Figure 4.3. More information on this is available in the official documentation (https://cloud.google.com/vertex-ai/docs/workbench/introduction).
Figure 4.3 – Vertex AI Workbench UI within the Google Cloud console

As we can see, Vertex AI Workbench is basically Jupyter Notebook as a service with the flexibility of working with managed as well as user-managed notebooks. User-managed notebooks are suitable for use cases where we need a more customized environment with relatively higher control. Another good thing about user-managed notebooks is that we can choose a suitable Docker container based on our development needs; these notebooks also let us change the type/size of the instance later on with a restart. To choose the best Jupyter Notebook option for a particular project, it’s important to know about the common differences between the two solutions. Table 4.1 describes some common differences between fully managed and user-managed notebooks:

Table 4.1 – Differences between managed and user-managed notebook instances

Let’s create one user-managed notebook to check the available options:

Figure 4.4 – Jupyter Notebook kernel configurations

As we can see in the preceding screenshot, user-managed notebook instances come with several customized image options to choose from. Along with the support of tools such as TensorFlow Enterprise, PyTorch, JAX, and so on, it also lets us decide whether we want to work with GPUs (which can be changed later, of course, as per needs). These customized images come with all useful libraries pre-installed for the desired framework, plus provide the flexibility to install any third-party packages within the instance. After choosing the appropriate image, we get more options to customize things such as notebook name, notebook region, operating system, environment, machine types, accelerators, and so on (see the following screenshot):

Figure 4.5 – Configuring a new user-managed Jupyter Notebook

Once we click on the CREATE button, it can take a couple of minutes to create a notebook instance.
Once it is ready, we can launch the Jupyter instance in a browser tab using the link provided inside Workbench (see Figure 4.6). We also get the option to stop the notebook for some time when we are not using it (to reduce cost):

Figure 4.6 – A running Jupyter Notebook instance

This Jupyter instance can be accessed by all team members having access to Workbench, which helps in collaborating and sharing progress with other teammates. Once we click on OPEN JUPYTERLAB, it opens a familiar Jupyter environment in a new tab (see Figure 4.7):

Figure 4.7 – A user-managed JupyterLab instance in Vertex AI Workbench

A Google-managed JupyterLab instance also looks very similar (see Figure 4.8):

Figure 4.8 – A Google-managed JupyterLab instance in Vertex AI Workbench

Now that we can access the notebook instance in the browser, we can launch a new Jupyter Notebook or terminal and get started on the project. After providing sufficient permissions to the service account, many useful Google Cloud services such as BigQuery, GCS, Dataflow, and so on can be accessed from the Jupyter Notebook itself using SDKs. This makes Vertex AI Workbench a one-stop tool for every ML development need.

Note: We should stop Vertex AI Workbench instances when we are not using them or don’t plan to use them for a long period of time. This prevents us from incurring costs by running them unnecessarily.

In the next sections, we will learn how to create notebooks using custom containers and how to schedule notebooks with Vertex AI Workbench.

Custom containers for Vertex AI Workbench
Vertex AI Workbench gives us the flexibility of creating notebook instances based on a custom container as well. The main advantage of a custom container-based notebook is that it lets us customize the notebook environment based on our specific needs. Suppose we want to work with a new TensorFlow version (or any other library) that is currently not available as a predefined kernel.
We can create a custom Docker container with the required version and launch a Workbench instance using this container. Custom containers are supported by both managed and user-managed notebooks. Here is how to launch a user-managed notebook instance using a custom container:
1. The first step is to create a custom container based on the requirements. Most of the time, a derivative container (a container based on an existing DL container image) would be easy to set up. See the following example Dockerfile; here, we first pull an existing TensorFlow GPU image and then install a newer TensorFlow version with pip:
   FROM gcr.io/deeplearning-platform-release/tf-gpu:latest
   RUN pip install --upgrade tensorflow
2. Next, build and push the container image to Container Registry so that it is accessible to the Google Compute Engine (GCE) service account. Use the following commands to build and push the container image:
   export PROJECT=$(gcloud config list project --format "value(core.project)")
   docker build . -f Dockerfile.example -t "gcr.io/${PROJECT}/tf-custom:latest"
   docker push "gcr.io/${PROJECT}/tf-custom:latest"
   Note that the service account should be provided with sufficient permissions to build and push the image to the container registry, and the respective APIs should be enabled.
3. Go to the User-managed notebooks page, click on the New Notebook button, and then select Customize. Provide a notebook name and select an appropriate Region and Zone value.
4. In the Environment field, select Custom Container.
5. In the Docker Container Image field, enter the address of the custom image; in our case, it would look like this: gcr.io/${PROJECT}/tf-custom:latest
6. Make the remaining appropriate selections and click the Create button.

We are all set now. While launching the notebook, we can select the custom container as a kernel and start working on the custom environment.
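If the build-and-push step needs to be repeated for several projects, it could be scripted. The sketch below only composes the image address and the docker commands from the steps above as argument lists; it does not run them. The function names and the my-project value are hypothetical, and the image name tf-custom is simply the one used in this example.

```python
def image_uri(project, name="tf-custom", tag="latest"):
    """Compose the Container Registry address used in the steps above."""
    return f"gcr.io/{project}/{name}:{tag}"


def build_and_push_commands(project, dockerfile="Dockerfile.example"):
    """Return the docker build and push commands as argument lists."""
    uri = image_uri(project)
    return [
        ["docker", "build", ".", "-f", dockerfile, "-t", uri],
        ["docker", "push", uri],
    ]


# "my-project" is a placeholder project ID, not a real one.
commands = build_and_push_commands("my-project")
```

Each list could then be passed to subprocess.run(cmd, check=True) on a machine where Docker is installed and authenticated against the registry.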
Conclusion
Vertex AI Workbench stands out as a powerful, cloud-based environment that streamlines machine learning development and deployment. By leveraging its managed and user-managed notebook options, teams can overcome local development limitations, ensuring better scalability, enhanced security, and integrated access to Google Cloud services. This guide has explored the foundational aspects of working with Vertex AI Workbench, including its customizable environments, scheduling features, and the use of custom containers. With Vertex AI Workbench, data scientists and ML practitioners can focus on innovation and productivity, confidently handling projects from inception to production.
Author Bio
Jasmeet Bhatia is a machine learning solution architect with over 18 years of industry experience, with the last 10 years focused on global-scale data analytics and machine learning solutions. In his current role at Google, he works closely with key GCP enterprise customers to provide them guidance on how to best use Google's cutting-edge machine learning products. At Google, he has also worked as part of the Area 120 incubator on building innovative data products such as Demand Signals, and he has been involved in the launch of Google products such as Time Series Insights. Before Google, he worked in similar roles at Microsoft and Deloitte. When not immersed in technology, he loves spending time with his wife and two daughters, reading books, watching movies, and exploring the scenic trails of southern California. He holds a bachelor's degree in electronics engineering from Jamia Millia Islamia University in India and an MBA from the University of California Los Angeles (UCLA) Anderson School of Management.
Kartik Chaudhary is an AI enthusiast, educator, and ML professional with 6+ years of industry experience. He currently works as a senior AI engineer with Google to design and architect ML solutions for Google's strategic customers, leveraging core Google products, frameworks, and AI tools. He previously worked with UHG, as a data scientist, and helped in making the healthcare system work better for everyone. Kartik has filed nine patents at the intersection of AI and healthcare. Kartik loves sharing knowledge and runs his own blog on AI, titled Drops of AI. Away from work, he loves watching anime and movies and capturing the beauty of sunsets.

Kedeisha Bryan, Taamir Ransome
31 Oct 2024
10 min read

Essential SQL for Data Engineers

This article is an excerpt from the book, Cracking the Data Engineering Interview, by Kedeisha Bryan, Taamir Ransome. The book is a practical guide that’ll help you prepare to successfully break into the data engineering role. The chapters cover technical concepts as well as tips for resume, portfolio, and brand building to catch the employer's attention, while also focusing on case studies and real-world interview questions.
Introduction
In the world of data engineering, SQL is the unsung hero that empowers us to store, manipulate, transform, and migrate data easily. It is the language that enables data engineers to communicate with databases, extract valuable insights, and shape data to meet their needs. Regardless of the nature of the organization or the data infrastructure in use, a data engineer will invariably need to use SQL for creating, querying, updating, and managing databases. As such, proficiency in SQL can often be the difference between a good data engineer and a great one. Whether you are new to SQL or looking to brush up your skills, this chapter will serve as a comprehensive guide. By the end of this chapter, you will have a solid understanding of SQL as a data engineer and be prepared to showcase your knowledge and skills in an interview setting.
In this article, we will cover the following topics:
Must-know foundational SQL concepts
Must-know advanced SQL concepts
Technical interview questions
Must-know foundational SQL concepts
In this section, we will delve into the foundational SQL concepts that form the building blocks of data engineering. Mastering these fundamental concepts is crucial for acing SQL-related interviews and effectively working with databases. Let’s explore the critical foundational SQL concepts every data engineer should be comfortable with, as follows:
SQL syntax: SQL syntax is the set of rules governing how SQL statements should be written.
As a data engineer, understanding SQL syntax is fundamental because you’ll be writing and reviewing SQL queries regularly. These queries enable you to extract, manipulate, and analyze data stored in relational databases.
SQL order of operations: The order of operations dictates the sequence in which the clauses of a query are executed:
1. FROM and JOIN
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. DISTINCT
7. ORDER BY
8. LIMIT/OFFSET
Data types: SQL supports a variety of data types, such as INT, VARCHAR, DATE, and so on. Understanding these types is crucial because they determine the kind of data that can be stored in a column, impacting storage considerations, query performance, and data integrity. As a data engineer, you might also need to convert data types or handle mismatches.
SQL operators: SQL operators are used to perform operations on data. They include arithmetic operators (+, -, *, /), comparison operators (>, <, =, and so on), and logical operators (AND, OR, and NOT). Knowing these operators helps you construct complex queries to solve intricate data-related problems.
Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL) commands: DML commands such as SELECT, INSERT, UPDATE, and DELETE allow you to manipulate data stored in the database. DDL commands such as CREATE, ALTER, and DROP enable you to manage database schemas. DCL commands such as GRANT and REVOKE are used for managing permissions. As a data engineer, you will frequently use these commands to interact with databases.
Basic queries: Writing queries to select, filter, sort, and join data is an essential skill for any data engineer. These operations form the basis of data extraction and manipulation.
Aggregation functions: Functions such as COUNT, SUM, AVG, MAX, and MIN, typically combined with the GROUP BY clause, are used to perform calculations on multiple rows of data.
They are essential for generating reports and deriving statistical insights, which are critical aspects of a data engineer’s role. The following section will dive deeper into must-know advanced SQL concepts, exploring advanced techniques to elevate your SQL proficiency. Get ready to level up your SQL game and unlock new possibilities in data engineering! Must-know advanced SQL concepts This section will explore advanced SQL concepts that will elevate your data engineering skills to the next level. These concepts will empower you to tackle complex data analysis, perform advanced data transformations, and optimize your SQL queries. Let’s delve into must-know advanced SQL concepts, as follows: Window functions: These do a calculation on a group of rows that are related to the current row. They are needed for more complex analyses, such as figuring out running totals or moving averages, which are common tasks in data engineering. Subqueries: Queries nested within other queries. They provide a powerful way to perform complex data extraction, transformation, and analysis, often making your code more efficient and readable. Common Table Expressions (CTEs): CTEs can simplify complex queries and make your code more maintainable. They are also essential for recursive queries, which are sometimes necessary for problems involving hierarchical data. Stored procedures and triggers: Stored procedures help encapsulate frequently performed tasks, improving efficiency and maintainability. Triggers can automate certain operations, improving data integrity. Both are important tools in a data engineer’s toolkit. Indexes and optimization: Indexes speed up query performance by enabling the database to locate data more quickly. Understanding how and when to use indexes is key for a data engineer, as it affects the efficiency and speed of data retrieval. Views: Views simplify access to data by encapsulating complex queries. 
They can also enhance security by restricting access to certain columns. As a data engineer, you’ll create and manage views to facilitate data access and manipulation. By mastering these advanced SQL concepts, you will have the tools and knowledge to handle complex data scenarios, optimize your SQL queries, and derive meaningful insights from your datasets. The following section will prepare you for technical interview questions on SQL. We will equip you with example answers and strategies to excel in SQL-related interview discussions. Let’s further enhance your SQL expertise and be well prepared for the next phase of your data engineering journey. Technical interview questions This section will address technical interview questions specifically focused on SQL for data engineers. These questions will help you demonstrate your SQL proficiency and problem-solving abilities. Let’s explore a combination of primary and advanced SQL interview questions and the best methods to approach and answer them, as follows: Question 1: What is the difference between the WHERE and HAVING clauses? Answer: The WHERE clause filters data based on conditions applied to individual rows, while the HAVING clause filters data based on grouped results. Use WHERE for filtering before aggregating data and HAVING for filtering after aggregating data. Question 2: How do you eliminate duplicate records from a result set? Answer: Use the DISTINCT keyword in the SELECT statement to eliminate duplicate records and retrieve unique values from a column or combination of columns. Question 3: What are primary keys and foreign keys in SQL? Answer: A primary key uniquely identifies each record in a table and ensures data integrity. A foreign key establishes a link between two tables, referencing the primary key of another table to enforce referential integrity and maintain relationships. Question 4: How can you sort data in SQL? 
Answer: Use the ORDER BY clause in a SELECT statement to sort data based on one or more columns. The ASC (ascending) keyword sorts data in ascending order, while the DESC (descending) keyword sorts it in descending order.
Question 5: Explain the difference between UNION and UNION ALL in SQL.
Answer: UNION combines result sets and removes duplicate records, while UNION ALL combines all records without eliminating duplicates. UNION ALL is generally faster than UNION because it skips the duplicate-elimination step.
Question 6: Can you explain what a self join is in SQL?
Answer: A self join is a regular join in which a table is joined to itself. This is often useful when the data is related within the same table. To perform a self join, we have to use table aliases to help SQL distinguish the left from the right table.
Question 7: How do you optimize a slow-performing SQL query?
Answer: Analyze the query execution plan, identify bottlenecks, and consider strategies such as creating appropriate indexes, rewriting the query, or using query optimization techniques such as JOIN order optimization or subquery optimization.
Question 8: What are CTEs, and how do you use them?
Answer: CTEs are temporary named result sets that can be referenced within a query. They enhance query readability, simplify complex queries, and enable recursive queries. Use the WITH keyword to define CTEs in SQL.
Question 9: Explain the ACID properties in the context of SQL databases.
Answer: ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These are basic properties that make sure database operations are reliable and transactional. Atomicity makes sure that a transaction is handled as a single unit that either completes fully or has no effect at all. Consistency makes sure that a transaction moves the database from one valid state to another. Isolation makes sure that transactions that are happening at the same time don’t interfere with each other.
Durability makes sure that once a transaction is committed, its changes are permanent and can survive system failures.
Question 10: How can you handle NULL values in SQL?
Answer: Use the IS NULL or IS NOT NULL operator to check for NULL values. Additionally, you can use the COALESCE function to replace NULL values with alternative non-null values.
Question 11: What is the purpose of stored procedures and functions in SQL?
Answer: Stored procedures and functions are reusable pieces of SQL code encapsulating a set of SQL statements. They promote code modularity, improve performance, enhance security, and simplify database maintenance.
Question 12: Explain the difference between a clustered and a non-clustered index.
Answer: The physical order of the data in a table is set by a clustered index. This means that a table can only have one clustered index. The data rows of a table are stored in the leaf nodes of a clustered index. A non-clustered index, on the other hand, doesn’t change the order of the data in the table. Instead, it is kept as a separate structure that stores the indexed key values in sorted order, together with pointers back to the original table rows. There can be more than one non-clustered index for a table.
Prepare for these interview questions by understanding the underlying concepts, practicing SQL queries, and being able to explain your answers.
Conclusion
This article explored the foundational and advanced principles of SQL that empower data engineers to store, manipulate, transform, and migrate data confidently. Understanding these concepts has unlocked the door to seamless data operations, optimized query performance, and insightful data analysis. SQL is the language that bridges the gap between raw data and valuable insights. With a solid grasp of SQL, you possess the skills to navigate databases, write powerful queries, and design efficient data models.
Whether preparing for interviews or tackling real-world data engineering challenges, the knowledge you have gained in this chapter will propel you toward success. Remember to continue exploring and honing your SQL skills. Stay updated with emerging SQL technologies, best practices, and optimization techniques to stay at the forefront of the ever-evolving data engineering landscape. Embrace the power of SQL as a critical tool in your data engineering arsenal, and let it empower you to unlock the full potential of your data.
Author Bio
Kedeisha Bryan is a data professional with experience in data analytics, science, and engineering. She has prior experience combining both Six Sigma and analytics to provide data solutions that have impacted policy changes and leadership decisions. She is fluent in tools such as SQL, Python, and Tableau. She is the founder and leader at the Data in Motion Academy, providing personalized skill development, resources, and training at scale to aspiring data professionals across the globe. Her other works include another Packt book in the works and an SQL course for LinkedIn Learning.
Taamir Ransome is a Data Scientist and Software Engineer. He has experience in building machine learning and artificial intelligence solutions for the US Army. He is also the founder of the Vet Dev Institute, where he currently provides cloud-based data solutions for clients. He holds a master's degree in Analytics from Western Governors University.
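Several of the interview answers above (Questions 1, 5, 8, and 10) can be checked against a real engine in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the `orders` table, its columns, and the sample rows are invented for illustration and do not come from the book, and other database engines may differ in small details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, coupon TEXT);
    INSERT INTO orders (customer, amount, coupon) VALUES
        ('ana', 50.0, NULL),
        ('ana', 70.0, 'SPRING'),
        ('bob', 20.0, NULL),
        ('bob', 15.0, NULL),
        ('cy', 200.0, 'VIP');
    """
)


def rows(sql: str) -> list[tuple]:
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Question 1: WHERE filters rows before grouping; HAVING filters the groups.
where_vs_having = rows(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 20          -- row-level filter, applied before aggregation
    GROUP BY customer
    HAVING SUM(amount) > 100    -- group-level filter, applied after aggregation
    ORDER BY customer
    """
)

# Question 5: UNION removes duplicates, UNION ALL keeps every row.
big_orders = "SELECT customer FROM orders WHERE amount > 40"
with_coupon = "SELECT customer FROM orders WHERE coupon IS NOT NULL"
union_rows = rows(f"{big_orders} UNION {with_coupon}")
union_all_rows = rows(f"{big_orders} UNION ALL {with_coupon}")

# Question 10: COALESCE substitutes a fallback for NULL values.
coalesced = rows("SELECT id, COALESCE(coupon, 'none') FROM orders ORDER BY id")

# Question 8: a CTE names an intermediate result so it can be reused in the query.
top_customer = rows(
    """
    WITH totals AS (
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
    )
    SELECT customer FROM totals WHERE total = (SELECT MAX(total) FROM totals)
    """
)

if __name__ == "__main__":
    print("WHERE + HAVING:", where_vs_having)
    print("UNION rows:", len(union_rows), "UNION ALL rows:", len(union_all_rows))
    print("COALESCE:", coalesced)
    print("Top customer via CTE:", top_customer)
```

Running the queries side by side like this is a quick way to practice explaining why two similar-looking statements return different row counts.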

Christiaan Brinkhoff, Sandeep Patnaik, Morten Pedholt
31 Oct 2024
15 min read

How to Create and Connect a Virtual Network in Azure for Windows 365

This article is an excerpt from the book, Mastering Windows 365, by Christiaan Brinkhoff, Sandeep Patnaik, and Morten Pedholt. Mastering Windows 365 provides you with detailed knowledge of cloud PCs by exploring its design model and analyzing its security environment. This book will help you extend your existing skillset with Windows 365 effectively.
Introduction
In today's cloud-centric world, establishing a secure and efficient network infrastructure is crucial for businesses of all sizes. Microsoft Azure, with its robust set of networking tools, provides a seamless way to connect various environments, including Windows 365. In this guide, we will walk you through the process of creating a virtual network in Azure, and how to connect it to a Windows 365 environment. Whether you're setting up a new network or integrating an existing one, this step-by-step tutorial will ensure you have the foundation necessary for a successful deployment.
Creating a virtual network in Azure
Start by going to https://portal.azure.com/ and creating a new virtual network. It's quite straightforward. You can use all the default settings, but take care that the address space does not overlap with one you are already using:
1. Start by logging in to https://portal.azure.com.
2. Start the creation of a new virtual network. From here, choose the Resource group option and the name of the virtual network. When these have been defined, choose Next.
Figure 3.5 – Virtual network creation basic information
3. There are some security features you can enable on the virtual network. These features are optional, but Azure Firewall should be considered if no other firewall solution is deployed. When you are ready, click on Next.
Figure 3.6 – Virtual network creation security
4. Now the IP address range and subnets must be defined. Once these have been defined, click on Next.
Figure 3.7 – Virtual network creation | IP addresses
5. Next, we can add any Azure tags that might be required for your organization. We will leave it as is in this case. Click on Next.
Figure 3.8 – Virtual network | Azure tags selection
6. We are now able to see an overview of the entire configuration of the new virtual network. When you have reviewed this, click on Create.
Figure 3.9 – Virtual network creation | settings review
Now that the virtual network has been created, we can start looking at how we create an Azure network connection (ANC) in Intune. We will look at the configuration for both an Azure AD Join (AADJ) and a Hybrid Azure AD Join (HAADJ) network connection.
Setting up an AADJ ANC
Let's have a look at how to configure an ANC for an AADJ Cloud PC device:
1. Start by going to Microsoft Intune | Devices | Windows 365 | Azure network connection. From here, click on + Create and select Azure AD Join:
Figure 3.10 – Creating an ANC in Windows 365 overview
2. Fill out the required information such as the display name of the connection, the virtual network, and the subnet you would like to integrate with Windows 365. Once that is done, click on Next.
Figure 3.11 – Creating an AADJ ANC | network details
3. Review the information you have filled in. When you are ready, click Review + create:
Figure 3.12 – Creating an AADJ ANC | settings review
Once the ANC has been created, you are done and should be able to view the connection in the ANC overview. You can now use that virtual network in your provisioning policy.
Figure 3.13 – Windows 365 ANC network overview
Setting up a HAADJ ANC
A HAADJ network connection is a bit trickier to set up than the previous one. We must ensure the virtual network we are using has a connection with the domain we are trying to join. Once we are sure about that, let's go ahead and create a connection:
1. Visit Microsoft Intune | Windows 365 | Azure network connection. From here, click on + Create and select Hybrid Azure AD Join.
Figure 3.14 – Creating a HAADJ ANC in Windows 365 | Overview
2. Provide the required information such as the display name of the connection, the virtual network, and the subnet you would like to integrate with Windows 365. Click Next.
Figure 3.15 – Creating a HAADJ ANC | network details
3. Type the domain name you want the Cloud PCs to join. The Organization Unit field is optional. Type in the AD username and password for your domain-joined service account. Once done, click Next:
Figure 3.16 – Creating a HAADJ ANC | domain details
4. Review the settings provided and click on Review + create. The connection will now be established:
Figure 3.17 – Creating a HAADJ ANC | settings details
Once the creation is done, you can view the connection in the ANC overview. You will now be able to use that virtual network in your provisioning policy.
Figure 3.18 – Windows 365 ANC network overview
Conclusion
Creating a virtual network in Azure and connecting it to your Windows 365 environment is a fundamental step towards leveraging the full potential of cloud-based services. By following the outlined procedures, you can ensure a secure and efficient network connection, whether you're dealing with Azure AD Join (AADJ) or Hybrid Azure AD Join (HAADJ) scenarios. With the virtual network and ANC now configured, you are well-equipped to manage and monitor your network connections, enhancing the overall performance and reliability of your cloud infrastructure.
Author Bio
Christiaan works as a Principal Program Manager and Community Lead on the Windows Cloud Experiences (Windows 365 + AVD) Engineering team at Microsoft, bringing his expertise to help customers imagine new virtualization experiences. A former Global Black Belt for Azure Virtual Desktop, Christiaan joined Microsoft in 2018 as part of the FSLogix acquisition. In his role at Microsoft, he worked on features such as Windows 365 app, Switch, and Boot. His mission is to drive innovation while bringing Windows 365, Windows, and Microsoft Endpoint Manager (MEM) closer together, and drive community efforts around virtualization to empower Microsoft customers in leveraging new cloud virtualization scenarios.
Sandeep is a virtualization veteran with nearly two decades of experience in the industry. He has shipped multiple billion-dollar products and cloud services for Microsoft to a global user base including Windows, Azure Virtual Desktop, and Windows 365. His contributions have earned him multiple patents in this field. Currently, he leads a stellar team that is responsible for building the product strategy for Windows 365 and Azure Virtual Desktop services and shaping the future of end-user experiences for these services.
Morten works as a Cloud Architect for a consultant company in Denmark, where he advises on and implements Microsoft virtual desktop solutions for customers around the world. Morten started his journey as a consultant over 8 years ago, managing client devices at first, but quickly found a passion for virtual device management. Today, Windows 365 and Azure Virtual Desktop are his main areas of focus, alongside Microsoft Intune. Based on all the community activities Morten has done in the past years, he was rewarded with the Microsoft MVP award in the category of Windows 365 in March 2022.
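The virtual network steps above warn against choosing an address space that overlaps one already in use. As a hedged sketch of how that check could be done before opening the portal, the following uses Python's standard ipaddress module. The function name `overlapping` and the sample CIDR ranges are assumptions made for illustration; substitute the ranges actually used in your environment.

```python
import ipaddress


def overlapping(candidate: str, existing: list[str]) -> list[str]:
    """Return the existing address spaces that overlap the candidate CIDR block."""
    new_net = ipaddress.ip_network(candidate)
    return [
        cidr for cidr in existing
        if new_net.overlaps(ipaddress.ip_network(cidr))
    ]


if __name__ == "__main__":
    # Hypothetical address spaces already in use (on-premises or other virtual networks).
    in_use = ["10.0.0.0/16", "192.168.10.0/24"]
    print(overlapping("10.0.1.0/24", in_use))  # sits inside 10.0.0.0/16, so it is reported
    print(overlapping("10.1.0.0/16", in_use))  # no overlap, so the list is empty
```

An empty result means the candidate range is safe to use with respect to the ranges listed; it says nothing about ranges that were left out of the list.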

Jonathan R. Danylko
30 Oct 2024
15 min read

Building Efficient Web APIs with .NET 8 and Visual Studio 2022

This article is an excerpt from the book, ASP.NET 8 Best Practices, by Jonathan R. Danylko. With the latest version of .NET 8.0 Core in LTS (Long-Term-Support), best practices are becoming harder to find as the technology continues to evolve. This book will guide you through coding practices and various aspects of software development.
Introduction
In the ever-evolving landscape of web development, .NET 8 has emerged as a game-changer, especially in the realm of Web APIs. With new features and enhancements, .NET 8 prioritizes the ease and efficiency of building Web APIs, supported by robust tools in Visual Studio 2022. This chapter explores the innovations in .NET 8, focusing on creating and testing Web APIs seamlessly. From leveraging minimal APIs to utilizing Visual Studio's new features, developers can now build powerful REST-based services with simplicity and speed. We'll guide you through the process, demonstrating how to create a minimal API and highlighting the benefits of this approach.
Technical requirements
In .NET 8, Web APIs take a front seat. Visual Studio has added new features to make Web APIs easier to build and test. For this chapter, we recommend using Visual Studio 2022, but the only requirement to view the GitHub repository is a simple text editor. The code for Chapter 09 is located in Packt Publishing’s GitHub repository, found at https://github.com/PacktPublishing/ASP.NET-Core-8-Best-Practices.
Creating APIs quickly
With .NET 8, APIs are integrated into the framework, making it easier to create, test, and document. In this section, we’ll learn a quick and easy way to create a minimal API using Visual Studio 2022 and walk through the code it generates. We’ll also learn why minimal APIs are the best approach to building REST-based services.
Using Visual Studio
One of the features of .NET 8 is the ability to create minimal REST APIs extremely fast. One way is to use the dotnet command-line tool and the other way is to use Visual Studio.
To do so, follow these steps:
1. Open Visual Studio 2022 and create an ASP.NET Core Web API project.
2. After selecting the directory for the project, click Next.
3. Under the project options, make the following changes:
Uncheck the Use Controllers option to use minimal APIs
Check Enable OpenAPI support to include support for API documentation using Swagger:
Figure 9.1 – Options for a web minimal API project
4. Click Create.
That’s it – we have a simple API! It may not be much of one, but it’s still a complete API with Swagger documentation. Swagger is a tool for creating documentation for APIs and implementing the OpenAPI specification, whereas Swashbuckle is a NuGet package that uses Swagger for implementing Microsoft APIs. If we look at the project, there’s a single file called Program.cs. Opening Program.cs will show the entire application. This is one of the strong points of .NET – the ability to create a scaffolded REST API relatively quickly:
    var builder = WebApplication.CreateBuilder(args);

    // Add services to the container.
    // Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle
    builder.Services.AddEndpointsApiExplorer();
    builder.Services.AddSwaggerGen();

    var app = builder.Build();

    // Configure the HTTP request pipeline.
    if (app.Environment.IsDevelopment())
    {
        app.UseSwagger();
        app.UseSwaggerUI();
    }

    app.UseHttpsRedirection();

    var summaries = new[]
    {
        "Freezing", "Bracing", "Chilly", "Cool", "Mild",
        "Warm", "Balmy", "Hot", "Sweltering", "Scorching"
    };

    app.MapGet("/weatherforecast", () =>
    {
        var forecast = Enumerable.Range(1, 5).Select(index =>
            new WeatherForecast
            (
                DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
                Random.Shared.Next(-20, 55),
                summaries[Random.Shared.Next(summaries.Length)]
            ))
            .ToArray();
        return forecast;
    })
    .WithName("GetWeatherForecast")
    .WithOpenApi();

    app.Run();

    internal record WeatherForecast(DateOnly Date, int TemperatureC, string? Summary)
    {
        public int TemperatureF => 32 + (int)(TemperatureC / 0.5556);
    }
In the preceding code, we created our “application” through the .CreateBuilder() method. We also added the EndpointsApiExplorer and SwaggerGen services. EndpointsApiExplorer enables the developer to view all endpoints in Visual Studio, which we’ll cover later. The SwaggerGen service, on the other hand, creates the documentation for the API when accessed through the browser. The next line creates our application instance using the .Build() method. Once we have our app instance and we are in development mode, we can add Swagger and the Swagger UI. .UseHttpsRedirection() is meant to redirect to HTTPS when the protocol of a web page is HTTP to make the API secure. The next line creates our GET weatherforecast route using .MapGet(). We added the .WithName() and .WithOpenApi() methods to identify the primary method to call and let .NET know it uses the OpenAPI standard, respectively. Finally, we called app.Run().
If we run the application, we will see documentation on how to use our API and what’s available. Running the application produces the following output:
Figure 9.2 – Screenshot of our documented Web API
If we call the /weatherforecast API, we see that we receive JSON back with a 200 HTTP status.
Figure 9.3 – Results of our /weatherforecast API
Think of this small API as middleware with API controllers combined into one compact file (Program.cs).
Why minimal APIs?
I consider minimal APIs to be a feature in .NET 8, even though it’s a language concept.
If the application is extremely large, adding minimal APIs should be an appealing feature in four ways:
Self-contained: Simple API functionality inside one file is easy to follow for other developers
Performance: Since we aren’t using controllers, the MVC overhead isn’t necessary when using these APIs
Cross-platform: With .NET, APIs can now be deployed on any platform
Self-documenting: While we can add Swashbuckle to other APIs, it also builds the documentation for minimal APIs
Moving forward, we’ll take these minimal APIs and start looking at Visual Studio’s testing capabilities.
Conclusion
In conclusion, .NET 8 has revolutionized the process of building Web APIs by integrating them more deeply into the framework, making it easier than ever to create, test, and document APIs. By harnessing the power of Visual Studio 2022, developers can quickly set up minimal APIs, offering a streamlined and efficient approach to building REST-based services. The advantages of minimal APIs—being self-contained, performant, cross-platform, and self-documenting—make them an invaluable tool in a developer's arsenal. As we continue to explore the capabilities of .NET 8, the potential for creating robust and scalable web applications is limitless, paving the way for innovative and efficient software solutions.
Author Bio
Jonathan "JD" Danylko is an award-winning, full-stack ASP.NET architect. He's used ASP.NET as his primary way to build websites since 2002 and before that, Classic ASP. Jonathan contributes to his blog (DanylkoWeb.com) on a weekly basis, has built a custom CMS, is a founder of Tuxboard (an open-source ASP.NET dashboard library), has been on various podcasts, and guest posted on the C# Advent Calendar for 6 years. Jonathan has worked in various industries for small, medium, and Fortune 100 companies, but currently works as an Architect at Insight Enterprise. The best way to contact Jonathan is through GitHub, LinkedIn, Twitter, email, or through the website.
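One detail of the generated Program.cs shown earlier is worth a second look: the TemperatureF property converts with 32 + (int)(TemperatureC / 0.5556), which is an approximation of the exact Celsius-to-Fahrenheit formula. The sketch below reproduces that arithmetic in Python (used here only to keep the added examples in one language) so the two can be compared. The function names are hypothetical, and Python's `int()` is used because, like the C# `(int)` cast, it truncates toward zero.

```python
def template_fahrenheit(celsius: int) -> int:
    """Mirror the template expression 32 + (int)(TemperatureC / 0.5556)."""
    # int() truncates toward zero, matching the C# (int) cast on a double.
    return 32 + int(celsius / 0.5556)


def exact_fahrenheit(celsius: int) -> float:
    """Exact conversion: F = C * 9 / 5 + 32."""
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    for c in (-20, 0, 100):
        print(c, template_fahrenheit(c), exact_fahrenheit(c))
```

For -20 and 100 degrees Celsius the template arithmetic gives -3 and 211, one degree away from the exact -4 and 212, because 0.5556 is slightly larger than 5/9 and the cast drops the fractional part. That is harmless for sample data, but worth replacing before reusing the record in a real API.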
Ekene Eze
30 Oct 2024
10 min read

Effortless Web Deployment: A Guide to Deploying Your Application on Netlify

This article is an excerpt from the book, Web Development on Netlify, by Ekene Eze. This book is a comprehensive guide to deploying and scaling frontend web applications on Netlify. With hands-on instructions and real-world examples, this book takes you from setting up a Netlify account and deploying web apps to optimizing performance.

Introduction

Deploying a web application can sometimes be a daunting task, especially with the various methods and tools available. In this article, we'll explore two straightforward deployment methods offered by Netlify: the drag-and-drop method, which is beginner-friendly and ideal for static sites, and the Netlify CLI (Netlify Dev) method, which provides greater control for developers who prefer using the command line.

Deploying your web application on Netlify

We will discuss two deployment methods in this chapter: the drag-and-drop method and the Netlify CLI (Netlify Dev) method. A third method, the Git-based method, was covered in the Connecting to a Git repository section in Chapter 1.

Netlify drag-and-drop deployment

The drag-and-drop deployment method is the most straightforward and beginner-friendly way to deploy a web application on Netlify. This method is suitable for static websites or applications that do not require complex build processes. To deploy your web application on Netlify using the drag-and-drop method, follow these steps:

1. Organize your project files and ensure your project’s index.html file is in the root folder so that Netlify can easily find it and build your site from there:

   Figure 2.1 – Netlify drop sample structure

2. Visit netlify.com and sign in or create an account.

3. On your Netlify dashboard, locate the Sites section. Drag and drop your project folder into the designated area. Netlify will automatically upload your files, create a new site, deploy it, and assign a randomly generated URL. You can click on the generated URL to view your live site.

4. Optionally, configure your site. To configure your site’s settings, such as adding a custom domain or enabling SSL, click the Site settings button. We will discuss these configuration options in greater detail later, in the Configuring settings and options section.

Netlify CLI (Netlify Dev) deployment

The Netlify CLI deployment method offers greater control over the deployment process for developers who prefer using the command line. Follow these steps to deploy your web applications to Netlify using the Netlify CLI:

1. Install the Netlify CLI globally on your computer using npm:

    npm install -g netlify-cli

2. Run the following command to authenticate your Netlify account:

    netlify login

   Your browser will open so that you can authorize access to your Netlify account.

3. Navigate to your project folder in the command line and run the following command to initialize a new Netlify site:

    netlify init

4. You will be prompted to choose between connecting an existing Git repository or creating a new site without a Git repository. Choose the option that best suits your needs. Connecting to a Git repository enables continuous deployment.

5. If your project requires specific build settings, open the automatically created netlify.toml file in your project’s root directory and configure the settings accordingly. Here’s an example:

    [build]
      command = "npm run build"
      publish = "dist"

   This configuration would run the npm run build command and deploy the dist folder as the publish directory.

6. Run the following command in your project directory to deploy your site:

    netlify deploy

   By default, this command creates a draft deployment. Preview the draft by visiting the generated URL.

7. If you are satisfied with the draft deployment, run the following command for a production deployment:

    netlify deploy --prod

   This will create a production deployment with a randomly generated URL.

8. Visit your Netlify dashboard to view your live site or configure your site’s settings, such as adding a custom domain or enabling SSL. This step will be covered in more detail in the Configuring settings and options section of this chapter.

Git-based deployment

Refer to Chapter 1 for the Git-based deployment process.

Choosing a deployment pattern

Need help choosing a pattern for your needs? Here’s a tabular comparison of the three deployment patterns offered by Netlify: Git-based deployments, CLI deployments, and drag-and-drop:

Deployment Pattern | When to Choose | Key Benefits
Git-based deployments | Ideal for collaborative development | Version control, automated builds, code review
CLI deployments | Ideal for advanced automation scenarios | Scripted deployments, custom workflows
Drag-and-drop deployments | Ideal for simple, non-technical users | User-friendly, visual interface, quick deployments

Table – Choosing a deployment pattern

Now, let’s discuss when each deployment pattern is ideal and why:

Git-based deployments: Git-based deployments are suitable for collaborative development environments where multiple team members contribute to the code base. They are ideal when you want to leverage the power of version control systems such as Git. Git-based deployments offer version control, which allows you to track changes, collaborate with others, and roll back to previous versions if needed. They also enable automated builds triggered by changes to the repository, facilitating continuous integration and deployment workflows. Code review processes can be integrated into the deployment pipeline, ensuring code quality.

CLI deployments: CLI deployments are ideal for advanced automation scenarios, where you require fine-grained control over the deployment process and want to integrate it with custom scripts or workflows. CLI deployments offer flexibility and programmability. They allow you to script deployments using command-line tools, which can be useful for automating complex deployment scenarios. You can customize and extend the deployment process to fit your requirements while integrating with other tools or services.

Drag-and-drop deployments: Drag-and-drop deployments are ideal for non-technical users or individuals who prefer a simple, user-friendly interface for deploying static sites or applications quickly. Drag-and-drop deployments provide a visual interface that simplifies the deployment process. Users can simply drag and drop their site files or assets onto the Netlify web interface, and the platform takes care of the deployment and hosting. This pattern eliminates the need for technical knowledge or command-line usage, making it accessible to a wider range of users.

The choice of deployment pattern depends on your specific needs and your technical expertise. Git-based deployments are suitable for collaborative development, CLI deployments offer advanced automation capabilities, and drag-and-drop deployments are ideal for non-technical users seeking a simple interface. Understanding the strengths and trade-offs of each pattern will help you select the most appropriate deployment approach for your project.

Conclusion

Choosing the right deployment method is crucial for the success and efficiency of your web application. Whether you opt for the simplicity of the drag-and-drop method, the command-line control of the Netlify CLI, or the collaborative advantages of Git-based deployments, each approach has its unique strengths. The drag-and-drop method offers a quick and easy solution for non-technical users, while the CLI method provides advanced automation capabilities for more complex scenarios. Git-based deployments, on the other hand, are perfect for teams working in a collaborative environment with a need for version control. By understanding these methods and their respective benefits, you can confidently deploy your web application on Netlify using the approach that best aligns with your goals and expertise.

Author Bio

Ekene Eze is a highly experienced Developer Advocate with over five years of professional experience in leading DevRel teams across multiple organizations. As a former member of the Developer Experience team at Netlify, he played a key role in helping numerous companies integrate and effectively utilize the Netlify platform. As a well-regarded speaker, he is dedicated to sharing his knowledge and expertise with the wider development community through a variety of mediums, including blog posts, video tutorials, live streams, and podcasts. Currently serving as the Director of Developer Relations at Abridged Inc, the author brings a wealth of experience and expertise to this comprehensive guide on scaling web applications with Netlify.

William Hegedus
28 Oct 2024
15 min read

Mastering Prometheus Sharding: Boost Scalability with Efficient Data Management

This article is an excerpt from the book, Mastering Prometheus, by William Hegedus. Become a Prometheus master with this guide that takes you from the fundamentals to advanced deployment in no time. Equipped with practical knowledge of Prometheus and its ecosystem, you’ll learn when, why, and how to scale it to meet your needs.

Introduction

In this article, readers will dive into techniques for optimizing Prometheus, a powerful open-source monitoring tool, by implementing sharding. As data volumes increase, so do the challenges associated with high cardinality, often resulting in strained single-instance setups. Instead of purging data to reduce load, sharding offers a viable solution by distributing scrape jobs across multiple Prometheus instances. This article explores two primary sharding methods: by service, which segments data by use case or team, and by dynamic relabeling, which provides a more flexible, albeit complex, approach to distributing data. By examining each method’s setup and trade-offs, the article offers practical insights for scaling Prometheus while maintaining efficient access to critical metrics across instances.

Sharding Prometheus

Chances are that if you’re looking to improve your Prometheus architecture through sharding, you’re hitting one of the limitations we talked about and it’s probably cardinality. You have a Prometheus instance that’s just got too much data in it, but… you don’t want to get rid of any data. So, the logical answer is… run another Prometheus instance!

When you split data across Prometheus instances like this, it’s referred to as sharding. If you’re familiar with other database designs, it probably isn’t sharding in the traditional sense. As previously established, Prometheus TSDBs do not talk to each other, so it’s not as if they’re coordinating to shard data across instances. Instead, you predetermine where data will be placed by how you configure the scrape jobs on each instance. So, it’s more like sharding scrape jobs than sharding the data. There are two main ways to accomplish this: sharding by service and sharding via relabeling.

Sharding by service

This is arguably the simpler of the two ways to shard data across your Prometheus instances. Essentially, you just separate your Prometheus instances by use case. This could be a Prometheus instance per team, where you have multiple Prometheus instances and each one covers services owned by a specific team so that each team still has a centralized location to see most of the data they care about. Or, you could arbitrarily shard it by some other criteria, such as one Prometheus instance for virtualized infrastructure, one for bare-metal, and one for containerized infrastructure.

Regardless of the criteria, the idea is that you segment your Prometheus instances based on use case so that there is at least some unification and consistency in which Prometheus gets which scrape targets. This makes it at least a little easier for other engineers and developers to reason about where the metrics they care about are located.

From there, it’s fairly self-explanatory to get set up. It only entails setting up your scrape job in different locations. So, let’s take a look at the other, slightly more involved way of sharding your Prometheus instances.

Sharding with relabeling

Sharding via relabeling is a much more dynamic way of handling the sharding of your Prometheus scrape targets. However, it does have some tradeoffs. The biggest one is the added complexity of not necessarily knowing which Prometheus instance your scrape targets will end up on. As opposed to the sharding by service/team/domain example we already discussed, sharding via relabeling does not shard scrape jobs in a way that is predictable to users.

Now, just because sharding is unpredictable to humans does not mean that it is not deterministic. It is consistent, but just not in a way that makes it clear to users which Prometheus they need to go to in order to find the metrics they want to see. There are ways to work around this with tools such as Thanos (which we’ll discuss later in this book) or federation (which we’ll discuss later in this chapter).

The key to sharding via relabeling is the hashmod function, which is available during relabeling in Prometheus. The hashmod function works by taking a list of one or more source labels, concatenating them, producing an MD5 hash of the result, and then applying a modulus to it. Then, you store the output of that and, in your next step of relabeling, you keep or drop targets that have a specific hashmod value output.

What’s relabeling again?

For a refresher on relabeling in Prometheus, consult Chapter 4’s section on it. For this chapter, the type of relabeling we’re doing is standard relabeling (as opposed to metric relabeling) – it happens before a scrape occurs.

Let’s look at an example of how this works logically before diving into implementing it in our kube-prometheus stack. We’ll just use the Python REPL to keep it quick:

    >>> from hashlib import md5
    >>> SEPARATOR = ";"
    >>> MOD = 2
    >>> targetA = ["app=nginx", "instance=node2"]
    >>> targetB = ["app=nginx", "instance=node23"]
    >>> hashA = int(md5(SEPARATOR.join(targetA).encode("utf-8")).hexdigest(), 16)
    >>> hashA
    286540756315414729800303363796300532374
    >>> hashB = int(md5(SEPARATOR.join(targetB).encode("utf-8")).hexdigest(), 16)
    >>> hashB
    139861250730998106692854767707986305935
    >>> print(f"{targetA} % {MOD} = ", hashA % MOD)
    ['app=nginx', 'instance=node2'] % 2 = 0
    >>> print(f"{targetB} % {MOD} = ", hashB % MOD)
    ['app=nginx', 'instance=node23'] % 2 = 1

As you can see, the hash of the app and instance labels has a modulus of 2 applied to it. For node2, the result is 0. For node23, the result is 1. Since the modulus is 2, those are the only possible values. Therefore, if we had two Prometheus instances, we would configure one to only keep targets where the result is 0, and the other would only keep targets where the result is 1 – that’s how we would shard our scrape jobs. The modulus value that you choose should generally correspond to the number of Prometheus instances that you wish to shard your scrape jobs across.

Let’s look at how we can accomplish this type of sharding across two Prometheus instances using kube-prometheus. Luckily for us, kube-prometheus has built-in support for sharding Prometheus instances using relabeling by way of the Prometheus Operator. It’s a built-in option on Prometheus CRD objects. Enabling it is as simple as updating our prometheusSpec in our Helm values to specify the number of shards.

Additionally, we’ll need to clean up the names of our Prometheus instances; otherwise, Kubernetes won’t allow the new Pod to start due to character constraints. We can tell kube-prometheus to stop including kube-prometheus in the names of our resources, which will shorten the names. To do this, we’ll set cleanPrometheusOperatorObjectNames: true. The new values being added to our Helm values file from Chapter 2 look like this:

    prometheus:
      prometheusSpec:
        shards: 2
    cleanPrometheusOperatorObjectNames: true

The full values file is available in this GitHub repository, which was linked at the beginning of this chapter. With that out of the way, we can apply these new values to get an additional Prometheus instance running to shard our scrape jobs across the two. The helm command to accomplish this is as follows:

    $ helm upgrade --namespace prometheus \
        --version 47.0.0 \
        --values ch6/values.yaml \
        mastering-prometheus \
        prometheus-community/kube-prometheus-stack

Once that command completes, you should see a new pod named prometheus-mastering-prometheus-kube-shard-1-0 in the output of kubectl get pods.

Now, we can see the relabeling that’s taking place behind the scenes so that we can understand how it works and how to implement it in Prometheus instances not running via the Prometheus Operator. Port-forward to either of the two Prometheus instances (I chose the new one) and we can examine the configuration in our browsers at http://localhost:9090/config:

    $ kubectl port-forward \
        pod/prometheus-mastering-prometheus-kube-shard-1-0 \
        9090

The relevant section we’re looking for is the sequential parts of relabel_configs, where hashmod is applied and then a keep action is applied based on the output of hashmod and the shard number of the Prometheus instance. It should look like this:

    relabel_configs:
      [ . . . ]
      - source_labels: [__address__]
        separator: ;
        regex: (.*)
        modulus: 2
        target_label: __tmp_hash
        replacement: $1
        action: hashmod
      - source_labels: [__tmp_hash]
        separator: ;
        regex: "1"
        replacement: $1
        action: keep

As we can see, for each scrape job, a modulus of 2 is taken from the hash of the __address__ label, and its result is stored in a new label called __tmp_hash. You can store the result in whatever you want to name your label – there’s nothing special about __tmp_hash. Additionally, you can choose any one or more source labels you wish – it doesn’t have to be __address__. However, it’s recommended that you choose labels that will be unique per target – so instance and __address__ tend to be your best options.

After calculating the modulus of the hash, the next step is the crucial one that determines which scrape targets the Prometheus shard will scrape. It takes the value of the __tmp_hash label and matches it against its shard number (shard numbers start at 0), and keeps only targets that match.
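The two relabeling steps above can be sketched in a few lines of Python. This is a minimal illustration rather than Prometheus's actual code: the helper names hashmod and keep_for_shard and the sample addresses are made up for this example, and it follows the simplified logic from the REPL session earlier (the full MD5 digest taken modulo the shard count), so the shard a given target lands on may differ from what a real Prometheus server computes.

```python
from hashlib import md5

SEPARATOR = ";"  # same separator as in the relabel_configs shown above


def hashmod(label_values, modulus):
    """Join the label values, MD5-hash them, and reduce modulo `modulus`."""
    joined = SEPARATOR.join(label_values)
    digest = md5(joined.encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus


def keep_for_shard(addresses, shard, total_shards):
    """Mimic hashmod followed by keep: return only this shard's targets."""
    return [a for a in addresses if hashmod([a], total_shards) == shard]


# Hypothetical __address__ values, used purely for illustration.
addresses = [f"10.0.0.{i}:9100" for i in range(1, 11)]
TOTAL_SHARDS = 2

for shard in range(TOTAL_SHARDS):
    kept = keep_for_shard(addresses, shard, TOTAL_SHARDS)
    print(f"shard {shard} keeps {len(kept)} targets: {kept}")
```

Because the hash is deterministic, each address is kept by exactly one shard no matter how often the sketch is run, which is the property the keep step relies on.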
The Prometheus Operator does the heavy lifting of automatically applying these two relabeling steps to all configured scrape jobs, but if you’re managing your own Prometheus configuration directly, then you will need to add them to every scrape job that you want to shard across Prometheus instances – there is currently no way to do it globally.

It’s worth mentioning that sharding in this way does not guarantee that your scrape jobs are going to be evenly spread out across your number of shards. We can port-forward to the other Prometheus instance and run a quick PromQL query to easily see that they’re not evenly distributed across my two shards. I’ll port-forward to port 9091 on my local host so that I can open both instances simultaneously:

    $ kubectl port-forward \
        pod/prometheus-mastering-prometheus-kube-0 \
        9091:9090

Then, we can run this simple query to see how many scrape targets are assigned to each Prometheus instance:

    count(up)

In my setup, there are eight scrape targets on shard 0 and 16 on shard 1. You can attempt to micro-optimize scrape target sharding by including more unique labels in the source_labels values for the hashmod operation, but it may not be worth the effort – as you add more unique scrape targets, they’ll begin to even out.

One of the practical pain points you may have noticed already with sharding is that it’s honestly kind of a pain to have to navigate to multiple Prometheus instances to run queries. One of the ways we can try to make this easier is through federating our Prometheus instances.

Conclusion

In conclusion, sharding Prometheus is an effective way to manage the challenges posed by data volume and cardinality in your system. Whether you opt for sharding by service or through dynamic relabeling, both approaches offer ways to distribute scrape jobs across multiple Prometheus instances. While sharding via relabeling introduces more complexity, it also provides flexibility and scalability. However, it is important to consider the trade-offs, such as uneven distribution of scrape jobs and the need for tools like Thanos or federation to simplify querying across instances. By applying these strategies, you can ensure a more efficient and scalable Prometheus architecture.

Author Bio

Will Hegedus has worked in tech for over a decade in a variety of roles, most recently in Site Reliability Engineering. After becoming the first SRE at Linode, an independent cloud provider, he came to Akamai Technologies by way of an acquisition.

Now, Will manages a team of SREs focused on building an internal observability platform for Akamai’s Connected Cloud. His team's responsibilities include managing a global fleet of Prometheus servers ingesting millions of data points every second.

Will is an open-source advocate with contributions to Prometheus, Thanos, and other CNCF projects related to Kubernetes and observability. He lives in central Virginia with his wonderful wife, 4 kids, 3 cats, 2 dogs, and bearded dragon.

Chris Noxx
28 Oct 2024
10 min read

How to Land Music Placements and Chart on Billboard

This article is an excerpt from the book, A Power User's Guide to FL Studio 21, by Chris Noxx. Get a chance to learn from an FL Studio Power User to take your music productions to the next level using time-tested and decade-mastered production techniques. This book will uncover techniques for creating music in FL Studio and the best approaches to making your way to the Billboard charts.

Introduction

This article captures the essence of Chapter 8, “How to Get Records Placed So They Land on Billboard Charts,” of A Power User's Guide to FL Studio 21 by Chris Noxx, covering key areas such as placements, catalog building, rights, outreach, and types of deals, all while remaining true to the original content.

In the highly competitive music industry, getting records placed with major artists and landing on the Billboard charts is a dream for many producers. This chapter provides a comprehensive guide on how to achieve that dream by focusing on placements, catalog building, networking, and understanding the business deals that will help propel your career to new heights.

The Importance of the Journey

Before diving into the technicalities, it’s essential to recognize that the path to success is not straightforward. Embracing the journey, with all its ups and downs, is crucial for maintaining the dedication and perseverance needed to make it in the music industry.

What Are Record Placements?

Record placements are one of the most coveted opportunities in the music world. They involve getting your music featured on a major artist's album, single, or other releases. In addition to record placements, sync placements offer another avenue for revenue, with your music being used in television shows, commercials, films, or video games. Both types of placements provide significant exposure and the potential for substantial income.

Building and Valuing Your Music Catalog

Your catalog of music is a valuable asset. Each track you create holds the potential for future placements, and over time, your catalog can generate consistent revenue. It's essential to recognize that catalogs have become an alternative asset class. They can be bought, sold, or licensed, much like real estate or stocks, making them a crucial long-term investment for your career.

The value of a catalog is determined by various factors, including past performance, future sync potential, and overall demand. Even older tracks can increase in value when they find the right placement.

Understanding Rights and Income Streams

In the music business, revenue streams are primarily derived from two sides of the copyright: the publishing side and the master side.

Publishing side: This covers the composition, including melodies, chords, and lyrics.
Master side: This pertains to the actual sound recording.

Maximizing income from placements means retaining as much ownership as possible on both sides of the copyright. Having a solid understanding of these rights ensures you're protecting your work and maximizing your earnings.

Taking Action: The Key to Success

Opportunities rarely come to those who wait, which is why taking action is critical. Whether through networking, outreach, or consistent improvement of your music, positioning yourself in the right places at the right times is vital to your success.

Building relationships with key players in the industry (artists, managers, and A&Rs) is a fundamental step toward getting your music in front of the right people. Attend industry events, create meaningful connections, and ensure you're continuously improving your craft to stand out in a crowded field.

Two Primary Approaches to Securing Placements

There are two main strategies for landing placements:

Direct action: Actively pursuing placements by reaching out to artists, managers, and A&Rs.
Indirect action: Building your brand and reputation through content creation and networking, allowing opportunities to come to you over time.

Both approaches are essential and should be used together to maximize your chances of success. Consistent effort in both areas will yield the best results.

The Power of Cold Outreach

Cold outreach is a powerful, albeit often underutilized, tool in the industry. By reaching out to artists, managers, or other key players, you can introduce your work and potentially land a placement. Personalizing your outreach and demonstrating the value you bring to their projects will increase your chances of getting a response.

Building a Strong Lead List

Having a well-targeted lead list is crucial for cold outreach. Your list should include relevant artists, A&Rs, managers, and industry professionals who are likely to benefit from your music. The more focused your list, the better your chances of success when conducting outreach.

Types of Industry Deals

Understanding the types of deals available in the industry is essential for protecting your interests and maximizing your earnings. Here are some common deals that producers may encounter:

Co-publishing deals: You split ownership of the publishing rights with a publisher.
Administration deals: You maintain ownership of your rights but pay a third party to manage and administer them.
Traditional publishing deals: You assign your publishing rights to a company that manages your catalog in exchange for an upfront payment and royalties.
Self-publishing: You retain full control of your rights but take on the responsibilities of managing and administering them.
Production deals: You provide services to artists in exchange for a portion of the income generated by their music.
Management deals: A manager oversees your career and takes a percentage of your earnings.
Label deals: Contracts with record labels to distribute and promote your music.
Joint venture deals: Partnerships with labels or other companies to jointly promote and distribute your music.
Distribution deals: Agreements to distribute your music through a specific platform or company.

Each type of deal offers different benefits and trade-offs, and understanding which one best suits your goals will help you navigate the business side of the music industry.

Steps for Building a Successful Career

Success in the music industry doesn’t happen overnight. It requires dedication, persistence, and the willingness to take deliberate action. By following these steps (building a strong catalog, mastering the business aspects of music, and positioning yourself effectively), you can increase your chances of landing major placements and seeing your records rise on the Billboard charts.

Conclusion

Achieving success in the music industry, particularly landing records on the Billboard charts, requires more than just talent; it demands strategic planning, consistent action, and a deep understanding of the business side of music. From building a valuable catalog of songs to mastering the intricacies of publishing rights and making the most of both direct and indirect outreach, every step plays a vital role in your journey.

By positioning yourself in the right places, embracing opportunities through cold outreach, and networking with key industry players, you increase your chances of getting your music placed with major artists and securing lucrative sync placements. Understanding the various types of deals, from co-publishing to label agreements, further empowers you to protect your work and maximize your earnings.

At the heart of it all is the drive to continuously improve and take action. The music industry is competitive, but by combining creative mastery with smart business moves, you can create lasting success and potentially see your records climb the Billboard charts. Your journey is as much about persistence as it is about creativity. Embrace both to unlock your full potential.

Author Bio

Chris Noxx is an FL Studio Power User and JUNO-nominated (2020 Rap Recording of the Year) producer, composer, and arranger, who has charted on Billboard Charts over 12 times in the US and Canada, and has worked with some of the most iconic hip hop artists of all time using FL Studio, including Dr Dre, Chuck D (Public Enemy), KRS 1, RBX, The Outlawz, Nate Dogg, DJ Quik, Bone Thugs & Harmony, Kurupt, B Real (Cypress Hill), Tory Lanez, Classified, Crooked I, Faith Evans, Troy Ave, Ras Kass, Bishop Lamont, Seether, Talib Kweli, Xzibit, Waka Flocka Flame, and Lloyd Banks & Young Buck (G-Unit).
Bernard Pineda
25 Oct 2024
10 min read

How to Install Ruby on Rails: A Comprehensive Guide for macOS, Windows, and Linux

This article is an excerpt from the book, From PHP to Ruby on Rails, by Bernard Pineda. This book will help you adopt the Ruby mindset and get to grips with Ruby-related concepts. You'll learn about setting up your local environment, Ruby syntax, popular frameworks, and more. A language-agnostic approach will help you avoid common pitfalls and start integrating Ruby into your projects. Introduction Just like the libraries we’ve seen so far, Rails is an open source gem. It behaves a little differently than the gems we’ve seen so far as it uses many dependencies and can generate code examples, but at the end of the day, it’s still a gem. This means that we can either install it by itself, or we can include it in a Gemfile. For this section, we will have to divide the process into three separate sections – macOS installation, Windows installation, and Linux installation – as each operating system behaves differently. Installing Ruby on Rails on macOS The first step of setting up our local environment is to install rbenv. For most Mac installations, brew will simplify this process. Let’s get started with the steps: 1. Let’s open a shell and run the following command: brew install rbenv 2. This should install the rbenv program. Now, you’ll need to add the following line to your bash profile: eval "$(rbenv init -)" 3. Once you’ve added this line to your profile, you should activate the change by either opening a new shell or running the following command: source ~/.bash_profile Note that this command may differ if you’re using another shell, such as zsh or fish. 4. With rbenv installed, we need to install Ruby 2.6.10 with the following command: rbenv install 2.6.10 5. Once Ruby 2.6.10 has been installed, we must set the default Ruby version with the following command: rbenv global 2.6.10 6. Now, we need to install the program to manage gems, called bundler. 
Let’s install it with the following command: gem install bundler With that, our environment is ready for the next steps in this chapter. If you wish to see more details about this installation, please refer to the following web page: https:// www.digitalocean.com/community/tutorials/how-to-install-ruby-onrails-with-rbenv-on-macos. Installing Ruby on Rails on Windows Follow these steps to install Ruby on Rails on Windows: 1. To set up our local environment, first, we must install Git for Windows. We can download the package from https://gitforwindows.org/. Once downloaded, we can run the installer; it should open the installer application:  Figure 7.1 – Git installer You can safely accept the default options unless you want to change any of the specific behavior from Git. At the end of the installation process, you may just deselect all the options of the wizard and move on to the next step:  Figure 7.2 – Git finalized installation We will also need the Git SDK installed for some dependencies that Ruby on Rails requires.  We can get the installer from https://github.com/git-for-windows/buildextra/releases/tag/git-sdk-1.0.8. Be careful and select the correct option for your platform (32 or 64 bits). In my case, I had to choose 64 bits, so I downloaded the git-sdk-installer-1.0.8.0-64.7z.exe binary:  Figure 7.3 – Git SDK download Once this package has been downloaded, run it; we will be asked where we want the Git SDK to be installed. The default option is fine (C:\git-sdk-64):  Figure 7.4 – Git SDK installation location This package might take a while to complete as it has to download other additional packages but it will do so on its own. Please be patient. Once this package has finished installing the SDK, it will open a Git Bash console, which looks similar to Windows PowerShell. We can close this Git Bash console window and open another Windows PowerShell. 
Once we have the new window open, we must type the following command:

    new-item -type file -path $profile -force

This command creates a Windows PowerShell profile, which allows us to execute commands every time we open a Windows PowerShell console. Once we’ve run it, we can close the Windows PowerShell window and move on to the next step.

4. At this point, we will install rbenv, which allows us to install multiple versions of Ruby. However, this program wasn’t created for Windows, so its installation is a little different than on other operating systems. Let’s open a browser and go to the rbenv for Windows web page: https://github.com/ccmywish/rbenv-for-windows. On that page, we will find instructions on how to install rbenv, which we will follow now.

5. Open a new Windows PowerShell window and type the following command:

    $env:RBENV_ROOT = "C:\Ruby-on-Windows"

This command sets an environment variable that the rbenv installation will use.

6. Once we’ve run this command, we must download the rest of the required files with the following command:

    iwr -useb "https://github.com/ccmywish/rbenv-for-windows/raw/main/tools/install.ps1" | iex

7. Once this command has finished downloading the files from GitHub, modify the user’s profile by running the following command in Windows PowerShell:

    notepad $profile

This will open the profile we created earlier in Notepad. The rbenv-for-windows web page shows what the content of the file should be. Let’s add it with Notepad so that the profile file looks like this:

    $env:RBENV_ROOT = "C:\Ruby-on-Windows"
    & "$env:RBENV_ROOT\rbenv\bin\rbenv.ps1" init

8. Save and close Notepad, and close any Windows PowerShell windows that are still open. Then, open a new Windows PowerShell window for these changes to take effect. As this is the first time rbenv is running, our console will automatically install a default Ruby version.
This might take a while and will put our patience to the test. Once the process has finished, we should see output similar to this:

Figure 7.5 – rbenv post-installation script

9. Now, we are ready to install other versions of Ruby. For Ruby on Rails 5, we will install Ruby 2.6.10. Let’s install it by running the following command in the Windows PowerShell window that we just opened:

    rbenv install 2.6.10

The program will ask us whether we want to install the Lite version or the Full version. Choose the Full version. Once again, this might take a while, so please be patient.

10. Once this command has finished running, we must set this Ruby version for our whole system. We can do this by running the following command:

    rbenv global 2.6.10

11. To confirm that this version of Ruby has been installed and enabled, use the following command:

    ruby --version

This should give us the following output:

    ruby 2.6.10-1 (set by C:\Ruby-on-Windows\global.txt)

12. Ruby needs a program called bundler to manage all the dependencies on our system. So, let’s install this program with the following command:

    gem install bundler

13. Once this gem has been installed, we must update the RubyGems system with the following command:

    gem update --system 3.2.3

This command will also take a while to run, but once it’s finished, we will be ready to use Ruby on Rails on Windows. Next, let’s see the steps for installing Ruby on Rails on Linux.

Installing Ruby on Rails on Linux

For Ubuntu and Debian Linux distributions, we must also install rbenv and the dependencies necessary for Ruby on Rails to run correctly:

1. Let’s start by opening a terminal and running the following command:

    sudo apt update

2. Once this command has finished updating the apt package index, we must install our dependencies for Ruby, Ruby on Rails, and some gems that require compiling.
We’ll do so by running the following command:

    sudo apt install git curl libssl-dev libreadline-dev zlib1g-dev autoconf bison build-essential libyaml-dev libncurses5-dev libffi-dev libgdbm-dev pkg-config sqlite3 nodejs

This command might take a while.

3. Once it has finished running, we can install rbenv with the following command:

    curl -fsSL https://github.com/rbenv/rbenv-installer/raw/HEAD/bin/rbenv-installer | bash

4. We should add rbenv to our $PATH. Let’s do so by running the following command:

    echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc

5. Now, let’s add the initialize command for rbenv to our bash profile with the following command:

    echo 'eval "$(rbenv init -)"' >> ~/.bashrc

6. Next, reload the bash profile with the following command:

    source ~/.bashrc

7. This command will (among other things) make the rbenv executable available to us. Now, we can install Ruby 2.6.10 on our system with the following command:

    rbenv install 2.6.10

This command might take a while, as it also installs openssl.

8. Once this command has finished installing Ruby 2.6.10, we need to set it as the default Ruby version for the whole machine. We can do so by running the following command:

    rbenv global 2.6.10

9. We can confirm that this version of Ruby has been installed by running the following command:

    ruby --version

This will result in the following output:

    ruby 2.6.10p210 (2022-04-12 revision 67958) [x86_64-linux]

10. Ruby needs a program called bundler to manage all the dependencies on our system. So, let’s install this program with the following command:

    gem install bundler

11. Once this gem has been installed, we can update the RubyGems system with the following command:

    gem update --system 3.2.3

This command will also take a while to run, but once it’s finished, we will be ready to use Ruby on Rails on Linux.
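On any of the three platforms, the final check is the same: ruby --version should report 2.6.10. That comparison can be scripted so it does not rely on reading the output by eye. The following is a minimal sketch and not part of the book: the function names extract_ruby_version and check_ruby_version are hypothetical helpers, and the only assumption is that the version is the second word of the output, optionally followed by a suffix such as p210 or -1, as in the outputs shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not from the book) for checking `ruby --version` output.

# Print the bare version number from a `ruby --version` line.
# Assumes the version is the second space-separated word.
extract_ruby_version() {
  rest=${1#* }         # drop the leading word ("ruby")
  field=${rest%% *}    # keep the next word, e.g. "2.6.10p210"
  # Strip a patch-level or build suffix such as "p210" or "-1".
  printf '%s\n' "${field%%[-p]*}"
}

# Compare an expected version with a `ruby --version` line.
# Returns 0 on a match and 1 otherwise.
check_ruby_version() {
  expected=$1
  actual=$(extract_ruby_version "$2")
  if [ "$actual" = "$expected" ]; then
    echo "OK: Ruby $actual is active"
  else
    echo "MISMATCH: expected Ruby $expected, found $actual"
    return 1
  fi
}
```

With these functions loaded in the current shell, running check_ruby_version 2.6.10 "$(ruby --version)" should report a match if rbenv global took effect; a mismatch usually suggests that the rbenv init line has not been loaded in that shell yet.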
For other Linux distributions and other operating systems, please refer to the official Ruby-lang page: https://www.ruby-lang.org/en/documentation/installation/.

Conclusion

In conclusion, installing Ruby on Rails varies across operating systems, but the general process involves setting up a version manager like rbenv, installing Ruby, and then using Bundler to manage gems. Whether you're on macOS, Windows, or Linux, each system has specific steps to ensure Rails and its dependencies run smoothly. By following the detailed instructions for your platform, you'll have your development environment ready for Rails in no time. For further guidance and platform-specific nuances, refer to the official documentation and resources linked throughout this guide.

Author Bio

Bernard Pineda is a seasoned developer with 20 years of web development experience. Proficient in PHP, Ruby, Python, and other backend technologies, he has taught PHP and PHP-based frameworks through video courses on platforms like LinkedIn Learning. His extensive work with Ruby and Ruby on Rails, along with his curiosity about frontend development and game development, brings a diverse perspective to this book. Currently working as a Site Reliability Engineer in Silicon Valley, Bernard is always seeking new adventures.

Irena Cronin, Robert Scoble
24 Oct 2024
10 min read

Gaming in the Metaverse

This article is an excerpt from the book, The Immersive Metaverse Playbook for Business Leaders, by Irena Cronin, Robert Scoble. This book explains what the metaverse is and why it is of utmost value to business decision-makers. The chapters help you get a solid understanding of the concepts and roles that augmented reality and virtual reality play, along with providing information on metaverse technologies, as well as thought-provoking consumer and enterprise use cases.

Introduction

In the Metaverse’s expansive gaming landscape, several compelling use cases emerge. Gamers become creators and modifiers, democratizing game development, with quality control as a challenge. Cross-platform gaming integration fosters an inclusive gaming community, while blockchain-backed virtual merchandise and collectibles introduce new opportunities with authenticity and copyright concerns. Virtual esports tournaments become global events, requiring stringent security measures. In-game advertising and product placement offer marketing potential, but striking a balance with player experience is vital. These use cases exemplify the diverse facets of gaming in the Metaverse, highlighting innovation and challenges in the pursuit of immersive digital gaming experiences. Let’s take a closer look at some use cases.

Use case 1 – game creation and modification

This use case exemplifies how the Metaverse empowers gamers to become active contributors to the gaming industry, shaping its future through their creativity and innovation. It highlights the democratization of game development and the dynamic synergy between technology, interactivity, and the challenges that come with it in this evolving digital realm.

The setup

Within the expansive and thriving Metaverse gaming landscape, a remarkable facet emerges where 3D and 2D virtual gamers are not just players but empowered creators and modifiers of games themselves.
The Metaverse offers a vast canvas, brimming with opportunities for individuals and teams to craft unique gaming experiences that cater to a global audience.

Interactivity

In this immersive gaming domain, players transition into creators as they engage with innovative game creation and modification tools which include the use of generative AI. These tools empower users to design levels, characters, and gameplay mechanics, breathing life into their imaginative concepts. Collaborative platforms within the Metaverse foster teamwork, allowing multiple creators to combine their skills and ideas seamlessly.

Technical innovation

The Metaverse’s technical innovation shines through in the form of user-friendly game development platforms that bridge the gap between novice creators and experienced developers. These platforms offer intuitive interfaces, drag-and-drop functionality, and pre-built assets, making game design accessible to a wide range of enthusiasts. AI-driven game design assistance provides suggestions and optimizations, reducing the learning curve for newcomers. And with generative AI, soon whole 3D, as well as 2D, games could be fully developed.

Challenges

While the Metaverse fuels creativity and democratizes game development, several challenges emerge on this vibrant frontier. Balancing the influx of user-generated content with quality control becomes pivotal. Moderation systems must ensure that games meet basic quality standards and are free from malicious or inappropriate content. Additionally, striking a harmonious balance between open creativity and maintaining fair play in modified games poses an ongoing challenge. Ensuring that user-created content doesn’t disrupt the gaming experience for others is a priority. Continuous development and refinement of moderation and quality control mechanisms are essential to maintain a thriving and enjoyable gaming ecosystem within the Metaverse.
Use case 2 – cross-platform gaming integration

This use case illustrates how the Metaverse transcends the limitations of individual gaming platforms, fostering a more inclusive and interconnected gaming community. Cross-platform gaming integration enhances the social and competitive aspects of gaming, enabling players to unite in a shared virtual gaming universe. As the Metaverse continues to evolve, it reshapes the way we perceive and engage in gaming, offering a glimpse into the future of interactive entertainment.

The setup

Within the expansive Metaverse gaming landscape, cross-platform gaming integration becomes a prominent feature. This innovation allows players from various gaming platforms and devices to seamlessly interact and play together, breaking down traditional gaming silos.

Interactivity

In this interconnected Metaverse, players can engage in cross-platform gaming experiences with friends and gamers from around the world. Whether you’re on a PC, console, VR headset, or mobile device, you can join the same virtual gaming universe. Gamers can form diverse teams and alliances, fostering a sense of community that transcends hardware preferences. This integration offers unprecedented opportunities for collaboration and competition.

Technical innovation

The technical innovation driving this use case is the development of cross-platform compatibility protocols and infrastructure. These innovations bridge the gaps between different gaming ecosystems, allowing for cross-device gameplay. Advanced matchmaking algorithms ensure that players of similar skill levels can enjoy fair and balanced gaming experiences, regardless of their chosen platform. This technical integration transforms the Metaverse into a truly inclusive gaming space.

Challenges

While cross-platform gaming integration is a remarkable achievement, it comes with its own set of challenges.
Ensuring a level playing field for all players, regardless of their platform, requires ongoing fine-tuning of matchmaking algorithms. Addressing potential disparities in hardware capabilities, such as graphics processing power, can be complex. Additionally, maintaining a secure gaming environment across diverse platforms is essential to prevent cheating, unauthorized access, and other security concerns.

Use case 3 – game-related merchandise and collectibles

This use case showcases how the Metaverse transforms the concept of gaming merchandise and collectibles, offering a virtual marketplace where gamers can not only enhance their in-game experiences but also indulge in their passion for collecting virtual treasures. The integration of blockchain technology adds a layer of trust and scarcity to these digital possessions, creating a virtual economy that mirrors the real-world collectibles market.

The setup

Within the Metaverse, a vibrant and bustling marketplace dedicated to gaming-related merchandise and collectibles emerges. This dynamic digital marketplace transforms the concept of gaming memorabilia, offering a diverse range of 3D and 2D virtual goods that hold significant value for gamers and collectors alike. It’s a virtual bazaar where gamers can immerse themselves in the culture of their favorite games beyond the confines of traditional gameplay.

Interactivity

In this immersive Metaverse marketplace, players gain the opportunity to personalize their avatars with a rich array of virtual gaming apparel and accessories. Gamers can browse an extensive catalog of virtual merchandise, including iconic character costumes, in-game items, and exclusive skins. This personalized customization allows players to showcase their gaming identity and immerse themselves even deeper into their favorite game worlds.

Technical innovation

At the heart of this use case lies the groundbreaking implementation of blockchain technology.
This innovation plays a pivotal role in securing virtual collectibles, offering gamers a sense of rarity and ownership verification akin to physical collectibles. Each virtual item is tokenized on the blockchain, ensuring its uniqueness and provenance. Gamers can confidently buy, sell, and trade virtual merchandise, knowing that their digital possessions are genuine and scarce. In terms of the companies that offer game-related merchandise and collectibles, generative AI provides an inexpensive, fast, and easy way to create assets.

Challenges

While this Metaverse marketplace promises exciting opportunities, it also presents unique challenges. Ensuring the authenticity of virtual merchandise is paramount. The presence of counterfeit or unauthorized virtual items could undermine the trust and value within the marketplace. Additionally, addressing potential copyright issues related to virtual merchandise is a central concern. Striking a balance between allowing creative expression and protecting intellectual property rights is essential to maintaining a thriving and ethical marketplace.

Negative implications of gaming in the Metaverse

Gaming in the Metaverse, while promising incredible innovation and immersive experiences, also carries negative implications that span technological, social, and ethical dimensions. These potential drawbacks must be considered alongside the benefits to ensure a balanced perspective on this digital frontier.

Technological implications

Dependency on technology: As gaming in the Metaverse becomes increasingly sophisticated, there is a risk of individuals becoming overly dependent on technology for their entertainment and social interactions. This dependence may lead to issues related to screen time, addiction, and reduced physical activity.

Technical glitches: The reliance on advanced technology for immersive gaming experiences introduces the possibility of technical glitches, server outages, or compatibility issues.
These disruptions can frustrate players and disrupt their gaming experiences.

Privacy concerns: The collection and utilization of user data within the Metaverse for targeted advertising and analytics can raise privacy concerns. Users may feel uncomfortable with the extent to which their online activities are monitored and analyzed.

Social implications

Social isolation: Immersive gaming experiences in the Metaverse could lead to social isolation as individuals spend more time in virtual environments and less time in physical social interactions. Loneliness and a lack of real-world social skills can result from excessive immersion.

Economic disparities: Access to the Metaverse and its premium gaming experiences may be limited by socioeconomic factors. Those with greater financial resources may enjoy a significant advantage, potentially creating digital divides and exclusivity.

Loss of physical interaction: The allure of the Metaverse may lead to a reduction in face-to-face social interactions, which are crucial for human well-being. The diminished importance of real-world connections could have adverse effects on mental health and relationships.

Ethical implications

Exploitative monetization: In-game purchases and microtransactions within the Metaverse can sometimes exploit players, particularly younger individuals who may not fully understand the financial implications. This raises ethical questions about the gaming industry’s practices.

Digital addiction: The highly immersive nature of gaming in the Metaverse may contribute to digital addiction, where individuals struggle to disengage from virtual experiences and prioritize them over real-world responsibilities.

Content regulation: Balancing freedom of expression and maintaining a safe and inclusive gaming environment can be challenging. The Metaverse may struggle with regulating hate speech, inappropriate content, and cyberbullying.
Psychological implications

Escapism: While gaming can be a form of entertainment, excessive escapism into the Metaverse may indicate underlying psychological issues or a desire to avoid real-world problems.

Impact on mental health: Long hours spent in virtual gaming worlds may lead to mental health issues such as anxiety, depression, and a distorted sense of reality.

Cognitive overload: The complexity of immersive gaming experiences within the Metaverse can lead to cognitive overload, especially in younger players, potentially impacting their academic performance and cognitive development.

Environmental implications

Energy consumption: The infrastructure required to support the Metaverse’s immersive experiences and multiplayer environments can consume significant amounts of energy, contributing to environmental concerns.

Electronic waste: As technology evolves rapidly, older gaming equipment and hardware can quickly become obsolete, leading to electronic waste disposal challenges.

Conclusion

In conclusion, the Metaverse is revolutionizing gaming with new opportunities for creativity, community, and commerce. It empowers gamers as creators, enables cross-platform play, introduces blockchain-backed collectibles, and hosts virtual esports tournaments. However, these advancements come with challenges like quality control, security, and balancing ads with player experience. Additionally, potential negative impacts such as technological dependency, social isolation, and ethical concerns must be addressed. By fostering innovation responsibly, the Metaverse can become a transformative and enriching space for gamers worldwide.

Author Bio

Irena Cronin is SVP of Product for DADOS Technology, which is making an Apple Vision Pro data analytics and visualization app. She is also the CEO of Infinite Retina, which helps companies develop and implement AI, AR, and other new technologies for their businesses.
Before this, she worked as an equity research analyst and gained extensive experience in evaluating both public and private companies. Cronin has an MS with Distinction in Information Technology/Management and Systems from New York University, and a joint MBA/MA from the University of Southern California. She has a BA from the University of Pennsylvania with a major in Economics (summa cum laude). Cronin speaks four languages, with a near-fluent proficiency in Mandarin.

Robert Scoble has coauthored four books on technology innovation – each a decade before the said technology went completely mainstream. He has interviewed thousands of entrepreneurs in the tech industry and has long kept his social media audiences up to date on what is happening inside the world of tech, which is bringing us so many innovations. Robert currently tracks the AI industry and is the host of a new video show, Unaligned, where he interviews entrepreneurs from the thousands of AI companies he tracks as head of strategy for Infinite Retina.