
How-To Tutorials


How to Land Music Placements and Chart on Billboard

Chris Noxx
28 Oct 2024
10 min read
This article is an excerpt from the book, A Power User's Guide to FL Studio 21, by Chris Noxx. Get a chance to learn from an FL Studio Power User to take your music productions to the next level using time-tested and decade-mastered production techniques. This book will uncover techniques for creating music in FL Studio and the best approaches to making your way to the Billboard charts.

Introduction

This article captures the essence of "Chapter 8 – How to Get Records Placed So They Land on Billboard Charts" of the book A Power User's Guide to FL Studio 21, written by Chris Noxx, covering key areas such as placements, catalog building, rights, outreach, and types of deals, all while remaining true to the original content.

In the highly competitive music industry, getting records placed with major artists and landing on the Billboard charts is a dream for many producers. This chapter provides a comprehensive guide on how to achieve that dream by focusing on placements, catalog building, networking, and understanding the business deals that will help propel your career to new heights.

The Importance of the Journey

Before diving into the technicalities, it's essential to recognize that the path to success is not straightforward. Embracing the journey, with all its ups and downs, is crucial for maintaining the dedication and perseverance needed to make it in the music industry.

What Are Record Placements?

Record placements are one of the most coveted opportunities in the music world. They involve getting your music featured on a major artist's album, single, or other releases. In addition to record placements, sync placements offer another avenue for revenue, with your music being used in television shows, commercials, films, or video games. Both types of placements provide significant exposure and the potential for substantial income.

Building and Valuing Your Music Catalog

Your catalog of music is a valuable asset. Each track you create holds the potential for future placements, and over time, your catalog can generate consistent revenue. It's essential to recognize that catalogs have become an alternative asset class. They can be bought, sold, or licensed, much like real estate or stocks, making them a crucial long-term investment for your career. The value of a catalog is determined by various factors, including past performance, future sync potential, and overall demand. Even older tracks can increase in value when they find the right placement.

Understanding Rights and Income Streams

In the music business, revenue streams are primarily derived from two sides of the copyright: the publishing side and the master side.

Publishing side: This covers the composition, including melodies, chords, and lyrics.
Master side: This pertains to the actual sound recording.

Maximizing income from placements means retaining as much ownership as possible on both sides of the copyright. Having a solid understanding of these rights ensures you're protecting your work and maximizing your earnings.

Taking Action: The Key to Success

Opportunities rarely come to those who wait, which is why taking action is critical. Whether through networking, outreach, or consistent improvement of your music, positioning yourself in the right places at the right times is vital to your success. Building relationships with key players in the industry—artists, managers, A&Rs—is a fundamental step toward getting your music in front of the right people.
Attend industry events, create meaningful connections, and ensure you're continuously improving your craft to stand out in a crowded field.

Two Primary Approaches to Securing Placements

There are two main strategies for landing placements:

Direct action: Actively pursuing placements by reaching out to artists, managers, and A&Rs.
Indirect action: Building your brand and reputation through content creation and networking, allowing opportunities to come to you over time.

Both approaches are essential and should be used together to maximize your chances of success. Consistent effort in both areas will yield the best results.

The Power of Cold Outreach

Cold outreach is a powerful, albeit often underutilized, tool in the industry. By reaching out to artists, managers, or other key players, you can introduce your work and potentially land a placement. Personalizing your outreach and demonstrating the value you bring to their projects will increase your chances of getting a response.

Building a Strong Lead List

Having a well-targeted lead list is crucial for cold outreach. Your list should include relevant artists, A&Rs, managers, and industry professionals who are likely to benefit from your music. The more focused your list, the better your chances of success when conducting outreach.

Types of Industry Deals

Understanding the types of deals available in the industry is essential for protecting your interests and maximizing your earnings. Here are some common deals that producers may encounter:

Co-publishing deals: Where you split ownership of the publishing rights with a publisher.
Administration deals: You maintain ownership of your rights but pay a third party to manage and administer them.
Traditional publishing deals: You assign your publishing rights to a company that manages your catalog in exchange for an upfront payment and royalties.
Self-publishing: You retain full control of your rights but take on the responsibilities of managing and administering them.
Production deals: Where you provide services to artists in exchange for a portion of the income generated by their music.
Management deals: An agreement where a manager oversees your career and takes a percentage of your earnings.
Label deals: Contracts with record labels to distribute and promote your music.
Joint venture deals: Partnerships with labels or other companies to jointly promote and distribute your music.
Distribution deals: Agreements to distribute your music through a specific platform or company.

Each type of deal offers different benefits and trade-offs, and understanding which one best suits your goals will help you navigate the business side of the music industry.

Steps for Building a Successful Career

Success in the music industry doesn't happen overnight. It requires dedication, persistence, and the willingness to take deliberate action. By following these steps—building a strong catalog, mastering the business aspects of music, and positioning yourself effectively—you can increase your chances of landing major placements and seeing your records rise on the Billboard charts.

Conclusion

Achieving success in the music industry, particularly landing records on the Billboard charts, requires more than just talent; it demands strategic planning, consistent action, and a deep understanding of the business side of music. From building a valuable catalog of songs to mastering the intricacies of publishing rights and making the most of both direct and indirect outreach, every step plays a vital role in your journey.
By positioning yourself in the right places, embracing opportunities through cold outreach, and networking with key industry players, you increase your chances of getting your music placed with major artists and securing lucrative sync placements. Understanding the various types of deals, from co-publishing to label agreements, further empowers you to protect your work and maximize your earnings. At the heart of it all is the drive to continuously improve and take action. The music industry is competitive, but by combining creative mastery with smart business moves, you can create lasting success and potentially see your records climb the Billboard charts. Your journey is as much about persistence as it is about creativity—embrace both to unlock your full potential.

Author Bio

Chris Noxx is an FL Studio Power User and JUNO-nominated (2020 Rap Recording of the Year) producer, composer, and arranger who has charted on the Billboard charts over 12 times in the US and Canada, and has worked with some of the most iconic hip hop artists of all time using FL Studio (including Dr Dre, Chuck D (Public Enemy), KRS 1, RBX, The Outlawz, Nate Dogg, DJ Quik, Bone Thugs & Harmony, Kurupt, B Real (Cypress Hill), Tory Lanez, Classified, Crooked I, Faith Evans, Troy Ave, Ras Kass, Bishop Lamont, Seether, Talib Kweli, Xzibit, Waka Flocka Flame, Lloyd Banks, and Young Buck (G-Unit)).

How to Install Ruby on Rails: A Comprehensive Guide for macOS, Windows, and Linux

Bernard Pineda
25 Oct 2024
10 min read
This article is an excerpt from the book, From PHP to Ruby on Rails, by Bernard Pineda. This book will help you adopt the Ruby mindset and get to grips with Ruby-related concepts. You'll learn about setting up your local environment, Ruby syntax, popular frameworks, and more. A language-agnostic approach will help you avoid common pitfalls and start integrating Ruby into your projects.

Introduction

Just like the libraries we've seen so far, Rails is an open source gem. It behaves a little differently than the gems we've seen so far as it uses many dependencies and can generate code examples, but at the end of the day, it's still a gem. This means that we can either install it by itself, or we can include it in a Gemfile. For this section, we will have to divide the process into three separate sections – macOS installation, Windows installation, and Linux installation – as each operating system behaves differently.

Installing Ruby on Rails on macOS

The first step of setting up our local environment is to install rbenv. For most Mac installations, brew will simplify this process. Let's get started with the steps:

1. Let's open a shell and run the following command:

brew install rbenv

2. This should install the rbenv program. Now, you'll need to add the following line to your bash profile:

eval "$(rbenv init -)"

3. Once you've added this line to your profile, you should activate the change by either opening a new shell or running the following command:

source ~/.bash_profile

Note that this command may differ if you're using another shell, such as zsh or fish.

4. With rbenv installed, we need to install Ruby 2.6.10 with the following command:

rbenv install 2.6.10

5. Once Ruby 2.6.10 has been installed, we must set the default Ruby version with the following command:

rbenv global 2.6.10

6. Now, we need to install the program to manage gems, called bundler. Let's install it with the following command:

gem install bundler

With that, our environment is ready for the next steps in this chapter. If you wish to see more details about this installation, please refer to the following web page: https://www.digitalocean.com/community/tutorials/how-to-install-ruby-on-rails-with-rbenv-on-macos.
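Once the environment is ready, Rails itself can be installed either directly as a gem or through a Gemfile, as mentioned in the introduction. As a hedged illustration (the exact Rails version is an assumption here, chosen to match the Ruby 2.6.10 / Rails 5 pairing used later in this chapter), a minimal Gemfile might look like this:

source "https://rubygems.org"
gem "rails", "~> 5.2"

Running bundle install in the same directory would then pull in Rails and its dependencies, while gem install rails -v "~> 5.2" would install it standalone without a Gemfile.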
Installing Ruby on Rails on Windows

Follow these steps to install Ruby on Rails on Windows:

1. To set up our local environment, first, we must install Git for Windows. We can download the package from https://gitforwindows.org/. Once downloaded, we can run the installer; it should open the installer application:

Figure 7.1 – Git installer

You can safely accept the default options unless you want to change any of the specific behavior from Git. At the end of the installation process, you may just deselect all the options of the wizard and move on to the next step:

Figure 7.2 – Git finalized installation

2. We will also need the Git SDK installed for some dependencies that Ruby on Rails requires. We can get the installer from https://github.com/git-for-windows/build-extra/releases/tag/git-sdk-1.0.8. Be careful and select the correct option for your platform (32 or 64 bits). In my case, I had to choose 64 bits, so I downloaded the git-sdk-installer-1.0.8.0-64.7z.exe binary:

Figure 7.3 – Git SDK download

3. Once this package has been downloaded, run it; we will be asked where we want the Git SDK to be installed. The default option is fine (C:\git-sdk-64):

Figure 7.4 – Git SDK installation location

This package might take a while to complete as it has to download other additional packages, but it will do so on its own. Please be patient.

4. Once this package has finished installing the SDK, it will open a Git Bash console, which looks similar to Windows PowerShell. We can close this Git Bash console window and open another Windows PowerShell. Once we have the new window open, we must type the following command:

new-item -type file -path $profile -force

This command will help us create a Windows PowerShell profile, which will allow us to execute commands every time we open a Windows PowerShell console. Once we've run the previous command, we may also close the Windows PowerShell window, and move on to the next step.

5. At this point, we will install rbenv, which allows us to install multiple versions of Ruby. However, this program wasn't created for Windows, so its installation is a little different than in other operating systems. Let's open a browser and go to the rbenv for Windows web page: https://github.com/ccmywish/rbenv-for-windows. On that page, we will find instructions on how to install rbenv, which we will do now. Let's open a new Windows PowerShell and type the following command:

$env:RBENV_ROOT = "C:\Ruby-on-Windows"

This command will set a special environment variable that will be used for the rbenv installation.

6. Once we've run this command, we must download the rest of the required files with the following command:

iwr -useb "https://github.com/ccmywish/rbenv-for-windows/raw/main/tools/install.ps1" | iex

7. Once this command has finished downloading the files from GitHub, modify the user's profile with the following command from within the Windows PowerShell:

notepad $profile

8. This will open the Notepad application and the profile we previously set. On the rbenv-for-windows web page, we can see what the content of the file should be. Let's add it with Notepad so that the profile file now looks like this:

$env:RBENV_ROOT = "C:\Ruby-on-Windows"
& "$env:RBENV_ROOT\rbenv\bin\rbenv.ps1" init

Save and close Notepad, and close all Windows PowerShell windows that we may have open. We should open a new Windows PowerShell to make these changes take effect. As this is the first time rbenv is running, our console will automatically install a default Ruby version. This might take a while and will put our patience to the test. Once the process has finished, we should see an output similar to this one:

Figure 7.5 – rbenv post-installation script

9. Now, we are ready to install other versions of Ruby. For Ruby on Rails 5, we will install Ruby 2.6.10. Let's install it by running the following command on the same Windows PowerShell window that we just opened:

rbenv install 2.6.10

The program will ask us whether we want to install the Lite version or the Full version. Choose the Full version. Once again, this might take a while, so please be patient.

10. Once this command has finished running, we must set this Ruby version for our whole system. We can do this by running the following command:

rbenv global 2.6.10

11. To confirm that this version of Ruby has been installed and enabled, use the following command:

ruby --version

This should give us the following output:

ruby 2.6.10-1 (set by C:\Ruby-on-Windows\global.txt)

12. Ruby needs a program called bundler to manage all the dependencies on our system. So, let's install this program with the following command:

gem install bundler
13. Once this gem has been installed, we must update the RubyGems system with the following command:

gem update --system 3.2.3

This command will also take a while to compute, but once it's finished, we will be ready to use Ruby on Rails on Windows. Next, let's see the steps for installing Ruby on Rails on Linux.

Installing Ruby on Rails on Linux

For Ubuntu and Debian Linux distributions, we must also install rbenv and the dependencies necessary for Ruby on Rails to run correctly:

1. Let's start by opening a terminal and running the following command:

sudo apt update

2. Once this command has finished updating apt, we must install our dependencies for Ruby, Ruby on Rails, and some gems that require compiling. We'll do so by running the following command:

sudo apt install git curl libssl-dev libreadline-dev zlib1g-dev autoconf bison build-essential libyaml-dev libreadline-dev libncurses5-dev libffi-dev libgdbm-dev pkg-config sqlite3 nodejs

This command might take a while.

3. Once it has finished running, we can install rbenv with the following command:

curl -fsSL https://github.com/rbenv/rbenv-installer/raw/HEAD/bin/rbenv-installer | bash

4. We should add rbenv to our $PATH. Let's do so by running the following command:

echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc

5. Now, let's add the initialize command for rbenv to our bash profile with the following command:

echo 'eval "$(rbenv init -)"' >> ~/.bashrc

6. Next, run the bash profile with the following command:

source ~/.bashrc

7. This command will (among other things) make the rbenv executable available to us. Now, we can install Ruby 2.6.10 on our system with the following command:

rbenv install 2.6.10

This command might take a little while as it installs openssl, and that process will take some time.

8. Once this command has finished installing Ruby 2.6.10, we need to set it as the default Ruby version for the whole machine. We can do so by running the following command:

rbenv global 2.6.10

9. We can confirm that this version of Ruby has been installed by running the following command:

ruby --version

This will result in the following output:

ruby 2.6.10p210 (2022-04-12 revision 67958) [x86_64-linux]

10. Ruby needs a program called bundler to manage all the dependencies on our system. So, let's install this program with the following command:

gem install bundler

11. Once this gem has been installed, we can update the RubyGems system with the following command:

gem update --system 3.2.3

This command will also take a while to compute, but once it's finished, we will be ready to use Ruby on Rails on Linux. For other Linux distributions and other operating systems, please refer to the official Ruby-lang page: https://www.ruby-lang.org/en/documentation/installation/.

Conclusion

In conclusion, installing Ruby on Rails varies across operating systems, but the general process involves setting up a version manager like rbenv, installing Ruby, and then using Bundler to manage gems. Whether you're on macOS, Windows, or Linux, each system has specific steps to ensure Rails and its dependencies run smoothly. By following the detailed instructions for your platform, you'll have your development environment ready for Rails in no time. For further guidance and platform-specific nuances, refer to the official documentation and resources linked throughout this guide.

Author Bio

Bernard Pineda is a seasoned developer with 20 years of web development experience.
Proficient in PHP, Ruby, Python, and other backend technologies, he has taught PHP and PHP-based frameworks through video courses on platforms like LinkedIn Learning. His extensive work with Ruby and Ruby on Rails, along with his curiosity in frontend development and game development, brings a diverse perspective to this book. Currently working as a Site Reliability Engineer in Silicon Valley, Bernard is always seeking new adventures.

Gaming in the Metaverse

Irena Cronin, Robert Scoble
24 Oct 2024
10 min read
This article is an excerpt from the book, The Immersive Metaverse Playbook for Business Leaders, by Irena Cronin, Robert Scoble. This book explains what the metaverse is and why it is of utmost value to business decision-makers. The chapters help you get a solid understanding of the concepts and roles that augmented reality and virtual reality play, along with providing information on metaverse technologies, as well as thought-provoking consumer and enterprise use cases.

Introduction

In the Metaverse's expansive gaming landscape, several compelling use cases emerge. Gamers become creators and modifiers, democratizing game development, with quality control as a challenge. Cross-platform gaming integration fosters an inclusive gaming community, while blockchain-backed virtual merchandise and collectibles introduce new opportunities with authenticity and copyright concerns. Virtual esports tournaments become global events, requiring stringent security measures. In-game advertising and product placement offer marketing potential, but striking a balance with player experience is vital. These use cases exemplify the diverse facets of gaming in the Metaverse, highlighting innovation and challenges in the pursuit of immersive digital gaming experiences. Let's take a closer look at some use cases.

Use case 1 – game creation and modification

This use case exemplifies how the Metaverse empowers gamers to become active contributors to the gaming industry, shaping its future through their creativity and innovation. It highlights the democratization of game development and the dynamic synergy between technology, interactivity, and the challenges that come with it in this evolving digital realm.

The setup

Within the expansive and thriving Metaverse gaming landscape, a remarkable facet emerges where 3D and 2D virtual gamers are not just players but empowered creators and modifiers of games themselves. The Metaverse offers a vast canvas, brimming with opportunities for individuals and teams to craft unique gaming experiences that cater to a global audience.

Interactivity

In this immersive gaming domain, players transition into creators as they engage with innovative game creation and modification tools which include the use of generative AI. These tools empower users to design levels, characters, and gameplay mechanics, breathing life into their imaginative concepts. Collaborative platforms within the Metaverse foster teamwork, allowing multiple creators to combine their skills and ideas seamlessly.

Technical innovation

The Metaverse's technical innovation shines through in the form of user-friendly game development platforms that bridge the gap between novice creators and experienced developers. These platforms offer intuitive interfaces, drag-and-drop functionality, and pre-built assets, making game design accessible to a wide range of enthusiasts. AI-driven game design assistance provides suggestions and optimizations, reducing the learning curve for newcomers. And with generative AI, soon whole 3D, as well as 2D, games could be fully developed.

Challenges

While the Metaverse fuels creativity and democratizes game development, several challenges emerge on this vibrant frontier. Balancing the influx of user-generated content with quality control becomes pivotal. Moderation systems must ensure that games meet basic quality standards and are free from malicious or inappropriate content.
Additionally, striking a harmonious balance between open creativity and maintaining fair play in modified games poses an ongoing challenge. Ensuring that user-created content doesn't disrupt the gaming experience for others is a priority. Continuous development and refinement of moderation and quality control mechanisms are essential to maintain a thriving and enjoyable gaming ecosystem within the Metaverse.

Use case 2 – cross-platform gaming integration

This use case illustrates how the Metaverse transcends the limitations of individual gaming platforms, fostering a more inclusive and interconnected gaming community. Cross-platform gaming integration enhances the social and competitive aspects of gaming, enabling players to unite in a shared virtual gaming universe. As the Metaverse continues to evolve, it reshapes the way we perceive and engage in gaming, offering a glimpse into the future of interactive entertainment.

The setup

Within the expansive Metaverse gaming landscape, cross-platform gaming integration becomes a prominent feature. This innovation allows players from various gaming platforms and devices to seamlessly interact and play together, breaking down traditional gaming silos.

Interactivity

In this interconnected Metaverse, players can engage in cross-platform gaming experiences with friends and gamers from around the world. Whether you're on a PC, console, VR headset, or mobile device, you can join the same virtual gaming universe. Gamers can form diverse teams and alliances, fostering a sense of community that transcends hardware preferences. This integration offers unprecedented opportunities for collaboration and competition.

Technical innovation

The technical innovation driving this use case is the development of cross-platform compatibility protocols and infrastructure. These innovations bridge the gaps between different gaming ecosystems, allowing for cross-device gameplay. Advanced matchmaking algorithms ensure that players of similar skill levels can enjoy fair and balanced gaming experiences, regardless of their chosen platform. This technical integration transforms the Metaverse into a truly inclusive gaming space.

Challenges

While cross-platform gaming integration is a remarkable achievement, it comes with its own set of challenges. Ensuring a level playing field for all players, regardless of their platform, requires ongoing fine-tuning of matchmaking algorithms. Addressing potential disparities in hardware capabilities, such as graphics processing power, can be complex. Additionally, maintaining a secure gaming environment across diverse platforms is essential to prevent cheating, unauthorized access, and other security concerns.

Use case 3 – game-related merchandise and collectibles

This use case showcases how the Metaverse transforms the concept of gaming merchandise and collectibles, offering a virtual marketplace where gamers can not only enhance their in-game experiences but also indulge in their passion for collecting virtual treasures. The integration of blockchain technology adds a layer of trust and scarcity to these digital possessions, creating a virtual economy that mirrors the real-world collectibles market.

The setup

Within the Metaverse, a vibrant and bustling marketplace dedicated to gaming-related merchandise and collectibles emerges. This dynamic digital marketplace transforms the concept of gaming memorabilia, offering a diverse range of 3D and 2D virtual goods that hold significant value for gamers and collectors alike.
It's a virtual bazaar where gamers can immerse themselves in the culture of their favorite games beyond the confines of traditional gameplay.

Interactivity

In this immersive Metaverse marketplace, players gain the opportunity to personalize their avatars with a rich array of virtual gaming apparel and accessories. Gamers can browse an extensive catalog of virtual merchandise, including iconic character costumes, in-game items, and exclusive skins. This personalized customization allows players to showcase their gaming identity and immerse themselves even deeper into their favorite game worlds.

Technical innovation

At the heart of this use case lies the groundbreaking implementation of blockchain technology. This innovation plays a pivotal role in securing virtual collectibles, offering gamers a sense of rarity and ownership verification akin to physical collectibles. Each virtual item is tokenized on the blockchain, ensuring its uniqueness and provenance. Gamers can confidently buy, sell, and trade virtual merchandise, knowing that their digital possessions are genuine and scarce. In terms of the companies that offer game-related merchandise and collectibles, generative AI provides an inexpensive, fast, and easy way to create assets.

Challenges

While this Metaverse marketplace promises exciting opportunities, it also presents unique challenges. Ensuring the authenticity of virtual merchandise is paramount. The presence of counterfeit or unauthorized virtual items could undermine the trust and value within the marketplace. Additionally, addressing potential copyright issues related to virtual merchandise is a central concern. Striking a balance between allowing creative expression and protecting intellectual property rights is essential to maintaining a thriving and ethical marketplace.

Negative implications of gaming in the Metaverse

Gaming in the Metaverse, while promising incredible innovation and immersive experiences, also carries negative implications that span technological, social, and ethical dimensions. These potential drawbacks must be considered alongside the benefits to ensure a balanced perspective on this digital frontier.

Technological implications

Dependency on technology: As gaming in the Metaverse becomes increasingly sophisticated, there is a risk of individuals becoming overly dependent on technology for their entertainment and social interactions. This dependence may lead to issues related to screen time, addiction, and reduced physical activity.
Technical glitches: The reliance on advanced technology for immersive gaming experiences introduces the possibility of technical glitches, server outages, or compatibility issues. These disruptions can frustrate players and disrupt their gaming experiences.
Privacy concerns: The collection and utilization of user data within the Metaverse for targeted advertising and analytics can raise privacy concerns. Users may feel uncomfortable with the extent to which their online activities are monitored and analyzed.

Social implications

Social isolation: Immersive gaming experiences in the Metaverse could lead to social isolation as individuals spend more time in virtual environments and less time in physical social interactions. Loneliness and a lack of real-world social skills can result from excessive immersion.
Economic disparities: Access to the Metaverse and its premium gaming experiences may be limited by socioeconomic factors. Those with greater financial resources may enjoy a significant advantage, potentially creating digital divides and exclusivity.
Loss of physical interaction: The allure of the Metaverse may lead to a reduction in face-to-face social interactions, which are crucial for human well-being. The diminished importance of real-world connections could have adverse effects on mental health and relationships.

Ethical implications

Exploitative monetization: In-game purchases and microtransactions within the Metaverse can sometimes exploit players, particularly younger individuals who may not fully understand the financial implications. This raises ethical questions about the gaming industry's practices.
Digital addiction: The highly immersive nature of gaming in the Metaverse may contribute to digital addiction, where individuals struggle to disengage from virtual experiences and prioritize them over real-world responsibilities.
Content regulation: Balancing freedom of expression and maintaining a safe and inclusive gaming environment can be challenging. The Metaverse may struggle with regulating hate speech, inappropriate content, and cyberbullying.

Psychological implications

Escapism: While gaming can be a form of entertainment, excessive escapism into the Metaverse may indicate underlying psychological issues or a desire to avoid real-world problems.
Impact on mental health: Long hours spent in virtual gaming worlds may lead to mental health issues such as anxiety, depression, and a distorted sense of reality.
Cognitive overload: The complexity of immersive gaming experiences within the Metaverse can lead to cognitive overload, especially in younger players, potentially impacting their academic performance and cognitive development.

Environmental implications

Energy consumption: The infrastructure required to support the Metaverse's immersive experiences and multiplayer environments can consume significant amounts of energy, contributing to environmental concerns.
Electronic waste: As technology evolves rapidly, older gaming equipment and hardware can quickly become obsolete, leading to electronic waste disposal challenges.

Conclusion

In conclusion, the Metaverse is revolutionizing gaming with new opportunities for creativity, community, and commerce. It empowers gamers as creators, enables cross-platform play, introduces blockchain-backed collectibles, and hosts virtual esports tournaments. However, these advancements come with challenges like quality control, security, and balancing ads with player experience. Additionally, potential negative impacts such as technological dependency, social isolation, and ethical concerns must be addressed. By fostering innovation responsibly, the Metaverse can become a transformative and enriching space for gamers worldwide.

Author Bio

Irena Cronin is SVP of Product for DADOS Technology, which is making an Apple Vision Pro data analytics and visualization app. She is also the CEO of Infinite Retina, which helps companies develop and implement AI, AR, and other new technologies for their businesses. Before this, she worked as an equity research analyst and gained extensive experience in evaluating both public and private companies. Cronin has an MS with Distinction in Information Technology/Management and Systems from New York University, and a joint MBA/MA from the University of Southern California. She has a BA from the University of Pennsylvania with a major in Economics (summa cum laude).
Cronin speaks four languages, with a near-fluent proficiency in Mandarin.

Robert Scoble has coauthored four books on technology innovation – each a decade before the said technology went completely mainstream. He has interviewed thousands of entrepreneurs in the tech industry and has long kept his social media audiences up to date on what is happening inside the world of tech, which is bringing us so many innovations. Robert currently tracks the AI industry and is the host of a new video show, Unaligned, where he interviews entrepreneurs from the thousands of AI companies he tracks as head of strategy for Infinite Retina.

Mastering Code Generation: Exploring the LLVM Backend

Kai Nacke, Amy Kwan
24 Oct 2024
10 min read
This article is an excerpt from the book, Learn LLVM 17 - Second Edition, by Kai Nacke, Amy Kwan. Learn how to build your own compiler, from reading the source to emitting optimized machine code. This book guides you through the JIT compilation framework, extending LLVM in a variety of ways, and using the right tools for troubleshooting.

Introduction

Generating optimized machine code is a critical task in the compilation process, and the LLVM backend plays a pivotal role in this transformation. The backend translates the LLVM Intermediate Representation (IR), derived from the Abstract Syntax Tree (AST), into machine code that can be executed on target architectures. Understanding how to generate this IR effectively is essential for leveraging LLVM's capabilities. This article delves into the intricacies of generating LLVM IR using a simple expression language example. We'll explore the necessary steps, from declaring library functions to implementing a code generation visitor, ensuring a comprehensive understanding of the LLVM backend's functionality.

Generating code with the LLVM backend

The task of the backend is to create optimized machine code from the LLVM IR of a module. The IR is the interface to the backend and can be created using a C++ interface or in textual form. Again, the IR is generated from the AST.

Textual representation of LLVM IR

Before trying to generate the LLVM IR, it should be clear what we want to generate. For our example expression language, the high-level plan is as follows:

1. Ask the user for the value of each variable.
2. Calculate the value of the expression.
3. Print the result.

To ask the user to provide a value for a variable and to print the result, two library functions are used: calc_read() and calc_write(). For the with a: 3*a expression, the generated IR is as follows:

1. The library functions must be declared, like in C. The syntax also resembles C. The type before the function name is the return type. The type names surrounded by parentheses are the argument types. The declaration can appear anywhere in the file:

declare i32 @calc_read(ptr)
declare void @calc_write(i32)

2. The calc_read() function takes the variable name as a parameter. The following construct defines a constant, holding a and the null byte used as a string terminator in C:

@a.str = private constant [2 x i8] c"a\00"

3. Next follows the main() function. The parameter names are omitted because they are not used. Just as in C, the body of the function is enclosed in braces:

define i32 @main(i32, ptr) {

4. Each basic block must have a label. Because this is the first basic block of the function, we name it entry:

entry:

5. The calc_read() function is called to read the value for the a variable. The nested getelementptr instruction performs an index calculation to compute the pointer to the first element of the string constant. The function result is assigned to the unnamed %2 variable.

%2 = call i32 @calc_read(ptr @a.str)

6. Next, the variable is multiplied by 3:

%3 = mul nsw i32 3, %2

7. The result is printed on the console via a call to the calc_write() function:

call void @calc_write(i32 %3)

8. Last, the main() function returns 0 to indicate a successful execution:

ret i32 0
}

Each value in the LLVM IR is typed, with i32 denoting the 32-bit integer type and ptr denoting a pointer.

Note: Previous versions of LLVM used typed pointers. For example, a pointer to a byte was expressed as i8* in LLVM. Since LLVM 16, opaque pointers are the default. An opaque pointer is just a pointer to memory, without carrying any type information about it. The notation in LLVM IR is ptr.
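Putting the fragments above together, the complete textual IR module for the with a: 3*a expression looks roughly like this (assembled from the snippets shown in this section; only ordering and whitespace are added):

declare i32 @calc_read(ptr)
declare void @calc_write(i32)

@a.str = private constant [2 x i8] c"a\00"

define i32 @main(i32, ptr) {
entry:
  %2 = call i32 @calc_read(ptr @a.str)
  %3 = mul nsw i32 3, %2
  call void @calc_write(i32 %3)
  ret i32 0
}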
Since it is now clear what the IR looks like, let's generate it from the AST.

Generating the IR from the AST

The interface, provided in the CodeGen.h header file, is very small:

#ifndef CODEGEN_H
#define CODEGEN_H
#include "AST.h"
class CodeGen {
public:
  void compile(AST *Tree);
};
#endif

Because the AST contains the information, the basic idea is to use a visitor to walk the AST. The CodeGen.cpp file is implemented as follows:

1. The required includes are at the top of the file:

#include "CodeGen.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/Support/raw_ostream.h"

2. The namespace of the LLVM libraries is used for name lookups:

using namespace llvm;

3. First, some private members are declared in the visitor. Each compilation unit is represented in LLVM by the Module class, and the visitor has a pointer to the module called M. For easy IR generation, the Builder (of type IRBuilder<>) is used. LLVM has a class hierarchy to represent types in IR. You can look up the instances for basic types such as i32 from the LLVM context. These basic types are used very often. To avoid repeated lookups, we cache the needed type instances: VoidTy, Int32Ty, PtrTy, and Int32Zero. The V member is the current calculated value, which is updated through the tree traversal. And last, nameMap maps a variable name to the value returned from the calc_read() function:

namespace {
class ToIRVisitor : public ASTVisitor {
  Module *M;
  IRBuilder<> Builder;
  Type *VoidTy;
  Type *Int32Ty;
  PointerType *PtrTy;
  Constant *Int32Zero;
  Value *V;
  StringMap<Value *> nameMap;

4. The constructor initializes all members:

public:
  ToIRVisitor(Module *M) : M(M), Builder(M->getContext()) {
    VoidTy = Type::getVoidTy(M->getContext());
    Int32Ty = Type::getInt32Ty(M->getContext());
    PtrTy = PointerType::getUnqual(M->getContext());
    Int32Zero = ConstantInt::get(Int32Ty, 0, true);
  }

5. For each function, a FunctionType instance must be created. In C++ terminology, this is a function prototype. A function itself is defined with a Function instance. The run() method defines the main() function in the LLVM IR first:

  void run(AST *Tree) {
    FunctionType *MainFty = FunctionType::get(
        Int32Ty, {Int32Ty, PtrTy}, false);
    Function *MainFn = Function::Create(
        MainFty, GlobalValue::ExternalLinkage, "main", M);

6. Then we create the BB basic block with the entry label, and attach it to the IR builder:

    BasicBlock *BB = BasicBlock::Create(M->getContext(), "entry", MainFn);
    Builder.SetInsertPoint(BB);

7. With this preparation done, the tree traversal can begin:

    Tree->accept(*this);

8. After the tree traversal, the computed value is printed via a call to the calc_write() function. Again, a function prototype (an instance of FunctionType) has to be created. The only parameter is the current value, V:

    FunctionType *CalcWriteFnTy = FunctionType::get(VoidTy, {Int32Ty}, false);
    Function *CalcWriteFn = Function::Create(
        CalcWriteFnTy, GlobalValue::ExternalLinkage, "calc_write", M);
    Builder.CreateCall(CalcWriteFnTy, CalcWriteFn, {V});
9. The generation finishes by returning 0 from the main() function:

    Builder.CreateRet(Int32Zero);
  }

10. A WithDecl node holds the names of the declared variables. First, we create a function prototype for the calc_read() function:

  virtual void visit(WithDecl &Node) override {
    FunctionType *ReadFty = FunctionType::get(Int32Ty, {PtrTy}, false);
    Function *ReadFn = Function::Create(
        ReadFty, GlobalValue::ExternalLinkage, "calc_read", M);

11. The method loops through the variable names:

    for (auto I = Node.begin(), E = Node.end(); I != E; ++I) {

12. For each variable, a string with the variable name is created:

      StringRef Var = *I;
      Constant *StrText = ConstantDataArray::getString(M->getContext(), Var);
      GlobalVariable *Str = new GlobalVariable(
          *M, StrText->getType(), /*isConstant=*/true,
          GlobalValue::PrivateLinkage, StrText, Twine(Var).concat(".str"));

13. Then the IR code to call the calc_read() function is created. The string created in the previous step is passed as a parameter:

      CallInst *Call = Builder.CreateCall(ReadFty, ReadFn, {Str});

14. The returned value is stored in the nameMap map for later use:

      nameMap[Var] = Call;
    }

15. The tree traversal continues with the expression:

    Node.getExpr()->accept(*this);
  };

16. A Factor node is either a variable name or a number. For a variable name, the value is looked up in the nameMap map. For a number, the value is converted to an integer and turned into a constant value:

  virtual void visit(Factor &Node) override {
    if (Node.getKind() == Factor::Ident) {
      V = nameMap[Node.getVal()];
    } else {
      int intval;
      Node.getVal().getAsInteger(10, intval);
      V = ConstantInt::get(Int32Ty, intval, true);
    }
  };

17. And last, for a BinaryOp node, the right calculation operation must be used:

  virtual void visit(BinaryOp &Node) override {
    Node.getLeft()->accept(*this);
    Value *Left = V;
    Node.getRight()->accept(*this);
    Value *Right = V;
    switch (Node.getOperator()) {
    case BinaryOp::Plus: V = Builder.CreateNSWAdd(Left, Right); break;
    case BinaryOp::Minus: V = Builder.CreateNSWSub(Left, Right); break;
    case BinaryOp::Mul: V = Builder.CreateNSWMul(Left, Right); break;
    case BinaryOp::Div: V = Builder.CreateSDiv(Left, Right); break;
    }
  };
};
}

18. With this, the visitor class is complete. The compile() method creates the global context and the module, runs the tree traversal, and dumps the generated IR to the console:

void CodeGen::compile(AST *Tree) {
  LLVMContext Ctx;
  Module *M = new Module("calc.expr", Ctx);
  ToIRVisitor ToIR(M);
  ToIR.run(Tree);
  M->print(outs(), nullptr);
}

We now have implemented the frontend of the compiler, from reading the source up to generating the IR. Of course, all these components must work together on user input, which is the task of the compiler driver. We also need to implement the functions needed at runtime. Both are topics of the next section covered in the book.
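If you would like to try the emitted IR on its own before the runtime is covered, one approach (not part of this excerpt; the file name and implementations below are assumptions for illustration) is to provide simple C stubs for calc_read() and calc_write() and link them together with clang, which accepts .ll files directly:

/* rtcalc.c - hypothetical runtime stubs matching the declared IR signatures */
#include <stdio.h>

int calc_read(const char *name) {
  int value = 0;
  printf("Enter a value for %s: ", name);
  scanf("%d", &value);
  return value;
}

void calc_write(int value) {
  printf("The result is: %d\n", value);
}

Saving the printed IR as expr.ll, running clang expr.ll rtcalc.c -o expr followed by ./expr should then prompt for a and print three times its value.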
Conclusion

In conclusion, the process of generating LLVM IR from an AST involves multiple steps, each crucial for producing efficient machine code. This article highlighted the structure and components necessary for this task, including function declarations, basic block management, and tree traversal using a visitor pattern. By carefully managing these elements, developers can harness the power of LLVM to create optimized and reliable machine code. The integration of all these components, alongside user input and runtime functions, completes the frontend implementation of the compiler. This sets the stage for the next phase, focusing on the compiler driver and runtime functions, ensuring seamless execution and integration of the compiled code.

Author Bio

Kai Nacke is a professional IT architect currently residing in Toronto, Canada. He holds a diploma in computer science from the Technical University of Dortmund, Germany, and his diploma thesis on universal hash functions was recognized as the best of the semester. With over 20 years of experience in the IT industry, Kai has extensive expertise in the development and architecture of business and enterprise applications. In his current role, he evolves an LLVM/clang-based compiler. For several years, Kai served as the maintainer of LDC, the LLVM-based D compiler. He is the author of D Web Development and Learn LLVM 12, both published by Packt. In the past, he was a speaker in the LLVM developer room at the Free and Open Source Software Developers' European Meeting (FOSDEM).

Amy Kwan is a compiler developer currently residing in Toronto, Canada. Originally from the Canadian prairies, Amy holds a Bachelor of Science in Computer Science from the University of Saskatchewan. In her current role, she leverages LLVM technology as a backend compiler developer. Previously, Amy has been a speaker at the LLVM Developer Conference in 2022 alongside Kai Nacke.

Connecting Cloud Object Storage with Databricks Unity Catalog

Pulkit Chadha
22 Oct 2024
10 min read
This article is an excerpt from the book, Data Engineering with Databricks Cookbook, by Pulkit Chadha. This book shows you how to use Apache Spark, Delta Lake, and Databricks to build data pipelines, manage and transform data, optimize performance, and more. Additionally, you'll implement DataOps and DevOps practices, and orchestrate data workflows.

Introduction

Databricks Unity Catalog allows you to manage and access data in cloud object storage using a unified namespace and a consistent set of APIs. With Unity Catalog, you can do the following:

Create and manage storage credentials, external locations, storage locations, and volumes using SQL commands or the Unity Catalog UI
Access data from various cloud platforms (AWS S3, Azure Blob Storage, or Google Cloud Storage) and storage formats (Parquet, Delta Lake, CSV, or JSON) using the same SQL syntax or Spark APIs
Apply fine-grained access control and data governance policies to your data using Databricks SQL Analytics or Databricks Runtime

In this article, you will learn what Unity Catalog is and how it integrates with AWS S3.

Getting ready

Before you start setting up and configuring Unity Catalog, you need to have the following prerequisites:

A Databricks workspace with administrator privileges
A Databricks workspace with the Unity Catalog feature enabled
A cloud storage account (such as AWS S3, Azure Blob Storage, or Google Cloud Storage) with the necessary permissions to read and write data

How to do it…

In this section, we will first create a storage credential, using an IAM role with access to an S3 bucket. Then, we will create an external location in Databricks Unity Catalog that will use the storage credential to access the S3 bucket.

Creating a storage credential

You must create a storage credential to access data from an external location or a volume. In this example, you will create a storage credential that uses an IAM role to access the S3 bucket. The steps are as follows:

1. Go to Catalog Explorer: Click on Catalog in the left panel and go to Catalog Explorer.

2. Create storage credentials: Click on +Add and select Add a storage credential.

Figure 10.1 – Add a storage credential

3. Enter storage credential details: Give the credential a name, the IAM role ARN that allows Unity Catalog to access the storage location on your cloud tenant, and a comment if you want, and click on Create.

Figure 10.2 – Create a new storage credential

Important note
To learn more about IAM roles in AWS, you can reference the user guide here: https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html.

4. Get External ID: In the Storage credential created dialog, copy the External ID value and click on Done.

Figure 10.3 – External ID for the storage credential

5. Update the trust policy with an External ID: Update the trust policy associated with the IAM role and add the External ID value for sts:ExternalId:

Figure 10.4 – Updated trust policy with External ID
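The next subsection creates the external location through the Catalog Explorer UI; as noted at the start of the article, the same objects can also be managed with SQL. A hedged sketch of what that could look like (the location name, bucket path, credential name, and group below are placeholders, not values from the book):

CREATE EXTERNAL LOCATION IF NOT EXISTS de_book_ext_loc
  URL 's3://my-bucket/de-book-ext-loc'
  WITH (STORAGE CREDENTIAL my_storage_credential)
  COMMENT 'External location for the de-book examples';

GRANT READ FILES, WRITE FILES
  ON EXTERNAL LOCATION de_book_ext_loc
  TO `data-engineers`;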
Creating an external location

An external location contains a reference to a storage credential and a cloud storage path. You need to create an external location to access data from a custom storage location that Unity Catalog uses to reference external tables. In this example, you will create an external location that points to the de-book-ext-loc folder in an S3 bucket. To create an external location, you can follow these steps:

1. Go to Catalog Explorer: Click on Catalog in the left panel to go to Catalog Explorer.

2. Create external location: Click on +Add and select Add an external location:

Figure 10.5 – Add an external location

3. Pick an external location creation method: Select Manual and then click on Next:

Figure 10.6 – Create a new external location

4. Enter external location details: Enter the external location name, select the storage credential, and enter the S3 URL; then, click on the Create button:

Figure 10.7 – Create a new external location manually

5. Test connection: Test the connection to make sure you have set up the credentials accurately and that Unity Catalog is able to access cloud storage:

Figure 10.8 – Test connection for external location

If everything is set up right, you should see a screen like the following. Click on Done:

Figure 10.9 – Test connection results

See also

Databricks Unity Catalog: https://www.databricks.com/product/unity-catalog
What is Unity Catalog: https://docs.databricks.com/en/data-governance/unity-catalog/index.html
Databricks Unity Catalog documentation: https://docs.databricks.com/en/compute/access-mode-limitations.html
Databricks SQL documentation: https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html
Databricks Unity Catalog: A Comprehensive Guide to Features, Capabilities, and Architecture: https://atlan.com/databricks-unity-catalog/
Step By Step Guide on Databricks Unity Catalog Setup and its key Features: https://medium.com/@sauravkum780/step-by-step-guide-on-databricks-unitycatalog-setup-and-its-features-1d0366c282b7

Conclusion

In summary, connecting to cloud object storage using Databricks Unity Catalog provides a streamlined approach to managing and accessing data across various cloud platforms such as AWS S3, Azure Blob Storage, and Google Cloud Storage. By utilizing a unified namespace, consistent APIs, and powerful governance features, Unity Catalog simplifies the process of creating and managing storage credentials and external locations. With built-in fine-grained access controls, you can securely manage data stored in different formats and cloud environments, all while leveraging Databricks' powerful data analytics capabilities. This guide walks through setting up an IAM role and creating an external location in AWS S3, demonstrating how easy it is to connect cloud storage with Unity Catalog.

Author Bio

Pulkit Chadha is a seasoned technologist with over 15 years of experience in data engineering. His proficiency in crafting and refining data pipelines has been instrumental in driving success across diverse sectors such as healthcare, media and entertainment, hi-tech, and manufacturing. Pulkit's tailored data engineering solutions are designed to address the unique challenges and aspirations of each enterprise he collaborates with.

Understanding Data Structures in Swift

Avi Tsadok
22 Oct 2024
10 min read
This article is an excerpt from the book, The Ultimate iOS Interview Playbook, by Avi Tsadok. The iOS Interview Guide is an essential book for iOS developers who want to maximize their skills and prepare for the competitive world of interviews on their way to getting their dream job. The book covers all the crucial aspects, from writing a resume to reviewing interview questions, and passing the architecture interview successfully.

Introduction

In iOS development, data structures are fundamental tools for managing and organizing data. Whether you are preparing for a technical interview or building robust iOS applications, mastering data structures like arrays, dictionaries, and sets is essential. This tutorial will guide you through the essential data structures in Swift, explaining their importance and providing practical examples of how to use them effectively. By the end of this tutorial, you will have a solid understanding of how to work with these data structures, making your code more efficient, modular, and reusable.

Prerequisites

Before diving into the tutorial, make sure you have the following prerequisites:

Familiarity with Swift Programming Language: A basic understanding of Swift syntax, including variables, functions, and control flow, is essential for this tutorial.
Xcode Installed: Ensure you have Xcode installed on your Mac. You can download it from the Mac App Store if you haven't done so already.
Basic Understanding of Object-Oriented Programming: Knowing concepts such as classes and objects will help you better understand the examples provided in this tutorial.

Step-by-Step Instructions

1. Learning the Importance of Data Structures

Data structures play a crucial role in iOS development. They allow you to store, manage, and manipulate data efficiently, which is especially important in performance-sensitive applications. Whether you're handling user data, managing app state, or working with APIs, choosing the right data structure can significantly impact your app's performance and scalability. Swift provides several built-in data structures, including arrays, dictionaries, and sets. Each of these data structures offers unique advantages and is suitable for different use cases. Understanding when and how to use each of them is a key skill for any iOS developer.

2. Working with Arrays

Arrays are one of the most commonly used data structures in Swift. They allow you to store ordered collections of elements, making them ideal for tasks that require sequential access to data.

Declaring and Initializing an Array

To declare and initialize an array in Swift, you can use the following syntax:

var numbers: [Int] = [1, 2, 3, 4, 5]

This creates an array of integers containing the values 1 through 5. Arrays in Swift are type-safe, meaning you can only store elements of the specified type (in this case, Int).

Removing Duplicates from an Array

A common task in programming is to remove duplicate elements from an array. Swift makes this easy by converting the array into a Set, which automatically removes duplicates, and then converting it back into an array:

let arrayWithDuplicates = [1, 2, 3, 3, 4, 5, 5]
let arrayWithNoDuplicates = Array(Set(arrayWithDuplicates))

This approach is efficient and concise, leveraging the unique properties of sets to remove duplicates.
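One detail worth keeping in mind (not covered in the excerpt): converting an array to a Set does not preserve the original element order, so the result of Array(Set(...)) may come back shuffled. If order matters, a small sketch like the following keeps the first occurrence of each element while still using a set for fast lookups:

var seen = Set<Int>()
let orderedUnique = arrayWithDuplicates.filter { seen.insert($0).inserted }
// orderedUnique == [1, 2, 3, 4, 5], in the original order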
Iterating Over an Array

Arrays provide several methods for iterating over their elements. The most common approach is to use a for-in loop:

for number in numbers {
   print(number)
}

This loop prints each element in the array to the console. You can also use methods like map, filter, and reduce for more advanced operations on arrays.

3. Implementing a Queue Using an Array

A queue is a data structure that follows the First-In-First-Out (FIFO) principle, where the first element added is the first one to be removed. Queues are commonly used in scenarios like task scheduling, breadth-first search algorithms, and managing requests in networking. In Swift, you can implement a basic queue using an array. Here's an example:

struct Queue<Element> {
   private var array: [Element] = []

   var isEmpty: Bool {
       return array.isEmpty
   }

   var count: Int {
       return array.count
   }

   mutating func enqueue(_ element: Element) {
       array.append(element)
   }

   mutating func dequeue() -> Element? {
       return array.isEmpty ? nil : array.removeFirst()
   }
}

In this implementation:

The enqueue method adds an element to the end of the array.
The dequeue method removes and returns the first element in the array.

Queues are useful in many scenarios, such as managing tasks in a multi-threaded environment or implementing a breadth-first search algorithm.
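As a quick usage sketch (not from the book) showing the FIFO behavior of the Queue type defined above:

var tasks = Queue<String>()
tasks.enqueue("download")
tasks.enqueue("parse")
tasks.enqueue("render")
print(tasks.dequeue() ?? "empty")  // prints "download" - first in, first out
print(tasks.count)                 // prints 2

Because dequeue calls removeFirst(), every dequeue shifts the remaining elements; for large queues, a ring buffer or two-stack implementation avoids that cost, but the array-based version above is the simplest starting point.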
Understanding the Codable Protocol The Codable protocol in Swift simplifies encoding and decoding data, making it easier to work with JSON and other data formats. This is especially useful when interacting with web APIs or saving data to disk. Defining a Codable Struct Here’s an example of a struct that conforms to the Codable protocol: struct Person: Codable {    var name: String    var age: Int    var address: String } With Codable, you can easily encode and decode instances of Person using JSONEncoder and JSONDecoder: let person = Person(name: "Alice", age: 25, address: "123 Main St") let jsonData = try JSONEncoder().encode(person) let decodedPerson = try JSONDecoder().decode(Person.self, from: jsonData) Output and Explanation For each code snippet, you should test and verify that the output matches the expected results. For example, when implementing the queue structure, enqueue and dequeue elements to ensure the correct order of processing. Similarly, when working with dictionaries, confirm that you can retrieve, add, and remove key-value pairs correctly. Conclusion This tutorial has covered fundamental data structures in Swift, including arrays, dictionaries, and sets, and their practical applications in iOS development. Understanding these data structures will make you a better Swift developer and prepare you for technical interviews and real-world projects. Author BioAvi Tsadok, seasoned iOS developer with a 13-year career, has proven his expertise leading projects for notable companies like Any.do, a top productivity app, and currently at Melio Payments, where he steers the mobile team. Known for his ability to simplify complex tech concepts, Avi has written four books and published 40+ tutorials and articles that enlighten and empower aspiring iOS developers. His voice resonates beyond the page, as he's a recognized public speaker and has conducted numerous interviews with fellow iOS professionals, furthering the field's discourse and development.

Deploying AR Experiences onto Mobile Devices

Anna Braun, Raffael Rizzo
21 Oct 2024
5 min read
This article is an excerpt from the book, XR Development with Unity, by Anna Braun, Raffael Rizzo. This practical guide helps you create immersive VR, AR, and MR experiences using Unity 2021.3 or later versions. You’ll learn to add physics, animations, teleportation, sound, effects, and hand-tracking to XR scenes and deploy them on VR headsets, simulators, and mobile devices—all that you need to create interactive XR projects in Unity is here.Introduction In this article, you will learn to launch your AR experiences onto smartphones or tablets. Primarily, you have two paths to accomplish this: deployment onto an Android or an iOS device. For solo projects, where the AR application is meant for personal use, you may opt to deploy onto just Android or iOS, depending on your device’s operating system. However, for larger-scale projects that involve several users – be it academic, industrial, or any other group, irrespective of its size – it’s advisable to deploy and test the AR app on both Android and iOS platforms. This strategy has multiple benefits. First, if your application gains momentum or its usage expands, it would already be compatible with both major platforms, eliminating the need for time-consuming porting later on. Second, making your app accessible on both platforms from the outset can draw in more users, and possibly attract increased funding or support. Another key advantage to this is the cross-platform compatibility offered by Unity. This enables you to maintain a singular code base for both platforms, simplifying the management and updating process for your application. Any modifications made need to be done in one location and then deployed across both platforms. In the next section, we’ll delve into the steps required to deploy your AR scene onto an Android device. Deploying onto Android This section outlines the procedure to deploy your AR scene onto an Android device. The initial part of the process involves enabling some settings on your phone to prepare it for testing. Here’s a step-by-step guide: Confirm that your device is compatible with ARCore. ARCore is essential for AR Foundation to work correctly. You can find a list of supported devices at https://developers. google.com/ar/devices. Install ARCore, which AR Foundation uses to enable AR capabilities on Android devices. ARCore can be downloaded from the Google Play Store at https://play.google.com/ store/apps/details?id=com.google.ar.core. Activate Developer Options. To do this, open Settings on your Android device, scroll down, and select About Phone. Find Build number and tap it seven times until a message appears stating You are now a developer! Upon returning to the main Settings menu, you should now see an option called Developer Options. If it’s not present, perform an online search to find out how to enable developer mode for your specific device. Though the method described in the previous step is the most common, the variety of Android devices available might require slightly different steps. With Developer Options enabled, turn on USB Debugging. This will allow you to transfer your AR scene to your Android device via a USB cable. Navigate to Settings | Developer options, scroll down to USB Debugging, and switch it on. Acknowledge any pop-up prompts. 
Depending on your Android version, you might need to allow the installation of apps from unknown sources: For Android versions 7 (Nougat) and earlier: Navigate to Settings | Security Settings and then check the box next to Unknown Sources to allow the installation of apps from sources other than the Google Play Store. For Android versions 8 (Oreo) and above: Select Settings |Apps & Notifications | Special App Access | Install unknown apps and activate Unknown sources. You will see a list of apps that you can grant permission to install from unknown sources. This is where you select the File Manager app, as you’re using it to download the unknown app from Unity. 7. Link your Android device to your computer using a USB cable. You can typically use your device’s charging cable for this. A prompt will appear on your Android device asking for permission to allow USB debugging from your computer. Confirm it. With your Android device properly prepared for testing AR scenes, you can now proceed to deploy your Unity AR scene. This involves adjusting several parameters in the Unity Editor’s Build Settings and Player Settings. Here’s a step-by-step guide on how to do this: Select File | Build Settings | Android, then click the Switch Platform button. Now, your Build Settings should look something like what is illustrated in Figure 4.17.  Figure 4.17 – Unity’s Build Settings configuration for Android Next, click on the Player Settings button. This will open a new window, also called Player Settings. Here, select the Android tab, scroll down, and set Minimum API Level to Android 7.0 Nougat (API level 24) or above. This is crucial, as ARCore requires at least Android 7.0 to function properly. Remaining in the Android tab of Player Settings, enter a package name. Ensure it follows the pattern com.company_name.application_name. This pattern is a widely adopted convention for naming application packages in Android and is used to ensure unique identification for each application on the Google Play Store. Return to File | Build Settings and click the Build and Run button. A new window will pop up, prompting you to create a new folder in your project’s directory. Name this folder Builds. Upon selecting this folder, Unity will construct the scene within the newly created Builds folder. This is how you can set up your Android device for deploying AR scenes onto it. In the next section, you will learn how you can deploy your AR scene onto an iOS device, such as an iPhone or iPad. Deploying onto iOS Before we delve into the process of deploying an AR scene onto an iOS device, it’s important to discuss certain hardware prerequisites. Regrettably, if you’re using a Windows PC and an iOS device, it’s not as straightforward as deploying an AR scene made in Unity. The reason for this is that Apple, in its characteristic style, requires the use of Xcode, its proprietary development environment, as an intermediary step. This is only available on Mac devices, not Windows or Linux. If you don’t possess a Mac, there are still ways to deploy your AR scene onto an iOS device. Here are a few alternatives: Borrowing a Mac: The simplest solution to gain access to Xcode and deploy your app onto an iOS device is to borrow a Mac from a friend or coworker. It’s also worth checking whether local libraries, universities, or co-working spaces offer public access to Macs. For commercial or academic projects, it’s highly recommended to invest in a Mac for testing your AR app on iOS. 
Using a virtual machine: Another no-cost alternative is to establish a macOS environment on your non-Apple PC. However, Apple neither endorses nor advises this method due to potential legal issues and stability concerns. Therefore, we won’t elaborate further or recommend it. Employing a Unity plugin: Fortunately, a widely used Unity plugin enables deployment of an AR scene onto your iOS device with relatively less hassle. Navigate to Windows | Asset Store, click on Search Online, and Unity Asset Store will open in your default browser. Search for iOS Project Builder for Windows by Pierre-Marie Baty. Though this plugin costs $50, it is a much cheaper alternative than buying a Mac. After purchasing the plugin, import it into your AR scene and configure everything correctly by following the plugin’s documentation (https://www.pmbaty.com/iosbuildenv/documentation/ unity.html). In this article, we focus exclusively on deploying AR applications onto iOS devices using a Mac for running Unity and Xcode. This is due to potential inconsistencies and maintenance concerns with other aforementioned methods. Before you initiate the deployment setup, ensure that your Mac and iOS devices have the necessary software and settings. The following steps detail this preparatory process: Make sure the latest software versions are installed on your MacOS and iOS devices. Check for updates by navigating to Settings | General | Software Update on each device and install any that are available. Confirm that your iOS device supports ARKit, which is crucial for the correct functioning of AR Foundation. You can check compatibility at https://developer.apple.com/ documentation/arkit/. Generally, any device running on iPadOS 11 or iOS 11 and later versions are compatible. You will need an Apple ID for the following steps. If you don’t have one, you can create it at https://appleid.apple.com/account. Download the Xcode software onto your Mac from Apple’s developer website at https:// developer.apple.com/xcode/. Enable Developer Mode on your iOS device by going to Settings | Privacy & Security | Developer Mode, activate Developer Mode, and then restart your device. If you don’t find the Developer Mode option, connect your iOS device to a Mac using a cable. Open Xcode, then navigate to Window | Devices and Simulator. If your device isn’t listed in the left pane, ensure you trust the computer on your device by acknowledging the prompt that appears after you connect your device to the Mac. Subsequently, you can enable Developer Mode on your iOS device. Having set up your Mac and iOS devices correctly, let’s now proceed with how to deploy your AR scene onto your iOS device. Each time you want to deploy your AR scene onto your iOS device, follow these steps: 1. Use a USB cable to connect your iOS device to your Mac. 2. Within your Unity project, navigate to File | Build Settings and select iOS from Platform options. Click the Switch Platform button. 3. Check the Development Build option in Build Settings for iOS. This enables you to deploy the app for testing purposes onto your iOS device. This step is crucial to avoid the annual subscription cost of an Apple Developer account. Note: Deploying apps onto an iOS device with a free Apple Developer account has certain limitations. You can only deploy up to three apps onto your device at once, and they need to be redeployed every 7 days due to the expiration of the free provisioning profile. 
For industrial or academic purposes, we recommend subscribing to a paid Developer account after thorough testing using the Development Build function. 4. Remain in File | Build Settings | iOS tab, click on Player Settings, scroll down to Bundle Identifier, and input an identifier in the form of com.company_name.application_name. 5. Return to File | Build Settings | iOS tab and click Build and Run. In the pop-up window, create a new folder in your project directory named Builds and select it. 6. Xcode will open with the build, displaying an error message due to the need for a signing certificate. To create this, click on the error message, navigate to the Signing and Capabilities tab, and select the checkbox. In the Team drop-down menu, select New Team, and create a new team consisting solely of yourself. Now, select this newly-created team from the drop-down menu. Ensure that the information in the Bundle Identifier field matches your Unity Project found in Edit | Project Settings | Player. 7. While in Xcode, click on the Any iOS Device menu and select your specific iOS device as the output. 8. Click the Play button on the top left of Xcode and wait for a message indicating Build succeeded. Your AR application should now be on your iOS device. However, you won’t be able to open it until you trust the developer (in this case, yourself). Navigate to Settings | General | VPN & Device Management on your iOS device, tap Developer App certificate under your Apple ID, and then tap Trust (Your Apple ID). 9. On your iOS device’s home screen, click the icon of your AR app. Grant the necessary permissions, such as camera access. Congratulations, you’ve successfully deployed your AR app onto your iOS device! You now know how to deploy your AR experiences onto both Android and iOS devices. ConclusionDeploying AR experiences onto mobile devices opens up a world of possibilities, enabling users to engage with your application in innovative ways. By following the steps outlined in this guide, you can ensure that your AR applications are compatible with both Android and iOS platforms, maximizing their reach and impact. Whether you’re developing for personal use or planning to distribute your app to a broader audience, having cross-platform compatibility from the start can save time and resources in the long run. With the tools and techniques provided here, you are well on your way to creating and deploying compelling AR experiences that captivate users on any mobile device. Looking to dive into the world of virtual, augmented, and mixed reality? XR Development with Unity is the perfect guide for beginners and professionals alike! This book takes you step-by-step through creating immersive experiences using Unity, without needing expensive VR hardware. Learn how to build interactive XR apps, explore hand-tracking, gaze-tracking, multiplayer capabilities, and more. Whether you're a game developer, hobbyist, or industry professional, this resource is a must-have to master cutting-edge XR technologies. Don't miss out—start your XR journey today! Author BioAnna Braun is a Unity expert, who is specialized in creating XR applications. At Deutsche Telekom, Anna has developed XR prototypes in Unity. One prototype enabled warehouse workers to find commodities more easily through the use of special location data and Augmented Reality. At Fraunhofer, Anna specialized in Hand-Tracking and worked on a VR education platform. 
Her master’s degree in Extended Reality has a special focus on Eye Tracking, Deep Learning, and Computer Graphics. She is a published author in the tech space and regularly speaks at conferences hosted by academia or non-profits like the Mozilla Foundation. Anna co-founded a company that offers XR consulting and development.

Raffael Rizzo is an XR developer and Unity expert. During his work at Deutsche Telekom, he consulted companies on the use of digital twins and implemented augmented reality wayfinding solutions. At Fraunhofer IGD, Raffael worked on a VR education platform. He developed a VR training program for a soccer academy to test the children’s reaction times. For the same academy, Raffael created an application that uses computer vision and machine learning to automatically evaluate ball juggling. His master’s degree in Extended Reality encompasses Rendering, Computer Vision, Machine Learning, and 3D Visualization. Raffael co-founded a company specializing in XR consulting and development.

How we are Thinking About Generative AI

Packt
18 Jul 2024
10 min read
How we are Thinking About Generative AI for Developers and Tech Learning

Packt is a global tech publisher serving developers and tech professionals (TechPros). Over the last 20 years, we have published over 8,000 books and videos, gaining deep insights into the evolving challenges tech professionals face. Recently, the rapid emergence of generative AI (GenAI) technologies like CoPilot, ChatGPT, and Gemini has transformed the tech landscape, affecting everyone from software developers to business strategists. The tech industry is at a critical inflection point with technology use, development, and education. At Packt, we are actively exploring generative AI's impact on the industry and TechPros' daily work and learning. Here, we outline our thoughts on how GenAI reshapes professional activities and tech learning, and our strategic responses to it. We would love to hear your feedback on this document and your thoughts on the issues raised within it. Please do send any comments to: GenAI_feedback@packt.com.

The Impact of GenAI on TechPro Work
The rapid pace of advancement in Generative AI makes it difficult to predict, but we believe, on balance, that it is a force for good in software development. A core Packt value that we share with our TechPro users is a belief in and commitment to the power of technology for progress. Our default setting is to get on board with change. GenAI is already changing the nature of many development jobs, but it will not mean the end of software development. We are fundamentally optimistic about the future for TechPros powered by GenAI. It will mean more, faster, better work. This is how we at Packt see these changes:

Increased Software Production
Humanity continuously evolves, adapts, and advances, maintaining a need for more sophisticated software solutions – whether those are built on traditional software platforms or on top of AI models themselves. GenAI is already transforming the economics of supply by making engineers more productive and enabling more engineering tasks. The demand for more, better software will remain, leading to an increase in the number of professionals building, designing, adapting, and managing software.

Shifts in Software Development
Much of what engineers spend time doing can be quite generic. GenAI is beginning to automate these middle-tier, routine activities, allowing developers to focus on higher-value, more creative tasks. This shift redistributes work in three dimensions from the center of the development stack. Work moves ‘up the stack’ into architecture, domain expertise, and design, ‘down the stack’ into complex algorithm development, infrastructure, and tooling, and outwards to the edges with specific integrations and implementations. To meet the increased demand for software, there will be significantly more designers and implementors at those development edges, with increasing business and domain focus and specialization. There will be a continuously hard-to-meet need for deep tech engineers building the tools and infrastructure that enable this automation to operate efficiently at scale and speed.
This will be seen at the hardware and firmware level as well as operating systems, cloud platforms, and the models and algorithms that modern software is built upon. Increased Domain and Business SpecializationAs GenAI moves tasks from generic operations upwards and outwards to more specialized domains, engineers will increasingly make decisions that require greater judgment and domain expertise. This will lead to a greater focus on domain experience and knowledge, and a higher value on business relationships.GenAI also democratizes the development and management of systems, making these processes accessible to more users and transforming many jobs from direct task execution to overseeing AI agents that perform the work. This evolution could significantly expand the roles involving aspects of software design or delivery. Impact on Tech Pro LearningGenAI integrates automation and problem solving, leading to profound change in how TechPros learn and solve problems. We see the core changes as being:Shift Toward Just-In-Time (JIT) Continuous LearningDevelopers have always preferred to learn by doing—starting work and solving problems on the fly. GenAI makes this the only viable approach. The ROI of upfront Just-In-Case (JIC) learning, where developers research technologies that might be useful in future, declines when co-pilots can accelerate initial builds and troubleshoot during development. GenAI tools can escalate to rapid Just-in-Time [JIT] learning sprints to backfill knowledge gaps as they are discovered.GenAI tools can help engineers to rapidly understand and work on existing complex and often undocumented code bases, again backfilling knowledge gaps JIT. Entry Level Learning Moves to Simulated EnvironmentsThe JIT learning-by-doing model also applies to students and juniors, but the study work they do will be “as good as real.” Traditional, linear courseware will be replaced by personalized, hands-on projects in rich simulated environments. These environments provide shorter, contextual learning experiences that effectively bridge the gap between theory and practice, reducing the training load on increasingly busy senior developers. Growth in Demand for Real World Experience and Peer InteractionAs development increasingly moves up the stack and routine tasks are automated, there is a growing need for TechPros to understand specific real-world applications of systems and solutions. Highly specific, detailed, and objective case studies with high relevance to a specific problem area and technical solution will become increasingly valuable. Demand for discussion and interaction with experienced fellow professionals to share knowledge and insights will also grow. Such authentic content not only aids learning but also enhances the training of AI models. Authoritative and Expert Insight Remains KeyDespite the shift towards more automated and JIT learning approaches, a thorough understanding of core concepts remains crucial. Books will continue to be one of the most powerful and authoritative ways for technology originators to share their foundational knowledge. This will remain the key long-term use-case for tech books. Continuing Need for Creator Trust and AuthenticityGen AI enables the rapid creation of written work. In the tech publishing domain, we estimate that up to around 50% of titles in certain categories on Amazon might already be AI-generated or derived. This AI content meets certain user needs, and this proliferation will continue across store platforms. 
We believe that human-generated work fulfils a different user need and that there will always be value in authentic creator insight and expertise. We continue to build direct relationships with tech professionals and authors to create and publish this content. The Future is UncertainHow this evolves is hard to know. The pace of change both in the technology and in the landscape around it has surfaced issues with reliability, compliance, cost, and memory/reasoning limitations. GenAI technology is moving extremely fast but has serious technical challenges.  GenAI technology is moving extremely fast but has serious technical challenges.These issues will be resolved over time, but they limit the pace of actual deployment. A Cautious Approach to ChangeThe case for changing existing systems, practices, and organizational models should be approached with caution. Enterprises have a high bar for adopting core systems and the deployment phase will be long and require detailed work. Uncertainty in Computing PlatformsIt remains uncertain whether GenAI might evolve into the dominant general purpose computing platform or how it will evolve past the current transformer architecture. It may become a ubiquitous implementation layer for all services over time; we do not know. However, we share the view that this is a pivotal phase for technology and for humanity. A Mixed Economy of the Old and the NewWe see a long phase of a mixed economy of old methods and new GenAI tools. There will be pockets of rapid adoption of GenAI tooling, like we see in coding co-pilots and in application areas, such as customer service agents. However, with every deployment there will be a lot of “old style” engineering: problem solving, integrations, QA, optimization. The shifts to high level working will be gradual and not immediately noticeable. Friction in Human SystemsHuman systems inherently resist change. Individuals stick with working and learning systems with which they are comfortable. Teaching methods evolve slowly, and we see different generations working and learning in different ways. While a shift toward Just-In-Time (JIT) learning is underway, structured, long-form learning will continue to play a crucial role. Rapid Adoption Among DevelopersThe pace at which individual developers have adopted co-pilots and are using GenAI for problem solving is striking. We expect these trends of grassroots, individual adoption to continue and accelerate. How Packt is RespondingThe insights gained from talking with TechPros combined with our thinking about the impact of GenAI on TechPro work and learning has resulted in these strategic initiatives:Shift to the Edges of the Development Stack in PublishingWe are pioneering new approaches to developing and publishing real world practical case studies to answer the crucial questions: “What are people actually building with this right now?” and, “How are they actually doing it?”What are people actually building with this right now? How are they actually doing it?We will increase our focus on publishing specific, definitive, deep, technical books from the creators and builders of new technology to help TechPros broaden their skills across the development stack. We will continue to build the tech book canon in the era of GenAI.License for LLM Training ResponsiblyThe uniquely high-quality content tech authors create has immense value for LLM training. 
We want to support the evolution of this technology while developing model training as a potentially valuable new channel for published content.We want authors to get fair value and the recognition they are due, and we will pursue all agreements with partners in a pragmatic but principled way. Use GenAI to Enable a Step Change in Content Engineering and Derived WorksGenAI tools and automations can reduce the cost and effort of keeping a title up to date as technology evolves, and of creating a rich portfolio of derived works from the initial content. We call this BODE: Build Once, Deploy Everywhere.We are exploring exciting use-cases to increase the value of the original work, and its reach into new platforms, formats, languages, and versions. Build Packt Models and Explore JITWe have already delivered experimental AI agents fine-tuned on specific Packt titles. We are expanding this to topic, role, and whole-library models. We are exploring integration of the Packt corpus into co-pilots and tools to deliver workflow-embedded JIT knowledge and learning escalation. Build Professional MembershipsRecognizing the increased value of live interactions in a post-GenAI world, we are committed to enabling Tech Professionals to engage in high-quality, trustworthy interactions with peers working on similar roles and projects.Thoughts? Feedback?Please send any comments to:GenAI_feedback@packt.com

Microsoft AI’s Skeleton Key, AutoML with AutoGluon, MultiOn AI's Retrieve API, Narrative BI’s Hybrid AI, Python's Duck Typing, Gibbs Diffusion

05 Jul 2024
13 min read
Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Happy Friday! Welcome to DataPro#101—Your Essential Data Science & ML Update! 🚀  This week, we’ve curated the latest techniques in data extraction, transforming unstructured data into structured formats, best practices for prompt engineering in NL2SQL, and much more. Consider this your all-in-one guide to staying informed in the ever-evolving world of data science and machine learning. Now, dive in and explore these exciting new ideas! ⚡ Tech Highlights: Stay Updated! Prompt Engineering with Claude 3: Learn hands-on techniques on Amazon Bedrock. Accelerated PyTorch: Boost models with torch.compile on AWS Graviton. BigQuery Data Canvas: Perfect your prompts. Skeleton Key AI: New AI jailbreak method. GraphRAG: Complex data discovery tool on GitHub. 📚 New from Packt Library Data Science for Web3 - Guide to blockchain data analysis and ML. 🔍 Latest in LLMs & GPTs NASA-IBM's INDUS Models: Advanced science LLMs. EvoAgent: Evolutionary multi-agent systems. Kyutai's Moshi: Real-time AI model. MultiOn AI's Retrieve API: Accurate web search. Gibbs Diffusion (GDiff): Bayesian image denoising. Narrative BI’s Hybrid AI: Business data analysis. WildGuard: Safe LLM interactions. ProgressGym: Ethical AI alignment. OmniParse: Structuring unstructured data for GenAI. ✨ What's Fresh Claude 3.5 Sonnet Use Cases: Future AI capabilities. Explainability in ML: Make models understandable. Group-By Aggregation: Powerful EDA tool. OpenAI and PandasAI: Series operations. AutoML with AutoGluon: ML in four lines of code. Python's Duck Typing: Flexible coding concept. 🔰 GitHub Finds: Add These Repos fal/AuraSR arcee-ai/Arcee-Spark-GGUF pprp/Pruner-Zero ruiyiw/patient-psi hrishioa/rakis ragapp/ragapp Doriandarko/claude-engineer hao-ai-lab/MuxServe DataPro Newsletter is not just a publication; it’s a complete toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 Data Science Tool Kit ➔ ️ fal/AuraSR: AuraSR, a GAN-based super-resolution model for upscaling images. Implemented in PyTorch, it's inspired by the GigaGAN paper, enhancing image quality significantly. ➔ arcee-ai/Arcee-Spark-GGUF: Arcee Spark, a 7B model from Qwen2, excels with fine-tuning and DPO, outperforming GPT-3.5 on tasks, ideal for efficient AI deployment. ➔ pprp/Pruner-Zero: Pruner-Zero automates symbolic pruning metric discovery for Large Language Models, surpassing current methods in language modeling and zero-shot tasks. ➔ ruiyiw/patient-psi: Patient-Ψ uses Large Language Models to simulate patient interactions for training mental health professionals, emphasizing cognitive modeling and practical deployment. ➔ hrishioa/rakis: Rakis is a browser-based permissionless AI inference network enabling decentralized consensus without servers, emphasizing open-source and educational use. ➔ ragapp/ragapp: RAGapp simplifies enterprise use of Agentic RAG models, configurable like OpenAI's custom GPTs, deployable via Docker on cloud infrastructure. 
➔ Doriandarko/claude-engineer: Claude Engineer, powered by Anthropic's Claude-3.5-Sonnet, aids software development through an interactive CLI blending AI model capabilities with file operations and web search. ➔ hao-ai-lab/MuxServe: MuxServe efficiently serves multiple LLMs using spatial-temporal multiplexing, optimizing memory and computation resources based on LLM popularity and characteristics. 📚 Expert Insights from Packt CommunityData Science for Web3: A comprehensive guide to decoding blockchain data with data analysis basics and machine learning cases By Gabriela Castillo Areco Understanding the blockchain ingredients If you have a background in blockchain development, you may skip this section. Web3 represents a new generation of the World Wide Web that is based on decentralized databases, permissionless and trustless interactions, and native payments. This new concept of the internet opens up various business possibilities, some of which are still in their early stages. Currently, we are in the Web2 stage, where centralized companies store significant amounts of data sourced from our interactions with apps. The promise of Web3 is that we will interact with Decentralized Apps (dApps) that store only the relevant information on the blockchain, accessible to everyone. As of the time of writing, Web3 has some limitations recognized by the Ethereum organization: Velocity: The speed at which the blockchain is updated poses a scalability challenge. Multiple initiatives are being tested to try to solve this issue. Intuition: Interacting with Web3 is still difficult to understand. The logic and user experience are not as intuitive as in Web2 and a lot of education will be necessary before users can start utilizing it on a massive scale. Cost: Recording an entire business process on the chain is expensive. Having multiple smart contracts as part of a dApp costs a lot for the developer and the user. Blockchain technology is a foundational technology that underpins Web3. It is based on Distributed Ledger Technology (DLT), which stores information once it is cryptographically verified. Once reflected on the ledger, each transaction cannot be modified and multiple parties have a complete copy of it. Two structural characteristics of the technology are the following: It is structured as a set of blocks, where each block contains information (cryptographically hashed – we will learn more about this in this chapter) about the previous block, making it impossible to alter it at a later stage. Each block is chained to the previous one by this cryptographic sharing mechanism. It is decentralized. The copy of the entire ledger is distributed among several servers, which we will call nodes. Each node has a complete copy of the ledger and verifies consistency every time it adds a new block on top of the blockchain. This structure provides the solution to double spending, enabling for the first time the decentralized transfer of value through the internet. This is why Web3 is known as the internet of value. Since the complete version of the ledger is distributed among all the participants of the blockchain, any new transaction that contradicts previously stored information will not be successfully processed (there will be no consensus to add it). This characteristic facilitates transactions among parties that do not know each other without the need for an intermediary acting as a guarantor between them, which is why this technology is known as trustless. 
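To make the hash-chaining idea described above concrete, here is a minimal, illustrative Python sketch (our own example, not taken from the book); the block fields and helper names are invented for demonstration and omit real-world concerns such as consensus and signatures:

import hashlib
import json

def block_hash(data: dict, prev_hash: str) -> str:
    # Hash the block contents together with the previous block's hash
    payload = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, data: dict) -> None:
    # Each new block stores and commits to the hash of the block before it
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"data": data, "prev_hash": prev_hash, "hash": block_hash(data, prev_hash)})

def is_valid(chain: list) -> bool:
    # Recompute every hash; tampering with any earlier block breaks the links
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["data"], block["prev_hash"]):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
append_block(chain, {"from": "alice", "to": "bob", "amount": 5})
append_block(chain, {"from": "bob", "to": "carol", "amount": 2})
print(is_valid(chain))             # True
chain[0]["data"]["amount"] = 500   # tamper with an earlier block
print(is_valid(chain))             # False - the chain no longer verifies

Because each hash commits to the previous one, changing any historical entry invalidates every later link, which is the immutability property the excerpt describes.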
The decentralized storage also takes control away from each server and, thus, there is no sole authority with sufficient power to change any data point once the transaction is added to the blockchain. Since taking down one node will not affect the network, if a hacker wants to attack the database, they would require such high computing power that the attempt would be economically unfeasible. This adds a security level that centralized servers do not have. This excerpt is from the latest book, "Data Science for Web3: A comprehensive guide to decoding blockchain data with data analysis basics and machine learning cases” written by Gabriela Castillo Areco. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!   Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! AWS  ➤ Prompt engineering techniques and best practices: Learn by doing with Anthropic’s Claude 3 on Amazon Bedrock. In this blog post, the focus is on crafting effective prompts for generative AI models to achieve desired outputs. It emphasizes the importance of well-constructed prompts in guiding models like Claude 3 Haiku on Amazon Bedrock to produce accurate and relevant responses, showcasing examples of prompt variations and their impact. ➤ Accelerated PyTorch inference with torch.compile on AWS Graviton processors. In this blog post, AWS optimized PyTorch's torch.compile feature for AWS Graviton3 processors, significantly enhancing performance for Hugging Face and TorchBench model inference compared to the default eager mode. These optimizations, available from PyTorch 2.3.1, aim to streamline model execution on Graviton3-based Amazon EC2 instances. Google➤ How to write prompts for BigQuery data canvas?  This blog post focuses on leveraging generative AI, specifically Gemini in BigQuery, to perform data tasks via natural language queries (NL2SQL and NL2Chart). It highlights how refining NL prompts can enhance query accuracy, promoting collaboration and efficiency among data professionals using BigQuery's data canvas tool. Microsoft➤ Microsoft AI Unveils Skeleton Key: A Novel Generative AI Jailbreak Method. This blog post discusses a newly discovered type of attack in generative AI called Skeleton Key, also known as Master Key. It explores how this attack bypasses AI guardrails, allowing models to generate unauthorized content, and outlines Microsoft's mitigation strategies using Prompt Shields in Azure AI. ➤ GraphRAG: New tool for complex data discovery now on GitHub. The update introduces GraphRAG, a graph-based approach to retrieval-augmented generation (RAG), now available on GitHub. It enhances information retrieval and response generation by automating knowledge graph extraction from text datasets, offering structured insights for global queries. An Azure-hosted API facilitates easy deployment without coding. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🔸 NASA-IBM Collaboration Develops INDUS Large Language Models for Advanced Science Research. The blog explores NASA's collaboration with IBM to develop INDUS, a suite of specialized language models (LLMs) tailored for scientific domains. INDUS enhances data analysis, retrieval, and curation across Earth science, heliophysics, and more, advancing research capabilities in diverse scientific disciplines. 🔸 EvoAgent: Expanding Expert Agents to Multi-Agent Systems with Evolutionary Algorithms. 
EvoAgent automates the extension of expert agents to multi-agent systems using evolutionary algorithms, applicable to any LLM-based agent framework. It enhances agent diversity and performance across tasks, exemplified in debates by generating varied opinions and improving content quality dynamically. 🔸 Kyutai Releases Moshi: A Real-Time AI Model that Understands and Speaks. Kyutai introduces Moshi, a real-time native multimodal foundation model surpassing GPT-4o functionalities. Moshi understands emotions, speaks with accents like French, and handles dual audio streams, enabled by joint pre-training on text and audio. It supports open-source transparency and runs efficiently on consumer hardware. 🔸 MultiOn AI's Retrieve API Boosts Web Search with Real-Time Accuracy for Advanced Applications. MultiOn AI has launched the Retrieve API, a cutting-edge tool for autonomous web information retrieval. It enhances data extraction from web pages with real-time processing, catering to diverse applications such as personalized shopping assistants, automated lead generation, and content creation tools, setting new standards in web data extraction technology. 🔸 Gibbs Diffusion (GDiff): A Bayesian Blind Denoising Method for Images and Cosmology. The study introduces Gibbs Diffusion (GDiff) as an innovative method for blind denoising with deep generative models. It enables simultaneous sampling of signal and noise parameters, improving Bayesian inference for scenarios like natural image denoising and cosmological data analysis, enhancing accuracy in noise characterization and signal recovery. 🔸 Narrative BI Introduces Hybrid AI Approach for Business Data Analysis: The research explores hybrid approaches in business data analysis, combining rule-based systems' precision with Large Language Models' (LLMs) pattern recognition. This integration aims to generate actionable insights from complex datasets, improving efficiency and accuracy in decision-making processes for businesses. 🔸 WildGuard: A Lightweight Moderation Tool for User Safety in LLM Interactions. The paper introduces WildGuard, an open and lightweight moderation tool for enhancing safety in Large Language Models (LLMs). It focuses on identifying malicious intent in user prompts, detecting safety risks in model responses, and evaluating model refusal rates. WildGuard achieves state-of-the-art performance across these tasks, addressing critical gaps in existing moderation tools.  🔸 ProgressGym: ML Framework for Ethical Alignment in Frontier AI. This research addresses the influence of AI systems, particularly large language models (LLMs), on human epistemology and societal values. It introduces progress alignment as a technical solution to prevent AI reinforcement of problematic moral beliefs. ProgressGym, an experimental framework, facilitates learning from historical data to advance real-world moral decision-making challenges. 🔸 OmniParse: AI Platform for Structuring Unstructured Data for GenAI Applications. OmniParse tackles the challenge of managing diverse unstructured data types—documents, images, audio, video, and web content—by converting them into structured formats optimized for AI applications. It integrates various tools like Surya OCR and Florence-2 for accurate data extraction, enhancing workflow efficiency and data usability across platforms. ✨ On the Radar: Catch Up on What's Fresh🔹 10 Use Cases of Claude 3.5 Sonnet: Unveiling the Future of Artificial Intelligence AI with Revolutionary Capabilities. 
Claude 3.5 Sonnet by Anthropic AI marks a leap forward in AI capabilities, showcasing versatility across diverse domains. It excels in generating n-body particle animations, interactive learning dashboards, escape room experiences, virtual psychiatry, interactive poster designs, educational visual demonstrations, customizable calendar applications, real-time object detection, financial tools, and advanced physics simulations. 🔹 Explainability, Interpretability and Observability in Machine Learning: The article explores the nuances of machine learning (ML) transparency through concepts like explainability, interpretability, and observability. It discusses their definitions, distinctions, and importance in fostering trust, accountability, and effective deployment of ML models across various industries and applications. 🔹 A Powerful EDA Tool: Group-By Aggregation. The article dives into Exploratory Data Analysis (EDA) techniques, focusing on group-by aggregation in Pandas. Using the Metro Interstate Traffic dataset as an example, it demonstrates how to derive insights such as monthly traffic progression, daily traffic profiles, hourly traffic patterns by weekday versus weekend, and identifying top weather conditions associated with congestion rates. 🔹 Using OpenAI and PandasAI for Series Operations: This article explores PandasAI, leveraging AI models like OpenAI to enhance Pandas data manipulation tasks. It covers querying Series values, creating new Series, conditional value setting, and reshaping data using natural language commands. Examples include summarizing statistics, conditional operations, and reshaping COVID-19 and NLS youth study datasets efficiently. 🔹 AutoML with AutoGluon: ML workflow with Just Four Lines of Code. The article explores AutoGluon, an automated machine-learning framework developed by Amazon Web Services (AWS). It discusses how AutoGluon simplifies the entire machine-learning process—from data preprocessing to model selection and hyperparameter tuning—making it accessible and efficient for users across various data types like tabular, text, and image data. 🔹 Understanding Python's Duck Typing: The article explores the concept of duck typing in Python, emphasizing behavior over type. It allows objects to be used based on their methods rather than explicit types, promoting flexibility and polymorphism. Duck typing simplifies code but requires careful handling to avoid runtime errors. See you next time!
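As a small companion to the Python duck-typing item highlighted in this issue, here is a toy example of the concept (ours, not from the linked article): any object that provides the expected method can be used, regardless of its declared type.

class Duck:
    def speak(self):
        return "Quack"

class Robot:
    def speak(self):
        return "Beep"

def announce(thing):
    # No isinstance checks: anything with a speak() method works
    print(thing.speak())

for obj in (Duck(), Robot()):
    announce(obj)  # prints "Quack", then "Beep"

The flexibility comes at the cost of runtime errors if an object lacks the expected method, which is the trade-off the linked article notes.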

Vertex AI Workbench: The Ultimate Tool for AI/ML Development

Jasmeet Bhatia
04 Jul 2024
15 min read
This article is an excerpt from the book, The Definitive Guide to Google Vertex AI, by Jasmeet Bhatia, Kartik Chaudhary. Accelerate your machine learning journey with Google Cloud Vertex AI and MLOps best practices.

Introduction
Machine learning (ML) projects are complex in nature and require an entirely different type of development environment from normal software applications. When the data is huge, a data scientist may want to use several big data tools for quick wrangling or preprocessing needs, and a deep learning (DL) model might require several GPUs for fast training and experimentation. Additionally, dedicated compute resources are required for hosting models in production, and even more to scale them up to the enterprise level. Acquiring such resources and tools is quite costly, and even if we manage to buy and set things up, it takes a lot of effort and technical knowledge to bring them together into a project pipeline. Even after doing all that, there are risks of downtime and data security. Nowadays, cloud-based solutions are very popular and take care of all the technical hassle, scaling, and security aspects for us. These solutions let ML developers focus more on project development and experimentation without worrying about infrastructure and other low-level things. As an artificial intelligence (AI)-first company, Google brings all the important resources required for ML project development under one umbrella called Vertex AI. In this chapter, we will learn about Vertex AI Workbench, a managed solution for Jupyter Notebook kernels that can help us bring our ML projects from prototype to production many times faster.

This chapter covers the following topics:
What is Jupyter Notebook?
Vertex AI Workbench
Custom containers for Vertex AI Workbench
Scheduling notebooks in Vertex AI

What is Jupyter Notebook?
Jupyter Notebook is an open source web-based application for writing and sharing live code, documentation, visualizations, and so on. Jupyter Notebooks are very popular among ML practitioners as they provide the flexibility to run code dynamically and collaborate, provide fast visualizations, and can also be used for presentations. Most data scientists and ML practitioners prefer Jupyter Notebook as their primary tool for exploring, visualizing, and preprocessing data using powerful Python libraries such as pandas and NumPy. Jupyter Notebooks are very useful for exploratory data analysis (EDA) as they let us run small code blocks dynamically and also draw quick plots to understand data statistically. Notebooks can also be used for doing quick ML modeling experiments. Another good thing about Jupyter Notebooks is that they let us write Markdown cells as well. Using Markdown, we can explain each code block inside the notebook and turn it into a tutorial. Jupyter Notebooks are popular among ML communities to share and collaborate on projects on platforms such as GitHub and Kaggle.

Getting started with Jupyter Notebook
The Jupyter Notebook application can be installed in local systems using a simple pip command (shown next). For quick experiments, we can also utilize web-based notebook kernels such as Colab and Kaggle, where everything is already set and we can run the Python code directly.
As these kernels are public, we can’t use them if our data is confidential, and we will have to install the Jupyter Notebook application on our system. We can install the Jupyter application on our local system by using the following pip command:

$ pip install jupyter

Once the application is installed, it can be launched through the terminal by typing the following command, and it will automatically open the Jupyter application in a browser tab:

$ jupyter notebook

If it doesn’t open the browser tab automatically, we can launch the application by typing the following URL: http://localhost:8888/tree. By default, the Jupyter server starts on port 8888, but if this port is unavailable, it finds the next available port. If we are interested in using a custom port, we can launch Jupyter by passing a custom port number. Here is a terminal command for launching the Jupyter application on custom port number 9999:

$ jupyter notebook --port 9999

Note: In some cases, the Jupyter server may ask for a token (maybe in the case of a non-default browser) when we try to hit the aforementioned URL manually. In such cases, we can copy the URL from the terminal output that provides the token within the URL. Alternatively, we can obtain a token by running the jupyter notebook list command in the terminal.

Once we are able to launch the application server in a browser, the Jupyter server looks something like this:

Figure 4.1 – Jupyter application server UI

Now, we can launch a Jupyter Notebook instance by clicking on the New button. It creates a new notebook and saves it in the same directory where we started the Jupyter Notebook from the terminal. We can now open that notebook in a new tab and start running scripts. The following screenshot shows an empty notebook:

Figure 4.2 – A Jupyter Notebook instance

As we can see in the previous screenshot, the web UI provides multiple options to manipulate notebooks, code, cells, kernels, and so on. A notebook cell can execute code or can be converted into a Markdown cell by changing its type from the drop-down menu. There are also options for exporting notebooks into different formats such as HTML, PDF, Markdown, LaTeX, and so on for creating reports or presentations. Going further in the book, we will be working with notebooks a lot for data wrangling, modeling, and so on. Now that we have some basic understanding of Jupyter Notebooks in general, let’s see how Vertex AI Workbench provides a more enriched experience of working with a Jupyter Notebook-based environment.

Vertex AI Workbench
While working on an ML project, if we are running a Jupyter Notebook in a local environment, or using a web-based Colab- or Kaggle-like kernel, we can perform some quick experiments and get some initial accuracy or results from ML algorithms very fast. But we hit a wall when it comes to performing large-scale experiments, launching long-running jobs, hosting a model, and also in the case of model monitoring. Additionally, if the data related to a project requires some more granular permissions on security and privacy (fine-grained control over who can view/access the data), it’s not feasible in local or Colab-like environments. All these challenges can be solved just by moving to the cloud. Vertex AI Workbench within Google Cloud is a JupyterLab-based environment that can be leveraged for all kinds of development needs of a typical data science project.
The JupyterLab environment is very similar to the Jupyter Notebook environment, and thus we will be using these terms interchangeably throughout the book.Vertex AI Workbench has options for creating managed notebook instances as well as user-managed notebook instances. User-managed notebook instances give more control to the user, while managed notebooks come with some key extra features. We will discuss more about these later in this section. Some key features of the Vertex AI Workbench notebook suite include the following:Fully managed–Vertex AI Workbench provides a Jupyter Notebook-based fully managed environment that provides enterprise-level scale without managing infrastructure, security, and user-management capabilities.Interactive experience–Data exploration and model experiments are easier as managed notebooks can easily interact with other Google Cloud services such as storage systems, big data solutions, and so on.Prototype to production AI–Vertex AI notebooks can easily interact with other Vertex AI tools and Google Cloud services and thus provide an environment to run end-to-end ML projects from development to deployment with minimal transition.Multi-kernel support–Workbench provides multi-kernel support in a single managed notebook instance including kernels for tools such as TensorFlow, PyTorch, Spark, and R. Each of these kernels comes with pre-installed useful ML libraries and lets us install additional libraries as required.Scheduling notebooks–Vertex AI Workbench lets us schedule notebook runs on an ad hoc and recurring basis. This functionality is quite useful in setting up and running large-scale experiments quickly. This feature is available through managed notebook instances. More information will be provided on this in the coming sections.With this background, we can now start working with Jupyter Notebooks on Vertex AI Workbench. The next section provides basic guidelines for getting started with notebooks on Vertex AI.Getting started with Vertex AI WorkbenchGo to the Google Cloud console and open Vertex AI from the products menu on the left pane or by using the search bar on the top. Inside Vertex AI, click on Workbench, and it will open a page very similar to the one shown in Figure 4.3. More information on this is available in the officialdocumentation (https://cloud.google.com/vertex-ai/docs/workbench/ introduction).Figure 4.3 – Vertex AI Workbench UI within the Google Cloud consoleAs we can see, Vertex AI Workbench is basically Jupyter Notebook as a service with the flexibility of working with managed as well as user-managed notebooks. User-managed notebooks are suitable for use cases where we need a more customized environment with relatively higher control. Another good thing about user-managed notebooks is that we can choose a suitable Docker container based on our development needs; these notebooks also let us change the type/size of the instance later on with a restart.To choose the best Jupyter Notebook option for a particular project, it’s important to know about the common differences between the two solutions. 
Table 4.1 describes some common differences between fully managed and user-managed notebooks:

Vertex AI-managed notebooks: Google-managed environment with integrations and features that provide us with an end-to-end notebook-based production environment without setting up anything by hand.
Vertex AI user-managed notebooks: Heavily customizable VM instances (with prebuilt DL images) that are ideal for users who need a lot of control over the environment.

Vertex AI-managed notebooks: Scaling up and down (for vCPUs and RAM) can be performed from within the notebook itself without needing to restart the environment.
Vertex AI user-managed notebooks: Changing the size/memory of an instance requires stopping the instance in the Workbench UI and restarting it every time.

Vertex AI-managed notebooks: Managed notebooks let us browse data from Google Cloud Storage (GCS) and BigQuery without leaving the Jupyter environment (with GCS and BigQuery integrations).
Vertex AI user-managed notebooks: UI-level data browsing is not supported in user-managed notebooks. However, we can read the data using Python in a notebook cell and view it.

Vertex AI-managed notebooks: Automated notebook runs are supported with one-time and recurring schedules. The executor runs scheduled tasks and saves results even when an instance is in a shutdown state.
Vertex AI user-managed notebooks: Automated runs are not yet supported in a user-managed environment.

Vertex AI-managed notebooks: Less control over networking and security.
Vertex AI user-managed notebooks: Option to implement desired networking and security features and VPC service controls on a per-need basis.

Vertex AI-managed notebooks: Not much control for a DL-based environment while setting up notebooks.
Vertex AI user-managed notebooks: User-managed instances provide multiple DL VM options to choose from during notebook creation.

Table 4.1 – Differences between managed and user-managed notebook instances

Let’s create one user-managed notebook to check the available options:

Figure 4.4 – Jupyter Notebook kernel configurations

As we can see in the preceding screenshot, user-managed notebook instances come with several customized image options to choose from. Along with the support of tools such as TensorFlow Enterprise, PyTorch, JAX, and so on, it also lets us decide whether we want to work with GPUs (which can be changed later, of course, as per needs). These customized images come with all useful libraries pre-installed for the desired framework, plus provide the flexibility to install any third-party packages within the instance. After choosing the appropriate image, we get more options to customize things such as notebook name, notebook region, operating system, environment, machine types, accelerators, and so on (see the following screenshot):

Figure 4.5 – Configuring a new user-managed Jupyter Notebook

Once we click on the CREATE button, it can take a couple of minutes to create a notebook instance. Once it is ready, we can launch the Jupyter instance in a browser tab using the link provided inside Workbench (see Figure 4.6). We also get the option to stop the notebook for some time when we are not using it (to reduce cost):

Figure 4.6 – A running Jupyter Notebook instance

This Jupyter instance can be accessed by all team members having access to Workbench, which helps in collaborating and sharing progress with other teammates. Once we click on OPEN JUPYTERLAB, it opens a familiar Jupyter environment in a new tab (see Figure 4.7):

Figure 4.7 – A user-managed JupyterLab instance in Vertex AI Workbench

A Google-managed JupyterLab instance also looks very similar (see Figure 4.8):

Figure 4.8 – A Google-managed JupyterLab instance in Vertex AI Workbench

Now that we can access the notebook instance in the browser, we can launch a new Jupyter Notebook or terminal and get started on the project.
After providing sufficient permissions to the service account, many useful Google Cloud services such as BigQuery, GCS, Dataflow, and so on can be accessed from the Jupyter Notebook itself using SDKs. This makes Vertex AI Workbench a one-stop tool for every ML development need.

Note: We should stop Vertex AI Workbench instances when we are not using them or don't plan to use them for a long period of time. This helps prevent us from incurring costs by running them unnecessarily.

In the next sections, we will learn how to create notebooks using custom containers and how to schedule notebooks with Vertex AI Workbench.

Custom containers for Vertex AI Workbench

Vertex AI Workbench gives us the flexibility of creating notebook instances based on a custom container as well. The main advantage of a custom container-based notebook is that it lets us customize the notebook environment based on our specific needs. Suppose we want to work with a new TensorFlow version (or any other library) that is currently not available as a predefined kernel. We can create a custom Docker container with the required version and launch a Workbench instance using this container. Custom containers are supported by both managed and user-managed notebooks.

Here is how to launch a user-managed notebook instance using a custom container:

1. The first step is to create a custom container based on the requirements. Most of the time, a derivative container (a container based on an existing DL container image) is easy to set up. See the following example Dockerfile; here, we first pull an existing TensorFlow GPU image and then install a newer TensorFlow version on top of it:

```
FROM gcr.io/deeplearning-platform-release/tf-gpu:latest
RUN pip install --upgrade tensorflow
```

2. Next, build and push the container image to Container Registry, such that it is accessible to the Google Compute Engine (GCE) service account. See the following commands to build and push the container image:

```
export PROJECT=$(gcloud config list project --format "value(core.project)")
docker build . -f Dockerfile.example -t "gcr.io/${PROJECT}/tf-custom:latest"
docker push "gcr.io/${PROJECT}/tf-custom:latest"
```

Note that the service account should be given sufficient permissions to build and push the image to the container registry, and the respective APIs should be enabled.

3. Go to the User-managed notebooks page, click on the New Notebook button, and then select Customize. Provide a notebook name and select an appropriate Region and Zone value.

4. In the Environment field, select Custom Container.

5. In the Docker Container Image field, enter the address of the custom image; in our case, it would look like this:

gcr.io/${PROJECT}/tf-custom:latest

6. Make the remaining appropriate selections and click the Create button.

We are all set now. While launching the notebook, we can select the custom container as a kernel and start working in the custom environment.

We can now successfully launch Vertex AI notebooks and also create custom container-based environments if required. In the next section, we will learn how to schedule notebook runs within Vertex AI.

Scheduling notebooks in Vertex AI

Jupyter Notebook environments are great for doing some initial experiments.
But when it comes to launching long-running jobs, multiple training trials with different input parameters (such as hyperparameter tuning jobs), or adding accelerators to training jobs, we usually copy our code into a Python file and launch experiments using custom Docker containers or managed pipelines such as Vertex AI Pipelines. Considering this situation, and to minimize the duplication of effort, Vertex AI managed notebook instances provide the functionality of scheduling notebooks on an ad hoc or recurring basis. This feature allows us to execute our scheduled notebook cell by cell on Vertex AI. It gives us the flexibility to seamlessly scale our processing power and choose suitable hardware for the task. Additionally, we can pass different input parameters for experimentation purposes.

Configuring notebook executions

Let's try to configure notebook executions to check the various options it provides. Imagine we are building a toy application that takes two parameters – user_name and frequency – and, when executed, prints the user_name parameter as many times as the frequency parameter. Now, let's launch a managed notebook and create our application, as follows:

Figure 4.9 – A simple Python application within Jupyter Notebook

Next, put all the parameters into a single cell and click on the gear-like button at the top-right corner. Assign this cell the parameters tag. See the following screenshot:

Figure 4.10 – Tagging parameters within a Jupyter Notebook cell

Our toy application is now ready. Once we click on the Execute button in the toolbar, it provides us with options for customizing the machine type, accelerators, environment (which can be a custom Docker container), and execution type – one-time or recurring. See the following screenshot:

Figure 4.11 – Configuring notebook execution for the Python application

Next, let's change the parameters for our one-time execution by clicking on the ADVANCED OPTIONS button. Here, we can provide key-value pairs for parameter names and values. Check the following screenshot:

Figure 4.12 – Setting up parameters for one-time execution

Finally, click the SUBMIT button. It will then display the following dialog box:

Figure 4.13 – One-time execution scheduled

We have now successfully scheduled our notebook run with custom parameters on Vertex AI. We can find it under the EXECUTIONS section in the Vertex AI UI:

Figure 4.14 – Checking the EXECUTIONS section for executed notebook instances

We can now check the results by clicking on VIEW RESULT. Check the following screenshot for how it overrides the input parameters:

Figure 4.15 – Checking the results of the execution

Similarly, we can schedule large one-time or recurring experiments without moving our code out of the notebook and take advantage of the cloud platform's scalability.

We just saw how easy it is to configure and schedule notebook runs within Vertex AI Workbench. This capability allows us to run seamless experiments while keeping our code in the notebook. It is also helpful for setting up recurring jobs in the development environment.

Conclusion

In this chapter, we learned about Vertex AI Workbench, a managed platform for launching the Jupyter Notebook application on Google Cloud. We talked about the benefits of having notebooks in a cloud-based environment as compared to a local environment. Having Jupyter Notebook in the cloud makes it perfect for collaboration, scaling, adding security, and launching long-running jobs.
We also discussed additional features of Vertex AI Workbench that are quite useful while working on different aspects of ML project development.

After reading this chapter, we should be able to successfully deploy, manage, and use Jupyter Notebooks on the Vertex AI platform for our ML development needs. As we understand the difference between managed and user-managed notebook instances, we should be in good shape to choose the best solution for our development needs. We should also be able to create custom Docker container-based notebooks if required. Most importantly, we should now be able to schedule notebook runs for recurring as well as one-time execution based on the requirements. Notebook scheduling is also quite useful for launching multiple model training experiments in parallel with different input parameters. Now that we have a good background in Vertex AI Workbench, it will be easier to follow the code samples in the upcoming chapters.

Author Bio

Kartik is an Artificial Intelligence and Machine Learning professional with 6+ years of industry experience in developing and architecting large-scale AI/ML solutions using technological advancements in the fields of Machine Learning, Deep Learning, Computer Vision, and Natural Language Processing. Kartik has filed 9 patents at the intersection of Machine Learning, Healthcare, and Operations. Kartik loves sharing knowledge, blogging, travel, and photography.

Jasmeet is a Machine Learning Architect with over 8 years of experience in Data Science and Machine Learning Engineering at Google and Microsoft, and 17 years of overall experience in Product Engineering and Technology consulting at Deloitte, Disney, and Motorola. He has been involved in building technology solutions that focus on solving complex business problems by utilizing information and data assets. He has built high-performing engineering teams and has designed and built global-scale AI/Machine Learning, Data Science, and Advanced Analytics solutions for image recognition, natural language processing, sentiment analysis, and personalization.
Creating and Using Kibana Dashboards

Huage Chen
02 Jul 2024
12 min read
This article is an excerpt from the book Elastic Stack 8.x Cookbook, by Huage Chen and Yazid Akadiri. Unlock the full potential of Elastic Stack for search, analytics, security, and observability and manage substantial data workloads in both on-premise and cloud environments.

Introduction

In this guide, we will integrate all previously created visualizations into a comprehensive dashboard consisting of multiple panels. Additionally, we will explore how to enhance user interaction using control-based drilldowns.

Getting ready

Make sure to complete the following recipes from this chapter:

Creating visualizations with Kibana Lens
Creating visualizations from runtime fields
Creating Kibana maps

At the end of this recipe, you will have dashboards composed of the various visualizations and elements built in the aforementioned recipes.

How to do it...

Building dashboards is very straightforward in Kibana, especially if you've already created some visualizations. Follow these steps:

1. Go to Kibana | Analytics | Dashboard and click on Create dashboard. This will bring you to a blank canvas, where you can start adding some visualizations.

2. We will start by adding a nice image! You can be creative, but we have provided a sample picture:

A. Click on Add panel | Image.
B. Select the Use link tab and set Link to image with the following URL: https://upload.wikimedia.org/wikipedia/commons/6/60/Ville_de_RENNES_Noir.svg. Then, click on Save:

Figure 6.54 – Adding an image for a logo

The logo will be added to the panel. Including a picture is a great way to add some personalization and branding to your dashboards. Let's add some proper visualizations from the ones we've built in the last three recipes.

3. Click on Add from library and select the [Rennes Traffic] Number of locations visualization. Make sure to align it to the right of the image panel.

4. Let's add another visualization; this time, we'll pick [Rennes Traffic] Average speed gauge. At this stage, your dashboard should look like the one shown in Figure 6.55:

Figure 6.55 – Rennes traffic dashboard – first step

You can easily rearrange the position of the different panels by clicking on the title section and moving the panel with your mouse anywhere you want on the canvas. To adjust the size and fit of a panel, position your mouse on the small arrow at the bottom right of the panel. Let's keep adding more panels to our dashboard.

5. Click on Add from library and add the following visualizations in the respective order:

I. [Rennes Traffic] Traffic status waffle
II. [Rennes Traffic] Speed by road hierarchy
III. [Rennes Traffic] Average speed & Traffic Status
IV. [Rennes Traffic] Traffic status by hour

6. Finally, let's add a Map visualization for a real-time view of the traffic; select the one named [Rennes Traffic] Traffic fluidity. By now, your dashboard should look like the one shown in Figure 6.56:

Figure 6.56 – Rennes traffic dashboard – more visualizations

You can start playing around with the dashboard to see the built-in interactivity of the panels. For example, clicking on a specific road hierarchy will automatically apply the filter to the entire dashboard. You can also have dedicated panels to filter and display only the data you are interested in with Controls. Let's add some to our dashboard.

7. On the dashboard toolbar, click on Controls:

Figure 6.57 – Adding controls to the dashboard

8. From the drop-down list, select Add control; the Create control flyout will appear on the right of the screen.

9. Select the traffic_status field and click on Save and close.
10. Back to the dashboard, you now have a new control panel named traffic_status on top of the visualizations. By clicking on it, you will see a drop-down list where you can select the values associated with the traffic status you want to filter on, as shown in Figure 6.58. Select congested as an example:

Figure 6.58 – Using controls in the dashboard

11. You can see on your dashboard that all the panels have been updated according to the value selected in the traffic_status control.

Imagine you want to filter your traffic data to analyze it within a specific time range, such as early in the morning or late in the afternoon, to better understand traffic patterns. This is where the time slider control proves to be incredibly useful.

12. Go to the Controls menu again in the dashboard toolbar and select Add time slider control. You'll see a new panel to the right of traffic_status:

Figure 6.59 – Time slider control

By clicking the play icon, you will see your dashboard animate and your data change over the defined time range. You can advance the time range forward as well as backward, which is especially useful when working with time series data. Your dashboard should now look as shown in Figure 6.60, with our two controls:

Figure 6.60 – Rennes traffic dashboard with controls

13. Save the dashboard by clicking the Save button in the upper-right corner. Name it [Rennes Traffic] Overview.

To enhance our dashboard further, consider this: users frequently manage multiple dashboards, and the ability to navigate seamlessly from one to another is crucial, especially when aiming to refine analysis or focus on more detailed panels related to a specific dataset. Dashboard drilldowns are invaluable in this scenario, as they allow you to transition between dashboards while maintaining the overall context. Let's explore how to implement and use this feature effectively!

For this exercise, we have already built a drilldown dashboard. Download and save the NDJSON file of the exported dashboard from the following location: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter6/kibana-objects/rennesdata-drilldown-dashboard.ndjson. Then, follow these steps:

1. To import the dashboard, go to Stack Management | Saved Objects.

2. Click on Import and select the NDJSON file you previously downloaded from the GitHub repository. Upon completing the import process, you will notice a warning in the flyout about data view conflicts. The reason is straightforward: our saved objects rely on an existing data view. To resolve the conflict, simply click on the drop-down list under the New data view column and select metrics-rennes_traffic-raw, as shown in Figure 6.61, then click on Confirm all changes to finalize the import procedure:

Figure 6.61 – Importing saved objects and selecting the right data view

3. Once all the objects have been imported, you will get a recap as shown in the following screenshot:

Figure 6.62 – Saved objects successfully imported from the file

Return to the [Rennes Traffic] Overview dashboard. Then, open the menu for the [Rennes Traffic] Speed by road hierarchy panel and select Create drilldown:

Figure 6.63 – Creating drilldown from the panel

4. Navigate to the drilldowns page and select the Go to Dashboard option. Here, you will need to name your drilldown—consider View Details for Road Hierarchy as a suggestion.
Then, from the Choose destination dashboard drop-down menu, select [Rennes Traffic] Detailed traffic drilldown dashboard, which you have just imported. This process sets up a targeted navigation path within your dashboard environment, allowing for a seamless transition between your overview and detailed analysis dashboards:

Figure 6.64 – Configuring dashboard drilldown

5. Click on Create drilldown and save the dashboard. To test the drilldown, click on one of the five charts in the [Rennes Traffic] Speed by road hierarchy panel. You will be redirected to the detailed dashboard, filtered on the value you have selected.

Figure 6.65 – Dashboard view after drilldown

Et voilà! You have just built your first dashboard with a nice touch of interactivity thanks to controls and drilldowns.

How it works...

In Kibana, a dashboard is a collection of visualizations and saved searches that you can arrange and customize to display the data that is most important to you. You can create multiple dashboards for different use cases, and each dashboard can have its own set of visualizations and searches.

Dashboards are a powerful tool for data analysis because they allow you to see multiple visualizations side by side and quickly identify patterns and trends in your data. You can also use dashboards to monitor key metrics in real time, which is especially useful for operational use cases. Kibana provides a wide range of visualization types that you can use to create custom dashboards, including bar charts, line charts, pie charts, tables, and more.

The following table outlines a framework for choosing the right visualization:

| Use case | Recommended type of visualization |
| --- | --- |
| Comparison and correlation | Many items: Horizontal bar. Few items: Vertical bar |
| Comparison over time | Few periods and categories: Stacked bar. Few time periods but many categories: Line graph |
| Distribution of values | Few points: Vertical bar histogram. Many points: Line histogram |
| Composition of a whole | Simple compositions with few items: Waffle or Treemap. Multiple grouping dimensions for a few bottom-level items: Mosaic. Multiple grouping dimensions for many bottom-level items: Treemap |
| Eye-catching summary | One value: Metric. Many values: Table with color styling |
| Visualizing goals or targets | Vertical bar or Line with reference lines; Metric |

Table 6.2 – Choosing the right visualization

In addition to visualizations, Kibana dashboards also support saved searches, which allow you to quickly filter your data based on specific criteria. You can save searches that you use frequently and add them to your dashboard for easy access.

Overall, Kibana dashboards are a powerful tool for data analysis and monitoring. They allow you to quickly identify patterns and trends in your data, monitor key metrics in real time, and customize your view of the data to suit your needs.

There's more...

In our recipe, we have used dashboard drilldowns, but you can also create URL and Discover drilldowns. With the former, you can link to data outside of Kibana, and with the latter, you can open Discover from a Lens panel while keeping all the contextual information.

Dashboards are great when used in Kibana, but you can also share them with teams and colleagues outside of Kibana.
You have many options that are easily accessible from the Share menu in the toolbar when it comes to sharing dashboards: you can interactively embed dashboards as an iFrame, export them as reports in various formats (PNG, CSV, PDF, etc.), and share them as direct links for easy access.

When building dashboards, design thinking is a good practice. Start by asking yourself the following questions:

What is the outcome or the goal of the dashboard? Is it about understanding high-level behaviors, visually correlating specific metrics at the same time, or finding the root cause of an issue?
Who is using this dashboard to do their job? If you are building it for a team or someone else, step into their shoes to visualize their perspective when they will need that data.

See also

Looking for more design tips to elevate your dashboards? Check out this blog: https://www.elastic.co/blog/designing-intuitive-kibana-dashboards-as-a-non-designer
If you're interested in delving deeper into creating dashboards more efficiently, be sure to check out this technical blog: https://www.elastic.co/blog/building-kibana-dashboards-more-efficiently
For developers interested in debugging their Kibana dashboards, the following article will be very useful: https://www.elastic.co/blog/debugging-kibana-dashboards

Conclusion

In this guide, we've explored the process of integrating various visualizations into a comprehensive Kibana dashboard, enhancing user interaction through control-based drilldowns. By following the steps outlined, you should now have a functional and interactive dashboard that can provide valuable insights into your data.

We began by preparing the necessary visualizations and then moved on to assembling the dashboard by adding an image for personalization and aligning the various traffic visualizations. We also incorporated control panels for dynamic filtering, allowing for more precise data analysis. The final touch was adding drilldowns to enable seamless navigation between detailed and overview dashboards.

Kibana dashboards offer powerful tools for data analysis and real-time monitoring. By displaying multiple visualizations side by side, you can quickly identify patterns and trends, making dashboards invaluable for operational and analytical use cases.

Remember, the key to a successful dashboard is thoughtful design—consider the goals, the audience, and the specific data insights needed. Utilize the wide range of visualization types that Kibana offers and don't hesitate to leverage the sharing options to collaborate with your team effectively.

For further reading and advanced tips on designing intuitive dashboards, building them efficiently, or debugging, check out the additional resources provided. Happy dashboarding!

Author Bio

Huage Chen is a member of Elastic's customer engineering team and has been with Elastic for over five years, helping users throughout Europe to innovate and implement cloud-based solutions for search, data analysis, observability, and security. Before joining Elastic, he worked for 10 years in web content management, web portals, and digital experience platforms.

Yazid Akadiri has been a solutions architect at Elastic for over four years, helping organizations and users solve their data and most critical business issues by harnessing the power of the Elastic Stack. At Elastic, he works with a broad range of customers, with a particular focus on Elastic observability and security solutions.
He previously worked in web services-oriented architecture, focusing on API management and helping organizations build modern applications.
Hands-On Exploratory Data Analysis with DuckDB

Ned Letcher
28 Jun 2024
7 min read
This article is an excerpt from the book Getting Started with DuckDB, by Simon Aubury and Ned Letcher.

Introduction

DuckDB is a versatile and highly optimized database management system designed for efficient data analysis workflows. Its capabilities allow practitioners to scale their data analysis efforts beyond traditional tools, making it an excellent choice for local machine data processing. In this excerpt, we will explore how to use DuckDB for hands-on exploratory data analysis, leveraging Python, Jupyter Notebooks, and Plotly for interactive data visualizations.

Technical Requirements

To follow along with the examples in this guide, you will need the following setup:

Python environment
Jupyter Notebook
DuckDB installed
JupySQL library
Plotly library

You can find the necessary code examples in the chapter_11 folder in the book's GitHub repository at [PacktPublishing](https://github.com/PacktPublishing/Getting-Started-with-DuckDB/tree/main/chapter_11).

Obtaining the Dataset

We will be using a pedestrian counting system dataset from the city of Melbourne, containing hourly pedestrian counts from sensors located in and around the Melbourne Central Business District (CBD). This dataset provides a comprehensive view of pedestrian traffic patterns over several years. To download the dataset, visit the dataset's home page [Melbourne Pedestrian Counting System](https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour) and locate the ZIP file containing the 2009 to 2022 archive.

Setting Up the Environment

Before diving into the code, ensure your Python environment is set up with the necessary dependencies. You will need to:

1. Set up a Python virtual environment:

```
python -m venv duckdb_env
source duckdb_env/bin/activate
```

2. Install the required libraries:

```
pip install jupyter duckdb plotly jupysql pandas
```

3. Start Jupyter Notebook:

```
jupyter notebook
```

Loading and Cleaning Data

First, we will load our dataset from a CSV file and perform some data cleaning steps before writing it to a DuckDB database.

Loading CSV Data into DuckDB

```
import duckdb
import pandas as pd

# Load the dataset into a pandas DataFrame
data_url = "path_to_downloaded_zip_file/2022/2022.csv"
pedestrian_counts = pd.read_csv(data_url)

# Display the first few rows of the DataFrame
print(pedestrian_counts.head())

# Create a DuckDB connection and write the DataFrame to a DuckDB table
# (DuckDB can read the local pandas DataFrame by name via its replacement scans)
con = duckdb.connect(database=':memory:')
con.execute("CREATE TABLE pedestrian_counts AS SELECT * FROM pedestrian_counts")
```

Data Cleaning Steps

Perform necessary data cleaning operations such as handling missing values, correcting data types, and filtering irrelevant records.

```
# Convert the 'Date_Time' column to datetime format
pedestrian_counts['Date_Time'] = pd.to_datetime(pedestrian_counts['Date_Time'])

# Handle missing values by filling them with 0
pedestrian_counts = pedestrian_counts.fillna(0)

# Write the cleaned data to DuckDB
con.execute("DROP TABLE pedestrian_counts")
con.execute("CREATE TABLE pedestrian_counts AS SELECT * FROM pedestrian_counts")

# Verify the cleaned data
result = con.execute("SELECT * FROM pedestrian_counts LIMIT 5").fetchdf()
print(result)
```

Using JupySQL for SQL Queries

JupySQL is a powerful library that allows you to run SQL queries directly in Jupyter Notebooks. This makes it easy to interact with your DuckDB database without switching contexts.
#### Example JupySQL Query

```
%load_ext sql
# Note: this connects JupySQL to a new in-memory DuckDB database, separate from
# the `con` connection used above; to query the table created earlier, point
# both at the same (for example, file-backed) database.
%sql duckdb:///:memory:
```

```
%%sql
-- Query to view the first few rows of the dataset
SELECT * FROM pedestrian_counts LIMIT 5;
```

Visualizing Data with Plotly

Plotly is a versatile data visualization library that integrates well with Jupyter Notebooks. We will use it to create interactive visualizations of our dataset.

Total Pedestrian Counts Over Time

```
import plotly.express as px

# Aggregate pedestrian counts by year
yearly_counts = con.execute("""
    SELECT strftime(Date_Time, '%Y') AS Year, SUM(Counts) AS Total_Counts
    FROM pedestrian_counts
    GROUP BY Year
    ORDER BY Year
""").fetchdf()

# Create a bar chart
fig = px.bar(yearly_counts, x='Year', y='Total_Counts', title='Total Pedestrian Counts by Year')
fig.show()
```

Monthly Traffic Counts

```
# Aggregate pedestrian counts by month for the years 2019 and 2020
monthly_counts = con.execute("""
    SELECT strftime(Date_Time, '%Y-%m') AS Month, SUM(Counts) AS Monthly_Counts
    FROM pedestrian_counts
    WHERE strftime(Date_Time, '%Y') IN ('2019', '2020')
    GROUP BY Month
    ORDER BY Month
""").fetchdf()

# Create a line chart to compare the two years
fig = px.line(monthly_counts, x='Month', y='Monthly_Counts', title='Monthly Pedestrian Counts for 2019 and 2020')
fig.show()
```

Hourly Traffic Patterns

```
# Aggregate pedestrian counts by hour of the day
hourly_counts = con.execute("""
    SELECT strftime(Date_Time, '%H') AS Hour, AVG(Counts) AS Average_Counts
    FROM pedestrian_counts
    GROUP BY Hour
    ORDER BY Hour
""").fetchdf()

# Create a line chart for hourly patterns
fig = px.line(hourly_counts, x='Hour', y='Average_Counts', title='Average Hourly Pedestrian Counts')
fig.show()
```

Exploratory Data Analysis

With our dataset loaded and visualized, we can perform a more detailed exploratory data analysis.

Comparing Traffic on Weekdays vs. Weekends

```
# Add a column for the day of the week
pedestrian_counts['Day_of_Week'] = pedestrian_counts['Date_Time'].dt.day_name()

# Refresh the DuckDB table so it includes the new column
con.execute("DROP TABLE pedestrian_counts")
con.execute("CREATE TABLE pedestrian_counts AS SELECT * FROM pedestrian_counts")

# Aggregate pedestrian counts by day of the week
# (FIELD() is a MySQL function; in DuckDB, a CASE expression orders the days)
daily_counts = con.execute("""
    SELECT Day_of_Week, AVG(Counts) AS Average_Counts
    FROM pedestrian_counts
    GROUP BY Day_of_Week
    ORDER BY CASE Day_of_Week
        WHEN 'Monday' THEN 1 WHEN 'Tuesday' THEN 2 WHEN 'Wednesday' THEN 3
        WHEN 'Thursday' THEN 4 WHEN 'Friday' THEN 5 WHEN 'Saturday' THEN 6
        WHEN 'Sunday' THEN 7 END
""").fetchdf()

# Create a bar chart for daily patterns
fig = px.bar(daily_counts, x='Day_of_Week', y='Average_Counts', title='Average Pedestrian Counts by Day of the Week')
fig.show()
```

Peak Hours of Pedestrian Traffic

```
# Identify peak hours by finding the hours with the highest average counts
peak_hours = con.execute("""
    SELECT strftime(Date_Time, '%H') AS Hour, AVG(Counts) AS Average_Counts
    FROM pedestrian_counts
    GROUP BY Hour
    ORDER BY Average_Counts DESC
    LIMIT 5
""").fetchdf()

# Create a bar chart for peak hours
fig = px.bar(peak_hours, x='Hour', y='Average_Counts', title='Peak Hours of Pedestrian Traffic')
fig.show()
```

Conclusion

DuckDB, combined with JupySQL and Plotly, provides a robust framework for performing hands-on exploratory data analysis. By leveraging DuckDB's high-performance SQL capabilities and integrating with powerful visualization tools, you can efficiently uncover insights from your data.
We encourage you to further explore DuckDB's features and apply these techniques to your own datasets. For a deeper dive into DuckDB's powerful data analysis capabilities and to explore more advanced topics, we highly recommend reading the book Getting Started with DuckDB by Simon Aubury and Ned Letcher.

Author Bio

Simon Aubury has been working in the IT industry since 2000 as a data engineering specialist. He has an extensive background in building large, flexible, highly available distributed data systems. Simon has delivered critical data systems for finance, transport, healthcare, insurance, and telecommunications clients in Australia, Europe, and Asia Pacific. In 2019, Simon joined ThoughtWorks as a principal data engineer and today is associate director of data platforms at Simple Machines in Sydney, Australia. Simon is active in the data community, a regular conference speaker, and the organizer of local and international meetups and data engineering conferences.

Ned Letcher has worked as a data science and software engineering consultant since completing his PhD in computational linguistics in 2018 and currently works at Thoughtworks. He has designed and developed data-powered products and services across a range of industries and helped organizations and teams improve the effectiveness of their data processes and workflows. Ned has also worked as a Python trainer, supporting both tertiary students and data professionals across various organizations. He is active in the data community, speaking at and helping organize meetups and conferences, as well as contributing to a range of open source projects.
Top 100+ Essential Data Science Tools & Repos: Streamline Your Workflow Today!

Merlyn Shelley
27 Jun 2024
14 min read
Introduction

As data professionals, navigating the vast sea of Big Data often leaves us searching for the right tools to harness its potential. Whether we're defining intricate problems, identifying emerging trends, or crafting innovative solutions, the challenge is undeniable. Too often, this quest has us wandering aimlessly through the web, seeking elusive answers. Here at the DataPro Newsletter team, we understand this all too well. That's why, in celebration of our 100th edition, we're thrilled to present a special gift to our valued readers—a thorough reference module brimming with resources. This carefully curated collection features over 100 of the most popular tools and GitHub repositories. Each one is not only widely used and trusted but is also consistently updated with the latest breakthroughs to enhance your data processing capabilities. Think of this module as your treasure chest, designed to streamline your workflow and inspire innovative solutions. Bookmark this page for quick access whenever you encounter challenges in any area of data science and machine learning, from DataOps to Recommender Systems to Quantitative Finance—we've got it all covered! So, dive into this one-stop reference module, explore its depths, and let the spirit of data kinship propel you forward. Here's to more empowering tools and transformative insights from your DataPro team—cheers!

DataOps/MLOps

- kestra-io/kestra: Kestra is an open-source orchestrator for scheduled and event-driven workflows, leveraging Infrastructure as Code for reliable management.
- open-metadata/OpenMetadata: OpenMetadata is a unified platform for data discovery, observability, and governance, featuring a central repository, column lineage, and team collaboration.
- dolthub/dolt: Dolt is a SQL database with Git-like version control features, accessible via MySQL or a command line interface.
- iterative/dvc: DVC is a tool for reproducible machine learning, enabling data and model versioning, lightweight pipelines, experiment tracking, and easy sharing.
- quiltdata/quilt: Quilt allows creating versioned datasets with Python and an S3 bucket. It supports data-driven teams, aiding rapid experimentation and collaboration.

Real-time Data Processing

- allinurl/goaccess: GoAccess is a real-time web log analyzer for *nix systems and browsers, offering fast HTTP statistics. More details: goaccess.io.
- feathersjs/feathers: Feathers is a TypeScript/JavaScript framework for building APIs and real-time apps, compatible with various backends and frontends.
- apache/age: Apache AGE extends PostgreSQL with graph database capabilities, supporting both relational SQL and openCypher graph queries seamlessly.
- zephyrproject-rtos/zephyr: Real-time OS for diverse hardware, from IoT sensors to smart watches, emphasizing scalability, security, and resource efficiency.
- hazelcast/hazelcast: Hazelcast integrates stream processing and fast data storage for real-time insights, enabling immediate action on data-in-motion within a unified platform.

Data Quality Management

- WeBankFinTech/Qualitis: Qualitis manages data quality through verification, notification, and management across various data sources, solving data processing-related quality issues.
- raystack/optimus: Optimus is a robust workflow orchestrator for data transformation, modeling, pipelines, and quality management, emphasizing ease of use and reliability.
- Toloka/crowd-kit: Crowd-Kit is a Python library for crowdsourced annotation, featuring aggregation methods, metrics, and datasets to simplify working with crowd data.
- ydataai/ydata-profiling: ydata-profiling offers a streamlined, fast EDA solution akin to pandas' df.describe(), providing detailed DataFrame analysis exportable in formats like HTML and JSON.
- cleanlab/cleanlab: cleanlab automates data and label cleaning by detecting issues in ML datasets, enhancing model training with real-world data.

Predictive Analytics

- spring-cloud/spring-cloud-dataflow: Spring Cloud Data Flow enables microservices-driven data processing pipelines on Cloud Foundry and Kubernetes, supporting diverse use cases like streaming and batch processing.
- ScottfreeLLC/AlphaPy: AlphaPy, a Python ML framework, caters to speculators and data scientists with scikit-learn, pandas, and additional tools for feature engineering and visualization.
- retentioneering/retentioneering-tools: Retentioneering simplifies analyzing clickstreams and user paths, offering deeper insights than funnel analysis, benefiting data and marketing analysts.
- genular/pandora: PANDORA offers advanced analytics for biomedical research, employing machine learning tools like clustering, PCA, UMAP, and interpretable models for discovery.
- nabeel-oz/qlik-py-tools: Qlik's SSE integrates modern data science into Qlik Sense, enabling business users to leverage advanced analytics through Python-based functions.

Deep Learning

- Lightning-AI/pytorch-lightning: Lightning 2.0 simplifies PyTorch workflows with a stable API, enabling scalable training and deployment of AI models efficiently.
- ultralytics/yolov5: YOLOv5 by Ultralytics is a leading vision AI model, built on extensive open-source research and development for advanced performance.
- hpcaitech/ColossalAI: Colossal-AI simplifies distributed deep learning with user-friendly tools, enabling easy parallel training and inference similar to local model development.
- naptha/tesseract.js: Tesseract.js simplifies OCR with a webassembly-based Tesseract engine, supporting both browser and Node.js environments with easy integration and setup.
- microsoft/DeepSpeed: DeepSpeed enables efficient training of models like ChatGPT with significant speed improvements and cost reductions across all scales.

Reinforcement Learning

- ray-project/ray: Ray is a unified framework that scales AI and Python applications with a distributed runtime and specialized AI libraries.
- d2l-ai/d2l-en: An open-source book using Jupyter notebooks to make deep learning accessible, blending concepts, context, and interactive code examples.
- Unity-Technologies/ml-agents: Unity ML-Agents enables games and simulations for training intelligent agents with deep reinforcement learning and imitation learning, fostering innovation in AI.
- google/trax: Trax is a Google Brain-endorsed deep learning library known for clear code and speed, demonstrated in a Colab notebook.
- wandb/wandb: The repository includes a CLI and Python API for visualizing and tracking machine learning experiments effectively.
- VowpalWabbit/vowpal_wabbit: Vowpal Wabbit advances machine learning with online, hashing, allreduce, and active learning techniques, pushing the frontier of ML capabilities.

Time Series Analysis

- taosdata/TDengine: TDengine is a high-performance, open-source time-series database designed for IoT, connected cars, industrial IoT, and DevOps environments.
- timescale/timescaledb: An open-source SQL database for time-series data, optimized for rapid data ingestion and complex querying, available as a PostgreSQL extension.
- influxdata/telegraf: Telegraf is an agent for gathering and processing metrics, logs, and data, featuring 300+ plugins and community-driven development for flexibility.
- questdb/questdb: QuestDB is an open-source time-series database known for high throughput ingestion, fast SQL queries, and operational simplicity, ideal for various high-cardinality datasets.
- ccfos/nightingale: Nightingale is an all-in-one, open-source, cloud-native monitoring system combining data collection, visualization, and alerting capabilities seamlessly.

Data Engineering

- PrefectHQ/prefect: Prefect simplifies Python data pipeline orchestration, transforming scripts into dynamic workflows that react to changes and ensure resilience.
- airbytehq/airbyte: Airbyte, an open-source data integration platform, offers 300+ connectors for seamless ELT pipelines between diverse data sources and destinations.
- argoproj/argo-workflows: Argo Workflows orchestrates parallel jobs on Kubernetes via container-native workflows, supporting DAGs and accelerating compute-intensive tasks like ML and data processing.
- dagster-io/dagster: Dagster is a cloud-native data pipeline orchestrator with integrated lineage, observability, declarative programming, and robust testability across the lifecycle.
- Avaiga/taipy: Taipy simplifies web app development for data scientists & ML engineers using Python, focusing on AI algorithms with no extra languages.

Business Intelligence

- ankane/blazer: SQL-based tool for data exploration, chart creation, dashboard sharing. Supports various data sources, variables, checks, audits, and security integrations.
- evidence-dev/evidence: Open-source BI tool uses Markdown with SQL queries for data sourcing, rendering charts, and generating templated, dynamic web pages.
- lightdash/lightdash: Empower teams with self-service data insights using dbt: define metrics, visualize data, and share dashboards seamlessly across your organization.
- TuiQiao/CBoard: User-friendly open BI platform for self-service reporting and dashboards, simplifying data insights and sharing across teams effortlessly.
- quarylabs/quary: BI platform for engineers to connect databases, write SQL for table transformations, create charts, dashboards, and reports with collaboration and deployment capabilities.

Data Visualization

- netdata/netdata: Real-time metrics collection and visualization for servers, cloud, Kubernetes, and edge/IoT devices, scaling effortlessly across diverse environments.
- directus/directus: Open-source API and dashboard for managing SQL database content with REST & GraphQL interfaces, supporting various databases, and customizable for on-premises or cloud deployment.
- airbnb/visx: Reusable low-level visualization components combining d3's power with React's DOM updating capabilities for dynamic data visualization.
- uber/react-vis: React component library for diverse data visualizations: line, bar, scatter, heatmaps, pie charts, sunbursts, radar charts, and more.
- bokeh/bokeh: Interactive visualization library for web browsers, offering versatile graphics creation and high-performance interactivity for large datasets and dashboards.
- apache/echarts: Free JavaScript library for intuitive, interactive, and customizable charts, ideal for enhancing commercial products with powerful visualizations.
Recommender Systems

- NicolasHug/Surprise: Python scikit for building recommender systems with explicit rating data, emphasizing experiment control, dataset handling, and diverse prediction algorithms.
- gorse-io/gorse: Open-source recommendation system in Go, designed for universal integration into online services, automating model training based on user interaction data.
- recommenders-team/recommenders: Recommenders, a Linux Foundation project, offers Jupyter notebooks for building classic and cutting-edge recommendation systems, covering data prep, modeling, evaluation, optimization, and production deployment on Azure.
- alibaba/Alink: Alink, developed by Alibaba's PAI team, integrates Flink for ML algorithms. PyAlink supports various Flink versions, maintaining compatibility up to Flink 1.13.
- RUCAIBox/RecBole: RecBole, built on Python and PyTorch, facilitates research with 91 recommendation algorithms across general, sequential, context-aware, and knowledge-based categories.

Quantitative Finance

- AI4Finance-Foundation/FinGPT: FinGPT is a cost-effective, adaptable financial large language model for quick updates and fine-tuning, enhancing accessibility compared to BloombergGPT.
- google/tf-quant-finance: This library leverages TensorFlow's hardware acceleration and automatic differentiation for high-performance mathematical methods, mid-level functions, and pricing models support.
- goldmansachs/gs-quant: GS Quant, a Python toolkit by Goldman Sachs, aids in developing quantitative trading strategies and risk management solutions with robust market experience.
- domokane/FinancePy: A Python finance library specializing in pricing and managing financial derivatives across fixed-income, equity, FX, and credit markets.
- romanmichaelpaolucci/Q-Fin: QFin is evolving with enhanced object-oriented principles, deprecating old modules like PDEs/SDEs, introducing 'stochastics' for model calibration and option pricing.
- avhz/RustQuant: This Rust library for quantitative finance covers diverse modules from autodiff and data handling to instruments pricing and stochastic processes.

Responsible AI

- microsoft/responsible-ai-toolbox: Responsible AI Toolbox offers interfaces and libraries for model and data exploration, enabling developers to monitor and improve AI responsibly.
- Giskard-AI/giskard: Giskard, an open-source Python library, detects performance, bias, and security issues in AI applications, spanning LLMs to traditional ML models.
- fairlearn/fairlearn: Fairlearn, a Python package, helps developers assess and mitigate fairness issues in AI systems with algorithms and assessment metrics provided.
- Azure/PyRIT: PyRIT is an open-access Python tool for generative AI, aiding security professionals and ML engineers in identifying system risks.
- ModelOriented/DALEX: DALEX enhances model transparency to prevent failure through its explainability tools, supporting understanding and trust in complex AI systems.
- JohnSnowLabs/langtest: LangTest simplifies testing of AI models with over 60 tests in one line, covering robustness, bias, fairness, and accuracy across various NLP frameworks.

Explainable AI (XAI)

- SeldonIO/alibi: Alibi is a Python library focused on machine learning model inspection, offering diverse explanation methods for classification and regression models.
- Trusted-AI/AIX360: AI Explainability 360 offers an open-source Python toolkit for detailed model interpretability across various data types, supporting diverse explanation methods.
- dssg/aequitas: Aequitas is an open-source toolkit for bias auditing and Fair ML, aiding data scientists and researchers in assessing and correcting model biases.
- albermax/innvestigate: iNNvestigate is a Python library providing a unified interface for various methods to analyze neural networks' predictions and understand their internal workings.
- mindsdb/lightwood: Lightwood is an AutoML framework simplifying machine learning pipelines with JSON-AI syntax, allowing customization and automation across diverse data types.

Anomaly Detection

- SeldonIO/alibi-detect: Alibi Detect is a Python library for detecting outliers, adversarial attacks, and drift in tabular, text, image, and time series data.
- datamllab/tods: TODS automates outlier detection in multivariate time-series data with modules for data processing, feature analysis, and diverse detection algorithms.
- pygod-team/pygod: PyGOD is a Python library using PyTorch Geometric for graph outlier detection, offering 10+ algorithms and easy integration with PyOD.
- Jingkang50/OpenOOD: This repository replicates methods from the Generalized Out-of-Distribution Detection Framework for fair comparison across anomaly, novelty, and out-of-distribution detection methods.
- yzhao062/pyod: PyOD is a Python library for detecting anomalies in multivariate data, offering diverse algorithms for various project scales and datasets.
- chaos-genius/chaos_genius: Chaos Genius is an open-source ML-powered analytics engine for outlier detection and root cause analysis at scale.

Supply Chain Analytics

- guacsec/guac: GUAC creates a high fidelity graph database for software security, facilitating organizational outcomes like audit, policy, and risk management.
- owasp-dep-scan/blint: BLint is a Binary Linter using lief to verify executable security and capabilities, now supporting SBOM generation for compatible binaries.
- samirsaci/picking-route: This repository focuses on improving warehouse productivity through Python-based tools and methodologies, particularly addressing order batching and optimizing picking routes using the Single Picker Routing Problem (SPRP).
- ragamarkely/scanalytics: Scanalytics automates Supply Chain Analytics & Design tasks in Python, streamlining analyses and reducing manual spreadsheet work for assignments.
- aitechtools/SunFlow: SunFlow optimizes supply chain design with comprehensive modeling of materials, components, suppliers, manufacturers, and customers, integrating costs, capacities, and constraints.
- CIOL-SUST/SupplyGraph: This repository introduces a benchmark dataset for applying Graph Neural Networks (GNNs) to supply chain networks, enabling research in optimization and prediction.

Network Optimization

- ray-project/ray: Ray is a scalable framework with a distributed runtime and AI libraries designed to accelerate AI and Python applications.
- svg/svgo: SVGO optimizes SVG files by removing redundant metadata, comments, and hidden elements to improve file efficiency and rendering performance.
- zeux/meshoptimizer: meshoptimizer is a C/C++ library optimizing GPU rendering by reducing mesh complexity and storage overhead, compatible with Rust via the meshopt crate.
- cvxpy/cvxpy: CVXPY is a Python-based modeling language designed for convex optimization problems, providing a natural expression format aligned with mathematical conventions.
- guofei9987/scikit-opt: The repository provides Python implementations of various swarm intelligence algorithms such as Genetic Algorithm, Particle Swarm Optimization, and others for optimization tasks.
Speech Processing

- espnet/espnet: ESPnet is a detailed speech processing toolkit using PyTorch, covering recognition, synthesis, translation, enhancement, diarization, and understanding tasks.
- mozilla/DeepSpeech: DeepSpeech is an open-source Speech-To-Text engine based on Baidu's research, implemented using TensorFlow for accessibility and performance.
- microsoft/SpeechT5: The repository proposes SpeechT5, adapting T5's text-to-text approach for self-supervised speech and text representation learning using shared encoders and modality-specific nets.
- sloria/TextBlob: Python library simplifying NLP tasks like POS tagging, sentiment analysis, and classification with a straightforward API for textual data.
- pytorch/audio: Torchaudio integrates PyTorch with audio processing, emphasizing GPU acceleration, trainable features via autograd, and maintaining a consistent tensor-based style.

Graph Data Science

- neo4j/graph-data-science: The Neo4j Graph Data Science (GDS) library offers graph algorithms, transformations, and ML pipelines, accessible via Cypher within Neo4j.
- cncf/landscape-graph: This repository explores open source project dynamics, evolution, and collaboration using a Graph Data Model for insightful community analysis.
- BlueBrain/nexus: Blue Brain Nexus organizes and enhances data with a Knowledge Graph ecosystem, featuring various products, libraries, and tools for comprehensive use.
- lynxkite/lynxkite: LynxKite is a robust graph data science platform with a user-friendly interface and powerful Python API for large datasets.
- dgraph-io/dgraph: Dgraph is a scalable GraphQL database optimized for performance, offering ACID transactions and distributed architecture for real-time queries.
- arangodb/arangodb: ArangoDB is a versatile multi-model database supporting documents, graphs, and key-values, empowering high-performance applications with SQL-like queries and JavaScript extensions.

ETL/ELT (Extract, Transform, Load / Extract, Load, Transform)

- redpanda-data/connect: Redpanda Connect is a robust stream processor for seamless data integration, featuring a powerful mapping language and easy deployment options.
- turbot/steampipe: Steampipe simplifies data access from APIs with CLI, Postgres FDWs, SQLite extensions, export tools, and cloud-based Turbot Pipes.
- risingwavelabs/risingwave: RisingWave is a cost-efficient streaming database compatible with Postgres, designed for real-time event streaming data processing and analysis.
- apache/dolphinscheduler: Apache DolphinScheduler is a modern data orchestration platform with low-code workflow creation, robust task management, and cloud-native capabilities.
- rudderlabs/rudder-server: RudderStack is a privacy-focused, Segment-alternative platform in Golang and React. It simplifies data collection and integrates with warehouses and tools for enriched customer data pipelines.

We hope this extensive collection of tools and techniques proves to be a valuable asset in your daily data practice. May it help you achieve smoother workflows and better outcomes!
Mastering Semi-Structured Data in Snowflake

Serge Gershkovich
27 Jun 2024
7 min read
This article is an excerpt from the book Data Modeling with Snowflake, by Serge Gershkovich. Discover how Snowflake's unique objects and features can be used to leverage universal modeling techniques through real-world examples and SQL recipes.

Introduction

In the era of big data, the ability to efficiently manage and analyze semi-structured data is crucial for businesses. Snowflake, a leading cloud-based data platform, offers robust features to handle semi-structured data formats like JSON, Avro, and Parquet. This article explores the benefits of using the VARIANT data type in Snowflake and provides a hands-on guide to managing semi-structured data.

The Benefits of Semi-Structured Data in Snowflake

Semi-structured data formats are popular due to their flexibility when working with dynamically varying information. Unlike relational schemas, where a precise entity structure must be predefined, semi-structured data can adapt to include or omit attributes as needed, as long as they are properly nested within corresponding parent objects. For example, consider the contact list on your phone. It contains a list of people and their contact details but does not capture those details uniformly. Some contacts may have multiple phone numbers, while others have only one. Some entries might include an email address and street address, while others have just a number and a vague description.

To handle this type of data, Snowflake uses the VARIANT data type, which allows semi-structured data to be stored as a column in a relational table. Snowflake optimizes how VARIANT data is stored internally, ensuring better compression and faster access. Semi-structured data can sit alongside relational data in the same table, and users can access it using basic extensions to standard SQL, achieving similar performance.

Another compelling reason to use the VARIANT data type is its adaptability to change. If columns are added or removed from semi-structured data, there is no need to modify ELT (extract, load, and transform) pipelines. The VARIANT data type does not require schema changes, and read operations will not fail for an attribute that no longer exists.

Getting Hands-On with Semi-Structured Data

Let's delve into a practical example of working with semi-structured data in Snowflake. This example uses JSON data representing information about pirates, such as details about the crew, weapons, and their ship. All this information is stored in a single VARIANT data type column. In relational data, a row represents a single entity; in semi-structured data, a row can represent an entire file containing multiple entities.

Creating a Table for Semi-Structured Data

Here is a sample SQL script to create a table with semi-structured data:

```
CREATE TABLE pirates_data (
    id NUMBER AUTOINCREMENT PRIMARY KEY,
    load_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    data VARIANT
);
```

In this example, the `AUTOINCREMENT` keyword generates a unique ID for each record inserted, and the `VARIANT` column stores the semi-structured JSON data.

Loading Semi-Structured Data

To load semi-structured data into Snowflake, you can use the `COPY INTO` command. Here's an example of how to load JSON data from an external stage into the `pirates_data` table; the target column is listed explicitly so that `id` and `load_time` take their default values:

```
COPY INTO pirates_data (data)
FROM (SELECT $1 FROM @my_stage/pirates_data.json)
FILE_FORMAT = (TYPE = 'JSON');
```

Querying Semi-Structured Data

Once the data is loaded, you can query it using standard SQL.
For instance, to extract specific attributes from the JSON data, you can use dot notation:

```
SELECT
    data:id::NUMBER AS pirate_id,
    data:crew AS crew,
    data:weapons AS weapons
FROM pirates_data;
```

This query extracts the `id`, `crew`, and `weapons` fields from the JSON data stored in the `data` column.

Converting Semi-Structured Data into Relational Data

Although semi-structured data offers flexibility, converting it into a relational format can provide better performance for certain queries. Snowflake allows you to transform VARIANT data into relational columns using the `FLATTEN` function. Here's an example of how to flatten a JSON array into a relational table:

```
SELECT
    value:id::NUMBER AS pirate_id,
    value:name::STRING AS name,
    value:rank::STRING AS rank
FROM pirates_data,
LATERAL FLATTEN(input => data:crew);
```

This query converts the `crew` array from the JSON data into individual rows in a relational format, making it easier to query and analyze.

Schema-on-Read vs. Schema-on-Write

One of the main advantages of using the VARIANT data type in Snowflake is the flexibility of schema-on-read. This approach allows you to ingest data without a predefined schema and then define the schema at the time of reading the data. This contrasts with the traditional schema-on-write approach, where the schema must be defined before data ingestion.

Benefits of Schema-on-Read

Flexibility: You can ingest data without worrying about its structure, which is particularly useful for unstructured or semi-structured data sources.
Adaptability: Schema changes do not require re-ingestion of data, as the schema is applied at read time.
Speed: Data can be loaded more quickly, as there is no need to enforce a schema during the ingestion process.

Example: Using Schema-on-Read with VARIANT Data

Here's an example demonstrating schema-on-read with semi-structured data in Snowflake:

```
SELECT
    data:id::NUMBER AS pirate_id,
    data:ship.name::STRING AS ship_name,
    data:ship.type::STRING AS ship_type
FROM pirates_data;
```

In this query, the schema is defined at read time, allowing you to extract specific attributes from the nested JSON data.

Handling Nested and Repeated Data

Snowflake's support for semi-structured data also extends to handling nested and repeated data structures. The FLATTEN function is particularly useful for working with such data, enabling you to transform nested arrays into a more manageable relational format.

Example: Flattening Nested Data

Consider a JSON structure where each pirate has a nested array of previous voyages. To flatten this nested data, you can use the following query:

```
SELECT
    data:id::NUMBER AS pirate_id,
    value:date::DATE AS voyage_date,
    value:destination::STRING AS voyage_destination
FROM pirates_data,
LATERAL FLATTEN(input => data:previous_voyages);
```

This query extracts the nested `previous_voyages` array and converts it into individual rows in a relational format.

Performance Considerations

When working with semi-structured data in Snowflake, it's important to consider performance implications.
Performance Considerations

When working with semi-structured data in Snowflake, it's important to consider the performance implications. While the VARIANT data type offers flexibility, it can also introduce overhead if not managed properly.

Tips for Optimizing Performance

- Use caching: Take advantage of Snowflake's caching mechanisms to reduce query times for frequently accessed data.
- Optimize queries: Write efficient SQL, avoiding unnecessary complexity and ensuring that only the required data is processed.
- Monitor usage: Regularly monitor your Snowflake usage and performance metrics to identify and address potential bottlenecks.

Conclusion

Handling semi-structured data in Snowflake with the VARIANT data type provides considerable flexibility and performance benefits. Whether you are dealing with dynamically changing schemas or integrating semi-structured data with relational data, Snowflake's capabilities can significantly enhance your data management and analytics workflows. By applying the techniques outlined in this article, you can efficiently model and transform semi-structured data, unlocking new insights and value for your organization.

For more detailed guidance and advanced techniques, refer to the book "Data Modeling with Snowflake," which provides comprehensive insights into modern data modeling practices and Snowflake's powerful features.

Author Bio

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.

article-image-prompt-engineering-with-azure-prompt-flow
Shankar Narayanan SGS
06 May 2024
10 min read
Save for later

Prompt Engineering with Azure Prompt Flow

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

The ability to generate relevant and creative prompts is an essential part of any natural language processing system, and it matters more than ever as the artificial intelligence landscape evolves. Microsoft's Azure Prompt flow addresses this need by empowering data scientists and developers to engineer prompts effectively. Here, let us explore the nuances of Azure Prompt flow while delving into the realm of prompt engineering.

Significance of Prompt Engineering

Prompt engineering is the practice of constructing prompts that guide machine learning models effectively. It involves formulating contextually relevant, specific questions or statements that elicit the desired responses from AI models. Azure Prompt flow is a sophisticated tool from Microsoft Azure that simplifies this intricate process, enabling developers to create prompts that produce meaningful and accurate outcomes.

Getting started with Azure prompt flow

Before exploring practical applications, it is necessary to understand a few essential components of Azure Prompt flow. At its core, Prompt flow works with large language models such as GPT-3.5 to generate relevant responses to prompts, and its integration with Azure provides a secure and seamless environment for prompt engineering.

Let us consider a practical example of a chatbot application.

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Set up the Azure Text Analytics client
key = "YOUR_AZURE_TEXT_ANALYTICS_KEY"
endpoint = "YOUR_AZURE_TEXT_ANALYTICS_ENDPOINT"
credential = AzureKeyCredential(key)
text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=credential)

# User input
user_input = "Tell me a joke."

# Build a prompt that frames the user's request for the chatbot
prompt = f"User: {user_input}\nChatbot:"

# Analyze the sentiment of the prompt (analyze_sentiment expects a list of documents)
response = text_analytics_client.analyze_sentiment([prompt])

# Output the sentiment label as the chatbot's response
print(f"Chatbot: {response[0].sentiment}")
```

In this example, the user's request is wrapped into a prompt, and the Text Analytics client returns a sentiment label that stands in for the chatbot's response. Here is the output:

Chatbot: positive
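The sample above uses the sentiment label as a stand-in for a real chatbot reply. If you want a GPT-3.5 model to generate the answer itself, a minimal sketch with the Azure OpenAI client could look like the following; the endpoint, API version, and deployment name are placeholders, not values from this article.

```python
from openai import AzureOpenAI

# Placeholders: supply your own key, endpoint, API version, and deployment name.
client = AzureOpenAI(
    api_key="YOUR_AZURE_OPENAI_KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com",
)

user_input = "Tell me a joke."

# Ask the deployed gpt-35-turbo model to produce the chatbot's reply.
response = client.chat.completions.create(
    model="YOUR_GPT35_DEPLOYMENT_NAME",
    messages=[
        {"role": "system", "content": "You are a friendly chatbot."},
        {"role": "user", "content": user_input},
    ],
)

print(f"Chatbot: {response.choices[0].message.content}")
```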
Tuning prompts using Azure Promptflow

Crafting good prompts can be a challenging task. With the concept of variants, you can test how the model behaves under different prompt wordings and conditions.

Example: If you want a chatbot built with Azure Promptflow to respond creatively to queries about movies, a tuned prompt might look like this.

Prompt tuning:

User: "Tell me about your favorite movie."
Chatbot: "Certainly! One of my favorite movies is 'Inception.' Directed by Christopher Nolan, it's a mind-bending sci-fi thriller that explores the depths of the human mind."

Python code:

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Set up the Azure Text Analytics client
key = "YOUR_AZURE_TEXT_ANALYTICS_KEY"
endpoint = "YOUR_AZURE_TEXT_ANALYTICS_ENDPOINT"
credential = AzureKeyCredential(key)
text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=credential)

# User input
user_input = "Tell me about your favorite movie."

# Generate a creative prompt tailored to the movie question
prompt = (
    f"User: {user_input}\n"
    "Chatbot: Certainly! One of my favorite movies is 'Inception.' Directed by Christopher Nolan, "
    "it's a mind-bending sci-fi thriller that explores the depths of the human mind."
)

# Assess the sentiment of the generated prompt (analyze_sentiment expects a list of documents)
response = text_analytics_client.analyze_sentiment([prompt])

# Output the sentiment label
print(f"Chatbot: {response[0].sentiment}")
```

In this example, Azure Promptflow is used to create prompts tailored to specific user queries, providing creative and contextually relevant responses. The analyze_sentiment function from the Azure Text Analytics client is used to assess the sentiment of the generated prompt. Replace "YOUR_AZURE_TEXT_ANALYTICS_KEY" and "YOUR_AZURE_TEXT_ANALYTICS_ENDPOINT" with your actual Azure Text Analytics API key and endpoint.

Here are a few examples from a flow that classifies web pages into categories based on their URL and text content:

URL: https://music.apple.com/us/app/apple-music/id1108187390
Text Content: Apple Music is a comprehensive music streaming app that boasts an extensive library of songs, albums, and playlists. Users can enjoy curated playlists, radio shows, and exclusive content from their favorite artists. Apple Music allows offline downloads and offers a family plan for multiple users. It also integrates with the user's existing music library, making it seamless to access purchased and uploaded music.
OUTPUT: {"category": "App", "evidence": "Both"}

URL: https://www.youtube.com/user/premierleague
Text Content: Premier League Pass, in collaboration with the English Premier League, delivers live football matches, highlights, and exclusive behind-the-scenes content on YouTube. Football aficionados can stay updated with their favorite teams and players through this official channel. Subscribing to Premier League Pass on YouTube ensures fans never miss a moment from the most exciting football league in the world.
OUTPUT: {"category": "Channel", "evidence": "URL"}

URL: https://arxiv.org/abs/2305.06858
Text Content: This research paper explores the realm of image captioning, where advanced algorithms generate descriptive captions for images. The study delves into techniques that combine computer vision and natural language processing to achieve accurate and contextually relevant image captions. The paper discusses various models, evaluates their performance, and presents findings that contribute to the field of image captioning technology.
OUTPUT: {"category": "Academic", "evidence": "Text content"}

URL: https://exampleconstructionsite.com/
Text Content: This website is currently under construction. Please check back later for updates and exciting content.
OUTPUT: {"category": "None", "evidence": "None"}

For a given URL: {{url}}, and text content: {{text_content}}:
Classified Category: Travel
Evidence: The text contains information about popular tourist destinations, travel itineraries, and hotel recommendations.

After summarizing, the final Prompt flow ends up with two variants for the summarize_text_content node, as sketched below.
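As a rough illustration only, the two variants of a summarize_text_content step are shown here as plain Python string templates rather than in the actual Prompt flow definition format; the variant names and wording are assumptions, not taken from the article.

```python
# Rough illustration: two hypothetical prompt variants for a summarize_text_content step.
SUMMARIZE_VARIANTS = {
    "variant_0": (
        "Summarize the following web page text in one sentence, focusing on what the page offers:\n"
        "{text_content}"
    ),
    "variant_1": (
        "You are preparing input for a web-page classifier. Summarize the text below in two "
        "sentences and mention which category it suggests (App, Channel, Academic, Travel, or None):\n"
        "{text_content}"
    ),
}


def render_variants(text_content: str) -> dict:
    """Render every variant so the resulting prompts can be compared side by side."""
    return {
        name: template.format(text_content=text_content)
        for name, template in SUMMARIZE_VARIANTS.items()
    }


if __name__ == "__main__":
    sample = (
        "Apple Music is a comprehensive music streaming app with curated playlists, "
        "radio shows, and offline downloads."
    )
    for name, prompt in render_variants(sample).items():
        print(f"--- {name} ---\n{prompt}\n")
```

Each variant would then be sent to the model and its outputs evaluated against the classification examples above to decide which wording to keep.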
Benefits of using Azure ML prompt flow

Azure ML Prompt flow helps users move from ideation to experimentation and, ultimately, to production-ready LLM-based applications.

Prompt engineering agility

Azure Prompt flow offers a visual representation of the flow's structure, allowing users to understand and navigate their projects, and it provides a notebook-like coding experience for debugging and efficient flow development. Users can also create and compare multiple prompt variants, which facilitates an iterative refinement process.

Enterprise readiness

Prompt flow streamlines the entire prompt engineering process and leverages robust enterprise-readiness solutions, offering a secure, reliable, and scalable foundation for experimentation and development. It also supports team collaboration, so multiple users can work together, share knowledge, and maintain version control.

Application development

The well-defined process of Azure Prompt flow facilitates the seamless development of AI applications. By leveraging it, users can progress effectively through the stages of developing, testing, tuning, and deploying flows, resulting in fully fledged AI applications. Following this methodical and structured approach empowers them to develop, fine-tune, test rigorously, and deploy with confidence.

Real-world applications of Azure Promptflow

Content creation

One application of Azure Prompt flow lies in content creation. Content creators can generate outlines, creative ideas, and even entire paragraphs by crafting prompts tailored to specific topics. This streamlines the content creation process while making it more inspiring and efficient.

Language translation

Developers are leveraging Azure Prompt flow to build language translation applications. By constructing prompts in the source language, the system can translate inputs and return accurate outputs in the desired language, helping break down language barriers in a globalized world.

Customer support chatbots

By integrating Azure Prompt flow within customer support chatbots, one can enhance the user experience. Prompt engineering techniques help ensure that queries are accurately understood, resulting in relevant and precise responses, which significantly reduces response time while improving customer satisfaction.

Azure prompt flow simplifies prompt engineering

Prompt engineering is an iterative and challenging process. Azure Prompt flow simplifies the development, comparison, and evaluation of prompts, making it easier to find the best prompt for a use case. For example, a chatbot that utilizes large language models such as GPT-3.5 can help companies provide personalized product recommendations based on customer input. Azure Prompt flow allows users to create, evaluate, and deploy such solutions from their machine learning models, speeding up the whole process of developing and deploying AI applications. It also allows users to create connections to large language models.
Such models include GPT-3.5, hosted through Azure OpenAI, and they can be used for different purposes, including chat completion or creating embeddings.

Designing and modifying prompts

Designing and modifying prompts for effective use is crucial, especially when working with large language models. Azure Prompt flow enables users to create, test, and deploy multiple prompt versions, for example for recommendation scenarios. To use a large language model effectively, particularly when dealing with multiple prompts, it is important to design and refine them carefully for better results. Once the prompts are created, they should be evaluated and tested across multiple scenarios. For instance, if you are creating prompts for a product company, you must define how the prompts and their flow will handle user queries, and consider where custom code and end-to-end deployment fit in, using Azure's Prompt flow feature.

Conclusion

With powerful prompt engineering capabilities, Azure Prompt flow enables developers to construct contextually relevant prompts, enhancing the efficiency and accuracy of AI applications across various domains. The potential of prompt engineering makes the future of AI development promising, with Azure AI helping to lead the way.

Author Bio

Shankar Narayanan (aka Shanky) has worked on numerous different cloud and emerging technologies like Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps to name a few. He has led the architecture design and implementation for many Enterprise customers and helped enable them to break the barrier and take the first step towards a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and Snowflake Data Cloud. Shanky likes to contribute back to the community. He contributes to open source, is a frequently sought-after speaker, and has delivered numerous talks on Microsoft Technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and a SAP Community Topic Leader by SAP.