
How-To Tutorials

7017 Articles

Fabric’s Code-First AutoML and Hyperparameter Tuning, Google Cloud Cortex Framework, Snowflake’s Data Metric Functions, Qlik's AI Accelerator

Merlyn Shelley
29 Apr 2024
12 min read
Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Welcome to BI-Pro #54: Your Premier Destination for Data and Business Intelligence Insights! 🌟 In this edition, we dive deep into the cutting-edge solutions of business intelligence, data modeling, and advanced analytics. Prepare to explore an array of transformative topics and industry insights that will redefine how you interact with technology and data. 🧩 Highlights of This Issue: Python Practice Platforms: The top 7 platforms where you can sharpen your Python skills. Innovative Experiments: Dive into hands-on experiments with MLFlow and Microsoft Fabric to enhance your project’s efficiency. SAP Expertise: Master the complex data models of SAP and leverage them for optimal performance. AI-Powered Business Management: Learn how to integrate AI to streamline and enhance business management functions. Snowflake’s Surveillance: Monitor your data pipelines effectively using Snowflake’s Data Metric Functions. 🧬 Stay Informed with Industry Highlights: Power BI: Learn about the significant deprecation of AutoML in Power BI using Dataflows V1. Microsoft Fabric: Get the scoop on the new code-first AutoML and hyperparameter tuning, now available in public preview. AWS BI: Discover how to build SAP Golden AMIs with EC2 Image Builder and Ansible and explore the transformative impact of Amazon Q on business experiences. Google Cloud Data: Catch up with the latest updates from the Google Cloud Cortex Framework. Tableau: Uncover how Einstein Copilot for Tableau is building the next generation of AI-driven analytics. From the Experts at Packt Community: Gain insights from industry leaders on the fundamentals of Analytics Engineering. 🧮 What’s the Latest from the BI Community? Explore real-time AI capabilities with Datorios’ new observability tool. Learn about Snowflake's launch of Arctic, an enterprise-grade LLM. Discover how Qlik's AI Accelerator is integrating generative AI to deliver customer outcomes. Witness the future of AI with Avant Technologies’ new supercomputing advancements. Join us as we unpack these topics to keep you at the forefront of the data and BI world. Stay curious, stay informed! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.We appreciate your input and hope you enjoy the book!Share your thoughts and opinions here! Cheers,Merlyn ShelleyEditor-in-Chief, PacktPackt BI-Pro is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.Upgrade to paidSign Up | Advertise | Archives🚀 GitHub's Most Sought-After Repos🧩 pixiedust/pixiedust: PixieDust is an open-source library enhancing Jupyter notebooks, improving data work experience, particularly for cloud-hosted notebooks without configuration access. 🧩 plotly/plotly.py: plotly.py is an interactive, open-source graphing library for Python, offering over 30 chart types, including scientific, 3D, statistical, and financial charts. 
🧩 AykutSarac/jsoncrack.com: JSON Crack is a free, open-source data visualization app for JSON, YAML, XML, CSV, etc., offering interactive graphs for easy data exploration and analysis. 🧩 apexcharts/apexcharts.js: ApexCharts is a JavaScript charting library with a simple API, 100+ samples, and over a dozen chart types for beautiful, responsive visualizations in apps and dashboards. 🧩 antvis/G2: G2 is a visualization library inspired by "The Grammar of Graphics," offering an introduction, examples, tutorials, and API reference for learning and using its core concepts. 🧩 visgl/deck.gl: deck.gl simplifies high-performance, WebGL2/WebGPU-based visualization of large datasets. It offers pre-built layers for easy setup or customizable architecture for tailored needs. Email Forwarded? Join BI-Pro Here!🔮 Revolutionizing Analytics: New BI Tools🧬 7 Best Platforms to Practice Python: The article lists seven platforms—Practice Python, Edabit, CodeWars, Exercism, PYnative, LeetCode, and HackerRank—that offer various levels of programming challenges for learning and practicing Python, particularly for coding interviews and skill improvement. 🧬 Experimenting with MLFlow and Microsoft Fabric: The blog discusses the importance of systematic experimentation in machine learning (ML) to improve model performance, highlighting the use of MLFlow within Fabric for managing ML experiments. It covers setting up experiments, running them, logging results, and analyzing them, emphasizing the importance of tracking configurations and outcomes for iterative improvement in ML models. 🧬 Mastering SAP’s data models: The article discusses challenges faced in understanding SAP data models for analytics, focusing on integrating procurement data. It explains SAP's ERP software, data architecture basics, table types (master vs. transaction), and data mapping for procurement tables. 🧬 Building an AI-Powered Business Manager: The post explores the concept of consolidating business management into a single, chat-based platform powered by Large Language Models (LLMs). It discusses the advantages for small businesses, outlines project structure, sets up the database, and updates the Tool class to handle SQLModel instances. 🧬 Monitor Data Pipelines Using Snowflake’s Data Metric Functions: The author emphasizes the importance of data quality in gaining trust with stakeholders and focuses on using Google's Site Reliability Engineering principles to measure the health of data systems. It discusses defining service level indicators and objectives for data quality dimensions and provides a technical implementation example in Snowflake. ⚡Stay Informed with Industry HighlightsPower BI🧮 Deprecation of AutoML in Power BI using Dataflows V1: The update announces the deprecation of Power BI Automated Machine Learning (AutoML) models for Dataflows V1 in all regions as of the third week of April. Customers are encouraged to migrate to the AutoML solution based on Synapse Data Science in Microsoft Fabric, offering a more customizable AutoML experience with advanced tools and features. Microsoft Fabric🧮 Introducing Code-First AutoML and Hyperparameter Tuning: Now in Public Preview for Fabric Data Science: The update introduces code-first automated machine learning (AutoML) and hyperparameter tuning in Public Preview for Fabric Data Science. Users can access both AutoML and Tune capabilities seamlessly within the Fabric 1.2 runtime, enhancing machine learning model optimization and accessibility. 
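For readers who want to see what such a code-first AutoML run can look like in practice, here is a minimal, hedged sketch using the open-source FLAML library together with MLflow experiment tracking (the approach the MLFlow-and-Fabric piece above describes). The dataset, experiment name, metric, and 60-second budget are illustrative assumptions for this newsletter, not details from the Fabric announcement.

```python
# Minimal sketch: code-first AutoML with FLAML, tracked in MLflow.
# Dataset, experiment name, and time budget are illustrative only.
import mlflow
from flaml import AutoML
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("automl-demo")  # hypothetical experiment name
with mlflow.start_run():
    automl = AutoML()
    automl.fit(
        X_train=X_train,
        y_train=y_train,
        task="classification",
        time_budget=60,        # seconds; adjust to your capacity
        metric="roc_auc",
    )
    # Log what the search found so runs can be compared later.
    mlflow.log_param("best_estimator", automl.best_estimator)
    mlflow.log_metric("best_val_loss", automl.best_loss)
    proba = automl.predict_proba(X_test)[:, 1]
    mlflow.log_metric("test_roc_auc", roc_auc_score(y_test, proba))
```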
🧮 Fabric Change the Game: Embracing Azure Cosmos DB for NoSQL. The post explores setting up Azure Cosmos DB for NoSQL and leveraging Vector Search capabilities of AI Search Services through Microsoft Fabric's Lakehouse features. It also discusses integrating Cosmos DB Mirror and using Python coding facilitated through Lakehouse, highlighting Fabric's integration capabilities for search or data mirroring. 🧮 Microsoft Fabric April 2024 Update: The April 2024 update brings various enhancements and previews to Microsoft Fabric, including new visuals like the 100% Stacked Area Chart, improvements to reporting, data connectivity, administration features, analytics, real-time analytics, data factory, and data pipelines. Additionally, the update includes the availability of Exam DP-600 for Fabric Analytics Engineer certification and free learning sessions. AWS BI  🧮 Build SAP Golden AMIs with EC2 Image Builder and Ansible: This blog post guides users on building a reusable Amazon Machine Image (AMI) for deploying Amazon Elastic Compute Cloud (EC2) instances for SAP installations. It covers using Terraform and Ansible to automate the process and provides sample code. 🧮 Transforming Business Experiences: The Impact of Amazon Q and Generative BI for AWS Partners. This post highlights how advances in AI, particularly Amazon Q and generative BI, are transforming business operations. It showcases how AWS partners like ZS Associates, Tiger Analytics, and Compass UOL are leveraging these innovations for industry-specific solutions. Google Cloud Data 🧮 What’s new with Google Cloud Cortex Framework? The article discusses Google Cloud Cortex Framework, emphasizing its role in unifying enterprise data for AI-driven insights. It highlights new solutions for marketing, sustainability management, and finance, showcasing how Cortex Framework accelerates innovation, enhances decision-making, and drives business efficiency in the AI era. Tableau🧮 Einstein Copilot for Tableau: Building the Next Generation of AI-Driven Analytics. The post delves into the development of Einstein Copilot for Tableau, an AI-driven tool revolutionizing data analysis. It highlights the challenges and solutions in building its infrastructure, improving accuracy and efficiency, and enhancing AI and core capabilities through collaboration and continuous improvement. ✨ Expert Insights from Packt CommunityFundamentals of Analytics Engineering - By Dumky De Wilde, Fanny Kassapian, Jovan Gligorevic and 4 more The role of dbt in analytics engineering dbt emerged as a solution to the challenges relating to data transformation faced in data analysis. Initially crafted as an open-source Python package, dbt aimed to bring software engineering best practices to the world of analytics. Over time, dbt matured beyond just a package, becoming a versatile cloud service. While the open-source package remains available and actively supported, dbt now offers a cloud-based version, packed with features such as an integrated development environment (IDE), scheduling tools, data lineage trackers, and hosted documentation. This is especially valuable for analysts who might not have a deep software engineering background. For more information on dbt’s history, read https://www.getdbt.com/blog/what-exactly-is-dbt. We will use dbt Cloud, which offers a free tier for a single developer: that’s you! You can learn more about its pricing here: https://www.getdbt.com/pricing. dbt seamlessly integrates into the ELT architecture. 
It does not store or process data but serves as a bridge between analysts and the data warehouse. dbt’s position in a data stack as an intermediary in the transformation layer. This is how it works: analysts draft SQL queries, enhanced with dbt’s unique capabilities. dbt then translates this specialized SQL into the native SQL of the data warehouse and dispatches it for execution. All the transformed data and results remain within the data warehouse, making dbt a lightweight yet powerful tool in the analytics toolkit. Because of dbt’s pivotal position in analytics engineering, we will spend more time discussing its features and zooming in on best practices. First, we will set up dbt for our use case. Setting up dbt Cloud The following steps are required for dbt: Creating a dbt Cloud account. Setting up a connection from dbt Cloud to BigQuery. Testing the connection by querying the data using dbt Cloud. Follow the step-by-step instructions here: https://github.com/PacktPublishing/Fundamentals-of-Analytics-Engineering/blob/main/chapter_8/guides/setting_up_dbt_cloud.md. Now, let’s focus on the various data layers in dbt. Data layers in dbt It is a widespread practice to separate the data we use for analytics into layers. This helps data practitioners communicate the distinct parts of the data transformation process. Broadly speaking, the process will fall into three layers in dbt, Raw, Preparation and Business.  Let’s take a closer look: Raw layer: The source data is stored in the form it arrives in. Whenever you receive data, it should be stored as-is so that you have a backup in case something goes wrong during the transformations. When you copied the Excel sheets using Airbyte, they became part of the raw layer inside BigQuery. Preparation layer: In the second layer, the raw data is cleaned, deduplicated, and transformed to conform to naming conventions and other rules. For our data, this could mean renaming fields for readability and standardizing sales figures from cents to euros. Business layer: In the final layer, business rules are applied to the prepared data, and different data is joined and modeled into datasets that are ready for consumption by BI tools and stakeholders. In our case, we might add a business rule to disregard negative sales amounts when summing the total stroopwafels sold, as these are likely an error. The resulting data can then be served to the BI tool for dashboarding. Discover more insights from Fundamentals of Analytics Engineering - By Dumky De Wilde, Fanny Kassapian, Jovan Gligorevic and 4 more. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!    Read Here💡 What's the Latest Scoop from the BI Community? 🧠 Datorios unleashes real-time AI with the first observability tool for streaming data: Datorios introduces the first observability tool for Apache Flink, offering deep insights into streaming data processing. It enables faster AI innovation and thorough auditability, providing developers with event visualization, event search, state monitoring, window analysis, and more. Datorios is now publicly available for free. 🧠 Snowflake Launches Arctic: The Most Open, Enterprise-Grade Large Language Model: Snowflake introduces Snowflake Arctic, an open, enterprise-grade large language model (LLM) with a Mixture-of-Experts architecture, optimized for complex enterprise workloads. 
Arctic sets new openness standards for AI technology, offering weights under an Apache 2.0 license and enhancing AI innovation. 🧠 Introducing Qlik's AI Accelerator - Delivering Tangible Customer Outcomes in Generative AI Integration: Qlik is at the forefront of integrating generative AI, particularly Large Language Models (LLMs), into data analysis and decision-making. They address key challenges like data privacy, technical complexity, and cost, offering seamless integration of popular LLMs and an AI Accelerator program to quickly prove the benefits of AI integration with minimal barriers to entry. 🧠 Avant Technologies Launches Advanced AI Supercomputing: Avant Technologies, an AI company, introduces a supercomputing network and licensable dataset with Wired4Tech, aiming to accelerate AI adoption. The offerings include a versatile AI dataset, dynamic resource scaling, accelerated AI processing, robust security measures, and seamless integration, designed to empower developers and drive innovation across industries. See you next time!


ChatGPT for Coding

Jakov Semenski
25 Apr 2024
6 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

ChatGPT's coding style is terrible: verbose, complex, and outdated. Let's change that.

ChatGPT promised to be our coding savior, but sometimes it feels more like a blast from the past. Remember those early 2000s coding books? Yep, it's giving those vibes. It's like having a sports car with a tractor engine: great potential, but the performance? Not quite there. Imagine harnessing the power of ChatGPT but with the finesse of a master coder. Ready for the upgrade? Here are 12 pro prompts that will get you the right results.

Tip #1: Specificity is king

As soon as you ask ChatGPT for a coding snippet, by default you will get the most basic HelloWorld example. The vaguer your prompt is, the more mediocre your results will be. Instead, specify exactly the language, version, and framework:

Write backend code for a Library app that uses REST to communicate. Cover endpoints for adding, removing, and filtering books by category and date published. Use the latest Java version. Use lambda streams instead of for loops. Use the Spring framework.

Tip #2: Avoid code vomit

ChatGPT loves to write a lot of code – what I like to call "code vomit". We are no longer rewarded by the amount of code we produce, but by the clarity and principles we follow. Give ChatGPT instructions to write clean code, use the latest principles, and cover logging and exception handling:

Write clean code. Code needs to be covered with logging and proper exception handling. Use principles: KISS & DRY, SOLID. Keep in mind to use design patterns where applicable. Using the coding instructions I gave you, give me the code for each class.

Tip #3: Make it easy to use with your IDE

Every time ChatGPT writes code you get explanations, import statements, and comments. This can be good for a beginner, but it is not something we need for our IDE. Our IDE is already good at importing all the right packages, so let ChatGPT know:

When writing code, avoid detailed explanations, just simple bullet points. Don't add import statements, as the IDE will do this instead.

Tip #4: Write tests

Your code is not complete if you are not done with tests. But not just any tests: we want unit and integration tests in a readable format (given-when-then), covering the happy and unhappy paths, using the latest testing libraries such as AssertJ and BDDMockito:

For each class write unit and integration tests. Use the given-when-then format. For libraries use BDDMockito and AssertJ. Cover happy and unhappy paths.

Tip #5: Give REST call request examples

What use is the app if we cannot test it with some examples? Instead of creating them manually, ask ChatGPT to create curl examples we can easily copy to Postman:

For each request, generate curl examples.

Now go ahead and use your terminal, or copy/paste them into Postman.

Tip #6: Create documentation

We don't want just plain text; instead, we need a quick start guide for developers:

Write a quick start guide for developers using markdown. Imagine this app has been published to a GitHub repository. Cover: introduction, how to install the app, how to run it, how to use it.

Tip #7: Prepare a deployment script for the cloud

This app cannot live just in your local environment. Instead, we need a deployment script. Depending on where you want to deploy your changes, it might be a Kubernetes cluster script, Google-specific Terraform scripts, an AWS CloudFormation script, or an Azure-specific deployment script. Or ask ChatGPT to suggest one:

Provide me a deployment script for one of the most popular cloud providers.

Tip #8: Version control

Our code is, for now, living only locally. Let's ask ChatGPT to give us instructions on how to set up version control:

Provide GitHub version control setup instructions.

Tip #9: Define a CI/CD pipeline

CI/CD, or continuous integration and continuous deployment, is a must-have for any serious development. There are plenty of options to choose from, such as Jenkins, GitHub Actions, and Bamboo. With CI we guarantee that we can safely merge our changes by running the build and tests, and check whether our code changes comply with Sonar policies. With CD we guarantee that we can safely deploy our changes:

Provide GitHub Actions so that for each open pull request we run the build and run all the tests. Also automatically include SonarQube scans. Also create a GitHub Action to run deployment on every code merge.

Tip #10: Performance optimization

Our backend REST service is now running, but there are questions we need to ask ourselves: how fast is it, how many requests can it handle, and what is the maximum limit of requests? For that, we need to execute performance tests, e.g. using JMeter or Gatling:

We need to test what the limit of our app is. Write a load test script for Gatling that tests how many book searches we can execute.

Tip #11: Run a security audit

How can we ensure our app is secure and not open to any threats? The best way is to run security scans:

Our application might be open to security threats. Which security scan tools can we use for free, and how can we use them? Give me step-by-step instructions on how to use them.

Tip #12: Optimize for observability

You have your app running somewhere in the cloud. But did you optimize it for observability? How can you easily troubleshoot issues? How can you trace requests between different services? Did you set up monitoring?

We want to make sure our application is optimized for observability. Create a guideline and configuration for the cloud environment covering traceability (tracing a request from start to finish), monitoring (monitoring key performance metrics), and logging (a centralized logging system).

Conclusion

You can find the full prompt here: https://chat.openai.com/share/f0bef1ca-062d-4a22-96aa-9711615329a5

ChatGPT is a tool, and like any tool, it shines when used the right way. With these prompts, you get a coding assistant that keeps up with the latest trends, ensuring your code is not just functional but also follows modern standards.

Author Bio

Jakov Semenski is an IT Architect working at IBMiX with almost 20 years of experience. He is also a ChatGPT speaker at the WeAreDevelopers conference and shares valuable tech stories on LinkedIn.


Writing unit tests for R functions with ChatGPT

Dan MacLean
23 Apr 2024
5 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, R Bioinformatics Cookbook - Second Edition, by Dan MacLean. Discover over 80 recipes for modeling and handling real-life biological data using modern libraries from the R ecosystem.IntroductionChatGPT is beneficial for generating R unit tests because of its knowledge of programming patterns and best practices. It can understand code snippets, including functions and their expected behaviors, and generate relevant unit tests. By leveraging its training on diverse programming contexts, ChatGPT can assist in creating comprehensive test cases that cover different scenarios and edge cases. It helps ensure code correctness, identify potential bugs, and improve code reliability. ChatGPT’s ability to generate unit tests saves developers time and effort, promotes software quality, and contributes to building robust and well-tested R code. In this recipe, we’ll learn how to get ChatGPT to do the tedious work of generating test cases for code that we are working on.Getting readyWe just need a ChatGPT account: https://chat.openai.com/.How to do it…To write unit tests for an R function with ChatGPT, do the following:1. Paste the following request into ChatGPT:write unit tests for the following R code factorise_cols <- function(df, col_list ){ not_all_na <- function(x) any(!is.na(x)) for ( i in col_list){ df <- df %>% dplyr::mutate(!!(i) := as.factor(!!(i)) ) } df %>% dplyr::select_if(not_all_na) }2.  Read and check the output:library(dplyr) library(testthat) # Create a test data frame test_df <- data.frame( col1 = c("A", "B", "C", NA), col2 = c(1, 2, NA, 4), col3 = c("X", "Y", "Z", NA) ) # Define the expected output for the test data frame expected_df <- data.frame( col1 = factor(c("A", "B", "C", NA)), col3 = factor(c("X", "Y", "Z", NA)) ) # Start defining the unit tests test_that("factorise_cols correctly factorises selected columns", { # Call the factorise_cols function result_df <- factorise_cols(test_df, c("col1", "col3")) # Check if the selected columns are factorised expect_equal(result_df$col1, expected_df$col1) expect_equal(result_df$col3, expected_df$col3) # Check if other columns are not modified expect_equal(result_df$col2, test_df$col2) }) test_that("factorise_cols returns a data frame with non-NA columns", { # Call the factorise_cols function result_df <- factorise_cols(test_df, c("col1", "col3")) # Check if the returned data frame only contains non-NA columns expect_true(all(!is.na(result_df$col1))) expect_true(all(!is.na(result_df$col3))) expect_true(all(is.na(result_df$col2))) })And with that, we should have an output that can test some of the facets of our code at least and a good start to a more dependable and workable code base.How it works…The recipe here takes advantage of ChatGPT’s representations of the structure of the R programming language in order to write code that will effectively test some example code. In step 1, we simply define the function we wish to test and ask for tests.In step 2, we see the output that ChatGPT generated in this instance. It has given us a pretty good set of unit tests. As with everything to do with ChatGPT, there isn’t a guarantee that they are correct, but we can read and verify them very easily – certainly in much less time than it would take to write them. 
One thing to note is that, in this case at least, ChatGPT hasn’t generated tests for the case with only NA in a column, which we may decide we need. It is true that this isn’t clear in the initial code, so generating the test has given us a new thought on the safe running of this function.ConclusionIn conclusion, leveraging ChatGPT for unit testing R functions offers a transformative approach. Its adept understanding of programming nuances simplifies the arduous task of generating comprehensive tests, fostering code reliability and quality assurance. By effortlessly crafting diverse test cases, ChatGPT significantly reduces developers' workload, ensuring code correctness, identifying potential bugs, and fortifying the codebase against edge cases. While it doesn't guarantee absolute correctness, its output provides a solid foundation for enhancing code robustness. Embracing ChatGPT's capabilities not only saves time and effort but also contributes profoundly to building more dependable and well-tested R code, elevating the development process to new levels of efficiency and reliability.Author BioProfessor Dan MacLean has a Ph.D. in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now Head of Bioinformatics at the world-leading Sainsbury Laboratory in Norwich, UK where he works on bioinformatics, genomics, and machine learning. He teaches undergraduates, post-graduates, and post-doctoral students in data science and computational biology. His research group has developed numerous new methods and software in R, Python, and other languages with over 100,000 downloads combined.


Gemini 1.0 Pro Vision in BigQuery, Python UI Library, Feature Engineering with Fabric and PySpark, Power analytics with Redshift, Amazon RDS for MySQL

Merlyn Shelley
19 Apr 2024
14 min read
Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!Get the first look at Sigma's new features and functionality at our virtual product launch on May 2nd at 12pm ET/9am PT.The virtual event will showcase talks and demos from Sigma's CEO, co-founders, and product managers about what's next in the future of analytics.Don't miss out. See how Sigma is reinventing BI.👋 Hello,Welcome to BI-Pro #52: Your Premier Destination for Data and BI Insights! 🌟 In This Edition: 🔮 Data Viz with Python Libraries Exploring causality with Python. Meet NiceGUI: Your Soon-to-be Favorite Python UI Library. Feature Engineering with Microsoft Fabric and PySpark. 10 GitHub Repositories to Master Python. 🔌 Power BI On-premises data gateway April 2024 release. Copilot in Power BI expansion. 🛠️ Microsoft Fabric Introducing Optimistic Job Admission for Fabric Spark. Introducing Job Queueing for Notebook in Microsoft Fabric. ☁️ AWS BI Meet Amazon QuickSight expert Sanjeeb Mohapatra. Handle tables without primary keys for Amazon Aurora MySQL and Amazon RDS for MySQL. Power analytics with Amazon Redshift. 🌐 Google Cloud Data Gemini 1.0 Pro Vision in BigQuery. BigQuery data canvas. Gemini in Looker AI-powered BI. Memorystore for Redis Cluster updates. Firestore launch updates. 📊Tableau Tableau vs Power BI: A Comparison of AI-Powered Analytics Tools. Salesforce-Informatica Deal Could Transform Enterprise GenAI Forever. ✨ Expert Insights from Packt Community ChatGPT for Cybersecurity Cookbook by Clint Bodungen. 💡 What's the Latest Scoop from the BI Community? Geospatial Data Analysis with Geemap. Microsoft Fabric Table Maintenance - Checkpoint and Statistics. Identifying Customer Buying Pattern in Power BI - Part 1. Full vs. Incremental Loads – Data Engineering with Fabric. Joining Queries in Azure Data Factory on Cosmos DB Sources. Feature Engineering with Microsoft Fabric and Dataflow Gen2. Stay ahead in the ever-evolving landscape of business intelligence with BI-Pro. Unleash the full potential of your data today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.We appreciate your input and hope you enjoy the book!Share your thoughts and opinions here! Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🚀 GitHub's Most Sought-After Repos 🐾 altair - Vega-Altair is a Python library for statistical visualization, offering simplicity, friendliness, and consistency for creating beautiful and effective visualizations. 🐾 bokeh - Bokeh is a Python library for creating interactive plots and data applications in web browsers, offering elegant and versatile graphics. 🐾 bqplot - bqplot is a 2-D visualization system for Jupyter, based on the Grammar of Graphics, enabling interactive plots with other Jupyter widgets. 🐾 cartopy - Cartopy simplifies map drawing in Python, offering easy projection definitions, point transformations, and integration with Matplotlib for advanced mapping. 🐾 diagrams - Diagrams simplifies cloud system architecture design in Python, supporting major providers and frameworks, allowing prototyping and visualization of existing architectures. 
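To make the repository round-up above a little more concrete, here is a minimal Vega-Altair sketch of the kind of declarative plotting these libraries enable. The DataFrame, column names, and output file are made up for illustration and are not taken from any of the linked projects.

```python
# Minimal Vega-Altair sketch: a scatter plot from a small pandas DataFrame.
# The data and column names below are illustrative only.
import altair as alt
import pandas as pd

df = pd.DataFrame({
    "sales": [120, 340, 260, 410, 180],
    "profit": [12, 45, 30, 60, 20],
    "region": ["North", "South", "East", "West", "North"],
})

chart = (
    alt.Chart(df)
    .mark_circle(size=80)
    .encode(
        x="sales",
        y="profit",
        color="region",
        tooltip=["region", "sales", "profit"],
    )
    .properties(title="Sales vs. profit by region")
)

chart.save("sales_vs_profit.html")  # or display inline in a Jupyter notebook
```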
Email Forwarded? Join BI-Pro Here!🔮 Data Viz with Python Libraries   🐍 Exploring causality with Python. Difference-in-differences: The series dives into causal inference, crucial in modern analytics, explaining tools like difference-in-differences. It explores how events impact outcomes, using examples such as minimum wage effects on employment. The setup involves treatment and control groups to establish cause-and-effect relationships in diverse real-world scenarios. 🐍 Meet the NiceGUI: Your Soon-to-be Favorite Python UI Library. NiceGUI is a Python UI framework for web and desktop apps, offering a simple interface for small projects, dashboards, and robotics. It simplifies state management and interaction, boasting features like easy layout, visualization tools, and integration with popular libraries. 🐍 Feature Engineering with Microsoft Fabric and PySpark: The post delves into feature engineering in Microsoft Fabric, emphasizing its importance in ML development. It explores PySpark's role in handling large datasets and provides a basic overview and example of using PySpark for feature engineering. 🐍 10 GitHub Repositories to Master Python: The blog explores 10 essential GitHub repositories for mastering Python, emphasizing hands-on experience and real-world projects to enhance skills. It covers a range of topics, from beginner to advanced, including machine learning, web development, and data analysis. Asabeneh/30-Days-Of-Python  trekhleb/learn-python  Avik-Jain/100-Days-Of-ML-Code  realpython/python-guide  zhiwehu/Python-programming-exercises  geekcomputers/Python  practical-tutorials/project-based-learning  avinashkranjan/Amazing-Python-Scripts  TheAlgorithms/Python  vinta/awesome-python   ⚡Stay Informed with Industry Highlights Power BI 📊 On-premises data gateway April 2024 release: This update to the on-premises data gateway aligns it with the April 2024 release of Power BI Desktop, ensuring consistency in query execution. Additionally, the gateway now supports refreshes longer than one hour, allowing tokens to be refreshed mid-stream for continuous operation.  📊 Copilot in Power BI: Soon available to more users in your organization. The update introduces changes to Copilot in Power BI, including enabling Copilot by default for all tenants starting May 20th, 2024. It also addresses features reported by customers and community, updates abuse monitoring to not store prompts, and improves geo mapping for EU data boundary customers. Microsoft Fabric📊 Introducing Optimistic Job Admission for Fabric Spark: The post introduces Optimistic Job Admission for Spark in Microsoft Fabric, a new feature aimed at improving concurrency and job admission experience. It explains how this feature optimizes resource allocation and increases the number of concurrent jobs that can be admitted to the cluster. 📊 Introducing Job Queueing for Notebook in Microsoft Fabric: Microsoft Fabric introduces Job Queueing for Notebook Jobs to streamline data engineering and data science processes. This feature automatically queues notebook jobs when Fabric capacity is maxed out, eliminating manual retries and improving user experience. Jobs are retried when resources become available, enhancing efficiency for enterprise users. AWS BI  📊 Meet one of Amazon QuickSight’s Top Community Experts: Sanjeeb Mohapatra. The Amazon QuickSight Community, launched in 2022, is a hub for BI authors and developers to collaborate, ask and answer questions, and learn about QuickSight. 
Sanjeeb Mohapatra, the top Community Expert for 2023, exemplifies the community's spirit by providing over 1,700 replies and 235 solutions in one year. 📊 Handle tables without primary keys while creating Amazon Aurora MySQL or Amazon RDS for MySQL zero-ETL integrations with Amazon Redshift: AWS is advancing its zero-ETL vision with Amazon Aurora zero-ETL integration to Amazon Redshift, combining transactional data with analytics capabilities. This integration, along with four new ones announced at re:Invent 2023, empowers customers to implement near real-time analytics for various use cases. 📊 Power analytics as a service capabilities using Amazon Redshift: Analytics as a service (AaaS) leverages cloud-based analytic capabilities to enable cost-effective, scalable solutions for organizations. Amazon Redshift, a cloud data warehouse service, facilitates real-time insights and predictive analytics, empowering AaaS providers to embed rich data analytics capabilities. Delivery models include managed, bring-your-own-Redshift (BYOR), and hybrid options, offering flexibility to meet customer needs. Google Cloud Data 📊 How to use Gemini 1.0 Pro Vision in BigQuery? BigQuery integrates with Vertex AI to leverage Gemini 1.0 Pro, PaLM, Vision AI, Speech AI, Doc AI, Natural Language AI, enabling analysis of unstructured data like images, audio, and documents. New integrations support multimodal generative AI, enhancing capabilities for object recognition, info seeking, captioning, digital content understanding, and structured content generation, allowing structured data output for deeper analysis. 📊 Get to know BigQuery data canvas: BigQuery Data Canvas simplifies the data-to-insights journey by offering a natural language-driven experience. It centralizes data tasks, accelerates analysis, and fosters collaboration, all within a unified workspace, enabling faster and more efficient data analytics. 📊 Gemini in Looker to bring intelligent AI-powered BI to everyone: Gemini in Looker introduces Conversational Analytics, transforming how businesses engage with data. It offers a natural language-driven experience, simplifying data analytics and fostering collaboration, all within a unified workspace. 📊 Memorystore for Redis Cluster updates at Next ‘24: The article elaborates on the rapid adoption and recent enhancements of Google Cloud's Memorystore for Redis Cluster. It features customer testimonials from companies like Statsig, Character.AI, and AXON Networks, showcasing the service's performance, scalability, and cost-effectiveness. It also highlights new features such as data persistence, new node types, and ultra-fast vector search. 📊 Firestore launches at Next ‘24: Firestore is beloved by developers for its speed in app development. Updates include improved developer productivity, AI-enabled app building, richer queries, and enterprise-level scalability. Gemini Code Assist now supports Firestore, allowing natural language queries and data model definitions, enhancing the development experience. Firestore also supports AI applications and integrations with LangChain and LlamaIndex for generative AI. Tableau📊 Tableau vs Power BI: A Comparison of AI-Powered Analytics Tools. The comparison delves into the unique strengths of Tableau and Power BI, showcasing how each excels in different areas of data visualization and analytics. 
It outlines Tableau's robust visualizations and analytics capabilities, especially for large datasets, contrasting with Power BI's integration with Microsoft services and affordability for small to medium-sized businesses. 📊 Salesforce-Informatica Deal Could Transform Enterprise GenAI Forever: Salesforce is reportedly in advanced talks to acquire Informatica, a data-management software provider, for $11 billion. This aligns with Salesforce's strategy to expand beyond CRM, bolstered by recent AI advancements like Einstein Copilot, complementing Informatica's data integration expertise and potential synergy with Tableau and MuleSoft. Additionally, it aligns with Salesforce's strategy to expand beyond CRM and become a comprehensive data journey platform. ✨ Expert Insights from Packt Community ChatGPT for Cybersecurity Cookbook - By Clint Bodungen Sending API Requests and Handling Responses with PythonIn this recipe, we will explore how to send requests to the OpenAI GPT API and handle the responses using Python. We’ll walk through the process of constructing API requests, sending them, and processing the responses using the openai module. Getting ready Ensure you have Python installed on your system. Install the OpenAI Python module by running the following command in your Terminal or command prompt: pip install openai How to do it… The importance of using the API lies in its ability to communicate with and get valuable insights from ChatGPT in real time. By sending API requests and handling responses, you can harness the power of GPT to answer questions, generate content, or solve problems in a dynamic and customizable way. In the following steps, we’ll demonstrate how to construct API requests, send them, and process the responses, enabling you to effectively integrate ChatGPT into your projects or applications: Start by importing the required modules: import openai from openai import OpenAI import os Set up your API key by retrieving it from an environment variable, as we did in the Setting the OpenAI API key as an Environment Variable recipe: openai.api_key = os.getenv("OPENAI_API_KEY") Define a function to send a prompt to the OpenAI API and receive a response:client = OpenAI() def get_chat_gpt_response(prompt):  response = client.chat.completions.create(    model="gpt-3.5-turbo",    messages=[{"role": "user", "content": prompt}],    max_tokens=2048,    temperature=0.7  )  return response.choices[0].message.content.strip() Call the function with a prompt to send a request and receive a response:prompt = "Explain the difference between symmetric and asymmetric encryption." response_text = get_chat_gpt_response(prompt) print(response_text) How it works… First, we import the required modules. The openai module is the OpenAI API library, and the os module helps us retrieve the API key from an environment variable. We set up the API key by retrieving it from an environment variable using the os module. Next, we define a function called get_chat_gpt_response() that takes a single argument: the prompt. This function sends a request to the OpenAI API using the openai.Completion.create() method. This method has several parameters: engine: Here, we specify the engine (in this case, chat-3.5-turbo). prompt: The input text for the model to generate a response. max_tokens: The maximum number of tokens in the generated response. A token can be as short as one character or as long as one word. n: The number of generated responses you want to receive from the model. 
In this case, we’ve set it to 1 to receive a single response. stop: A sequence of tokens that, if encountered by the model, will stop the generation process. This can be useful for limiting the response’s length or stopping at specific points, such as the end of a sentence or paragraph. temperature: A value that controls the randomness of the generated response. A higher temperature (for example, 1.0) will result in more random responses, while a lower temperature (for example, 0.1) will make the responses more focused and deterministic. Discover more insights from ChatGPT for Cybersecurity Cookbook - By Clint Bodungen. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!Read Here💡 What's the Latest Scoop from the BI Community?  🧠 Geospatial Data Analysis with Geemap: This article introduces geospatial data analysis, focusing on raster data from Google Earth Engine, accessed and analyzed using the Geemap Python library. Earth Engine offers a vast catalog of geospatial datasets, and Geemap simplifies access and analysis, making it easier to work with such data in Python. 🧠 Microsoft Fabric Table Maintenance - Checkpoint and Statistics: This article discusses the maintenance requirements for warehouse tables in Microsoft Fabric, particularly focusing on tasks like updating statistics, removing fragmentation, and managing log files. While some maintenance tasks, such as data compaction and log file checkpointing, are automated, others, like managing statistics, may require manual intervention. 🧠 Identifying Customer Buying Pattern in Power BI - Part 1: This article is part 1 of a retail analytics analysis in Power BI, focusing on customer purchasing frequency for various products over the years. It includes identifying data elements, creating calculated columns, and analyzing trends to aid in business decision-making. 🧠 Full vs. Incremental Loads – Data Engineering with Fabric: This article discusses using Apache Spark in Microsoft Fabric to achieve data quality zones (bronze and silver) in a data lake. It explores loading weather data, transforming it with Spark SQL and DataFrames, and implementing full and incremental load patterns. 🧠 Joining Queries in Azure Data Factory on Cosmos DB Sources: This article provides a detailed guide on joining two queries in Azure Data Factory (ADF). It covers prerequisites, creation of data sources, defining queries for each dataset, and using the "Join" transformation in ADF to merge data. Different join types such as inner, left outer, right outer, and full outer joins are explained. 🧠 Feature Engineering with Microsoft Fabric and Dataflow Gen2: This article introduces Dataflow Gen2 as a low-code data transformation and integration engine for creating data pipelines in Microsoft Fabric. It focuses on using Dataflow Gen2 to create features needed for training a machine learning model with college basketball game data, offering different approaches from no code to all code. See you next time!


Elevate Your LLM Mastery

Merlyn Shelley
18 Apr 2024
13 min read
Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,🚀 Welcome to DataPro Newsletter #84!  Dive into the dynamic world of data science and AI, where breakthroughs and trends shape our future.   🔍 Highlights:  Google's Genie   Meta AI's Priority Sampling   DeepMind's Hawk and Griffin   CMU's OmniACT   Qualcomm's GPTVQ   Azure PyRIT   Microsoft's ChunkAttention   ✨ Data Community Blogs:  ML Workflow with Scikit-learn Pipelines   Text Embeddings   AI System Design   Mixture of Thought LLM Cascades   GNN with Pytorch Implementation  Vertex AI MLOps Platform   🏭 Industry Updates:  Anthropic’s Claude 3 Sonnet in Amazon Bedrock    Anthropic’s Claude 3 models in Vertex AI    Microsoft’s Orca-Math   Table Meets LLM  OpenAI and Elon Musk   📚 New in Packt Library:  "Building AI Applications with ChatGPT APIs" by Martin Yanev   DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, Packt Sign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ VAST-AI-Research/TripoSR: TripoSR, developed by Tripo AI and Stability AI, is an open-source model for fast 3D reconstruction from a single image. It outperforms others in speed and quality, generating 3D models in under 0.5 seconds on NVIDIA A100 GPUs. 🛠️ facebookresearch/ViewDiff: ViewDiff creates consistent, high-quality images of 3D objects in real-world settings from multiple angles. 🛠️ YubiaoYue/MedMamba: MedMamba, inspired by visual state space models, sets a new baseline for medical image classification, excelling across diverse datasets. 🛠️ BAAI-Agents/Cradle: Cradle framework pioneers General Computer Control, enhancing agent capabilities for any task through reasoning and self-improvement. 📚 Expert Insights from Packt CommunityBuilding AI Applications with ChatGPT APIs - By Martin Yanev Setting Up the Code Bug Fixer Project Open PyCharm: Double-click on the PyCharm icon on your desktop or search for it in your applications folder to open it. On the PyCharm welcome screen, click on Create New Project or go to File | New Project. Choose the directory where you want to save your project. You can either create a new directory or select an existing one. Select the Python interpreter: Choose the version of Python you want to use for your project. Configure project settings: Give your project the name CodeBugFixer, and choose a project location. Once you’ve configured all the settings, click Create to create your new PyCharm project. After creating a new PyCharm project, the next step is to create the necessary files and folders for the CodeBugFixer project. Firstly, create two new Python files, called app.py and config.py, in the root directory of the project. The app.py file is where the main code for the CodeBugFixer app will be written, and the config.py file will contain any sensitive information such as API keys and passwords. Next, create a new folder called templates in the root directory of the project. This folder will contain the HTML templates that the Flask app will render. Inside the templates folder, create a new file called index.html. 
This file will contain the HTML code for the home page of the CodeBugFixer app. The project structure should look like the following: CodeBugFixer/ ├── config.py ├── app.py ├── templates/ │   └── index.html By following these steps, you have created the necessary files and folders for your CodeBugFixer project in your PyCharm project. You can now start writing the code for your Flask app in the app.py file and the HTML code in the index.html file. Once you have the correct interpreter, you can open the terminal within PyCharm by going to View | Tool Windows | Terminal. Check your terminal and ensure that you can see the (venv) indicator to confirm that you are working within your virtual environment. This is an essential step to prevent conflicting package installations between projects and guarantee that you are using the correct set of dependencies. In the terminal window, you can install any necessary libraries as follows: (venv)$ pip install flask (venv)$ pip install openai Finally, in order to establish the foundation for utilizing the ChatGPT API in your CodeBugFixer app, you’ll need to add the following code to config.py and app.py: config.py API_KEY = <Your API Key> app.py from flask import Flask, request, render_template import openai import config app = Flask(__name__) # API Token openai.api_key = config.API_KEY @app.route("/") def index():     return render_template("index.html") if __name__ == "__main__":     app.run() The config.py file will securely hold your OpenAI API key. Make sure to replace <Your API Key> with the actual API key that you obtained from OpenAI. Discover more insights from 'Building AI Applications with ChatGPT APIs' by Martin Yanev. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today! Read Here!Message from our Partners!👉 Octane AI Insights Analyst: Explore how Octane AI is revolutionizing ecommerce. Over 3,000 Shopify merchants have harnessed AI Quiz Funnels and Insights, generating over $500 million in revenue. It's more than growth; it's understanding and engaging customers on a new level. Join the community and see the difference.  👉 Cognism: Transform your sales strategy with Cognism. Experience a 3x boost in connect rate, gain access to verified B2B contacts, and enjoy seamless integration with your CRM tools. Expand globally with our comprehensive data coverage. Streamline your outreach for better conversions. 👉 Freshdesk: Revolutionize your customer service with Freshworks Smart Suite's focus on analytics. Unlock actionable insights, anticipate needs, and streamline support through AI-driven dashboard. Empower your team with the tools to excel in efficiency and personalization. Start with a free trial and transform your service today! 👉 Murf AI: Enhance your projects with Murf's AI-powered voices, offering a range of realistic options for any use case. From corporate presentations to entertainment, find the perfect voice in over 20 languages. With Murf Studio, seamlessly integrate voice with your videos, music, or images, bringing your creative vision to life. Start your free trial and experience the difference. Thanks for reading Packt DataPro! Subscribe for free to receive new posts and support my work.⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! 
AWS ML Made Easy 🌀 Anthropic’s Claude 3 Sonnet foundation model is now available in Amazon Bedrock: Amazon announced a collaboration with Anthropic to accelerate the development of Claude foundation models, making them accessible to AWS customers. Recently, Claude 3 was introduced, offering three models with varying levels of intelligence, speed, and cost. Claude 3 Sonnet is now available in Amazon Bedrock, providing faster speeds, increased steerability, and image-to-text vision capabilities. Mastering ML with Google 🌀 Announcing Anthropic’s Claude 3 models in Google Cloud Vertex AI: Google Cloud is enhancing customer choice and innovation in Vertex AI with the addition of Anthropic's Claude 3, a new family of state-of-the-art AI models. These models, optimized for various enterprise applications, include the highly capable Claude 3 Opus, the balanced Claude 3 Sonnet, and the fast, compact Claude 3 Haiku. Customers can soon access all three models via API in Vertex AI Model Garden, starting with private preview access to Claude 3 Sonnet. The Claude 3 models offer improved reasoning, content creation, language fluency, and vision capabilities, enabling customers to focus on applications while benefiting from flexible scaling, cost optimization, and Google Cloud's security and compliance. Microsoft Research Insights🌀 Orca-Math: Demonstrating the potential of SLMs with model specialization. The study on Orca and Orca 2 demonstrated how improved training methods can enhance the reasoning abilities of smaller language models, bringing them closer to larger models. Orca-Math, a 7 billion parameter model, specializes in solving math problems and outperforms larger models in this area. The research highlights the value of smaller models in specialized tasks and the potential of continual learning. The dataset and training procedure are available for further research. 🌀 Table Meets LLM: Improving LLM understanding of structured data and exploring advanced prompting methods: This paper explores how large language models (LLMs) understand structured table data. It investigates effective prompts, inherent structured data detection, leveraging existing knowledge, and trade-offs among input designs for better understanding and utilization of table-based data in LLMs. OpenAI Updates 🌀 OpenAI and Elon Musk: In a recent blog post, OpenAI shared its mission to ensure AGI benefits all of humanity, emphasizing the need for substantial resources. The post recounts disagreements with Elon Musk over funding and control, leading to his departure. OpenAI highlights its efforts to create widely available beneficial tools, such as GPT-4, and addresses ongoing legal disputes with Musk while reaffirming its commitment to its mission. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🧞 Google’s Genie: Generative Interactive Environments. Genie introduces a new generative AI paradigm for creating interactive, playable environments from a single image prompt. It can generate virtual worlds from unseen images, including real-world photos or sketches. Trained on a large dataset of Internet videos without action labels, Genie learns fine-grained controls, identifying controllable parts of an observation and inferring consistent latent actions across different environments.  🌀 Meta AI's Priority Sampling: Revolutionizing Machine Learning with Deterministic Code Generation. 
This research introduces Priority Sampling, a deterministic sampling technique for large language models that generates unique and confident code samples. It aims to improve code generation and optimization by providing a more structured and controllable exploration process, outperforming traditional sampling methods and enhancing model performance. 🌀 Google DeepMind Launches Hawk and Griffin: Efficient Language Models with Advanced Attention Mechanisms. This paper introduces Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model combining gated linear recurrences and local attention. Hawk outperforms Mamba on downstream tasks, while Griffin matches Llama-2's performance with significantly less training data. Both models are hardware-efficient, with Griffin showing exceptional scalability and the ability to extrapolate on long sequences. The study also details efficient distributed training for large-scale models. 🌀 CMU Unveils OmniACT: Groundbreaking AI Dataset for Measuring Program Execution Skills. OmniACT is a new dataset and benchmark designed to test if virtual agents can automate computer tasks by creating executable scripts. Initial tests show a significant gap between agent and human performance, highlighting the challenge and encouraging advancements in multimodal AI models. 🌀 Qualcomm's GPTVQ: Speeding Up Large AI Networks with Vector Quantization. GPTVQ is a new fast method for post-training vector quantization of Large Language Models (LLMs), improving size vs. accuracy trade-offs. It uses column-wise quantization and updates with Hessian information, efficient codebook initialization, and further compression techniques. GPTVQ sets new standards in LLM quantization efficiency and latency, even on mobile CPUs.   🌀 Azure PyRIT: Elevating ML Engineers with Python's Generative AI Risk Tool. PyRIT, a Python Risk Identification Tool for generative AI, automates AI Red Teaming tasks to assess the security of Language Model (LLM) endpoints. It employs proactive methods, categorizes risks, and offers detailed metrics, enabling researchers to mitigate potential risks in LLM deployment effectively. 🌀 Microsoft Introduces ChunkAttention: Accelerating Self-Attention for LLMs! This research introduces ChunkAttention, a novel self-attention module for large language models (LLMs) that optimizes compute and memory operations by detecting shared prefixes in LLM requests. It breaks key/value tensors into chunks and uses a prefix tree to share them, speeding up the self-attention kernel by 3.2-4.8×. ✨ On the Radar: Catch Up on What's Fresh🌀 Streamline Your Machine Learning Workflow with Scikit-learn Pipelines: This blog explores the benefits of using Scikit-learn pipelines for simplifying machine learning workflows. It covers how pipelines can streamline preprocessing, modeling, hyperparameter tuning, and workflow organization, making code more efficient and maintaining consistency in data preprocessing. 🌀 Do text embeddings perfectly encode text? The rapid advancement of generative AI has led to the widespread adoption of Retrieval Augmented Generation (RAG) systems, where AI retrieves relevant documents from a database to generate responses. This has given rise to vector databases, designed to store and search through embeddings, vector representations of documents. The paper "Text Embeddings Reveal as Much as Text" explores the security of embedding vectors, questioning whether they can be inverted back to text, posing challenges for privacy and information security. 
🌀 End to End AI Use Case-Driven System Design: This blog explores the complexities of AI system performance beyond TOPs (Tera Operations Per Second), focusing on real AI use cases. It dives into optimizing an AI system for an infinite zoom feature, emphasizing power efficiency through model and memory optimizations, dynamic power scaling, and specialized hardware accelerators. 🌀 Navigating Cost-Complexity: Mixture of Thought LLM Cascades Illuminate a Path to Efficient Large Language Model Deployment: This post discusses how to significantly reduce costs while maintaining accuracy in utilizing Large Language Models (LLMs), crucial for various applications. It introduces a novel approach called Mixture of Thought (MoT) Cascades, employing a blend of weaker and stronger LLMs, along with innovative prompting techniques and consistency measurements.🌀 Structure and Relationships: Graph Neural Networks and a Pytorch Implementation. This article introduces Graph Neural Networks (GNNs), a powerful method for modeling spatial and graphical structures in data, such as molecular structures, social networks, and city designs. It covers the mathematical description of GNNs, including graph convolution networks (GCNs) and graph attention networks (GATs), and provides a regression example using the PyTorch library. The article aims to make GNNs more accessible by explaining their principles and demonstrating their potential applications. 🌀 Extensible and Customisable Vertex AI MLOps Platform: The article describes the development of an MLOps platform for scalable machine learning models on Vertex AI using Kubeflow pipelines. It aims to provide a modular, flexible, and integrated solution for building operationalized ML models, serving as an educational resource and foundation for teams. The platform addresses common challenges and emphasizes testing, configuration, and CI/CD orchestration. See you next time!Affiliate Disclosure: This newsletter contains affiliate links. If you buy through them, we may earn a small commission at no extra cost to you. This supports our work and helps us keep providing useful content. We only recommend products and services we think will benefit our readers. Thanks for your support! 
LLMOps in Action

Mostafa Ibrahim
16 Apr 2024
6 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction
In an era dominated by the rise of artificial intelligence, the power and promise of Large Language Models (LLMs) stand distinct. These colossal architectures, designed to understand and generate human-like text, have revolutionized the realm of natural language processing. However, with great power comes great responsibility – the onus of managing, deploying, and refining these models in real-world scenarios. This article delves into the world of Large Language Model Operations (LLMOps), an emerging field that bridges the gap between the potential of LLMs and their practical application.

Background
The last decade has seen a significant evolution in language models, with models growing in size and capability. Starting with smaller models like Word2Vec and LSTM, we've advanced to behemoths like GPT-3, BERT, and T5. As these models grew in size and complexity, so did their operational challenges. Deploying, maintaining, and updating them requires substantial computational resources, expertise, and effective management strategies.

MLOps vs LLMOps
If you've ventured into the realm of machine learning, you've undoubtedly come across the term MLOps. MLOps, or Machine Learning Operations, encapsulates best practices and methodologies for deploying and maintaining machine learning models throughout their lifecycle. It caters to the wide spectrum of models that fall under the machine learning umbrella.

On the other hand, with the growth of vast and intricate language models, a more specialized operational domain has emerged: LLMOps. While both MLOps and LLMOps share foundational principles, the latter zeros in on the challenges and nuances of deploying and managing large-scale language models. Given the colossal size, data-intensive nature, and unique architecture of these models, LLMOps brings to the fore bespoke strategies and solutions, fine-tuned to ensure the efficiency, efficacy, and sustainability of such linguistic powerhouses in real-world scenarios.

Core Concepts of LLMOps
LLMOps focuses on the management, deployment, and optimization of large language models. One of its foundational concepts is model deployment, emphasizing scalability to handle varied loads, reduced latency for real-time responses, and version control. As LLMs demand significant computational resources, efficient resource management becomes pivotal. This includes the use of optimized hardware like GPUs and TPUs, effective memory optimization strategies, and techniques to manage computational costs.

Continuous learning and updating, another core concept, revolve around fine-tuning models with new data, avoiding the pitfall of 'catastrophic forgetting', and effectively managing data streams for updates. In parallel, LLMOps emphasizes continuous monitoring for performance, bias, and fairness, along with iterative feedback loops for model improvement. To cater to the vastness of LLMs, model compression techniques like pruning, quantization, and knowledge distillation become crucial.

How does LLMOps work?
Pre-training Model Development
Large Language Models typically start their journey through a process known as pre-training. This involves training the model on vast amounts of text data.
The objective during this phase is to capture a broad understanding of language, learning from billions of sentences and paragraphs. This foundational knowledge helps the model grasp grammar, vocabulary, factual information, and even some level of reasoning. This massive-scale training is what makes these models "large" and gives them their broad understanding of language.

Optimization & Compression
Models trained to this extent are often so large that they become impractical for everyday use. To make them more manageable without compromising much on performance, techniques like model pruning, quantization, and knowledge distillation are employed.

Model pruning: After training, pruning is typically the first optimization step. It begins with trimming model weights and may advance to more intensive methods like neuron or channel pruning.

Quantization: Following pruning, the model's weights, and potentially its activations, are streamlined. Though weight quantization is generally a post-training process, for deeper reductions, such as very low-bit quantization, one might adopt quantization-aware training from the beginning.

Additional recommendations are:
Optimizing the model specifically for the intended hardware can elevate its performance.
Before initiating training, selecting inherently efficient architectures with fewer parameters is beneficial.
Approaches that adopt parameter sharing or tensor factorization prove advantageous.
For those planning to train a new model or fine-tune an existing one with an emphasis on sparsity, starting with sparse training is a prudent approach.

A small pruning and quantization sketch is shown below.
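The article describes these steps in prose only, so the following is just a minimal illustrative sketch of post-training magnitude pruning and dynamic quantization using PyTorch's built-in torch.nn.utils.prune and torch.quantization utilities. The tiny stand-in network and the 40% sparsity level are assumptions for demonstration, not values from the article.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A tiny stand-in network; a real LLM would be orders of magnitude larger.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
)

# Unstructured magnitude pruning: zero out the 40% smallest weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")  # bake the pruning mask into the stored weights

# Post-training dynamic quantization: store Linear weights as int8 for cheaper inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized_model)

The two calls mirror the prune-then-quantize order described above; production-scale LLM compression relies on more specialized tooling, but the workflow is the same.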
Deployment Infrastructure
After training and compressing our LLM, we can use technologies like Docker and Kubernetes to deploy models scalably and consistently. This approach allows us to scale flexibly, using as many pods as needed. Concluding the deployment process, we can implement edge deployment strategies, positioning our models nearer to the end devices, which is crucial for applications that demand real-time responses.

Continuous Monitoring & Feedback
The process starts with the active model in production. As it interacts with users and as language evolves, it can become less accurate, and the model grows stale over time. To address this, feedback and interactions from users are captured, forming a vast pool of new data. Using this data, adjustments are made, resulting in a new fine-tuned model. As user interactions continue and the language landscape shifts, the current model is replaced with the new one. This iterative cycle of deployment, feedback, refinement, and replacement ensures the model always stays relevant and effective.

Importance and Benefits of LLMOps
Much like the operational paradigms of AIOps and MLOps, LLMOps brings a wealth of benefits to the table when managing Large Language Models.

Maintenance
Because LLMs are computationally intensive, LLMOps streamlines their deployment, ensuring they run smoothly and responsively in real-time applications. This involves optimizing infrastructure, managing resources effectively, and ensuring that models can handle a wide variety of queries without hiccups. Consider the significant investment of effort, time, and resources required to maintain Large Language Models like ChatGPT, especially given its vast user base.

Continuous Improvement
LLMOps emphasizes continuous learning, allowing LLMs to be updated with fresh data. This ensures that models remain relevant, accurate, and effective, adapting to the evolving nature of language and user needs. Building on the foundation of GPT-3, the newer GPT-4 model brings enhanced capabilities. Furthermore, while ChatGPT was previously trained on data up to 2021, it has since been updated to cover information through 2022. It's important to recognize that constructing and sustaining large language models is an intricate endeavor, necessitating meticulous attention and planning.

Conclusion
The ascent of Large Language Models marks a transformative phase in the evolution of machine learning. But it's not just about building them; it's about harnessing their power efficiently, ethically, and sustainably. LLMOps emerges as the linchpin, ensuring that these models not only serve their purpose but also evolve with the ever-changing dynamics of language and user needs. As we continue to innovate, the principles of LLMOps will undoubtedly play a pivotal role in shaping the future of language models and their place in our digital world.

Author Bio
Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.

Medium
AI for Investment

Louis Owen
12 Apr 2024
12 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!

Introduction
One of the most important activities for an investor is to always keep up to date with the latest relevant news. Usually, that means reading at least a dozen news articles covering macroeconomic issues, political issues, news related to the sector of the corresponding stock, analyst reports, and more. This, of course, takes a lot of time and can be overwhelming for new investors, since the amount of information to be processed is simply too much.

Many ML developers have tried to solve this issue by building a traditional ML workflow usually called a sentiment analyzer. This system takes news text as input and returns a sentiment score as output. This is no doubt helpful for the investor, but it doesn't solve the bigger problem, which is the need to curate relevant articles and to know what impact each piece of news has on their investment decision. In other words, it lacks broader insight.

What if there were an AI assistant that could act as our personal investment news analyst? What if there were an AI assistant able to analyze dozens of news articles and generate an insights summary along with an investment recommendation? And what if I told you that this AI assistant could be personalized to your risk appetite and investment portfolio allocation?

In this article, I'll guide you through building an AI assistant that can do all of the above with only a few lines of code – thanks to GPT-4! We'll discuss several ways to get news data in bulk and in real time, which search keywords to use to get relevant news data, and how to construct the prompt to fulfill all of the above criteria while still getting great generated output. Finally, we'll see how to put all of this together to build our AI assistant. Without wasting any more time, let's take a deep breath, make yourselves comfortable, and get ready to learn how to build your personal AI investment news analyst!

News Data Sources
Getting as much news data as possible is important since we don't want to miss any important information out there. Once we have all the information, we just need to filter it with the help of our AI assistant.

SerpAPI is one of the best all-in-one scraping tools that we can utilize to get news data from Google, Yahoo, Bing, DuckDuckGo, and many other search engines. It also provides a free plan with a 100 searches/month limit. However, this limit is surely not enough for our use case. If you don't mind spending some money and want to get multiple search results from different search engines, then this tool is suitable for you.

Another, more budget-friendly solution is to utilize the DuckDuckGo search engine API directly. DuckDuckGo is a search engine that offers data privacy as its main unique selling point: no search history is stored. Moreover, its search API is open and free. We will use DuckDuckGo in this article and learn how to utilize it via Python!

The more effective way to widen our search results is actually not by using different search engines but by having a diverse yet mutually exclusive set of search keywords.
The goal of our AI investment assistant is to summarize the important insights that are relevant to a particular stock we're interested in. Hence, we need to provide relevant news data to achieve that goal.

The following are some of the search keywords we can use. Note that this list is not exhaustive; you can expand it based on your own needs. We'll use AAPL as the example ticker, but you can change it to any ticker you want.

$AAPL stock
$AAPL industry and competitors
$AAPL business model and strategy
$AAPL management and leadership

Besides ticker-specific search keywords, we can also search for more general information that is not ticker-specific. The following is an example list of such keywords.

economic growth this year
monetary and fiscal policies today
politic today
economic today
inflation rate today
interest rate today
real estate today

DuckDuckGo API
Once we have the keyword list, we can easily get the news data using DuckDuckGo via Python. First, we need to install the duckduckgo-search package by running the following command.

pip install duckduckgo-search

Once it is installed, we can create a general Python function that takes a search keyword as input and returns the search results.

from duckduckgo_search import DDGS

ddgs = DDGS()

def web_search(query: str, num_results: int = 4, debug: bool = True) -> list:
    """Useful for general internet search queries."""
    if debug:
        print("Searching with query {0}...".format(query))
    search_results = []
    if not query:
        return search_results
    results = ddgs.text(query)
    if not results:
        return search_results
    # Keep only the text body of each hit, up to num_results entries.
    total_added = 0
    for j in results:
        search_results.append(j.get('body', ''))
        total_added += 1
        if total_added >= num_results:
            break
    return search_results

Using this function is very simple. We just pass the search keyword along with the number of search results and get back the list of results.

apple_competitors_news = web_search("$AAPL industry and competitors", num_results=10)

Prompt Engineering
The next important thing to do is to build our AI assistant. Here, we'll utilize GPT-4 to build our assistant. Since it's an LLM, we just need to provide the prompt without the need to train it from scratch. However, creating the prompt itself is not an easy task. I have published another article on prompt engineering if you're interested in learning more about it.

Remember that the goal of our assistant is to analyze the provided news data dump and return summary insights along with a recommendation as the output. However, to be able to give a recommendation, our assistant needs to know our risk appetite along with our portfolio condition. The following is an example of the system prompt that we can give to GPT-4.
system_prompt = """You are an expert in giving recommendation to BUY / SELL / HOLD for {} ({}). You can only return in JSON format with 5 fields:
"Investment Thesis" (dictionary of string. Consist of elaborated decision reasoning (in bullet points) based on the risk profile of the investor, unrealized profit, and all of the factors as the basis of your recommendation. Provide numbers to justify your assertions, a lot ideally. The deeper the analysis the better.),
"Investor Profiling" (dictionary of string. Connect the investment thesis with each of the investor profiles, including risk profile and unrealized profit.),
"Summary Thesis" (string. Summary of all your investment thesis as the basis of the given recommendation. You have to take into account all factors in the investment thesis as well as the investor profiles.),
"recommendation" ("BUY"/"SELL"/"HOLD")

In the investment thesis, please cover the following factors. If a particular factor needed to write the investment thesis does not exist, don't try to make up the answer, just write "The information needed is unavailable".
(1) Industry and Competitive Analysis: Assess the company's position within its industry and analyze industry trends, competition, barriers to entry, and market dynamics.
(2) News and Events: Stay updated on relevant news, earnings announcements, product launches, regulatory changes, and other events that can impact the company or the overall market.
(3) Market and Economic Conditions: Assess broader macroeconomic factors from news, including economic growth, interest rates, inflation, monetary and fiscal policies, geopolitical events, gold price, bond price, index price, real estate."""

And here's an example of the user prompt that consists of all the necessary data points. Risk profiles can be "Moderate", "Aggressive", or "Conservative".

user_prompt = """<INVESTOR PROFILE>
Risk Profile: {}
Unrealized Profit: {}%

{}"""

Putting It All Together
Now, we just need to create the main function that will act as our personal AI investment assistant.

def personal_investment_assistant(company_name: str, ticker: str, risk_profile: str, unrealized_profit_perc: float):
    # Gather news for every search keyword and flatten it into a single text dump.
    news_data = []
    for search_keyword in search_kwrds_lst:
        news_data.extend(web_search(search_keyword))
    news_data = "\n".join(news_data)
    messages = [
        {
            "role": "system",
            "content": system_prompt.format(company_name, ticker)
        },
        {
            "role": "user",
            "content": user_prompt.format(risk_profile, unrealized_profit_perc, news_data)
        }
    ]
    response = get_gpt_response("gpt-4", temperature=0.0, messages=messages)
    return response["choices"][0]["message"]["content"].strip()

And here is the helper function that calls the OpenAI chat completions endpoint:

import requests
import json
import os

def get_gpt_response(model: str, temperature: float, messages: list):
    headers = {
        'content-type': "application/json",
        'Authorization': "Bearer " + os.environ["OPENAI_API_KEY"]
    }
    endpoint = 'https://api.openai.com/v1/chat/completions'
    payload = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": temperature,
    })
    try:
        response = requests.post(endpoint, data=payload, headers=headers)
        openai_response = json.loads(response.text)
        return openai_response
    except Exception as e:
        print(e)
        return ""
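The article references a search_kwrds_lst variable inside personal_investment_assistant without defining it explicitly, so the snippet below is only an illustrative sketch of how the pieces could be wired together. The keyword selection, the AAPL example, the "Moderate" risk profile, and the 12% unrealized profit are assumed values for demonstration, and a valid OPENAI_API_KEY environment variable is assumed to be set.

# Assumed: assemble the keyword list referenced by personal_investment_assistant
# from the ticker-specific and general keywords listed earlier in the article.
search_kwrds_lst = [
    "$AAPL stock",
    "$AAPL industry and competitors",
    "$AAPL business model and strategy",
    "$AAPL management and leadership",
    "economic growth this year",
    "monetary and fiscal policies today",
    "inflation rate today",
    "interest rate today",
]

# Hypothetical example call; the investor profile values are illustrative only.
recommendation_json = personal_investment_assistant(
    company_name="Apple Inc.",
    ticker="AAPL",
    risk_profile="Moderate",
    unrealized_profit_perc=12.0,
)
print(recommendation_json)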
Conclusion
Congratulations on making it to this point! Throughout this article, you have learned how to build your own personal AI investment analyst based on news data: how to get the news data, a list of useful search keywords, and the code implementation to build the AI assistant. Best of luck on your investment journey, and see you in the next article!

Author Bio
Louis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various industries, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts become data scientists, either through his articles or through mentoring sessions. He also spends his spare time on his hobbies: watching movies and working on side projects. Currently, Louis is an NLP Research Engineer at Yellow.ai, the world's leading CX automation platform. Check out Louis' website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.
Apple’s ReALM, Google DeepMind’s Gecko, X.ai's Grok 1.5, Salesforce AI’s Moira, Stability AI’s Stable Audio 2.0, TWIN-GPT, ChatGPT Instant usage

Merlyn Shelley
08 Apr 2024
12 min read
Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Welcome to DataPro#88 – Your portal to the innovations in Data Science & Machine Learning! 🚀 In this edition, you'll find: ⚙️ LLMs & GPTs Unleashed TWIN-GPT: Digital Twins for Clinical Trials. Apple’s ReALM: AI with contextual understanding. Stability AI’s Stable Audio 2.0: Audio synthesis revolution. Salesforce AI’s Moira: Enhancing customer engagement. Google DeepMind’s Gecko: Versatile Text Embeddings. X.ai's Grok 1.5: Enhanced reasoning and context. ✨ What's Fresh & Exciting Distribute LLMs with llamafile: 5 Simple Steps. Dockerized Python Environment: The Elegant Way. Knowledge Distillation: Clone Powerful LLMs. Sora’s Diffusion Transformer (DiT): A Deep Dive. Generative AI: Copyright Reckoning. OpenAI Agent: Function Calling Capabilities. ⚡ Industry Pulse AWS & Mistral AI: Democratizing generative AI. Amazon SageMaker: No-code to code-first ML. Google Cloud Next: Database success stories. Google’s SEEDS in Weather Forecasting: AI quantifies uncertainty. Microsoft’s LLMs in the Imaginarium: Tool Learning. OpenAI: Fine-tuning API and custom models. ChatGPT: Instant usage. Synthetic Voices: Challenges and Opportunities. 📚 Packt's Latest Gem MATLAB for Machine Learning - Second Edition, By Giuseppe Ciaburro. DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ UpstageAI/dataverse: Dataverse simplifies ETL pipelines in Python, providing a user-friendly solution for data processing and management, accessible to all. 🛠️ GAP-LAB-CUHK-SZ/gaustudio: GauStudio is a modular framework for 3D Gaussian Splatting, providing streamlined pipelines and tools for easier implementation and deployment. 🛠️ TencentARC/BrushNet: BrushNet is a text-guided image inpainting model that enhances pre-trained diffusion models, focusing on divided features and dense control. 🛠️ agiresearch/AIOS: AIOS embeds LLMs into OS, enhancing resource allocation, context switch, concurrent execution, tool service, access control, and toolkit availability for developers. 🛠️ jasonppy/VoiceCraft: VoiceCraft excels in speech editing and zero-shot text-to-speech, requiring only a few seconds of reference to clone or edit voices. 📚 Expert Insights from Packt CommunityMATLAB for Machine Learning - Second Edition, By Giuseppe Ciaburro.Anomaly Detection in MATLAB Throughout the life cycle of a physical system, the occurrence of failures or malfunctions poses a potential threat to its normal functioning. To safeguard against critical interruptions, it becomes imperative to implement an anomaly detection system within the facility. Termed as a fault diagnosis system, this mechanism is designed to identify potential malfunctions within the monitored system. The pursuit of fault detection stands as a pivotal and defining phase in maintenance interventions, demanding a systematic and deterministic approach to comprehensively analyze all conceivable causes that might have led to the malfunction. 
Anomaly detection overview Anomaly detection is a technique used in data analysis and ML to identify data points or patterns that deviate significantly from the expected or normal behavior within a dataset. Anomalies, also known as outliers, are data points that do not conform to most of the data and may indicate errors, fraud, unusual events, or other important information. Anomaly detection has various applications across different domains, such as cybersecurity, industrial quality control (QC), finance, healthcare, and more. We can start to get an overview of different types of anomalies to understand what is intended with this term, we will list some types of anomalies: Point anomalies: These are individual data points that are considered anomalies, such as a single fraudulent transaction in a credit card dataset. Contextual anomalies: These are anomalies that are context-dependent. A data point might not be an anomaly on its own but is unusual in a particular context or time, such as a sudden spike in web traffic during a holiday sale. Collective anomalies: These are anomalies that are identified by examining a group of data points collectively. These anomalies involve patterns or relationships between data points. There are several methods for addressing anomaly detection problems, ranging from simple statistical techniques to complex ML algorithms. The choice of method depends on the nature of the data and the specific problem you are trying to solve. Here, we are listing the most used ones: Statistical methods: Statistical techniques such as z-scores, percentiles, and boxplots can be used to identify anomalies based on deviations from the mean or median of the data distribution. ML: Supervised, unsupervised, and semi-supervised ML algorithms can be used for anomaly detection. Some popular methods include Isolation Forest, One-Class Support Vector Machine (One-Class SVM), autoencoders (AEs), and k-means clustering. Time series analysis: Specialized techniques are used for detecting anomalies in time series data, such as autoregressive (AR) models, exponential smoothing, and moving averages (MAs). Density estimation: Methods such as kernel density estimation (KDE) and Gaussian Mixture Models (GMMs) are used to estimate the probability density function of the data and identify anomalies as low-density regions. Deep learning (DL): Neural networks (NNs), especially deep AEs (DAEs) and recurrent NNs (RNNs), are used for anomaly detection in high-dimensional data or sequences. Ensemble methods: Combining multiple anomaly detection models can improve overall performance and robustness. In addressing anomaly detection problems, we have to face some challenges. For example, determining an appropriate threshold for defining anomalies can be challenging. Imbalanced datasets, where anomalies are rare, can make model training and evaluation tricky. Handling high-dimensional data and noisy datasets can also be challenging. Anomaly detection is a valuable tool for identifying rare but potentially important events or patterns in large datasets. The choice of method depends on the specific domain, data characteristics, and the nature of anomalies that need to be detected. Discover more insights from "MATLAB for Machine Learning - Second Edition" by Giuseppe Ciaburro. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! 
AWS ML Made Easy 🌀 AWS and Mistral AI commit to democratizing generative AI with a strengthened collaboration: The article discusses the growing use of generative AI applications across industries, facilitated by Amazon Bedrock. It highlights Mistral AI's Mistral Large model, now available on Amazon Bedrock, offering advanced language capabilities. This collaboration aims to provide customers with diverse model options to suit their specific business needs, promoting innovation in AI technology. 🌀 Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio: This post discusses Amazon SageMaker Studio, an integrated ML development environment, and SageMaker Canvas, a no-code ML tool, highlighting their features and integration for seamless collaboration between non-ML and ML experts. Google Research 🌀 Get inspired: Database success stories at Google Cloud Next. This blog post previews Google Cloud Next '24, focusing on customers using Google Cloud databases for transformative purposes. It highlights sessions featuring Nuro, Lightricks, Bayer, Yahoo!, and Statsig, showcasing their innovative use cases.🌀 Generative AI to quantify uncertainty in weather forecasting: Google is advancing weather forecasting with innovations like MetNet-3 and SEEDS, a generative AI model. SEEDS efficiently generates probabilistic ensembles, addressing the butterfly effect's uncertainty, and offers cost-effective solutions for extreme weather events. Microsoft Research🌀 LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error. This research enhances large language models' (LLMs) tool usage accuracy through simulated trial and error (STE), inspired by biological systems. STE improves learning by simulating tool use scenarios, interacting with tools, and leveraging short and long-term memory. Results show significant performance boosts over existing methods.OpenAI Updates🌀 Introducing improvements to the fine-tuning API and expanding our custom models program: This update discusses techniques to improve model performance, such as retrieval-augmented generation (RAG) and fine-tuning and introduces new API features for developers to control their fine-tuning jobs, enhancing model quality, reducing costs, and latency. 🌀 Start using ChatGPT instantly: This new initiative aims to make AI more accessible by allowing instant access to ChatGPT without the need to sign up. It targets those curious about AI's potential but hesitant to set up an account, offering a seamless experience for learning, creative inspiration, and answering questions. 🌀 Navigating the Challenges and Opportunities of Synthetic Voices: Voice Engine is a model by OpenAI that generates natural-sounding speech from text input and a short audio sample, closely resembling the original speaker. They're sharing insights from a small-scale preview, highlighting its potential for various applications like reading assistance and personalized responses in education. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🌀 TWIN-GPT: Digital Twins for Clinical Trials via LLM. The research explores virtual clinical trials' benefits in healthcare, emphasizing patient safety and cost reduction. Existing methods struggle with prediction accuracy due to limited data. TWIN-GPT, a proposed approach, uses large language models to create personalized digital twins, improving predictions and showcasing digital twins' potential in healthcare. 
🌀 Apple’s ReALM: AI that can “See” to understand the context: ReALM (Reference Resolution As Language Modeling) addresses the challenge of context understanding, including non-conversational entities like on-screen elements. By leveraging Language Models (LLMs), it demonstrates significant improvements in reference resolution, even outperforming GPT-4, offering over 5% gains for on-screen references. 🌀 Stability AI’s Stable Audio 2.0: Stable Audio 2.0 introduces a groundbreaking AI-generated audio standard, offering high-quality, full tracks up to three minutes long at 44.1kHz stereo. It features audio-to-audio generation, honoring creator rights, and expands creative possibilities, available for free on the Stable Audio website. 🌀 Salesforce AI’s Moira: Moirai is a universal time series forecasting model designed to address diverse forecasting tasks across various domains, frequencies, and variables in a zero-shot manner. It tackles key challenges in forecasting and offers robust performance, making it valuable for IT operations, sales forecasting, and more. 🌀 Google DeepMind’s Gecko: Versatile Text Embeddings Distilled from LLMs. Gecko is a compact text embedding model that achieves strong retrieval performance by distilling knowledge from large language models (LLMs). Its two-step distillation process, generating synthetic paired data and refining data quality, outperforms larger models on the Massive Text Embedding Benchmark. Gecko with 256 dimensions outperforms all entries with 768 dimensions; Gecko with 768 dimensions competes with models 7x larger and 5x higher dimensional embeddings. 🌀 X.ai Unveils Grok 1.5: Enhanced Reasoning and Long Context Features. Grok-1.5, the latest version of x.ai's Grok model, offers improved reasoning and long context capabilities. It excels in coding and math tasks, scoring 50.6% on MATH and 90% on GSM8K benchmarks. Grok-1.5 can process long contexts up to 128K tokens and boasts robust infrastructure for large-scale training. Early testers and existing Grok users on the x.ai platform will soon have access to Grok-1.5, with further features expected to roll out gradually. ✨ On the Radar: Catch Up on What's Fresh🌀 Distribute and Run LLMs with llamafile in 5 Simple Steps: This blog introduces llamaFile, a framework that simplifies using large language models (LLMs) by providing a one-file executable that runs locally without installation. It explains how to use llamaFile with the LLaVa model, a 7-billion-parameter model quantized to 4 bits, for tasks like chat, image uploading, and question-answering. 🌀 Setting A Dockerized Python Environment — The Elegant Way. This blog post demonstrates a more elegant method for setting up a dockerized Python development environment using VScode and the Dev Containers extension. It provides step-by-step instructions and prerequisites, including Docker Desktop, a Docker Hub account, and VScode with the Dev Containers extension installed. The tutorial focuses on using the official Python image (`python:3.10`) and explains the Dev Containers extension's role in creating an isolated VScode session inside a docker container. 🌀 Clone the Abilities of Powerful LLMs into Small Local Models Using Knowledge Distillation: This post explores the use of specialized, smaller-scale language models for specific NLP tasks, such as grammatical error correction. It discusses the process of constructing tailored models through data annotation and fine-tuning, and the use of knowledge distillation to automate labeling. 
The post provides a workflow for distilling knowledge from a large language model to a smaller one, using prompts and APIs, and demonstrates this process in the context of building a grammatical error correction model. 🌀 Deep Dive into Sora’s Diffusion Transformer (DiT) by Hand: This blog introduces Sora, OpenAI's text-to-video model, explaining its unique approach combining diffusion transformer and transformer strength for video prediction. It explores key concepts like diffusion, dimension reduction, and noise addition, offering insights into how Sora converts text prompts into realistic videos. Ideal for AI enthusiasts and those interested in video generation technologies. 🌀 The Coming Copyright Reckoning for Generative AI: This blog explores the complexities of copyright law in America, particularly in the context of generative AI. It discusses key concepts like original works, fair use, and the implications of generative AI on copyright. It also delves into legal cases and future considerations, offering insights for data scientists and AI enthusiasts. 🌀 Create an Agent with OpenAI Function Calling Capabilities: This article explores the advancements and challenges in developing AI-powered applications in 2024. It discusses how AI streamlines app features for a better user experience and introduces OpenAI's Function Calling to simplify structured data extraction. The article also highlights the ongoing innovations and the future of AI applications. See you next time!
BI-Pro#49: Microsoft Fabric Lifecycle Management, Data Factory Adds CI/CD to Fabric Data Pipelines, Database Mirroring, AWS Well-Architected Data Analytics Lens

Merlyn Shelley
04 Apr 2024
11 min read
Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Welcome to BI-Pro #49, your ultimate guide to data and BI insights! 🚀 ⏩ What's Inside? Python Simplified: Master data validation with Pydantic. Visualize Like a Pro: 30+ tools for stunning data visuals. R for Bioinformatics: Custom visuals for bio data. Interactive Data: JavaScript meets Handsontable. Seaborn Stories: Craft data tales with line plots. MetaGPT Insights: Next-gen data solutions unveiled. 🏭 Industry Scoop: Power BI’s Latest: March's must-know features. Fabric Innovations: Updates and new tools from Microsoft Fabric. AWS Well-Architected Data Analytics Lens: Analytics strategies for the real world. Google Cloud Savings: Cut costs on ETL workflows. Tableau Journeys: From student to BI analyst. 💎 Expert Takes: Deep Dive into Python Deep Learning: The latest from Packt. 👉 Community Buzz: Twitch Chat Analysis, Graph Networks, LLM Data Quality, and Ethical AI: Key conversations this week! Dive into the trends shaping data and BI today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.We appreciate your input and hope you enjoy the book!Share your thoughts and opinions here! Cheers,Merlyn ShelleyEditor-in-Chief, PacktThanks for reading Packt BI-Pro! Subscribe for free to receive new posts and support our work.Pledge your supportSign Up | Advertise | Archives🚀 GitHub's Most Sought-After Repos🌀 man-group/ArcticDB: ArcticDB is a high-performance DataFrame database designed for Python Data Science, with a Python-centric API for Pandas DataFrames. 🌀 gradio-app/gradio: Gradio is an open-source Python package for building demos or web apps for ML models or Python functions, with easy sharing via built-in features. 🌀 Sinaptik-AI/pandas-ai: PandasAI is a Python library using generative AI to explore, clean, and analyze data with natural language queries.🌀 OpenRefine/OpenRefine: OpenRefine is a powerful Java-based tool for loading, understanding, cleaning, reconciling, and augmenting data, accessible from a web browser. 🌀 Kanaries/pygwalker: PyGWalker simplifies Jupyter Notebook workflows by converting pandas dataframes into interactive user interfaces for data analysis and visualization. 🌀 cleanlab/cleanlab: cleanlab aids in data and label cleaning by identifying issues in ML datasets automatically, enabling better model training with real-world data.Email Forwarded? Join BI-Pro Here!🔮 Data Viz with Python Libraries  🌀 Pydantic Tutorial: Data Validation in Python Made Simple. This blog tutorial explains how to use Pydantic, a data validation and serialization library in Python, to validate and serialize data classes, offering support for custom validators and Python's type hints for field validation. 🌀 30+ Data Visualization Libraries, Frameworks and Apps, Mastering Data Presentation: Explore over 30 data visualization tools like Metabase, Gephi, and Grafana, offering a range of features to transform raw data into meaningful visualizations for better decision-making in industries like tech, healthcare, finance, and marketing. 
🌀 Mastering Data Visualization in R for Bioinformatics:  The article delves into data visualization in R for bioinformatics, stressing its role in understanding complex biological data, communicating findings, hypothesis generation, and decision-making. It also discusses Anscombe's Quartet, highlighting the importance of visualizing data before analysis and the limitations of summary statistics. 🌀 Integrating JavaScript charting libraries with Handsontable: The article guides developers on integrating Highcharts, Recharts, and Chart.js with Handsontable for data visualization. It explains the features of each library and provides demos for creating a stock portfolio with interactive charts. 🌀 Data Visualization with Seaborn Line Plot: The article introduces Seaborn, a Python library for data visualization, built on top of Matplotlib. It covers installation and demonstrates creating single line plots and customizing styles for better presentation of data. 🌀 MetaGPT’s Data Interpreter: SOTA Open Source LLM-based Data Solutions. MetaGPT introduces its Data Interpreter, a new agent for streamlined data interpretation and analysis. The Data Interpreter employs advanced techniques for real-time data adaptability, tool integration, and logical inconsistency identification, showcasing superior performance in machine learning tasks. ⚡Stay Informed with Industry HighlightsPower BI 🌀 Power BI March 2024 Feature Summary: The Power BI update introduces visual calculation editing, data model editing in the Power BI Service, and report subscription delivery to OneDrive SharePoint. A new Microsoft Fabric certification exam, DP-600, is also available, with free certification opportunities through the Fabric AI Skills Challenge. 🌀 Announcing the Public Preview of Database Mirroring in Microsoft Fabric: Mirroring, now in Public Preview, allows seamless integration of databases into Microsoft Fabric's OneLake, providing real-time insights without ETL. It simplifies data replication and warehousing, enabling easy data access and analysis across different sources, including data lakes and warehouses. 🌀 Get data with Power Query available in Power BI Report Builder (Preview): Power BI Report Builder now allows connecting to 100+ data sources like Snowflake, Databricks, and AWS Redshift. You can transform data using M-Query for paginated reports. Install the latest version and connect from the "Data" tab. Microsoft Fabric🌀 Microsoft Fabric March 2024 Update: This update brings new features like OneLake File Explorer, Autotune Query Tuning, and Test Framework for Power Query SDK in VS Code to Power BI, enhancing reporting, modeling, service, mobile, and developer experiences. 🌀 Data Factory Adds CI/CD to Fabric Data Pipelines: Fabric engineers with Azure Synapse Analytics and Azure Data Factory experience can now utilize Git integration and built-in Deployment Pipelines in Data Factory data pipelines in Fabric. This public preview offers source control, CI/CD features, and collaborative development environments, enhancing data analytics projects. 🌀 Microsoft Fabric Lifecycle Management – Getting started with Git Integration and Deployment Pipelines: Microsoft Fabric makes Lifecycle Management easy, enabling continuous releases through Git and Deployment Pipelines. Git allows reliable updates for supported items like Lakehouse, Notebooks, and Reports, while Deployment Pipelines clone content between stages like DEV, TEST, UAT, and PROD. 
AWS BI  🌀 Announcing the AWS Well-Architected Data Analytics Lens: The Data Analytics Lens helps assess and improve analytics platforms on AWS. It offers best practices, such as building ACID-compliant data lakes and leveraging Serverless for data pipelines, aligned with the AWS Well-Architected Framework's pillars for secure, efficient, and cost-effective solutions. 🌀 Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics. The post discusses how healthcare providers can improve patient care by leveraging AWS services for real-time analytics and personalized healthcare, focusing on a zero-ETL approach to data integration.Google Cloud Dat🌀 Enrich streaming data in Bigtable with Dataflow: The post discusses the importance of event stream processing in data engineering and introduces Apache Beam's Enrichment transform, which simplifies the process of enriching streaming data with Bigtable, improving data context and enabling more meaningful analysis.🌀 Dataflow at-least-once vs. exactly-once streaming modes: The post compares exactly-once and at-least-once processing modes in Dataflow Streaming Engine for streaming jobs. It explains the trade-offs between the two modes and provides guidance on choosing the right mode based on use case requirements. Tableau🌀 Data is both art and science - My Tableau Story: Andy Cotgreave. The post highlights Andy Cotgreave's journey from a data analyst at Oxford to becoming a Senior Technical Evangelist at Tableau. It emphasizes the importance of community engagement, innovation, building a portfolio, and having fun in data visualization. 🌀 Student to BI Analyst, How Tableau Can Lead to a Successful Data Career: This blog discusses Karolina Grodzinska's data visualization journey, from discovering Tableau to winning Iron Viz: Student Edition and becoming a Business Intelligence Analyst at Schneider Electric. Karolina emphasizes the importance of an active Tableau Public profile in career development and shares tips for building a strong portfolio and networking with the Tableau Community. ✨ Expert Insights from Packt CommunityPython Deep Learning - Third Edition - By Ivan VasilevDeveloping NN models for edge devices with TF Lite TF Lite is a TF-derived set of tools that allows us to run models on mobile, embedded, and edge devices. Its versatility is part of TF’s appeal for industrial applications (as opposed to research applications, where PyTorch dominates).The key paradigm of TF Lite is that the models run on-device, contrary to client-server architecture, where the model is deployed on remote, more powerful, hardware. This organization has the following implications (both good and bad): Low-latency execution: The lack of server-round trip significantly reduces the model inference time and allows us to run real-time applications. Privacy: The user data never leaves the device. Internet connectivity: Internet connectivity is not required. Small model size: The devices have limited computational ability, hence the need for small and computationally efficient models. More specifically, TF Lite models are stored in the FlatBuffers (https://flatbuffers.dev/) special efficient portable format, identified by the .tflite file extension. Besides its small size, it allows us to access data directly without parsing/unpacking it first. TF Lite models support a subset of the TF Core operations and allow us to define custom ones: Low power consumption: The devices often run on battery. 
Divergent training and inference: NN training is a lot more computationally intensive compared to inference. Because of this, the model training runs on a different, more powerful, piece of hardware than the actual devices, where the models will run inference. In addition, TF Lite has the following key features: Multi-platform and multi-language support, including Android (Java), iOS (Objective-C and Swift) devices, web (JavaScript), and Python for all other environments. Google provides a TF Lite wrapper API called MediaPipe Solutions (https://developers.google.com/mediapipe, https://github.com/google/mediapipe/), which supersedes the previous TF Lite API. Optimized for performance. It has end-to-end solution pipelines. TF Lite is oriented toward practical applications, rather than research. Because of this, it includes different pipelines for common ML tasks such as image classification, object detection, text classification, and question answering among others. The computer vision pipelines use modified versions of EfficientNet or MobileNet, and the natural language processing pipelines use BERT-based models. So, how does TF Lite model development work? First, we’ll select a model in one of the following ways:  An existing pre-trained .tflite model (https://tfhub.dev/s?deployment-format=lite). Use MediaPipe Model Maker (https://developers.google.com/mediapipe/solutions/model_maker) to apply feature engineering transfer learning on an existing .tflite model with a custom training dataset. Model Maker only works with Python. Convert a full-fledged TF model into .tflite format. Discover more insights from 'Python Deep Learning - Third Edition' by Ivan Vasilev. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today! Read Here💡 What's the Latest Scoop from the BI Community? 🌀 Real-Time Twitch Chat Sentiment Analysis with Apache Flink: This blog explores building a real-time sentiment analysis application for Twitch chat using Apache Flink. It covers setting up the project, reading Twitch chat messages, performing sentiment analysis, and concludes with a demo. 🌀 Entity Type Prediction with Relational Graph Convolutional Network (PyTorch): This post discusses a Python setup for predicting entity types on heterogeneous graphs using the Relational Graph Convolutional Network (R-GCN) and the RGCNConv module from PyTorch. It explains knowledge graphs, entity type prediction, and the R-GCN model. 🌀 Data Quality Error Detection powered by LLMs: This article explores automating the identification of data errors in tabular datasets using Large Language Models (LLMs). It discusses the Data Dirtiness Score, challenges in data cleaning, and the potential of LLMs in detecting data quality issues. 🌀 Building Ethical AI Starts with the Data Team — Here’s Why: This article discusses the ethical considerations of AI, focusing on model bias, AI usage, and data responsibility. It emphasizes the role of data teams in ensuring ethical AI and suggests steps for data teams to take towards a more ethical future. See you next time!
Databricks' DBRX, Stability AI's Stable Code Instruct 3B, SambaNova's Samba CoE v0.2, FrugalGPT, Advanced RAG Patterns on Amazon SageMaker

Merlyn Shelley
02 Apr 2024
10 min read
Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Welcome to DataPro#87 – Your Gateway to the Cutting-Edge of Data Science & Machine Learning! 🚀 Dive into this edition to explore: ⚙️ LLMs & GPTs Unleashed Samba CoE v0.2: SambaNova's Speedy AI Models Efficient Training of Language Models with OpenAI AI21's Revolutionary SSM-Transformer Model: Jamba Databricks' DBRX: The New Open LLM Benchmark Stable Code Instruct 3B: Stability AI's Latest Offering HyperLLaVA: Boosting Multimodal Language Models ✨ What's Fresh & Exciting FrugalGPT: Cutting LLM Operating Costs Building a Reliable AI Agent from Scratch with OpenAI Tool Calling Fine-Tuning Instruct Models over Raw Text Data Crafting an OpenAI-Compatible API ⚡ Industry Pulse:  Deciphering Advanced RAG Patterns on Amazon SageMaker Unveil the Future with AutoBNN: Mastering Probabilistic Time Series Forecasting! Engaging with Microsoft Copilot (web): Learning from Interaction 📚 Packt's Latest Gem "Principles of Data Science - Third Edition" by Sinan Ozdemir DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ Zejun-Yang/AniPortrait: AniPortrait is a new framework for creating high-quality animations using audio input and a reference portrait image, with face reenactment capabilities.🛠️ agiresearch/AIOS: AIOS embeds large language models into operating systems, enabling smarter resource allocation, context switching, and concurrent agent execution, advancing AGI. 🛠️ lichao-sun/Mora: Mora is a multi-agent framework for video generation, enhancing OpenAI's Sora capabilities through collaborative visual agents for diverse tasks. 🛠️ jasonppy/VoiceCraft: VoiceCraft is a high-performing neural codec language model for speech editing and zero-shot text-to-speech, excelling with diverse real-world data. 🛠️ dvlab-research/MiniGemini: Mini-Gemini enhances LLMs (Large Language Models) from 2B to 34B, integrating image understanding, reasoning, and generation, inspired by LLaVA. 🛠️ Picsart-AI-Research/StreamingT2V: StreamingT2V is a technique for creating long videos with rich motion dynamics, ensuring temporal consistency and high image quality. 📚 Expert Insights from Packt Community"Principles of Data Science - Third Edition" by Sinan Ozdemir. The Five Steps of Data Science A question I’ve gotten at least once a month for the past decade is What’s the difference between data science and data analytics? One could argue that there is no difference between the two; others will argue that there are hundreds of differences! I believe that, regardless of how many differences there are between the two terms, the following applies: Data science follows a structured, step-by-step process that, when followed, preserves the integrity of the results and leads to a deeper understanding of the data and the environment the data comes from. As with any other scientific endeavor, this process must be adhered to, or else the analysis and the results are in danger of scrutiny. 
On a simpler level, following a strict process can make it much easier for any data scientist, hobbyist, or professional to obtain results faster than if they were exploring data with no clear vision. While these steps are a guiding lesson for amateur analysts, they also provide the foundation for all data scientists, even those in the highest levels of business and academia. Every data scientist recognizes the value of these steps and follows them in some way or another. Overview of the five steps The process of data science involves a series of steps that are essential for effectively extracting insights and knowledge from data. These steps are presented as follows: Asking an interesting question: The first step in any data science project is to identify a question or challenge that you want to address with your analysis. This involves finding a topic that is relevant, important, and that can be addressed with data. Obtaining the data: Once you have identified your question, the next step is to collect the data that you will need to answer it. This can involve sourcing data from a variety of sources, such as databases, online platforms, or through data scraping or data collection methods. Exploring the data: After you have collected your data, the next step is to explore it and get a better understanding of its characteristics and patterns. This might involve examining summary statistics, visualizing the data, or applying statistical or machine learning (ML) techniques to identify trends or relationships. Modeling the data: Once you have explored your data, the next step is to build models that can be used to make predictions or inform decision-making. This might involve applying ML algorithms, building statistical models, or using other techniques to find patterns in the data. Communicating and visualizing the results: Finally, it’s important to communicate your findings to others in a clear and effective way. This might involve creating reports, presentations, or visualizations that help to explain your results and their implications. By following these five essential steps, you can effectively use data science to solve real-world problems and extract valuable insights from data. It’s important to note that different data scientists may have different approaches to the data science process, and the steps outlined previously are just one way of organizing the process. Some data scientists might group the steps differently or include additional steps such as feature engineering or model evaluation. Despite these differences, most data scientists agree that the steps listed previously are essential to the data science process. Whether they are organized in this specific way or not, these steps are all crucial for effectively using data to solve problems and extract valuable insights. Let’s dive into these steps one by one.Discover more insights from "Principles of Data Science - Third Edition" by Sinan Ozdemir. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!    Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! AWS ML Made Easy 🌀 Advanced RAG patterns on Amazon SageMaker: This post discusses how customers across various industries are utilizing large language models (LLMs) like Mixtral-8x7B Instruct to build generative AI applications such as QnA chatbots and search engines. 
It highlights the challenges and solutions in improving the accuracy and performance of these applications, focusing on Retrieval Augmented Generation (RAG) patterns implemented with LangChain.Google Research 🌀 AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks. This research introduces AutoBNN, an open-source package for automated, interpretable time series forecasting using Bayesian neural networks (BNNs). It addresses limitations of traditional methods like Gaussian processes (GPs) and Structural Time Series by combining the interpretability of GPs with the scalability and flexibility of neural networks. AutoBNN automates model discovery, provides high-quality uncertainty estimates, and scales effectively for large datasets. Microsoft Research🌀 Learning from interaction with Microsoft Copilot (web): This research focuses on how AI systems like Bing and Microsoft Copilot learn and improve from user interactions, particularly through reinforcement learning from human feedback (RLHF). It also explores how Bing has evolved its search capabilities and how Copilot is changing user interactions to be more conversational and workflow oriented. The research introduces frameworks like TnT-LLM and SPUR to improve taxonomy generation and user satisfaction estimation in AI interactions. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🌀 Samba CoE v0.2 from SambaNova delivers accurate AI models at blazing speeds: This blog post highlights Samba's advancements in AI architecture, specifically focusing on the introduction of Samba-1, a CoE architecture for enterprise AI. It discusses the features and benefits of Samba-1, its performance benchmarks, and plans for future releases, emphasizing the role of RDUs in driving efficiency and speed in AI models. 🌀 OpenAI’s Efficient Training of Language Models to Fill in the Middle: OpenAI demonstrates that autoregressive language models can effectively learn to infill text by moving a span of text from the middle of a document to its end, without harming generative capability. They propose training models with this method by default and provide benchmarks and best practices. 🌀 Jamba: AI21's Groundbreaking SSM-Transformer Model. Jamba is a groundbreaking model that merges Mamba SSM with Transformer elements, offering a 256K context window and outperforming similar models. Released under Apache 2.0, it will be available in the NVIDIA API catalog. Jamba optimizes memory, throughput, and performance, delivering remarkable efficiency. 🌀 Databricks’ DBRX: A New State-of-the-Art Open LLM. Databricks introduces DBRX, an open LLM setting new benchmarks in language understanding, programming, and math. With a 256K context window, it outperforms GPT-3.5 and competes with Gemini 1.0 Pro. DBRX is 40% smaller than Grok-1, offering 2x faster inference than LLaMA2-70B. 🌀 Introducing Stable Code Instruct 3B — Stability AI: Stable Code Instruct 3B, built on Stable Code 3B, offers state-of-the-art performance in code completion and natural language interactions for programming tasks. It outperforms Codellama 7B Instruct and matches StarChat 15B, with a focus on popular languages like Python and Java. Available for commercial use with a Stability AI Membership, the model is accessible on Hugging Face. 🌀 HyperLLaVA: Enhancing Multimodal Language Models with Dynamic Visual and Language Experts. 
This blog explores the advancements in Multimodal Large Language Models (MLLMs) and introduces HyperLLaVA, a dynamic model that improves performance by adaptively tuning parameters for handling diverse multimodal tasks, surpassing existing benchmarks and opening new avenues for multimodal learning systems. ✨ On the Radar: Catch Up on What's Fresh🌀 FrugalGPT and Reducing LLM Operating Costs: The blog discusses the high cost of running Large Language Models (LLMs) and introduces the "FrugalGPT" framework, which reduces operating costs significantly while maintaining quality. It explains how different models cost different amounts and proposes using a cascade of LLMs to minimize costs while maximizing answer quality. 🌀 Leverage OpenAI Tool calling: Building a reliable AI Agent from Scratch. The blog discusses the future role of AI in everyday tasks, focusing on text creation, correction, and brainstorming. It highlights the importance of Retrieval-Augmented Generation (RAG) pipelines and aims to provide Large Language Models with better context to generate more valuable content. 🌀 Fine-tune an Instruct model over raw text data: The blog explores the challenges of integrating modern chatbots with large datasets, focusing on context window sizes and the use of Retrieval-Augmented Generation (RAG) techniques. It proposes a lighter approach to fine-tuning chatbots on smaller datasets, aiming to bridge the gap between the constraints of a 128K context window and the complexities of models fine-tuned on billions of tokens. The experiment involves fine-tuning a model on The Guardian's dataset and aims to provide reproducible instructions for cost-effective model training using accessible hardware. 🌀 How to build an OpenAI-compatible API: The blog discusses the dominance of OpenAI in the Gen AI market, and the reasons developers might choose alternative LLM providers. It explores implementing a Python FastAPI server compatible with the OpenAI API specs to wrap any LLM, aiming for flexibility and cost-effectiveness. See you next time!
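As a postscript to the last item above: the post describes wrapping an arbitrary LLM behind the OpenAI API shape. The minimal FastAPI sketch below illustrates the general idea only; it is not the post's code, and the call_my_llm helper, model handling, and response fields are simplified placeholders.

# Sketch of an OpenAI-style /v1/chat/completions endpoint (assumes fastapi,
# pydantic, and uvicorn are installed; run with: uvicorn server:app --reload).
import time
from typing import List
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: List[Message]
    temperature: float = 1.0

def call_my_llm(prompt: str) -> str:
    # Placeholder: swap in a call to whichever model you are wrapping.
    return f"Echo: {prompt}"

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    prompt = "\n".join(m.content for m in req.messages)
    answer = call_my_llm(prompt)
    # Shape the reply like an OpenAI chat completion object so existing
    # OpenAI client code can talk to this server unchanged.
    return {
        "id": "chatcmpl-demo",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": answer},
            "finish_reason": "stop",
        }],
    }

Pointing an OpenAI-compatible client at this server is then just a matter of changing its base URL to the server's address.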
Revolutionising Work and Everyday Life with ChatGPT

M.T. White
01 Apr 2024
14 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

ChatGPT is a revolutionary new technology that is making a large impact on society. The full impact of ChatGPT cannot be known at the time of writing because the technology is so new. What can be said, however, is that since its introduction many industries have been trying to leverage it to increase productivity, and everyday people are trying to learn to leverage it as well. Overall, ChatGPT and similar systems are still very new, and how best to leverage them will take more time to become clear. This article explores how ChatGPT can be used in everyday life through a few use cases.

What is ChatGPT?

Before we begin, it is important to understand what ChatGPT is and what it isn't. In lay terms, ChatGPT is a super-advanced chatbot. More precisely, ChatGPT is a generative AI that uses Natural Language Processing (NLP) to create a dialog between a user and itself. ChatGPT and similar systems are what are known as Large Language Models (LLMs). In short, for AI models to work they have to be trained on data. To train LLMs, engineers use vast amounts of text such as books, articles, journals, and so on. The result is a system like ChatGPT that has a broad knowledge base spanning many different subjects. Before we can explore how to use ChatGPT in everyday life, we need to look at how NOT to use it.

How not to use ChatGPT?

ChatGPT is very powerful and can be used for many different things; however, it is important to understand that ChatGPT is neither a sage nor infallible. Remember, ChatGPT only knows what it was trained on. If the information it was taught was wrong or outdated, its responses will be too. This means that, as of writing this article, ChatGPT cannot and should not be used as a replacement for humans. Even with the answers ChatGPT gives, a decent level of domain knowledge is still required to properly format and use its responses. As such, it is important to take its suggestions with a certain amount of skepticism. The best way to think of ChatGPT is as an extremely smart friend with a wide range of knowledge: though that friend may be very knowledgeable, they can and will be wrong from time to time. With that, how does one use ChatGPT?

How to use ChatGPT for everyday life?

Using ChatGPT starts with what is called a prompt. A prompt can be thought of as a query that the user feeds into ChatGPT, a bit like a Google search. It is a statement that asks ChatGPT a question or to perform a certain task. Since ChatGPT uses NLP, the prompt does not have to be formatted in a certain way, nor does it require a specific syntax. In other words, a prompt can be entered the same way a user would pose the instruction to another person. However, it is good to provide ChatGPT with enough information to adequately describe the problem. An example of a prompt is as follows:

How can ChatGPT be used for everyday life?

Once the prompt is run, ChatGPT will produce text with suggestions, answers, program code, or whatever else was asked for. The output produced is called a response, and it contains the answer to the request or question.

There are a lot of ways ChatGPT can be used in everyday life. In terms of professional life, ChatGPT can make an excellent assistant.
For most "text-based" jobs, that is, jobs that don't require design work such as drawing diagrams, ChatGPT can be an excellent assistant. For example, suppose a programmer needs a small piece of software for a project; assume they need a SQL query that can pull all the rows between two dates. The user could use a prompt like the following:

Write a SQL query that will pull all rows between January 3 and March 31.

This prompt will produce the following response:

To pull all rows between January 3 and March 31 from a table, you'll need to use the WHERE clause to filter the results based on the date column. Assuming you have a date column named "date_column" in your table, here's the SQL query:

SELECT *
FROM your_table
WHERE date_column >= '2023-01-03'
  AND date_column <= '2023-03-31';

Replace "your_table" with the actual name of your table, and "date_column" with the name of the column that contains the dates you want to filter. The query will select all rows where the "date_column" is on or after January 3, 2023, and on or before March 31, 2023.

For those who are not familiar with SQL, the prompt produced a simple query with generic placeholders that will pull the correct data out of a database.

Not everyone is a programmer, so how could a non-programmer use ChatGPT as an assistant? One area where ChatGPT can help is composing emails. Suppose a salesperson needs to inform Tom, their boss, via email that a $1 million deal was closed today. They could use a prompt like the following:

Write Tom an email saying that the deal with Company X went through, and we made $1 million.

The response would be something along the lines of:

Subject: Successful Deal with Company X - $1 Million Revenue Achieved

Hi Tom,

I hope this email finds you well. I am thrilled to share some exciting news with you! The deal with Company X has successfully gone through, and I am delighted to inform you that we have made an astounding $1 million in revenue from this collaboration. This achievement wouldn't have been possible without the hard work and dedication of our entire team. It reflects the trust and confidence our partners have in our capabilities, and I believe it marks a significant milestone for our company's growth and success. I want to express my gratitude to you for your valuable contributions and expertise that played a crucial role in making this deal a reality. Your efforts have been commendable, and I am truly grateful for your commitment to the success of this venture. Let's take a moment to celebrate this triumph together. Your exceptional skills and commitment to excellence have made this accomplishment possible, and I am genuinely proud to have you as part of our team. Looking forward to even more prosperous ventures in the future. Once again, congratulations to you and the entire team!

Best regards,
[Your Name]
[Your Title/Position]
[Company Name]
[Email]
[Phone Number]

In this case, all the user would have to do is proofread the email, copy it, and send it. For applications such as this, proofreading is very important. As stated before, ChatGPT is neither a mind reader nor a sage, so it may produce emails with wrong information, illogical wording, or other mistakes. It pays to read the message before sending it.

Learning Tool

A very powerful use of ChatGPT is as a learning tool. For example, students can use it to help master a new subject, generate practice problems, and even check answers to an extent.
Assume the user needs a few extra practice problems for their English homework, they could enter a prompt like the following:Generate some grammar practice problems for a 1st year English classWhich would yield the following output:Of course! Here are some grammar practice problems suitable for a 1st-year English class: 1. Subject-Verb Agreement: Choose the correct verb form that agrees with the subject in each sentence: a) The cat (purr/purrs) when it is happy. b) My friend and I (is/are) going to the movies tonight. c) Neither the students nor the teacher (understands/understand) the assignment. 2. Pronoun Usage: Select the appropriate pronoun to complete each sentence: a) She invited Sarah and (I/me) to the party. b) My sister and (her/she) are going shopping. c) The book is for (he/him) and his brother. 3. Articles (a/an/the): Insert the correct article (a, an, the, or no article) in each sentence: a) Can you pass me (a/an) apple, please? b) I need to buy (the) new shoes for the dance. c) My friend has (no article) car. 4. Prepositions: Choose the correct preposition to complete each sentence: a) We arrived (in/on/at) the airport fifteen minutes early. b) The cat is sleeping (under/below/above) the table. c) She's going to the store (by/with/for) her mom. 5. Verb Tenses: Identify the correct verb tense used in each sentence: a) They (will be/was) studying for the exam all night. b) By the time we arrived, the party (had started/has started). c) I (am/was) watching a movie right now. 6. Adjective vs. Adverb: Decide whether the word in parentheses should be an adjective or adverb in the sentence: a) She sings (beautiful/beautifully). b) The (happy/happily) children played in the park. c) The (quick/quickly) fox jumped over the fence. 7. Negation: Add the necessary negative word to each sentence: a) I have (no/not) seen him today. b) She does (not/none) like coffee. c) We will (not/none) be attending the party. 8. Passive Voice: Rewrite the following sentences in the passive voice: a) The chef cooked the meal. b) The teacher is explaining the lesson. c) The storm destroyed the old building. 9. Comparative and Superlative Forms: Complete each sentence with the correct comparative or superlative form of the adjective: a) This book is (interesting/more interesting/most interesting) than the last one. b) She is (tall/taller/tallest) than her sister. c) It was the (big/biggest/bigger) cake at the party. 10. Reported Speech: Rewrite the following sentences using reported speech: a) Mary said, "I am going to the store." b) "I will call you later," said John. c) "They have arrived," she said. These grammar practice problems aim to reinforce essential grammar rules and concepts, helping 1st-year English students develop their language skills and proficiency.As can be seen, ChatGPT generated a number of questions that one would usually find in an English 101 class.  Now, if the user wanted a more specific answer that could reword the prompt to accommodate. ChatGPT is also a great teaching tool as well.  Teachers can also use this tool to generate lesson plans, write intros for learning modules, generate problems, and more.  Overall, ChatGPT can be used as an excellent tool for academia in general.Now, it is important to remember that ChatGPT is not all-knowing.  It can only generate responses based on the data that it was trained on.  
This means that if the user asks questions about a subject where there is little or no publicly available data, the response probably isn't going to be of the highest quality. For example, if someone asked:

How do I build a working time machine?

They're going to get the following:

Building a working time machine, as depicted in science fiction, is currently beyond our technological capabilities and goes against the known laws of physics. Time travel, as popularly imagined, involves traveling backward or forward in time, similar to how we move through space. As of my last update in September 2021, time travel remains a concept explored mainly in theoretical physics, and no practical time machine has been built or proven to be possible. Several theories, such as the theory of relativity, suggest that time travel could be theoretically feasible under certain conditions, but these ideas are still purely theoretical and have not been put into practice. It's important to remember that building a time machine would present enormous scientific, technological, and ethical challenges. It could raise paradoxes, such as the "grandfather paradox," where a time traveler could potentially alter past events and create inconsistencies in history. As of now, time travel remains a fascinating topic in science fiction and theoretical physics, but it is not something that can be practically accomplished with our current understanding and technology. The scientific community continues to explore the nature of time and the universe, but building a working time machine as portrayed in movies and books is not yet a reality.

ChatGPT gave some general information on time travel, but unfortunately it didn't generate any schematics. Again, this goes back to ChatGPT not being a sage: it only knows what it was "taught". Since there is no data on how to build a working time machine, ChatGPT could not fully answer the prompt.

Now, if one looks at the English practice prompt above, one will notice that the practice questions were quite broad. It is common to have to "dig" with ChatGPT; in other words, a person may have to refine their queries to get what they need. To get some practice, try re-wording the prompt to generate practice questions covering only the passive voice.

Summary

ChatGPT is a tool, and like any other tool, what it's used for is up to the user. As was seen in this article, ChatGPT is an excellent tool for helping a person through their day by generating software, emails, and so on. It can also be a great learning or teaching device, helping students and teachers generate practice problems, create lesson plans, and much more. However, as has been stated numerous times, unless ChatGPT has been trained on something, it does not know about it. This means that asking about things like how to build a time machine, or about niche domain-specific concepts, isn't going to return quality responses. Even when ChatGPT has been trained on the subject of the prompt, it may not always generate a quality response. No matter the use case, the response should be vetted for accuracy. This may mean doing a little extra research, testing the output, or whatever else is needed to verify the response. Overall, ChatGPT at the time of writing this article is less than a year old, which means the full implications of using it are not yet understood, nor is how to fully leverage it. What can be said is that ChatGPT and similar LLM systems will probably be the next Google. In terms of everyday use, the only true inhibitors are the user's imagination and the data that was used to train ChatGPT.
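For readers who want to script the kind of prompts discussed above rather than typing them into the chat interface, a minimal sketch using the openai Python package is shown below. The model name is illustrative, an OPENAI_API_KEY environment variable is assumed, and the generated SQL still needs the same proofreading the article recommends.

from openai import OpenAI

# Assumes the openai package (v1+) is installed and OPENAI_API_KEY is set.
client = OpenAI()

prompt = "Write a SQL query that will pull all rows between January 3 and March 31."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name; any chat-capable model works
    messages=[{"role": "user", "content": prompt}],
)

# Print the generated SQL; vet it before running against a real database.
print(response.choices[0].message.content)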
Author Bio

M.T. White has been programming since the age of 12. His fascination with robotics flourished when, as a child, he programmed microcontrollers such as Arduino. M.T. holds an undergraduate degree in mathematics and a master's degree in software engineering, and is currently working on an MBA in IT project management. He works as a software developer for a major US defense contractor and is an adjunct CIS instructor at ECPI University. His background mostly stems from the automation industry, where he programmed PLCs and HMIs for many different types of applications. M.T. has programmed many different brands of PLCs over the years and has developed HMIs using many different tools.

Author of the book: Mastering PLC Programming
Elevate Your BI Dashboards with Figma

Merlyn Shelley
28 Mar 2024
12 min read
Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!Partnering with Figma Want to take your BI dashboards to the next level? Figma is the way to go!  It's all about ramping up the design, making things work better, and giving your Power BI projects a real boost.  With Figma, you'll speed up your projects, get more creative, and see better performance. So, why not give your reports a makeover with Figma? It's where design and data come together to make a big impact! Here's what Figma offers: ✅ Figma Professional: An all-in-one tool for seamless team collaboration. ✅ FigJam: Enables real-time teamwork and brainstorming. ✅ FigJam AI: Integrates ChatGPT for smarter collaboration. Guess what? You also have the Power BI UI Kit from the Figma Community! Sign Up Now! 👋 Hello,Welcome to BI-Pro #48, your ultimate guide to data and BI insights! 🚀In this issue: 🔮 Python Data Viz Matplotlib Data Visualization Seaborn: Visualizing Data in Python Use pandas for CSV Data Visualization Guides on SQL, Python, Data Cleaning, and Analysis Build An AI App with Python in 10 Steps ⚡ Industry Highlights Power BI Hybrid Workforce Experience Report Lakeview Dashboards Overview Grouping and Binning in Power BI Desktop Dashboards in Operations Manager Microsoft Fabric Analyze Dataverse Tables Bridging Fabric Lakehouses AWS Big Data Multicloud Analytics with Amazon Athena Analyze Fastly CDN Logs with QuickSight Google Cloud Data Spark Procedures in BigQuery  Gemini Pro 1.0 in BigQuery via Vertex AI ✨ Expert Insights from Packt Community Unlocking the Secrets of Prompt Engineering 💡 BI Community Scoop Creating Interactive Power BI Dashboards Using Report Templates in Power BI Desktop 10 Analytics Dashboard Examples for SaaS Future of Data Storytelling: Actionable Intelligence Power BI: Transforming Banking Data Power BI vs Tableau vs Qlik Sense | 2024 Winner Get ready to supercharge your skills with BI-Pro! 🌟 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.We appreciate your input and hope you enjoy the book!Share your thoughts and opinions here! Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🚀 GitHub's Most Sought-After Repos🌀 sdv-dev/SDV: The Synthetic Data Vault (SDV) is a Python library that creates tabular synthetic data by learning patterns from real data using machine learning algorithms. 🌀 hyperspy/hyperspy: HyperSpy is a Python library for analyzing multidimensional datasets, making it easy to apply analytical procedures and access tools. 🌀 hi-primus/optimus: Optimus is a Python library for loading, processing, plotting, and creating ML models that works with pandas, Dask, cuDF, dask-cuDF, Vaex, or Spark. It simplifies data processing and offers various functions for data quality, plotting, and cross-platform compatibility. 🌀 mingrammer/diagrams: Diagrams simplifies cloud system architecture design in Python, supporting major providers and tracking changes in version control. 🌀 kayak/pypika: PyPika simplifies building SQL queries in Python with a flexible, easy-to-use interface, leveraging the builder design pattern for clean, efficient queries. 
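As a quick illustration of the last repo in the list above, here is a minimal PyPika sketch; the table and column names are invented for the example.

from pypika import Query, Table

orders = Table("orders")  # hypothetical table

q = (
    Query.from_(orders)
    .select(orders.id, orders.total)
    .where(orders.total > 100)
    .orderby(orders.total)
)

# PyPika builds the SQL string without touching a database, roughly:
# SELECT "id","total" FROM "orders" WHERE "total">100 ORDER BY "total"
print(q)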
Email Forwarded? Join BI-Pro Here!Partnering with Webflow   Transform your BI reporting with Webflow Enterprise.  Create visually stunning, scalable websites without coding, using a visual canvas.Seamlessly integrate with popular BI platforms and let Webflow handle the code.Start building smarter, faster, and more reliable websites for your data-driven decisions today! Get Started for Free! 🔮 Data Viz with Python Libraries  🌀 Matplotlib Data Visualization in Python: This blog introduces Matplotlib, a Python library for 2D visualizations, covering its capabilities and plot types like line, scatter, bar, histograms, and pie charts. It highlights Matplotlib's versatility, customization, and integration with other libraries, making it essential for data science and research. 🌀 Visualizing Data in Python With Seaborn:  This article introduces the seaborn library for statistical visualizations in Python. It covers creating various plots, such as bar, distribution, and relational plots, using seaborn's functional and objects interfaces. It emphasizes seaborn's clear and concise code for effective data visualization. 🌀 Use pandas to Visualize CSV Data in Python: This blog discusses using the CData Python Connector for CSV with pandas, Matplotlib, and SQLAlchemy to analyze and visualize live CSV data in Python. It highlights the ease of integration and superior performance of the connector, along with step-by-step instructions for connecting to CSV data, executing SQL queries, and visualizing the results in Python. 🌀 Collection of Guides on Mastering SQL, Python, Data Cleaning, Data Wrangling, and Exploratory Data Analysis: This guide is tailored for business intelligence professionals new to data science, offering step-by-step instructions on mastering SQL, Python, data cleaning, wrangling, and exploratory analysis. It emphasizes practical skills for extracting insights and showcases essential tools and techniques for effective data analysis. 🌀 Build An AI Application with Python in 10 Easy Steps: This blog outlines a 10-step guide to building and deploying AI applications with Python, covering objectives, data collection, model selection, training, evaluation, optimization, web app development, cloud deployment, and sharing the AI model, with practical advice for each step. ⚡Stay Informed with Industry HighlightsPower BI 🌀 Hybrid Workforce Experience Power BI report: This tutorial explains using the Power BI Hybrid Workforce Experience report to analyze the impact of hybrid work models on employees working onsite, remotely, or in a hybrid manner. It covers setup, key metrics analysis, and improving employee experience, with prerequisites outlined. 🌀 What are Lakeview dashboards? This article discusses Lakeview dashboards, designed for creating and sharing data visualizations within teams. It highlights their advanced features, comparison with Databricks SQL dashboards, and dataset optimizations for better performance, including handling various dataset sizes and query efficiency. 🌀 Use grouping and binning in Power BI Desktop: This article explains how to use grouping and binning in Power BI Desktop to refine data visualization. Grouping allows you to combine data points into larger categories for clearer analysis, while binning lets you define the size of data chunks for more meaningful visualization. The article provides step-by-step instructions for creating, editing, and applying groups and bins to numerical and time fields, enhancing the exploration of data and trends in visuals. 
🌀 Dashboards in Operations Manager: This article covers dashboard templates and widgets in Operations Manager, outlining their layouts and functions. It highlights various dashboard types, such as Service Level, Summary, and Object State, each with specific widgets. Users can create, share, and view dashboards across different consoles. Microsoft Fabric🌀 Analyze Dataverse tables from Microsoft Fabric: The article announces new features for Dynamics 365 and Power Apps customers, allowing easy integration of insights into Fabric. Users can now create shortcuts to Dataverse environments in Fabric for quick data access and analysis across multiple environments, enhancing business insights. 🌀 Bridging Fabric Lakehouses: Delta Change Data Feed for Seamless ETL. This article explains using Delta Tables and the Delta Change Data Feed in Microsoft Fabric for efficient data synchronization across lakehouses. It highlights Delta Tables' features and demonstrates updating tables across Silver and Gold Lakehouses in a medallion architecture. AWS BI  🌀 Multicloud data lake analytics with Amazon Athena: This post discusses creating a unified query interface using Amazon Athena connectors to seamlessly query across multiple cloud data stores, simplifying analytics in organizations with data spread over different clouds. It also explores managing analytics costs using Athena workgroups and cost allocation tags. 🌀 How to Analyze Fastly Content Delivery Network Logs with Amazon QuickSight Powered by Generative BI? This post discusses using Fastly, a content delivery network (CDN), to enhance web performance and security. It highlights creating a dashboard with Amazon QuickSight for analyzing CDN logs, using AWS services like S3 and Glue for data storage and cataloging. Google Cloud Data 🌀 Apache Spark stored procedures in BigQuery are GA: BigQuery now supports Apache Spark stored procedures, enabling users to integrate Spark-based data processing with BigQuery's SQL capabilities. This simplifies using Spark within BigQuery, allowing seamless development, testing, and deployment of PySpark code, and installation of necessary packages in a unified environment. 🌀 Gemini Pro 1.0 available in BigQuery through Vertex AI: This post advocates for a unified platform to bridge data and AI teams, ensuring smooth workflows from data ingestion to ML training. It introduces BigQuery ML, enabling ML model creation, training, and execution in BigQuery using SQL. It supports various models, including Vertex AI-trained ones like PaLM 2 and Gemini Pro 1.0, and enables sharing trained models, promoting governed data usage and easy dataset discovery. Gemini Pro 1.0 integration into BigQuery via Vertex AI simplifies generative AI, enhancing collaboration, security, and governance in data workflows. ✨ Expert Insights from Packt CommunityUnlocking the Secrets of Prompt Engineering - By Gilbert Mizrahi Exploring LLM parameters LLMs such as OpenAI’s GPT-4 consist of several parameters that can be adjusted to control and fine-tune their behavior and performance. Understanding and manipulating these parameters can help users obtain more accurate, relevant, and contextually appropriate outputs. Some of the most important LLM parameters to consider are listed here: Model size: The size of an LLM typically refers to the number of neurons or parameters it has. Larger models can be more powerful and capable of generating more accurate and coherent responses. However, they might also require more computational resources and processing time. 
Users may need to balance the trade-off between model size and computational efficiency, depending on their specific requirements. Temperature: The temperature parameter controls the randomness of the output generated by the LLM. A higher temperature value (for example, 0.8) produces more diverse and creative responses, while a lower value (for example, 0.2) results in more focused and deterministic outputs. Adjusting the temperature can help users fine-tune the balance between creativity and consistency in the model’s responses. Top-k: The top-k parameter is another way to control the randomness and diversity of the LLM’s output. This parameter limits the model to consider only the top “k” most probable tokens for each step in generating the response. For example, if top-k is set to 5, the model will choose the next token from the five most likely options. By adjusting the top-k value, users can manage the trade-off between response diversity and coherence. A smaller top-k value generally results in more focused and deterministic outputs, while a larger top-k value allows for more diverse and creative responses. Max tokens: The max tokens parameter sets the maximum number of tokens (words or subwords) allowed in the generated output. By adjusting this parameter, users can control the length of the response provided by the LLM. Setting a lower max tokens value can help ensure concise answers, while a higher value allows for more detailed and elaborate responses. Prompt length: While not a direct parameter of the LLM, the length of the input prompt can influence the model’s performance. A longer, more detailed prompt can provide the LLM with more context and guidance, resulting in more accurate and relevant responses. However, users should be aware that very long prompts can consume a significant portion of the token limit, potentially truncating the model’s output. Discover more insights from 'Unlocking the Secrets of Prompt Engineering' by Gilbert Mizrahi. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today! Read Here💡 What's the Latest Scoop from the BI Community? 🌀 Creating Interactive Power BI Dashboards That Engage Your Audience: This blog discusses the challenges faced by stakeholders and clients unfamiliar with using dashboards, preferring traditional tools like Excel. It emphasizes the importance of creating user-friendly and interactive dashboards to bridge this gap, offering techniques to enhance engagement and accessibility.🌀 Create and use report templates in Power BI Desktop: This tutorial explains how to create and use report templates in Power BI Desktop, enabling users to streamline report creation and standardize layouts, data models, and queries. Templates, saved with the .PBIT extension, help jump-start and share report creation processes across an organization. 🌀 10 Analytics Dashboard Examples to Gain Data Insights for SaaS: This article discusses the importance of analytics dashboards in simplifying the tracking of SaaS metrics and extracting insights. It provides 10 examples of analytics dashboards, including web, digital marketing, and user behavior, and highlights the top 5 analytics tools. The article emphasizes the need for clear, customizable, and intuitive dashboards for effective decision-making. 
🌀 The Future of Data Storytelling: Actionable Intelligence [AI, Power BI, and Office]: This blog post discusses Zebra BI's solutions for reporting, planning, and presenting, emphasizing the importance of clarity, consistency, and actionability in data visualization. It introduces the concept of a reporting-planning-presenting cycle and highlights upcoming features and innovations, including the integration of AI. The post also mentions Zebra BI's adherence to the IBCS standard for clear and consistent business communication. 🌀 Power BI: Transforming Banking Data. This blog post discusses how Power BI can help banks analyze complex data for better decision-making. It covers challenges in banking, how Power BI integrates data sources, develops dashboards, and optimizes analytics. Benefits include improved operations, customer experience, risk management, and cost savings. 🌀 Power BI vs Tableau vs Qlik Sense | Which Wins In 2024? This blog compares Power BI, Tableau, and Qlik Sense for business intelligence (BI) and analytics. It highlights Power BI's advantages in data management, Tableau's strong visualization capabilities, and Qlik Sense's modern self-service platform. The article concludes with a comparison of features and recommendations for different needs. See you next time!Affiliate Disclosure: This newsletter contains affiliate links. If you buy through them, we may earn a small commission at no extra cost to you. This supports our work and helps us keep providing useful content. We only recommend products and services we think will benefit our readers. Thanks for your support! 
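Circling back to the LLM parameters covered in this issue's Expert Insights excerpt, the short sketch below shows how temperature, sampling, and output length are typically set when calling a chat model. It assumes the openai Python package (v1+) and an OPENAI_API_KEY environment variable; note that the OpenAI API exposes top_p (nucleus sampling) rather than top_k, which some other providers and local inference libraries offer instead.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Summarise why prompt length matters."}],
    temperature=0.2,   # lower values give more focused, deterministic output
    top_p=0.9,         # nucleus sampling: cap on cumulative token probability
    max_tokens=150,    # hard limit on the length of the generated reply
)

print(response.choices[0].message.content)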
Transforming Web Data with Browse AI

Merlyn Shelley
26 Mar 2024
14 min read
Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!Partnering with Browse AI Turn Web Data into Your Business Superpower!👉 Train a robot in 2 minutes, no coding needed. 🤖 👉 Ideal for web scraping and data monitoring. 🌐 Here’s what you get: Monitor Websites for Changes ✅ Download Data from Any Website ✅ Turn Any Website into an API ✅ Product data extraction ✅ Also, extract data from news, stocks, jobs, social media, and more. Check out this 1-minute explainer video on how to extract data to Excel, Airtable, and connect to 5,000+ apps using Zapier! Start for free with up to 50 credits, and for a limited time, enjoy free setup and onboarding for Team and Company plans, saving up to 20% on Annual plans. Get Scraping Today!👋 Hello,Welcome to DataPro#85 – Your one-stop shop for the latest in Data Science and ML Algorithms! 🚀 In this issue:⚙️ Keeping Up with LLMs & GPTs  Meet Devin: The pioneering AI software engineer. Google's Croissant: A fresh take on metadata for ML-ready datasets. INSTRUCTIR by Kaist AI: Setting new standards in instruction-following for information retrieval models. Spyx by Sussex AI: Turbocharging spiking neural networks with just-in-time compiled optimization. SynCode by VMware: Enhancing LLM code generation with a touch of grammar. Chatbot Arena: The ultimate battleground for evaluating LLMs by human preference. Apollo: Bringing medical AI to the masses with a multilingual medical LLM. ✨ On the RadarTop AI tools for code generation in 2024. Setting up a Pypi mirror in AWS with Terraform. Ensuring safer code changes with custom pre-commit hooks. Deciphering the AQLM Quantization Algorithm. AI's role in revolutionizing web browsing. Tackling tensors through three tricky errors. Running RStudio inside a container. Harnessing PyTorch and MLX for Apple Silicon. 🏭 Industry Highlights Google Research: Boosting LLMs with Cappy, evolving tables with Chain-of-table, and Scalable Instructable Multiworld Agent (SIMA). AWS: Streamlining code review with generative AI using Amazon Bedrock. OpenAI Updates: Leadership continuity and global news partnerships. 📚 New in Packt Library Practical Guide to Applied Conformal Prediction in Python by Valery Manokhin. DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ deepseek-ai/DeepSeek-VL: Open-source Vision-Language (VL) model for real-world tasks, handling logical diagrams, web pages, formulas, scientific literature, and more. 🛠️ OpenGVLab/VideoMamba: VideoMamba enhances 3D CNNs and video transformers, excelling in long-term video understanding with scalability and modality compatibility. 🛠️ showlab/DragAnything: DragAnything uses entity representation for motion control in video generation, offering user-friendly interaction and outperforming existing methods. 🛠️ pkunlp-icler/FastV: FastV accelerates large vision language models by pruning redundant visual tokens, achieving 45% FLOPs reduction without performance loss. 
🛠️ cnulab/RealNet: RealNet introduces SDAS for anomaly strength control, AFS for feature selection, and RRS for anomaly region identification. Partnering with SurfsharkSurfshark is allowing our readers to enjoy a full 2 years of their award-winning VPN protection for 79% off, plus 2 months free. With Surfshark One, you get: Unlimited devices and connections ✅ One account for the entire household ✅ Your online activity, made safe, secure, and invisible ✅ Plus, identity protection, ad blocking, antivirus, and data breach monitoring.Claim your VPN protection today! 📚 Expert Insights from Packt CommunityPractical Guide to Applied Conformal Prediction in Python - By Valery Manokhin Basic components of a conformal predictor We will now look at the basic components of a conformal predictor: Nonconformity measure: The nonconformity measure is a function that evaluates how much a new data point differs from the existing data points. It compares the new observation to either the entire dataset (in the full transductive version of conformal prediction) or the calibration set (in the most popular variant – ICP. The selection of the nonconformity measure is based on a particular machine learning task, such as classification, regression, or time series forecasting, as well as the underlying model. This will examine several nonconformity measures suitable for classification and regression tasks. Calibration set: The calibration set is a portion of the dataset used to calculate nonconformity scores for the known data points. These scores are a reference for establishing prediction intervals or regions for new test data points. The calibration set should be a representative sample of the entire data distribution and is typically randomly selected. The calibration set should contain a sufficient number of data points (at least 500). If the dataset is small and insufficient to reserve enough data for the calibration set, the user should consider other variants of conformal prediction – including TCP (see, for example, Mastering Classical Transductive Conformal Prediction in Action – https://medium.com/@valeman/how-to-use-full-transductive-conformal-prediction-7ed54dc6b72b). Test set: The test set contains new data points for generating predictions. For every data point in the test set, the conformal prediction model calculates a nonconformity score using the nonconformity measure and compares it to the scores from the calibration set. Using this comparison, the conformal predictor generates a prediction region that includes the target value with a user-defined confidence level. All these components work in tandem to create a conformal prediction framework that facilitates valid and efficient uncertainty quantification in a wide range of machine learning tasks. Discover more insights from 'Practical Guide to Applied Conformal Prediction in Python' by Valery Manokhin. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!   Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! AWS ML Made Easy 🌀 Enhance code review and approval efficiency with generative AI using Amazon Bedrock: This post discusses the challenges faced by managers in overseeing code review and approval processes in software development, such as lack of technical expertise, time constraints, volume of change requests, manual effort, and the need for documentation. 
It also introduces a solution that leverages generative artificial intelligence and integrates it with AWS deployment tools to streamline the review and approval process. The solution includes automated change analysis, summarization, and an approval workflow. Google Research 🌀 Cappy: Outperforming and boosting large multi-task language models with a small scorer. This blog discusses advancements in large language models (LLMs) and their use in natural language processing (NLP). It introduces the concept of multi-task LLMs, such as T0, FLAN, and OPT-IML, which excel at understanding and solving various tasks. It also presents a new approach called Cappy, a lightweight pre-trained scorer that enhances the performance and efficiency of multi-task LLMs. 🌀 Chain-of-table: Evolving tables in the reasoning chain for table understanding. This research focuses on improving how large language models (LLMs) reason over tabular data, which is challenging due to the structured nature of tables. The proposed framework, Chain-of-Table, trains LLMs to iteratively update tables, mimicking human reasoning, resulting in improved performance on table understanding tasks. 🌀 Talk like a graph: Encoding graphs for large language models. This research explores how to teach large language models (LLMs) to reason with graph information, crucial for understanding interconnected data. They introduce GraphQA, a benchmark to evaluate LLMs on graph problems, revealing insights into effective graph encoding methods and improving LLM performance on graph tasks by up to 60%. 🌀 Scalable Instructable Multiworld Agent (SIMA): A generalist AI agent for 3D virtual environments. Google DeepMind has developed SIMA, a versatile AI agent trained on multiple video games to follow natural-language instructions, akin to human behavior. Collaborating with game studios, SIMA navigates various environments, showcasing potential for AI to understand and execute diverse tasks. OpenAI Updates 🌀 Review completed & Altman, Brockman to continue to lead OpenAI: The OpenAI Board completed a review by WilmerHale, expressing full confidence in Sam Altman and Greg Brockman's leadership. They also elected new board members and adopted governance enhancements. WilmerHale's review found a breakdown in trust between the prior Board and Mr. Altman, leading to his removal, but concluded that his conduct did not mandate removal. Following the review, the Board endorsed the decision to rehire Mr. Altman and Mr. Brockman. 🌀 Global news partnerships: Le Monde and Prisa Media: OpenAI has partnered with Le Monde and Prisa Media to bring French and Spanish news content to ChatGPT. This partnership aims to enhance user interaction with news content and contribute to the training of OpenAI's models. Through these partnerships, users will access summaries and links to original articles, expanding their news consumption experience. This collaboration supports the news industry and its role in providing reliable information globally. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🌀 Introducing Devin, the first AI software engineer: Meet Devin, the autonomous AI software engineer, skilled in long-term reasoning and planning. Devin can learn new technologies, build and deploy apps, find and fix bugs, train AI models, and contribute to open source. Devin excels in resolving real-world GitHub issues, outperforming previous models. Cognition, the AI lab behind Devin, aims to unlock new possibilities beyond coding. 
🌀 Google’s Croissant: a metadata format for ML-ready datasets. Croissant is a new metadata format for ML datasets, aiming to simplify the use of existing datasets for training ML models. It standardizes dataset descriptions and organization, supporting responsible AI practices. Croissant builds upon schema.org and is supported by major tools and repositories like Kaggle, Hugging Face, and OpenML. It includes a specification, example datasets, a Python library, and a visual editor to facilitate dataset usage and publication. 🌀 Kaist AI’s INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models. This research focuses on enhancing search accuracy by improving retrievers to understand users' intentions, similar to language models. It introduces INSTRUCTIR, a benchmark for evaluating retrievers' ability to follow user-aligned instructions in retrieval tasks. The study addresses limitations in existing benchmarks and highlights potential overfitting issues in instruction-aware retrieval datasets.  🌀 Sussex AI’s Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks. Advancements in large neural architectures have led to powerful AI accelerators for training deep neural networks. However, these networks often incur high costs. Neuromorphic computing with Spiking Neural Networks (SNNs) offers energy-efficient alternatives, but training SNNs is challenging. Spyx, a new lightweight SNN simulation and optimization library designed in JAX, aims to facilitate SNN architecture investigation by bridging Python-based deep learning frameworks with custom compute kernels, achieving optimal hardware utilization. 🌀 VMware’s SynCode: Improving LLM Code Generation with Grammar Augmentation. SynCode is a novel framework for efficient syntactical decoding of code with large language models (LLMs). It leverages grammar of a programming language using an offline-constructed efficient lookup table called Deterministic Finite Automaton (DFA) mask store. SynCode seamlessly integrates with any context-free grammar (CFG) defined language, reducing syntax errors by 96.07% when combined with LLMs. 🌀 Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. Chatbot Arena is an open platform designed to evaluate Large Language Models (LLMs) by considering human preferences. Utilizing a pairwise comparison method and crowdsourced input, it assesses LLMs' alignment with user preferences. The platform, operational for months with over 240K votes, provides a credible and valuable resource for ranking LLMs. Check out the tool here. 🌀 Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People. The project aims to develop medical Large Language Models (LLMs) in the six most spoken languages, benefiting 6.1 billion people. This includes creating the ApolloCorpora multilingual medical dataset and the XMedBench benchmark, with Apollo models achieving top performance among models of similar sizes. The project will open-source training data, code, model weights, and evaluation benchmarks. You can check for the demo here. ✨ On the Radar: Catch Up on What's Fresh🌀 Top Artificial Intelligence (AI) Tools That Can Generate Code To Help Programmers (2024): The article discusses how AI is changing programming, with tools like OpenAI Codex and GitHub Copilot generating code. It explores AI's impact on code quality and development speed, showcasing various AI-powered tools like Tabnine, CodeT5, and Polycoder. 
Additionally, it mentions AI tools for code review, static code analysis, and AI-assisted coding in IDEs like PyCharm and Visual Studio. 🌀 Pypi mirror in a private AWS environment Terraform: This article explains how to install Python packages in an AWS Sagemaker Studio environment without internet access. It covers setting up Sagemaker in VPC Only mode, using VPC Endpoint interfaces for network communications, and accessing the Pypi package repository through AWS Codeartifact, which allows defining Pypi as an upstream repository. 🌀 Custom pre-commit hooks for safer code changes: This blog post explains the importance of using pre-commit hooks in software development, particularly with the git version control system. It discusses the challenges of maintaining coding standards in collaborative projects and provides a step-by-step tutorial on how to set up and use custom pre-commit hooks for a Python project, using the example of validating dataflow definitions for the Hamilton library. 🌀 AQLM Quantization Algorithm, explained: A new quantization algorithm, AQLM (Additive Quantization of Language Models), was recently released and integrated into HuggingFace Transformers and HuggingFace PEFT. AQLM sets a new state-of-the-art for 2-bit quantization while providing improvements for 3-bit and 4-bit ranges, pushing the boundaries of model accuracy and memory footprint. 🌀 Revolutionize Web Browsing with AI: This article explores creating an AI agent using the gpt-4-vision-preview model from OpenAI, enabling it to navigate the web like a human. It discusses the agent's browser control, content browsing, and decision-making processes, showcasing potential use cases such as aiding visually challenged users and automating web browsing tasks. 🌀 Understanding Tensors: Learning a Data Structure Through 3 Pesky Errors. This article discusses transitioning from managing tabular data to working with tensors in TensorFlow, offering debugging tips and code recipes. It covers visualizing TensorFlow datasets, understanding tensor specs, and augmenting model summaries, while addressing common errors related to tensor rank and shape. 🌀 Running RStudio Inside a Container: This tutorial focuses on setting up RStudio using Docker, particularly leveraging the Rocker RStudio image. It covers pulling the image, launching RStudio in a container, and ensuring persistence of data by using volume mapping. The tutorial provides step-by-step instructions and explanations for each stage. 🌀 PyTorch and MLX for Apple Silicon: The blog discusses Apple's MLX framework, which is optimized for Apple Silicon and serves as a bridge between PyTorch, NumPy, and Jax. It details a comparison between MLX and PyTorch through a custom convolutional neural network implementation for image classification tasks. The discussion includes insights into MLX's features, such as its array class, lazy computation, and compilation for performance optimization. The post also highlights the ease of converting PyTorch code to MLX, despite some differences in API compatibility and coding conventions. See you next time!Affiliate Disclosure: This newsletter contains affiliate links. If you buy through them, we may earn a small commission at no extra cost to you. This supports our work and helps us keep providing useful content. We only recommend products and services we think will benefit our readers. Thanks for your support! 
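To ground the conformal prediction excerpt from this issue's Expert Insights, here is a minimal sketch of split (inductive) conformal regression: absolute residuals as the nonconformity measure, a held-out calibration set, and prediction intervals for new points. The data and model are synthetic stand-ins, not an example from the book; numpy and scikit-learn are assumed to be installed.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=2000)

# Split into a proper training set, a calibration set, and a test set.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Nonconformity scores on the calibration set: |y - y_hat|.
cal_scores = np.abs(y_cal - model.predict(X_cal))

alpha = 0.1  # target 90% coverage
n = len(cal_scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(cal_scores, min(q_level, 1.0), method="higher")

# Prediction intervals for the test set.
preds = model.predict(X_test)
lower, upper = preds - q_hat, preds + q_hat

coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Empirical coverage: {coverage:.3f} (target {1 - alpha:.0%})")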
ChatGPT for Data Governance

Jyoti Pathak
22 Mar 2024
11 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

The digital landscape is ever-accelerating. Here, data reigns supreme. The synergy between advanced technologies and effective governance practices is pivotal. ChatGPT, a revolutionary Artificial Intelligence marvel, is poised to transform the realm of data governance. Let us see ChatGPT's impact on data governance, explore its capabilities, unravel its applications, and understand how it stands as a beacon of innovation in AI-powered data management. We will delve into the heart of this transformative technology and discover the future of data governance, redefined by ChatGPT's prowess.

Understanding Data Governance

Data governance refers to managing, protecting, and ensuring high data quality within an organization. It involves defining policies, procedures, and roles to ensure data accuracy, privacy, and security.

Best Practices for Data Governance

The best practices for data governance include:

● Define clear data governance policies
● Data quality assurance
● Data classification and sensitivity
● Metadata management
● Data security and encryption
● Compliance with regulations
● Data access controls
● Data lifecycle management
● Data governance training
● Data monitoring and auditing
● Ethical considerations
● Collaboration and communications

The 3 Key Roles of Data Governance

1. Data Stewards: Data stewards act as custodians, overseeing the quality, integrity, and compliance of data within an organization. They define and enforce data policies, ensuring data is accurate, consistent, and compliant with regulatory requirements. Think of them as the vigilant gatekeepers, ensuring that data remains trustworthy and reliable.

Practical Example: Imagine a data steward in a financial institution verifying customer information. By meticulously cross-referencing data from various sources, they ensure the customer's details are accurate and consistent, preventing errors in financial transactions.

2. Data Custodians: Data custodians handle the technical implementation of data governance policies. They manage data storage, access controls, and encryption, safeguarding data against unauthorized access or tampering. Custodians are the architects behind the secure data infrastructure.

Practical Example: A data custodian in a healthcare organization implements encryption protocols for patient records. This encryption ensures that sensitive patient data is protected, even if unauthorized access is attempted, maintaining confidentiality and compliance with data protection laws.

3. Data Users: Data users are individuals or departments that utilize data for decision-making processes. They must adhere to data governance policies while extracting insights from data. Data users rely on accurate and reliable data to make informed choices, making them integral to the governance framework.

Practical Example: Marketing professionals analyzing customer behavior data to tailor marketing campaigns are data users. By adhering to data governance policies, they ensure that the insights derived are based on trustworthy data, leading to effective and targeted marketing strategies.

Data Governance Tools

Data governance tools facilitate the implementation of governance policies.
Data Governance Tools

Data governance tools facilitate the implementation of governance policies. Let's explore some powerful data governance tools, including code snippets and practical insights, illuminating their transformative impact.

1. Collibra: Unifying Data Governance Efforts

Practical Insight: Collibra acts as a centralized hub, unifying data governance efforts across an organization. It enables collaboration among data stakeholders, streamlining policy management and ensuring consistent data definitions.

Code Snippet: Automating Data Quality Checks

```python
import collibra

# Connect to Collibra API
collibra.connect(api_key="your_api_key", base_url="https://collibra_instance/api")

# Define data quality checks
data_quality_checks = {
    "Check for Missing Values": "SELECT COUNT(*) FROM table_name WHERE column_name IS NULL;",
    # Add more checks as needed
}

# Execute data quality checks
for check_name, sql_query in data_quality_checks.items():
    result = collibra.execute_sql_query(sql_query)
    print(f"{check_name}: {result}")
```

2. IBM InfoSphere: Ensuring Data Accuracy

Practical Insight: IBM InfoSphere offers advanced data profiling and data quality capabilities. It analyzes data sources, identifies anomalies, and ensures data accuracy, laying the foundation for trustworthy decision-making.

Code Snippet: Data Profiling with IBM InfoSphere

```python
from ibm_infosphere import InfoSphereClient

# Connect to InfoSphere
client = InfoSphereClient(username="your_username", password="your_password")

# Profile data from a CSV file
data_profile = client.profile_data(file_path="data.csv")

# Analyze profile results
print("Data Profile Summary:")
print(f"Number of Rows: {data_profile.num_rows}")
print(f"Column Statistics: {data_profile.column_stats}")
```

3. Apache Atlas: Navigating Data Lineage

Practical Insight: Apache Atlas enables comprehensive data lineage tracking. It visualizes how data flows through the organization, aiding compliance efforts and ensuring a clear understanding of data origins and transformations.

Code Snippet: Retrieve Data Lineage Information

```python
from apache_atlas import AtlasClient

# Connect to Apache Atlas server
atlas_client = AtlasClient(base_url="https://atlas_instance/api")

# Get data lineage for a specific dataset
dataset_name = "your_dataset"
data_lineage = atlas_client.get_data_lineage(dataset_name)

# Visualize data lineage graph (using a visualization library)
visualize_data_lineage(data_lineage)
```
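ChatGPT itself can be folded into these workflows. As a minimal, hypothetical sketch (assuming the openai Python package v1+, an OPENAI_API_KEY environment variable, and an illustrative model name), a governance team might ask a GPT model to flag potentially sensitive columns in a schema excerpt:

```python
import os
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

schema_excerpt = "customer_ssn VARCHAR(11), signup_date DATE, marketing_opt_in BOOLEAN"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system",
         "content": "You are a data governance assistant. Classify each column as "
                    "Public, Internal, or Restricted and justify each label briefly."},
        {"role": "user", "content": schema_excerpt},
    ],
    temperature=0,  # deterministic output is preferable for governance reviews
)
print(response.choices[0].message.content)
```

Output like this is a starting point for a human steward to review, not a replacement for formal classification.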
How Can AI Be Used in Governance?

Artificial Intelligence (AI) holds immense potential to make governance processes more efficient, transparent, and data-driven. Here are several ways AI can be used in governance, along with relevant examples and code snippets.

● Automated Data Analysis

Application: AI algorithms can analyze vast datasets, extracting meaningful insights and patterns to aid decision-making in governance.

Example: Code Snippet for Automated Data Analysis

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Load governance data
governance_data = pd.read_csv("governance_data.csv")

# Extract features and target variable
X = governance_data.drop(columns=["outcome"])
y = governance_data["outcome"]

# Train AI model (Random Forest Classifier)
model = RandomForestClassifier()
model.fit(X, y)

# Make predictions for governance decisions
# (new_data: a DataFrame of new cases with the same feature columns; not defined in the original snippet)
predictions = model.predict(new_data)
```

● Natural Language Processing (NLP) for Policy Analysis

Application: NLP algorithms can analyze legal documents, policies, and public opinions, providing insights to policymakers.

Example: Code Snippet for Policy Text Analysis

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # required once before using VADER

# Sample policy text
policy_text = "The new governance policy aims to enhance transparency and accountability."

# Sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner)
analyzer = SentimentIntensityAnalyzer()
sentiment_score = analyzer.polarity_scores(policy_text)
print("Sentiment Score:", sentiment_score)
```

● Predictive Analytics for Resource Allocation

Application: AI models can predict trends and demands, enabling governments to allocate resources efficiently in healthcare, transportation, or disaster management.

Example: Code Snippet for Predictive Resource Allocation

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load historical data (e.g., healthcare admissions)
historical_data = pd.read_csv("historical_data.csv")

# Extract features and target variable
X = historical_data.drop(columns=["resource_allocation"])
y = historical_data["resource_allocation"]

# Train AI model (Linear Regression for prediction)
model = LinearRegression()
model.fit(X, y)

# Predict resource allocation for future scenarios
# (new_data: future feature values with the same columns as X; not defined in the original snippet)
predicted_allocation = model.predict(new_data)
```

● Chatbots for Citizen Engagement

Application: AI-powered chatbots can handle citizen queries, provide information, and offer assistance, improving public services.

Example: Code Snippet for Chatbot Implementation

```python
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

# Initialize chatbot
chatbot = ChatBot("GovernanceBot")

# Train chatbot with corpus data
trainer = ChatterBotCorpusTrainer(chatbot)
trainer.train("chatterbot.corpus.english")

# Get response for citizen query
citizen_query = "How to pay property taxes online?"
response = chatbot.get_response(citizen_query)
print("Chatbot Response:", response)
```
● Fraud Detection and Security

Application: AI algorithms can detect patterns indicative of fraud or security breaches, enhancing the integrity of governance systems.

Example: Code Snippet for Fraud Detection

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Load transaction data
transaction_data = pd.read_csv("transaction_data.csv")

# Extract features
X = transaction_data.drop(columns=["transaction_id"])

# Detect anomalies using Isolation Forest algorithm
model = IsolationForest(contamination=0.05)
anomalies = model.fit_predict(X)

# Identify and handle potential fraud cases
fraud_cases = transaction_data[anomalies == -1]
```

Example Code Snippet: AI-Powered Anomaly Detection

```python
from sklearn.ensemble import IsolationForest

# Assume 'X' is the feature matrix
model = IsolationForest(contamination=0.1)
anomalies = model.fit_predict(X)
print("Anomalies Detected:\n", anomalies)
```

How Does AI Affect Data Governance?

AI affects data governance by automating tasks related to data management, analysis, and compliance. Machine learning algorithms can process large datasets, identify trends, and predict potential governance issues. AI-driven tools enable real-time data monitoring, allowing organizations to proactively address governance challenges and to ensure that data remains accurate, secure, and compliant with regulations.

Example Code Snippet: AI-Driven Predictive Analytics

```python
from sklearn.linear_model import LinearRegression

# Assume 'X' is the feature matrix and 'y' is the target variable
model = LinearRegression()
model.fit(X, y)

# Predict future values using the trained AI model
future_data = prepare_future_data()  # Function to prepare future data
predicted_values = model.predict(future_data)
print("Predicted Values:\n", predicted_values)
```

Critical Role of Data Governance in AI

Data governance plays a pivotal role in shaping the trajectory of Artificial Intelligence (AI) applications, influencing their accuracy, reliability, and ethical implications. Let's explore why data governance is indispensable for AI, illustrated through practical examples and code snippets.

1. Ensuring Data Quality and Accuracy

Importance: Inaccurate or inconsistent data leads to flawed AI models, hindering their effectiveness.

Example: Code Snippet for Data Cleaning

```python
import pandas as pd

# Load dataset
data = pd.read_csv("raw_data.csv")

# Handle missing values
data_cleaned = data.dropna()

# Handle duplicates
data_cleaned = data_cleaned.drop_duplicates()

# Ensure consistent data formats
data_cleaned['date_column'] = pd.to_datetime(data_cleaned['date_column'])
```

2. Addressing Bias and Ensuring Fairness

Importance: Biased data can perpetuate discrimination in AI outcomes, leading to unfair decisions.

Example: Code Snippet for Bias Detection

```python
from aif360.datasets import CompasDataset
from aif360.algorithms.preprocessing import Reweighing

# Load dataset
dataset = CompasDataset()

# Define privileged and unprivileged groups (aif360 expects lists of attribute dictionaries)
privileged_groups = [{'race': 1}]
unprivileged_groups = [{'race': 0}]

# Detect and mitigate bias by reweighing the dataset
rw = Reweighing(unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
dataset_transformed = rw.fit_transform(dataset)
```
3. Ensuring Data Security and Privacy

Importance: AI often deals with sensitive data; governance ensures its protection.

Example: Code Snippet for Data Encryption

```python
from cryptography.fernet import Fernet

# Generate encryption key
key = Fernet.generate_key()
cipher_suite = Fernet(key)

# Encrypt sensitive data
encrypted_data = cipher_suite.encrypt(b"Sensitive information")
```

4. Promoting Ethical Decision-Making

Importance: Ethical considerations shape AI's impact on society; governance ensures ethical use.

Example: Code Snippet for Ethical AI Policy Implementation

```python
def check_ethical_guidelines(decision):
    ethical_guidelines = ["fairness", "transparency", "accountability"]
    if any(keyword in decision for keyword in ethical_guidelines):
        return True
    else:
        return False

decision = "Implement AI system with transparency."
is_ethical = check_ethical_guidelines(decision)
```

5. Adhering to Regulatory Compliance

Importance: Compliance with regulations builds trust and avoids legal repercussions.

Example: Code Snippet for GDPR Compliance

```python
from gdpr_utils import GDPRUtils

# Check GDPR compliance
user_data = {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    # ... other user data fields
}
is_gdpr_compliant = GDPRUtils.check_compliance(user_data)
```

Data governance is the cornerstone, ensuring that AI technologies are not only innovative but also ethical, secure, and reliable. By implementing robust data governance frameworks and integrating ethical considerations, organizations can unleash the full potential of AI, fostering a future where technological advancements are not just groundbreaking but also responsible and beneficial for all.

Conclusion

As organizations grapple with the complexities of data management, ChatGPT stands tall, offering a sophisticated solution that transcends boundaries. Its ability to automate, analyze, and assist in real time reshapes the landscape of data governance, propelling businesses into a future where informed decisions, ethical practices, and compliance are seamlessly intertwined. With ChatGPT at the helm, data governance is not merely a task; it becomes a strategic advantage, empowering enterprises to harness the full potential of their data securely and intelligently. Embrace the future of data governance with ChatGPT, where precision meets innovation and data is not just managed but masterfully orchestrated for success.

Author Bio

Jyoti Pathak is a distinguished data analytics leader with a 15-year track record of driving digital innovation and substantial business growth. Her expertise lies in modernizing data systems, launching data platforms, and enhancing digital commerce through analytics. Celebrated with the "Data and Analytics Professional of the Year" award and named a Snowflake Data Superhero, she excels in creating data-driven organizational cultures.

Her leadership extends to developing strong, diverse teams and strategically managing vendor relationships to boost profitability and expansion. Jyoti's work is characterized by a commitment to inclusivity and the strategic use of data to inform business decisions and drive progress.

article-image-ai-distilled-39-unpacking-mistral-large-googles-gemini-challenges-and-copilot-enterprise

AI_Distilled #39: Unpacking Mistral Large, Google's Gemini Challenges, and Copilot Enterprise

Kartikey Pandey
21 Mar 2024
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Print to Pixel: Optimize your learning experience with Packt

Several research studies have shown that printed books enhance comprehension, with the tactile experience of flipping pages and annotating the margins adding depth to the learning experience. However, developers can't overlook the practical benefits of eBooks, such as quickly finding relevant information or carrying an entire library on a single device.

Acknowledging the unique benefits of both formats, Packt is offering a 40% discount on all print books, plus a free eBook version of each purchase, from February 26th to February 29th. Here's what's included:

A Vast Library: Enjoy 40% off on over 5,000 titles spanning topics from Cybersecurity to Generative AI.
Complimentary eBook: Each print book purchase includes a free eBook.
AI Assistant: The top 500 books come with a personalized AI that can simplify complex topics to suit your learning style, offering an interactive learning experience.

Start Building Your Tech Library Today!

👋 Hello,

"No AI is perfect, especially at this emerging stage of the industry's development, but we know the bar is high for us and we will keep at it for however long it takes."
- Sundar Pichai, Google CEO

Pichai acknowledges problems with Gemini AI, stressing the importance of unbiased information for users and outlining steps to address issues and improve products. In a rapidly progressing industry, AI development is a tricky game to master, with numerous pitfalls along the way.

Greetings readers! Our mission is to help you stay on top of the ever-changing AI landscape so you can advance your skills. Let's get started with the latest news and developments across the AI field:

Microsoft provides the new LLM Mistral Large on Azure with Mistral AI
Google accepts that some responses from its Gemini model were unacceptable and biased
GitHub launches Copilot Enterprise, a coding assistant integrated throughout the software development process
Researchers develop MobileLLM, optimized language models for mobile devices with under a billion parameters
Researchers at Microsoft develop new techniques to improve visual language models

We've also got your fresh dose of GPT and LLM secret knowledge and tutorials:

Mastering the Art of Prompt Crafting
Breaking Down How Large Language Models Learn
Using AI to Level Up Live Games
Monitoring Large Language Models on AWS

Last but not least, don't miss out on the hands-on strategies and tips straight from the AI community for you to use on your own projects:

Fine-Tuning Models for Speech Recognition Made Simple
Make Conversation Come Alive - Deploying Your Own AI Chat Partner
Combining Geospatial and Semantic Data to Build Powerful Search Tools
Leveraging Notion, Supabase and AI for Knowledge Retrieval

Writer's Credit: Special shout-out to Vidhu Jain for her valuable contribution to this week's issue.

Cheers,
Kartikey Pandey
Editor-in-Chief, Packt

Unleash Your Data Potential with Packt's Latest Titles and Platform Enhancements!

In a world that's always changing, learning is key to success. At Packt, we've updated our learning platform to help you stay ahead in the fast-moving tech world. Our platform makes learning easier and more effective, helping you overcome challenges and achieve your goals.
Boost Your Data Skills with Packt's DataPro Library:

On-Demand Learning: Access a wide range of books, video courses, research papers, and articles to help you grow.
AI Assistance: Get help from AI to understand complex concepts easily, all within the same learning environment.
Personalized Dashboard: Enjoy a tailored learning experience with recommendations and insights just for you.
Advanced Self-Assessment: Use the latest tools to identify what you need to learn and track your progress accurately.
Vibrant Community: Join a community of data and AI enthusiasts on Discord for collaboration and knowledge sharing.
Exclusive Access: Be part of the DataPro beta program for a chance to win Amazon gift cards and early access to new features.
Value for Money: Get all these benefits for just $7.99 per month, a small investment for big gains in your career.

Enhance Your Data Skills Today

⚡ TechWave: AI/GPT News & Analysis

Microsoft has partnered with Mistral AI to provide the new LLM Mistral Large on Azure cloud services. This state-of-the-art AI model offers advanced NLP capabilities, and several companies have praised Mistral Large's performance in increasing productivity and aiding innovation.

Google's CEO recently said some responses from their AI model Gemini were unacceptable and biased. The company has been working to address these issues and sees improvements, but will review what happened. It plans to relaunch Gemini in the coming weeks after fixing it.

GitHub has launched Copilot Enterprise, an AI coding assistant that integrates throughout the software development process. It provides customized code suggestions based on an organization's codebase, answers questions about internal systems, and generates summaries of code changes. Early testing found large productivity gains from such AI tools.

Researchers have developed new optimized language models for mobile devices with under a billion parameters. Called MobileLLM, the models achieve higher accuracy than previous smaller models through innovative architecture and weight-sharing techniques. MobileLLM shows significant gains on conversation tasks and competes with much larger models for common on-device uses.

Researchers at Microsoft have developed new techniques to improve visual language models using structured knowledge graphs. By incorporating relationship maps between image elements like objects and attributes, models can generate richer images from text descriptions. Hierarchical prompting and dual-path encoding methods were also introduced to help models better understand complex language.

🌟 Secret Knowledge: AI/LLM Resources

🌀 Mastering the Art of Prompt Crafting: Got a new NLP project that needs prompting? This guide covers the basics of effective prompt engineering for AI models like ChatGPT. Learn how clarity, conciseness, and context can improve responses. Also explore techniques like zero-shot learning and dynamic few shots, plus how temperature, top-p, and other settings can refine your model's "personality". From system messages to tailoring examples, these tips will help you leverage your LLMs' full potential.

🌀 Breaking Down How Large Language Models Learn: This article provides a helpful breakdown of how LLMs are trained through causal language modeling and how their loss is calculated. It visually explains how models generate text sequences, how they are pre-trained to predict the next token, and how cross-entropy loss compares predictions to true labels to update weights. The process is demonstrated through code showing how the loss can be calculated manually for an LLM and matched against the framework's automatic calculation, giving developers valuable insight into how state-of-the-art models learn. A short sketch of that comparison follows below.
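As a minimal, hypothetical illustration of that idea (assuming PyTorch, with random logits standing in for a real model's outputs), the manually computed next-token loss and the framework's cross-entropy should agree:

```python
import torch
import torch.nn.functional as F

vocab_size = 8
num_positions = 5
logits = torch.randn(num_positions, vocab_size)            # model outputs for 5 token positions
targets = torch.randint(0, vocab_size, (num_positions,))   # the "next token" at each position

# Manual calculation: negative log-probability assigned to the correct next token
log_probs = F.log_softmax(logits, dim=-1)
manual_loss = -log_probs[torch.arange(num_positions), targets].mean()

# Framework calculation
framework_loss = F.cross_entropy(logits, targets)

print(manual_loss.item(), framework_loss.item())  # both values should match
```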
🌀 Using AI to Level Up Live Games: This article discusses how generative AI can enhance live service games. Techniques like adaptive gameplay, personalized ads, and faster asset creation are described. The authors provide a framework for developing games using tools like Unity, GKE, and Vertex AI. They demonstrate how ML models can dynamically generate images, code, and dialogue to customize the player experience. Whether deploying models on GKE or Vertex AI, cloud-based AI brings the benefits of lower costs and easier maintenance than self-hosted options.

🌀 Monitoring Large Language Models on AWS: As AI language models grow more advanced, ensuring they behave properly becomes more important. This article discusses techniques for monitoring LLMs deployed on AWS. Key metrics covered include semantic similarity of responses, sentiment analysis, refusal rates, and more. The proposed architecture takes in model outputs, runs metrics modules, and reports results to CloudWatch for aggregation and alerts. With the right monitoring in place, you can help keep your conversational AI acting as intended. A tiny sketch of one such metric appears below.
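One of those metrics, the semantic similarity between a model's answer and a reference answer, can be sketched in a few lines. This is a minimal, hypothetical example assuming the sentence-transformers package and an illustrative model name; the CloudWatch reporting described in the article is not shown:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

reference = "Our refund policy allows returns within 30 days of purchase."
response = "You can return an item for a refund up to 30 days after you buy it."

# Embed both texts and compare them with cosine similarity
embeddings = model.encode([reference, response], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Semantic similarity: {similarity:.3f}")  # values near 1.0 mean the answers agree
```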
🔛 Masterclass: AI/LLM Tutorials

🌀 Fine-Tuning Models for Speech Recognition Made Simple: This article discusses how to fine-tune LLMs for automatic speech recognition tasks using Amazon SageMaker. It explains language models and ASR as well as the basic steps for fine-tuning a pre-trained model, which include preparing data, choosing a model, training, evaluating, and deploying. SageMaker is highlighted as a powerful yet easy-to-use platform for this process due to its scalability, integration with AWS services, and pay-as-you-go pricing.

🌀 Make Conversation Come Alive - Deploying Your Own AI Chat Partner: Tired of boring chatbots? This guide shows you how to bring the Qwen AI model to your own server so you can have engaging discussions on any topic. The steps cover setting up your environment, installing dependencies, initializing the tokenizer and model, and using history to keep conversations flowing naturally. Once complete, you'll have a powerful AI assistant right at your fingertips. Best of all, it's completely open source.

🌀 Combining Geospatial and Semantic Data to Build Powerful Search Tools: This guide shows developers how to create an interactive campground search map using vector databases, NLP models, and geospatial data. Technologies like Qdrant, Llama2, and Streamlit allow embedding text and locations to enable semantic queries. The page explains setting up Qdrant cloud, loading campground CSV data, and parsing text into nodes. Developers can then embed nodes with HuggingFace and query the vector store to retrieve similar results. By leveraging tools that understand both spatial and semantic context, you can build customized applications to help users explore outdoor destinations.

🌀 Leveraging Notion, Supabase, and AI for Knowledge Retrieval: This tutorial shows how you can build a knowledge base by extracting data from Notion databases and storing it in a vector format in Supabase. It then demonstrates retrieving relevant information from the knowledge base using an AI model from OpenAI. By combining these tools, developers can query custom datasets and generate responses based on retrieved documents. The process involves loading Notion documents, storing embeddings in Supabase, and setting up a retrieval pipeline. With some enhancements, this could be a powerful way to access organizational information.

🚀 HackHub: Trending AI Tools

🌀 lucky-lance/expert_sparsity: Implements efficient expert pruning and dynamic skipping techniques for mixture-of-experts large language models to improve their efficiency and speed while maintaining strong performance.

🌀 facebookresearch/pearl: This open-source library provides a modular reinforcement learning framework for building and training production-ready AI agents, empowering developers with state-of-the-art techniques.

🌀 zhen-tan-dmml/llm4annotation: Curates papers on using LLMs for data annotation, which developers could reference to apply these techniques or learn about the current state of the art.

🌀 google/gemma.cpp: Provides a lightweight C++ library for running Google's Gemma models that developers can easily integrate into their own projects for experimenting with and deploying LLMs.