PythonPro | 23 articles | Packt Newsletter Hub

04 Mar 2025

PythonPro #65: PyPy v7.3.19 Updates, FastRTC for AI Communication, and AutoML with mljar-supervised

04 Mar 2025

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#65PyPy v7.3.19 Updates, FastRTC for AI Communication, and AutoML with mljar-supervisedHi ,Welcome to a brand new issue of PythonPro!In today’sExpert Insight we bring you an excerpt from the book, Learn Quantum Computing with Python and IBM Quantum, Second Edition, which describes how to visualize quantum circuits using Qiskit's circuit_drawer function.News Highlights: PyPy v7.3.19 fixes JIT bugs and introduces Python 3.11 beta, FastRTC launches for AI-driven real-time communication, Google's free Gemini-powered agent automates data analysis on Colab, and mljar-supervised automates the full ML pipeline with its new AutoML framework.My top 5 picks from today’s learning resources:The Secret Life of__init__.py : Why This Tiny File Holds the Key to Python’s Magic🔑Demand Forecasting with Darts: A Tutorial🎯It’s About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software⏳uv + Ray: Pain-Free Python Dependencies in Clusters🌐Modern Good Practices for Python Development🛠️And, in From the Cutting Edge, we introduce evclust, a Python library that enhances traditional clustering methods by incorporating the Dempster-Shafer theory to effectively manage and represent uncertainty in cluster memberships.Stay awesome!Divya Anne SelvarajEditor-in-ChiefSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsPyPy v7.3.19 released: The release primarily addresses JIT-related bug fixes and introduces a Python 3.11 beta interpreter, alongside continued support for Python 2.7 and 3.10, with plans to drop support for 3.10 in the next update.FastRTC: The Real-Time Communication Library for Python: The library simplifies building audio and video AI applications with features like automatic voice detection, a built-in Gradio UI, and the capability to connect via phone.Google launches free Gemini-powered Data Science Agent on its Colab Python platform: The free Gemini 2.0-powered agent will automate data analysis with AI-generated Jupyter notebooks to streamline workflows for researchers, data scientists, and developers.AutoML Open Source Framework with Python API and GUI: The framework, mljar-supervised, automates the entire ML pipeline, including data preprocessing, feature engineering, model selection, and hyperparameter tuning.💼Case Studies and Experiments🔬It’s About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software: Systematically analyzes date and time computation bugs in Python projects using data from GitHub and suggests improvements in software practices.I Uploaded a 27-Year-Old EXE File to Claude 3.7 and What Happened Next Blew My Mind: Describes an experience with Claude 3.7, which successfully analyzed and converted a Visual Basic EXE into a functional Python application using Pygame.📊AnalysisEmbedding Python in Elixir, it's Fine: Analyzes the integration of Python into Elixir through Pythonx, enhancing interoperability and functionality within Elixir's ecosystem.A peek into a possible future of Python in the browser: Discusses breakthroughs in running Python on the web, focusing on the SPy project which seeks to compile Python-like code into WebAssembly.🎓Tutorials and Guides🤓Performing K-means Clustering with Python and Scikit-learn: Explains the concept and steps involved in K-means clustering, including choosing the number of clusters, assigning data points to the nearest cluster, updating cluster centers, and assessing convergence using the inertia metric.The Secret Life of__init__.py : Why This Tiny File Holds the Key to Python’s Magic: Explains the purpose and functionality of the __init__.py file in Python, highlighting its crucial role in treating directories as packages and organizing modules efficiently within a Python project.Controlling Ableton Live with Python: Provides step-by-step instructions for setting up MIDI in Ableton using the IAC driver, sending MIDI commands using the rtmidi library, generating melodies with Markov chains, and more.Project Setup with Python: Covers modern practices for Python project setup, including using pyproject.toml for configurations, src layout for directory structures, virtual environments for development, and the use of requirements files for package management.Affinity Propagation with Python and Scikit-learn: Explores both theoretical aspects of the algorithm—such as how it mimics social group formation and determines cluster numbers—and practical implementation steps, including creating and running an example model.How to deploy Python or Flask apps on Plesk: Covers installing mod_python and Phusion Passenger, updating Plesk components, configuring Python support in domain settings, adding WSGI application settings to your code, and managing Apache and Nginx settings.Demand Forecasting with Darts: A Tutorial: Offers a comprehensive tutorial on demand forecasting using Python and Darts, focusing on the TiDE and TFT models for retail scenarios.🔑Best Practices and Advice🔏uv + Ray: Pain-Free Python Dependencies in Clusters: Discusses how the integration of the uv package manager with Ray enhances Python dependency management in distributed systems by enabling consistent and efficient environment setup across cluster nodes.Modern Good Practices for Python Development: Covers code formatting, linting, type hinting, and testing primarily with pytest, alongside packaging advice and the use of data classes, enums, f-strings, and datetime objects.Counting: How Hard Can it Be?: Explores common misunderstandings in object identity and equality, using the simple act of counting biscuits as a teaching tool.Hello FastHTML and MonsterUI: Introduces Jeremy Howard's FastHTML and Isaac Flath's MonsterUI, Python-based tools that simplify building modern, responsive web applications with features for easy prototyping and scalability.Python packaging: Why we can't have nice things: Strongly advises against using Pip with administrative privileges and recommends the exclusive use of wheel distributions to enhance security and reduce the risk of running unintended code.🔍From the Cutting Edge: evclust—Python library for evidential clustering💥In "evclust: Python library for evidential clustering," Soubeiga and Antoine present evclust, a Python library which enables evidential clustering. The approach advances traditional clustering methods by integrating the Dempster-Shafer theory to manage uncertainty in cluster memberships.ContextEvidential clustering extends traditional clustering methods by allowing objects to belong to multiple clusters, each with varying degrees of belief quantified using mass functions. This approach leverages the Dempster-Shafer Theory, a mathematical framework designed for modeling and reasoning with uncertainty. By incorporating this theory, evclust quantifies and manages the uncertainty of cluster memberships more effectively than traditional hard or fuzzy clustering methods. The result is a Credal Partition, a structured set of mass functions that represents uncertain cluster memberships and enables a more nuanced interpretation of data. This framework is particularly valuable in applications where uncertainty or overlap between clusters is a concern, providing a robust tool for complex data analysis scenarios.Key Features of evclustComprehensive Library: Includes state-of-the-art evidential clustering algorithms and tools for analysis.Integration with Python Ecosystem: Works seamlessly with libraries like numpy, pandas, matplotlib, and scikit-learn.Visualization and Evaluation Tools: Facilitates the understanding of credal partitions through various visual and analytical tools.Cross-platform Compatibility: Supports Windows, macOS, and Linux operating systems.Open Source: Available under the MIT License with ongoing community contributions on GitHub.What This Means for Youevclust is particularly relevant for data scientists, researchers, and developers involved in data analysis and clustering where uncertainty and ambiguity are factors. It provides a robust framework for enhancing traditional clustering methods with the capability to manage and represent uncertainty effectively.Examining the DetailsThe evclust library's architecture supports a variety of evidential clustering algorithms, catering to different data types and clustering complexities. The algorithms like Evidential c-Means (ECM), Relational Evidential c-Means (RECM), and Credal c-Means (CCM) extend existing clustering methodologies to handle uncertainty in data more effectively. These methods incorporate the Dempster-Shafer theory to assign belief levels to cluster memberships, offering a nuanced interpretation compared to hard or fuzzy clustering. This approach allows handling complex uncertainty patterns and better management of overlapping cluster memberships and outliers. The library's comprehensive toolset for visualizing, evaluating, and analyzing credal partitions helps in making more informed decisions based on clustering results.You can learn more by reading the entire paper or accessing the library on GitHub.🧠 Expert insight💥Here’s an excerpt from “Chapter 8: Optimizing and Visualizing Quantum Circuits” in the book, Learn Quantum Computing with Python and IBM Quantum, Second Edition by Robert Loredo, published in February 2025.Visualizing and enhancing circuit graphsThis section will focus on the various visualizations available in Qiskit. The graphs we have been using so far were from the default visualization library in Qiskit. However, we can specify other drawing tools that may be better suited for your documentation purposes. Say,for example, that you are authoring a research paper withLaTeXand youwant to use the LaTeX content.By simply adding style parameters from the Qiskit visualization library, you can then leverage the many features included with the visualization library. We’ll cover a few of those now to get you started.Learning about customized visual circuitsWhen rendering a circuit, it is often necessary or convenient to have the results in a format that suits the format of your document. It’s here where the Qiskit circuit_drawer comes in handy with various features. Let’s begin with a simple quantum circuit to illustrate the various visual rendering examples:First, let’s create a quantum circuit with various operators to get a good representation of all the visual components in the various formats:# Sample quantum circuitqc = QuantumCircuit(4)qc.h(0)qc.cx(0,1)qc.barrier()qc.cx(0,2)qc.cx(0,3)qc.barrier()qc.cz(3,0)qc.h(0)qc.measure_all()# Draw the circuit using the default renderercircuit_drawer(qc, output='mpl')This will render the following circuit drawing, which is just a random representation of gates. This circuit does not do anything special; it’s just used to represent various components. As an option, you can use therandom_circuitmethod to create a random circuit:Figure 8.17: Circuit rendering using the default libraryNext, we will render the preceding circuit usinglatex:circuit_drawer(qc, output='latex')This will render thelatexversion of the circuit:If you’re running this on your local machine and not on the platform, you may have some warnings or errors indicating you need to install some file dependencies, such as installingpylatexenc. To install this library you will need to runpip install pylatexencin a cell first, and then restart the kernel.Figure 8.18: Circuit rendering using the latex libraryIf you are planning to post your circuit onto a website, blog, or social media and would like to include some styles on the image, you can do that as well by passing in the style contents as a parameter, such asbackgroundcolor,gatetextcolor, andfontsize, just to name a few:# Define the style to render the circuit and componentsstyle = {'backgroundcolor': 'lightblue','gatefacecolor': 'white', 'gatetextcolor': 'black', 'fontsize': 9}# Draw the mpl with the specified stylecircuit_drawer(qc, style=style, output='mpl')The preceding code results in adjusting the background, gate color schemes, and font size, as illustrated here:Figure 8.19: Rendered circuit with the custom style dictionary on matplotlibTo use the style setting, you must use the outputmatplotlibas this is the only library that supports the styles.Note:Details on the available list of styles can be found in theStyle Dict Detailssection of the Qiskit API documentation (https://docs.quantum-computing.ibm.com/api/qiskit/qiskit.visualization.circuit_drawer).Learn Quantum Computing with Python and IBM Quantum, Second Editionwas published in February 2025. Packt library subscribers can continue reading the entire book for free.Get the eBook for $35.99 $24.99Get the Print Book for $44.99And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
1586

Divya Anne Selvaraj

25 Feb 2025

PythonPro #64: ChatGPT Explains Shapiro-Wilk Test, Python VS Code Updates, and dtaianomaly for Time Series

Divya Anne Selvaraj

25 Feb 2025

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#64ChatGPT Explains Shapiro-Wilk Test, Python VS Code Updates, and dtaianomaly for Time SeriesJoin Snyk's Fetch the Flag CTF 2025!Snyk’s annual Capture the Flag event, Fetch the Flag 2025, is back! Hosted by Snyk and John Hammond, the event will take place on February 27, 2025, from 9am ET to 9pm ET. Engage in 20+ real-world security challenges, enhance your professional development & earn CPE credits.Register for free!Hi ,In today’sExpert Insight we bring you an excerpt from the book, Coding with ChatGPT and Other LLMs, which discusses using LLMs like ChatGPT to explain and document code, specifically explaining the Shapiro-Wilk normality test code.News Highlights: Python in Visual Studio Code – February 2025 Release adds no-config debugging, test discovery cancellation, Native REPL, and AI code actions; CLI2 v4 introduces a simplified CLI, Sphinx integration, and 12-factor config; and Python docs expose risks in legacy web development with insecure CGI code.My top 5 picks from today’s learning resources:Introduction to CUDA Programming for Python Developers🖥️Fine Tuning a Python Code Completion Model🔧Slashing my ".bashrc" in half⚡20 Important Statistical Approaches Every Data Scientist Knows📊Python does a super weird thing that puts methods on the wrong object🤔And, in From the Cutting Edge, we introduce dtaianomaly, an open-source Python library that standardises and simplifies time series anomaly detection for both researchers and industry, offering a scikit-learn-inspired API, extensibility for developing new models, and tools for preprocessing, benchmarking, and visual analysis.Stay awesome!Divya Anne SelvarajEditor-in-ChiefSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsPython in Visual Studio Code – February 2025 Release: This release of Python, Pylance, and Jupyter extensions introduces no-config debugging, test discovery cancellation, Native REPL launch, Go to Implementation with Pylance, and an experimental AI code action for generating symbols.Cli2Python Automation framework v4 released: The Python automation framework for DevOps includes features like a simplified CLI alternative, Sphinx integration for documentation, and a 12-factor configuration library.Python's official documentation contains textbook example of insecure code (XSS): The module with the insecure CGI code example is deprecated in Python 3.11 and removed in 3.13, but exposes risks in legacy Python web development.💼Case Studies and Experiments🔬Lines, Scribbles, and an Algorithm: AI in Sleep Analysis and Brain Mapping: Explores how Python and machine learning, using libraries like MNE and Random Forest Classifiers, can be applied to EEG data for classifying sleep stages and analyzing brain activity.Antivirus Bypass Method for Metasploit Tools: Details a method for bypassing antivirus detection using Metasploit tools, including creating a Python-based Meterpreter payload with msfvenom, obfuscating it through encoding and compression techniques, and configuring msfconsole for stealth and persistence.📊AnalysisWhen Maps Made Me Dumb, Valuable Geospatial Insights Rescued Me: Explores how Python’s GeoPandas library, along with tools like QGIS, can help accurately measure geographic areas and understand the distortions caused by map projections like Mercator.Python does a super weird thing that puts methods on the wrong object: Explores why Python places the join method on strings instead of lists, examining its design, historical context, and alignment with Python's philosophy.🎓Tutorials and Guides🤓Introduction to CUDA Programming for Python Developers: Explains GPU architecture, parallel processing, and the CUDA platform for optimizing computations, especially for deep learning.Building an Open, Multi-Engine Data Lakehouse with S3 and Python: Explains how to set up a Python-first, open lakehouse on cloud storage, integrate multiple query engines, and handle metadata management using Iceberg and Snowflake's Open Catalog.Fine Tuning a Python Code Completion Model: Details the process of fine-tuning the smolLM 135M model for Python code completion using LoRA, with an iterative approach to optimize hyperparameters, achieving efficient training and strong code generation performance.Train a Small GPT-Style LLM from Scratch: Provides a Jupyter Notebook for training a small GPT-style, decoder-only language model from scratch using PyTorch, covering tokenization, positional encoding, self-attention, and more.Automating FrameBlender with AI Agents and LangGraph: Explains how to automate a previously interactive system with multi-agent workflows using LangGraph to improve scalability and efficiency.UV — An Ultimate Python Package Manager: The Beginner's Guide: Covers how to install and use UV for managing Python environments, packages, and projects efficiently, along with advanced features like multi-version support and isolated tool execution.Synthetic Data(What, Why, When, and How?): Defines synthetic data, its use cases in scenarios like healthcare and finance, and provides a Python code snippet to generate synthetic data using libraries like Faker, Pandas, and xlsxwriter.🔑Best Practices and Advice🔏Slashing my ".bashrc" in half: Covers how the author improved their shell configuration by replacing numerous plugins with efficient, modern Rust-based tools, resulting in a faster, leaner, and easier-to-maintain ".bashrc."Benchmarking utility for Python: Details how Go’s built-in testing framework simplifies benchmarking, andintroduces a custom utility function, autobench, to replicate Go's automatic loop count and repetition handling in Python.10 machine learning mistakes and how to avoid them: Outlines issues like AI hallucinations, model bias, poor data quality, and integration challenges, and offers insights on how to avoid these.pytest snippets for python 3.11+: Provides useful snippets focusing on asynchronous test support, integration test handling, and SQLAlchemy async support with factory-boy.20 Important Statistical Approaches Every Data Scientist Knows: Covers methods like Bayesian Inference, Maximum Likelihood Estimation, and Principal Component Analysis, with Python code snippets to illustrate each technique.🔍From the Cutting Edge: dtaianomaly— A Python library for time series anomaly detection💥In "dtaianomaly: A Python library for time series anomaly detection," Carpentier et al. introduce an open-source Python library for time series anomaly detection. The library aims to bridge the gap between cutting-edge research and practical applications in business and industry.ContextTime series anomaly detection (TSAD) involves identifying outliers or abnormal observations in time series data. Accurate anomaly detection is crucial in various fields, such as healthcare, finance, and IoT. However, many existing tools struggle to effectively integrate state-of-the-art research with practical industrial applications. dtaianomaly addresses this issue by simplifying anomaly detection for practitioners while providing a framework for researchers to extend the tool with new methods.Key concepts in this domain include anomaly detection, which refers to the identification of unusual patterns in data; time series data, which consists of data points indexed by time; and APIs, which offer a standard interface for communication between software systems.Key FeaturesStandardised API: Provides consistency for new users and compatibility with tools like scikit-learn.Extensibility: Designed to allow easy addition of new anomaly detection methods.Experimental Validation: Simplifies benchmarking, enabling comparison of models with minimal code.Range of Anomaly Detectors: Includes various built-in detectors, such as Matrix Profile.Time Series Preprocessing: Provides tools for scaling, resampling, and smoothing data.Visual Analysis Tools: Visualisation techniques for qualitative assessment of models.Confidence Prediction: Estimates the confidence level of anomaly scores.Profiling: Supports runtime and memory profiling, useful for evaluating edge-device suitability.Cross-Platform Unit Testing: Ensures robustness with thorough testing across platforms.What This Means for Youdtaianomaly is relevant to both researchers and industry practitioners. Researchers can use it to rapidly implement and test new algorithms, while businesses can apply its established tools for practical anomaly detection tasks, such as in sensor data or financial transactions. The library's user-friendly documentation and standardised API make it accessible to newcomers, while its extensibility makes it valuable for ongoing research.Examining the Detailsdtaianomaly uses an object-oriented design similar to scikit-learn. Its core component, the BaseDetector, standardises the interface for all anomaly detection models, making it easy to integrate new methods. The library supports time series preprocessing, preparing data before anomaly detection, and offers an automated workflow for large-scale experimental validation. The LazyDataLoader ensures efficient handling of large datasets, while the Pipeline object allows for chaining preprocessing and detection steps. Evaluation is facilitated through both quantitative metrics (e.g., AUC, precision, recall) and qualitative visualisations.The library also supports runtime and memory profiling, helping determine if a model is suitable for deployment on edge devices, which is particularly useful for resource-constrained industries. Its cross-platform compatibility and continuous testing ensure reliability.You can learn more by reading the entire paper or accessing the library on GitHub.🧠 Expert insight💥Here’s an excerpt from “Chapter 3: Grounding Responses” in the book, Coding with ChatGPT and Other LLMs by Dr. Vincent Austin Hall.Documenting codeOften, you’ll come across code that someone else (including an earlier version of you) has written and it’ll not make sense to you. You might have to go through each line and function and class, running it in your mind and testing each bit of code inan IDE.Fortunately, there is now anotheroption: LLMs or chatbots. You can get one to be your personal code translator to clarify what is going on.You can put some code in the chat window and ask the bot to explain what it does. You will generally also get detailed explanations when the chatbots giveyou code.Let’s get ChatGPT and to explain some codeStatistical tests are useful for statisticians and data scientists, so, let’s do some. Jason Brownlee can help here [Brownlee_17Stats]. What is aShapiro-Wilk test?Here is some code in Python for this, from 17 Statistical Hypothesis Tests in Python (Cheat Sheet):# Example of the Shapiro-Wilk Normality Testfrom scipy.stats import shapirodata = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]stat, p = shapiro(data)print('stat=%.3f, p=%.3f' % (stat, p))if p > 0.05:print('Probably Gaussian')else:print('Probably not Gaussian')What does ChatGPT sayabout this?Here isthe prompt:Please explain this code to me, I don't understand it at all.Here is ChatGPT’sresponse:(I put it into a markdown interpreter to get it into a nice style, without the **around theheadings** [Dillinger].)Certainly! The provided code is an example of how to use the Shapiro-Wilk Normality Test in Python, specifically with thescipy.statslibrary. This statistical test is used to determine whether a given sample follows a Gaussian (normal) distribution. Here’s a breakdown ofthe code:# Importing the Shapiro-Wilk test from the scipy.stats modulefrom scipy.stats import shapiro# Creating a sample dataset (in this case, a list of numbers)data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]# Applying the Shapiro-Wilk test to the datasetstat, p = shapiro(data)# Printing the test statistic and the p-valueprint('stat=%.3f, p=%.3f' % (stat, p))# Checking the p-value and making a decision based on a significance level (e.g., 0.05)if p > 0.05: print('Probably Gaussian') # If p > 0.05, we fail to reject the null hypothesis, suggesting the data is probably Gaussian.else: print('Probably not Gaussian') # If p <= 0.05, we reject the null hypothesis, suggesting the data is probably not Gaussian.Here’s astep-by-stepexplanation:Importing the Shapiro-Wilktest function:from scipy.stats import shapiroThis line imports theshapirofunction from thescipy.statsmodule.Defining asample dataset:data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]This line creates a list named data containing a set ofnumerical values.Applying theShapiro-Wilk test:stat, p = shapiro(data)This line applies the Shapiro-Wilk test to the dataset, and the results are stored in two variables: stat (the test statistic) and p (the p-value).Printingthe results:print('stat=%.3f, p=%.3f' % (stat, p))This line prints the test statistic and the p-value with threedecimal places.Interpretingthe results:if p > 0.05: print('Probably Gaussian')else: print('Probably not Gaussian')Based on a significance level (0.05is commonly used), this code prints whether the dataset is probably Gaussian or probably not Gaussian. Ifp > 0.05, we fail to reject the null hypothesis, suggesting the data is probably Gaussian. Ifp <= 0.05, we reject the null hypothesis, suggesting the data is probablynot Gaussian.Here is my review: ChatGPT (not GPT-4, but definitely use that instead) tells us what the test is for, how to import the relevant library and where it comes from, how to create example data and what type it is, about the statistics returned, about printing, and what each statistics result means. That’sreally helpful!What could it do better? Maybe it could suggest other tests that might be able to do this, when to not use the test, and whether it might be able to critique the code. We can askthe following:ChatGPT, when should I not use thisstatistical test?ChatGPT lists six points and details that the test gets better with larger sample sizes, but there is a limit. It might also find significance where there isn’t any, on very large sample sizes. ChatGPT also mentions non-Gaussian distributions and sensitivity to outliers and uses other tests and actions to confirm that the data is to benormally distributed.There are far more details, which I won’t get into here, for brevity, but I’m sure if you ask the AI for it, it’ll give you goodinformation [ChatGPT].We could also ask Gemini to critiquethe code:Critique the code and give us some visualizations to help us understand theShapiro-Wilk test.Let’s check how that compares with what Gemini says about the code...Coding with ChatGPT and Other LLMs was published in November 2024. Packt library subscribers can continue reading the entire book for free.Get the eBook for $35.99 $24.99Get the Print Book for $44.99And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
1401

Divya Anne Selvaraj

18 Feb 2025

PythonPro #63: Python Mobile Wheels for iOS & Android, LLM 0.22 Updates, and Top Python Vizualization Tools 📊

Divya Anne Selvaraj

18 Feb 2025

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#63Python Mobile Wheels for iOS & Android, LLM 0.22 Updates, and Top Python Vizualization Tools 📊Hi ,Welcome to a brand new issue of PythonPro!Here are today's News Highlights: PyPI now supports iOS and Android wheels for Python 3.13; LLM 0.22 adds API key support, a chatgpt-4o-latest alias, and logging improvements; llm-mlx enables running Apple MLX-optimized LLMs like Llama 3.2 3B on macOS.My top 5 picks from today’s learning resources:Top Python Data Visualization Libraries of 2025📊How to Integrate Ollama and Deepseek-R1 with FastAPI for AI-Powered Chatbots🤖Transforming Non-Stationary Time Series with Differencing🔄The “late‐binding closure” pitfall in Python⚠️8 Open-Source AutoML Frameworks: How to Choose the Right One🤯And, in From the Cutting Edge, we introduce FVAPPS,the largest formal verification benchmark, transforming coding interview problems into theorem-proving tasks in Lean 4, providing a rigorous test for AI models and researchers working on provably correct code generation.Stay awesome!Divya Anne SelvarajEditor-in-ChiefPS: We're conducting market research to better understand the evolving landscape of software engineering and architecture – including how professionals like you learn, grow and adapt to the impact of AI.We think your insights would be incredibly valuable, and would love to hear what you have to say in a quick 1:1 conversation with our team.What's in it for you?✅ A brief 20–30 minute conversation at a time that’s convenient for you✅ An opportunity to share your experiences and shape the future of learning✅ A free credit to redeem any eBook of your choice from our library as a thank-youHow to Participate:Schedule a quick call at your convenience using the link provided after the form:https://forms.office.com/e/Bqc7gaDCKqLooking forward to speaking with you soon!Thank you,Team Packt.Note: Credits may take up to 15 working days to be applied to your accountSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsPyPI Now Supports iOS and Android Wheels for Mobile Python Development: This follows PEP 730 (iOS support) and PEP 738 (Android support), both targeting Tier 3 support in Python 3.13.LLM 0.22, the annotated release notes: The version introduces API key support for models, a new chatgpt-4o-latest alias, improved logging and model search functionality, an embedding --prepend option, and various bug fixes.Run LLMs on macOS using llm-mlx and Apple’s MLX framework: llm-mlx, a new plugin for the LLM Python library and CLI, enables local execution of Apple MLX-optimized LLMs and simplifies running models like Llama 3.2 3B.💼Case Studies and Experiments🔬URL-addressable Pyodide Python environments: Explores Datasette Lite, a browser-based Python environment powered by Pyodide and WebAssembly, highlighting its ability to run Python entirely in the browser.Lines, Scribbles, and an Algorithm: AI in Brain Mapping: Applies machine learning to classify EEG data, tests a model, analyzes its accuracy, and discusses broader implications.📊AnalysisA year of uv: pros, cons, and should you migrate: Explains uv’s advantages over traditional tools like pip and venv, when not to use it, and how Astral’s development approach ensures reliability and future improvements.Top Python Data Visualization Libraries of 2025: pygwalker leads as a low-code tool, matplotlib remains foundational with GPU support, and Plotly dominates dashboards. Seaborn, Bokeh, and Altair cater to statistical, streaming, and academic needs.🎓Tutorials and Guides🤓Hermione's Undetectable Extension Charm: Revealed (Using Python): Uses Hermione's magical handbag from Harry Potter to explain Python's memory model, how references work, and why lists and other data structures appear to contain large objects while remaining small themselves.Transforming Non-Stationary Time Series with Differencing: Explains how differencing transforms non-stationary time series by removing trends, using first and second-order differencing, and applies this method to global temperature anomalies to prepare data for statistical modeling.Solving Markov Decision Processes with Value Iteration: Explains key MDP components, implements Bellman Equations for decision-making, and compares Value Iteration and Policy Iteration for optimizing movement strategies.How to Integrate Ollama and Deepseek-R1 with FastAPI for AI-Powered Chatbots: Provides step-by-step instructions to build an AI-powered chatbot, including setup, coding, API development, testing, and next steps.3D DMD and Visualization with OpenFOAM and Python: Provides a step-by-step guide to applying 3D DMDto OpenFOAM computational fluid dynamics (CFD) simulations, using Python and ParaView.Quick Start Robotics and Reinforcement Learning with MuJoCo: Covers the library's core structure, MJCF format, integration with reinforcement learning algorithms such as PPO, and environment setup for training robotic agents in a simulated physics environment.Creating a private, custom distilled Deepseek-R1 model on your local machine & using it within an operational Python pipeline: Covers using Ollama and integrating it into a Python pipeline for AI-driven text analysis while ensuring data privacy and offline functionality.🔑Best Practices and Advice🔏The “late‐binding closure” pitfall in Python: Explains how closures in Python capture variables by reference, not value, and provides a practical workaround to ensure expected behavior when using closures inside loops.Python Tooling Guide (Evergreen): Provides an up-to-date overview of the best tools for managing dependencies, linting, formatting, type checking, and editing as of early 2025.8 Open-Source AutoML Frameworks: How to Choose the Right One: Covers early projects like Auto-WEKA and TPOT, widely used tools like Auto-sklearn and H2O AutoML, and modern frameworks like AutoGluon, FLAML, and PyCaret.Python Setup Tips: Recommends uv as the modern 2025 approach for speed and ease of use but includes Conda as an alternative; provides step-by-step instructions for installing Python, creating virtual environments, and managing dependencies.#TechForNewbies: Inheritance: Explains inheritance in OOP using a family analogy, demonstrating how child classes inherit properties and methods from parent classes, the difference between single and multiple inheritance, and how to manage conflicts in variable names.🔍From the Cutting Edge: FVAPPS -- A Benchmark for Formally Verified Code Generation💥In "Proving the Coding Interview: A Benchmark for Formally Verified Code Generation," Dougherty and Mehta introduce Formally Verified Automated Programming Progress Standards (FVAPPS), the largest formal verification benchmark to date, containing 4,715 samples, including 1,083 curated ones. This benchmark expands upon Automated Programming Progress Standards (APPS), integrating interactive theorem proving into coding interview-style problems.ContextFormal verification mathematically proves a program’s correctness, offering stronger assurances than traditional software testing. This is crucial in safety-critical applications such as medical devices, cybersecurity, and autonomous systems. While Large Language Models (LLMs) have improved in code generation, ensuring their correctness remains a challenge. FVAPPS provides a rigorous benchmark to evaluate both program synthesis and theorem proving.Key FindingsLargest Formal Verification Benchmark: FVAPPS converts APPS coding problems into 4,715 theorem-proving tasks using Lean 4.Structured Benchmark: Problems are provided as Lean 4 theorems with "sorry" placeholders, requiring formal proofs for correctness.Automated Dataset Generation: Uses Anthropic’s Claude Sonnet 3.5 for iterative improvements and verification.Baseline Model Performance:Sonnet 3.5: Proved 30% of the attempted theorems.Gemini 1.5: Proved 18% of the attempted theorems.Human vs AI Performance: A human spent 10 hours on a single problem but couldn't complete the proof, highlighting the challenge of formal verification.What This Means for YouFVAPPS is valuable forAI researchers advancing formal verification and program synthesis, software engineers working on provably correct systems,Machine Learning developers refining AI theorem-proving capabilities, and hiring platforms assessing candidates using verifiable coding tasks.Examining the DetailsFVAPPS extends APPS by reformatting problems into theorem-proving tasks in Lean 4. The dataset undergoes a multi-stage process:Data Processing: Extracts coding problems, consolidates unit tests, and reformulates solutions.Lean Conversion: Converts Python solutions into function definitions and theorem statements with "sorry" placeholders.Validation: Uses Lean’s type-checking system to verify theorem structures.Quality Assurance: Ensures problems pass unit tests and align with formal verification principles.Curation: Filters out invalid or trivial problems, categorizing them into three subsets: unguarded (full dataset), guarded (unit test-verified), and guarded & plausible (most rigorous).You can learn more by reading the entire paper or accessing FVAPPS.And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
1446

Divya Anne Selvaraj

11 Feb 2025

PythonPro #62: Python 3.14’s New Interpreter: 9–15% Faster; Pydantic.ai Agent Framework; and Unified TTS Wrapper

Divya Anne Selvaraj

11 Feb 2025

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#62Python 3.14’s New Interpreter: 3~30% Faster; Pydantic.ai Agent Framework; and Unified TTS WrapperHi ,Welcome to a brand new issue of PythonPro!Here are today's News Highlights: py3-TTS-Wrapper 0.9.18 simplifies speech synthesis across AWS, Google, Azure, IBM, and ElevenLabs; Pydantic.ai beta framework supports OpenAI, Anthropic, Gemini, and real-time debugging; and Python 3.14 promises a new interpreter with 3~30% speed boosts.My top 5 picks from today’s learning resources:How I Built a Deep Learning Library from Scratch Using Only Python, NumPy & Math🔢Django PDF Actions: How to Export PDF from Django Admin📑From Scratch to Masterpiece: The VAE’s Journey to Generate Stunning Images🧑‍🎨The Best Pre-Built Toolkits for AI Agents🕵️‍♂️Permutation Generation in PyTorch on GPU: Statistic Based Decision Rule forrandpermvs.argsortandrand ⚙️And, in From the Cutting Edge, we introduce HintEval, a Python library that streamlines hint generation and evaluation by integrating datasets, models, and assessment tools, providing a structured and scalable framework for AI-driven question-answering systems.Stay awesome!Divya Anne SelvarajEditor-in-ChiefPS: We're conducting market research to better understand the evolving landscape of software engineering and architecture – including how professionals like you learn, grow and adapt to the impact of AI.We think your insights would be incredibly valuable, and would love to hear what you have to say in a quick 1:1 conversation with our team.What's in it for you?✅ A brief 20–30 minute conversation at a time that’s convenient for you✅ An opportunity to share your experiences and shape the future of learning✅ A free credit to redeem any eBook of your choice from our library as a thank-youHow to Participate:Schedule a quick call at your convenience using the link provided after the form:https://forms.office.com/e/Bqc7gaDCKqLooking forward to speaking with you soon!Thank you,Team Packt.Note: Credits may take up to 15 working days to be applied to your accountSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsUnified TTS Interface: py3-TTS-Wrapper 0.9.18 Simplifies Speech Synthesis Across APIs: The library simplifies integration across services like AWS Polly, Google, Microsoft Azure, IBM Watson, and ElevenLabs.Pydantic.ai:Pythonagent framework from Pydantic team: Inspired by FastAPI’s success, the framework (in early beta) supports multiple AI models (OpenAI, Anthropic, Gemini, etc.), real-time debugging via Pydantic Logfire, and more.Python 3.14 Lands A New Interpreter With 3~30% Faster Python Code: alpha 5 is slated for release today, and word is, Python may be receiving a new interpreter with a 9-15% speedup on PyPerformance benchmarks.💼Case Studies and Experiments🔬Let's compile Python 1.0: Details the process of compiling Python 1.0 using podman and an old Debian container and reveals that despite its age, 1.0 had high-level data structures, process control, file handling, and more.How I Built a Deep Learning Library from Scratch Using Only Python, NumPy & Math: Explains the motivation, abstraction layers, and technical design, and delves into comparisons with PyTorch, covering key components like tensors, autograd, neural network modules, and optimizers.📊AnalysisWebAssembly and Python Ecosystem: Explores the current state of Python in WASM, its challenges, available tools, and performance comparisons with Rust, Go, and Docker for serverless computing.Data Analysis Showdown: Comparing SQL, Python, and esProc SPL: Compares SQL, Python, and esProc SPL for various data analysis tasks, including session counting, player scoring, and user retention.🎓Tutorials and Guides🤓Choose Your Fighter • Let's Play (#1 in Inheritance vs Composition Pair): Provides a step-by-step tutorial on building a simple shooting game using Python's turtle module, touching on OOP concepts, particularly inheritance.From Scratch to Masterpiece: The VAE’s Journey to Generate Stunning Images: Covers key VAE components—encoder, decoder, reparameterization trick, and loss function—and demonstrates how to train a VAE on MNIST to generate synthetic images.Installing and using DeepSeek AI on a Linux system: Covers CUDA setup, Ollama installation, model download, Chatbox integration, and Python scripting, highlighting the advantages of running AI models offline.Build Your Own DeepSeek-R1 ChatBot That Can Search Web: Covers Ollama installation, DeepSeek model setup, Docker-based SearXNG search integration, and Gradio-based UI creation, enabling offline AI interactions with real-time web augmentation.Data Analysis with Python Pandas and Matplotlib (Advanced): Coversusing Python, Pandas, and Matplotlib, covering data manipulation, importing CSV files, filtering, grouping, and visualization.Django PDF Actions: How to Export PDF from Django Admin: Introduces a package that simplifies exporting data to PDFs from Django Admin, addressing challenges like multilingual support, layout consistency, and styling.Elisp Cheatsheet for Python Programmers: Maps common Python constructs to their Elisp equivalents, covering collections, looping, file I/O, string operations, and data structures like lists, vectors, and hash tables.🔑Best Practices and Advice🔏The One About the £5 Note and the Trip to the Coffee Shop • The Difference Between `is` and `==` in Python: Explains how Python handles equality and identity, when to use is vs. ==, and how to define custom equality rules in classes using __eq__().The Best Pre-Built Toolkits for AI Agents: Explores toolkits such as CrewAI, LangChain, Agno, and Vercel AI SDK, which allow developers to extend AI agent capabilities.LangChain vs LlamaIndex: designing RAG and choosing the right framework for your project: Demonstrates side-by-side implementations of a chatbot using both frameworks, integrating vector databases (Qdrant), OpenAI embeddings, and PDF processing.Permutation Generation in PyTorch on GPU: Statistic Based Decision Rule forrandpermvs.argsortandrand : Analyzes the trade-offs between torch.randperm() and torch.argsort(torch.rand()) and introduces a statistical decision rule to determine when batching with argsort(rand()) is acceptable.Stop Creating Bad DAGs — Optimize Your Airflow Environment By Improving Your Python Code: Coversbest practices like limiting top-level code and avoiding XComs and Variables, and introduces airflow-parse-bench, an open-source tool for measuring and comparing DAG parse times.🔍From the Cutting Edge: HintEval: A Comprehensive Framework for Hint Generation andEvaluation for Questions💥In "HintEval: A Comprehensive Framework for Hint Generation andEvaluation for Questions," Mozafari et al. introduce a Python library for hint generation and evaluation in question-answering tasks. The framework consolidates scattered resources and provides a unified toolkit for developing and assessing hints.ContextThe integration of LLMsin Information Retrieval (IR) and Natural Language Processing (NLP) has improved information access but this can hinder critical thinking. Hint Generation mitigates this by guiding users towards answers rather than providing them outright, while Hint Evaluation ensures hints remain effective without revealing answers.Existing datasets and tools for hint research are fragmented and often incompatible, making comparisons difficult. HintEval addresses this by integrating multiple datasets, hint generation methods, and evaluation metrics into a single framework.Key Features of HintEvalAccess to preprocessed datasets: Provides a collection of preprocessed datasets, including TriviaHG, WikiHint, HintQA, and KG-Hint, which are designed for fact-based question answering.Support for two hint generation models: Includes an Answer-Aware model, which generates hints based on a known answer, and an Answer-Agnostic model, which generates hints without requiring an answer.Comprehensive hint evaluation system: Includes five evaluation metrics—relevance, readability, convergence, familiarity, and answer leakage—to ensure hints remain useful, clear, and non-revealing.Integration with advanced language models: Supports state-of-the-art LLMs such as GPT-4, LLaMA, Gemini, and others, allowing researchers to experiment with different hint-generation techniques.Freely available and open-source: Accessible on GitHub and PyPI, with detailed documentation and example implementations to facilitate ease of use.What This Means for YouHintEval is useful for researchers, developers, and educators working with AI-driven question-answering systems. Researchers can use it to test and compare models, developers can integrate smart hints into their applications, and educators can create interactive learning experiences that encourage critical thinking.Examining the DetailsHintEval simplifies working with hints by offering a structured approach to generating, evaluating, and testing them. It allows users to load preprocessed datasets or create custom ones, ensuring flexibility across different research needs. The framework also makes it easy to run hint evaluations at scale, with options to extend its capabilities using custom models and methods. Designed to work locally or in the cloud, it integrates smoothly with modern AI workflows, making it adaptable for a range of NLP and machine learning applications.You can learn more by reading the entire paper or accessing the library on GitHub.And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
1386

Divya Anne Selvaraj

04 Feb 2025

PythonPro #61: Meta’s Llama Flaw, Codon’s NumPy Boost, and Web2Vec for Automated Website Analysis

Divya Anne Selvaraj

04 Feb 2025

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#61Meta’s Llama Flaw, Codon’s NumPy Boost, and Web2Vec for Automated Website AnalysisHi ,In today’sExpert Insight we bring you an excerpt from the recently published book, Generative AI on Google Cloud with LangChain, which discusses how LLMs generate plausible but sometimes false responses (hallucinations), and demonstrates how structured prompting with LangChain can help mitigate the issue.News Highlights: Meta's Llama flaw exposes AI servers to remote code execution via Python’s pickle; Codon’s 2025 update brings a faster, open-source NumPy with GPU support; Codegen, a Python library for code refactoring and analysis, is now open source.My top 5 picks from today’s learning resources:Decorator JITs - Python as a DSL⚡100 Common Python Mistakes🐍9 Statistical Distributions Every Data Scientist Should Know📊Create an Adaptive Customer Behavior Analytics Dashboard with Claude AI and Python📈Security and cryptography algorithms: A guide🔐And, in From the Cutting Edge, we introduce Web2Vec, a Python library that automates website analysis by extracting over 200 structured parameters through web crawling and direct feature extraction, with potential applications in cybersecurity, SEO, and machine learning.Stay awesome!Divya Anne SelvarajEditor-in-ChiefSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsMeta's Llama Framework Flaw Exposes AI Systems to Remote Code Execution Risks: The critical vulnerability exposed AI inference servers to remote code execution due to unsafe deserialization with Python’s pickle module.Codon in 2025: New compiler-optimized NumPy implementation. Switching to an open source license: This update leverages Codon’s multithreading, GPU capabilities, and compiler optimizations, achieving significant speed improvements over standard NumPy.Codegen is now open source: Codegen is a Python library for advanced code manipulation, enabling large-scale refactoring, pattern enforcement, and static analysis without requiring deep AST knowledge.💼Case Studies and Experiments🔬How I Built a Python RL Trading Bot That Simulated 1150% Profit: Describes building a bot that identifies potential short squeezes using Financial Modeling Prep’s Fail-to-Deliver and historical stock price data.An empirical study of developers’ challenges in implementing Workflows as Code: A case study on Apache Airflow: Analyzes 1,000 Stack Overflow posts to categorizes challenges into workflow definition, execution, environment setup, quality assurance, security, and optimization, identifying key pain points.📊AnalysisObservations: Using Python with DeepSeek-R1: Explores using DeepSeek-R1 for AI tasks, covering API integration, response streaming, and Retrieval-Augmented Generation (RAG) while analyzing its reasoning process.Decorator JITs - Python as a DSL: Covers AST-based, bytecode-based, and tracing JITs, showing how they convert Python code into optimized LLVM IR for execution.🎓Tutorials and Guides🤓Security and cryptography algorithms: A guide: Coversblock and stream ciphers, hashing, key exchange, public key encryption, and cryptographic libraries, including practical examples using Python.TLS and networking: Explains TLS handshakes, encryption, certificate verification, networking layers, HTTP protocols, sockets, firewalls, and secure app deployment. Also discusses mutual TLS, DNS, and network security.On Shared Birthdays (and a Bit on Pythagorean Triplets) • Pythonic Code: Demonstrates probability concepts with itertools, collections, and datetime,simulates birthday collisions, analyzes probability with brute-force, and more.Make Sick Beats with Python: Explains how to build a simple drum machine in Python using the pygame library, covering setting up the environment, storing and playing sounds, representing music in code, and more.Create an Adaptive Customer Behavior Analytics Dashboard with Claude AI and Python: Demonstrates building a dashboard which analyzes uploaded CSV data, generates Python scripts, executes them, and creates visualizations.Nine Pico PIO Wats with MicroPython (Part 2): Through debugging techniques and real-world examples, it demonstrates workarounds for PIO limitations while building a theremin-like musical instrument.Managing Magento Configurations with PyGento: A Powerful Python CLI Tool for Developers: Explains how the tool integrates with PyGento, provides database access via SQLAlchemy, and automates tasks like searching, viewing, and editing Magento settings without using the admin panel.🔑Best Practices and Advice🔏AI Python Libraries: A centralized resource listing 1,037 libraries with descriptions and use cases for AI development, covering deep learning, machine learning, NLP, and data science.Python Code for Automated Log Analysis & Alerting: Covers parsing system, firewall, and web server logs to detect suspicious activity like brute force attacks and malicious IPs.9 Statistical Distributions Every Data Scientist Should Know: Introduces key statistical distributions, explains their characteristics, and provides practical examples.Crafting a Winning Conference Talk: Lessons from a PyCon US Reviewer:Provides guidance based on the author's experience as a PyCon US reviewer, outlining common mistakes and recommendations for writing a strong conference talk proposal.100 Common Python Mistakes: Covers logic bugs, inefficient code, non-Pythonic practices, and best practices for readability and performance, using clear examples for each mistake.🔍From the Cutting Edge: Web2Vec — A Python Library for Website-to-Vector Transformation💥In "Web2Vec: A Python Library for Website-to-Vector Transformation," D. Frąszczak and E. Frąszczak introduce Web2Vec, a Python library that converts websites into structured vector representations. The library automates feature extraction from web pages, integrating over 200 parameters from website content and structure to enable efficient analysis.ContextWebsite processing involves extracting and transforming web data for analysis. This includes web crawling, which systematically navigates websites to collect data, and web scraping, which extracts specific information from web pages. Website feature extraction identifies key attributes such as structure, security settings, and external references, while vectorisation converts unstructured data into numerical formats for computational analysis.The internet hosts over a billion websites, with millions actively generating data daily. Extracting insights is essential for market research, cybersecurity, and machine learning. While many research papers discuss web data collection, they often rely on custom scripts, leading to inefficiencies. Existing services like WHOIS, SimilarWeb, and Google Search Index provide valuable data but restrict free API access, requiring users to parse raw HTML instead. Web2Vec addresses these challenges with a unified, open-source solution for automated website data extraction and analysis.Key FeaturesWeb2Vec offers a structured approach to web data collection and analysis through:Automated Website Crawling – Uses Scrapy-based spiders to extract data from single pages or entire websites.Comprehensive Feature Extraction – Captures 211 parameters, including URL lexical features, HTML content, HTTP response details, SSL certificates, WHOIS data, and traffic metrics.Flexible Data Processing – Supports active crawling and pre-generated HTML snapshots.Cybersecurity Applications – Detects phishing sites, misinformation, and suspicious activity by integrating services like PhishTank and OpenPhish.Graph-Based Analysis – Visualises website relationships through network graphs for deeper insights.Open-Source & Extensible – Available on PyPI (pip install web2vec), with community support for updates and improvements.What This Means for YouWeb2Vec is a valuable tool for professionals and researchers working with web data. Data scientists can automate website feature extraction for large-scale analysis, while cybersecurity professionals can detect phishing and misinformation using structured data and threat intelligence services. SEO and marketing professionals can benefit from its ability to analyse rankings, metadata, and traffic sources. Developers and web scraping practitioners can replace custom scripts with Web2Vec’s unified framework, reducing redundancy and improving efficiency.Examining the DetailsWeb2Vec processes websites using two primary methods: direct extraction from pre-generated HTML snapshots and active web crawling with Scrapy-based spiders. It extracts 211 parameters across multiple categories, converting website characteristics into structured vector representations. The extracted data, stored in JSON format, includes details such as URL, title, HTML response, and status codes. Users can either crawl entire websites using the Web2VecSpider or extract specific parameters through direct method invocation. Web2Vec also supports visualising website relationships using directed graphs, making it particularly useful for cybersecurity applications such as phishing detection and misinformation analysis.You can learn more by reading the entire paper or accessing the library on GitHub.🧠 Expert insight💥Here’s an excerpt from “Chapter 3: Grounding Responses” in the book, Generative AI on Google Cloud with LangChain by Leonid Kuligin, Jorge Zaldívar, and Maximilian Tschochohei.Overcoming hallucinations in LLMsLLMs are trained on large amounts of publicly available data (please take a look atAppendix 1 for more details on how LLMs are trained). By design, they absorb the information they are explicitly given in the prompt and the information they have seen during thetraining.By default, LLMs don’t have access to any external information (except for what they have already memorized), and in most cases, they’re autoregressive models (they predict output words or tokens one by one by looking at the previous input, and this limits their reasoning capabilities). We’ll see some examples of how we can expand an LLM’s reasoning capabilities with agentic workflows in Chapters 9,10,and11.Simply put, LLMs will use the information from their training to respond to a prompt. LLMs effectively reproduce human language and, therefore, their answers sound very credible even if they are just a probabilistic continuation of the prompt. This can be correct if the training corpus has the right information – or it can just be plausible, but factually wrong. If it’s the latter, the LLM is saidto hallucinate.Let’s take asimple example:Prompt: Tell me about the black cucumbers.Response:Black CucumbersBlack cucumbers, scientifically known as Cucumis sativus var. Nigra, are an intriguing variety of cucumber characterized by their unique dark green to black skin. They are not genetically modified but instead occur naturally due to a higher concentration of chlorophyll.Origin and CultivationBlack cucumbers are native to Europe and Asia, particularly Russia and China. They are grown in warm, moist climates and require well-drained soil rich in organic matter. The fruits typically mature within 60-70 days after planting.Physical CharacteristicsSkin: Deep green to black, with occasional lighter... (The response was truncated because it has reached the token limit. Try to increase the token limit if you need a longer response.)At least to my knowledge, black cucumbers don’t exist. But the LLM has produced a response based on a description of actual cucumbers that looks plausible and real. In the summer of 2023, a US lawyer used an LLM to answer legal questions. The LLM cited non-existing cases, but they looked so trustworthy that the lawyer used them in court and got intoproblems [1].Sometimes hallucinations can be overcome by prompt engineering. Let’s look at the followingprompt template:from langchain.chains import LLMChainfrom langchain.prompts.prompt import PromptTemplatefrom langchain_google_vertexai import VertexAIllm = VertexAI(model_name="gemini-1.0-pro", temperature=0.8, max_output_tokens=128)template = """Describe {plant}.First, think whether {plant} exist.If they {plant} don't exist, answer "I don't have enough information about {plant}".Otherwise, give their title, a short summary and then talk about origin and cultivation.After that, describe their physical characteristics."""prompt_template = PromptTemplate( input_variables=["plant"], template=template,)chain = LLMChain(llm=llm, prompt=prompt_template)chain.run(plant="black cucumbers")If we run this chain, we’ll get arelevant answer:I don't have enough information about black cucumbers.You can double-check and ask the question about green cucumbers to make sure that the LLM will give a correct answer withthis prompt.HallucinationsHallucinations are one of the key problems that the industry is facing atthe moment.The good news: There are ways to significantly reduce hallucination rates, and we’re going to discuss them in this and thenext chapters.The bad news: Anygenerative AI(GenAI) might produce hallucinations, and you need to evaluate and monitor them during application development. We’ll talk about evaluation inChapter 14.Generative AI on Google Cloud with LangChainwas published in December 2024. Packt library subscribers can continue reading the entire book for free.Get the eBook for $35.99 $24.99Get the Print Book for $44.99And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
1372

Divya Anne Selvaraj

28 Jan 2025

PythonPro #60: XGBoost Multithreading, Python 3.14 Updates, REST API Frameworks, and Quantum ML with sQUlearn

Divya Anne Selvaraj

28 Jan 2025

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#60XGBoost Multithreading, Python 3.14 Updates, REST API Frameworks, and Quantum ML with sQUlearnHi ,Happy New Year! We’re back! Did you try your hand at any exciting Python projects over the holidays that you’d like to share? If so, reply to this email and let me know. If it’s brilliant, we’ll share what you made in next week's issue.In today’sExpert Insight we bring you an excerpt from the recently published book, XGBoost for Regression Predictive Modeling and Time Series Analysis, which demonstrates the power of XGBoost's multithreaded capabilities, showcasing how adjusting the nthread parameter can significantly accelerate model training by utilizing multiple CPU cores, as illustrated through a practical example with the California housing dataset.News Highlights: Python 3.14.0 alpha 4 introduces features like PEP 649 for deferred annotations and improved error messages; Python wins Tiobe's Programming Language of the Year 2024 with a 9.3% popularity surge; and a new PEP proposes SBOMs for better package security and dependency tracking.My top 5 picks from today’s learning resources:A technical intro to Ibis: The portable Python DataFrame library🐦How to Split a Python List or Iterable Into Chunks🍰Build a chatbot web app under 5min in Python🤖Five Key Lessons for Google Earth Engine Beginners🌍Choosing your Python REST API framework🔧And, in From the Cutting Edge, we introduce sQUlearn, a Python library for quantum machine learning that integrates seamlessly with classical tools like scikit-learn, offering high-level APIs, low-level customisation, and robust support for NISQ devices.Stay awesome!Divya Anne SelvarajEditor-in-ChiefSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsPython 3.14.0 alpha 4 is out: This early developer preview showcases new features like PEP 649 (deferred annotation evaluation), PEP 741 (Python configuration C API), and improved error messages.Python wins Tiobe language of the year honors: Python won Tiobe's Programming Language of the Year 2024 with a 9.3% popularity increase, surpassing other languages like Java and Go.Software bill-of-materials (SBOMs) docs eyed for Python packages: A new Python Enhancement Proposal (PEP) suggests incorporating SBOM documents into Python packages to improve dependency tracking and vulnerability analysis.💼Case Studies and Experiments🔬A Scheme for Network Programmability and Backup Automation Using Python Netmiko Library on Cisco; the Case Study of the Komfo Anokye Teaching Hospital Local Area Network: Presents a Python-based framework that addresses the inefficiencies of manual processes, achieving a 99% reduction in backup time, a 100% success rate, and enhanced resource utilization.Change Python's syntax with the "# coding:" trick: Demonstrates a playful yet risky manipulation of Python's behavior by creating a codec that converts braces into indentation, enabling alternative syntax like using {} for blocks.📊AnalysisA technical intro to Ibis: The portable Python DataFrame library: Introduces Ibis which simplifies multi-backend data workflows through lazy evaluation, backend-agnostic code, and seamless backend translation.A Deeper Look into Node.js Docker Images: Help, My Node Image Has Python!: Analyzes various Node.js Docker images, comparing their sizes, security vulnerabilities, and use cases.🎓Tutorials and Guides🤓Embedding Python in Rust (for tests): Covers setting up a Python interpreter within a Rust project, exposing Rust functions to Python, handling dynamic types, and building a test runner.How to Visualize your Python Project’s Dependency Graph: Explains how to visualize a Python project's dependency graph using the Tach tool to define module boundaries, sync dependencies, and visualize the graph in a browser or with GraphViz.Build a chatbot web app under 5min in Python: Provides a beginner-level tutorial for building a chatbot web app in Python using Dash, Dash-Chat, and OpenAI's GPT models.Deploying the Python Masonite framework on Lambda: Coverscreating a Lambda function, configuring the lambda_handler , and setting up CI/CD with GitHub Actions to automate deployments.The Shortest Python `import` Tutorial • A Picture Story: Explains the three main ways of importing in Python: importing the whole module, importing specific items, and importing everything with a wildcard.How to Split a Python List or Iterable Into Chunks: Techniques covered include using Python’s standard library (itertools.batched()), third-party packages like more_itertools and NumPy, and custom implementations.Nine Pico PIO Wats with MicroPython (Part 1): Explores surprising behaviors ("Wats") in Raspberry Pi Pico's Programmable Input/Output (PIO) subsystem through a musical theremin project using MicroPython.🔑Best Practices and Advice🔏Choosing your Python REST API framework: Evaluates popular frameworks like FastAPI, Django REST framework, Flask-RESTX, Sanic, and Tornado, offering guidance for selecting the most suitable framework.The Storeroom: Introduces a fix to the White Room analogy, a teaching method for explaining Python’s variable handling and namespaces and addresses the analogy's limitation in representing multiple references to the same object.Python's Mutable vs Immutable Types: What's the Difference?: Discusses key concepts like object identity, type, and value, along with common pitfalls, such as aliasing variables, mutating function arguments, and using mutable default values.Five Key Lessons for Google Earth Engine Beginners: Offfers tips illustrated with real-world examples, including calculating water balance and drought in a water basin in Ecuador.Using Tree Sitter to extract insights from your code and drive your development metrics: Covers three methods for analyzing code: textual matching, syntax linting, and AST traversal, emphasizing the advantages of the latter for accuracy and extracting node values.🔍From the Cutting Edge: sQUlearn – A Python Library for Quantum Machine LearningI💥In the paper, "sQUlearn – A Python Library for Quantum Machine Learning," Kreplin et al. introduce sQUlearn, a Python library for quantum machine learning (QML), designed to integrate seamlessly with classical machine learning tools like scikit-learn.ContextQuantum Machine Learning (QML) combines quantum computing and machine learning to harness quantum principles for computational efficiency and enhanced algorithmic capabilities. However, many current QML tools demand in-depth quantum computing expertise. Noisy Intermediate-Scale Quantum (NISQ) devices, while promising, pose significant challenges due to their limitations in handling deep quantum circuits. To bridge these gaps, sQUlearn focuses on NISQ-compatibility, usability, and integration with classical ML tools, particularly scikit-learn.Key FeaturessQUlearn offers:High-Level Interfaces: Provides scikit-learn-compatible APIs for quantum kernel methods (e.g., quantum SVMs) and quantum neural networks (QNNs) for classification and regression tasks.Low-Level Functionalities: Offers tools for designing quantum circuits, customising encodings, and performing advanced differentiation for QML research.Quantum Kernel Methods: Supports fidelity-based and projected quantum kernels (FQK and PQK) for enhanced data embedding and efficient computation.Flexible Execution: Enables seamless transitions between simulations and real quantum hardware using Qiskit and PennyLane backends.Automation Features: Includes session management, result caching, error handling, and automatic restarts to simplify quantum experiment execution.Customisation Options: Allows users to create and modify data encoding strategies, observables, and outer kernels for tailored solutions.What This Means for YousQUlearn simplifies quantum machine learning for both researchers and practitioners. For researchers, it offers a flexible low-level framework for exploring novel QML algorithms and quantum circuit designs. For practitioners it simplifies the deployment of QML solutions with minimal quantum-specific knowledge via high-level interfaces and pre-built models using familiar tools like scikit-learn.Examining the DetailssQUlearn’s dual-layer architecture enables flexibility, with high-level APIs for seamless integration into machine learning workflows and low-level tools for advanced customisation. The Executor module centralises quantum job execution, handling retries, caching results, and transitioning between simulation and real hardware. It supports quantum kernel methods and neural networks while addressing noise challenges on quantum devices through built-in regularisation techniques. This focus on automation and robustness ensures the library is both reliable for practical applications and adaptable for research needs.You can learn more by reading the entire paper or accessing the library on GitHub.🧠 Expert insight💥Here’s an excerpt from “Chapter 13: Deploying Your XGBoost Model” in the book, XGBoost for Regression Predictive Modeling and Time Series Analysis by Partha Pritam Deka and Joyce Weiner.Using XGBoost’s multithreaded featuresXGBoost has built-in support for multithreaded computing, which allows you to speed up model training by utilizing multiple CPU cores. You can control this by setting thenthread parameter, which determines the number of threads to use. By default,XGBoost will automatically use the maximum number of available threads.It’s important to note that if you’re using Dask, any value you set fornthreadwithin XGBoost will take precedence over Dask’s default configuration. The following example demonstrates how the multithreading parameter works. We’ll revisit theCalifornia housing datasetthat you worked with inChapter 4:Create a Python file to demonstrate XGBoost’s multithreaded functionality. We’ve started with a header and named thefilemultithreaded.py.Import the necessary modules. You can load the California housing dataset from scikit-learn (sklearn). You’ll also be usingpandas,numpy, a module calledtimeto track how long code execution takes, and, ofcourse,xgboost:import pandas as pdimport numpy as npimport timeimport xgboost as xgbfrom sklearn.metrics import r2_scorefrom sklearn import datasetsfrom sklearn.model_selection import train_test_splitNow, you can load in the California housing dataset and perform the train-test split using scikit-learn, as youdid previously:housingX, housingy = datasets.fetch_california_housing( return_X_y=True, as_frame=True)X_train, X_test, y_train, y_test = train_test_split( housingX,housingy, test_size=0.2, random_state=17)Previously, you used the scikit-learn interface for XGBoost. In this example, you’ll use the XGBoost API for Python. One difference is that XGBoost uses a data structure called aDMatrixto manipulate data. So, the first thing you need to do is convert the dataset fromnumpyorpandasform intoDmatrixform by using theDMatrixfunction and passing in the data and the labels. In this case, we’ll be usingdtrain = xgb.DMatrix(X_train, y_train)for the training dataset; do the same for thetest dataset:dtrain = xgb.DMatrix(X_train, y_train)dtest = xgb.DMatrix(X_test, y_test)Now, the data is in a format that XGBoost can manipulate with efficiency. As mentioned inChapter 3, XGBoost does some sorting and performs other operations on the dataset to speedup execution.At this point, you’re ready to train a model using the XGBoost API and the multithreading feature. By default, XGBoost uses the maximum number of threads available. To see the difference, train the model with just two threads, and then increase the maximum number of logical processors you have in your computer. You’ll need to use thetimemodule to get the computation time and print it out so that you can compare the results. First, save the start time with the following lineof code:train_start = time.time()You can set the training parameters for XGBoost by creating a dictionary with the parameters as key-value pairs. You can configure all the parameters listed in theHyperparameterssection ofChapter 5. Here, seteta = 0.3(the learning rate),booster = gbtree, andnthread =2:param = {"eta": 0.3, "booster": "gbtree", "nthread": 2}Now that the training parameters have been set, you can train the model and save the end of the execution time by using thefollowing code:housevalue_xgb = xgb.train(param, dtrain)train_end = time.time()Print the execution time with a formattedprintstatement while subtractingtrain_startfromtrain_endand converting it into milliseconds by multiplyingby 103:print ("Training time with 2 threads is :{ 0:.3f}".format((train_end - train_start) * 10**3), "ms")Now, repeat the code and increase the number of threads XGBoost uses by changing the value ofnthread. Since our computer has eight logical processors, I’vechosen8:train_start = time.time()param = {"eta": 0.3, "booster": "gbtree", "nthread": 8}housevalue_xgb = xgb.train(param, dtrain)train_end = time.time()print ("Training time with 8 threads is :{ 0:.3f}".format((train_end - train_start) * 10**3), "ms")To ensure the model is working as expected, you can make a prediction and check the R2value. You can also time the prediction. To make a prediction with the Python API, just call thepredictmethod on your model and pass thetest dataset:pred_start = time.time()ypred = housevalue_xgb.predict(dtest)pred_end = time.time()print ("Prediction time is :{0:.3f}".format(( pred_end - pred_start) * 10**3), "ms")xgb_r2 = r2_score(y_true=y_test, y_pred= ypred)print ("XGBoost Rsquared is {0:.2f}".format(xgb_r2))Running this script results in the following output. Please note that the execution time on your computer willbe different:Training time with 2 threads is :237.088 msTraining time with 8 threads is :130.723 msPrediction time is :2.012 ms XGBoostRsquared is 0.76On our computer, going from two to eight threads sped up training by over 44%. This demonstrates the benefit XGBoost provides with multithreading. Recall that by default, it will use the maximum number of threads available. Next, you’ll learn about using XGBoost with distributed compute by using Daskon Linux.XGBoost for Regression Predictive Modeling and Time Series Analysis was published in December 2024.Get the eBook for $39.99 $27.98Get the Print Book for $49.99And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
517

Divya Anne Selvaraj

10 Dec 2024

PythonPro #59: Training AI Models, YOLO Library Breach, and GPT for Scientific LaTeX

Divya Anne Selvaraj

10 Dec 2024

Bite-sized actionable content, practical tutorials, and resources for Python programmers.🎄#59🥂Training AI Models, YOLO Library Breach, and GPT for Scientific LaTeXHi ,Welcome to our very last issue for the year 2024! We will be taking our usual year end break and return again on the 14th of January 2025. In the meanwhile do keep an eye out. I've heard 🤫some irrestable deals from Packt will be coming your way soon! Now let's get to it.In today’sExpert Insight we bring you an excerpt from the recently published book, AI Product Manager's Handbook, which discusses the process of training AI models for market readiness.News Highlights: Ultralytics library, used for YOLO, hit by a GitHub Actions supply-chain attack; and python-build-standalone transitions to Astral for continued development.My top 5 picks from today’s learning resources:From Code to Paper: Using GPT Models and Python to Generate Scientific LaTeX Documents📄Linear Regression in Python📊Asynchronous Tasks With Django and Celery⚙️Disposable environments for ad-hoc analyses🧪Negative Testing in Python Web Applications with pytest🛡️And, today’s Featured Study, delves into techniques and challenges in making AI models interpretable, emphasizing XAI's role in ethical and high-stakes applications like healthcare and finance.Stay awesome!And of course, as Charles Dickens wrote, may you feel "light as a feather🪶, ...happy as an angel😇,... merry as a schoolboy🕺....A merry Christmas 🎄to everybody! A happy New Year🥂 to all the world!"Divya Anne SelvarajEditor-in-ChiefSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsPopular Python AI library Ultralytics compromised with a crypto miner: The library, used for YOLO object detection, was compromised through a supply-chain attack exploiting GitHub Actions.A new home for python-build-standalone:The project, initially developed by Gregory Szorc, which addresses challenges like dependency linking and slow source builds, will not be taken up by Astral.💼Case Studies and Experiments🔬From Code to Paper: Using GPT Models and Python to Generate Scientific LaTeX Documents: Explores the feasibility of using GPT models to automate the generation of structured LaTeX documents from Python algorithms, serving as a proof-of-concept.The Black-Litterman Model: A smart integration of our market view: Explores the theoretical foundation, practical application, and benefits of the Black-Litterman model in portfolio optimization.📊AnalysisExpression vs Statement in Python: What's the Difference?: Examines their differences through examples, including lambdas, assertions, and compound statements, while addressing practical relevance in day-to-day programming.Typed Python in 2024: Well adopted, yet usability challenges persist: According to JetBrains, Meta, and Microsoft's survey on the state of Python typing, 88% of respondents frequently use types for benefits like enhanced IDE support, bug prevention, and better documentation.🎓Tutorials and Guides🤓When a Duck Calls Out • On Duck Typing and Callables in Python: Demonstrates how callables—objects with a __call__() method—can enable flexible and polymorphic code by focusing on behavior over inheritance.Socket Programming in Python (Guide): Includes examples such as echo servers, multi-connection setups, and application-level protocols for real-world applications.Expanding HUMS: Integrating Multi-Sensor Monitoring: Demonstrates expanding a Health and Usage Monitoring System (HUMS) by integrating multiple sensors and using FPGA-based tools for scalable data logging, clock synchronization, and enhanced monitoring capabilities.Asynchronous Tasks With Django and Celery: Covers setting up Celery with Redis as a message broker, configuring tasks, and executing them independently from the main app flow.Customising Pattern Matching Behaviour: Provides practical examples, including handling the end of iteration and publishing the approach as the pattern-utils library for extended functionality.How to Round Numbers in Python: Covers advanced rounding with the decimal module, NumPy, and pandas, enabling precise control over data manipulation for specific tasks.PydanticAI: Pydantic AI Agent Framework for LLMs: Demonstrates how PydanticAI enables structured outputs, enforces type safety, and integrates seamlessly with LLMs for creating AI agents, using practical examples.Linear Regression in Python: Demonstrates implementing simple, multiple, and polynomial regression in Python using libraries like scikit-learn and statsmodels, covering key concepts such as underfitting and overfitting.🔑Best Practices and Advice🔏Disposable environments for ad-hoc analyses: Introduces the juv package, which embeds Python dependencies directly within Jupyter notebooks, eliminating the need for external environment files and enhancing reproducibility.Effective Python Testing With pytest: Introduces pytest, a Python testing framework, covering features like fixtures for managing dependencies, test parametrization to reduce redundancy, and detailed failure reports.Django Signals: [Not] the Evil Incarnate You Think: Demystifies Django Signals, demonstrating transitioning from tightly coupled models to a signal-based architecture using custom signals and message data classes.Negative Testing in Python Web Applications with pytest: Demonstrates testing invalid inputs, malformed requests, database failures, and rate limiting, alongside best practices like isolating scenarios, mocking dependencies, and ensuring comprehensive coverage.Python for Security Engineers: Focuses on foundational skills like working with APIs (via the requests library), processing data formats (JSON, CSV, XML), file operations, writing detections, and building simple CLI tools and Flask apps.🔍Featured Study: Explainable AI💥In the paper, "A Comprehensive Guide to Explainable AI: From Classical Models to LLMs", Hsieh et al. explore the challenges and solutions in making machine learning (ML) models interpretable. The paper's goal is to provide a foundational understanding of explainable AI (XAI) techniques for researchers and practitioners.ContextXAI aims to make AI systems transparent and understandable. Classical models, such as decision trees and linear regression, are naturally interpretable, whereas deep learning and LLMs are often considered "black-box" systems. This lack of transparency raises concerns in high-stakes applications, such as healthcare, finance, and policymaking, where accountability and fairness are critical. Tools like SHAP (Shapley Additive Explanations), LIME (Local Interpretable Model-agnostic Explanations), and Grad-CAM provide explanations for complex AI models. The study’s relevance lies in addressing the pressing need for trust and transparency in AI, particularly in ethical and regulatory contexts.Key RecommendationsEnhance Explainability for Complex Models: Future research should focus on improving methods for understanding the internal mechanisms of Large Language Models (LLMs) using techniques like fine-grained attention visualisation and probing methods.Combine Interpretability Approaches: Hybrid models that integrate intrinsic interpretability (e.g., Decision Trees) with post-hoc explanation tools like SHAP and LIME are recommended to balance accuracy and comprehensibility.Promote User Interaction for Better Explanations: Develop interactive, human-in-the-loop systems to allow users to query and customise model explanations, fostering trust and usability.Address Domain-Specific Needs: In fields like healthcare and finance, tailor explainability methods to meet high-stakes requirements. For instance, real-time explanations are crucial for clinicians using diagnostic models.Integrate Legal Compliance: XAI methods must align with regulations like GDPR's "right to explanation" by offering clear and defensible explanations for automated decisions.Develop Standards for Interpretability Evaluation: Address the lack of standardised metrics by creating frameworks that combine quantitative and qualitative evaluation methods to ensure reliable assessments of explainability.Balance Transparency and Privacy: Techniques that enhance model transparency must also protect sensitive training data, particularly in models trained on personal datasets, to avoid privacy breaches.What This Means for YouThis study is useful for AI practitioners, data scientists, and decision-makers in fields like healthcare, finance, and policymaking. It provides practical tools and techniques, including SHAP and Grad-CAM, for improving model transparency and trust. The included Python code examples and resources enable direct application to real-world projects, making it a valuable guide for integrating explainability into high-stakes AI systems.Examining the DetailsThe paper uses case studies in healthcare, finance, and policymaking to highlight practical uses of XAI.You can learn more by reading the entire paper or accessing its code on GitHub.🧠 Expert insight💥Here’s an excerpt from “Chapter 2: Model Development and Maintenance for AI Products” in the book, AI Product Manager's Handbook by Irene Bratsis.Training – when is a model ready for market?In this section, we will explore the standard process for gathering data to train a model and tune hyperparameters optimally to achieve a certain level of performance and optimization. In the Implementation phase (step 4 of the NPD process), we’re looking for a level of performance that would be considered we’re looking for a levelofperformance that would be considered optimal based on the Define phase (step 2of the NPD process) before we move to the next phase of Marketing and crafting our message for what success looks like when using our product. A lot must happen in the Implementation phase before we can do that. Some of the key considerations are as follows:Data accessibility is the most important factor when it comes to AI/ML products. At first, you might have to start with third-party data, which you’ll have to purchase, or public data that’s freely available or easily scraped. This is why you’ll likely want or need to partner with a few potential customers. Partnering with customers you can trust to stick with you and help you build a product that can be successful with real-world data is crucial to ending up with a product that’s ready for market. The last thing you want is to create a product based on pristine third-party datasets or free ones that then becomes overfitted to real-world data and performs poorly with data coming from your real customers that it’s never seen before.Having a wide variety of data is important here, so in addition to making sure it’s real-world data, you also need to make sure that your data is representative of many types of users. Unless your product caters to very specific user demographics, you’re going to want to have a model trained on data that’s as varied as possible for good model performance as well as good usability ethics. There will be more on that in the final section.The next key concept to keep in mind with regard to training ML models is minimizing the loss function. While training data is key, your loss function is going to determine how off from the mark your model is performing. The process of training is exactly that: using data and adjusting your models to optimize for how correct it is at predicting an output. The more incorrect it is, the higher your loss. The more correct it is, the more you’ve minimized your loss function. The more your machine learns (and practices) the better its chances of good performance.Iterative hyperparameter tuning will also be hugely important as you continuously retrain your models for performance. One of the tools you have at your disposal, apart from changing/improving your training data, is adjusting the hyperparameters of your model. Note that not all models have hyperparameters to tune but most do. Models like linear regression models do have coefficients that can change, they are not at the discretion of the engineer. In contrast, deep learning models have the most hyperparameters and this is a big part of their training process.The performance metrics and benchmarks in the Define phase (step 2of the NPD) will inform how your ML engineers will go about tuning their hyperparameters. Most of the time, we don’t yet know what the optimal model architecture for a certain use case is. We want to explore how a model functions with various datasets and start somewhere so that we can see which hyperparameters give us superior performance.Examples of what hyperparameters do include the degree of features that should be used in a linear model, the maximum depth that should be allowed for a decision tree model, how many trees should be included in a random forest model, or how many neurons or layers should be included for a neural network layer. In all these cases, we’re looking at the external settings of the model itself and all these settings are worthy of scrutiny based on the model performance they produce. Having competent AI/ML engineers who are comfortable with navigating these shifts in performance will be important in creating a product that’s set up for success.We want to go into some applied examples of models and their comparisons to give product managers out there who are unfamiliar with AI/ML performance benchmarks a sense of how you can go about evaluating whether one model is better than another. The following are a few examples of performance metrics that your ML engineers will look at as they evaluate whether or not they’re using optimal models. Note that not using optimal models could come with significant engineering and financial costs from the need to correct mistakes, including time and computational resources to redevelop and retrain your models.You’ll notice some of the names are familiar from our previous list of model types:Note:These comparisons were done on a personal project, which was a model we had created to predict the price of Ether, a form of cryptocurrency. If you’d like to see the entire project outlined, you can do so here.The first model we wanted to use was anordinary least squares(OLS) regression model because this is the most straightforward of the linear regression models that we wanted to select to give us a good baseline before we approached other model types.The results of the OLS regression model are as follows:Figure 2.2 – OLS regression model resultsInChapter 1, we discussed the notion of performance metrics for ML models and how to track them. There are a number of metrics that are automatically generated when you train a model. In the example above, we see what the full list of available metrics looks like when you run a model. For our comparison, we will be focusing on theR-squared of the model in the test setline inFigure 2.2to get the rate of error that’s comparable between models. TheR-squaredmetric is also referred to as the “coefficient of determination” and the reason why we use this particular metric so often in regression models is that it best assesses how far the data lies from the fitted regression line that the regression model creates. With the preceding OLS regression model, we see an R-squared of0.889for the test set using an 80/20 split of the training data. We used 80% of the data for training and the remaining 20% of the data for testing.The next model we tested was a random forest to compare results with a tree-based model. One of our hyperparameters for this random forest example was setting our cross-validation to10so that it would run through the training 10 times and produce an average of those 10 iterations as a final score. That average was an R-squared of 0.963, higher than our OLS model!The results of the random forest model are as follows:Figure 2.3 – Random forest model resultsFinally, the last comparison was with our KNN model, which produced a score of 0.994. The hyperparameter we chose in this model was 6, which means we are looking for a group of 6 neighbors for each grouping. This KNN model gives us our best performance because we’re ideally looking for the closest we can get to a perfect score of 1. However, we must keep this in mind with a caveat: although you are looking to get as close as you can to 1, the closer you get to 1, the more suspicious you should be of your model. For instance, the results of the KNN model are as follows:Figure 2.4 – KNN model resultsThough it may seem counterintuitive, getting this high a score likely means that our model is not working well at all, or that it’s working especially well on the training data but won’t perform as well on new datasets. While it may seem paradoxical, though the model is trying to get as close as it can to 1, getting too close is quite suspicious. That’s because we always expect a model will be imperfect – there will always be some loss. When models perform exceedingly well with training data and get high scores, it could just mean that the model was calibrated to that data sample and that it won’t perform as well with a new data sample.This phenomenon is calledoverfittingand it’s a big topic of conversation in data science and ML circles. The reason for this is that, fundamentally, all models are flawed and are not to be trusted until you’ve done your due diligence in selecting the best model. This game of choosing the right model, training it, and releasing it into the wild must be done under intense supervision. This is especially true if you’re charging for a product or service and attempting to win the confidence of customers who will be vouching for you and your products someday. If you’re an AI/ML product manager, you should look for good performance that gets better and better incrementally with time, and you should be highly suspicious of excellent model performance from the get-go. I’ve had an experience where model performance during training was taken for granted and it wasn’t until we had already sold a contract to a client company that we realized the model performed terribly when applied to the client’s real-world data. As a result, we had to go back to the drawing board and retrain a new model to get the performance we were looking for before deploying that model into our client’s workflows.A quick note on neural networks: while training generative AI models will be a bit different considering the subject matter and purpose of your model, it will follow a similar process. You’re still going to put a premium on a clean and diverse data sample, you’re still going to be thoughtful about which neural network will work best for the performance you want, and you’re still going to need to account for (and optimize on) your loss function to the best of your ability. This process will continue through various loops of training and validating until you feel confident enough that your generative AI model will be able to generate new outputs based on the training examples you’ve given it. Your goal of tweaking hyperparameters for performance, minimizing loss where you can, and amassing enough data to set your model up for success remains the same as it does for other ML models.Once you have comprehensive, representative data that you’re training your models on, and you’ve trained those models enough times and adjusted those models accordingly to get the performance you’re seeking (and promising to customers), you’re ready to move forward!AI Product Manager's Handbookwas published in November 2024.Get the eBook for $39.99 $27.98Get the Print Book for $49.99And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email! See you next year!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
5562

Divya Anne Selvaraj

03 Dec 2024

PythonPro #58: HTTP Requests Demystified, Goat vs. Car 🐐🚗, and Python's Dependency Dilemma

Divya Anne Selvaraj

03 Dec 2024

0
0
6252

Divya Anne Selvaraj

26 Nov 2024

PythonPro #57: NASA Image Pipeline in Airflow, PyPI Crypto Scare, and Numba vs. Cython

Divya Anne Selvaraj

26 Nov 2024

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#57NASA Image Pipeline in Airflow, PyPI Crypto Scare, and Numba vs. CythonHi ,Welcome to a brand new issue of PythonPro!In today’sExpert Insight we bring you an excerpt from the recently published book, Apache Airflow Best Practices, which explains how to build and test a pipeline in Jupyter Notebook to extract daily images from NASA's APOD API, store them locally, and prepare the workflow for automation using Apache Airflow.News Highlights:PyPI'saiocpa updated with code to steal private keys via Telegram; AWS Lambda SnapStart now supports Python 3.12+ and .NET 8+ for faster startups; Eel simplifies Python/JS HTML GUI apps with async support; and Marimo raises $5M for an open-source reactive Python notebook.My top 5 picks from today’s learning resources:Numba vs. Cython: A Technical Comparison⚡NumPy Practical Examples: Useful Techniques🔢Optimize Python with Nuitka🛠️Speed Up Your Python Program With Concurrency🏎️Efficient String Concatenation in Python🧵And, today’s Featured Study, introduces CODECLEANER, an open-source toolkit that employs automated code refactoring to mitigate data contamination in Code Language Models, significantly enhancing evaluation reliability across Python and Java through systematic and scalable techniques.Stay awesome!Divya Anne SelvarajEditor-in-ChiefP.S.:Thank you all who participated in this month's survey. With this issue, we have fulfilled all content requests made this month.Sign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsPython Crypto Library Updated to Steal Private Keys: The PyPI packageaiocpawas updated with malicious code to exfiltrate private keys via Telegram, targeting crypto library users.AWS Lambda now supports SnapStart for Python and .NET functions: This opt-in feature is ideal for latency-sensitive applications and is available for Python 3.12+ and .NET 8+ across several AWS regions.Eel: For little HTML GUI applications, with easyPython/JS interop: The library simplifies development for utility scripts, supports asynchronous functions, and enables distribution through PyInstaller.Marimo raises $5M to build an open-source reactive Python notebook: The Python notebook will be optimized for reproducibility, Git compatibility, script execution, and web app deployment.💼Case Studies and Experiments🔬Running Code from Strangers: Recounts the evolution of Livedocs' runtime architecture, from browser-based Pyodide to Kubernetes for scalability, security, and performance in running isolated, interactive Python-based documents.Mach 1 with Python!:Details a fun DIY project using a Raspberry Pi, ultrasonic sensors, and Python to create a Mario pipe that plays a sound whenever someone passes through it.📊AnalysisNumba vs. Cython: A Technical Comparison: Compares the two tools for optimizing Python performance, detailing their features, use cases, and benchmarking results to guide their practical application.Is async django ready for prime time?: Details the setup required for fully async operations, such as using an ASGI server, async views, and an async ORM, while showcasing tools like django-ninja and aiohttp.🎓Tutorials and Guides🤓How to Iterate Through a Dictionary in Python: Explores various methods for iterating through Python dictionaries, including using.items(),.keys(), and.values() methods for accessing keys, values, or key-value pairs.NumPy Practical Examples: Useful Techniques: Demonstrates advanced NumPy techniques, including creating multidimensional arrays from file data, handling duplicates, and reconciling datasets with structured arrays.Build generative AI applications on Amazon Bedrock with the AWS SDK for Python (Boto3): Focuses on accessing and integrating foundation models into applications without managing infrastructure.Python range(): Represent Numerical Ranges: Covers Python's range() function, explaining its use for generating numerical sequences for loops, defining intervals with start, stop, and step parameters.A Complete Guide to Credit Risk Analysis With Python and Datalore AI: Utilizing Python and Datalore's AI-assisted coding capabilities, this guide demonstrates how to retrieve and process financial data, create visualizations, and perform statistical analyses to uncover patterns and insights.Optimize Python with Nuitka: Introduces Nuitka, a Python-to-C++ compiler, detailing its features like performance optimization, compatibility, cross-platform support, and the creation of standalone executables.NotebookLM-style podcasts with API in < 50 lines of Python code: A notebook-based tutorial leveraging the OpenAI API to generate a concise summary of the input text and then utilizes a text-to-speech (TTS) service to convert this summary into audio format.Language Translation with Python: Introduces LibreTranslate, a lightweight, Python-based API server for language detection and translation, particularly useful for analyzing data in non-English languages.🔑Best Practices and Advice🔏The Code That Almost Led to Disaster • A Starbase Story: Narrates a fictional scenario where a Python import error nearly causes a catastrophic failure on a starbase, illustrating the critical importance of understanding Python's import system to prevent such issues.Speed Up Your Python Program With Concurrency: Explains threading, asynchronous tasks, and multiprocessing and demonstrates how to optimize I/O-bound and CPU-bound tasks for improved performance.Efficient String Concatenation in Python: Covers methods including using the +and+=operators, the.join()method for lists, and tools likeStringIO for handling large datasets, with best practices for performance and flexibility.Interacting With Python: Explores ways and provides guidance to interact with Python, including using the interactive REPL mode, running scripts from files via the command line, working in IDEs, and leveraging tools like Jupyter Notebooks.Samuel Colvin on LogFire, mixing Python with Rust & being pydantic about types: This podcast episode featuresSamuel Colvin, who introducesLogFire, a new observability platform for Python, while exploring the increasing integration of Rust into Python’s ecosystem.🔍Featured Study: CODECLEANER💥In "CODECLEANER: Elevating Standards with a Robust Data Contamination Mitigation Toolkit," Cao et al. address the pervasive issue of data contamination in Code Language Models (CLMs). The study introduces CODECLEANER, an automated code refactoring toolkit designed to mitigate contamination, enabling more reliable performance evaluations for CLMs.ContextData contamination occurs when CLMs, trained on vast code repositories, inadvertently include test data, leading to inflated performance metrics. This undermines the credibility of CLMs in real-world applications, posing risks for software companies. Refactoring, a method of restructuring code without altering its functionality, offers a potential solution. However, the lack of automated tools and validated methods has hindered its adoption. CODECLEANER fills this gap by systematically evaluating refactoring operators for Python and Java code, ensuring they reduce contamination without semantic alterations.Key Features of CODECLEANERAutomated Code Refactoring: CODECLEANER provides a fully automated solution for restructuring code, eliminating the need for manual intervention while preserving original code functionality.Comprehensive Refactoring Operators: It includes 11 refactoring operators categorised into three distinct types catering to different aspects of code restructuring.Syntactic Refactoring: Operators such asif-condition flipping,loop transformations, anditeration changesalter code structure without affecting its semantics, offering lightweight syntactic adjustments.Semantic Refactoring: Advanced operators likeidentifier renamingandperformance measurement decoratorsdisrupt patterns that models memorise, significantly reducing overlap with training data.Code Style Modifications: Adjustments such asnaming style switches(e.g., camel case to snake case) andcode normalisation(e.g., consistent formatting) ensure stylistic uniformity while mitigating contamination.Cross-Language Functionality: While primarily designed for Python, CODECLEANER demonstrates adaptability by implementing selected operators in Java, addressing data contamination in a second language.Scalable Application: The toolkit works on both small-scale (method-level) and large-scale (class-level) codebases, proving its utility across various levels of complexity.Open Source and Accessible: CODECLEANER is available online, enabling widespread adoption and further research into mitigating data contamination in CLM evaluations.What This Means for YouThis study is particularly valuable for software developers and engineering teams seeking to integrate CLMs into production, researchers aiming to benchmark CLMs accurately, and organisations evaluating AI-based code tools. By addressing data contamination, CODECLEANER enhances the credibility and reliability of CLM-based solutions for real-world applications.Examining the DetailsThe researchers evaluated CODECLEANER by applying 11 refactoring operators to Python and Java code at method-, class-, and cross-class levels. Effectiveness was measured using metrics like N-gram overlap and perplexity across over 7000 code snippets sampled from The Stack dataset. Four Code Language Models (CLMs), including StarCoder and CodeLlama, were used to assess changes in contamination severity.Results showed that semantic operators, such as identifier renaming, reduced overlap by up to 39.3%, while applying all operators decreased overlap in Python code by 65%. On larger class-level Python codebases, contamination was reduced by 37%. Application to Java showed modest improvements, with the most effective operator achieving a 17% reduction.You can learn more by reading the entire paper and accessing the toolkit here.🧠 Expert insight💥Here’s an excerpt from “Chapter 4: Basics of Airflow and DAG Authoring” in the Apache Airflow Best Practices by Dylan Intorf, Dylan Storey, and Kendrick van Doorn, published in October 2024.Extracting images from the NASA APIThis pipeline is designed to extract an image every day, store this information in a folder, and notify you of the completion. This entire process will be orchestrated by Apache Airflow and will take advantage of the scheduler to automate the function of re-running. As stated earlier, it is helpful to spend timeworking through practicing this in Jupyter Notebook or another tool to ensure the API calls and connections are operating as expected and to troubleshootany issues.The NASA APIFor this data pipeline, we will be extracting data from NASA. My favorite API is theAstronomy Picture of the Day(APOD) where a new photo is selected and displayed. You can easily change the API to another of interest, butfor this example, I recommend you stick with the APOD and explore othersonce completed.A NASA API key is required to start thisnext step:Create a NASA APIkey (https://api.nasa.gov/).Input your name, email, and planned functional use ofthe API.Navigate to your email to locate the APIkey information.Figure 4.3: NASA API Key input screenshotBuilding an API request in Jupyter NotebookWith the environment configured and the API set up, we can begin authoring a DAG to automate this process. As a reminder, most Python code can be pre-tested in a system outside of Airflow, such as Jupyter Notebook or locally. If you are running into problems, it is recommended to spend time analyzing what the code is doing and workto debug.In Jupyter Notebook, we are going to use the following code block to represent the function of calling the API, accessing the location of the image, and then storing the image locally. We will keep this example as simple as possible and walk througheach step:import requestsimport jsonfrom datetime import datefrom NASA_Keys import api_keyurl = f'https://api.nasa.gov/planetary/apod?api_key={api_key}'response = requests.get(url).json()responsetoday_image = response['hdurl']r = requests.get(today_image)with open(f'todays_image_{date.today()}.png', 'wb') as f: f.write(requests.get(today_image).content)The preceding code snippet is normally how we recommend starting any pipeline, ensuring that the API is functional, the API key works, and the current network requirements are in place to perform the procedures. It is best to ensure that the network connections are available and that no troubleshooting alongside the information security or networking teamsis required.Here is how the code looks in our JupyterNotebook environment:We begin by importing the required libraries to support our code. These libraries includethe following:requests: A common Python library for making HTTP requests. It is an easy-to-use library that makes working with HTTP requests simple and allows for easy use ofGETandPOSTmethods.json: This library allows you to parse JSON from strings or files into a dictionaryor list.datetime: This library provides the currentdateandtimeparameters. We will use this later on to title theimage file.NASA_Keys: This is a local file to our machine holding theapi_keyparameter. This is used in this example to keep things as simple as possible and also maskthe variable.Figure 4.4: What your current Jupyter cell should look likeAfter importing the appropriate libraries and variables required, we construct a variable calledurlto house the HTTP request call including ourapi_keyvariable. This allows theapi_keyvariable to be included in the URL while hidden by a mask. It callsapi_keyfrom theNASA_Keysfile:url = f'https://api.nasa.gov/planetary/apod?api_key={api_key}'Next, we use therequestslibrary to perform an HTTPGETmethod call on the URL that we have created. This calls on the API to send information for our program to interpret. Finally, we convert this information from theGET call into JSON format. For our own understanding and analysis of the information being sent back, we print out the response to get a view of how the dictionaryis structured. In this dictionary, it seems that there is only one level with multiple key-value pairs including copyright , date ,explanation ,hdurl, media_type, service_version, title , and url:Figure 4.5: Response from the NASA API callIn the next step, we will utilize thehdurlkey to access the URL associated with the high-definition astronomy image of the day. Since I am an enthusiast and want the highest quality image available, I have decided that the highest definition available meets my user needs. This is a great example of a time to determine whether your users desire or need the highest quality available or whether there is an opportunity to deliver a product that meets their needs at a lower cost or lower requirementof memory.We storeresponse['hdurl']within thetoday_imagevariable for use in the next step for storing the image. This storing ofhdurlallows for manipulation of the string later on in thenext step:Figure 4.6: Saving the hdurl response in a variableIn the last step, we make use ofhdurland appenddate.today()to create a new name for the image each day. This is so that an image from yesterday does not have the same name as an image from today, thus reducing the risk of overwrites. There are additional ways to reduce the risk of overwrites, especially when creating an automated system, but this was chosen as the simplest option forour needs:Figure 4.7: Writing the image content to a local fileFinally, we can look in the local repository or folder and find that the image wasstored there:Figure 4.8: The image file we saved in the local repository or folderThis walk-through in Jupyter Notebook may seem ...excessive..., but taking the time to ensure the API is working and thinking through the logic of the common steps that need to be automated or repeated can be extremely beneficial when stepping into creating the Airflow DAG.Apache Airflow Best Practiceswas published in October 2024.Get the eBook for $35.99 $24.99Get the Print Book for $44.99And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
5535

Divya Anne Selvaraj

19 Nov 2024

PythonPro #56: Pandas Vectorized Operations, PyPI Deletion Rules, and ChatGPT vs. Gemini Accuracy Showdown

Divya Anne Selvaraj

19 Nov 2024

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#56Pandas Vectorized Operations, PyPI Deletion Rules, and ChatGPT vs. Gemini Accuracy ShowdownHi ,Welcome to a brand new issue of PythonPro!In today’sExpert Insight we bring you an excerpt from the recently published book, Pandas Cookbook - Third Edition, which emphasizes the importance of using vectorized operations in pandas for better performance compared to Python loops.News Highlights:Technion launches PyPIM for in-memory computing in Python; PEP 763 limits PyPI deletions to 72 hours post-upload; and ColiVara API enables advanced document retrieval with visual embeddings.My top 5 picks from today’s learning resources:Any Python program fits in 24 characters*🧩Flash Attention derived and coded from first principles with Triton (Python)⚡Mastering Bivariate Maps with Plotly: A Step-by-Step Guide🗺️5 Overrated Python Libraries (And What You Should Use Instead)🔄Attemtps at immutability with dataclasses in Python🔒And, today’s Featured Study, evaluates AI programming tools ChatGPT, Gemini, AlphaCode, and GitHub Copilot, highlighting ChatGPT's highest single-attempt accuracy (87.2% pass@1), and Gemini's strong multi-attempt performance.Stay awesome!Divya Anne SelvarajEditor-in-ChiefP.S.:This month'ssurvey is still live. Do take the opportunity to leave us your feedback, request a learning resource, and earn your one Packt credit for this month.Sign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsResearchers develop Python code for in-memory computing — in-memory computation comes to Python code: Technion researchers have developed PyPIM, a tool that translates Python code into machine code for in-memory computing, enabling programmers to use Python without adaptation.PEP 763 – Limiting deletions on PyPI: PEP 763 proposes restricting the deletion of files, releases, and projects on PyPI to within 72 hours of upload, with exceptions for pre-release specifiers.ColiVara – State of the Art RAG API with Vision Models: ColiVara is a Python-based API and suite of tools for state-of-the-art document retrieval using visual embeddings, designed as a web-first implementation of the ColPali paper.💼Case Studies and Experiments🔬Any Python program fits in 24 characters*: Demonstrates how to encode any Python program in 24 characters (excluding whitespace) by exploiting the flexibility of whitespace encoding and Unicode representations.Judge a Book by its Color: How to Visualize Decades of Book Cover Colors from Scratch— Scraping, Data, and Design: Explores six decades of bestseller book cover colors using web scraping, ColorThief, and other libraries.📊AnalysisA pocket calculator using lambdatalk vs. Python: Compares building a browser-based pocket calculator using the lightweight functional programming language lambda talk with the Python-to-JavaScript transpiler Brython.Building a macOS app with python - BeeWare vs Kivy vs Flet vs Swift: Compares Python GUI frameworks BeeWare, Kivy, and Flet with Swift for building a macOS voice cloning app using the F5 TTS MLX model.🎓Tutorials and Guides🤓📽️Flash Attention derived and coded from first principles with Triton (Python): Provides a comprehensive tutorial on deriving and coding Flash Attention from scratch, covering mathematical foundations, CUDA, and Triton.Mastering Bivariate Maps with Plotly: A Step-by-Step Guide: Covers data generation, normalization, creating custom legends, and interactive map visualization, offering insights into crafting informative and visually appealing geospatial representations.1969: Can You Land on The Moon? • A Python `turtle` Lunar Lander: Demonstrates how to create a lunar landing game using Python’s turtle module, simulating realistic physics and controls for landing a lunar module.Generating realistic IoT data using Python & storing into MongoDB Timeseries Collection. Part 1: Guides you through generating realistic IoT sensor data streams using Python and storing them in MongoDB Time Series Collections.Vector animations with Python: A notebook demonstrating how to create dynamic vector animations in Python using Gizeh for vector graphics and MoviePy for animation.Dependent Types in 200 Lines of Python: Demonstrates building a type checker for the Calculus of Constructions (CoC) in Python, illustrating dependent types, type polymorphism, and precise type guarantees.Data in the Small: Python packagelittletable combines in-memory NoSQL ORM with schemaless setup(and easy CSV import/export): Introduces littletable, a lightweight Python package for in-memory NoSQL ORM with a schema-less setup, offering SQL-like features.🔑Best Practices and Advice🔏5 Overrated Python Libraries (And What You Should Use Instead): Critiques Requests, BeautifulSoup, Pandas, Matplotlib, and Scikit-Learn as outdated or inefficient for modern needs, and suggests alternatives.Python Dictionary Comprehensions: How and When to Use Them: Covers creating dictionaries from iterables, transforming existing ones, and filtering key-value pairs with conditionals, while also advising on best practices.Using the Python zip() Function for Parallel Iteration: Covers key concepts such as zip()'s lazy evaluation, handling unequal-length iterables, and using zip() to build dictionaries, alongside techniques like unzipping sequences.Using the len() Function in Python: Delves into applying len() to built-in and third-party types, like NumPy arrays and pandas DataFrames, as well as extending its functionality to user-defined classes via the .__len__() method.Attemtps at immutability with dataclasses in Python: Explores achieving immutability in Python through various methods, comparing old-style constants, new-style constants, dataclasses, enums, namedtuples, and metaprogramming.🔍Featured Study: Programming with AI💥In "Programming with AI: Evaluating ChatGPT, Gemini, AlphaCode, and GitHub Copilot for Programmers," Siam, Gu, and Cheng, compare four leading AI-powered tools for programming. The researchers from the New York Institute of Technology, aim to assess the tools' code-generation accuracy, capabilities, and implications for developers using rigorous benchmarks and evaluation metrics.ContextLLMs like ChatGPT, Gemini, AlphaCode, and GitHub Copilot use transformer architectures to process natural language and generate programming code. Tools such as these are revolutionising software development by automating code creation and assisting with problem-solving tasks. The study’s relevance lies in its comprehensive evaluation of their accuracy, efficiency, and potential to transform programming workflows. Metrics like pass@k (accuracy over k attempts) and test case pass rates (functional correctness) provide critical insight into the models' capabilities.Key FindingsChatGPT: GPT-4-Turbo-0125 achieved the highest accuracy (87.2% pass@1) on HumanEval, outperforming other models in single-attempt code generation.Gemini: Gemini-1.5-Pro scored 74.9% on HumanEval, while Gemini-Ultra excelled in multiple-attempt scenarios with a 74.7% pass@100 on Natural2Code.AlphaCode: Designed for competitive programming, AlphaCode achieved pass rates of 54% (Python), 51% (Java), and 45% (C++) on Codeforces challenges.GitHub Copilot: On LeetCode, Copilot attained test case pass rates of 75.7% (Java) and 73.3% (C++), enhancing productivity by offering real-time code suggestions.Ethical Issues: Models exhibit biases in outputs, risk copyright infringement, and occasionally produce plausible but incorrect code. GitHub Copilot, in particular, has faced criticism over intellectual property concerns.What This Means for YouThe study is particularly valuable for programmers, software engineers, and organisations using AI tools to streamline coding tasks. It highlights which tools excel in accuracy and productivity, enabling developers to make informed decisions based on their specific needs, such as competitive programming (AlphaCode) or real-time coding assistance (GitHub Copilot). Ethical concerns warrant careful oversight when using these tools in professional environments.Examining the DetailsThe study uses empirical methods, analysing performance across benchmarks like HumanEval, Codeforces, and Natural2Code. Metrics such as pass@1, pass@100, and test case pass rates were applied to ensure rigorous evaluation. By referencing 10 recent research papers, it validates the models' capabilities and relevance. However, the study also emphasises limitations, including computational costs and the need for human oversight due to occasional inaccuracies. Despite these challenges, the findings are robust, demonstrating how AI tools are reshaping the future of programming.You can learn more by reading the entire paper.🧠 Expert insight💥Here’s an excerpt from “Chapter 10: General Usage and Performance Tips” in the Pandas Cookbook - Third Edition by William Ayd and Matthew Harrison, published in October 2024.Use vectorized functions instead of loopsPython as a language is celebrated for its looping prowess. Whether you are working with a list or a dictionary, looping over an object in Python is a relatively easy task to perform, and can allow you to write really clean, concise code.Even though pandas is a Python library, those same looping constructs are ironically an impediment to writing idiomatic, performant code. In contrast to looping, pandas offersvectorized computations, i.e, computations that work with all of the elements contained within apd.Seriesbut which do not require you to explicitly loop.How to do itLet’s start with a simplepd.Seriesconstructed from a range:ser = pd.Series(range(100_000), dtype=pd.Int64Dtype())We could use the built-inpd.Series.summethod to easily calculate the summation:ser.sum()4999950000Looping over thepd.Seriesand accumulating your own result will yield the same number:result = 0for x in ser: result += xresult4999950000Yet the two code samples are nothing alike. Withpd.Series.sum, pandas performs the summation of elements in a lower-level language like C, avoiding any interaction with the Python runtime. In pandas speak, we would refer to this as avectorizedfunction.By contrast, theforloop is handled by the Python runtime, and as you may or may not be aware, Python is a much slower language than C.To put some tangible numbers forth, we can run a simple timing benchmark using Python’stimeitmodule. Let’s start withpd.Series.sum:timeit.timeit(ser.sum, number=1000)0.04479526499926578Let’s compare that to the Python loop:def loop_sum(): result = 0 for x in ser: result += xtimeit.timeit(loop_sum, number=1000)5.392715779991704That’s a huge slowdown with the loop!Generally, you should look to use the built-in vectorized functions of pandas for most of your analysis needs. For more complex applications, reach for the.agg,.transform,.map, and.applymethods, which were covered back inChapter 5, Algorithms and How to Apply Them. You should be able to avoid usingforloops in 99.99% of your analyses; if you find yourself using them more often, you should rethink your design, more than likely after a thorough re-read ofChapter 5, Algorithms and How to Apply Them.The one exception to this rule where it may make sense to use aforloop is when dealing with apd.GroupByobject, which can be efficiently iterated like a dictionary:df = pd.DataFrame({ "column": ["a", "a", "b", "a", "b"], "value": [0, 1, 2, 4, 8],})df = df.convert_dtypes(dtype_backend="numpy_nullable")for label, group in df.groupby("column"): print(f"The group for label {label} is:\n{group}\n")The group for label a is:column value0 a 01 a 13 a 4The group for label b is:column value2 b 24 b 8Pandas Cookbook - Third Edition was published in October 2024.Get the eBook for $39.99 $27.98Get the Print Book for $49.99And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, take the survey or just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
5177

Divya Anne Selvaraj

12 Nov 2024

PythonPro #55: Python Project Packaging Guide, AWS Credentials Theft Alert, and PyTorch 2 Speeds Up ML

Divya Anne Selvaraj

12 Nov 2024

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#55Python Project Packaging Guide, AWS Credentials Theft Alert, and PyTorch 2 Speeds Up MLHi ,Welcome to a brand new issue of PythonPro!In today’sExpert Insight we bring you an excerpt from the recently published book, LLM Engineer's Handbook, which discusses comprehensive RAG evaluation through the Ragas and ARES frameworks.News Highlights: Malicious Python package "Fabrice" on PyPI has been stealing AWS credentials by mimicking Fabric; and PyTorch 2 boosts ML speeds with dynamic bytecode transformation, achieving 2.27x inference and 1.41x training speedups on NVIDIA A100 GPUs.My top 5 picks from today’s learning resources:A Comprehensive Guide to Python Project Management and Packaging📦20 Python scripts to automate common daily tasks🔄Principal Component Analysis with Python (A Deep Dive) -Part 1📉State of Python 3.13 Performance: Free-Threading🚀How I Got Started Making Maps with Python and SQL🗺️And, today’s Featured Study, introduces Magentic-One, a generalist multi-agent AI system developed by Microsoft Research, designed to coordinate specialised agents in tackling complex, multi-step tasks across diverse applications.Stay awesome!Divya Anne SelvarajEditor-in-ChiefP.S.:This month'ssurvey is now live. Do take the opportunity to leave us your feedback, request a learning resource, and earn your one Packt credit for this month.Sign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsMalicious Python package collects AWS credentials via 37,000 downloads: The malicious Python package "Fabrice" has been typosquatting on PyPI since 2021, gathering AWS credentials by imitating the popular Fabric SSH library.🎥PyTorch 2 Speeds Up ML with Dynamic Python Bytecode Transformation and Graph Compilation: Fully backward-compatible, the version achieves a 2.27x inference speedup and 1.41x training speedup on NVIDIA A100 GPUs, surpassing six other compilers across 180+ models.💼Case Studies and Experiments🔬LangGraph for Healthcare: A Comprehensive Technical Guide: Delves into the technical aspects of integrating LangGraph into healthcare workflows, emphasizing its potential to improve patient care, streamline administrative tasks, and facilitate medical researchTiming-sensitive analysis in python: explores timing sensitivity in Python through a hands-on approach, testing timing stability under different conditions (like input size and intentional delay).📊AnalysisState of Python 3.13 Performance: Free-Threading: Examines how free-threading affects Python's ability to handle parallel tasks, particularly through a practical example with a PageRank algorithm implementation.Modular, Faster DateOnly Operations in Python: Delves into the reasoning behind memory-efficient, modular date handling and performance optimization, examining different approaches and technologies like C structures.🎓Tutorials and Guides🤓How to Reset a pandas DataFrame Index:Covers methods like .reset_index(), .index, and .set_axis() while exploring index alignment, duplicate removal, multi-index handling, and using columns as indexes.A Comprehensive Guide to Python Project Management and Packaging: Covers structuring projects, managing dependencies, and creating distributable packages, along with practical examples.Effortless Image Cropping with Python: Automate Your Workflow in Minute: Shows you how to automate image cropping in Python using OpenCV to identify the focal area of an image and crop it to a 16:9 aspect ratio.Adding keyboard shortcuts to the Python REPL: Explains how to add custom keyboard shortcuts to the Python 3.13 REPL using a PYTHONSTARTUP file and the unsupported _pyrepl module.Simplifying News Scraping with Python’s Newspaper4k Library: Demonstrates how to use Python’s Newspaper4k library to automate the scraping and parsing of news articles, extracting key components.Principal Component Analysis with Python (A Deep Dive) -Part 1: Provides a step-by-step guide for reducing data dimensionality through mathematical and coding examples.Using the OpenAI Reatime API in python: Covers key challenges such as managing 24kHz, 16-bit audio format, avoiding jerky audio by handling recording and playback concurrently, and preventing echo by using a headset.🔑Best Practices and Advice🔏How to Fix the Most Common Python Coding Errors: Covers IndentationError, SyntaxError, TypeError (NoneType not subscriptable), IndexError (list index out of range), and KeyError (missing dictionary key).Do Constructors Return Values in OOP?: Clarifies that while constructors don’t explicitly return values, they implicitly return the newly created instance of the class, fulfilling their primary purpose of object initialization.20 Python scripts to automate common daily tasks: Lists scripts for tasks such as sending emails, scraping news, downloading stock prices, backing up files, posting to social media, fetching weather updates, and resizing images.What time is it? A simple approach to AI-agents: Explains how AI agents solve real-time queries by selecting and executing pre-defined functions, using tasks like fetching the current time and weather as examples.How I Got Started Making Maps with Python and SQL: Recounts the author’s journey into spatial data visualization using tools like DuckDB, H3, and GeoPandas to create interactive maps, from building density to 3D dashboards.🔍Featured Study: Magentic-One💥In "Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks," Fourney et al. from AI Frontiers - Microsoft Research aim to develop a versatile, multi-agent AI system capable of autonomously completing complex tasks. The study presents Magentic-One as a generalist solution that orchestrates specialised agents to tackle tasks that require planning, adaptability, and error recovery.ContextTo address the need for AI systems capable of handling a wide range of tasks, Magentic-One leverages a multi-agent architecture. In this setup, agents are AI-driven components, each with a distinct skill, such as web browsing or code execution, all working under the direction of an Orchestrator agent. The Orchestrator not only delegates tasks but monitors and revises strategies to keep progress on track, ensuring effective task completion. This system responds to the growing demand for agentic systems in AI—those able to handle tasks involving multiple steps, real-time problem-solving, and error correction.The importance of such systems has increased as AI technology advances in areas like software development, data analysis, and web-based research, where single-agent models often struggle with multi-step, unpredictable tasks. By developing Magentic-One as a generalist system, the researchers offer a foundation that balances adaptability and reliability across diverse applications, helping establish future standards for agentic AI systems.Key Features of Magentic-OneMulti-Agent Architecture: Magentic-One uses multiple agents, each specialising in a specific task, coordinated by an Orchestrator agent.Orchestrator-Led Dynamic Planning: The Orchestrator creates and adapts task plans, tracks progress, and initiates corrective actions as needed, improving resilience.Specialised Agents:Coder: Writes, analyses, and revises code.ComputerTerminal: Executes code, manages shell commands.WebSurfer: Browses the web, interacts with web pages.FileSurfer: Reads and navigates files of various types.Performance on Benchmarks: Magentic-One achieved high performance on challenging benchmarks like GAIA (38% completion rate) and AssistantBench (27.7% accuracy), positioning it competitively among state-of-the-art systems.AutoGenBench Tool for Evaluation: AutoGenBench offers a controlled testing environment, allowing for repeatable, consistent evaluation of agentic systems like Magentic-One.What This Means for YouThe study’s findings are particularly relevant for developers, researchers, and AI practitioners focused on real-world applications of AI for complex, multi-step tasks. For instance, fields such as autonomous software engineering, data management, and digital research can leverage Magentic-One's multi-agent system to automate complex workflows. Its modular, open-source design enables further adaptation, making it useful for those interested in customising AI tools to meet specific requirements or studying multi-agent coordination for diverse scenarios.Examining the DetailsThe researchers applied a rigorous methodology to assess Magentic-One's reliability and practical value. Key benchmarks included GAIA, AssistantBench, and WebArena, each with unique tasks requiring multi-step reasoning, data handling, and planning. To verify the system’s efficacy, Magentic-One’s performance was compared against established state-of-the-art systems. The study reports a 38% task completion rate on GAIA, positioning Magentic-One competitively among leading systems without modifying core agent capabilities.To analyse the system’s interactions and address limitations, the team examined errors in detail, identifying recurring issues such as repetitive actions and insufficient data validation. By tracking these errors and using AutoGenBench, an evaluation tool ensuring isolated test conditions, the researchers provided a clear, replicable performance baseline. Their approach underscores the importance of modularity in AI design, as Magentic-One's agents operated effectively without interfering with each other, demonstrating both reliability and extensibility.You can learn more by reading the entire paper or access the system here.🧠 Expert insight💥Here’s an excerpt from “Chapter 7: Evaluating LLMs” in the book, LLM Engineer's Handbookby Paul Iusztin and Maxime Labonne, published in October 2024.RAG evaluationWhile traditional LLM evaluation focuses on the model’s inherent capabilities, RAG evaluation requires a more comprehensive approach that considers both the model’s generative abilities and its interaction with external information sources.RAG systems combine the strengths ofLLMs with information retrieval mechanisms, allowing them to generate responses that are not only coherent and contextually appropriate but also grounded in up-to-date, externally sourced information. This makes RAG particularly valuable in fields where current and accurate information is crucial, such as news reporting, research, and customer support.The evaluation of RAG systems goes beyond assessing a standalone LLM. It requires examining the entire system’s performance, including:Retrieval accuracy: How well does the system fetch relevant information?Integration quality: How effectively is the retrieved information incorporated into the generated response?Factuality and relevance: Does the final output address the query appropriately while seamlessly blending retrieved and generated content?Key metrics for RAG evaluation include retrieval precision and recall, which measure the accuracy and comprehensiveness of the retrieved information. Additionally, the quality of integration between retrieved data and generated text is crucial, as is the overall factuality and coherence of the output.To illustrate how these metrics are applied in practice, consider a RAG system designed for a customer support chatbot in an e-commerce setting. In this scenario, the user asks “What’s your return policy for laptops purchased during the holiday sale?” The RAG pipeline finds relevant documents on the electronics return policy and documents on holiday sale terms. This additional context is appended at the end of the question, and the model uses it to respond:For laptops purchased during our holiday sale, you have an extended return period of 60 days from the date of purchase. This is longer than our standard 30-day return policy for electronics. Please ensure the laptop is in its original packaging with all accessories to be eligible for a full refund.Table 7.3: Example of output from a RAG pipeline designed for customer supportIn this pipeline, we can evaluate if the retrieved documents correspond to what was expected (retrieval accuracy). We can also measure the difference between responses with and without additional context (integration quality). Finally, we can assess whether the output is relevant and grounded in the information provided by the documents (factuality and relevance).In this section, we will cover two methods to evaluate how well RAG models incorporate external information into their responses.RagasRetrieval-Augmented Generation Assessment(Ragas) is an open-source toolkit designed to provide developers with a comprehensive set of tools for RAG evaluation and optimization. It’s designed around the idea ofmetrics-driven development(MDD), a product development approach that relies on data to make well-informed decisions, involving the ongoing monitoring of essential metrics over time to gain valuable insights into an application’s performance. By embracing this methodology, Ragas enables developers to objectively assess their RAG systems, identify areas for improvement, and track the impact of changes over time.One of the key capabilities of Ragas is its ability to synthetically generate diverse and complex test datasets. This feature addresses a significant pain point in RAG development, as manually creating hundreds of questions, answers, and contexts is both time-consuming and labor-intensive. Instead, it uses an evolutionary approach paradigm inspired by works like Evol-Instruct to craft questions with varying characteristics such as reasoning complexity, conditional elements, and multi-context requirements. This approach ensures a comprehensive evaluation of different components within the RAG pipeline.Additionally, Ragas can generate conversational samples that simulate chat-based question-and-follow-up interactions, allowing developers to evaluate their systems in more realistic scenarios.Figure 7.1: Overview of the Ragas evaluation frameworkAs illustrated inFigure 7.1, Ragas provides a suite of LLM-assisted evaluation metrics designed to objectively measure different aspects of RAG system performance. These metrics include:Faithfulness: This metric measures the factual consistency of the generated answer against the given context. It works by breaking down the answer into individual claims and verifying if each claim can be inferred from the provided context. The faithfulness score is calculated as the ratio of verifiable claims to the total number of claims in the answer.Answer relevancy: This metric evaluates how pertinent the generated answer is to the given prompt. It uses an innovative approach where an LLM is prompted to generate multiple questions based on the answer and then calculates the mean cosine similarity between these generated questions and the original question. This method helps identify answers that may be factually correct but off-topic or incomplete.Context precision: This metric evaluates whether all the ground-truth relevant items present in the contexts are ranked appropriately. It considers the position of relevant information within the retrieved context, rewarding systems that place the most pertinent information at the top.Context recall: This metric measures the extent to which the retrieved context aligns with the annotated answer (ground truth). It analyzes each claim in the ground truth answer to determine whether it can be attributed to the retrieved context, providing insights into the completeness of the retrieved information.Finally, Ragas also provides building blocks for monitoring RAG quality in production environments. This facilitates continuous improvement of RAG systems. By leveraging the evaluation results from test datasets and insights gathered from production monitoring, developers can iteratively enhance their applications. This might involve fine-tuning retrieval algorithms, adjusting prompt engineering strategies, or optimizing the balance between retrieved context and LLM generation.Ragas can be complemented with another approach, based on custom classifiers.ARESARES (an automated evaluation framework for RAG systems) is a comprehensive tool designed to evaluate RAG systems. It offers an automated process that combines synthetic data generation with fine-tuned classifiers to assess various aspects of RAG performance, including context relevance, answer faithfulness, and answer relevance.The ARES framework operates in three main stages: synthetic data generation, classifier training, and RAG evaluation. Each stage is configurable, allowing users to tailor the evaluation process to their specific needs and datasets.In the synthetic data generation stage, ARES creates datasets that closely mimic real-world scenarios for robust RAG testing. Users can configure this process by specifying document file paths, few-shot prompt files, and output locations for the synthetic queries. The framework supports various pre-trained language models for this task, with the default being google/flan-t5-xxl. Users can control the number of documents sampled and other parameters to balance between comprehensive coverage and computational efficiency.Figure 7.2: Overview of the ARES evaluation frameworkThe classifier training stage involves creating high-precision classifiers to determine the relevance and faithfulness of RAG outputs. Users can specify the classification dataset (typically generated from the previous stage), test set for evaluation, label columns, and model choice. ARES uses microsoft/deberta-v3-large as the default model but supports other Hugging Face models. Training parameters such as the number of epochs, patience value for early stopping, and learning rate can be fine-tuned to optimize classifier performance.The final stage, RAG evaluation, leverages the trained classifiers and synthetic data to assess the RAG model’s performance. Users provide evaluation datasets, few-shot examples for guiding the evaluation, classifier checkpoints, and gold label paths. ARES supports various evaluation metrics and can generate confidence intervals for its assessments.ARES offers flexible model execution options, supporting both cloud-based and local runs through vLLM integration. The framework also supports various artifact types (code snippets, documents, HTML, images, and so on), enabling comprehensive evaluation across different RAG system outputs.In summary, Ragas and ARES complement each other through their distinct approaches to evaluation and dataset generation. Ragas’s strength in production monitoring and LLM-assisted metrics can be combined with ARES’s highly configurable evaluation process and classifier-based assessments. While Ragas may offer more nuanced evaluations based on LLM capabilities, ARES provides consistent and potentially faster evaluations once its classifiers are trained. Combining them offers a comprehensive evaluation framework, benefiting from quick iterations with Ragas and in-depth, customized evaluations with ARES at key stages.LLM Engineer's Handbook was published in October 2024.Get the eBook for $47.99Get the Print Book for $59.99And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, take the survey or just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
5868

Divya Anne Selvaraj

05 Nov 2024

PythonPro #54: Global Forecasting Models, Python Overtakes JavaScript, and Hidden Python Libraries

Divya Anne Selvaraj

05 Nov 2024

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#54Global Forecasting Models, Python Overtakes JavaScript, and Hidden Python LibrariesHi ,Welcome to a brand new issue of PythonPro!In today’sExpert Insight we bring you an excerpt from the recently published book, Modern Time Series Forecasting with Python - Second Edition, which explains the shift from traditional, isolated time series models to global forecasting models, which leverage related datasets to enhance scalability, accuracy, and reduce overfitting in large-scale applications.News Highlights: Python has overtaken JavaScript on GitHub, driven by its role in AI and data science, per GitHub's Octoverse 2024 report; and IBM’s Deep Search team has released Docling v2, a Python library for document extraction with models on Hugging Face.Hidden Python Libraries That Will Blow Your Mind🌟Python threading and subprocesses explained🧵Books are Datasets: Mapping 12 Sacred Texts with Python and D3.js📖Python Closures: Common Use Cases and Examples➿ChatGPT-4o cannot run proper Generalized Additive Models currently— but can correctly interpret results from R🧠And, today’s Featured Study, introducesSafePyScript, a machine-learning-based tool developed by researchers at the University of Passau, Germany, for detecting vulnerabilities in Python code.Stay awesome!Divya Anne SelvarajEditor-in-ChiefSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsPython has overtaken JavaScript on GitHub: GitHub’s Octoverse 2024 report reveals Python as the most popular language on GitHub, driven by its role in AI, data science, and machine learning. Jupyter Notebooks usage has also surged.Docling: Document extraction Python library from the Deep Search team at IBM: IBM’s Deep Search team released Docling v2, an MIT-licensed Python library for document extraction with custom models available on Hugging Face.💼Case Studies and Experiments🔬Programmed differently? Testing for gender differences in Python programming style and quality on GitHub: The study confirms that programming style can predict gender but these differences do not impact code quality.Tune your guitar with python: Demonstrates using Python’s sounddevice and matplotlib modules to create a real-time guitar tuner, where a live spectrogram identifies key bass guitar note frequencies for tuning, with a custom interface..📊AnalysisPackage compatibility tracker: Python 3.13 free-threading and subinterpreters: This compatibility tracker shows that 83% of the 500 most downloaded Python packages are compatible with Python 3.13’s new free-threading feature, while 73% support importing without GIL in Docker tests.Hidden Python Libraries That Will Blow Your Mind: Introduces six powerful yet lesser-known Python libraries including Streamlit for quick app-building, PyWhatKit for task automation, and Typer for simplified CLIs.🎓Tutorials and Guides🤓Python threading and subprocesses explained: Details Python’s threading and multiprocessing methods to handle parallel tasks, using thread/process pools for I/O-bound and CPU-bound tasks.Tiny GraphRAG in 1000 lines ofPython:introduces a simplified, local implementation of the GraphRAG algorithm, using a graph-based structure to enhance RAG for more contextually aware information retrieval.Building AI chatbots with marimo: Covers how to create adaptable chatbots capable of sophisticated tasks, including visualizing data and processing diverse inputs.Explanation of Vision Transformer with implementation: Provides an in-depth explanation and step-by-step implementation of Vision Transformer, covering key concepts such as embedding patches with code examples.PyD-Net: Advancing Real-Time Depth Estimation for Mobile and Embedded Devices: Showcases PyD-Net's practical applications across autonomous navigation, augmented reality, assistive technology, and design.Explore Solvable and Unsolvable Equations with Python: Delves into solving equations in Python, discussing when closed-form solutions are feasible and when numerical methods become necessary.Books are Datasets: Mapping 12 Sacred Texts with Python and D3.js: Explores using Python and D3.js to analyze and visualize 12 major sacred texts as datasets, showcasing text-processing techniques to reveal connections and patterns within and between these religious texts.🔑Best Practices and Advice🔏Variables in Python: Usage and Best Practices: Covers variable creation, dynamic typing, expressions, and best practices for naming and using variables in various scopes along with parallel assignments, and iterable unpacking.The Python Square Root Function: Details Python’s sqrt() function from the math module, explaining its use for calculating square roots of positive numbers and zero, while raising errors for negative inputs.Python Closures: Common Use Cases and Examples: Explains Python closures, inner functions that capture variables from their surrounding scope, enabling state retention, function-based decorators, and encapsulation.Python ellipses considered harmful: Argues that using ellipses (... ) to declare unimplemented methods in Python’s abstract classes can lead to hidden errors, and advocates for raise NotImplementedError instead.ChatGPT-4o cannot run proper Generalized Additive Models currently— but it can correctly interpret results from R: Highlights limitations of ChatGPT-4o in advanced statistical modeling, informing Python users about workarounds and considerations when working with similar tools.🔍Featured Study: SafePyScript💥In "SafePyScript: A Web-Based Solution for Machine Learning-Driven Vulnerability Detection in Python," Farasat et al., researchers from the University of Passau,Germany, introduce SafePyScript, a machine-learning-based web tool designed to detect vulnerabilities in Python code.ContextIn software development, identifying vulnerabilities is a major concern due to the security risks posed by cyberattacks. Vulnerabilities, or flaws in code that can be exploited by attackers, require constant detection and correction. Traditionally, vulnerability detection relies on:>Static Analysis: This rule-based approach scans code for known vulnerability patterns but often results in high false positives.>Dynamic Analysis (Penetration Testing): This approach tests code in a runtime environment, relying on security experts to simulate potential attacks, making it resource-heavy and dependent on professional expertise.Machine learning offers a data-driven alternative, enabling automated vulnerability detection with improved accuracy. Despite its popularity, Python lacks dedicated machine-learning-based tools for this purpose, which SafePyScript aims to provide. SafePyScript leverages a specific machine learning model, BiLSTM (Bidirectional Long Short-Term Memory), and the ChatGPT API to not only detect but also propose secure code, addressing this gap for Python developers.Key Features of SafePyScriptBiLSTM Model for Vulnerability Detection: Trained on word2vec embeddings, this model has achieved an accuracy of 98.6% and ROC of 99.3% for Python code vulnerabilities.Integration with ChatGPT API: SafePyScript uses ChatGPT (Turbo 3.5) to analyse and generate secure alternatives for vulnerable code.Common Vulnerabilities Addressed: These include SQL Injection, Cross-Site Scripting (XSS), Remote Code Execution, Cross-Site Request Forgery (XSRF), and Open Redirect.User-Friendly Interface: Built using Django (backend) and HTML, CSS, and JavaScript with Ajax (frontend) for a responsive, accessible user experience.Report Generation: Users can download detailed reports on vulnerabilities detected in their code, making it easier to track and resolve issues systematically.Feedback Mechanism: Users can provide feedback, allowing for tool improvement and adaptation to new security threats.What This Means for YouSafePyScript is most useful for Python developers and software engineers who need an efficient way to detect vulnerabilities in their code without relying on traditional, labour-intensive methods. Its machine-learning foundation and integration with ChatGPT make it highly practical for real-world application, providing not only insights into code vulnerabilities but also generating secure code alternatives.Examining the DetailsSafePyScript’s effectiveness rests on a robust BiLSTM model. This model, using word2vec embeddings, achieved an impressive 98.6% accuracy, 96.2% precision, and 99.3% ROC in vulnerability detection. The researchers optimised the BiLSTM’s hyperparameters—such as a learning rate of 0.001 and a batch size of 128—through rigorous testing, achieving reliable results as benchmarks.Additionally, SafePyScript leverages ChatGPT’s language model to generate secure code alternatives. The research team implemented precise prompt engineering to maximise ChatGPT’s effectiveness in analysing Python code vulnerabilities, further supporting the tool’s usability.SafePyScript’s frontend design, built with HTML, CSS, JavaScript (with Ajax), and a Django backend, ensures a smooth user experience. This structure allows developers to log in, upload or import code, select detection models, review reports, and access secure code—all within an intuitive, accessible platform.You can learn more by reading the entire paper or accessing SafePyScript.🧠 Expert insight💥Here’s an excerpt from “Chapter 6: Time Series Forecasting as Regression” in the book, Modern Time Series Forecasting with Python - Second Edition by Manu Joseph and Jeffrey Tackes, published in October 2024.Global forecasting models – a paradigm shiftTraditionally, each time series was treated in isolation. Because of that, traditional forecasting has always looked at the history of a single time series alone in fitting a forecasting function. But recently, because of the ease of collecting data in today's digital-first world, many companies have started collecting large amounts of time series from similar sources, or related time series.For example, retailers such as Walmart collect data on sales of millions of products across thousands of stores. Companies such as Uber or Lyft collect the demand for rides from all the zones in a city. In the energy sector, energy consumption data is collected across all consumers. All these sets of time series have shared behavior and are hence calledrelated time series.We can consider that all the time series in a related time series come from separatedata generating processes(DGPs), and thereby model them all separately. We call these thelocalmodels of forecasting. An alternative to this approach is to assume that all the time series are coming from a single DGP. Instead of fitting a separate forecast function for each time series individually, we fit a single forecast function to all the related time series. This approach has been calledglobalorcross-learningin literature.The terminologyglobalwas introduced byDavid Salinas et al.in theDeepARpaper andCross-learningbySlawek Smyl....having more data will lead to lower chances of overfitting and, therefore, lower generalization error (the difference between training and testing errors). This is exactly one of the shortcomings of the local approach. Traditionally, time series are not very long, and in many cases, it is difficult and time-consuming to collect more data as well. Fitting a machine learning model (with all its expressiveness) on small data is prone to overfitting. This is why time series models that enforce strong priors were used to forecast such time series, traditionally. But these strong priors, which restrict the fitting of traditional time series models, can also lead to a form of underfitting and limit accuracy.Strong and expressive data-driven models, as in machine learning, require a larger amount of data to have a model that generalizes to new and unseen data. A time series, by definition, is tied to time, and sometimes, collecting more data means waiting for months or years and that is not desirable. So, if we cannot increase thelengthof the time-series dataset, we can increase thewidthof the time series dataset. If we add multiple time series to the dataset, we increase the width of the dataset, and there by increase the amount of data the model is getting trained with.Figure 5.7shows the concept of increasing the width of a time series dataset visually:Figure 5.7 – The length and width of a time series datasetThis works in favor of machine learning models because with higher flexibility in fitting a forecast function and the addition of more data to work with, the machine learning model can learn a more complex forecast function than traditional time series models, which are typically shared between the related time series, in a completely data-driven way.Another shortcoming of the local approach revolves around scalability. In the case of Walmart we mentioned earlier, there are millions of time series that need to be forecasted and it is not possible to have human oversight on all these models. If we think about this from an engineering perspective, training and maintaining millions of models in a production system would give any engineer a nightmare. But under the global approach, we only train a single model for all these time series, which drastically reduces the number of models we need to maintain and yet can generate all the required forecasts.This new paradigm of forecasting has gained traction and has consistently been shown to improve the local approaches in multiple time series competitions, mostly in datasets of related time series. In Kaggle competitions, such asRossman Store Sales(2015),Wikipedia WebTraffic Time Series Forecasting(2017),Corporación Favorita Grocery Sales Forecasting(2018), andM5 Competition(2020), the winning entries were all global models—either machine learning or deep learning or a combination of both. TheIntermarché Forecasting Competition(2021) also had global models as the winning submissions. Links to these competitions are provided in theFurther readingsection.Although we have many empirical findings where the global models have outperformed local models for related time series, global models are still a relatively new area of research.Montero-Manson and Hyndman(2020) showed a few very interesting results and showed that any local method can be approximated by a global model with required complexity, and the most interesting finding they put forward is that the global model will perform better, even with unrelated time series. We will talk more about global models and strategies for global models in Chapter 10,Global Forecasting Models.Modern Time Series Forecasting with Python - Second Edition was published in October 2024.Get the eBook for $46.99 $31.99!Get the Print Book for $57.99!And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
4644

Divya Anne Selvaraj

29 Oct 2024

PythonPro #53: FastAPI on Docker, Python-CUDA Integration with Numbast, and Concurrent Requests with httpx vs aiohttp

Divya Anne Selvaraj

29 Oct 2024

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#53FastAPI on Docker, Python-CUDA Integration with Numbast, and Concurrent Requests with httpx vs aiohttpHi ,Welcome to a brand new issue of PythonPro!In today’sExpert Insight we bring you an excerpt from the recently published book, FastAPI Cookbook, which explains how to deploy FastAPI apps using Docker, covering Dockerfile creation, image building, and container generation.News Highlights: Numbast simplifies Python-CUDA C++ integration by auto-generating Numba bindings for CUDA functions; and DJ Beat Drop enhances Django’s new developer onboarding with a streamlined project initializer.Concurrent Requests in Python: httpx vs aiohttp🚦Python Thread Safety: Using a Lock and Other Techniques🔐Time-Series Data Meets Blockchain: Storing Time-Series Data with Solidity, Ganache and Python⛓️Let's Eliminate General Bewilderment • Python's LEGB Rule, Scope, and Namespaces🧩Optimization of Iceberg Table In AWS Glue🧊And, today’s Featured Study, introduces LSS-SKAN, a Kolmogorov–Arnold Network (KAN) variant that uses a single-parameter function (Shifted Softplus) for efficient accuracy and speed.Stay awesome!Divya Anne SelvarajEditor-in-ChiefP.S.:Thank you to those who participated in this month's survey. With this issue, we have tried to fulfill at least one request made by each participant. Keep an eye out for next month's survey.Sign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsBridging the CUDA C++ Ecosystem and Python Developers with Numbast: Numbast streamlines the integration of CUDA C++ libraries with Python by automatically generating Numba bindings for CUDA functions.Improving the New Django Developer Experience: Introduces DJ Beat Drop as a streamlined project initializer to improve the onboarding experience for new Django developers.💼Case Studies and Experiments🔬Concurrent Requests in Python: httpx vs aiohttp: Describes how switching from the httpx to aiohttp library resolved high-concurrency issues and improved stability in a computer vision application.From Python to CPU instructions: Part 1: Explains how rewriting a Python program in C exposes low-level details Python abstracts away, particularly highlighting the manual effort required for tasks like input handling.📊AnalysisPython 3.13, what didn't make the headlines: highlights Python 3.13's understated but impactful improvements, focusing on debugging enhancements, filesystem fixes, and minor concurrency updates.When should you upgrade to Python 3.13?: Advises waiting until December 2024 for Python 3.13 upgrades to ensure compatibility with libraries, tools, and bug-fix improvements.🎓Tutorials and Guides🤓Python Thread Safety: Using a Lock and Other Techniques: Explains how to address issues like race conditions and introduces synchronization techniques such as semaphores to ensure safe, concurrent code execution.Time-Series Data Meets Blockchain: Storing Time-Series Data with Solidity, Ganache and Python: Walks you through the steps to set up Ethereum locally, deploy a smart contract, and store and retrieve data points.Beautiful Soup: Build a Web Scraper With Python: Covers how to inspect site structure, scrape HTML content, and parse data using Requests and Beautiful Soup to build a script that extracts and displays job listings.🎥Advanced Web Scraping Tutorial! (w/ Python Beautiful Soup Library): Covers Requests to retrieve and parse data, especially from dynamic pages like Walmart's, with enhancements like using modified headers.Fuzzy regex matching in Python: Introduces the orc library to simplify fuzzy matching by providing a human-friendly interface that highlights edits and can invert changes, enhancing usability for complex text correction tasks.Achieving Symmetrical ManyToMany Filtering in Django Admin: Covers using Django's RelatedFieldWidgetWrapper and a custom ModelForm , allowing for consistent filtering on both sides of a ManyToMany relationship.Get started with the free-threaded build of Python 3.13: Details installation, usage in Python programs, compatibility with C extensions, and how to detect GIL status programmatically.🔑Best Practices and Advice🔏Let's Eliminate General Bewilderment • Python's LEGB Rule, Scope, and Namespaces: Details how variables are resolved in local, enclosing, global, and built-in scopes, using accessible examples to clarify potential pitfalls.🎥Robust LLM pipelines (Mathematica, Python, Raku): Given the unreliable and often slow nature of LLMs, this presentation outlines methods to enhance pipeline efficiency, robustness, and usability.A new way of Python Debugging with the Frame Evaluation API: Introduces Python's Frame Evaluation API, a tool that allows real-time monitoring and control of program execution at the frame level.Buffers on the edge: Python and Rust: Explains how Python's buffer protocol, which enables memory sharing between objects, can lead to undefined behavior due to data races in C, and the challenges Rust faces in maintaining soundness.Optimization of Iceberg Table In AWS Glue: Discusses how AWS Glue offers built-in optimization, but a Python-based solution using boto3 and Athena SQL scripts provides customizable, cost-effective automation.🔍Featured Study: LSS-SKAN💥In "LSS-SKAN: Efficient Kolmogorov–Arnold Networks based on Single-Parameterized Function," Chen and Zhang from South China University of Technology present a refined Kolmogorov–Arnold Network (KAN) variant. Their study introduces an innovative design principle for neural networks, improving accuracy and computational speed while ensuring greater model interpretability.ContextKANs are neural networks based on the Kolmogorov-Arnold theorem, which breaks down complex, multivariate functions into simpler univariate ones, aiding in better visualisation and interpretability. This makes them valuable in critical decision-making applications, where understanding a model's decision process is crucial. Unlike typical neural networks like Multilayer Perceptrons (MLPs), which rely on opaque linear and activation functions, KANs assign functions to network edges, creating a more interpretable structure. Over time, several KAN variants, such as FourierKAN and FastKAN, have emerged, each with unique basis functions to balance speed and accuracy.LSS-SKAN builds on these advancements with the Efficient KAN Expansion (EKE) Principle, a new approach that scales networks using fewer complex basis functions, allocating parameters to the network's size instead. This principle is central to LSS-SKAN's efficiency and demonstrates how a simpler basis function can yield high accuracy with reduced computational cost.Key Features of LSS-SKANEKE Principle: Scales the network by prioritising size over basis function complexity, making LSS-SKAN faster and more efficient.Single-Parameter Basis Function: Utilises the Shifted Softplus function, requiring only one learnable parameter for each function, which simplifies the network and reduces training time.Superior Accuracy: Outperforms KAN variants, showing a 1.65% improvement over Spl-KAN, 2.57% over FastKAN, 0.58% over FourierKAN, and 0.22% over WavKAN on the MNIST dataset.Reduced Training Time: Achieves significant reductions in training time, running 502.89% faster than MLP+rKAN and 41.78% faster than MLP+fKAN.What This Means for YouFor those working in machine learning or fields requiring interpretable AI, LSS-SKAN offers a practical solution to enhance neural network accuracy and speed while maintaining transparency in model decision-making. LSS-SKAN is particularly beneficial in applications involving image classification, scientific computing, or scenarios demanding high interpretability, such as medical or financial sectors where model explainability is crucial.Examining the DetailsThe researchers conducted detailed experiments using the MNIST dataset to measure LSS-SKAN’s performance against other KAN variants. They tested both short-term (10-epoch) and long-term (30-epoch) training cycles, focusing on two key metrics: accuracy and execution speed.Through these tests, LSS-SKAN consistently outperformed other KAN models in accuracy, achieving a 1.65% improvement over Spl-KAN, 2.57% over FastKAN, and 0.58% over FourierKAN, while also running 502.89% faster than MLP+rKAN and 41.78% faster than MLP+fKAN.The LSS-SKAN Python library is available on GitHub, along with experimental code, so you can replicate and build on their findings. They recommend a learning rate between 0.0001 and 0.001 for best results, particularly due to KANs’ sensitivity to learning rate adjustments.You can learn more by reading the entire paper and accessing LSS-SKAN.🧠 Expert insight💥Here’s an excerpt from “Chapter 12: Deploying and Managing FastAPI Applications” in the book, FastAPI Cookbook by Giunio De Luca, published in August 2024.Running FastAPI applications in Docker containersDockeris a useful tool that lets developers wrap applications with their dependencies into a container. This method makes sure that the application operates reliably in different environments, avoiding the commonworks on my machine issue. In this recipe, we will see how to make aDockerfile and run a FastAPI application inside a Docker container. By the end of this guide, you will know how to put your FastAPI application into a container, making it more flexible and simpler to deploy.Getting readyYou will benefit from some knowledge of container technology, especially Docker, to follow the recipe better. But first, check thatDocker Engineis set up properly on your machine. You can see how to do it at thislink:https://docs.docker.com/engine/install/.If you use Windows, it is better to installDocker Desktop, which is a Docker virtual machine distribution with a built-ingraphical interface.Whether you have Docker Engine or Docker Desktop, make sure the daemon is running by typingthis command:$ docker imagesIf you don’t see any error about the daemon, that means that Docker is installed and working on the machine. The way to start the Docker daemon depends on the installation you choose. Look at the related documentation to see how todo it.You can use the recipe for your applications or follow along with theLive Applicationapplication that we introduced in the first recipe, which we are using throughoutthe chapter.How to do it…It is not very complicated to run a simple FastAPI application in a Docker container. The process consists ofthree steps:Createthe Dockerfile.Buildthe image.Generatethe container.Then, you just have to run the container to have theapplication working.Creating the DockerfileThe Dockerfile contains the instructions needed to build the image from an operating system and the file we wantto specify.It is good practice to create a separate Dockerfile for the development environment. We will name itDockerfile.devand place it under the projectroot folder.We start the file by specifying the base image, which will beas follows:FROM python:3.10This will pull an image from the Docker Hub, which already comes with Python 3.10 integrated. Then, we create a folder called/codethat will hostour code:WORKDIR /codeNext, we copyrequirements.txtinto the image and install the packages insidethe image:COPY ./requirements.txt /code/requirements.txtRUN pip install --no-cache-dir -r /code/requirements.txtThepip installcommand runs with the--no-cache-dirparameter to avoidpipcaching operations that wouldn’t be beneficial inside a container. Also, in a production environment, for larger applications, it is recommended to pin fixed versions of the packages inrequirements.txtto avoid potential compatibility issues due topackage upgrades.Then, we can copy theappfolder containing the application into the image with thefollowing command:COPY ./app /code/appFinally, we define the server startup instructionas follows:CMD ["fastapi", "run", "app/main.py", "--port", "80"]This is all we need to create ourDockerfile.devfile.Building the imageOnce we haveDockerfile.dev, we can build the image. We can do it by running the following from the command line at the project rootfolder level:$ docker build -f Dockerfile.dev -t live-application .Since we named our DockerfileDockerfile.dev, we should specify it in an argument. Once the build is finished, you can check that the image has been correctly built by runningthe following:$ docker images live-applicationYou should see the details of the image on the output printlike this:REPOSITORY TAG IMAGE ID CREATED SIZElive-application latest 7ada80a535c2 43 seconds ago 1.06GBWith the image built, we can proceed with creating thecontainer creation.Creating the containerTo create the container and run it; simply runthe following:$ docker run -p 8000:80 live-applicationThis will create the container and run it. We can see the container by runningthe following:$ docker ps -aSince we didn’t specify a container name, it will automatically affect a fancy name. Mine, for example,isbold_robinson.Open the browser onhttp://localhost:8000and you will see the home page response ofour application.This is all you need to run a FastAPI application inside a Docker container. Running a FastAPI application in a Docker container is a great way to use the advantages of both technologies. You can easily scale, update, and deploy your web app withminimal configuration.See alsoThe Dockerfile can be used to specify several features of the image. Check the list of commands in the official documentation:Dockerfilereference:https://docs.docker.com/reference/dockerfile/Docker CLI documentation:https://docs.docker.com/reference/cli/docker/FastAPI in Containers - Docker:https://fastapi.tiangolo.com/deployment/docker/FastAPI Cookbook was published in August 2024.Get the eBook for $35.99 $24.99!Get the Print Book for $44.99 $30.99!And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
5997

Divya Anne Selvaraj

22 Oct 2024

PythonPro #52: AI-Powered Vulnhuntr for Python, SageMaker Core SDK, and Exploring User Behaviour with Python

Divya Anne Selvaraj

22 Oct 2024

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#52AI-Powered Vulnhuntr for Python, SageMaker Core SDK, and Exploring User Behaviour with PythonHi ,Welcome to a brand new issue of PythonPro!In today’sExpert Insight we bring you an excerpt from the recently published book, Building AI Applications with OpenAI APIs - Second Edition, which discusses how to create a language translation desktop app using OpenAI's ChatGPT API and Microsoft Word.News Highlights: Protect AI to release Vulnhuntr, an AI tool for detecting Python zero-day vulnerabilities; Amazon launches SageMaker Core, a Python SDK simplifying machine learning with object-oriented interfaces; and PyCharm becomes the official IDE of OpenCV as JetBrains joins as a Silver Member.Comprehensive Python Cheatsheet📚Exploring User Behavior: A Python Case Study of Bike-Sharing Company Dataset🚴‍♂️Python's property(): Add Managed Attributes to Your Classes🔧Python approach to the Semantic Web: exploring linked data and RDF🌐Assert vs. Raise: When to Use Each in Your ML/AI Projects⚠️And, today’s Featured Study, presents ChangeGuard, a tool designed to compare code behaviour before and after changes to detect functionality modifications.Stay awesome!Divya Anne SelvarajEditor-in-ChiefP.S.:This month's survey is still live, do take the opportunity to leave us your feedback, request a learning resource, and earn your one Packt credit for this month.Sign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsOpen source LLM tool primed to sniff out Python zero-days: Researchers with Seattle-based Protect AI will soon release Vulnhuntr, an AI-powered open-source tool that uses Claude AI to detect zero-day vulnerabilities in Python codebases by analyzing entire call chains for security issues.Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker: The SDK will simplify the machine learning lifecycle by replacing complex JSON structures with object-oriented interfaces.Press Release: PyCharm Becomes Official IDE of OpenCV, JetBrains Joins as Silver Member: As a Silver Member, JetBrains will financially support OpenCV, ensuring its resources remain free.💼Case Studies and Experiments🔬Part 2: Data Quality Dashboard: A Visual Approach to Monitoring Expectations in Databricks: Explains how to quickly identify issues using graphical representations like pie charts and bar charts.Exploring User Behavior: A Python Case Study of Bike-Sharing Company Dataset: UsesPython to uncover user behaviour patterns and develop strategies to convert casual riders into annual members.📊Analysis🎥Russell Keith-Magee on Beeware, packaging, GUI & money in Python: Focuses on the challenges of cross-platform Python packaging, particularly for desktop and mobile platforms and discusses how BeeWare helps developers.Should you use uv’s managed Python in production?: Advises careful consideration of uv’s production readiness, noting recent improvements but recommending thorough evaluation based on project-specific risks.🎓Tutorials and Guides🤓Python's property(): Add Managed Attributes to Your Classes: Covers creating read-only, read-write, and computed properties, logging, and more, while maintaining a stable public API for your classes.A Multi-Agent AI Chatbot App using Databutton and Swarm: Explains how different agents can collaborate and hand off tasks, with an example of a multi-agent healthcare chatbot that connects users to specialized agents.Understanding Pluggable Authentication Module (PAM) and Creating a Custom One in Python: Covers PAM’s architecture, module stacks, and control flags and walks you through building and integrating a custom PAM.Python approach to the Semantic Web: exploring linked data and RDF: Covers creating RDF triples, querying SPARQL endpoints, and visualizing relationships using NetworkX.Understanding Web Scraping in Python and Scrapy: Explains what web scraping is, its significance, and the tools required, such as BeautifulSoup, Requests, and Scrapy.🎥A hand-holding guide to writing FUSE-based filesystems in Python: Covers the process of creating Python-based FUSE file systems, from basic functionality to more advanced features like file attributes.Adding syntax to the cpython interpreter: Demonstrates how to add new syntax to Python, specifically making ternary statements default to None when no else condition is provided, similar to Ruby.🔑Best Practices and Advice🔏What I Learned from Making the Python Backend for YouTube Transcript Optimizer: Explains the process of building the Python backend for a YouTube Transcript Optimizer using FastAPI and SQLmodel.Comprehensive Python Cheatsheet: An extensive resource covering a wide array of Python topics, including syntax, data structures, and advanced concepts.How to Use Lambda Functions in Python: Covers their syntax, common use cases with functions like map(), filter(), and sorted(), along with advantages, limitations, and best practices for effective use in simplifying code.Assert vs. Raise: When to Use Each in Your ML/AI Projects: Discusses when to use assert for internal checks during development and raise for handling user-facing errors in ML/AI projects to ensure robust error handling.Structural Pattern Matching in Python: Explores customizing pattern matching for classes, extracting nested data, and common limitations in Python’s implementation.🔍Featured Study: ChangeGuard - Validating Code Changes via Pairwise Learning-Guided Execution💥In "ChangeGuard: Validating Code Changes via Pairwise Learning-Guided Execution," Gröninger et al. present a tool called ChangeGuard, which compares code behaviour before and after changes to determine whether the modifications alter functionality.ContextValidating whether code changes preserve intended behaviour is a key challenge in software development, particularly when changes are deep within complex projects. Developers may make modifications to improve readability, performance, or to fix bugs, but unintended changes in functionality can lead to errors. Current methods, such as regression testing, often fail to catch these subtle changes. This study is relevant because it introduces a more reliable approach—ChangeGuard, which uses pairwise learning-guided execution. This approach involves running two versions of a code snippet simultaneously and predicting values to ensure the code runs correctly, even in complex scenarios.Key Featured of ChangeGuardPairwise learning-guided execution: Simultaneously executes old and new versions of code to compare their runtime behaviour.Value injection: Predicts and injects missing or uninitialised values, ensuring the code executes smoothly and reaches all relevant paths.High precision and recall: Achieves 77.1% precision and 69.5% recall in identifying behaviour-altering code changes.Extensive evaluation: Tested on 224 manually annotated code changes and datasets generated by automated refactoring tools.Outperforms regression tests: Traditional regression tests only achieved 7.6% recall in identifying semantics-changing code modifications.What This Means for YouThis paper will be most useful for software developers, especially those working with large and complex codebases. It provides practical insights into validating code changes more effectively than existing methods, offering a way to catch unintended behaviour early in the development process. Developers using automated refactoring tools or large language models like GPT-4 will particularly benefit from ChangeGuard's ability to detect subtle, behaviour-altering modifications.Examining the DetailsChangeGuard's methodology is based on pairwise learning-guided execution, an extension of an existing technique. It predicts missing values dynamically, ensuring more execution paths are covered than previous approaches. The tool was evaluated on 224 annotated code changes from popular Python open-source projects, showing high accuracy in detecting semantics changes. Additionally, ChangeGuard was applied to automated refactoring tools and large language models like GPT-3.5 and GPT-4, where it found 87 out of 187 and 143 out of 258 code changes to unexpectedly alter behaviour. This comprehensive testing provides strong evidence for ChangeGuard's reliability and robustness.You can learn more by reading the entire paper and accessing ChangeGuard.🧠 Expert insight💥Here’s an excerpt from “Chapter 6: Language Translation Desktop App with the ChatGPT API and Microsoft Word” in the book, Building AI Applications with OpenAI APIs - Second Edition by Martin Yanev, published in October 2024.Integrating the ChatGPT API with Microsoft OfficeIn this section, we will explore how to set up our project and install thedocxPython library to extract text fromWorddocuments. Thedocx library is a Python package that allows us to read and writeMicrosoft Word (.docx) files and provides a convenient interface to access information stored inthese files.The first step is to initiate your work by creating a new directory calledTranslation Appand loading it with VSCode. This will enable you to have a dedicated area to craft and systematize your translation app code. Activate your virtual environment from the terminal window following the steps outlined inChapter 1,Getting Started with the ChatGPT API forNLP Tasks.To run the language translation desktop app, you will need to install thefollowing libraries:openai: Theopenailibrary allows you to interact with the OpenAI API and perform variousNLP tasksdocx: Thedocxlibrary allows you to read and write Microsoft Word.docxfilesusing Pythontkinter: Thetkinterlibrary is a built-in Python library that allows you to createGraphical User Interfaces(GUIs) for yourdesktop appAstkinteris a built-in library, there is no need for installation since it already exists within your Python environment. To install theopenaianddocxlibraries, access the VSCode terminal, and then execute thefollowing commands:pip install openaipip install python-docxTo access and read the contents of a Word document, you will need to create a sample Word file inside your project. Here are the steps to create a newWord file:In your project, right-click on the project directory, selectNew Folder, and nameitfiles.Right-click on thefilesfolder and selectNew File.In the edit field that appears, enter a filename with the.docxextension – forexample,info.docx.Press theEnterkey to createthe file.Once the file is created, open it usingMicrosoft Word.You can now add some text or content to this file, which we will later access and read using thedocxlibrary in Python. For this example, we have created an article about New York City. You can find the complete article here:https://en.wikipedia.org/wiki/New_York_City. However, you can choose any Word document containing text that you wantto analyze:The United States’ most populous city, often referred to as New York City or NYC, is New York. In 2020, its population reached 8,804,190 people across 300.46 square miles, making it the most densely populated major city in the country and over two times more populous than the nation’s second-largest city, Los Angeles. The city’s population also exceeds that of 38 individual U.S. states. Situated at the southern end of New York State, New York City serves as the Northeast megalopolis and New York metropolitan area’s geographic and demographic center - the largest metropolitan area in the country by both urban area and population. Over 58 million people also live within 250 miles of the city. A significant influencer on commerce, health care and life sciences, research, technology, education, politics, tourism, dining, art, fashion, and sports, New York City is a global cultural, financial, entertainment, and media hub. It houses the headquarters of the United Nations, making it a significant center for international diplomacy, and is often referred to as theworld’s capital.Now that you have created the Word file inside your project, you can move on to the next step, which is to create a new Python file calledapp.pyinside theTranslation Approot directory. This file will contain the code to read and manipulate the contents of the Word file using thedocxlibrary. With the Word file and the Python file in place, you are ready to start writing the code to extract data from the document and use it inyour application.To test whether we can read Word files with thedocx-pythonlibrary, we can implement the following code in ourapp.pyfile:import docxdoc = docx.Document("<full_path_to_docx_file>")text = ""for para in doc.paragraphs: text += para.textprint(text)Make sure to replace<full_path_to_docx_file>with the actual path to your Word document file. Obtaining the file path is a simple task, achieved by right-clicking on your.docxfile in VSCode and selecting theCopy Relative Pathoption from thedrop-down menu.Once you have done that, run theapp.pyfile and verify the output. This code will read the contents of your Word document and print them to the console. If the text extraction works correctly, you should see the text of your document printed in the console (seeFigure 6.1). Thetextvariable now holds the data frominfo.docxas aPython string.Figure 6.1 – Word text extraction console outputPackt library subscribers can continue reading the entire book for free. You can buy Building AI Applications with OpenAI APIs - Second Edition,here.Get the eBook for $31.99 $21.99!Get the Print Book for $39.99!And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you aPythonlearning resource on a particular subject, take thesurveyor just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{line-height:0;font-size:75%}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
6684

Divya Anne Selvaraj

15 Oct 2024

PythonPro #51: Python 3.13 REPL Enhancements, Python 3.12 vs. 3.13, and Visualizing Named Entities in Text

Divya Anne Selvaraj

15 Oct 2024

Bite-sized actionable content, practical tutorials, and resources for Python programmers.#51Notion for StartupsThousands of startups use Notion as a connected workspace to create and share docs, take notes, manage projects, and organize knowledge—all in one place.We’re offering 6 months of new Plus plans, including unlimited Notion AI so you can try it all for free!To redeem the Notion for Startups offer:1. Submit an application using our custom link: https://ntn.so/packt and select Packt on the partner list.2. Include our partner key: STARTUP4110P19151Get your Free 6-month Notion Plus Acceess!Hi ,Welcome to a brand new issue of PythonPro!In today’sExpert Insight we bring you an excerpt from the recently published book, Python Natural Language Processing Cookbook - Second Edition, which explains how to use the displaCy library from spacy to visualize named entities in text.News Highlights: PEP 762 in Python 3.13 adds multi-line editing, syntax highlighting, and custom commands to the REPL, and Pyinstrument 5 introduces a flamegraph timeline view for better code execution visualization.Here are my top 5 picks from our learning resources today:Python 3.12 vs Python 3.13 – performance testing⚡️Exploring Infrastructure as Code (IaC) with Python: AWS CDK, Terraform CDK, and Pulumi🏗️lintsampler : a new way to quickly get random samples from any distribution🎲Python and SysV shared memory🧠Gradient-Boosting anything (alert: high performance)🚀And, today’s Featured Study, presents a method using LLMs to generate precise, transparent code transformations, improving accuracy and efficiency for compiler optimizations and legacy refactoring.Stay awesome!Divya Anne SelvarajEditor-in-ChiefP.S.:This month's survey is still live, do take the opportunity to leave us your feedback, request a learning resource, and earn your one Packt credit for this month.Your cloud deserves dedicated data protection94% of cloud tenants were targeted last year, and 62% were successfully compromised.The hard truth is that organizations are having a hard time securing their cloud data—and cyberattackers are ready to exploit that challenge.Here’s a handy resource you’ll want with you as you map out your plan: Orchestrating the Symphony of Cloud Data Security.You’ll learn how to: Overcome the challenges of securing data in the cloud, Navigate multi cloud data security, and Balance data security with cloud economicsDownload Your Complimentary Copy NowSign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsPEP 762 – REPL-acing the default REPL: As of Python 3.13, the default REPL has been replaced with a Python-based version (PEP 762), offering modern features like multi-line editing, syntax highlighting, and custom commands.Pyinstrument 5 - Flamegraphs for Python: The new version of the Python statistical profiler introduces a new flamegraph-style timeline view for visualizing code execution, improves on previous timeline modes, and more.💼Case Studies and Experiments🔬Moving all our Python code to a monorepo: pytendi: Describes the migration of Attendi’s Python codebase into a monorepo using the Polylith architecture to improve code discoverability, reusability, and developer experience.How Maintainable is Proficient Code? A Case Study of Three PyPI Libraries: Aims to help you recognize when proficient coding might hinder future maintenance efforts.📊AnalysisIn the Making of Python Fitter and Faster: Provides insights into how Python's evolving interpreter architecture enhances execution speed, memory efficiency, and overall performance for modern applications.Python 3.12 vs Python 3.13 – performance testing: Tests on AMD Ryzen 7000 and Intel 13th-gen processors show Python 3.13 generally performs faster, especially in asynchronous tasks, but there are slowdowns in certain areas.🎓Tutorials and Guides🤓Build a Contact Book App With Python, Textual, and SQLite: Covers creating the app’s text-based interface (TUI), setting up a SQLite database for contact storage, and integrating both elements.Syntactic Sugar: Why Python Is Sweet and Pythonic: Covers various Pythonic constructs like operators, assignment expressions, loops, comprehensions, and decorators, and shows how they simplify code.The Ultimate Guide to Error Handling in Python: Provides a comprehensive guide to Python error handling, exploring common patterns like "Look Before You Leap" (LBYL) and "Easier to Ask Forgiveness than Permission" (EAFP).Exploring Infrastructure as Code (IaC) with Python: AWS CDK, Terraform CDK, and Pulumi: Explains how Python integrates with IaC tools to automate cloud infrastructure management.Web scraping of a dynamic website using Python with HTTP Client: Walks you through analyzing sites with JavaScript-rendered content and using the Crawlee framework to extract data in JSON format.lintsampler : a new way to quickly get random samples from any distribution: Introduces a Python package designed to easily and efficiently generate random samples from any probability distribution.Mastering Probability with Python: A Step-by-Step Guide with Simulations:Through examples like coin tosses, dice rolls, and event probabilities, this tutorial guides you on how to simulate and analyze real-world scenarios.🔑Best Practices and Advice🔏What's In A List—Yes, But What's *Really* In A List: Explains common pitfalls when multiplying lists and why it matters when working with mutable versus immutable data types.Yes, you need to duplicate your frontend business logic on the server: Explains why backend validation is essential to protect data integrity, regardless of frontend sophistication.Python and SysV shared memory: Explains how to wrap C functions like shmget, shmat, and shmctl for shared memory management, handling void pointers, and performing basic operations like writing to shared memory.Gradient-Boosting anything (alert: high performance): Explores using Gradient Boosting with various machine learning models, adapting LSBoost in the Python package mlsauce for both regression and classification tasks.Code Generation with ChatGPT o1-preview as a Story of Human-AI Collaboration: Through experiments in Python and C++, the author demonstrates that human-AI collaboration improves code generation, specifically in building sentiment analysis tools.🔍Featured Study: Don't Transform the Code, Code the Transforms💥In "Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs," researchers from Meta, Cummins et al., introduce a novel method called Code the Transforms (CTT), which leverages LLMs to generate precise code transformations rather than directly rewriting code.ContextCode transformation refers to rewriting or optimising existing code, a task essential for compiler optimisations, legacy code refactoring, or performance improvements. Traditional rule-based approaches to code transformations are difficult to implement and maintain. LLMs offer the potential to automate this process, but direct code rewriting by LLMs lacks precision and is challenging to debug. This study introduces the CTT method, where LLMs generate the transformation logic, making the process more transparent and adaptable.Key Featured of the CTT MethodChain-of-thought process: The method synthesises code transformations by iterating through input/output examples to create a precise transformation logic rather than rewriting code directly.Improved transparency and adaptability: The generated transformations are explicit, making them easier to inspect, debug, and modify when necessary.Higher precision: The method achieved perfect precision in 7 out of 16 Python code transformations, significantly outperforming traditional direct rewriting approaches.Reduced computational costs: By generating transformation logic instead of rewriting code, the method requires less compute and review effort compared to direct LLM rewriting.Iterative feedback loop: The method incorporates execution and feedback to ensure the generated transformations work as expected, leading to more reliable outcomes.What This Means for YouThis study is particularly beneficial for software engineers, developers, and those working on compiler optimisations or legacy code refactoring. By using this method, teams can reduce the time spent on manual code review and debugging, while improving the precision of code transformations.Examining the DetailsThe study's methodology involved testing 16 different Python code transformations across a variety of tasks, ranging from simple operations like constant folding to more complex transformations such as converting dot products to PyTorch API calls. The CTT method achieved an overall F1 score of 0.97, compared to the 0.75 achieved by the direct rewriting method. The precision of transformations ranged from 93% to 100%, with tasks like dead code elimination and redundant function elimination reaching near-perfect performance. In contrast, the traditional direct LLM rewriting approach showed an average precision of 60%, and was prone to more frequent errors, requiring manual correction.You can learn more by reading the entire paper.🧠 Expert insight💥Here’s an excerpt from “Chapter 7: Visualizing Text Data” in the book, Python Natural Language Processing Cookbook - Second Edition by Zhenya Antić and Saurabh Chakravarty, published in September 2024.VisualizingNERNamed entity recognition, orNER, is a very useful tool for quickly finding people, organizations, locations, and other entities in texts. In order to visualize them better, we can use thedisplacypackage to create compelling andeasy-to-read images.After working through this recipe, you will be able to create visualizations of named entities in a text using different formatting options and save the results ina file.Getting readyThedisplaCylibrary is part of thespacypackage. You need at least version 2.0.12 of thespacypackage fordisplaCyto work. The version in thepoetryenvironment andrequirements.txtfileis 3.6.1.The notebook is locatedathttps://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/Chapter07/7.3_ner.ipynb.How to do it...We will usespacyto parse the sentence and then thedisplacyengine to visualize thenamed entities:Import bothspacyanddisplacy:import spacyfrom spacy import displacyRun the languageutilities file:%run -i "../util/lang_utils.ipynb"Define the textto process:text = """iPhone 12: Apple makes jump to 5GApple has confirmed its iPhone 12 handsets will be its first to work on faster 5G networks.The company has also extended the range to include a new "Mini" model that has a smaller 5.4in screen.The US firm bucked a wider industry downturn by increasing its handset sales over the past year.But some experts say the new features give Apple its best opportunity for growth since 2014, when it revamped its line-up with the iPhone 6."5G will bring a new level of performance for downloads and uploads, higher quality video streaming, more responsive gaming,real-time interactivity and so much more," said chief executive Tim Cook.There has also been a cosmetic refresh this time round, with the sides of the devices getting sharper, flatter edges.The higher-end iPhone 12 Pro models also get bigger screens than before and a new sensor to help with low-light photography.However, for the first time none of the devices will be bundled with headphones or a charger."""In this step, we process the text using the small model. This gives us aDocobject. We then modify the object to contain a title. This title will be part of theNER visualization:doc = small_model(text)doc.user_data["title"] = "iPhone 12: Apple makes jump to 5G"Here, we set up color options for the visualization display. We set green for theORG-labeled text and yellow for thePERSON-labeled text. We then set theoptionsvariable, which contains the colors. Finally, we use therendercommand to display the visualization. As arguments, we provide theDocobject and the options we previously defined. We also set thestyleargument to"ent", as we would like to display just entities. We set thejupyterargument toTruein order to display directly inthe notebook:colors = {"ORG": "green", "PERSON":"yellow"}options = {"colors": colors}displacy.render(doc, style='ent', options=options, jupyter=True)The output should look like that inFigure 7.4.Figure 7.4 – Named entities visualizationNow we save the visualization to an HTML file. We first define thepathvariable. Then, we use the samerendercommand, but we set thejupyterargument toFalsethis time and assign the output of the command to thehtmlvariable. We then open the file, write the HTML, and closethe file:path = "../data/ner_vis.html"html = displacy.render(doc, style="ent", options=options, jupyter=False)html_file= open(path, "w", encoding="utf-8")html_file.write(html)html_file.close()This will create an HTML file with theentities visualization.Packt library subscribers can continue reading the entire book for free. You can buy Python Natural Language Processing Cookbook - Second Edition,here.Get the eBook for $35.99 $17.99!Get the Print Book for $44.99 $30.99!And that’s a wrap.We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most usefulhere. The complete PythonPro archives can be foundhere.If you have any suggestions or feedback, or would like us to find you aPythonlearning resource on a particular subject, take thesurveyor just respond to this email!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{line-height:0;font-size:75%}#converted-body .list_block ol,#converted-body .list_block ul,.body [class~=x_list_block] ol,.body [class~=x_list_block] ul,u+.body .list_block ol,u+.body .list_block ul{padding-left:20px} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

0
0
5498