Hi ,
Happy New Year! We’re back! Did you try your hand at any exciting Python projects over the holidays that you’d like to share? If so, reply to this email and let me know. If it’s brilliant, we’ll share what you made in next week's issue.
In today’s Expert Insight we bring you an excerpt from the recently published book, XGBoost for Regression Predictive Modeling and Time Series Analysis, which demonstrates the power of XGBoost’s multithreaded capabilities: adjusting the nthread parameter to use multiple CPU cores can significantly accelerate model training, as illustrated with a practical example on the California housing dataset.
News Highlights: Python 3.14.0 alpha 4 introduces features like PEP 649 for deferred annotations and improved error messages; Python wins Tiobe's Programming Language of the Year 2024 with a 9.3% popularity surge; and a new PEP proposes SBOMs for better package security and dependency tracking.
My top 5 picks from today’s learning resources:
And, in From the Cutting Edge, we introduce sQUlearn, a Python library for quantum machine learning that integrates seamlessly with classical tools like scikit-learn, offering high-level APIs, low-level customisation, and robust support for NISQ devices.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
In the paper, "sQUlearn – A Python Library for Quantum Machine Learning," Kreplin et al. introduce sQUlearn, a Python library for quantum machine learning (QML), designed to integrate seamlessly with classical machine learning tools like scikit-learn.
Quantum Machine Learning (QML) combines quantum computing and machine learning to harness quantum principles for computational efficiency and enhanced algorithmic capabilities. However, many current QML tools demand in-depth quantum computing expertise. Noisy Intermediate-Scale Quantum (NISQ) devices, while promising, pose significant challenges due to their limitations in handling deep quantum circuits. To bridge these gaps, sQUlearn focuses on NISQ-compatibility, usability, and integration with classical ML tools, particularly scikit-learn.
sQUlearn offers high-level APIs with familiar scikit-learn-style interfaces, low-level tools for customising quantum circuits and kernels, and robust support for NISQ devices.
sQUlearn simplifies quantum machine learning for both researchers and practitioners. For researchers, it offers a flexible low-level framework for exploring novel QML algorithms and quantum circuit designs. For practitioners, it eases the deployment of QML solutions with minimal quantum-specific knowledge via high-level interfaces and pre-built models that work with familiar tools like scikit-learn.
sQUlearn’s dual-layer architecture enables flexibility, with high-level APIs for seamless integration into machine learning workflows and low-level tools for advanced customisation. The Executor module centralises quantum job execution, handling retries, caching results, and transitioning between simulation and real hardware. The library supports quantum kernel methods and quantum neural networks while addressing noise challenges on quantum devices through built-in regularisation techniques. This focus on automation and robustness ensures the library is both reliable for practical applications and adaptable for research needs.
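To make the scikit-learn integration concrete, here is a small, runnable stand-in sketch of the pattern the paper describes: any estimator that follows scikit-learn’s fit/predict contract composes with pipelines and cross-validation out of the box. The QuantumStandInRegressor below is purely illustrative and is not sQUlearn code; a real sQUlearn model would dispatch quantum circuits through its Executor instead.

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

class QuantumStandInRegressor(BaseEstimator, RegressorMixin):
    """Illustrative placeholder for a high-level QML estimator."""
    def __init__(self, n_qubits=4):
        self.n_qubits = n_qubits
    def fit(self, X, y):
        # A real QML model would evaluate parameterised quantum circuits here;
        # the stand-in simply memorises the training mean.
        self.mean_ = float(np.mean(y))
        return self
    def predict(self, X):
        return np.full(len(X), self.mean_)

# Because it honours the estimator contract, it drops straight into standard tooling.
pipe = Pipeline([("scale", StandardScaler()),
                 ("qml", QuantumStandInRegressor(n_qubits=4))])
X, y = np.random.rand(60, 3), np.random.rand(60)
print(cross_val_score(pipe, X, y, cv=3))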
You can learn more by reading the entire paper or accessing the library on GitHub.
Here’s an excerpt from “Chapter 13: Deploying Your XGBoost Model” in the book, XGBoost for Regression Predictive Modeling and Time Series Analysis by Partha Pritam Deka and Joyce Weiner.
XGBoost has built-in support for multithreaded computing, which allows you to speed up model training by utilizing multiple CPU cores. You can control this by setting the nthread parameter, which determines the number of threads to use. By default, XGBoost will automatically use the maximum number of available threads.
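As a side note (not part of the excerpt): if you prefer XGBoost’s scikit-learn wrapper to the native training API used below, the same thread control is exposed through the n_jobs constructor argument. A minimal sketch:

from xgboost import XGBRegressor

# n_jobs plays the same role as nthread does in the native API.
model = XGBRegressor(learning_rate=0.3, booster="gbtree", n_jobs=2)
# model.fit(X_train, y_train) would then train using two threads.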
It’s important to note that if you’re using Dask, any value you set for nthread within XGBoost will take precedence over Dask’s default configuration. The following example, multithreaded.py, demonstrates how the multithreading parameter works. We’ll revisit the California housing dataset that you worked with in Chapter 4:
First, import the libraries you’ll need. You’ll be using functions from scikit-learn (sklearn). You’ll also be using pandas, numpy, a module called time to track how long code execution takes, and, of course, xgboost:

import pandas as pd
import numpy as np
import time
import xgboost as xgb
from sklearn.metrics import r2_score
from sklearn import datasets
from sklearn.model_selection import train_test_split
Next, load the California housing dataset and split it into training and test sets:

housingX, housingy = datasets.fetch_california_housing(
    return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    housingX, housingy, test_size=0.2, random_state=17)
Convert the data from numpy or pandas form into DMatrix form by using the DMatrix function and passing in the data and the labels. In this case, we’ll be using dtrain = xgb.DMatrix(X_train, y_train) for the training dataset; do the same for the test dataset:

dtrain = xgb.DMatrix(X_train, y_train)
dtest = xgb.DMatrix(X_test, y_test)
Now, the data is in a format that XGBoost can manipulate with efficiency. As mentioned in Chapter 3, XGBoost does some sorting and performs other operations on the dataset to speed up execution.
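As a small aside (not from the book), the DMatrix constructor itself also accepts an nthread argument, so the data-preparation step can be parallelised as well:

# Build a DMatrix using eight threads (reuses X_train and y_train from above).
dtrain_parallel = xgb.DMatrix(X_train, y_train, nthread=8)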
Use the time module to get the computation time and print it out so that you can compare the results. First, save the start time with the following line of code:

train_start = time.time()
Train the model, setting eta = 0.3 (the learning rate), booster = gbtree, and nthread = 2:

param = {"eta": 0.3, "booster": "gbtree",
    "nthread": 2}
housevalue_xgb = xgb.train(param, dtrain)
train_end = time.time()
Print the training time using a print statement while subtracting train_start from train_end and converting it into milliseconds by multiplying by 10**3:

print("Training time with 2 threads is :{0:.3f}".format(
    (train_end - train_start) * 10**3), "ms")
Now, rerun the training with a different value for nthread. Since our computer has eight logical processors, I’ve chosen 8:

train_start = time.time()
param = {"eta": 0.3, "booster": "gbtree",
"nthread": 8}
housevalue_xgb = xgb.train(param, dtrain)
train_end = time.time()
print("Training time with 8 threads is :{0:.3f}".format(
    (train_end - train_start) * 10**3), "ms")
To make predictions, call the predict method on your model and pass the test dataset:

pred_start = time.time()
ypred = housevalue_xgb.predict(dtest)
pred_end = time.time()
print("Prediction time is :{0:.3f}".format(
    (pred_end - pred_start) * 10**3), "ms")
Finally, check the model’s fit by computing the R-squared value:

xgb_r2 = r2_score(y_true=y_test, y_pred=ypred)
print("XGBoost Rsquared is {0:.2f}".format(xgb_r2))
You should see output similar to the following:

Training time with 2 threads is :237.088 ms
Training time with 8 threads is :130.723 ms
Prediction time is :2.012 ms
XGBoost Rsquared is 0.76
On our computer, going from two to eight threads sped up training by over 44%. This demonstrates the benefit XGBoost provides with multithreading. Recall that, by default, it will use the maximum number of threads available. Next, you’ll learn about using XGBoost with distributed compute by using Dask on Linux.
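If you want to see how training time scales with thread count on your own machine, a quick timing loop along these lines (our addition, not from the book; it reuses time, xgb, and dtrain from the excerpt above) does the job:

# Time training for several nthread values and report each run in milliseconds.
for threads in (1, 2, 4, 8):
    param = {"eta": 0.3, "booster": "gbtree", "nthread": threads}
    start = time.time()
    xgb.train(param, dtrain)
    print("Training time with {0} threads: {1:.3f} ms".format(
        threads, (time.time() - start) * 10**3))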
XGBoost for Regression Predictive Modeling and Time Series Analysis was published in December 2024.
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!