Using additional libraries with your Spark pool
Many workloads depend on functionality from third-party libraries. Synapse Spark lets you add libraries to your Spark pool and makes them available when the pool is instantiated. There are several ways to do this.
Using public libraries
For PyPI packages, create a file named requirements.txt and add it to the configuration of your Spark pool. In this file, list all the packages that you want installed when a Spark instance starts. The format follows the output of pip freeze, with the package version next to the package name:
packagename==1.2.1
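Before uploading the file, it can help to check that every entry follows this pinned form. The following is a minimal sketch under stated assumptions, not part of Synapse itself: the helper name find_unpinned and the sample contents are hypothetical, and the pattern is a simplification of the full requirement syntax that pip accepts.

```python
import re

# Simplified pattern for "name==version" lines (an assumption for this sketch;
# it does not cover the complete requirement grammar that pip understands).
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9][A-Za-z0-9.!+*-]*$")


def find_unpinned(text):
    """Return (line_number, line) pairs that are not in name==version form."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and comments are allowed in a requirements file.
        if not line or line.startswith("#"):
            continue
        if not PINNED.match(line):
            problems.append((number, line))
    return problems


# Hypothetical file contents used only for illustration.
sample = "numpy==1.21.0\npandas\n# a comment\nscikit-learn>=1.0\n"
print(find_unpinned(sample))
```

For the sample above, the helper reports the second and fourth lines, because neither pins an exact version with ==.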
The requirements.txt file can be uploaded to the Packages section of the Spark pool properties when you create the pool. You can also upload it later if you need to.
You'll find the location to upload your file in Figure 6.16...