Working with databases in Julia
In this section, we will explain ways to handle data stored in databases: MySQL and PostgreSQL.
Getting ready
MySQL is an open source relational database. To be able to interact with your MySQL databases from Julia, the database server (along with the relevant Julia package) needs to be installed. Assuming that the database is already set up and the MySQL session is already up and running, install the MySQL bindings for Julia by directly cloning the repository:
Pkg.clone("https://github.com/JuliaComputing/MySQL.jl")
PostgreSQL is an open source object relational database. Similar to the MySQL setup, the server of the PostgreSQL database should be up and running with a session. Now, install the PostgreSQL bindings for Julia by following the given instructions:
- Install the
DBI
package. TheDBI
package is a database-independent API that complies with almost all database drivers. - The
DBI
package from Julia can be installed by directly cloning it from its repository using the following statement:Pkg.clone("https://github.com/JuliaDB/DBI.jl")
- Then, install the
PostgreSQL
library by directly cloning the library's repository using the following statement:Pkg.clone("https://github.com/JuliaDB/PostgreSQL.jl")
- SQLite is a light, server-less, self-contained, transactional SQL database engine. To interact with data in SQLite databases, one has to first install the SQLite server and make sure that it is up and running and displaying a prompt like this:
- Now, the SQLite bindings for Julia can be installed through the following steps:
- Add the SQLite
Julia
package by running the following standard package installation command:Pkg.add("SQLite")
- Add the SQLite
How to do it...
Here, you will learn about connecting to databases and executing queries to manipulate and analyze data. You will also learn about the various protocols and libraries in Julia that will help you interact with databases.
MySQL
A MySQL database can be connected by a simple command that takes in the host, username, password, and database name as parameters. Let's take a look at the following steps:
- First, import the
MySQL
package:using MySQL
- Set up the connection to a MySQL database by including all the required parameters to establish a connection:
conn = mysql_connect(host, user_name, password, dbname)
- Now, let's write and run a basic table creation query:
- Assign the query statement to a variable.
query = """ CREATE TABLE Student ( ID INT NOT NULL AUTO_INCREMENT, Name VARCHAR(255), Attendance FLOAT, JoinDate DATE, Enrolments INT, PRIMARY KEY (ID) );"""
- Now to make sure that the query is successfully created, we can get back the response from the connection.
response = mysql_query(conn, query)
- Check for a successful connection through conditional statements:
if (response == 0) println("Connection successful. Table created") else println("Connection failed. Table not created.") end
- Assign the query statement to a variable.
- Queries on the database can be executed by the
execute_query()
command, which takes the connection variable and the query as parameters. A sampleSELECT
query can be executed through the following steps:query = """SELECT * FROM Student;""" data = execute_query(conn, query)
- To get the query results in the form of a Julia array, an extra parameter called
opformat
should be specified:data_array = execute_query(conn, query, opformat = MYSQL_ARRAY)
- Finally, to execute multiple queries at once, use the
mysql_execute_multi_query()
command:query = """INSERT INTO Student (Name) VALUES (''); UPDATE Student SET JoinDate = '08-07-15' WHERE LENGTH(Name) > 5;""" rows = mysql_execute_multi_query(conn, query) println("Rows updated by the query: $rows")
PostgreSQL
Data handling within a PostgreSQL database can be done by connecting to the database. Firstly, make sure that the database server is up and running. Now, the data in the database can be handled through the following procedure:
- Firstly, import the requisite packages, which are the DBI and the PostgreSQL databases, using the
import
statements:using DBI using PostgreSQL
- In addition, the required packages for the
PostgreSQL
library are as follows:DataFrames.jl
: This has already been installed previously.DataArrays.jl
: This can be installed by running the statementPkg.add("DataArrays"))
.
- Make a connection to a PostgreSQL database of your choice. It is done through the connect function, which takes in the type of database, the username, the password, the port number, and the database name as input parameters. So, the connection can be established using the following statement:
conn = connect(Postgres, "localhost", "password", "testdb", 5432)
- If the connection is successful, a message similar to this appears on the screen:
PostgreSQL.PostgresDatabaseHandle(Ptr{Void} @0x00007 fa8a559f160,0x00000000,false)
- Now, prepare the query and tag it to the connection we prepared in the previous step. This can be done using the prepare function, which takes the connection and the query as parameters. So, the execution statement looks something like this:
query = prepare(conn, "SELECT 1::int, 2.0::double precision, 'name'::character varying, " * "'name'::character(20);"))
- As the query is prepared, let's now execute it, just like we did for MySQL. To do this, we have to enter the query variable, which we created in the previous step, into the execute function. It is done as follows:
result = execute(query)
- Now that the query execution is over, the connection can be disconnected using the finish and disconnect functions, which take the query and the connection variables as the input parameters, respectively. The statements can be executed as follows:
finish(query) disconnect(conn)
- Now, the results of the query are in the result variable, which can be used for analytics by either moulding it into a dataframe or any other data structure of your choice. The same method can be used for all operations on PostgreSQL databases, which include addition, updating, and deleting.
- This resource would help you better understand the Database-Independent API (DBI), which we use to connect local PostgreSQL databases such as SQLite.
- Import the
SQLite
package into the current session and ensure that the SQLite server is up and running. The package can be imported by running the following command:using SQLite
- Now, a connection to any database can be made through the
SQLiteDB()
function in Julia Version 3 and theSQLite.DB()
function in Julia Version 4. - The connection can be made in Julia version 4 as follows:
db = SQLite.DB("dbname.sqlite")
- The connection can be made in Julia version 3 as follows:
db = SQLiteDB("dbname.sqlite")
- Now, as the connection is made, queries can be executed using the
query()
function in Version 3 and theSQLite.query()
function in Version 4.- In Version 3:
query(db, "A SQL query")
- In Version 4:
SQLite.query(db, "A SQL query")
- In Version 3:
The SQLite.jl
package also allows the user to use macros and registers for manipulating and using data. However, the concepts are beyond the scope of this chapter.
So, these are some of the ways through which data can be handled in Julia. There are a lot of databases whose connectors directly connect to DBI, such as SQlite, MySQL, and so on, and through which queries and their execution can be carried out, as shown in the PostgreSQL section. Similarly, data can be scraped from the Internet and used for analytics, which can be achieved through a combination of Julia libraries, but that is beyond the scope of this book.
There's more...
MySQL
The following resource helps you learn more about its advanced features and provides information about the MySQL.jl
library of Julia. This includes performance benchmarks and details, as well as information on CRUD and testing:
PostgreSQL
Visit https://github.com/JuliaDB/DBI.jl to understand better the DBI we use to connect local PostgreSQL databases:
Visit
https://github.com/JuliaDB/DBI.jl
for extended and in-depth documentation on the PostgreSQL.jl
library, which includes dealing with Amazon web services, and so on.
SQLite
Now, as you have learned the ways in which data can be extracted, manipulated, and worked on from various external sources, there are some more interesting things that the database drivers of Julia can do apart from just executing queries. You can find those at https://github.com/JuliaDB/SQLite.jl/blob/master/OLD_README.md#custom-scalar-functions .