The heart of a web application – the database
A typical web application can be seen as an interface for data. Whether it’s a blogging platform, an e-commerce service, or a project-management tool, most user interactions are coupled with reading or storing some information. Of course, there are data-less web applications – for example, proxy services – but you’re unlikely to choose Ruby on Rails to implement them.
Data is likely to be the most valuable part of your product or service. Just imagine you accidentally dropped your production database and all the backups – could you carry on? The database is also usually the most heavily loaded component of your application. The overall performance of your application depends on how you use the database and keep it in a healthy state.
Thus, while designing our application, we should keep in mind possible performance degradations related to the database.
The trade-off between abstractions and database performance
One of the main purposes of abstractions is to hide away the implementation details. In theory, a user should not know what’s happening under the hood of a certain API method. Consider the following example:
    class User
      def self.create(name:)
        DB.exec "INSERT INTO users (name) values (%)", name
      end
    end

    names = %w[lacey josh]
    names.each { User.create(name: _1) }
The User class is our abstraction for working with the database. We added a convenient interface for inserting new records into a database table, which is assumed to be used throughout the application.

However, this abstraction can easily be overused, introducing additional load on our database: whenever we want to create N users, we execute N queries. If we didn’t use the abstraction, we would write a single "INSERT INTO…" statement – a much more performant way of achieving the same result.
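For illustration, a bulk-creation method on the same hypothetical User class could look roughly like the following (a minimal sketch; the create_many name and the DB.exec placeholder interface are assumptions carried over from the snippet above, not a real library API):

    class User
      # Hypothetical bulk API: builds a single multi-row INSERT
      # instead of executing a separate statement per record
      def self.create_many(names)
        placeholders = names.map { "(%)" }.join(", ")
        DB.exec "INSERT INTO users (name) VALUES #{placeholders}", *names
      end
    end

    names = %w[lacey josh]
    User.create_many(names) # one query instead of names.size queries

With an Active Record model, the built-in #insert_all method provides a similar single-query path.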
This is just a basic example, but it demonstrates an important point: hiding implementation details is not the same as ignoring implementation specifics and limitations. Abstractions and APIs should be designed so that it is harder to shoot yourself in the foot when using them.
One common technique that often leads to suboptimal database interactions is using domain-specific languages (DSLs) to define query-building rules. DSLs are powerful tools, but with great power comes great responsibility.
Let’s look at a more realistic example using the CanCanCan (https://github.com/CanCanCommunity/cancancan) library. This library allows you to define authorization rules (abilities) using a fancy DSL. The DSL defines a ruleset, which can then be used to scope database records. Here is an example:
    can :read, Article do |article|
      article.published_at <= Time.now
    end
The rule states that only already published articles can be accessed by users. This rule is used every time we call Article.all.accessible_by(user) (for example, when we want to show a user a list of articles on the home page). How do you think the scoping would be handled in this case?
If we wrote #accessible_by by hand, we would probably perform a single query to return the desired records – "SELECT * FROM articles WHERE published_at < now()". What will our library do? It will fetch and then filter all the records using the rule block.
The result is the same, but it requires many more system resources (memory to load a lot of records and additional CPU cycles to run the block many times). Luckily, CanCanCan allows you to add a hint on how to transform the block into a query condition:
    can :read, Article, "published_at < now()" do |article|
      article.published_at <= Time.now
    end
This is an example of a leaky abstraction, an abstraction that exposes implementation details to its users. In the preceding snippet, our DSL-based configuration contains parts of the underlying database query. In this case, this is a necessary evil. It can also be seen as an indicator that we chose the wrong level of abstraction to solve the problem, and now we have to patch it.
When designing abstractions, we should think of potential performance implications beforehand to avoid leaky abstractions in the future.
Database-level abstractions
Abstractions need not be defined in the application code only; we can also benefit from using abstractions in the database.
The main motivation for considering this approach is application performance. Another possible reason is consistency: the database is the primary source of truth, and relational databases are usually good at enforcing consistency; thus, moving some logic to the database layer can minimize the risk of data becoming inconsistent.
Even though you can move all your business logic into a database by defining custom functions and procedures, that’s not the way web (and especially Rails) applications are built. It could be an ideal way if the only thing we cared about was performance, but we chose web frameworks for productivity.
Nevertheless, some functionality can be implemented at the database level and bring us performance and productivity benefits. Let’s consider particular examples.
One common task that can be handled at the database layer is keeping track of record changes (for audit purposes). We can implement this in our Ruby code by adding hooks everywhere we create, update, or delete records, or go the Rails way and define model-level callbacks (as PaperTrail (https://github.com/paper-trail-gem/paper_trail) does).
Alternatively, we can leverage database features, such as triggers, and make our database responsible for creating audit records (as Logidze does). The latter approach has the benefits of being more performant and reducing the code base complexity. It is worth considering when audit records are not first-class citizens of your business logic (that is, not involved in other processes beyond auditing).
What a gem – Logidze
Logidze (https://github.com/palkan/logidze) is a combination of a database extension (via PostgreSQL functions) and a Ruby API to track individual record changes incrementally. It can be used as a general auditing tool and a time-travel machine (to quickly access older versions of a record).
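To give a flavor of the trigger-based approach, here is a simplified sketch installed from a regular Rails migration (this is not Logidze’s actual implementation; the users_audit table, function, and trigger names are illustrative):

    class AddUsersAuditTrigger < ActiveRecord::Migration[7.0]
      def up
        execute <<~SQL
          -- store a JSON snapshot of the row on every change (illustrative schema)
          CREATE TABLE users_audit (
            id bigserial PRIMARY KEY,
            user_id bigint,
            data jsonb,
            changed_at timestamptz DEFAULT now()
          );

          CREATE FUNCTION audit_users() RETURNS trigger AS $$
          BEGIN
            INSERT INTO users_audit (user_id, data) VALUES (NEW.id, to_jsonb(NEW));
            RETURN NEW;
          END;
          $$ LANGUAGE plpgsql;

          -- EXECUTE FUNCTION requires PostgreSQL 11+; older versions use EXECUTE PROCEDURE
          CREATE TRIGGER users_audit_trigger
            AFTER INSERT OR UPDATE ON users
            FOR EACH ROW EXECUTE FUNCTION audit_users();
        SQL
      end

      def down
        execute <<~SQL
          DROP TRIGGER users_audit_trigger ON users;
          DROP FUNCTION audit_users();
          DROP TABLE users_audit;
        SQL
      end
    end

The application code does not change at all: any insert or update on users, no matter where it comes from, leaves an audit record behind.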
Another potential use case for giving the database a bit more responsibility is soft deletion. Soft deletion is an approach where, whenever a logical delete operation should occur, the record is marked as deleted (and made invisible to users) instead of being removed from the database. This technique can be used to provide undo/restore functionality or for auditing purposes.
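As a minimal application-side sketch (the column, scope, and method names below are illustrative and not tied to any particular gem), soft deletion can be as simple as this:

    class AddDeletedAtToArticles < ActiveRecord::Migration[7.0]
      def change
        add_column :articles, :deleted_at, :datetime
        add_index :articles, :deleted_at
      end
    end

    class Article < ApplicationRecord
      # Regular queries go through this scope to hide soft-deleted rows
      scope :kept, -> { where(deleted_at: nil) }

      # Mark the record as deleted instead of removing the row
      def discard
        update(deleted_at: Time.current)
      end
    end

    Article.kept    # only visible articles
    article.discard # logical delete; the row stays around for undo/restore

The database can take on part of this responsibility too, for example, by exposing a view that filters out soft-deleted rows, so that visibility does not depend on every query remembering to apply the scope.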
Besides performance considerations, we may want to add database abstractions for the sake of consistency. For example, in PostgreSQL, we can create domain types and composite types. Unlike general constraints, custom types are reusable and carry additional semantics. You can use the pg_trunk (https://github.com/nepalez/pg_trunk) gem to manage custom types from Rails (as well as other PostgreSQL-specific features).
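For example, a reusable domain type with an embedded check could be created with plain SQL from a migration, as in the sketch below (the type name and the check expression are illustrative; pg_trunk provides its own DSL for declaring such objects):

    class CreateEmailDomain < ActiveRecord::Migration[7.0]
      def up
        execute <<~SQL
          -- any column declared with the "email" type gets this check for free
          CREATE DOMAIN email AS text
            CHECK (VALUE ~ '^[^@]+@[^@]+$');
        SQL
      end

      def down
        execute "DROP DOMAIN email"
      end
    end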
In general, enhancing database logic with custom abstractions is viable if the purpose of the abstraction is to act as data middleware – that is, to treat data in isolation from the application’s business logic. Technically, such isolation means that the abstraction should be set up once and never changed. I use the term middleware here to underline the conceptual similarity with Rack middleware.