Keeping data lean
Intractable volumes of data, as we have already seen, are one of the causes of data gravity. The sheer volume of data in a system can impede its evolution because the time and effort involved in reshaping that data is a powerful deterrent. In the Embracing data life cycle section, we discussed how drawing boundaries between the data in each phase of its life cycle lets us move these groups of data into separate, leaner databases, which is a big improvement.
In the Turning the database inside out section, we saw that a large portion of a database’s size is attributable to derived (that is, duplicate) data, such as indices and materialized views. Moving this derived data into the datastores of the services that use it makes the source datastores even leaner.
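To make the idea of relocating derived data concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a prescribed implementation: the names OrderService, CustomerSpendView, and OrderPlaced are hypothetical, plain in-memory dictionaries stand in for real datastores, and a simple list stands in for whatever event stream carries changes downstream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event describing a record created upstream."""
    order_id: str
    customer_id: str
    total: float


class OrderService:
    """Upstream service: keeps only source records, no derived views."""

    def __init__(self):
        self.orders = {}      # lean source datastore (assumed in-memory)
        self.published = []   # stand-in for an event stream

    def place_order(self, order_id, customer_id, total):
        self.orders[order_id] = {"customer_id": customer_id, "total": total}
        self.published.append(OrderPlaced(order_id, customer_id, total))


class CustomerSpendView:
    """Downstream service: owns the derived data that it queries."""

    def __init__(self):
        # Materialized view held in the consumer's own datastore.
        self.spend_by_customer = defaultdict(float)

    def on_event(self, event):
        self.spend_by_customer[event.customer_id] += event.total


orders = OrderService()
view = CustomerSpendView()

orders.place_order("o-1", "c-1", 20.0)
orders.place_order("o-2", "c-1", 5.0)
orders.place_order("o-3", "c-2", 12.5)

# The consumer builds its view from the events, not from the source database.
for event in orders.published:
    view.on_event(event)

print(dict(view.spend_by_customer))  # {'c-1': 25.0, 'c-2': 12.5}
```

In this sketch the per-customer totals never live in the order service's datastore. The service that needs the aggregate maintains it, so the source holds nothing beyond the records it creates.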
But we can do more. Upstream services produce events as they create data, and these events become the source of truth in the system-wide transaction log. This frees services to pick...