Keeping data lean
Intractable volumes of data, as we have already seen, are one of the causes of data gravity. The sheer volume of data in a system can impede its evolution, because the time and effort required to reshape that data are a powerful deterrent. In the Embracing data life cycle section, we discussed how defining boundaries between the phases of the data's life cycle is a significant improvement, because it lets us move each group of data into a separate, leaner database. In the Turning the database inside out section, we saw that a large portion of a database's size is attributable to derived data, such as indices and materialized views. Moving this derived data into the datastores of the services that use it makes the source datastores even leaner.
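To make the idea of relocating derived data concrete, here is a minimal sketch in Python. It assumes a hypothetical OrderPlaced event and a hypothetical CustomerSpendView owned by a consuming service; neither name comes from a real library, and an in-memory dictionary stands in for that service's own datastore. The source datastore keeps only the orders themselves, while the consumer builds the aggregate it needs from the events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event emitted by the upstream (source) service."""
    order_id: str
    customer_id: str
    amount_cents: int


class CustomerSpendView:
    """Hypothetical materialized view kept by the consuming service.

    The aggregate lives here, so the source datastore does not have to
    store or maintain it.
    """

    def __init__(self):
        self._seen_order_ids = set()
        self.total_by_customer = defaultdict(int)

    def apply(self, event):
        # Ignore redelivered events so a duplicate does not inflate the total.
        if event.order_id in self._seen_order_ids:
            return
        self._seen_order_ids.add(event.order_id)
        self.total_by_customer[event.customer_id] += event.amount_cents


# Illustrative event stream; the third event is a deliberate duplicate.
events = [
    OrderPlaced("o-1", "c-1", 1500),
    OrderPlaced("o-2", "c-2", 700),
    OrderPlaced("o-1", "c-1", 1500),
    OrderPlaced("o-3", "c-1", 250),
]

view = CustomerSpendView()
for event in events:
    view.apply(event)
```

In a real system the view would be persisted in the consuming service's datastore rather than held in memory, but the division of responsibility is the same: the upstream service stores only what it creates, and each consumer derives and stores only what it uses.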
But we can do more. Upstream services produce events as they create data, and these events become the source of truth in the systemwide transaction log. This frees services to pick and choose the data they need to...