Understanding Data Architecture Concepts
Previously, you understood the importance of data, as well as the importance of data architecture for the enterprise. The data architect—whom you, as a Salesforce Architect, need to work closely with or, in smaller projects, act as one—needs to tackle the data architecture in a similar fashion to normal building architecture. And to a good extent, similar principles should be followed in software architecture.
First, it is imperative to understand the business processes and create a conceptual and logical blueprint. Then, you need to know the underlying technology to build a detailed design and implementation. To understand these better, take a closer look at the three stages/levels of data architecture design—the conceptual, logical, and physical levels.
Conceptual-Level Data Architecture Design
The data architect, at the conceptual-level stage, needs to gain a deep understanding of business knowledge, business processes, and operations. This includes knowledge related to data flows, business rules, and the way data is used or expected to be used in the enterprise. This could be extended to areas such as financial, product, marketing, and industry-specific knowledge domains (such as manufacturing, insurance, and so on). The data architect can then build a blueprint by designing data entities that represent each of the enterprise’s business domains.
The architect will also define the various taxonomies and data flow behind each business process. The data blueprint is crucial for building a robust and successful data architecture. On many occasions, this activity is done at the project level rather than the enterprise-wide level. And it is usually included as part of the business analyst’s activities. Typically, the following areas need to be covered:
- The key data entities and elements (accounts, products, orders, and so on)
- Data entity relationships, including details such as data integrity requirements and business rules
- The output data expected by the end customers
- Required security policies
- The data that will be gathered, transformed, and used to generate the output data
- Ownership of the defined data entities and how will they be gathered and distributed
Logical-Level Data Architecture Design
This data architecture design is sometimes also referred to as data modeling. At this stage, the architect will start considering the type of technology being dealt with and define the data formats to use. The logical data model connects business requirements to the technology and database’s structure. The data architect should consider the standards, capabilities, best practices, and limitations of each of the underlying systems or databases. Data flows must be defined clearly at this stage. Typically, the following areas need to be covered:
- Data integrity requirements and naming conventions: Naming conventions should be defined and consistently used for each database involved. The data integrity requirements between the operational data and the reference data should be considered and enforced—especially if the data needs to reside in multiple underlying systems (at the conceptual level, all of these data entities should, ideally, belong to the same conceptual entity).
- Data archiving and retention policies: Data cannot be piled up on top of other data indefinitely. Failing to address data archiving and retention policies at an early stage normally has an impact on the project’s overall cost, poor system performance (particularly during data updates or data queries), and potential data inconsistencies. The data architect should work with stakeholders to define the data archiving and retention strategy based on business operations and legal or compliance requirements.
- Privacy and security information: While the conceptual design is more concerned with defining which data element is considered sensitive, the logical design dives into more details to ensure that confidential information is protected, and that data is visible only to the right audience. The data architect also needs to work on the strategy related to data replication and how to protect the replicated data both during the transition and at rest.
- Data flows and pipelines: At this stage, details about how data flows between the different systems and databases should be clearly articulated. These flows should be consistent with the flows defined at the conceptual level and should reflect details such as the required data transformations while in the pipeline, as well as the frequency of the data ingestions.
Physical-Level Data Architecture Design
This data architecture design is also known as the internal level. This is the lowest level of data design. At this stage, the architect is concerned with the actual technical aspects of the underlying system or database. It can even go as far as defining how the data is stored in the storage devices (such as within the specific folder structure).
As a Salesforce Architect, you are likely going to be involved in the conceptual design and, more likely, expected to be involved in the logical design. Your involvement in the physical design is not going to be as intensive and mostly be in an advisory context.
In Chapter 1, Starting Your Journey as a CTA, you covered the importance of creating a data model diagram during the review board exam. This is likely to be based on the logical mode (you will see several examples throughout this book). In real life, a logical model diagram is also required during the design phase of your project, while a physical model diagram is a must-have during the implementation phase.
You are expected to document (or lead the activity of documenting) the full Salesforce physical data model of your solution. This includes providing a full description of each standard and custom object used, how it is used (even for standard objects, it is not enough to mention that you are using the account object; for example, you must clearly explain how it is used), and the source of data in each integrated field, with a clear description of all involved data flows.
Next, you will understand more about the principles behind designing and properly documenting a data model.