Creating the logical data schema
A big part of the specification of systems is specifying the inputs and outputs of the system as well as what information a system must retain and manage. The inputs and outputs are data or flows and may be direct flows or may be carried via service requests or responses. Early in the systems engineering process, the information captured about these elements is logical. The definition of a logical schema is provided here, along with a set of related definitions.
The definitions are as follows:
- Data Schema: A data or type model of a specific problem domain that includes blocks, value properties, value types, dimensions, units, their relations, and other relevant aspects collectively known as metadata. This model includes a type model consisting of the set of value types, units, and dimensions, and a usage model showing the blocks and value properties that use the type model.
- Logical Schema: A data schema expressed independently from its ultimate implementation, storage, or transmission means.
- Value Property: A property model element that can hold values. Also known as a variable.
- Value Type: Specifies the value sets applied to value properties, message arguments, or other parameters that may carry values. Examples include integer (int in the C++ action language), real (double in C++), Boolean (bool in C++), character (char in C++), and String (often char* in C++). These base types may have additional properties or constraints, specified as metadata.
- Metadata: Literally data about data, this term refers to ancillary properties or constraints on data, including the following:
a. Extent – The set of values of an underlying base value type that are allowed. This can be specified as follows:
- A subrange, as in 0 … 10
- A low value and high value pair, as in low value =-1, high value = 1
- An enumerated list of acceptable values
- A specification of prohibited values that are excluded from the base type
- The specification of a rule or constraint from which valid values can be determined
b. Precision – The degree of exactness of specified values; this is often denoted as the number of significant digits.
c. Accuracy – The degree of conformance to an actual value, often expressed as ±<value>, as in ± 0.25. Accuracy generally refers to an output or outcome.
d. Fidelity – The degree of exactness of a value. Fidelity is generally applied to an input value.
e. Latency – How long after a value change occurs that the value representation is updated.
f. Availability – The percentage of the system life cycle during which the value is actually accessible.
Note
These properties are sometimes not properties of the value type but of the value property specified by that value type. In any case, in SysML, these properties are often expressed in tags and metadata added to describe model elements.
Value types can have several kinds of representations in the underlying action language, such as an enumeration (enum in C++), a language specification (such as char* in C++), a structure (struct in C++), a typedef, or a union.
- Dimension: Specifies the kind of value (its dimensionality). Examples include length, weight, pressure, and color. Also known as Quantity Kind in SysML 1.3 and later.
- Unit: Specifies a standard against which values in a dimension may be directly compared. Examples include meters, kilograms, kilopascals, and RGB units. SysML provides a model library of SI Units that are directly available for use in models. However, it is not uncommon to define your own if needed.
Other than schema, SysML directly represents these concepts in its language definition. Note that a value property can be specified in terms of a unit, a dimension, or a value type at the engineer's discretion.
- Recommendation: Each value property should be typed by a unit, unless it is unitless, in which case it should be typed by a defined value type.
Schematically, these definitions are shown in Figure 2.87 in the data schema metamodel:
Note
Although this is called the data schema, it is really an information schema as it applies to elements that are not data per se, such as physical flows. In this book, we will use the common term data schema to apply to flows as well.
Beyond the underlying type model of the schema, described previously, the blocks and their value properties and the relationships between them constitute the remainder of the data schema. These relations are the standard SysML relations: association, aggregation, composition, generalization, and dependency.
A quick example
So, what does a diagram showing a logical data schema look like?
Typically, a data schema is visualized within a block definition diagram, and shows the data elements and relevant properties. Consider an aircraft navigation system that must account for the craft's own position, its velocity, acceleration, jerk, flight plans, attitude, and so on. See Figure 2.88:
You can see in the figure that the Flight Property Set contains Airframe_Position, Airframe_Velocity, Airframe_Acceleration, and so on. These composed blocks contain the value properties that detail them; in the case of Airframe_Position, these are altitude, latitude, and longitude. Altitude is expressed in Meters (defined in the Rhapsody SysML type library) while latitude and longitude are defined in terms of the unit Meridian_Degrees, which is not in the SysML model library (and so is defined in the model).
On the left of the diagram, you can see that the Flight Plan contains multiple Flight Property Sets identifying planned waypoints along the commanded flight path. These Flight Property Sets may be actual current information (denoted with the measuredFlightPath role end) or commanded (denoted with the commandedFlightPath role end). The latter forms a list of commanded flight property sets and so stores the set of commanded waypoints. On the left, the diagram shows a superimposed image of the Rhapsody model browser, showing the units and dimensions created to support this data schema.
In the diagram, you see the «qualified» stereotype, which specifies a number of relevant metadata properties of the information, such as accuracy, bit_layout, and precision. Several value properties, along with their values for these metadata tags, are shown in the diagram. We see, for example, that the longitude value property has a range of 0 to 360 Meridian_Degrees, with an accuracy of 10^-6 degrees and a representation precision of 10^-7 degrees.
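As a rough illustration of how this metadata might constrain a later implementation, the following C++ sketch models the three value properties of Airframe_Position and checks longitude against the 0 to 360 range stated above. The aliases Meters and MeridianDegrees, the struct layout, the inclusive bounds, and the function name isLongitudeInExtent are all assumptions made for illustration; none of them is taken from the model or generated by Rhapsody.

```cpp
#include <cassert>  // for assert() in the usage shown after the block

// Hypothetical stand-ins for the units used in the figure. In the model
// these are SysML units; here they are plain aliases of double.
using Meters = double;
using MeridianDegrees = double;

// Sketch of the Airframe_Position block and its three value properties.
struct AirframePosition {
    Meters altitude;
    MeridianDegrees latitude;
    MeridianDegrees longitude;
};

// Extent of longitude as stated in the text: 0 to 360 Meridian_Degrees.
constexpr MeridianDegrees kLongitudeLow = 0.0;
constexpr MeridianDegrees kLongitudeHigh = 360.0;

// Returns true when longitude lies inside the stated extent.
// Treating both bounds as inclusive is an assumption of this sketch.
inline bool isLongitudeInExtent(const AirframePosition& p) {
    return p.longitude >= kLongitudeLow && p.longitude <= kLongitudeHigh;
}
```

A caller could then write `assert(isLongitudeInExtent(pos));` before using a position, which would hold for a longitude of 182.5 and fail for 360.5.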
Purpose
The purpose of the logical data schema is to understand the information received, stored, and transmitted by a system. In the context of capturing a system specification, its purpose is to understand and characterize the data and flows that cross the system boundary, and so to conceptually solidify the interfaces a system provides or requires.
Inputs and preconditions
The precondition is that a use case and a set of associated actors have been identified or that structural elements (blocks) have been identified in an architecture or design.
Outputs and postconditions
The output is a set of units, dimensions, types (the type model), and the value properties that they specify, along with the relationships between the value types and blocks that own them (the usage model).
How to do it…
The workflow for this recipe is shown in Figure 2.89:
The Construct Type Model call behavior is shown in Figure 2.90:
Create a collaboration
This task creates the collaboration between elements. This provides the context in which the types may be considered. In the case of system specification, this purpose is served by defining the use case and its related actors, or by the execution context of block stand-ins for those elements. In a design context, it is generally some set of design elements that relate to some larger-scale purpose, such as showing an architectural aspect or realizing a use case.
Define the structure
This step adds blocks and other elements to the collaboration, detailed in the following Identify the block, Add relations, and Identify value properties sections.
Identify the block
These are the basic structural elements of the collaboration, although value properties may be created without an owning block.
Add relations
These relations link the structural elements together, allowing them to send messages to support the necessary interactions.
Identify value properties
This step identifies the data and flow property features of the blocks.
Define the interaction
The interaction consists of a set of message exchanges among elements in the collaboration. This is most often shown as sequence diagrams.
Define the messages
Messages are the primitive elements of interaction. These may be synchronous (such as function calls) or asynchronous (as in asynchronous event receptions). A single interaction typically contains a set of ordered messages.
Add message parameters
Most messages, whether synchronous or asynchronous, carry information in the form of parameters (sometimes called arguments). The types of these data must be specified in the data model.
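To make the distinction between the two message styles concrete, here is a minimal C++ sketch of a message that carries typed parameters and can be delivered either synchronously or asynchronously. The message SpeedUpdate, its two parameters, and the Receiver class are hypothetical names invented for this illustration; they do not come from the model.

```cpp
#include <cassert>
#include <queue>

// Hypothetical message with typed parameters; names are illustrative only.
struct SpeedUpdate {
    double speedKmPerHour;  // parameter whose type belongs in the data model
    double cadenceRpm;
};

// Hypothetical receiver supporting both delivery styles.
class Receiver {
public:
    // Synchronous message: an ordinary function call, handled before returning.
    void handle(const SpeedUpdate& msg) {
        lastSpeed_ = msg.speedKmPerHour;
        ++handled_;
    }

    // Asynchronous message: the event is queued and handled later.
    void post(const SpeedUpdate& msg) { pending_.push(msg); }

    // Processes queued events in the order in which they were received.
    void dispatchPending() {
        while (!pending_.empty()) {
            handle(pending_.front());
            pending_.pop();
        }
    }

    int handledCount() const { return handled_; }
    double lastSpeed() const { return lastSpeed_; }

private:
    std::queue<SpeedUpdate> pending_;
    int handled_ = 0;
    double lastSpeed_ = 0.0;
};
```

In this sketch, a call to handle takes effect immediately, while a call to post has no visible effect until dispatchPending runs. In both cases the parameter types are the part that the data schema must pin down.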
Construct a type model
Once a datum is identified, it must be typed. This call behavior is detailed in the following steps.
Define the units
Most data relies on units for proper functioning, and too often units are only implied rather than explicitly specified. This step references existing units or creates the underlying unit and then uses it to type the relevant value properties. SysML defines a non-normative extension to include a model library of SI units. Rhapsody, the tool used here, has an incomplete realization of these units, so many common units, such as radians, are missing and must be added if desired. Fortunately, it is easy to do so.
Define the dimensions
Most units rely on a quantity kind (or dimension). For example, the unit meter has the dimension length. Most dimensions have many different units available. Length, for example, can be expressed in units of cm, inches, feet, yards, meters, miles, kilometers, and so on.
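The relationship between one dimension and its many units can be sketched in C++ as a set of conversion factors to a single reference unit. The enumeration LengthUnit and the functions metersPer and convertLength are hypothetical names used only for this example, and the choice of the meter as the reference unit is an assumption of the sketch.

```cpp
#include <cassert>
#include <cmath>

// One dimension (length) with several units. Each unit is described by the
// factor that converts it to meters, the reference unit chosen for this sketch.
enum class LengthUnit { Centimeter, Inch, Foot, Yard, Meter, Kilometer, Mile };

// Conversion factors to meters, using the international definitions of the
// inch, foot, yard, and mile.
inline double metersPer(LengthUnit u) {
    switch (u) {
        case LengthUnit::Centimeter: return 0.01;
        case LengthUnit::Inch:       return 0.0254;
        case LengthUnit::Foot:       return 0.3048;
        case LengthUnit::Yard:       return 0.9144;
        case LengthUnit::Meter:      return 1.0;
        case LengthUnit::Kilometer:  return 1000.0;
        case LengthUnit::Mile:       return 1609.344;
    }
    return 1.0;  // not reached for valid enumerators
}

// Converts a length from one unit to another by going through meters.
inline double convertLength(double value, LengthUnit from, LengthUnit to) {
    return value * metersPer(from) / metersPer(to);
}
```

Because every unit of the dimension maps to the same reference unit, any two values of that dimension can be compared after conversion, which is what makes the dimension a useful grouping.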
Define value types
The underlying value type is expressed in the action language for the model. This might be C, C++, Java, Ada, or any common programming or data language. The Object Management Group (OMG) also defined an abstract action language called ALF (short for Action Language for Foundational UML), which may be used for this purpose. See https://www.omg.org/spec/ALF/About-ALF/ for more information. This book uses C++ as the action language, but there are equally valid alternatives.
Define the relevant value type properties
It is almost always inadequate to just specify the value type from the underlying action language. There are other properties of considerable interest. As described earlier in this section, they include extent, precision, latency, and availability. Other properties of interest may emerge that are domain-specific.
Example
We'll now see an example.
This example will use the Measure Performance Metrics use case. The Model-based threat analysis recipe used this use case to discuss modeling cybersecurity. We will use it to model the logical data schema. For the most part, the data of interest is the performance data itself, although the threat model identified some additional security-relevant data that can be modeled as well.
Create collaboration
The use case diagram in Figure 2.67 provides the context for the data schema, but usually the corresponding IBD of the execution context is used. This diagram is shown in Figure 2.91:
Define the structure
This task is mostly done by defining the execution context, shown in Figure 2.91. In this case, the structure is pretty simple.
Identify the blocks
As a part of defining the structure, we identified the primary functional blocks in the previous figure. But now we need to begin thinking about the data elements as blocks and value types. Figure 2.92 shows a first cut at the likely blocks. Note that we don't need to represent the data schema for the actors because we don't care. We are not designing the actors since they are, by definition, out of our scope of concern:
Add relations
The instances of the core functional blocks are shown in Figure 2.91. The relations of the data elements to the use case block are shown in Figure 2.93. This is the data that the use case block knows (owns) or uses:
Identify the value properties
The blocks provide owners of the actual data of interest, which is held in the value properties. Figure 2.94 shows the blocks populated with value properties relevant to the use case:
Define interactions, define messages, and add message parameters
Another way to find data elements to structure is to look at the messaging; this is particularly relevant for use case and functional analysis since the data on which we focus during this analysis is the data that is sent or received. These three steps – define interactions, define messages, and add message parameters – are all discussed together to save space.
The first interaction we'll look at is for uploading real-time ride metrics during a ride. This is shown in Figure 2.95:
The second interaction is for uploading an entire stored ride to the app. This is in Figure 2.96:
Note that these are just two of many scenarios for the use case, as they do not consider concerns such as dropped messages, reconnecting sessions, and other rainy-day situations. However, this is adequate for our needs.
Construct the type model
Figure 2.94 goes a long way toward the definition of the type model. The blocks define the structured data elements, but at the value property level, there is still work to be done. The underlying value types must be identified, their units and dimensions specified, and constraints placed on their extent and precision.
Define units
It is common for engineers to just reference base types – int, real, and so on – to type value properties, but this can lead to avoidable design errors. This is because values of the same base type may not be directly comparable, such as when distanceA and distanceB are both typed as Real but one is captured in kilometers and the other in miles. Further, we cannot reason about the extent of a type (the permitted set of values) unless we also know the units. For this reason, we recommend – and will use here – unit definitions to disambiguate the values we're specifying.
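The kilometers-versus-miles problem can be illustrated in C++ by giving each unit its own type, so that the compiler rejects a direct comparison. This is only a sketch of the idea: the types Kilometers and Miles and the functions toKilometers and isShorter are hypothetical and are not part of the Rhapsody model or its libraries.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical unit types: each wraps a double but is a distinct C++ type,
// so a Kilometers value cannot be mixed with a Miles value by accident.
struct Kilometers { double value; };
struct Miles      { double value; };

// Explicit conversion is the only way to move between the two units.
inline Kilometers toKilometers(Miles m) { return Kilometers{m.value * 1.609344}; }

// Comparison is defined only for operands expressed in the same unit.
inline bool operator<(Kilometers a, Kilometers b) { return a.value < b.value; }

// Compares distances captured in different units by converting first.
inline bool isShorter(Kilometers a, Miles b) { return a < toKilometers(b); }

// Note: Kilometers{10.0} < Miles{10.0} does not compile, because no
// operator< exists for that pair of types.
```

With both values typed as a bare double, the comparison of 10 kilometers against 10 miles would compile and quietly report them as equal; with unit types, the mismatch has to be resolved explicitly.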
The SI Units model library of the SysML specification is an optional compliance point for the standard. Rhapsody includes some SI units and dimensions but is far from complete. In this model, we will reference those that exist and create those that do not.
Figure 2.94 uses a number of special units for value properties and operation arguments, including the following:
- DegreesOfArc
- Radian
- Newton
- DateTime
- KmPerHour
- KmPerHourSquared
- Second
- KiloCalorie
- RPM
- Kilometer
- ResistanceMode
- APP_INTERACTION_TYPE
Two of these (Newton and Second) already exist in the Rhapsody SysML Profile SI Types model library and so may just be referenced. The others must be defined, although two of them – ResistanceMode and APP_INTERACTION_TYPE – will be specified as value types rather than units.
DegreesOfArc is a measure of angular displacement and is used for the cycling incline, while Radian is a unit of angular displacement used for pedal position. RPM is a measure of rotational velocity used for pedaling cadence. DateTime is a measure of when data was measured. Kilometer is a measure of linear distance (length), while KmPerHour is a measure of speed and KmPerHourSquared is a measure of acceleration. KiloCalorie is a measure of energy used to represent the rider's energy output. In our model, we will define all these as units. They will be defined in terms of their dimensions in the next section.
Define dimensions
Dimension is also known as quantity kind and refers to the kind of information held by a unit. For example, kilometer, meter, and mile all have the dimension of distance (or length).
As with the SI units, some of the dimensions are already defined in the Rhapsody SysML SI Types model library (time, length, energy) while others (angular displacement and rotational velocity) are not. We will reference the dimensions already defined and specify in our model the ones that are not.
In keeping with the approach used by the Rhapsody SysML SI Types model library, the dimensions themselves are defined with a typedef kind to the SysML Real type (which, in turn, is a typedef of RhpReal). In models using the C++ action language, this will end up being a double. The advantage of this approach is the independence of the model from the underlying action language.
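In C++ terms, the typedef chain described above might look roughly like the following. This is a sketch under stated assumptions: the declarations of RhpReal and Real are written out here only to show the layering and may differ from the actual Rhapsody library declarations, and the dimension names AngularDisplacement and RotationalVelocity are illustrative spellings of the two dimensions added to this model.

```cpp
#include <cassert>
#include <type_traits>

// Assumed typedef chain, following the description in the text. The real
// declarations live in the tool's libraries and may differ in detail.
typedef double RhpReal;  // tool-level real type; a double for C++ models
typedef RhpReal Real;    // SysML Real

// Dimensions defined in the model as typedefs of Real (illustrative names).
typedef Real AngularDisplacement;
typedef Real RotationalVelocity;

// Because every layer is a typedef, each dimension resolves to double.
static_assert(std::is_same<AngularDisplacement, double>::value,
              "AngularDisplacement should resolve to double");
static_assert(std::is_same<RotationalVelocity, double>::value,
              "RotationalVelocity should resolve to double");
```

One consequence of this design is that only the bottom typedef needs to change to retarget the model to a different action language. The trade-off is that typedefs are aliases, so the C++ compiler treats AngularDisplacement and RotationalVelocity as the same type and will not catch a mix-up between them.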
Figure 2.97 shows the units and dimensions defined for this logical data schema. Dimensions used from the SysML model library are referenced by the units but not otherwise shown on the diagram:
Define value types
Apart from the blocks, units, and dimensions described in the previous sections, there are also a few value types in the model. In this particular case, there are two of interest, both of which are enumerations. Figure 2.98 shows that APP_INTERACTION_TYPE may be either REAL_TIME_INTERACTION, used for loading performance data in real time during a cycling session, or UPLOAD_INTERACTION, used to upload a saved ride to the app:
Another value type, ResistanceMode, can be either ERG_MODE, in which the system maintains a constant rider power output regardless of cadence by dynamically adjusting the resistance, or RESISTANCE_MODE, in which the power varies as the Rider modifies their cadence, incline, or gearing.
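Expressed in the C++ action language, these two value types could be sketched as plain enumerations. The enumerator names come from the text; the comments summarize the descriptions given there, and the helper toString is a hypothetical addition for logging that is not part of the model.

```cpp
#include <cassert>
#include <string>

// The two enumerated value types described in the text, as C++ enums.
enum APP_INTERACTION_TYPE {
    REAL_TIME_INTERACTION,  // load performance data in real time during a session
    UPLOAD_INTERACTION      // upload a saved ride to the app
};

enum ResistanceMode {
    ERG_MODE,        // system holds rider power constant by adjusting resistance
    RESISTANCE_MODE  // power varies with cadence, incline, or gearing
};

// Hypothetical helper for logging; not part of the model.
inline std::string toString(ResistanceMode mode) {
    switch (mode) {
        case ERG_MODE:        return "ERG_MODE";
        case RESISTANCE_MODE: return "RESISTANCE_MODE";
    }
    return "UNKNOWN";
}
```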
Define relevant value type properties
The last thing we must do is specify relevant value type properties. In the logical data schema, this means specifying the extent and precision of the values. This can be done at the unit/value type level; in this case, the properties apply to all values of that unit or type. These properties can also be applied at the value level, in which case the scope of the specification is limited to the specific values but not to other values of the same unit or type.
The best way to specify these properties is to specify them as SysML tags within a stereotype, apply the stereotype to the relevant model elements, and then elaborate the specific values. To that end, we will create a «tempered» stereotype. This stereotype applies to attributes (value properties), arguments, types, action blocks (actions), object nodes, and pins in the SysML metamodel and so can apply to units as well.
The stereotype provides three ways to specify extent. The first is the extent tag, which is a string in which the engineer can specify a range or list of values, such as [0.00..0.99] or 0.1, 0.2, 0.4, 0.8, 1.0. Alternatively, for a continuous range, the lowValue and highValue tags, both of type Real, can serve as well; in the previous example, you can set lowValue to 0.0 and highValue to 0.99. Lastly, you can provide a range or list of prohibited values in the prohibitedValues tag, such as -1, 0.
The stereotype also provides three means for specifying scale. The scaleOfPrecision tag, of type integer, allows you to define the number of significant digits for the value or type. You can further refine this by specifying scaleOfFidelity to indicate the significant digits when the value is used as an input and scaleOfAccuracy when the value is used as an output.
Another stereotype tag is maxLatencyInSeconds, a Real value that specifies the maximum age of a value. Other metadata can be added to the stereotype as needed for your system specification.
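To show how these tags could be used downstream, the following C++ sketch mirrors a subset of the «tempered» tags in a struct and checks a value against them. The struct Tempered and the functions inExtent and isFresh are hypothetical; the stereotype itself lives in the model. The sketch also assumes that the range bounds are inclusive and that prohibitedValues holds a list of individual values, and it omits the string-valued extent tag and the fidelity and accuracy scales for brevity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical C++ mirror of some tags on the «tempered» stereotype.
struct Tempered {
    double lowValue;                       // lower bound of a continuous range
    double highValue;                      // upper bound of a continuous range
    std::vector<double> prohibitedValues;  // values excluded from the range
    int scaleOfPrecision;                  // significant digits for the value
    double maxLatencyInSeconds;            // maximum permitted age of a value
};

// True when value lies in [lowValue, highValue] and is not prohibited.
// Inclusive bounds are an assumption of this sketch.
inline bool inExtent(const Tempered& t, double value) {
    if (value < t.lowValue || value > t.highValue) return false;
    for (double p : t.prohibitedValues) {
        if (value == p) return false;
    }
    return true;
}

// True when a value of the given age is still recent enough to use.
inline bool isFresh(const Tempered& t, double ageInSeconds) {
    return ageInSeconds <= t.maxLatencyInSeconds;
}
```

For instance, with lowValue 0.0, highValue 0.99, and a single prohibited value of 0.5, the value 0.25 is in extent while 0.5 and 1.5 are not.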
This level of detail of specification of quantities is important for downstream design. Requiring two digits of scale is very different than requiring six and drives the selection of hardware and algorithms. In this example, it makes the most sense to specify the necessary scale at the unit and type level, rather than at the specific value property level for the units that we are defining. Those units are shown in Figure 2.99:
Note
Precision technically refers to the number of significant digits in a number, while scale is the number of significant digits to the right of the decimal point. The number 123.45 has a precision of 5, but a scale of 2. People usually speak of precision while meaning scale.
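The difference between the two terms can be checked with a small C++ sketch that counts digits in a decimal literal. The struct DigitCounts and the function countDigits are hypothetical names, and the function deliberately handles only simple literals without a sign, exponent, or leading or trailing zeros, so that every digit counts as significant.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Digit counts for a plain decimal literal such as "123.45".
struct DigitCounts {
    int precision;  // total number of digits
    int scale;      // digits to the right of the decimal point
};

// Assumes no sign, exponent, or leading/trailing zeros in the literal.
inline DigitCounts countDigits(const std::string& literal) {
    DigitCounts counts{0, 0};
    bool afterPoint = false;
    for (char c : literal) {
        if (c == '.') {
            afterPoint = true;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            ++counts.precision;
            if (afterPoint) ++counts.scale;
        }
    }
    return counts;
}
```

Calling countDigits("123.45") yields a precision of 5 and a scale of 2, matching the example in the note.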
Lastly, we must specify the extent and scale for the values that are either unitless or use standard predefined units but are constrained within a subrange. Figure 2.100 and Figure 2.101 provide that detail:
Note that the figures show the relevant value properties for the blocks grouped with a rectangle with a dotted border. This rectangle has no semantics and is only used for visual grouping:
And there you have it: a logical data schema for the values and flows specified as a part of the Measure Performance Metrics use case. These, along with data schema from other use cases, will be merged together into the architecture in the architecture design work phase.