This detailed look at domains and domain modeling defines exactly what a domain does and does not specify, and shows how to create and verify a complete and accurate domain definition.
In his series of DBMS articles on logical data modeling in recent issues (May-September 1995), George Tillman touched on the subject of domains at several points in the development of the logical data model. This article discusses domains and domain modeling in greater detail, describes exactly what the completely defined domain does and does not specify, and shows how the domain definition is created and verified. This article also assesses the contributions that domain modeling can offer to activities throughout the software development life cycle.
In data modeling, a domain is the set of values an attribute can have and the common characteristics of that set of values, such as presentation formats and allowable operations. All too frequently, the logical data model defines the attributes in simple terms of an implementation type (such as long integer). This is inadequate, however, for reasons I explore in this article.
A domain's purpose is to specify the requirements of the attribute, as uncovered during analysis. A well-specified domain fully describes the data stored in each attribute to which the domain is assigned and its presentation to the user. It should remain sufficiently abstract to provide latitude for the designer to select the appropriate implementation-specific data type that represents the data. Several fundamental types of domains are common:
Sometimes implementation considerations cause confusion between enumerated domains and foreign keys. The enumerated domain of state abbreviations could be implemented using a lookup table, where attributes containing state abbreviations are foreign keys into the lookup table. However, the choice of verification using a lookup table or explicit code is a design decision. For instance, the use of the lookup table is not always the most natural implementation; some application development tools make it easier to verify values in an enumerated set through other means. The designer may also determine that the set of American states is not likely to change over the life of the application. The designer may then want to hardcode the list of allowable values in a subroutine or as part of the definition of a GUI control such as a drop-down list box, rather than go to the database to look them up. In any event, the choice of verification method rightly belongs to design, not analysis.
Domains are defined as part of the attribute identification and definition activities within logical data modeling; therefore, they are part of the analytical work product. The full domain definition is uncovered through standard information gathering techniques used by the analyst to build the data model. It is important that the analyst probe sufficiently to understand the business requirements for the data so that presentation, validation, storage, and constraint issues can be articulated during the domain modeling process.
Each domain definition articulates the type, meaning, and limitations of the data and the represented information. It is often appropriate to restrict the operations that can sensibly be performed on data of a given domain. For example, there are few applications for the multiplication of two different dollar values.
Presentation requirements are rightly part of the domain because they provide a more consistent presentation of data across applications and application components. Where necessary, attributes that are defined in terms of a domain may override the presentation requirements of the domain. In such cases, the data modeler should specify why this is necessary. For example, consider a cost application rate domain. The definition of the domain might specify the allowable values, presentation, and meaning, as in the following prose fragment:
The value must be a non-negative number less than 100,000, presented with three decimal places, and stored with at least three decimal places. The value represents a rate that is a monetary value per hour or per unit. The value can never be unknown, and should default to zero if not supplied. The value depends on the currency_name domain to identify the currency of the numerator, and on the cost_rate_basis domain to indicate the source of the denominator units or hours.
This definition identifies the necessary dependent domains to make sense of a value within the specified domain. The definition would then go on to describe the allowable uses of the value, such as multiplication by an appropriate basis (hours or units) to yield a monetary value.
Note that the storage requirement has a legitimate place here; the precision at which the value is stored affects the accuracy of calculations using it. However, the domain definition does not require that the domain be implemented as a fixed-precision value. The domain definition also does not stipulate that its implementation be a NOT NULL column, although you would reasonably expect the designer to implement it in this manner if the database provides this distinction.
Domains provide a mechanism to prevent the analysis process from being finalized too soon. Use of domains avoids the need to specify attributes in terms of database-specific types that lock in the choice of physical representation for the attribute. The designer is afforded sufficient scope to make most effective use of the facilities of a particular database during physical data modeling. Design choices that are made when translating the abstract domain into a tangible, implementable data type can be examined during the design-review phase of the project.
The abstractness of the domain data type definition not only preserves the latitude of the designer, but also keeps the discussion of requirements independent of the choice of database. This is invaluable in heterogeneous database environments or in situations in which the choice of database is not firm while requirements are being written.
Domain modeling simplifies and improves the process of requirements review by reducing the amount of subject matter under discussion. If the data representation for each attribute is reviewed independently (perhaps by different reviewers who have different interests within the enterprise), there is a likelihood of producing attribute definitions with dissimilar specifications. For example, some departments may specify costs to eight digits left of the decimal point, while others may only specify five digits. Domains help to identify such situations early and produce consistent attribute definitions.
Identification of units of measure within numeric domains gives the designer the opportunity to check calculations using dimensional analysis. For example, division of a monetary value by a monetary value must yield a dimensionless result. A reviewer who may not fully understand a complex statistical or financial calculation can still see the impossibility of dividing a monetary value by a time period.
Domains can also provide an internal consistency check for the logical data model as a whole. Recall that the cost application rate domain requires, among other things, a currency identifier. Any entity having an attribute in the cost application rate do main must be able to identify a currency identifier attribute uniquely through some business rule. The currency identifier attribute need not be in the same entity, and the relationship path from the dependent attribute need not be direct. A rule such as "each subsidiary uses a specific currency, and all cost computations therein will be conducted in this currency," is sufficient if the entities using the cost rate attribute can trace a path to identify the subsidiary in which they occur, in order to find their currencies. If this check fails, an internal inconsistency is present in the data model, which may cause an application to be unable to make sense of its own data.
Domains enable the documentation and requirements writer to convey intent to designers and implementers. Attributes can be expressed in terms of a set of abstract domains, which are described in full in one place. This eliminates the need to describe each attribute in minute detail, and eliminates any ambiguities. In effect, the use of domains normalizes the specification of attributes.
The creation of thorough, abstract domains during logical data modeling positions the development effort well for the design effort. With a complete set of domain requirements, the analyst has communicated to the designer the information necessary to make informed design decisions. Even if the analyst and designer are the same person, the complete domain requirement serves as a ready reference to clarify the impact of design decisions.
Formalization is the design activity wherein abstract requirements are replaced by specific, implementable constructs. The prose description, which is understandable as a business requirement, is replaced by a more concise technical requirement. The domain description is sufficiently detailed to permit easy formalization without unwarranted assumptions on the part of the designer.
Lexicalization is the identification of implementable constructs to attach to each formalized design element. Applied to database design, it is during this activity that the domain is given a specific data type as provided by a specific database. The information provided in the domain definition lets the designer select the implementation data type that best achieves the demands stated in the requirements.
A well-specified domain (that is, a domain with fully developed constraints and allowable operations) facilitates encapsulation by providing sufficient information to implement the data type as a class in an object-oriented environment. Object databases provide the designer with the ability to specify new data types; all of the information identified as pertinent to the domain description is valuable when describing the class. The methods of the class enforce the constraints on the data values and ensure that operations performed on the data are meaningful. Even in a relational database environment, however, the designer can use the domain information to gather related functional activities.
In unit testing, the implementer performs basic tests on a small segment of code. One of the typical problems encountered in unit testing is the lack of understanding of the data by the implementers, resulting in simple unit test scenarios that fail to fully exercise the code unit under test. As a result, defects that should have been uncovered in unit test are not detected until later test phases, with attendant increases in difficulty of isolation, cost to resolve, and risk to the project.
Domains contribute to unit testing by providing better information on allowable data values. In the cases in which the domain includes a large set of possible values, the analyst can help the process by specifying sample data that is known to exist in real life. This reduces the difficulty in devising unit test scenarios, and increases the likelihood that more thorough scenarios will be created.
A particularly important form of testing is commonly known as boundary testing; that is, the use of data values at the boundaries of the domain. Data for boundary tests are drawn from both inside (permissible) and outside (impermissible) of the domain boundary. Without a thorough understanding of the domains involved, these tests are impossible to construct effectively for any but the simplest data model.
Even later phases, such as integration testing, can benefit from domains. The richness of the domain definition provides the test effort with the benefit of the analysts' understanding of the data. This reduces the research required for integration test case development, and assists with problem localization during test runs. Integration testers can use the database as a division point in testing, comparing database data to expected values to determine where, in a sequence of events, processing went astray. Testers can often utilize the query and update tools of the database vendor (or third-party tools) as integration test tools to prepare database values for subsystem tests and to examine these values during the postmortem.
Whatever the technology, software development seeks to satisfy user requirements fully and completely. Domain modeling, by clarifying the data component of the software requirements, assists in achieving this goal at several levels in the software development process.