Relational Databases

Goals

Understand fundamental relational model theory.
Learn notation for diagramming relational database schemas.
Discover how to map an object-oriented domain model to the relational model for database persistence.
Understand the purpose of normalization, and become proficient with the most important normal forms.
Be familiar with the most popular relational database management system products.

Concepts

anomaly
atomic
attribute
Boyce-Codd Normal Form (BCNF)
candidate key
cardinality
catalog
class table inheritance
cluster
composite key
concrete table inheritance
constraint
“crow's foot” notation
database
database system
deletion anomaly
domain
domain integrity
domain model
entities
Entity-Relationship (ER) diagram
First Normal Form (1NF)
flat files
foreign key
foreign key constraint
functional dependence (FD)
heading
insertion anomaly
integrity
key
many-to-many
many-to-one
modality
natural key
normal forms
normalization
not-null constraint
object-relational impedance mismatch
one-to-many
one-to-one
opaque
primary key
primary key constraint
query
record
referential integrity
relation
relational database management system (RDBMS)
schema
Second Normal Form (2NF)
single table inheritance
surrogate key
Third Normal Form (3NF)
transitive functional dependency
tuple
unique constraint
update anomaly

Preparation

Download and install pgModeler.
1. Use pgModeler version 0.9.1-alpha1 or later.

Lesson

Persisting data is essential in almost every application. Some applications create simple data that must be remembered, such as settings changes and log information. Other data programs need to query or look up data that is already saved somewhere based upon some selection criteria.

For small data sets, the object serialization techniques you've learned in previous lessons may be sufficient. But as soon as the amount of data grows to a moderate level, serializing individual objects becomes difficult, especial when it comes to managing the relationships among the objects. A database typically stores general data, encodes relationships among the data, and allows queries to find data later.

Relational Databases

Some of the simplest databases are stored in so-called flat files, with rows of text data separated by commas or tabs. In flat-file databases, the consumer application usually must “connect the dots”, as it were, and not only determine what the data means but also how pieces of the data relates. If a library is available for taking care of some of this work of reading and connecting the information so that the application doesn't have to, it is called a database system.

A database system represents entities, which are analogous to objects with unique identifiers. A Vehicle is one example of an entity; each vehicle (entity instance) might be identified by a vehicle identification number (VIN). An entity may have various attributes, such as year of production. Any one Vehicle might be related to its Manufacturer, the particular entity that produced it.

Early database systems conceptually arranged entities into trees of hierarchies. Later systems improved these data models by allowing general graph or “network” data models to indicate more complex relationships among entities. But these databases were still inflexible and were hard to query.

Databases were revolutionized in 1970 when E. F. Codd, a computer scientist at IBM, proposed A relational model of data for large shared data banks. Based upon a rigorous mathematical foundation, Codd's proposal resulted in the development of the relational database management system (RDBMS), which remains the most popular database system in use today. An RDBMS allows entities to be represented in general ways, and then queried in an almost unlimited fashion.

Relations

The fundamental modeling unit of an RDBMS is a relation, a mathematical concept that arises from arranging data into rows and columns. Each relation represents an entity. Each row represents an instance of that entity, with each column representing an attribute of that the entity.

Initial “Item Stock” table from Indirection lesson.

Item Stock
`Item`	`Stock`	`Supplier`
table	`20`	Acme Furniture Company, 123 Some Street
chair	`15`	Acme Furniture Company, 123 Some Street
balloon	`99`	Bozo Toy Company, 321 Other Street
kite	`5`	Bozo Toy Company, 321 Other Street
doll	10	Bozo Toy Company, 321 Other Street

Think back to the very first lesson of this course, which explained the concept of indirection. The store owner in that lesson needed to keep track of the stock of toys and furniture, along with their suppliers. The store owner, perhaps using only a pad of paper and a pencil, may have referred to this as a “list”. As you completed the task for that lesson, you may have thought of the information as being arranged in a “table”. In the relational model, the correct terminology for this arrangement of data is a relation.

But a relation is not just any tabular representation of information. A relation has certain formal pieces that must adhere to certain rules for the relational model to work. You may hear people refer to “tables”, “rows”, and “columns” when discussing relational database management systems, but relations are not general tables. The rows and columns of a relation have special rules; ignoring them will cause problems in your model and provide erroneous results in your queries. In casual conversation you may use the words “tables”, “rows”, and “columns” to simplify communication, but don't forget that there is formal distinction between these items in common use and the formal definition of a relation.

Attributes

Each “column” in a relation represents an attribute of the relation. Each attribute is identified in a special “row” called the relation heading. Unlike general table columns, attributes follow certain rules:

Each attribute has a unique name.
Each attribute has a unique type or domain.
The attributes may appear in any order in the relation. Although all the values under Stock must represent the stock count, it makes no difference whether the Item attribute comes before or after the Stock attribute. Attribute order is not significant.

Thus the Item_Stock relation above has three attributes: Item, Stock, and Supplier. Each will have a type, and they may appear in any order.

Tuples

Each “row” in a relation is referred to as a tuple, and represents an instance of the entity. Unlike general table rows, tuples follow certain rules:

Each tuple has unique values. A relation cannot contain duplicate tuples.
Tuples can appear in any order in the relation. In the above example the tuple containing the item named chair might appear first, last, or anywhere in between. Tuple order is not significant.

The Item_Stock relation above contains five tuples, which may appear in any order.

Sets

The relational database model is based on set theory. This point is essential to understanding programming relational databases. Storing, querying, and updating data in a relational database is based on operations or functions performed on sets of data.

Thus the definition of some relation such as Item_Stock is a set of attributes. The actual relation (the data that Item_Stock contains at any particular time) is a set of tuples. Programming a relational database is essentially indicating what operations to perform on these sets of data.

Pay close attention to the distinction between the definition of a relation and the data it contains. The definition of Item_Stock contains no data, but indicates indicates which attributes it has. An instance of Item_Stock containing data is the actual relation. Relations are immutable; when the data in Item_Stock changes, what really happens is that Item_Stock is being replaced with a different relation. The Item_Stock definition then is like a “type” for the relation. For this reason C. J. Date, an expert and popular writer about the relational model, would refer to the Item_Stock definition as a “RELVAR” (“relation variable”), which will refer to a different relation at different times, depending on the data it holds, but always of the same Item_Stock “type”.

Semantics

Even though the tuples in a relation may represent multiple instances of the entity, the table name should be in the singular. Like a class Vehicle in Java, the table itself in the schema represents the definition of a vehicle.

The relational model is more than just a way to organize data. The “tables”, “rows”, and “columns” mean something. As you might have guessed, each relation represents a type of entity such as a vehicle. The heading indicating what attributes a Vehicle is allowed to have.

Each tuple represents a instance of an entity, as if you created a vehicle using new Vehicle() in Java. The value in each “table cell” specifies a value of an entity's attribute. Seen in this way, each tuple is a set of logical propositions about an entity instance, asserting for example “such and such Vehicle has the color blue” and “such and such vehicle has four wheels”.

Keys

“Such and such” a vague way to identify a vehicle. A common aspect of historical database systems has been the storage of data about an entity in a record, essentially an array of variables potentially of different types. But different systems traditionally used different ways to link the records together. Part of the “secret sauce” of relational databases is that the relational model itself provides a way to identify a tuple, which is analogous to a record in previous database models. The term “record” is still used in casual speech when referring to the information in a tuple.

Any attribute or collection of attributes that uniquely identifies a tuple within a relation is called a key. In a relational database need to select one of these keys (sometimes called candidate keys) to serve as the primary key, or official unique identifier for the entity.

Constraints

A primary key is an example of a constraint, which is a rule restricting the values placed in the database. A primary key constraint, for example, merely tells the database that a particular attribute serves as the official tuple identifier. But if there are other attributes that must have different values for each entity, you can add a unique constraint that prevents duplicates.

The benefit of a constraint is that it is maintained by the database system, without any extra work on the part of the application. If your program attempts to add two tuples with attribute values that violate a unique constraint, for example, the attempt will fail with an error. Constraints thus ensure integrity of the database; any data stored is guaranteed to be bound by all constraints.

Another important validation of data is a not-null constraint, which prevents NULL from appearing in a tuple for some attribute. As in Java programming, NULL in a relational database has problem of semantics: sometimes it indicates “there is no value”, sometimes it means “the value is not known”, and sometimes it stands for “not applicable”. But as you will see in a future lesson, NULL presents additional difficulties when querying a database, as the rules for comparing NULL are not what you might expect. Avoid NULL whenever possible by adding a not-null constraint for each attribute.

Databases

While relations and their subcomponents form the heart of a relational database, an RDBMS organizes relations into larger structures still. Although the terminology varies among database products, the following represents the general arrangement of logical structures:

schema: A schema groups a set of related tables. The word “schema” can also refer to the definition of the relations; see Schema Design below.
catalog: All the available schemas are then grouped into a catalog. In many RDBMS products, a catalog is simply called the “database”. Some products use the word “catalog” to refer to special system schemas holding available types and other metadata.
cluster: Multiple catalogs or databases can in turn be grouped into even larger arrangements called “clusters”. In some products a “cluster” may be synonymous with “database server”, the installation of the product on a particular machine. Historically the word cluster referred to the physical grouping of some database elements into a single file for faster access.

Schema Design

Various authors and teachers use terminology in different ways when discussing schema modeling. Some use “logical design” to refer to the database schema, and use “physical design” to refer to implementation details of how a particular database project stores the information in files. Others describe schema modeling as progressing through three phases:

Conceptual Model: A general idea of the central data concepts and relationships.
Logical Model: An initial, abstract representation of the data as it would be stored in the database, with attributes and keys.
Physical Data Model (Schema): The final, revised database schema, modified to more efficiently use the relational paradigm and to leverage the capabilities of a particular RDMS product.

In this lesson we will use our program's domain model as the initial conceptual model, and use the term “schema design” to refer to the process of converting that domain model to a database schema and then refining that schema to use the relational paradigm appropriately. See What is the difference between logical data model and conceptual data model? for a discussion, and Data Model: Neither Conceptual, Nor Logical, Nor Physical for one author's take.

In colloquial use the word “schema” can also mean a set of rules defining relational structures such as relations, attributes, and keys. Designing the schema is similar to designing the classes and methods of an object-oriented program. As with classes, the design of tables is derived from the domain model that defines the conceptual view of the program. (Here the “domain” of the program's domain model, which refers to the world-view the program works with, is different from the “domain” of a relation attribute, which refers to the attribute's value type.)

In creating your program's domain model you used UML to create a “sketch” of of the central objects in the domain, and used the same modeling language to produce a “blueprint” of how those objects could be turned into Java classes. In a similar way you will take the original domain model and create a rough view of of the tables, attributes, and relationships in your database schema. But the initial plan for a database schema is almost never sufficient; you will need to refine your schema so that it represents the data in the most efficient way for the relational paradigm.

Entity Relationship Diagram

While a UML class diagram is the diagram of choice for illustrating the domain model as well as the blueprint of actual classes, the database world mostly uses the Entity-Relationship (ER) diagram. Charles Bachman was one of the first to use diagrams to represent data in the 1960s. Building on Bachman's notation, Peter Chen in the 1970s developed the ER approach, in which rectangle represent entities and lines connecting the rectangles indicate relationships.

Chen's original notation for describing relationships included a diamond to describe the relationship. Because the diamond takes up extra space in a diagram, some authors simplify the notation to a single line, with the name of the relationship above the line. Nowadays the most popular ER diagram notation is the information engineering (IE) or “crow's foot” notation, named for the diagram's approach for representing relationship multiplicity, as explained in Cardinality below.

Cardinality

UML class diagrams come with a notation for indicating the “multiplicity” of a relationship, that is the minimum and maximum of each entity type that can appear in the relationship. For example in UML 1..* indicates “one or more”. The multiplicity consists of two parts: the minimum number (here 1 meaning “one”), and the maximum number (here * meaning unlimited or “many”).

The maximum number of entities that can appear on one side of a relationship is called the cardinality. Referring to an entity on the left of the relationship line, Crow's foot notation indicates a “one” cardinality with a bar |, and a “many” cardinality with a greater-than symbol. Together with the relationship line, the “many” symbol > appears somewhat like a bird's foot, which is how the notation got it's nickname, “crow's foot”.

There are thus three fundamental combinations of modalities in ER diagrams, just as in UML diagrams. A fourth type, a many-to-one (M:1) relationship, is simply the inverse of a one-to-many (1:N) relationship.

one-to-one (1:1): ▭| |▭
one-to-many (1:N): ▭| <▭
many-to-many (M:N): ▭> <▭

There relationship representing “manufactures” between a Company and a Vehicle, for example, would have a cardinality of one-to-many or Company| manufactures <Vehicle. Each company may manufacture many vehicles, but each vehicle is only manufactured by only one company.

Modality

While cardinality refers to the maximum number of a type of entity participating in a relationship, modality refers to the minimum number. In simplest terms this means that the entity is “optional”, denoted by a circle ○, or “required”, again donated by a bar |. The modality, however, is shown on the inside of the relationship line, between the cardinality symbols (that is, not next to the entity it is associated with).

The cardinality of the “manufactures” relationship would be represented as Company|| manufactures ○<Vehicle. A vehicle cannot be manufactured without a company, which is “required”, but a company may exists that has not manufactured any vehicles at all, which would indicate a modality of “optional”.

Entities

As discussed above, a relation represents a class of entities, with each tuple representing an instance of the entity. For much of your domain model, creating your schema will be as simple as creating an entity in the diagram, representing a relation, for each of your domain model classes in your class diagram.

Keys

A more difficult task is finding an attribute to serve as the primary key of an entity. A primary key must be unique and must not change. Some entities may already have an attribute, a natural key, that uniquely identifies instances. For example the automotive industry generates a vehicle identification number (VIN) that uniquely identifies each vehicle produced. A VIN could be used as the primary key of a Vehicle relation in a database for a used-car sales company.

If no suitable natural key is available, you will need to create a surrogate key, an artificial identifier that brings uniqueness to the tuple. Many databases provide a facility to atomically generate an arbitrary counter when a new record is added. Another option is to use a universally unique identifier (UUID), which practically guarantees uniqueness even though they are independently generated.

Relationships

Foreign Keys

“Item Stock” and “Suppliers” tables from Indirection lesson.

Item Stock
Item	Stock	Supplier
table	`20`	`ACME`
chair	`15`	`ACME`
balloon	`99`	`BOZO`
kite	`5`	`BOZO`
doll	10	`BOZO`

Suppliers
ID	Name	Address
`ACME`	Acme Furniture Company	123 Some Street
`BOZO`	Bozo Toy Company	321 Other Street
`TOP`	Top Toys Corporation	456 Far Avenue

These informal tables are not relations! “Item Stock” has no primary key, for example.

In a relational database, a relationship is most always modeled using a foreign key. A foreign key is merely an attribute in one relation that references the primary key of another table. Defining a foreign key constraint allows the database to ensure that for every foreign key in one relation, there exists a matching primary key int the referenced relation; this maintains the referential integrity of your database.

Using a single code to reference rows in another table is exactly the approach you intuitively used in the initial lesson to separate “Item Stock” and “Suppliers” into separate lists, reproduced in the figure. In the “Item Stock” table, the “Supplier” column served as an informal foreign key to the supplier in the “Suppliers” table. There the “ID” column services as a primary key to identify each supplier.

Association Entities

Many-to-many relationships pose an interesting problem in the relational model. Imagine that the store owner has placed each item in one or more categories. That is, each category, such as “furniture”, can contain many items such as “table” and “chair”. Additionally, some items may fall into multiple categories, such as a game table that is both “furniture” and a “toy”. There is therefore a many-to-many relationship between the Item and Category entities.

Department store domain model with category. — Department store domain model with categories.

Simply adding a foreign key to the Item and/or Category entities is insufficient to represent a many-to-many relationship. The solution is to add yet another layer of indirection by creating an entity that represents the relationship of a particular item being in a particular category. This type of entity is called an association entity. It goes by many names including “junction table”.

Furniture schema. — Department store domain model with categories.

Note that the associative entity consists entirely of its single composite key made up of item_code and category_code. Although it could have other attributes to further describe the relationship, all it needs are the foreign keys referencing the entities in the many-to-many relationship.

Generalization

Vehicle class hierarchy with properties.

Mapping an inheritance hierarchy of super classes and subclasses (what UML refers to as “generalization” in class diagrams) to relations requires additional thought. In your object-oriented domain model, generalization indicates that some some class contains common properties that apply to several child classes. The child subclasses likely contain their own specialized properties in addition to those of the super class. Although some database products have added object-oriented features, the relational model does not include the concept of entities that are specializations of other entities, so you will have to use special techniques to represent this in your schema.

To use the examples from the lesson on inheritance, an AbstractVehicle base class might have a color property which would apply to all vehicles. You might have created AbstractWheeledVehicle to serve as a base class to all vehicles with wheels; it might add a wheelCount property. The concrete class Truck might indicate how much of a load it can carry, in some unit such as kilograms. A separate Boat class would inherit the color property from AbstractVehicle, while adding an addition property hullCount to indicate the number of main body portions the boat has.

There are three common approaches for representing inheritance in a relational database schema, each with benefits and trade-offs. You will need to evaluate the approach that best fits your design and use cases. This introduction uses the terminology from Martin Fowler's Patterns of Enterprise Application Architecture; more information and a reference site can be found at the end of this lesson.

single table inheritance: You could create a single relation to represent the entire hierarchy of vehicles. It would need some attribute representing which concrete class was instantiated. The relation heading would include all the possible attributes in the entire hierarchy: in this case color, wheelCount, maxLoad, and hullCount. Each tuple would only supply values for those attributes relating to its indicated type. This technique can result in a large, confusing table with many NULL for many attributes.
concrete table inheritance: Instead of creating a single relation for all classes, you could create a different relation for each concrete class, with each relation including all the attributes related to that class. In this example, you would create a Truck relation with color, wheelCount, and maxLoad attributes; and a Boat relation with color and hullCount attributes. This approach results in duplicate attributes across relations. The database can't enforce relationships in this pattern; you might wind up with two vehicles with conflicting primary keys because they were in different relations.
class table inheritance: Finally you could create a separate relation to represent each entity in the hierarchy—even the abstract classes. The “root” relation would ensure the vehicle identifier is unique across all vehicle types. Each superclass relation would contain attributes relating to itself and all its subclasses. Finally each relation representing a subclass would contain a foreign key indicating the tuple in the super class relation containing its super class properties. You might choose to use the same primary key values across all tables, with these same keys serving both as primary keys and as foreign keys for the subclass relations. Class table inheritance mostly closely reflects the original object-oriented domain model from a relational perspective. This arrangement makes the schema and consequently the application logic more complicated, requiring more complex queries to retrieve the data for a single object instance.

While there are many similarities, the object-oriented paradigm and the relational paradigm have subtle differences in the way they represent information. Relations require some unique, long-lasting identifier as a primary key. Object-oriented systems use memory references for identifiers, and may contain many separate instances in memory at the same time representing the same entity. The relational model also has no concept of inheritance, one of the pillars of the object-oriented model. These differences in paradigms and the problems they cause for mapping between the two models have collectively become known as the object-relational impedance mismatch, although many have argued that the differences are not as big as supposed, and that the two models fundamentally represent the same things. See There is no Such Thing as Object-Relational Impedance Mismatch for one opinion and discussion.

Normalization

The relational model is flexible and permits data to be stuffed into relations in many forms. But not every schema follows relational principles. The relational model has a rigorous mathematical foundation, and it in order to efficiently represent and query information, the schema design should follow certain rules. Breaking these rules may result in duplicated and inconsistent data, as well as make it hard to retrieve data once the data is stored.

Normalization is the process of changing a schema to follow relational principles, preventing data duplication and allowing for flexible querying. Normalization could be considered a type of refactoring of a schema, except that it follows a more formal set of rules. The primary goal of normalization is simple: each tuple in a relation should represent a single instance of a “thing”, identifiable by its primary key. More formally the concept is known as functional dependence (FD), and ensures that the primary key functions like a key in a map, “determining” the values of the other attributes of a tuple.

Normal forms.

Fifth Normal Form

Fourth Normal Form

Boyce-Codd Normal Form

Third Normal Form

Second Normal Form

First Normal Form

There are several ways that a schema could have broken this principle, which is why normalization sets out a series of normal forms that a schema can be in. The normal forms are arranged in a sequence; modifying a schema so that it is one normal form (or verifying that it is already in that normal form) guarantees that it adheres to that form and the normal forms before it. There are at least seven normal forms, and academics are coming up with more all the time.

First Normal Form

Expanded “Item Stock” table from Indirection lesson.

Item Stock
`Item Code`	`Item`	`Stock`	`Supplier`	`Categories`
`100`	table	`20`	Acme Furniture Company, 123 Some Street	furniture
`200`	game table	`12`	Acme Furniture Company, 123 Some Street	furniture, toy
`300`	chair	`15`	Acme Furniture Company, 123 Some Street	furniture
`400`	balloon	`99`	Bozo Toy Company, 321 Other Street	toy
`500`	kite	`5`	Bozo Toy Company, 321 Other Street	toy
`600`	doll	10	Bozo Toy Company, 321 Other Street	toy

The most basic form of normalization, First Normal Form (1NF), requires that each attribute value be atomic, that is, it can't be broken down into smaller pieces. Take another look at the “Item Stock” table the store owner was using in the lesson on indirection. The store owner has decided to note the categories of each item. There are two ways in which this table is not in 1NF:

The Supplier column contains information that really should be separate: the name of a supplier and its address. As mentioned in the lesson on indirection, there would be no way to update a supplier's address without updating all the rows in which the supplier appears.
The Categories column has “repeating values”, a classic example of violation of 1NF. For items in more than one category, such as “game table”, the multiple categories are merely repeated as a single value in the column. Repeating values are hard to search for and manipulate.

You already saw how Category really should be a separate entity. Having a many-to-many relationship with the items, an additional ItemCategory association entity should be created as well.

Second Normal Form

“Item Category” table in only 1NF.

Item Category
`Item Code`	`Category Code`	`Category Name`
`100`	`AAA`	furniture
`200`	`AAA`	furniture
`200`	`BBB`	toy
`300`	`AAA`	furniture
`400`	`BBB`	toy
`500`	`BBB`	toy
`600`	`BBB`	toy

The Second Normal Form (2NF) requires that, in addition to being in 1NF, if there is a composite primary key, the other attributes must be functionally dependent on the entire key. Suppose that the store owner created only an Item Category table that included the item code, the category code, and the category name. This indeed allows a many-to-many relationship; note that item with code 200 (the game table) has both AAA (furniture) and BBB (toy) categories. Similarly there can be many items for one category.

The problem is that the Category Name is functionally dependent on the Category Code, which is only part of the composite key! This results in a lot of duplicated data, and can produce anomalies as explained below. The solution is to recognize that Category should be a separate entity; the Category Name should be moved to a separate Category table, and the Item Category should consist of a composite key to join the Item Stock and Category tables, as illustrated above in the section on Association Entities.

Third Normal Form and Boyce-Codd Normal Form

Suppose further than in addition to splitting out the categories, the owner created Supplier ID, Supplier Name, and Supplier Address columns in the original Item Stock table. As required by 2NF, the Supplier Name and Supplier Address are functionally dependent on the entire primary key, the Item Name. That is, any item code will completely determine the supplier ID, the supplier name, and the supplier address.

Expanded “Item Stock” table in only 1NF.

Item Stock
`Item Code`	`Item`	`Stock`	`Supplier ID`	`Supplier Name`	`Supplier Address`
`100`	table	`20`	`ACME`	Acme Furniture Company	123 Some Street
`200`	game table	`12`	`ACME`	Acme Furniture Company	123 Some Street
`300`	chair	`15`	`ACME`	Acme Furniture Company	123 Some Street
`400`	balloon	`99`	`BOZO`	Bozo Toy Company	321 Other Street
`500`	kite	`5`	`BOZO`	Bozo Toy Company	321 Other Street
`600`	doll	10	`BOZO`	Bozo Toy Company	321 Other Street

But the presence of so much duplicated data should be a hint that there is a problem. Duplicated data not only takes up extra space, it may also lead to an anomaly, in which changes to the data produce other, unintended results. This table exhibits three common anomalies of a relational database that has not been normalized. These anomalies correspond to “create”, “update”, and “delete” of the CRUD operations you are familiar with.

insertion anomaly: It is impossible to add a supplier without there being some item associated with the supplier.
update anomaly: Because the supplier address is duplicated, updating the supplier address for one item, such as “chair”, results in a discrepancy with the same supplier's address as listed for the other items, such “table”.
deletion anomaly: Removing an item that is no longer sold would also remove the supplier, if that supplier didn't supply any other items.

The problem with the new Item Stock table is that the Supplier Name and Supplier Address attributes are functionally dependent on the full Item Code key, they exhibit a transitive functional dependency. Rather than being directly dependent on Item Code, they are each actually functionally dependent on Supplier ID, which in turn is dependent on Item Code.

The Third Normal Form (3NF) requries that, in addition to being in 2NF, there must be no transitive functional dependencies. The solution as before is to recognize that Supplier is a separate entity with Supplier ID, Supplier Name, and Supplier Address attributes. This in fact is exactly what the store owner did in the lesson on indirection, explained in the above section, Foreign Keys.

Products

The relational model has provided popular and useful for decades, so there are many RDBMS products available. Here are some of the significant products you should be aware of, along with any modeling software tailored specifically to that product. The popularity of a particular product does not always reflect how closely it follows the relational model or related industry standards.

Oracle Database: Oracle Corporation, makers of the Oracle Database, became the owner of the Java™ programming language when it acquired Sun Microsystems in 2010.
One of the oldest relational database products, Oracle is very popular for corporate enterprise deployments. Oracle provides the free SQL Developer Data Modeler software, written in Java, for modeling relational database for the Oracle platform. If you plan on actually connecting to an Oracle database, you should probably opt instead for the full Oracle SQL Developer software, also free, which provides live data query and manipulation capabilities in additional to all SQL Developer Data Modeler functionality.
SQL Server: SQL Server is Oracle's main enterprise competition from Microsoft Corporation. For database design Microsoft provides SQL Server Data Tools for Visual Studio as a component of its larger development environment.
PostgreSQL: Raphael Araújo e Silva, the primary developer of pgModeler, is based in Brazil, and added software features specifically in support of this lesson.
With roots going back over three decades, PosgreSQL is a robust, enterprise-class, open-source database system. The open-source pgModeler software is an attractive package for database design tuned specifically for PostgreSQL. This course will primarily use PostgreSQL. You can compile the pgModeler source for free, or download a precompiled demo version supporting a limited number of tables. A full, unlimited version of the software is available for free to official students of this GlobalMentor course; contact your instructor for details.
MySQL: MySQL is another open-source database. Oracle Corporation, which now owns MySQL, makes the product available in a free community addition along with various commercial distributions. The MySQL Workbench software is a colorful, user-friendly tool for designing databases to run on MySQL. It is available in a free community edition, either bundled with MySQL or as a separate installation. You are required to create and sign in with a free Oracle account before you can download any MySQL software.
SQLite: SQLite is a public-domain database with a small footprint. It is made to be embedded in the application, and it claims to have the most installations of any database system. Although not a full modeling tool, the open-source DB Browser for SQLite provides a visual interface for designing and editing SQLite database files. The SQLite database system does not ensure that attribute values are of the same type as the the attribute domain; it should only be chosen if its ability to be embedded outweighs its lack of domain integrity.
H2: Thomas Müller, the creator of H2, first wrote a pure-Java database engine named Hypersonic SQL in the early years of Java. When that project stopped, others created the HSQLDB project based on the Hypersonic SQL source code. Meanwhile Thomas Müller launched H2 or “Hypersonic 2”, writing this new version from scratch. See Write Your Own Database, Again.
The H2 Database Engine is a small, fast, open-source, pure-Java database that is highly compliant with the most recent database industry standards. It can be used in both embedded and client-server mode, and even has an option for storing the entire database in memory. There is no data modeling software specifically targeted for H2, but many modeling tools allow connections to H2 databases.

Review

Gotchas

Don't depend on attributes or “columns” in a relation appearing in any particular order.
Don't depend on tuples or “rows” in a relation appearing in any particular order.
Don't encode meaning into a surrogate key.

In the Real World

The presence of NULL in a database tuple causes many problems; avoid it if you can.

Think About It

Will I ever need to query components of this attribute value? If so, maybe the value should be a separate entity, stored in a separate relation with its own attributes.

Self Evaluation

What mathematical construct is the relational model based on?
What a relation different from a general table?
How can a relational database be considered a set of logical propositions about the world?
Give an example of a natural key.
What does it mean for a key to be “opaque”, and is this a good or a bad thing?
Name and explain at least three types of database constraints.
What is domain integrity, and why is it important? Which popular database product lacks domain integrity?
How are two ways that the word “schema” can be used? Provide examples.
What is “object-relational impedence mismatch”? Give some examples. Do you believe it is more a difference of model semantics or design philosphy?
Explain three techqniques for modeling an object inheritance hierarchy.
What is the primary purpose of the lower normal forms?
What is the minimal normal form you should aim for in your schema?

Task

Enhance your Booker domain model to keep track of the name, address, and web site of each publisher.

Use pgModeler to design a relational database schema for your Booker domain model. Represent the schema with an ER diagram using crow's foot notation.

You will need to use pgModeler version 0.9.1-alpha1 or later for crow's foot notation support, which can be configured in the settings.
Upon startup on Windows, pgModeler may complain that it is unable to write to the plugins directory. This should not prevent any functionality needed to complete this task.
Export your ER diagram to SVG format. Store both the pgModeler file and the SVG file in the doc/ directory you used for storing your UML class diagrams, which now should be located in the root of your repository.

Relational Databases

Goals

Concepts

Preparation

Lesson

Relational Databases

Relations

Attributes

Tuples

Sets

Semantics

Keys

Constraints

Databases

Schema Design

Entity Relationship Diagram

Cardinality

Modality

Entities

Keys

Relationships

Foreign Keys

Association Entities

Generalization

Normalization

First Normal Form

Second Normal Form

Third Normal Form and Boyce-Codd Normal Form

Products

Review

Gotchas

In the Real World

Think About It

Self Evaluation

Task

See Also

References

Resources

Acknowledgments