SQL

Goals

Concepts

Language

SQL

Library

Java

Preparation

  1. Download and install PostgreSQL.
    • Use the installer from EnterpriseDB if given a choice for your platform.
    • Install PostgreSQL version 10 or above.
    • You will only need the PostgreSQL command line tools for this lesson.
    • Add the PostgreSQL bin/ directory to your system PATH if you wish to access the CLI from the OS command line.
    • You may decide to also install the GUI front-end pgAdmin to use once you have mastered the command-line interface.
    • You may install the full PostgreSQL server for local testing if you desire.
  2. Sign up for a free Herokuaccount.

Lesson

The relational database model, invented by E. F Codd while working at IBM, has been the workhorse of data storage for decades. To access data in its relational database prototypes, the company created a language named SEQUEL for “Structured English Query Language”. Eventually this language became the Structured Query Language (SQL), which is now the most popular form of accessing relational databases. SQL is proving more popular than the relational model itself: even some newer “NoSQL” databases (which you will learn about in future lessons) are providing front-end languages based on SQL.

As relational database products proliferated, vendors adopted different forms of SQL. In an attempt to bring some unification to the various implementations, the American National Standards Institute (ANSI) in the 1980s developed a standard for SQL. Over the years ANSI and the International Organization for Standardization (ISO) have continued expanding and refining the language, releasing a series of standards from SQL-87 to the 15-part SQL:2016 (ISO/IEC 9075:2016). The core language features were already well-defined in the SQL:1999 and SQL:2003 versions, and these lessons will concentrate on those releases. This level of SQL compliance is sometimes referred to as “SQL3”, which was the unofficial name of SQL:1999 while it was being developed.

Declarative Processing

You will remember that relational database theory is is based on sets, and mathematical set theory comes with its own functions for combining sets and and extracting information from them. SQL is therefore a declarative language; rather than indicate processing steps to perform, you provide a definitional statement of which data you want to retrieve. The database system itself will determine the steps it needs to take to perform the request. In many ways SQL programming is like functional programming; you indicate what needs to be done, not how to do it.

In SQL you will describe which sets of information you desire, in which ways they will be combined, and what filters will be used in returning the final result. You will not write code to iterate through the tuples. In SQL, and in the relational model in general, you manipulate data, not on a per-variable or per-tuple basis, but in the unit of the entire set of data in a relation.

Relational Algebra

In the relational model, each relation is a set of tuples—a read-only value, just like an immutable set in Java. When you perform operations on a database, conceptually you use what is called the relational algebra to perform operations on the relations. The relational algebra is based upon set theory, and has similar operations as you might perform with sets, such as “union”.

An important aspect of the relational algebra is that all of its operations produce new relations! Similar to a Java stream processing, the relations produced by the relational algebra can be used in operations themselves. Although SQL does not completely implement a relational model with a pure relational algebra, you will understand the better what is happening and write better code if you think in term of the relational model and relational algebra operations.

Statement Classes

Beyond the day to day storage and retrieval of data, there are other tasks to perform on a database, such as defining the database schema. Legacy database specifications have used the term Data Definition Language (DDL) to refer to those commands related to creating and updating the schema, and Data Manipulation Language (DML) for CRUD operations such as storing data. Another category, Data Control Language (DCL), refers to commands that set the permissions and user access levels. DCL commands are usually highly specific to the database product being used. These “language” designations have long been used as a way to divide SQL commands or statements into categories.

Starting with SQL:1999, the standard classifies SQL statements by function. Here are some of these SQL statement categories.

SQL schema statements
Statements to define the data schema. Roughly corresponds to the DDL category.
SQL data statements
Statements for storing and retrieving data. Roughly corresponds to the DML category.
SQL transaction statements
Statements that control the atomicity of a group of statements.
SQL connection statements
Statements for connecting to a database server.
SQL session statements
Statements for controlling the parameters for interaction with a database server.

Identifiers

Unlike Java and most other modern programming languages,SQL is case-insensitive. You can enter SQL statements in uppercase or lowercase, or even mixed case. The same goes for the names of database objects such as tables; these regular identifiers are case-insensitive as well. Your table can be named FooBar or foobar or FOOBAR, and SQL won't notice the difference.

SQL also allows delimited identifiers, sometimes referred to as “quoted identifiers”, which are case sensitive. If you surround an identifier with quotation mark " characters, SQL will maintain its case and moreover will consider case significant when referring to the identifier. If you create a table named "FooBar", for example, SQL would not find the table if you referred to "foobar" or "FOOBAR". Note that these case sensitivity rules apply to identifiers; the comparison of actual string values are case sensitive unless you request otherwise.

NULL

SQL provides a special value called NULL that is best considered not a value at all. The NULL in SQL is different than the null in Java, the latter of which you can compare as if you would any other value. In SQL NULL is neither equal to nor not equal to any value! This means that NULL does not even equal NULL in SQL! Dealing with NULL in SQL requires special handling; avoid it as much as possible. Dealing with SQL's peculiar NULL logic will be addressed in a future lesson.

Types

SQL indicates several predefined types as general categories. Unfortunately the compliance of the different dialects is very diverse, each providing additional variations. It is therefore difficult to choose types that have the same names and work identically across products.

Numeric

The ISO SQL standard divides number types into exact numeric, and approximate numeric. By “approximate” SQL means that the implementation will use a format, probably IEEE 754, which cannot exactly represent all decimal values. In other words, “approximate numeric” in practice indicates the use of floating point numbers.

All the numeric types have a precision, the total number of digits the number can have. Exact numbers also have a scale, the maximum number of digits that can appear after the decimal point.

Character String

Strings in SQL usually allow you to indicate some maximum size. SQL provides two types of string in general: fixed-length strings and variable-length strings. For fixed-length strings, the database will pad the string by adding trailing spaces to ensure that the string is as long as the size specified.

In SQL a string literal is placed inside apostrophe (single quote) characters, such as 'foo bar', unlike Java which uses double quotes. If you need to include an actual apostrophe in a string, such as “Foo's Bar”, there is no escape characters; you simply repeat the apostrophe character in the middle of the string literal, producing for example 'Foo''s Bar'.

Boolean

A Boolean type represents TRUE or false. Boolean operations in SQL, are more confusing because the presence of NULL can produce an UNKNOWN outcome, which will be discussed in a future lesson. SQL actually allows the UNKNOWN value unless there is a not-null constraint, although few products implement this part of the specification.

Datetime

The ISO SQL specification provides supports for several temporal types, but not as many as Java. The DATE and TIME types correspond to java.time.LocalDate and java.time.LocalTime, respectively. The TIMESTAMP type corresponds to a Java java.time.Instant. But the names of the TIME WITH TIME ZONE and TIMESTAMP WITH TIME ZONE types are misleading, as they refer to time zone offsets; they thus correspond to java.time.OffsetTime and java.time.OffsetDateTime, respectively.

Interval

To express differences in time, SQL has two groups of interval types. A year-month interval refers to an interval expressed in years and/or months—a date-based interval. This is similar in concept to java.time.Period. A day-time interval refers to some number of days, hour, minutes, and/or seconds—a time-based interval. This is similar in concept to java.time.Duration. In addition to storing intervals, subtracting two dates or times produces an interval of the appropriate type.

The SQL specification allows intervals to be specified in various ways, but the simplest is to use INTERVAL followed by a qualifier indicating YEAR, MONTH, DAY, HOUR, MINUTE or SECOND. The qualifier allows a precision to be indicated in parentheses; if not present the precision defaults to 2 except for SECOND, which defaults to a precision of 6. Lastly the qualifier can be followed by TO and another qualifier to specify granularity. For example INTERVAL YEAR(3) TO MONTH indicates an interval of up to 999 years and some months.

Category Type Description PostgreSQL Oracle H2
Exact Numeric INTEGER / INT Represents an integer value. INTEGER / INT
  • NUMBER(precision, scale)
  • NUMBER(precision)
  • NUMBER
INTEGER / INT
SMALLINT Integer with smaller precision and scale of zero. SMALLINT SMALLINT
BIGINT Integer with larger precision and scale of zero. BIGINT BIGINT
NUMERIC[(precision[, scale])] Exact decimal representation, with defined precision and scale.
  • NUMERIC(precision, scale) / DECIMAL(precision, scale)
  • NUMERIC(precision) / DECIMAL(scale)
  • NUMERIC / DECIMAL
  • NUMERIC(precision, scale) / DECIMAL(precision, scale) / DEC(precision, scale)
  • NUMERIC(precision) / DECIMAL(precision) / DEC(precision)
  • NUMERIC / DECIMAL / DEC
DECIMAL[(precision[, scale])] / DEC[(precision[, scale])] May have higher precision than requested. In many database products, DECIMAL is merely a synonym for NUMERIC.
Approximate Numeric REAL Single-precision floating-point; usually IEEE 754. Corresponds to Java float. REAL BINARY_FLOAT REAL
DOUBLE PRECISION Double-precision floating-point; usually IEEE 754. Corresponds to Java double. DOUBLE PRECISION BINARY_DOUBLE DOUBLE PRECISION
FLOAT[(precision)] Arbitrary-precision floating-point; usually IEEE 754. Seldom used in real life.
  • FLOAT
  • FLOAT(precision)
  • FLOAT
  • FLOAT(precision)
FLOAT
Character String CHARACTER[(size)]; CHAR[(size)] Fixed-size string, padded with trailing spaces.
  • CHARACTER / CHAR
  • CHARACTER(size) / CHAR(size)
  • CHAR
  • CHAR(size)
  • CHARACTER / CHAR
  • CHARACTER(size) / CHAR(size)
CHARACTER VARYING(size) / CHAR VARYING(size) / VARCHAR(size) Variable-length string with maximum size specified.
  • CHARACTER VARYING / VARCHAR
  • CHARACTER VARYING(size) / VARCHAR(size)
  • TEXT
Maximum size is ~1GB if unspecified.
  • VARCHAR(size)
  • VARCHAR2(size)
Maximum size is 4000, or 32767 if MAX_STRING_SIZE = EXTENDED in v12.1.
  • VARCHAR
  • VARCHAR(size)
  • VARCHAR2(size)
Maximum size is 2147483647.
Boolean BOOLEAN Represents either TRUE or FALSE. BOOLEAN / BOOL N/A BOOLEAN / BOOL
Datetime DATE A local date as marked on a calendar. Equivalent to java.time.LocalDate. DATE DATE DATE
TIME A local time as appears on a clock. Equivalent to java.time.LocalTime. TIME N/A TIME
TIME WITH TIME ZONE A time with a zone offset. Equivalent to java.time.OffsetTime. TIME WITH TIME ZONE N/A N/A
TIMESTAMP An instant in time in terms of UTC. Equivalent to java.time.Instant. TIMESTAMP TIMESTAMP TIMESTAMP
TIMESTAMP WITH TIME ZONE The date and time in some local zone, designated by an offset from UTC. Equivalent to java.time.OffsetDateTime. TIMESTAMP WITH TIME ZONE TIMESTAMP WITH TIME ZONE TIMESTAMP WITH TIME ZONE
Year-Month Interval INTERVAL YEAR[(precision)] TO MONTH A date-based interval. Similar to java.time.Period.
  • INTERVAL YEAR TO MONTH
  • INTERVAL YEAR(precision) TO MONTH
  • INTERVAL YEAR TO MONTH
  • INTERVAL YEAR(precision) TO MONTH
N/A
Day-Time Interval INTERVAL DAY[(precision)] TO SECOND A time-based interval. Similar to java.time.Period.
  • INTERVAL DAY TO SECOND
  • INTERVAL DAY(precision) TO SECOND
  • INTERVAL DAY TO SECOND
  • INTERVAL DAY(precision) TO SECOND
N/A

SQL Schema Statements

Before getting to the point where you can actually store and retrieve information, you must first provide the database with a definition of your tables and other schema items using SQL DDL statements.

CREATE TABLE

Primary in the schema statement category is the CREATE TABLE statement. It has the general form CREATE TABLE table (elements) as shown in the example below.

Creating an Item table.
CREATE TABLE Item (
  code VARCHAR(99) NOT NULL,
  name VARCHAR(9999) NOT NULL,
  stock INTEGER DEFAULT 0 NOT NULL,
  supplierId INTEGER NOT NULL,
  CONSTRAINT Item_pk PRIMARY KEY (code),
  CONSTRAINT Item_name_un UNIQUE (name),
  CONSTRAINT Item_supplierId_fk FOREIGN KEY (supplierId) REFERENCES Supplier(id)
);
IDENTITY

SQL:2003 introduced a standard way to create a column for a surrogate key that automatically increments when rows are added to a table. An identity column can be useful because it the database takes responsibility for atomically generating unique surrogate key values. Previous solutions used proprietary types such as “serial”. Another solution was to use a sequence, a named counter that exists separate from a table. Identity columns are available in PostgreSQL 10, Oracle 12.1, and above.

Creating a Supplier table.
CREATE TABLE Supplier (
  id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR(9999) NOT NULL UNIQUE,
  address VARCHAR(9999) NOT NULL
);

ALTER TABLE

If you need to modify a table later, such as redefining columns or even adding new ones, you can use the ALTER TABLE statement. Unfortunately the syntax for altering a table varies among database products.

ALTER TABLE table ADD CONSTRAINT

One of the most common uses for the ALTER TABLE statement is to add a constraint after a table has already been created. Some developers prefer to add all constraints separately; this is for the most part a matter of style.

Adding a primary key constraint to the Supplier table.
ALTER TABLE Supplier
  ADD CONSTRAINT Supplier_pk PRIMARY KEY (id)
);

SQL Data Statements

After defining your tables, you can populate and then query them using SQL DML statements.

INSERT INTO

The SQL statement for inserting data is straightforward, most commonly taking the form INSERT INTO table [(columns)] VALUES (values). The columns is a comma-separated list of common names; the values are the values to insert, each corresponding to a named column.

Inserting rows into the Supplier table.
INSERT INTO Supplier (name, address) VALUES ('Acme Furniture Company', '123 Some Street');
INSERT INTO Supplier (name, address) VALUES ('Bozo Toy Company', '321 Other Street');
INSERT INTO Supplier (name, address) VALUES ('Top Toys Corporation', '456 Far Avenue');

SELECT

The main statement for searching and retrieving data from a database is SELECT. The simplest form is SELECT columns FROM table, which lists all rows in the indicated table. The comma-separated columns indicate which columns should be included. For columns you may use the asterisk * character as a wildcard to include all columns.

Listing all the rows in the Supplier table.
SELECT * FROM Supplier;

PostgreSQL

As you can see from the above table, the open-source database PosgreSQL is PostgreSQL is highly compliant with the ANSI/ISO SQL standard. The current product started as POSTGRES in the 1980s, but its roots go back to even earlier projects. PostreSQL, often still referred to as “Postgres”, has evolved with the standards and offers many plugins and advanced features not available on other databases.

These lessons will use PostgreSQL in a client/server configuration, as might typically be found in an enterprise setting.  Because of this configuration, installing a local database is optional. At the moment you will only need the command-line tool, but you may wish to install other tools such as the GUI front-end pgAdmin for future use.

  1. Download and install PostgreSQL.
    • Use the installer from EnterpriseDB if given a choice for your platform.
    • You will only need the PostgreSQL command line tools for this lesson.
    • Add the PostgreSQL bin/ directory to your system PATH if you wish to access the CLI from the OS command line.

PostgreSQL provides several organizing structures at a higher level than tables.

schema
As with standard SQL, PostgreSQL uses the word “schema” to refer to a collection of tables and related objects.
database
PosgreSQL refers to a collection of schemas as a “database”. This corresponds to what ANSI/ISO SQL calls a “catalog”. PostgreSQL uses the term “catalog” simply to refer to a system schema.

PostgreSQL provides two system schemas holding metadata.

INFORMATION_SCHEMA
Metadata about the database as prescribed by SQL specification.
PG_CATALOG
Provides more extensive database metadata and PostgreSQL system information.

Heroku

The service Heroku is an entire online platform for deploying your Internet applications, which Heroku calls “apps”. As a cloud platform as a service (PaaS), Heroku allows each app add and remove virtual server container processes, each called a “dyno” because it is dynamically provisioned. The applications code itself is deployed using Git.

Each Heroku app allows “add-ons”, which are extra features or tools. Most important for the purposes of this lesson, one add-on is Heroku Postgres, a live deployment of PostgreSQL you can access securely over the Internet. Moreover Heroku's Hobby Dev plan is completely free, supporting a limited number of connections.

  1. If you don't already have an account, go to Heroku and sign up.
  2. Go to the Heroku dashboard to see your apps.
  3. Select Create New App.
  4. Choose a region for the dyno and select Create app. If you do not specify an app name, Heroku will assign one for you using a series of words and numbers, for example infinite-wildwood-58236.
  5. Go back to your apps and select the app you just created.
  6. On the dashboard of the new app, select Open app.
  7. On the Overview tab, select Configure Add-ons.
  8. Under Add-ons, type “Postgress” in the search field and select Heroku Postgres from the options that appear.
  9. Select the appropriate plan. The Hobby Dev — Free plan should be sufficient for this course.
  10. Select Provision and the heroku-postgresql add-on will be added to your app.
  11. Select the Heroku Postgres :: Database that appears in the list of installed add-ons to bring up the heroku-postgresql dashboard. Verify that the PostgreSQL version is 10 or above.
  12. On the Settings tab, select View Credentials…. The properties that appear are the parameters you will need to connect to your new database.

psql

psql is the official command-line interface for PostgreSQL and is included in the distribution. Start the program by entering psql on the command line. Make sure you have added the PosgreSQL installation directory on the system PATH environment variable, as indicated above.

There are several important psql command-line options you should be familiar with:

--help | -?
Lists the available command-line options.
--version | -V
Shows the version of the psql utility.
--host | -h
Indicates the host on which the database server is running. Defaults to the local system.
--port | -p
Specifies the port to use when connecting to the database. Defaults to port 5432.
--dbname | -d
Specifies which database to connect to. Defaults to the current operating system user for the database name.
--username | -U
Indicates the username to use when connecting to the database. Defaults to the current operating system user.
--command | -c
Runs a a single SQL statement or internal command. Don't forget to use quotes around the command if it contains spaces.
--file | -f
Executes the SQL and/or commands in a specified file.

Use psql to connect to your Heroku Postgres instance, using the information provided using View Credentials… as explained above. Heroku Postgres is likely using the default PostgreSQL port, so you don't need to indicate it when connecting.

Connect to Heroku Postgres using psql.
psql --host example.amazonaws.com --port 5432 --dbname exampledatabase --username exampleusername

Once you are connected, psql provides several useful commands you can use from within its interactive shell. These psql commands, unlike SQL statements, do not need to end with a semicolon ; character.

\?
Provides a list of all commands.
\h command
Provides help on a single command.
\q
Quits psql and exits to the operating system command line.
\conninfo
Displays information about the current connection.
\d[S[+]]
Provides a description of the tables, views sequences, and indexes. If S is included, includes system objects. Adding + indicates that extra details should be included
\connect database | \c database
Connects to a different database.
\g file
Redirects query output to some external file. Use \o to stop redirecting output.
\i file
Executes the SQL in some external script file. This is the interactive equivalent of the command-line --file | f option.
\! [command]
Executes a system command or program in the operating system. By itself \i will open a new command prompt for you to type as many commands as you like, using exit to return to psql.
\copy
Exports data to a delimited text file, or imports a delimited text file into the database. Use \h copy to see more details on the command syntax and options.

View your connection information using \conninfo to ensure that you are connected to the database.View all the tables using \d or \d+. Because your database is empty, nothing should be listed. Use \dS or \dS+ to include the system objects, and you should see all the tables in the pg_catalog schema. Feel free to list the contents of some of these pg_catalog schema tables to get an idea of the sort of metadata PostgreSQL keeps related to your database.

Checking the psql connection and viewing a description of all database objects.
\conninfo
…
\d
…
\dS

Create the necessary tables based upon the model expressed by your ER diagram using SQL DDL statements.

Creating the tables for the item stock schema.
CREATE TABLE Supplier (
  id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR(9999) NOT NULL UNIQUE,
  address VARCHAR(9999) NOT NULL
);

CREATE TABLE Item (
  code VARCHAR(99) NOT NULL,
  name VARCHAR(9999) NOT NULL,
  stock INTEGER DEFAULT 0 NOT NULL,
  supplierId INTEGER NOT NULL,
  CONSTRAINT Item_pk PRIMARY KEY (code),
  CONSTRAINT Item_name_un UNIQUE (name),
  CONSTRAINT Item_supplierId_fk FOREIGN KEY (supplierId) REFERENCES Supplier(id)
);

CREATE TABLE Category (
  id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR(9999) NOT NULL,
  CONSTRAINT Category_name_un UNIQUE (name)
);

CREATE TABLE ItemCategory (
  itemCode VARCHAR(99) NOT NULL,
  categoryId INTEGER NOT NULL,
  CONSTRAINT ItemCategory_pk PRIMARY KEY (itemCode, categoryId),
  CONSTRAINT ItemCategory_itemCode_fk FOREIGN KEY (itemCode) REFERENCES Item(code),
  CONSTRAINT ItemCategory_categoryId_fk FOREIGN KEY (categoryId) REFERENCES Category(id)
);

Now when you show all database objects using \d, you should see the tables you just created.

Now that the tables exist, populate them with data using SQL DML statements.

Adding data to the tables for the item stock schema.
-- Supplier
INSERT INTO Supplier (name, address) VALUES ('Acme Furniture Company', '123 Some Street');
INSERT INTO Supplier (name, address) VALUES ('Bozo Toy Company', '321 Other Street');
INSERT INTO Supplier (name, address) VALUES ('Top Toys Corporation', '456 Far Avenue');

-- Item
-- The item codes here are arbitrary, and might come from your store catalog.
-- You should first verify the values that were generated in the Supplier.id identity column.
INSERT INTO Item (code, name, stock, supplierId) VALUES ('A1', 'table', 20, 1);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('B2', 'game table', 12, 1);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('C3', 'chair', 15, 1);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('D4', 'balloon', 99, 2);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('E5', 'kite', 5, 2);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('F6', 'doll', 10, 2);

-- Category
INSERT INTO Category (name) VALUES ('furniture');
INSERT INTO Category (name) VALUES ('toy');

-- ItemCategory
-- You should first verify the values that were generated in the Category.id identity column.
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('A1', 1);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('B2', 1);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('B2', 2);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('C3', 1);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('D4', 2);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('E5', 2);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('F6', 2);

Now that you've populated the tables, you will be able to show the contents of any table using the SELECT statement presented above.

Listing all the rows in the Item table.
SELECT * FROM Item;

You now know how to create a database schema, populate the tables with rows, and then list the rows of each table. In coming lessons you'll learn how to perform more advanced queries on your data, taking into account the relationships among the tables.

    Review

    Gotchas

    In the Real World

    Think About It

    Self Evaluation

    Task

    1. Create a set of SQL DDL statements for creating the Booker database schema from the ER diagram. Save these statements in a file data/Booker.ddl.sql.
      • You may of course experiment as much as needed using the psql command-line utility.
      • When finished, you should be able to create your schema from scratch using the psql --file option, specifying the file you created.
    2. Create a set of SQL DML statements for populating the Booker tables with all the publications in your snapshot repository. Save these statements in a file data/Booker.dml.sql.
      • You may of course experiment as much as needed using the psql command-line utility.
      • When finished, you should be able to populate your database from the command line using the psql --file option, specifying the file you created.
    3. Export each of your tables to a different CSV file encoded in UTF-8, with a header naming the columns, and store them in the data/ directory as well. Each CSV file should be named after the official name of the table, with a csv extension.
    4. Submit a pull request as normal with the files you added to the data/ directory of your project.

    See Also

    References

    Resources

    Acknowledgments