SQL Subqueries

Goals

Concepts

Language

SQL

Lesson

One of the most powerful aspects of SQL is that it allows you to use subqueries, which are literally queries that appear within the main query, surrounded by parentheses. Subqueries can themselves contain subqueries, and so on. There are two locations in which you would typically use a subquery: in the WHERE clause, where the subquery serves as data that plays a part in the restriction predicate; and in the FROM clause, where the subquery serves as a source of data, just like a table.

Like queries in general, a subquery technically returns a table. Depending on what the query requested, the result table may contain a single column with a single row, a single column with multiple rows, or a general table with multiple columns and possibly multiple rows. Which type of result is called for determines where the subquery can be placed as an expression in the main query.

Scalar Subqueries

Listing all items with under-average stock.
SELECT name, stock FROM Item
  WHERE stock <
    (SELECT AVG(stock) FROM Item);
name stock
table 20
game table 12
chair 15
kite 5
doll 10
Finding the average item stock.
SELECT AVG(stock) FROM Item;
avg
 26.8333333333333333

If a subquery returns a table with a single column containing a single row, it is referred to as a scalar subquery. This sort of subquery can be used where a single value would be expected in an expression.

By now you know that you could list the items with stock greater than 10 with the statement SELECT name FROM Item WHERE stock < 20. But if you wish to find the items that have a stock less than the average stock, you would have to somehow know the average stock ahead of time—unless you include a query to determine the average stock as a subquery within the statement itself! As you have seen, you can use SELECT AVG(stock) FROM Item to determine the average stock, so SELECT name FROM Item WHERE stock < (SELECT AVG(stock) FROM Item) would list only items that have less stock than the average. In this example the average is high because there are so many balloons in stock.

Multiple-Value Subqueries

IN

A subquery may be structured so as to potentially return multiple rows in a single column. The values in this column can be used in an expression where multiple values would be expected. You've studied the IN keyword, which checks if a given value occurs in a set of values, such as WHERE color IN ('red', 'green', 'blue'). A subquery that returns multiple rows of values in a single column can be used as the set of values, creating a dynamic expression.

Suppose you want to list the suppliers of items with stock under 20. Finding the supplier ID of items with stock less than 20 is not so different than the queries above: SELECT supplierId FROM Item WHERE stock < 20. As this query returns a single column of supplier IDs, you can use the result as a set of values in the predicate for restricting the list of suppliers: SELECT Supplier.name FROM Supplier WHERE Supplier.id IN (SELECT supplierId FROM Item WHERE stock < 20). The columns Supplier.name and Supplier.id are qualified for clarity; simply name and id could have been used.

ANY

The IN operator implies that an equality comparison is being performed with each of the items returned by a subquery. The ANY operator is a more general form that allows any comparison to be compared against items, not just quality. You could for example find all items with stock less than the stock of any item (not necessarily all items) from Acme Furniture Company using SELECT name FROM Item WHERE Item.stock < ANY (SELECT stock FROM Item JOIN Supplier ON Item.supplierId = Supplier.id WHERE Supplier.name = 'Acme Furniture Company').

ALL

A similar operator, ALL, allows you to create a predicate to ensure that a comparison holds against all the results of a subquery. As an example you could find all items with stock greater than the stock of all items from Acme Furniture Company using SELECT name FROM Item WHERE Item.stock > ALL (SELECT stock FROM Item JOIN Supplier ON Item.supplierId = Supplier.id WHERE Supplier.name = 'Acme Furniture Company').

Multiple-Table Subqueries

TODO

Subqueries as Data Sources

The previous statements have used subqueries as predicates in the WHERE clause, providing information to use when restricting the returned rows. Because subqueries produce tables, those tables can themselves be used as a source of data in their own right. In the FROM clause a subquery acts like a table that existed in the original schema. The main difference is that the subquery table does not have a name. If the columns returned by the subquery are ambiguous, you will need to create an table name for subquery, as you would normally rename columns using the AS keyword. The Grouping section above already included an example of a subquery as a source of data.

Using a subquery to provide names for categories grouped by ID.
SELECT Category.name, CategoryGroup.stockSum
  FROM (
    SELECT categoryId, SUM(stock) AS stockSum
      FROM ItemCategory
      JOIN Item ON ItemCategory.itemCode = Item.code
      GROUP BY categoryId
  ) CategoryGroup
  JOIN Category ON CategoryGroup.categoryId = Category.id;

If you wanted to find the total number of items in stock in each category, you would need to join the Item table with the ItemCategory table (because items can be in multiple categories) and group by the item category ID: SELECT categoryId, SUM(stock) FROM ItemCategory JOIN Item ON ItemCategory.itemCode = Item.code GROUP BY categoryId. Listing the categories by category ID, however, is less than useful to users. You could do a three-table join with the Category table to find the category name, but because you are grouping on categoryId rather than Category.name, SQL will not let you include the Category.name column in the result, as it is neither a grouping column nor an aggregate function.

Instead you can consider the grouping result as a source of data, joining the entire result table with the Category table. To use the result you will need to rename the subquery result table so that you can refer to it in the outer query. You also need to rename the SUM(stock) column so that you can refer to it in the outer query as well. The request is shown in the future on the side.

Correlated Subqueries

The subqueries shown so far have been completely independent of the outer query in which it appears. These sort of subqueries can be performed once for the entire query; the result is then fed into the outer query. SQL also allows a subquery to reference to reference variables in the outer query. In such a correlated subquery the subquery is dependent on (“correlated with”) the outer query.

You already saw that you can list all items with under the average stock using SELECT name FROM Item WHERE stock < (SELECT AVG(stock) FROM Item). Because the subquery SELECT AVG(stock) FROM Item requests the average stock of all items, the database only needs to execute the query once; the same resulting average can be used in the comparison for each row in the outer query.

Listing all items with stock under supplier average.
SELECT name, stock
  FROM Item CurrentItem
  WHERE stock <
    (SELECT AVG(stock)
      FROM Item
      WHERE Item.supplierId = CurrentItem.supplierId);
name stock
game table 12
kite 5
doll 10
chair 15

Suppose however that you wanted to list all items each of which was under the average stock of all items from its supplier. The subquery would need to calculate the average stock of that item's supplier, and to do this it would have to reference the item from the outer query to be able to find the supplier. Because many times both the inner query and the outer query reference the same tables, when crafting a correlated subquery you will normally need to use an alias one one or more tables in the outer query for the inner table to refer to.

Assuming that the outer query continues to reference the Item table but using the alias CurrentItem, the inner query could calculate the average stock for that item's supplier: SELECT AVG(stock) FROM Item WHERE Item.supplierId = CurrentItem.supplierId. The logic of the outer query would stay the same, as shown in the figure. Only the restriction predicate, the subquery, has changed to be dependent on the outer query rather than independent.

EXISTS

A common operator used with correlated subqueries is EXISTS, which simply checks to see that a result table has at least one row (i.e. the result “set” is not empty). If you want to list all items in the furniture category, could use a query to determine the furniture category ID: SELECT id FROM Category WHERE Category.name = 'furniture'. You could then use this query as a subquery inside a correlated subquery to only return items that match the same category: SELECT * FROM ItemCategory WHERE ItemCategory.categoryId = (SELECT id FROM Category WHERE Category.name = 'furniture'). Lastly the outer query would list all items for which there exist an item category association that matches that item and the furniture category ID: SELECT name FROM Item CurrentItem WHERE EXISTS (SELECT * FROM ItemCategory WHERE ItemCategory.itemCode = CurrentItem.code AND ItemCategory.categoryId = (SELECT id FROM Category WHERE Category.name = 'furniture')).

Correlated Updates

Ordering the maximum stock an item's supplier allows.
UPDATE Item
  SET stock = stock +
    (SELECT maxStockOrder
      FROM Supplier
      WHERE Item.supplierId = Supplier.id);

Correlated queries are indispensable when updating columns with values that depend on other tables. Suppose that you wanted to increase the stock off all the items by the same amount, such as 50. As you've seen this is possible using a simple update query: UPDATE Item SET stock = stock + 50.

But consider if each supplier were to indicate a maximum stock order, valid for for all the items provided by that supplier. You wish to increase the stock of each item by the maximum the supplier allows items. For each item you would need to know the item's supplier in order to find out the supplier's maximum stock order. You would use a correlated subquery to find the maxStockOrder for in the Supplier table, for the supplier of each item you update, as shown in the figure.

Review

Gotchas

In the Real World

Think About It

Self Evaluation

Task

See Also

References

Acknowledgments