Home

Aggregate Queries

 

Aggregate Functions

 

Introduction

Microsoft SQL Server is a powerful application that can be used in various scenarios. For example, a statistitian can use it to keep records and analyze the meaning of numbers stored in tables. To assist you with this, Transact-SQL provides many statistic-based functions, referred to as aggregate functions. They allow you to create particular views named aggregate queries.

Practical LearningPractical Learning: Introducing Aggregate Queries

  1. Start the computer and log in
  2. Launch Microsoft SQL Server and click Connect.
    Make sure you have the RealEstate2 database from previous lessons
  3. In the Object Explorer, expand Databases and expand RealEstate2

Creating an Aggregate Query

Before creating an aggregate query, in the Object Explorer, expand the database on which you will work. Right-click Views and click New View... On the Add Table dialog box, select the table(s) (or view(s)) and close it. To start a summary query:

  • Right-click somewhere in the middle window and click Add Group By
  • On the View Designer toolbar, click the Add Group By button Add Group By
  • On the main menu, click View Designer, and click Add Group By

This gets the view ready for summary.

The formula to programmatically create an aggregate query is:

SELECT WhatField(s)
FROM WhatObject(s)
GROUP BY Column(s)

The new expression in this formula is GROUP BY. This indicates that you want to group some values from one or more columns. There are rules you must follow.

Although you can create an aggregate query with all fields or any field(s) of a view, the purpose of the view is to summarize data. For a good summary view, you should select a column where the records hold categories of data. This means that the records in the resulting view have to be grouped by categories. To support this, the SQL provides the GROUP BY expression. It means where the records display, they would be grouped by their categories. For example, if you want to get the number of videos by rating, you would select the column that holds that information: Rating.

As stated already, the purpose of an aggregate query is to provide some statistics. Therefore, it is normal that you be interested only in the column(s) that hold(s) the desired statistics and avoid the columns that are irrelevant. As a result, if you select (only) the one column that holds the information you want, in the resulting list, each of its categories would display only once.

Practical LearningPractical Learning: Creating an Aggregate Query

  1. Under RealEstate2, right-click Views and click New View...
  2. In the Add Table dialog box, click Properties (Listing) and click Add
  3. Click Close
  4. On the View Designer toolbar, click the Add Group By button Add Group By

Summarizing the Values

To get the types of statistics you want, you must put similar values of records together. That is, you must group the records. For example, if you want to find the total number of boys in a list of students, obviously you must count the number of occurrences of a particular gender in the appropriate column. That's why you use the GROUP BY clause. In the SQL statement, as seen previously, create a GROUP BY clause and  add the list of necessary columns, usually not more than 2.

To visually summarize the values, in the Criteria pane, add the same column one more time and give it a caption in the Alias section. Then, click the Group By box that corresponds to the duplicate column:

Summary Query

Transact-SQL and Statistics

 

The Number of Values (The Size of a Sample)

Probably the most basic piece of information you may want to get about a list is the number of records it has. In statistics, this is referred to as the number of samples. To help you get this information, Transact-SQL provides a function named Count. It counts the number of records in a column and produces the total. This function also counts NULL values. The syntax of the Count() function is:

int COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } ) 

This function takes one argument. To get the count of occurrences of a value, in the Criteria pane, you can select COUNT(*).

The Count() function returns an int value. If you are working on a large number of records, you can call the Count_Big() function. Its syntax is:

bigint COUNT_BIG ( { [ ALL | DISTINCT ] expression } | * )

Practical LearningPractical Learning: Getting the Number of Records

  1. In the Criteria pane, click the first box under Column, then click the arrow of its combo box and select PropertyNumber
     
    Getting the Number of Records
  2. Under Alias, type Number of Properties
  3. Under Group By, click the arrow of the combo box and select Count
     
    Count
  4. To execute, on the View Designer toolbar, click the Execute SQL button Execute SQL
  5. Close the view
  6. When asked whether you want to save, click No
  7. To start a new view, in the Object Explorer, right-click Views and click New View...
  8. In the Add Tables dialog box, double-click PropertyTypes (Listing) and Properties (Listing)
  9. Click Close
  10. To prepare to see the number of properties in each category, in the View Designer toolbar, click the Add Group By button Add Group By
  11. From the PropertyTypes list, drag PropertyTypeID and drop it on top of PropertyTypes in the Properties table
  12. In the PropertyTypes list, click the check box of PropertyType
  13. In the Criteria pane, click the box under PropertyType and select Listing.Properties.PropertyNumber
  14. Press Tab and type Number of Properties
  15. In the Corresponding Group By box, select Count
  16. Set the alias of PropertyType to House Category
  17. To execute, on the View Designer toolbar, click the Execute SQL button Execute SQL
     
    Count
  18. Right-click the PropertyTypes (Listing) title bar and click Remove
  19. In the Criteria pane, right-click the box at the left of PropertyNumber and click Delete

The Minimum Value of a Series

If you have a list of numeric values, you may want to get the lowest value. For example, in a list of houses of a real estate company with each property having a price, you may want to know which house is the cheapest. To let you get this information, Transact-SQL provides the Min() function. The syntax of this function is:

DependsOnType MIN ( [ ALL | DISTINCT ] expression )

Practical LearningPractical Learning: Getting the Minimum of a Series

  1. In the Criteria Pane, click the first combo box under Column and select MarketValue
  2. Press Tab and type Cheapest Property to change the string of the Alias field
  3. Under Group By, click the arrow of the combo box and select Min
  4. To execute, right-click somewhere in the window and click Execute SQL
     
    Minimum
  5. To prepare to see the cheapest house per category, right-click the Diagram pane and click Add Table...
  6. In the Add Table dialog box, double-click PropertyTypes (Listing)
  7. Click Close
  8. Drag PropertyTypeID from the PropertyTypes table and drop it on PropertyTypeID in the Properties table
  9. In the PropertyTypes (Listing) table, click the check box of PropertyType
  10. In the Criteria pane, move the PropertyType row to the top
  11. To execute, right-click somewhere in the window and click Execute SQL
     
    Minimum
  12. Right-click the title bar of PropertyTypes (Listing) and click Remove

The Maximum Value of a Series

The opposite of the lowest is the highest value of a series. To assist you with getting this value, Transact-SQL provides the Max() function. Its function is:

DependsOnType MAX ( [ ALL | DISTINCT ] expression )

Practical LearningPractical Learning: Getting the Highest Value of a Series

  1. In the Criteria Pane, click the combo box of Min and select Max
  2. Click Alias and change it to Most Expensive
  3. To execute, right-click somewhere in the window and click Execute SQL
     
    Maximum
  4. To see the most expensive house in each category, right-click the Diagram pane and click Add Table...
  5. In the Add Table dialog box, double-click PropertyTypes (Listing)
  6. Click Close
  7. Drag PropertyTypeID from the PropertyTypes table and drop it on PropertyTypeID in the Properties table
  8. In the PropertyTypes (Listing) table, click the check box of PropertyType
  9. In the Criteria pane, move the PropertyType row to the top
  10. To execute, right-click somewhere in the window and click Execute SQL
     
    Maximum
  11. Right-click the title bar of PropertyTypes table and click Remove

The Sum of Values

The sum of the values of a series is gotten by adding all values. In algebra and statistics, it is represented as follows:

∑x

To let you calculate the sum of values of a certain column of a table, Transact-SQL provides a function named Sum. The syntax of the Sum() function is:

Number SUM ( [ ALL | DISTINCT ] expression )

Practical LearningPractical Learning: Getting the Sum of Value

  1. In the Criteria pane, click MarketValue, press Tab and type Current Inventory to change the value of the Alias
  2. Under Group By, click Max, then click the arrow of its combo box and select Sum
  3. To execute, right-click somewhere in the window and click Execute SQL
     
    Sum

The Mean

In algebra and statistics, the mean is the average of the numeric values of a series. To calculate it, you can divide the sum by the number of values of the series. It is calculated using the following formula:

Mean

From this formula:

  • x representes each value of the series (called a sample)
  • n represents the number (count) of values
  • x represents the mean of the x values
  • ∑x represents the sum of x values

To support this operation, Transact-SQL provides the Avg function. Its syntax is:

Number AVG ( [ ALL | DISTINCT ] expression )

Practical LearningPractical Learning: Getting the Mean of Value

  1. In the Criteria pane, click MarketValue, press Tab and type Average Market Value to change the value of the Alias
  2. Under Group By, click Max, then click the arrow of its combo box and select Avg
  3. To execute, right-click somewhere in the window and click Execute SQL
     
    Average
  4. Before seeing the average price of house for each category, right-click the Diagram pane and click Add Table...
  5. In the Add Table dialog box, click PropertyTypes (Listing)
  6. Click Add and click Close
  7. Drag PropertyTypeID from the PropertyTypes table and drop it on PropertyTypeID in the Properties table
  8. In the PropertyTypes click the check box of PropertyType
  9. In the Criteria pane, move the PropertyType row to the top
  10. Change the alias of MarketValue to Mean Per House Type
  11. To execute, right-click somewhere in the window and click Execute SQL
     
    Average
  12. Right-click the title bar of PropertyTypes table and click Remove
 
 
 

The Standard Deviation of a Series

Imagine you have a column with numeric values. You already know how to get the sum and the mean. The standard deviation is a value by which the elements vary (deviate) from the mean. The formula to calculate the standard deviation is:

Standard Deviation

From this formula:

  • x representes each value of the field (column)
  • n represents the number (count) of records
  • x represents the mean of the x records
  • ∑ represents a sum

The above formula wants you to first calculate the mean. As an alternative, you can use a formula that does not require the mean. It is:

Standard Deviation

Instead of creating your own function, Transact-SQL can assist you. First there are two types of standard deviations. The sample standard deviation relates to a sample. To let you calculate it, Transact-SQL provides a function named StdDev. Its syntax is:

float STDEV ( [ ALL | DISTINCT ] expression )

The other standard deviation relates to a population. To help you calculate it, Transact-SQL provides the STDDEVP() function. Its syntax is:

float STDEVP ( [ ALL | DISTINCT ] expression )

Practical LearningPractical Learning: Getting the Standard Deviation

  1. In the Criteria pane, change the Alias of MarketValue to Standard Deviation
  2. In the corresponding Group By box, select StdDev
  3. To execute, on the View Designer toolbar, click the Execute SQL button
     
    Standard Deviation
  4. On the main menu, click Query Designer -> Add Table...
  5. In the Add Table dialog box, double-click PropertyTypes
  6. Click Close
  7. Drag PropertyTypeID from the PropertyTypes table and drop it on PropertyTypeID in the Properties table
  8. In the Criteria pane, click the first empty box under Column and select dbo.PropertyTypes.PropertyType
  9. Move the PropertyType row to the top
  10. Change the alias of MarketValue to Standard Deviation
  11. To execute, right-click somewhere in the window and click Execute SQL
     
    Standard Deviation
  12. Right-click the title bar of PropertyTypes table and click Remove

The Variance of a Series

The variance is the square of the standard deviation. This means that, to calculate it, you can just square the value of a standard deviation.

As seen with the standard deviation, there are two types of variances. A sample variance relates to a sample. To help you calculate a sample variance of records, Transact-SQL provides VAR function. Its syntax is:

float VAR ( [ ALL | DISTINCT ] expression )

The function used to calculate a population variance is VARP and its syntax is:

float VARP ( [ ALL | DISTINCT ] expression )

Practical LearningPractical Learning: Getting the Standard Deviation

  1. In the Criteria pane, change the Alias of MarketValue to Sample Variance
  2. In the Group By box, select Var
  3. To execute, on the View Designer toolbar, click the Execute SQL button
     
    Variance
  4. Change the alias of MarketValue to Variance Per Category
  5. On the View Designer toolbar, click the Add Table button Add Table
  6. In the Add Table dialog box, double-click PropertyTypes (Listing)
  7. Click Close
  8. Drag PropertyTypeID from the PropertyTypes table and drop it on PropertyTypeID in the Properties table
  9. In the Criteria pane, click the first empty box under Column and select dbo.PropertyTypes.PropertyType
  10. Move the PropertyType row to the top
  11. To execute, right-click somewhere in the window and click Execute SQL
     
    Variance
  12. Close the view

Other Types of Aggregate Queries

 

Using an Expression

As its name indicates, the Expression option allows you to write your own expression that will be applied on the column. Here is an example:

Creating a Condition

Consider the following summary view that calls the Count(*) function:

Where

Notice that it includes videos rated NULL, which is not a real rating. Imagine you want to include only videos that have a true rating. To assist you with setting a condition, you can use a Where option. To do this, in the Criteria pane, add the column on which the summary should be applied and select Where for the Group By field. Then, in the equivalent Filter box, type the condition, and execute the statement. Here is an example:

Where

Having a Criterion

When we mentioned a Where condition in our summary views, we saw that we had to add a duplicate column to apply it. In some cases, this is not necessary. Consider the following summary of payments for each account:

Summary Query: Sum

Imagine you want to see only the accounts whose payments exceed 250.00. To support this, you can add a clause named HAVING to the statement. In this case, the formula to programmatically create an aggregate query is:

SELECT What
FROM WhatObject(s)
GROUP BY Column(s)
HAVING Condition

The new operator in this formula is HAVING. It allows you to specify the criterion by which the SELECT statement should produce its results. For example, to show only the sums that are higher than 250.00, you would write:

SELECT   CustomerAccount AS [Account #], SUM(Amount) AS [Total Payments]
FROM     CreditCardPayments
GROUP BY CustomerAccount
HAVING   SUM(Amount) > 250

To visually specify a HAVING criterion, in the Filter corresponding to the column, type the conditional expression, and execute the statement. Here is an example:

HAVING

Options on Aggregate Functions

 

Computing

Imagine you have a table that has one or more fields with numeric values and you use a SELECT statement to select some of those columns. At the end the statement, you can ask the database engine to perform a calculation using one or more of the aggregate functions and show the result(s). To do this, you use the COMPUTE keyword in a formula as follows:

[ COMPUTE 
    { { AVG | COUNT | MAX | MIN | STDEV | STDEVP | VAR | VARP | SUM } 
  ( expression ) } [ ,...n ] 
    [ BY expression [ ,...n ] ] 
]

As you see, you start with COMPUTE followed by the desired function, which uses parentheses. In the parentheses, include the name of the column that holds the numeric values.

Practical LearningPractical Learning: Ending the Lesson

  1. In the Object Explorer, right-click Databases -> New Query
  2. In the Query window, type the following:
    USE master;
    GO
    DROP DATABASE RealEstate2;
    GO
  3. To execute, on the main menu, click Query -> Execute
  4. Close the Query window
  5. If asked whether you want to save anything, click No
  6. Close Microsoft SQL Server

Exercises

   

Lesson Summary Questions

  1. Which ones of the following are aggregate functions (Select 3)?
    1. LOW
    2. MIN
    3. HIGH
    4. COUNT
    5. SUM
  2. Which ones of the following are aggregate functions (Select 2)?
    1. AVG
    2. STD_DEV
    3. AVERAGE
    4. MAX
    5. TOTAL
  3. What is the basic syntax to create or add a GROUP BY statement?
    1. SELECT Table/View Column(s)
      GROUP BY Column(s)
      FROM Table/View
    2. GROUP BY Column(s)
      SELECT Table/View Column(s)
      FROM Table/View
      
    3. SELECT AND GROUP BY Column(s)
      FROM Table/View
    4. SELECT Table/View Column(s)
      FROM Table/View
      GROUP BY Column(s)
    5. SELECT FROM Table/View
      INCLUDE Column(s)
      GROUP BY Column(s)
  4. What is the basic syntax to create or add a GROUP BY statement?
    1. HAVING Condition
      SELECT What
      FROM WhatObject(s)
      GROUP BY Column(s)
      
    2. SELECT What
      GROUP BY Column(s)
      HAVING Condition
      FROM WhatObject(s)
    3. SELECT What
      FROM WhatObject(s)
      GROUP BY Column(s)
      HAVING Condition
    4. SELECT What
      FROM WhatObject(s)
      HAVING Condition
      GROUP BY Column(s)
    5. GROUP BY Column(s)
      HAVING Condition
      SELECT What
      FROM WhatObject(s)

Answers

  1. Answers
    1. Wrong Answer: There is no function named LOW in Transact-SQL
    2. Right Answer: The MIN() function produces the lowest value of a series
    3. Wrong Answer: There is no function aggregate function named HIGH
    4. Right Answer: The COUNT() function returns a number of records
    5. Right Answer: The SUM() function gives the total of numeric values of a column
  2. Answers
    1. Right Answer: The AVG() function produces the average of a series of numeric values
    2. Wrong Answer: That's not the name of the function for a standard deviation
    3. Wrong Answer: AVERAGE is not the name of an aggregate function
    4. Right Answer: The MAX() function returns the highest value of a numeric series
    5. Wrong Answer: There is no aggregate function named TOTAL in Transact-SQL
  3. Answers
    1. Wrong Answer: The GROUP BY expression must come after FROM
    2. Wrong Answer: The statement must start with SELECT
    3. Wrong Answer: That formula would produce an error
    4. Right Answer: That's the right formula
    5. Wrong Answer: FROM cannot immediately follow SELECT
  4. Answers
    1. Wrong Answer: The formula does not start with HAVING
    2. Wrong Answer: The FROM clause must start with SELECT
    3. Right Answer: That's the right formula
    4. Wrong Answer: The GROUP BY and HAVING clauses have been inversed
    5. Wrong Answer: You must start with a SELECT statement
 
 
   
 

Previous Copyright © 2007-2011 FunctionX.com Next