 |
Analyzing the Distribution of Records |
|
|
The First and Last Values of a Selected Field |
|
|
The First Value of a Selected Field |
|
|
|
We have already seen how to create groups (partitions)
of records. One of the ways you can analyze your records is to get some
statistics about each group. Transact-SQL provides a series of functions,
named analytic functions that allow you to get the variations and tendencies
of records within a group.
|
Inside of each group of records, you may want to know
what value comes first based on a column of your choice. To get this
information, you can call the FIRST_VALUE() function. Its syntax
is:
FIRST_VALUE ( [scalar_expression )
OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )
RETURNS Data Type of scalar_expression
The FIRST_VALUE() function takes as argument
one the columns of the table or view on which data selection is made. The
function is followed by a call to OVER() that requires arranging
the records.
|
Practical
Learning: Getting the First Value of a Selected Field
|
|
- Start Microsoft SQL Server and connect
- Right-click the name of the server and click New Query
- To see the list of properties from the LambdaPropertiesManagement1
database (the database was created in the previous lesson), type the
following code:
USE LambdaPropertiesManagement1;
GO
SELECT props.Bedrooms,
props.MonthlyRate
FROM Listing.Properties props
ORDER BY props.MonthlyRate;
GO
- To execute, right-click inside the Query Editor and click Execute
- Click inside the Query Editor, press Ctrl + A, and press Delete
- If you use only one column for your SQL statement, if you pass it
both to the FIRST_VALUE() function and to OVER(), the
statement would produce the lowest value of the column passed to the
FIRST_VALUE() function. To see an example, type the following
code:
USE LambdaPropertiesManagement1;
GO
SELECT FIRST_VALUE(props.MonthlyRate) OVER(ORDER BY props.MonthlyRate) [First Monthly Rate]
FROM Listing.Properties props;
GO
- To execute, right-click inside the Query Editor and click Execute.
The result is an instance of each record but with the lowest monthly
rate of all our properties, which is 740
- Of course, you can pass a different value to the
FIRST_VALUE() function and to OVER(). To see an example,
change the statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType [Property Type],
props.MonthlyRate Rate,
props.City,
FIRST_VALUE(props.MonthlyRate) OVER(ORDER BY props.PropertyType) [First Monthly Rate in Selected Property Type]
FROM Listing.Properties props;
GO
- To execute, on the main menu, click Query -> Execute. Notice that
the result consists of the first value of the monthly rate (passed to
the function) in conjunction with the property type (passed to
OVER() and the value is the same for all records:
- To create different groups of records, change the statement as
follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType [Property Type],
props.MonthlyRate Rate,
props.City,
FIRST_VALUE(props.MonthlyRate) OVER(PARTITION BY props.PropertyType ORDER BY props.PropertyType) [First Monthly Rate in Selected Property Type]
FROM Listing.Properties props;
GO
- Press F5 to execute. This time, groups (partitions) are created
based on the column applied to PARTITION BY. The value produced
by the FIRST_VALUE() function is the first value of the column
passed to that function:
- Click inside the Query Editor, press Ctrl + A, and press Delete
|
The Last Value
of a Selected Field
|
|
As opposed to the first value of a group of records,
you may want to get the last one. To do this, you can call the
LAST_VALUE() function. Its syntax is:
LAST_VALUE ( [scalar_expression )
OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )
RETURNS Data Type of scalar_expression
The LAST_VALUE() function takes one the columns
of the table or view as argument. The function follows the same logic as
FIRST_VALUE(), in reverse.
|
Practical
Learning: Getting the First Value of a Selected Field
|
|
- To call the If use only one column for your SQL statement, if you
pass it both to the FIRST_VALUE() function and to OVER(),
the statement would produce the lowest value of the column passed to
the FIRST_VALUE() function. To see an example, type the
following code:
SELECT FIRST_VALUE(props.MonthlyRate) OVER(ORDER BY props.MonthlyRate) [First Monthly Rate]
FROM Listing.Properties props;
GO
- To execute, right-click inside the Query Editor and click Execute.
The result is an instance of each record but with the lowest monthly
rate of all our properties, which is 740
- Of course, you can pass a different value to the
FIRST_VALUE() function and to OVER(). To see an example,
change the statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType [Property Type],
props.MonthlyRate Rate,
props.City,
FIRST_VALUE(props.MonthlyRate) OVER(ORDER BY props.PropertyType) [First Monthly Rate in Selected Property Type]
FROM Listing.Properties props;
GO
- To execute, on the main menu, click Query -> Execute. Notice that
the result consists of the first value of the monthly rate (passed to
the function) in conjunction with the property type (passed to
OVER() and the value is the same for all records:
- To create different groups of records, change the statement as
follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType [Property Type],
props.MonthlyRate Rate,
props.City,
FIRST_VALUE(props.MonthlyRate) OVER(PARTITION BY props.PropertyType ORDER BY props.PropertyType) [First Monthly Rate in Selected Property Type]
FROM Listing.Properties props;
GO
- Press F5 to execute. This time, groups (partitions) are created
based on the column applied to PARTITION BY. The value produced
by the FIRST_VALUE() function is the first value of the column
passed to that function:
- Click inside the Query Editor, press Ctrl + A, and press Delete
|
The Distribution of Records
|
|
A percentage is a fractional number from a 100 scale.
The fractions go from 1 to 100 and each number is represented by following
it with the % sign. Examples are 1%, 2%, 3%, 4%, 5%, and so on. Imagine
you are in charge of managing rent for 100 houses or properties, or you
are teaching a course to 100 students. Based on these examples, each house
(or student) represents 1% of the total number of houses (or students). 2
houses (or students) represent 2% of the total, and so on.
A fraction is a section of a whole thing. For example,
you can cut a cantaloupe in two parts or fractions. Each fraction is half
the whole:

A whole can have many fractions. For our example of
managing rent for 100 houses or teaching a course to 100 students, for
managing purposes you can split the houses (or students) in half. Each
half would be a fraction and would have 50 houses (or students).
|
You can
number the houses (or students) in the first group from 1 to 50 and the
houses (or student) in the second group from 51 to 100. In the same way,
you can divide the number of houses (or students) in 4 groups. Each group
would contain 25 houses (or students). You can number the first group from
1 to 25, the second group from 26 to 50, and so on.
For some statistical reasons, you may want to take
some action for the house or the student in the 25th position, which is
the last one in the 25% range. Or imagine you want to take some action for
the house or the student in the 50th position, which is the last in the
50% range.
|
 |
A percentile is a positional (position or location)
value that corresponds to the percentage value of a whole. Actually, to
get the percentiles, the whole must be divided in equal parts and each
part can be given a name. If the whole is divided in 4 parts, each part is
called a quartile:
- The first part is called the first quartile. Its internal
positions go from the 1st quartile to the 25th quartile, which include
the 2nd percentile, the third percentile, and so on
- The second part is called the second quartile. Its internal
positions go from the 26th quartile to the 50th quartile
- The third part is called the third quartile. Its internal
positions go from the 51st quartile to the 75th quartile
- The fourth part is called the fourth quartile. Its internal
positions go from the 76th quartile to the 100th quartile
If the whole is divided in two, the divider is called
a median, and is made of two quartiles (the first quartile and the second
quartile).
Transact-SQL provides a series of functions used to
get percentile-related statistics. These are referred to as analytic
functions.
 |
When you have a group of values, each occupies a
certain position. If the list is arranged (ordered), each value holds an
incrementing position as 1, 2, 3, and so on. This position is also
referred to as the rank. For statistical purposes, that rank can be
treated as a weight. To give the same weight or the same importance to
each value, the position, which is the value of the rank, is divided by
the total number of values. This produces a fraction that can be, or is,
converted to a percentage value. To let you get this information,
Transact-SQL provides a function named PERCENT_RANK. Its syntax is:
|
PERCENT_RANK() OVER( [ partition_by_clause ] order_by_clause )
RETURNS float(53)
The PERCENT_RANK() function takes no argument.
The function is followed by a call to OVER() that requires
arranging the records.
|
Practical
Learning: Getting the Percentage Distribution of Records
|
|
- To get the positional distribution of the records, type the
following code:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyNumber [Property #],
props.PropertyType Type,
props.MonthlyRate Rate,
PERCENT_RANK() OVER(ORDER BY props.MonthlyRate) "Rank Distribution"
FROM Listing.Properties props;
GO
- To execute, right-click inside the Query Editor and click Execute.
Notice that values produced by the PERCENT_RANK() function
range from 0 (the lowest) to 1:
- Notice that the values between the extremes are decimal fractions
between 0 and 1. Also notice that the first plus the last values equal
1, the second plus the before last value = 1, the third value from the
beginning plus the third value back from the last = 1, and so on.
Notice that the distribution of fraction is for all records. To show
the distribution for each type of property, change the statement as
follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyNumber [Property #],
props.PropertyType Type,
props.MonthlyRate Rate,
PERCENT_RANK() OVER(PARTITION BY props.PropertyType ORDER BY props.MonthlyRate) "Rank Distribution"
FROM Listing.Properties props;
GO
- Press F5 to execute
- To display the values in percent, change the statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyNumber [Property #],
props.PropertyType Type,
props.MonthlyRate Rate,
FORMAT(PERCENT_RANK() OVER(ORDER BY props.MonthlyRate), N'P') "Rank Percentage"
FROM Listing.Properties props;
GO
- To execute, on the main menu, click Query -> Execute. Notice that
the result consists of the first value of the monthly rate (passed to
the function) in conjunction with the property type (passed to
OVER() and the value is the same for all records:
- To specify a partition, change the statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyNumber [Property #],
props.PropertyType Type,
props.MonthlyRate Rate,
FORMAT(PERCENT_RANK() OVER(PARTITION BY props.PropertyType ORDER BY props.MonthlyRate), N'P') "Rank Distribution"
FROM Listing.Properties props;
GO
- To execute, press F5
- Click inside the Query Editor, press Ctrl + A, and press Delete
|
The Discrete
Percentile of a Distribution
|
|
If a table is not empty, one of its obvious
characteristics is that the table has values that have been created (the
values exist and are not hidden) and each value occupies a specific
position (the values of a table of a database are not truly inserted; each
value is added to the end of the existing list, which is referred to as
appending a value to the list). The fact that the values exist and are
known, we say that they are spread or distributed. By definition, data or
record distribution is the characteristic that the values of a table are
available (to be accessed and used).
Discrete distribution is the characteristic that:
- The values of a table are known and available, which means they
are distributed
- The number of values at a certain time is known and constant. This
means that the number of values can be counted. We also say that the
number is finite
A percentile discrete distribution is a value that
represents the percentage position (the percentile) of a value of a column
of a table. To let you calculate the percentile discrete distribution,
Transact-SQL provides a function named PERCENTILE_DISC. Its syntax
is:
PERCENTILE_DISC ( numeric_literal )
WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] )
OVER ( [ <partition_by_clause> ] )
RETURNS Data Type of order_by_expression;
The PERCENTILE_DISC() function takes one
argument and requires some conditions. The value passed to the function
must be a decimal value between 0.00 and 1.00. The value represents the
fraction by which the percentile will be calculated. We will see how that
number influences the result. After calling the function, you must add a
WITHIN GROUP clause that resembles a function. In the parentheses
of that clause, you arrange the values of a column of your choice. Then,
you must call OVER(). If you don't pass a parameter to it, the
result is the first value of the column passed to WITHIN GROUP().
Otherwise, you can create partitions over but you must not create another
arrangement of records OVER().
|
Practical
Learning: Getting the Discrete Percentile
|
|
- To get a percentile of a continuous distribution of the values of
a column, type the following code:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType,
props.City,
props.MonthlyRate,
PERCENTILE_DISC(0) WITHIN GROUP(ORDER BY props.MonthlyRate) OVER() "Mohnthly Rate of First City Selected"
FROM Listing.Properties props;
GO
- To execute, right-click inside the Query Editor and click Execute.
Notice that the function produces the first value of the column passed
to WITHIN GROUP:

- To create partitions, change the statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType,
props.City,
props.MonthlyRate,
PERCENTILE_DISC(0) WITHIN GROUP(ORDER BY props.MonthlyRate) OVER(PARTITION BY props.PropertyType) "Mohnthly Rate of First City Selected"
FROM Listing.Properties props;
GO
- To execute, on the main menu, click Query -> Execute
- To change the rate of the percentile distribution, change the
statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType,
props.City,
props.MonthlyRate,
PERCENTILE_DISC(0.25) WITHIN GROUP(ORDER BY props.MonthlyRate) OVER(PARTITION BY props.PropertyType) "First City Within Group"
FROM Listing.Properties props;
GO
- To execute, press F5:
- To try a different rate of the percentile distribution, change the
statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType,
props.City,
props.MonthlyRate,
PERCENTILE_DISC(0.50) WITHIN GROUP(ORDER BY props.MonthlyRate) OVER(PARTITION BY props.PropertyType) "Mohnthly Rate of First City Selected"
FROM Listing.Properties props;
GO
- Press F5 to execute
- To try a different rate of the percentile distribution, change the
statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType,
props.City,
props.MonthlyRate,
PERCENTILE_DISC(0.75) WITHIN GROUP(ORDER BY props.MonthlyRate) OVER(PARTITION BY props.PropertyType) "Mohnthly Rate of First City Selected"
FROM Listing.Properties props;
GO
- Press F5 to execute
- To change the order within, change the statement as follows:
USE LambdaPropertiesManagement1;
GO
SELECT props.PropertyType,
props.City,
props.MonthlyRate,
PERCENTILE_DISC(0.850) WITHIN GROUP(ORDER BY props.City) OVER(PARTITION BY props.PropertyType) "Mohnthly Rate of First City Selected"
FROM Listing.Properties props;
GO
- To execute, press F5
- Click inside the Query Editor, press Ctrl + A, and press Delete
|
|