InterSystems SQL Reference
VARIANCE, VAR_SAMP, VAR_POP


Aggregate functions that return the statistical variance of a data set.
Synopsis
VARIANCE([ALL  DISTINCT [BY(collist)]] expression [%FOREACH(collist)] [%AFTERHAVING])
VAR_SAMP([ALL  DISTINCT [BY(collist)]] expression [%FOREACH(collist)] [%AFTERHAVING])
VAR_POP([ALL  DISTINCT [BY(collist)]] expression [%FOREACH(collist)] [%AFTERHAVING])
Arguments
These functions return the NUMERIC
data type unless
expression is data type DOUBLE. If
expression is DOUBLE, they return DOUBLE.
Description
These three variance
aggregate functions return the statistical variance of the values of
expression, after discarding NULL values. That is, the amount of variation from the mean value of the data set, expressed as a positive number. The larger the return value, the more variation there is within the data set of values. InterSystems SQL also supplies aggregate functions to return the
standard deviation corresponding to each of these variance functions.
There are slight variations in how this statistical variation is derived:

VARIANCE: Returns 0 if all of the values in the data set have the same value (no variability). Returns 0 if the data set consists of only one value (no possible variability). Returns NULL if the data set has no values.
The VARIANCE calculation is:
(SUM(expression**2) * COUNT(expression))  SUM(expression**2)
_____________________________________________________________
COUNT(expression) * (COUNT(expression)  1)

VAR_SAMP: Sample variance. Returns 0 if all of the values in the data set have the same value (no variability). Returns NULL if the data set consists of only one value (no possible variability). Returns NULL if the data set has no values. Uses the same variant calculation as VARIANCE.

VAR_POP: Population variance. Returns 0 if all of the values in the data set have the same value (no variability). Returns 0 if the data set consists of only one value (no possible variability). Returns NULL if the data set has no values.
The VAR_POP calculation is:
(SUM(expression**2) * COUNT(expression))  (SUM(expression) **2)
_____________________________________________________________
(COUNT(expression) **2 )
These variance aggregate functions can be used in a
SELECT query or subquery that references either a table or a view. They can appear in a
SELECT list or
HAVING clause alongside ordinary field values.
These variance aggregate functions cannot be used in a WHERE clause. They cannot be used in the ON clause of a JOIN, unless the SELECT is a subquery.
These variance aggregate functions return a value of data type NUMERIC with a precision of 36 and a scale of 17, unless expression is data type DOUBLE in which case the function returns data type DOUBLE.
These variance aggregate functions are normally applied to a field or expression that has a numeric value. They evaluate nonnumeric values, including the empty string (''), as zero (0).
These variance aggregate functions ignore NULL values in data fields. If no rows are returned by the query, or the data field value for all rows returned is NULL, they return NULL.
The statistical variance functions, like all aggregate functions, can take an optional
DISTINCT clause.
VARIANCE(DISTINCT col1) returns the variance of those col1 field values that are distinct (unique).
VARIANCE(DISTINCT BY(col2) col1) returns the variance of the col1 field values in records where the col2 values are distinct (unique). Note however that the distinct col2 values may include a single NULL as a distinct value.
Changes Made During the Current Transaction
Like all aggregate functions, the variance functions always returns the current state of the data, including uncommitted changes, regardless of the current transaction’s isolation level. For further details, refer to
SET TRANSACTION and
START TRANSACTION.
Examples
The following example uses VARIANCE to return the variance in the ages of the employees in Sample.Employee, and the variance in the distinct ages represented by one or more employees:
SELECT VARIANCE(Age) AS AgeVar,VARIANCE(DISTINCT Age) AS PerAgeVar
FROM Sample.Employee
The following example uses VAR_POP to return the population variance in the ages of the employees in Sample.Employee, and the variance in the distinct ages represented by one or more employees:
SELECT VAR_POP(Age) AS AgePopVar,VAR_POP(DISTINCT Age) AS PerAgePopVar
FROM Sample.Employee
See Also
Content Date/Time: 20190920 05:47:55