Description
The optional WHERE clause can be used for the following purposes:
-
To specify predicates that restrict which data values are to be returned.
-
To specify an explicit join between two tables.
-
To specify an implicit join between the base table and a field in another table.
The WHERE clause is most commonly used to specify one or more predicates that are used to restrict the data (filter out rows) retrieved by a SELECT query or subquery. You can also use a WHERE clause in an UPDATE command, DELETE command, or in a result set SELECT in an INSERT (or INSERT OR UPDATE) command.
The WHERE clause qualifies or disqualifies specific rows from the query selection. The rows that qualify are those for which the condition-expression is true. The condition-expression can be one or more logical tests (predicates). Multiple predicates can be linked by the AND and OR logical operators. See “Predicates and Logical Operators” for further details and restrictions.
If a predicate includes division and there are any values in the database that could produce a divisor with a value of zero or a NULL value, you cannot rely on order of evaluation to avoid division by zero. Instead, use a CASE statement to suppress the risk.
A WHERE clause can specify a condition-expression that includes a subquery. The subquery must be enclosed in parentheses.
A WHERE clause can specify an explicit join between two tables using the = (inner join) symbolic join operator. For further details, refer to the JOIN page of this manual.
A WHERE clause can specify an implicit join between the base table and a field from another table using the arrow syntax (–>) operator. For further details, refer to Implicit Joins in Using InterSystems SQL.
Specifying a Field
The simplest form of a WHERE clause specifies a predicate comparing a field to a value, such as WHERE Age > 21. Valid field values include the following: a column name (WHERE Age > 21); an %ID, %TABLENAME, or %CLASSNAME; a scalar function specifying a column name (WHERE ROUND(Age,-1)=60), a collation function specifying a column name (WHERE %SQLUPPER(Name) %STARTSWITH ' AB').
You cannot specify a field by column number.
Because the name of the RowID field can change when a table is re-compiled, a WHERE clause should avoid referring to the RowID by name (for example, WHERE ID=22). Instead, refer to the RowID using the %ID pseudo-column name (for example, WHERE %ID=22).
You cannot specify a field by column alias; attempting to do so generates an SQLCODE -29 error. However, you can use a subquery to define a column alias, then use this alias in the WHERE clause. For example:
SELECT Interns FROM
(SELECT Name AS Interns FROM Sample.Employee WHERE Age<21)
WHERE Interns %STARTSWITH 'A'
You cannot specify an aggregate field; attempting to do so generates an SQLCODE -19 error. However, you can supply an aggregate function value to a WHERE clause by using a subquery. For example:
SELECT Name,Age,AvgAge
FROM (SELECT Name,Age,AVG(Age) AS AvgAge FROM Sample.Person)
WHERE Age < AvgAge
ORDER BY Age
Integers and Strings
If a field defined as integer data type is compared to a numeric value, the numeric value is converted to canonical form before performing the comparison. For example, WHERE Age=007.00 parses as WHERE Age=7. This conversion occurs in all modes.
If a field defined as integer data type is compared to a string value in Display mode, the string is parsed as a numeric value. For instance, an empty string (''), like any non-numeric string, is parsed as the number 0. This parsing follows ObjectScript rules for handling strings as numbers. For example, WHERE Age='twenty' parses as WHERE Age=0; WHERE Age='20something' parses as WHERE Age=20. For further details, refer to Strings as Numbers in the “Data Types and Values” chapter of Using ObjectScript. SQL only performs this parsing in Display mode; in Logical or ODBC mode comparing an integer to a string value returns null.
To compare a string field with a string containing a single quote, double the single quote. For example, WHERE Name %STARTSWITH 'O''' returns O’Neil and O’Connor, but not Obama.
Date and Time
In InterSystems SQL, dates and times are compared and stored using a Logical Mode internal representation. They can be returned in Logical mode, Display Mode, or ODBC mode. For example, the date September 28, 1944 is represented as: Logical mode 37891, Display mode 09/28/1944, ODBC mode 1944-09-28. When specifying a date or time in a condition-expression, an error can occur due to a mismatch of SQL mode and date or time format, or due to an invalid date or time value.
A WHERE clause condition-expression must use the date or time format that corresponds to the current mode. For example, when in Logical mode, to return records with a date of birth in 2005, the WHERE clause would appear as follows: WHERE DOB BETWEEN 59901 AND 60265. When in Display mode, the same WHERE clause would appear as follows: WHERE DOB BETWEEN '01/01/2005' AND '12/31/2005'.
Failing to match the condition-expression date or time format to the display mode results in an error:
-
In Display mode or ODBC mode, specifying date data in the incorrect format generates an SQLCODE -146 error. Specifying time data in the incorrect format generates an SQLCODE -147 error.
-
In Logical mode, specifying date or time data in the incorrect format does not generate an error, but either returns no data or returns unintended data. This is because Logical mode does not parse a date or time in Display or ODBC format as a date or time value. The following WHERE clause, when executed in Logical mode, returns unintended data: WHERE DOB BETWEEN 37500 AND 38000 AND DOB <> '1944-09-28' returns a range of DOB values, including DOB=37891 (September 28, 1944), which the <> predicate was attempting to omit.
An invalid date or time value also generates an SQLCODE -146 or -147 error. An invalid date is one that you can specify in Display mode/ODBC mode, but InterSystems IRIS cannot convert into a Logical mode equivalent. For example, in ODBC mode the following generates an SQLCODE -146 error: WHERE DOB > '1830-01-01' because InterSystems IRIS cannot process a date value prior to December 31, 1840. The following in ODBC mode also generates an SQLCODE -146 error: WHERE DOB BETWEEN '2005-01-01' AND '2005-02-29', because 2005 is not a leap year.
When in Logical mode, a Display mode or ODBC mode value is not parsed as a date or time value, and therefore its value is not validated. For this reason, in Logical mode a WHERE clause such as WHERE DOB > '1830-01-01' does not return an error.
Stream Fields
In most situations, you cannot use a stream field in a WHERE clause predicate. Doing so results in an SQLCODE -313 error. However, the following uses of stream fields are allowed in a WHERE clause:
-
Stream null testing: you can specify the predicate streamfield IS NULL or streamfield IS NOT NULL.
-
Stream length testing: you can specify a CHARACTER_LENGTH(streamfield), CHAR_LENGTH(streamfield), or DATALENGTH(streamfield) function in a WHERE clause predicate.
-
Stream substring testing: you can specify a SUBSTRING(streamfield,start,length) function in a WHERE clause predicate.
List Structures
InterSystems IRIS supports the list structure data type %List (data type class %Library.List). This is a compressed binary format, which does not map to a corresponding native data type for InterSystems SQL. It corresponds to data type VARBINARY with a default MAXLEN of 32749. For this reason, Dynamic SQL cannot use %List data in a WHERE clause comparison. For further details, refer to the Data Types reference page in this manual.
To reference structured list data, use the %INLIST predicate or the FOR SOME %ELEMENT predicate.
To use the data values of a list field in a condition-expression, you can use %EXTERNAL to compare the list values to a predicate. For example, to return all records in which the FavoriteColors list field value consists of the single element 'Red':
SELECT Name,FavoriteColors FROM Sample.Person
WHERE %EXTERNAL(FavoriteColors)='Red'
When %EXTERNAL converts a list to DISPLAY format, the displayed list items appear to be separated by a blank space. This “space” is actually the two non-display characters CHAR(13) and CHAR(10). To use a condition-expression against more than one element in the list, you must specify these characters. For example, to return all records in which the FavoriteColors list field value consists of the two elements 'Orange' and 'Black' (in that order):
SELECT Name,FavoriteColors FROM Sample.Person
WHERE %EXTERNAL(FavoriteColors)='Orange'||CHAR(13)||CHAR(10)||'Black'
Variables
A WHERE clause predicate can specify:
A %TABLENAME, or %CLASSNAME pseudo-field variable keyword. %TABLENAME returns the current table name. %CLASSNAME returns the name of the class corresponding to the current table. If the query references multiple tables, you can prefix the keyword with a table alias. For example, t1.%TABLENAME.
One or more of the following ObjectScript special variables (or their abbreviations): $HOROLOG, $JOB, $NAMESPACE, $TLEVEL, $USERNAME, $ZHOROLOG, $ZJOB, $ZNSPACE, $ZPI, $ZTIMESTAMP, $ZTIMEZONE, $ZVERSION.
List of Predicates
The SQL predicates fall into the following categories:
Predicate Case-Sensitivity
A predicate uses the collation type defined for the field. By default, string data type fields are defined with SQLUPPER collation, which is not case-sensitive. The “Collation” chapter of Using InterSystems SQL provides details on defining the string collation default for the current namespace and specifying a non-default field collation type when defining a field/property.
The %INLIST, Contains operator ([), %MATCHES, and %PATTERN predicates do not use the field’s default collation. They always uses EXACT collation, which is case-sensitive.
A predicate comparison of two literal strings is always case-sensitive.
Predicate Conditions and %NOINDEX
You can preface a predicate condition with the %NOINDEX keyword to prevent the query optimizer using an index on that condition. This is most useful when specifying a range condition that is satisfied by the vast majority of the rows. For example, WHERE %NOINDEX Age >= 1. For further details, refer to Using %ALLINDEX, %IGNOREINDEX, and %NOINDEX in the SQL Optimization Guide.
Predicate Condition on Outlier Value
If the WHERE clause in a Dynamic SQL query selects on a non-null outlier value, you can significantly improve performance by enclosing the outlier value literal in double parentheses. These double parentheses cause Dynamic SQL to use the outlier selectivity when optimizing. For example, if your business is located in Massachusetts (MA), a large percentage of your employees will reside in Massachusetts. For the Employees table Home_State field, 'MA' is the outlier value. To optimally select for this value, you should specify WHERE Home_State=(('MA')).
This syntax should not be used in Embedded SQL or in a view definition. In Embedded SQL or a view definition, the outlier selectivity is always used and requires no special coding.
A WHERE clause in a Dynamic SQL query automatically optimizes for a null outlier value. For example, a clause such as WHERE FavoriteColors IS NULL. No special coding is required for IS NULL and IS NOT NULL predicates when NULL is the outlier value.
Outlier selectivity is determined by running the Tune Table utility.
BETWEEN Predicate
The BETWEEN comparison operator allows you to select those data values that are in the range specified by the syntax BETWEEN lowval AND highval. This range is inclusive of the lowval and highval values themselves. This is equivalent to a paired greater than or equal to operator and a less than or equal to operator. This comparison is shown in the following example:
SELECT Name,Age FROM Sample.Person
WHERE Age BETWEEN 18 AND 21
This returns all the records in the Sample.Person table with an Age value between 18 and 21, inclusive of those values. Note that you must specify the BETWEEN values in ascending order; a predicate such as BETWEEN 21 AND 18 would return no records.
Like most predicates, BETWEEN can be inverted using the NOT logical operator, as shown in the following example:
SELECT Name,Age FROM Sample.Person
WHERE Age NOT BETWEEN 20 AND 55
ORDER BY Age
This returns all the records in the Sample.Person table with an Age value less than 20 or greater than 55, exclusive of those values.
BETWEEN is commonly used for a range of numeric values, which collate in numeric order. However, BETWEEN can be used for a collation sequence range of values of any data type.
BETWEEN uses the same collation type as the column it is matching against. By default, string data types collate as not case-sensitive.
For further details, refer to the BETWEEN predicate reference page in this manual.