$ZCONVERT (ObjectScript)
Synopsis
$ZCONVERT(string,mode,trantable,handle)
$ZCVT(string,mode,trantable,handle)
Arguments
Argument | Description |
---|---|
string | The string to convert, specified as a quoted string. This string can be specified as a value, a variable, or an expression. |
mode | A letter code specifying the conversion mode, either the type of case conversion or input/output encoding. Specify mode as a quoted string. |
trantable | Optional — The translation table to use, specified as either an integer or a quoted string. |
handle | Optional — An unsubscripted local variable that holds a string value. Used for multiple invocations of $ZCONVERT. The handle contains the remaining portion of string that could not be converted at the end of $ZCONVERT, and supplies this remaining portion to the next invocation of $ZCONVERT. |
Description
$ZCONVERT converts a string from one form to another. The nature of the conversion depends on the arguments you use.
$ZCONVERT Returns a Converted String
$ZCONVERT(string, mode) returns string with the characters converted as specified by mode. The conversions are of two types:
- Case conversion
- Encoding translation
Case conversion changes the case of each letter character in the string. You can change all letter characters in a string to their lowercase, uppercase, or titlecase form. You can change the initial letter of words or sentences in a string to their uppercase form. Characters that are already in the specified case, and characters with no case (usually any non-alphabetic character) in the string are passed through unchanged. To output a literal quote character (") within a string, input two quote characters (""). For further case conversion options, including non-ASCII and customized case conversion, see System Classes for National Language Support.
Encoding translation translates string between the internal encoding style used on your system and another encoding style. You can perform input translation; that is, translate string from an external encoding style to the encoding style of your system. You can also perform output translation; that is, translate string from the encoding style of your system to an external encoding style. For further I/O translation options, including non-ASCII and customized translation, see System Classes for National Language Support.
The values you can use for mode are as follows:
Mode Code | Meaning |
---|---|
U or u | Uppercase translation: Convert all characters in string to uppercase. |
L or l | Lowercase translation: Convert all characters in string to lowercase. |
T or t | Titlecase translation: Convert all characters in string to titlecase. Titlecase is only meaningful for those alphabets (principally Eastern European) that have three forms for a letter: uppercase, lowercase, and titlecase. For all other letters, titlecase translation is the same as uppercase translation. |
W or w | Word translation: Convert the first character of each word in string to uppercase. Any character preceded by a blank space, a quotation mark ("), an apostrophe ('), or an open parenthesis (() is considered the first character of a word. Word translation converts all other characters to lowercase. Word translation is locale specific; the above syntax rules for English may differ for other language locales. |
S or s | Sentence translation: Convert the first character of each sentence in string to uppercase. The first non-blank character of string, and any character preceded by a period (.), question mark (?), or exclamation mark (!) is considered the first character of a sentence. (Blank spaces between the preceding punctuation character and the letter are ignored.) If this character is a letter, it is converted to uppercase. Sentence translation converts all other letter characters to lowercase. Sentence translation is locale specific; the above syntax rules for English may differ for other language locales. |
I or i | Perform input encoding translation on a specified string. For the two-argument form, the translation is performed using the current process I/O translation handle. If a current process I/O translation handle has not been defined, InterSystems IRIS performs translation based on the default process I/O translation table name. |
O or o | Perform output encoding translation on a specified string. For the two-argument form, the translation is performed using the current process I/O translation handle. If a current process I/O translation handle has not been defined, InterSystems IRIS performs translation based on the default process I/O translation table name. |
A or a | Remove accents from a string. |
AU or au | Remove accents from a string, then convert to upper case. |
AL or al | Remove accents from a string, then convert to lower case. |
If mode is a null string or any value other than the valid characters, you receive a <FUNCTION> error.
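As a quick illustration of the case-conversion codes in the table above, the following sketch applies several modes to the same string. The sample text is an arbitrary choice, the results noted in the comments assume the default (English) word and sentence rules described above, and "Z" is used on the assumption that it is not a valid mode code on your system:

```objectscript
 SET text="hello world. how are YOU?"
 WRITE $ZCONVERT(text,"U"),!   // HELLO WORLD. HOW ARE YOU?
 WRITE $ZCONVERT(text,"L"),!   // hello world. how are you?
 WRITE $ZCONVERT(text,"W"),!   // Hello World. How Are You?
 WRITE $ZCONVERT(text,"S"),!   // Hello world. How are you?
 // An unrecognized mode code is expected to raise a <FUNCTION> error
 TRY { WRITE $ZCONVERT(text,"Z"),! }
 CATCH ex { WRITE "Error: ",ex.Name,! }
```

Wrapping the call in TRY/CATCH is only one way to observe the error; without it, the error is reported in the usual way for the calling context.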
Letter Case Translation
You can convert letters in strings to all uppercase letters or all lowercase letters. Conversion works on Unicode letters as well as ASCII letters. The following example converts the Greek alphabet from lowercase to uppercase:
FOR i=945:1:969 {WRITE $ZCONVERT($CHAR(i),"U")}
However, a small number of letters only have a lowercase letter form. For example, the German eszett ($CHAR(223)) is only defined as a lowercase letter. Attempting to convert it to an uppercase letter results in the same lowercase letter:
IF $ZCONVERT($CHAR(223),"U")=$ZCONVERT($CHAR(223),"L") {
WRITE "uppercase and lowercase letter are the same" }
ELSE {WRITE "uppercase and lowercase are different" }
For this reason, when converting alphanumeric strings to a single letter case it is always preferable to convert to lowercase.
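One common use of this advice is a case-insensitive comparison. The following is a minimal sketch in which both operands are converted to lowercase before they are compared; the variable names and sample values are illustrative only:

```objectscript
 SET a="Stra"_$CHAR(223)_"e"    // mixed case, contains the lowercase-only eszett
 SET b="STRA"_$CHAR(223)_"E"    // same word typed in capitals
 IF $ZCONVERT(a,"L")=$ZCONVERT(b,"L") {
     WRITE "strings match when case is ignored",!
 }
 ELSE {
     WRITE "strings differ",!
 }
```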
You can perform similar letter case translations using the $TRANSLATE function, as shown in the following example:
SET text="Hello World"
WRITE $TRANSLATE(text,"ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz")
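Note that a $TRANSLATE call of this kind converts only the letters that appear in its lists. The following sketch contrasts the two functions on a string that ends with an accented capital letter ($CHAR(201)); the variable names upper and lower are illustrative:

```objectscript
 SET text="TOUCH"_$CHAR(201)
 SET upper="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 SET lower="abcdefghijklmnopqrstuvwxyz"
 // $TRANSLATE leaves $CHAR(201) unchanged because it is not in the upper list
 WRITE $TRANSLATE(text,upper,lower),!
 // $ZCONVERT lowercases the accented letter as well
 WRITE $ZCONVERT(text,"L"),!
```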
Word and Sentence Translation
“W” and “S” modes determine whether a non-blank character is the first character of a word or the first character of a sentence, and if that character is a letter, translate it to uppercase. All other letters are translated to lowercase. Case translation works on letters in any alphabet, as shown in the following example which converts Greek letters ($CHAR(945) is lowercase alpha; $CHAR(913) is uppercase alpha):
SET greek=$CHAR(945,946,947,913,914,915)
WRITE $ZCONVERT(greek,"W")
However, the rules determining what constitutes a word or sentence are locale dependent. For instance, the following example uses the Spanish inverted exclamation point $CHAR(161). The default (English) locale does not recognize this character as beginning a sentence or word. In this example, all letters in spanish are translated to lowercase:
SET spanish=$CHAR(161)_"ola MuNdO! "_$CHAR(161)_"olA!"
SET english="hElLo wOrLd! heLLo!"
WRITE !,$ZCONVERT(english,"S")
WRITE !,$ZCONVERT(spanish,"S")
Titlecase Translation
Titlecase (“T”) mode converts every letter in the string to its titlecase form. Titlecase does not selectively uppercase letters based on their position in a word or string. Titlecase is the case that a letter is represented in when it is the first character of a word in a title. For standard Latin letters, the titlecase form is the same as the uppercase form.
Some languages (for example, Croatian) represent particular letters by two letter glyphs. For example, “lj” is a single letter in the Croatian alphabet. This letter has three forms: lowercase “lj”, uppercase “LJ”, and titlecase “Lj”. $ZCONVERT titlecase translation is used for this type of letter conversion.
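To inspect this on your own system, the following sketch prints the code points produced for the single-character form of "lj". It assumes a Unicode installation; Unicode assigns 457 to the lowercase form, 455 to the uppercase form, and 456 to the titlecase form, but whether $ZCONVERT maps between them may depend on the case tables of your locale, so no output is asserted here:

```objectscript
 SET lj=$CHAR(457)   // lowercase "lj" as one character (assumes Unicode support)
 WRITE "lowercase code point: ",$ASCII($ZCONVERT(lj,"L")),!
 WRITE "uppercase code point: ",$ASCII($ZCONVERT(lj,"U")),!
 WRITE "titlecase code point: ",$ASCII($ZCONVERT(lj,"T")),!
```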
Three-argument Form: Encoding Translation
$ZCONVERT(string, mode, trantable) performs either an input encoding translation or an output encoding translation on string. In the three-argument form, the mode values you can use are either "I" or "O". You must define the mode value. For “I” translations, the string may be a hexadecimal string, such as %4B (the letter “K”); hexadecimal strings are not case-sensitive.
You can use ZZDUMP to display the hexadecimal encoding for a string of characters. You can use $CHAR to specify a character (or string of characters) by its decimal (base 10) encoding; you can use $ZHEX to convert a hexadecimal number to a decimal number, or a decimal number to a hexadecimal number. If the translated value is a non-printing character, InterSystems IRIS displays it as a null string. If the target device cannot represent a translated character, InterSystems IRIS substitutes a question mark (?) character for the non-displayable character.
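The relationship between these tools can be sketched with the %4B example mentioned above. The "URL" translation table is used here on the assumption that it is the table that interprets %-prefixed hexadecimal strings:

```objectscript
 WRITE $ZHEX("4B"),!                  // 75: a quoted string is read as hexadecimal
 WRITE $ZHEX(75),!                    // 4B: an integer is converted to hexadecimal
 WRITE $CHAR($ZHEX("4B")),!           // K
 WRITE $ZCONVERT("%4b","I","URL"),!   // K: the hexadecimal digits are not case-sensitive
 ZZDUMP "K"                           // displays the hexadecimal encoding of the string
```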
The trantable value can be a numeric character or a string that specifies the translation table or translation handle to use. The trantable value can be:
- An integer value specifying a process I/O translation object. Available values are 0 through 3 (0 represents the current process I/O translation object).
- An uppercase string value identifying an I/O translation table. See Translation Tables for a list and details.
- A string value specifying an I/O translation table defined by an NLS locale. For example, Latin2 or CP1252. See Translation Tables for a list and details.
- A string value specifying a user-defined I/O translation table. A named table can be defined in a locale and points to one or two translation tables. Use a named table to define a specific system-to/from-device encoding.
- An empty string ("") specifying the use of the default process I/O translation table. (For equivalent functionality, see the $$GetPDefIO^%NLS() function of the %NLS utility.)
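The following sketch shows several of these trantable forms side by side. The sample strings are arbitrary; results are noted only for the named tables, because the output for trantable 0 and for the empty string depends on how your process and locale are configured:

```objectscript
 SET text="<b>quick & brown</b>"
 WRITE $ZCONVERT(text,"O","HTML"),!   // named table; expected: &lt;b&gt;quick &amp; brown&lt;/b&gt;
 WRITE $ZCONVERT(text,"O",0),!        // current process I/O translation object
 WRITE $ZCONVERT(text,"O",""),!       // default process I/O translation table
 // Output translation followed by input translation with the same named table
 SET enc=$ZCONVERT("quick brown fox","O","URL")
 WRITE enc,!                          // quick%20brown%20fox
 WRITE $ZCONVERT(enc,"I","URL"),!     // quick brown fox
```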
Four-argument Form: Input/Output String
The handle argument is a local variable that $ZCONVERT reads at the beginning of execution and writes when it completes execution. It is used to hold information between consecutive invocations of the $ZCONVERT function. It can be used for two purposes: concatenating a string to the beginning of string, and converting extremely long strings.
To concatenate a string to the beginning of string, set handle before invoking $ZCONVERT:
SET handle="the "
WRITE $ZCVT("quick brown fox","O","URL",handle),!
/* the%20quick%20brown%20fox */
WRITE $ZCVT("quick brown fox","O","URL",handle),!
/* quick%20brown%20fox */
Note that $ZCONVERT resets handle when it completes execution. In the previous example, it resets handle to the empty string.
The handle argument can also be used for input conversions. Specifying a handle is useful when you convert multibyte character sequences that arrive in pieces, such as the successive reads of a stream. In these cases, $ZCONVERT uses the handle argument to hold a partial character sequence that may be the leading bytes of a multibyte sequence. If input characters remain in the buffer at the end of a $ZCONVERT call and do not make up a complete translation unit, these leftover characters are returned in handle. At the beginning of the next $ZCONVERT call, if handle contains data, the leftover characters are prepended to the normal input data. This is particularly valuable for UTF8 conversions, as shown in the following example:
SET handle=""
WHILE 'stream.AtEnd() {
WRITE $ZCONVERT(stream.Read(20000),"I","UTF8",handle)
}
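The same mechanism can be seen without a stream. In the following sketch, a two-byte UTF-8 sequence ($CHAR(195,169), the encoding of the letter é) is deliberately split across two calls. The chunk contents and variable names are illustrative, and the results in the comments are what the behavior described above leads you to expect rather than verified output:

```objectscript
 SET handle=""
 // The first chunk ends with only the leading byte (195) of the sequence
 SET part1=$ZCONVERT("caf"_$CHAR(195),"I","UTF8",handle)
 WRITE "after chunk 1: ",part1,!                      // expected: caf
 WRITE "bytes held in handle: ",$LENGTH(handle),!     // expected: 1
 // The second chunk supplies the trailing byte (169)
 SET part2=$ZCONVERT($CHAR(169)_" au lait","I","UTF8",handle)
 WRITE "after chunk 2: ",part1_part2,!                // expected: café au lait
```

Without the handle argument, each call would have to treat the isolated byte on its own, which is why the four-argument form is preferable when input arrives in arbitrary chunks.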
To convert an extremely long string, it may be necessary to perform more than one string conversion by invoking $ZCONVERT multiple times. $ZCONVERT provides the optional handle argument to hold the remaining unconverted portion of string. If you specify a handle argument, it is updated by each invocation of $ZCONVERT. When the string conversion completes, $ZCONVERT sets handle to the empty string.
SET handle=""
SET out = $ZCVT(hugestring,"O","HTML",handle)
IF handle '= "" {
SET out2 = $ZCVT(handle,"O","HTML",handle)
WRITE "Converted string is: ",out,out2 }
ELSE {
WRITE "Converted string is: ",out }
Examples
The following example returns "HELLO":
WRITE $ZCONVERT("Hello","U")
The following example returns "hello":
WRITE $ZCVT("Hello","L")
The following example returns "HELLO":
WRITE $ZCVT("Hello","T")
The following example uses the concatenate operator (_) to append and case-convert an accented character:
WRITE "TOUCH"_$CHAR(201),!, $ZCVT("TOUCH"_$CHAR(201),"L")
returns:
TOUCHÉ
touché
The following example converts the angle brackets in the string to HTML escape characters for output, returning “&lt;TAG&gt;”:
WRITE $ZCVT("<TAG>","O","HTML")
Note that how these angle brackets display depends on the output device; to see the escape sequences literally, run this example from the Terminal prompt.
The following example shows how $ZCONVERT substitutes a ? character for a translated character it cannot display. Both the UTF8 and the current process I/O translation object (trantable 0) conversions in this example display $CHAR(63), which is the actual ? character. UTF8 cannot display translated characters above $CHAR(127). Translation table 0 cannot display translated characters above $CHAR(255):
FOR i=1:1:300 {
  IF $ZCONVERT($CHAR(i),"I","UTF8") '= "?" { CONTINUE }
  ELSE { WRITE "UTF8 ",i,"=",$ZCONVERT($CHAR(i),"I","UTF8") }
  IF $ZCONVERT($CHAR(i),"I",0)="?" {
    WRITE " trantable 0 ",i,"=",$ZCONVERT($CHAR(i),"I",0),! }
  ELSE { WRITE ! }
}
See Also
- $ASCII function
- $CHAR function
- $ZSTRIP function
- Pattern Match (?) operator