Db2 for Z/OS CCSID Overview

Db2 for Z/OS CCSID Overview

1

Db2 for z/OS CCSID Overview

Db2 for z/OS leverages the z/OS operating system’s Conversion Services facilities to enable Db2 support of character data stored in various encoding schemes and code pages. These z/OS services enable Db2 for z/OS to work with (i.e. receive as input, process, store, and retrieve) character data represented by the various code pages in the encoding schemes EBCDIC, ASCII, and UNICODE.

This paper summarizes, at a moderately technical level, the ways and methods in which Db2 for z/OS can be configured and employed to manage character data from a disparate set of encoding schemes and code pages. It also discusses the ways in which Db2 for z/OS leverages z/OS Conversion Services for the purposes of managing and manipulating character data.

For the remainder of this discussion, the term Db2 will mean Db2 for z/OS (as opposed to Db2 LUW, etc.). This discussion is exclusively for Db2 for z/OS, where the term Db2 is concerned and used. References to z/OS are exclusively for those services and programs executing in a z/OS operating system.

This discussion pertains only to character data, never to numeric data or binary (BIT) data. For this purpose, one should note the difference between the character ‘1’ and the number 1. ‘1’ is a character data type, 1 is a numeric data type. This discussion only pertains to character data types. Numeric and binary (BIT) data in Db2 are not a part of this discussion.

DISCLAIMER: The following discussion is a personal interpretation of the IBM provided documentation on these topics. Because interpretations of technical documentation may vary from person to person, and are therefore subject to scrutiny, the user should always rely on IBM documentation for the definitive and authoritative explanation of each topic.

This paper is as of Db2 12 for z/OS and z/OS 2.2, and was written on Oct. 12, 2017 by:

Kurt Bohnert

Manager, Db2Systems Programming

Rocket Software, Inc.

In computing, an “encoding scheme” can be defined as a set of specific definitions and rules that describe a philosophy used to represent character data. One may think of an encoding scheme as simply a set of rules used for representing character data.

Some examples of encoding scheme rulesmight include (but are not limited to) the following: The number of bits used for each character, the number of bytes used for each character, the allowable ranges of bytes, maximum number of characters, … there are many other possible rules that might apply in an encoding scheme. The rules of an encoding scheme govern the way in which characters in the encoding scheme are representedin computing.

Db2 for z/OS is enabled to work with (manage) data governed by any one of the following three encoding schemes: EBCDIC, ASCII, and UNICODE.

Within each encoding scheme, there can be many “code pages”. A code page consists of a table of values that describes the character set for a given language. Each code page in an encoding scheme must conform to the specific rules for that encoding scheme. The various code pages in an encoding scheme are different however, in that they may represent different languages (as one example), yet they all conform to the rules of the encoding scheme in which they reside. There will be examples of this later in this paper.

IBM Db2 employs an IBM concept called a “CCSID”. CCSID stands for Coded Character Set ID. A CCSID identifies an encoding scheme and a code page for that encoding scheme. Although a CCSID is technically a more precise definition of a code page, one may think of a CCSID and a code page as synonymous for the remainder of this paper. In those cases where it is important to distinguish between the two, the distinction will be made.

Db2assigns a CCSID to all character data it manages. All character data passed to Db2 from an application/agent is assigned a CCSID. The character data is said to be “tagged” with the CCSID. The CCSID is stored with the character data if the data is stored in a Db2 data base. It becomes part of the “metadata” for that character data. Metadata is “data about data”. Metadata contains information about the data being managed. CCSID is always a part of the metadata for character data stored in Db2 and manipulated by Db2.

CCSID’s in Db2are grouped in three’s … referred to in this discussion as “triplets”. One of the three CCSID’s in a triplet is the Single-Byte-Character-Set (SBCS), one of the three CCSID’s is the Double-Byte-Character-Set (DBCS), and one of the three CCSID’s is the MIXED character set. This will be discussed in greater detail below. The important point to note for now is that Db2 CCSID’s come in three’s … in this discussion referred to as triplets.

So CCSID’s are Db2’s implementation of character data management within theDb2 DBMS (Data Base Management System). From this point forward in the paper, we are only concerned with how, when, and where Db2 employs CCSID’s in support of character data interpretation, manipulation, and storage.

Db2 recognizes many CCSID tripletsfor both the EBCDIC and ASCII encoding schemes. Again, a triplet is three CCSID’s … a SBCS CCSID, a DBCS CCSID, and a MIXED CCSID. Db2 only recognizes one CCSID triplet for UNICODE however, because UNICODE (unlike EBCDIC and ASCII) only contains one triplet. As will be seen later in this paper, UNICODE is an industry wide attempt to standardize and unify all character schemes (i.e. all the various characters) from all over the computing world, into a single triplet. UNICODE can be thought of as a single code page (triplet) containing all characters from all other encoding schemes and code pages. This will be discussed in greater detail below.

It should be noted here that (as previously stated) a CCSID is a more precise definition of a code page. This means (among other things) that there can be a one-to-many relationship between code page and CCSID.

Here is an example of two CCSID’s in the same code page within the EBCDIC encoding scheme. EBCDIC code page 37 has (at least) two CCSID’s. They are CCSID 0037 and CCSID 1140. The only difference between the two CCSID’s is at code point 9F in the two CCSID tables. In CCSID 0037 you will find the "¤" (currency sign) character at code point 9F, but in CCSID 1140 at code point 9F you will find the "" (Euro sign) character. CCSID 0037 mightbe considered a standard SBCS CCSID in North America, and CCSID 1140 a standard SBCS CCSID in Europe.

On the next page is a copy of the CCSID table for CCSID 0037. It was cut and paste off the internet. You can find many such examples of CCSID’s and code pages on the internet. In fact, you can likelyfind the table for every CCSID and code page ever constructed, on the internet.

Note in the example below that the code point 9F contains the "¤" (currency sign) character. If one looked at CCSID 1140, one would see that the only difference between the two CCSID’s is at this code point (9F), where CCSID 1140 contains the "€" (Euro sign) character.
CCSID 1140 is the Euro currency update of CCSID 0037. In CCSID 1140, the "¤" (currency sign) character at code point 9F is replaced with the "€" (Euro sign) character.

See code point 9F in red, in the table below.

CCSID 0037
_0 / _1 / _2 / _3 / _4 / _5 / _6 / _7 / _8 / _9 / _A / _B / _C / _D / _E / _F
0_
/ NUL
0000
0 / SOH
0001
1 / STX
0002
2 / ETX
0003
3 / SEL
009C
4 / HT
0009
5 / RNL
0086
6 / DEL
007F
7 / GE
0097
8 / SPS
008D
9 / RPT
008E
10 / VT
000B
11 / FF
000C
12 / CR
000D
13 / SO
000E
14 / SI
000F
15
1_
/ DLE
0010
16 / DC1
0011
17 / DC2
0012
18 / DC3
0013
19 / RES ENP
009D
20 / NL
0085
21 / BS
0008
22 / POC
0087
23 / CAN
0018
24 / EM
0019
25 / UBS
0092
26 / CU1
008F
27 / IFS
001C
28 / IGS
001D
29 / IRS
001E
30 / IUS ITB
001F
31
2_
/ DS
0080
32 / SOS
0081
33 / FS
0082
34 / WUS
0083
35 / BYP INP
0084
36 / LF
000A
37 / ETB
0017
38 / ESC
001B
39 / SA
0088
40 / SFE
0089
41 / SM SW
008A
42 / CSP
008B
43 / MFA
008C
44 / ENQ
0005
45 / ACK
0006
46 / BEL
0007
47
3_
/
0090
48 /
0091
49 / SYN
0016
50 / IR
0093
51 / PP
0094
52 / TRN
0095
53 / NBS
0096
54 / EOT
0004
55 / SBS
0098
56 / IT
0099
57 / RFF
009A
58 / CU3
009B
59 / DC4
0014
60 / NAK
0015
61 /
009E
62 / SUB
001A
63
4_
/ SP
0020
64 / RSP
00A0
65 / â
00E2
66 / ä
00E4
67 / à
00E0
68 / á
00E1
69 / ã
00E3
70 / å
00E5
71 / ç
00E7
72 / ñ
00F1
73 / ¢
00A2
74 / .
002E
75 /
003C
76 / (
0028
77 / +
002B
78 / |
007C
79
5_
/
0026
80 / é
00E9
81 / ê
00EA
82 / ë
00EB
83 / è
00E8
84 / í
00ED
85 / î
00EE
86 / ï
00EF
87 / ì
00EC
88 / ß
00DF
89 / !
0021
90 / $
0024
91 / *
002A
92 / )
0029
93 / ;
003B
94 / ¬
00AC
95
6_
/ -
002D
96 / /
002F
97 / Â
00C2
98 / Ä
00C4
99 / À
00C0
100 / Á
00C1
101 / Ã
00C3
102 / Å
00C5
103 / Ç
00C7
104 / Ñ
00D1
105 / ¦
00A6
106 / ,
002C
107 / %
0025
108 / _
005F
109 /
003E
110 / ?
003F
111
7_
/ ø
00F8
112 / É
00C9
113 / Ê
00CA
114 / Ë
00CB
115 / È
00C8
116 / Í
00CD
117 / Î
00CE
118 / Ï
00CF
119 / Ì
00CC
120 / `
0060
121 / :
003A
122 / #
0023
123 / @
0040
124 / '
0027
125 / =
003D
126 / "
0022
127
8_
/ Ø
00D8
128 / a
0061
129 / b
0062
130 / c
0063
131 / d
0064
132 / e
0065
133 / f
0066
134 / g
0067
135 / h
0068
136 / i
0069
137 / «
00AB
138 / »
00BB
139 / ð
00F0
140 / ý
00FD
141 / þ
00FE
142 / ±
00B1
143
9_
/ °
00B0
144 / j
006A
145 / k
006B
146 / l
006C
147 / m
006D
148 / n
006E
149 / o
006F
150 / p
0070
151 / q
0071
152 / r
0072
153 / ª
00AA
154 / º
00BA
155 / æ
00E6
156 / ¸
00B8
157 / Æ
00C6
158 / ¤
00A4
159
A_
/ µ
00B5
160 / ~
007E
161 / s
0073
162 / t
0074
163 / u
0075
164 / v
0076
165 / w
0077
166 / x
0078
167 / y
0079
168 / z
007A
169 / ¡
00A1
170 / ¿
00BF
171 / Ð
00D0
172 / Ý
00DD
173 / Þ
00DE
174 / ®
00AE
175
B_
/ ^
005E
176 / £
00A3
177 / ¥
00A5
178 / ·
00B7
179 / ©
00A9
180 / §
00A7
181 / ¶
00B6
182 / ¼
00BC
183 / ½
00BD
184 / ¾
00BE
185 / [
005B
186 / ]
005D
187 / ¯
00AF
188 / ¨
00A8
189 / ´
00B4
190 / ×
00D7
191
C_
/ {
007B
192 / A
0041
193 / B
0042
194 / C
0043
195 / D
0044
196 / E
0045
197 / F
0046
198 / G
0047
199 / H
0048
200 / I
0049
201 / SHY
00AD
202 / ô
00F4
203 / ö
00F6
204 / ò
00F2
205 / ó
00F3
206 / õ
00F5
207
D_
/ }
007D
208 / J
004A
209 / K
004B
210 / L
004C
211 / M
004D
212 / N
004E
213 / O
004F
214 / P
0050
215 / Q
0051
216 / R
0052
217 / ¹
00B9
218 / û
00FB
219 / ü
00FC
220 / ù
00F9
221 / ú
00FA
222 / ÿ
00FF
223
E_
/ \
005C
224 / ÷
00F7
225 / S
0053
226 / T
0054
227 / U
0055
228 / V
0056
229 / W
0057
230 / X
0058
231 / Y
0059
232 / Z
005A
233 / ²
00B2
234 / Ô
00D4
235 / Ö
00D6
236 / Ò
00D2
237 / Ó
00D3
238 / Õ
00D5
239
F_
/ 0
0030
240 / 1
0031
241 / 2
0032
242 / 3
0033
243 / 4
0034
244 / 5
0035
245 / 6
0036
246 / 7
0037
247 / 8
0038
248 / 9
0039
249 / ³
00B3
250 / Û
00DB
251 / Ü
00DC
252 / Ù
00D9
253 / Ú
00DA
254 / EO
009F
255
_0 / _1 / _2 / _3 / _4 / _5 / _6 / _7 / _8 / _9 / _A / _B / _C / _D / _E / _F

Now that we have defined many of the pertinent terms, it is time to discuss how, when, and whereDb2 employs CCSID’s in support of character data interpretation, manipulation, and storage. The important point to keep in mind during this discussion is this: Every character string used in aDb2SQL operation, and every character string manipulated and/or stored by Db2 in any way, is tagged with a CCSID.

Recall that CCSID’s in Db2 come in three’s … there are three (3) CCSID’s associated together into a related group (a triplet):

  • SBCSSingle Byte Character Set CCSID
  • DBCSDouble Byte Character Set (also called GRAPHIC) CCSID
  • MIXEDContains both SBCS data and DBCS data. SBCS and DBCS

are both always a subset of MIXED for their triplet

All CCSID triplets reside within one of the Encoding Schemes supported by Db2 for z/OS: EBCDIC, ASCII, or UNICODE.

When a Db2 subsystem is initially configured (built), there is an Application Programming Defaults load module created named DSNHDECP. This load module contains several application default parameters which are directly associated with this discussion. Here is a samplelist of all parameters defined in DSNHDECP for Db2V11. The parameters of interest in this discussion are in bold blue text:

ASCCSID=1115,

AMCCSID=1381,

AGCCSID=1380,

SCCSID=836,

MCCSID=935,

GCCSID=837,

USCCSID=367,

UMCCSID=1208,

UGCCSID=1200,

ENSCHEME=EBCDIC,

APPENSCH=EBCDIC,

DATE=USA,

DATELEN=0,

DECARTH=DEC15,

DECIMAL=COMMA,

DEF_DECFLOAT_ROUND_MODE=ROUND_HALF_EVEN,

DEFLANG=IBMCOB,

DELIM=DEFAULT,

IMPLICIT_TIMEZONE=CURRENT,

MIXED=YES,

NEWFUN=V11,

PADNTSTR=YES,

SQLDELI=DEFAULT,

DSQLDELI=APOST,

SSID=QB1M,

STDSQL=NO,

TIME=JIS,

TIMELEN=0,

DYNRULS=YES,

LC_CTYPE=

A default CCSID triplet is assigned for each of the three encoding schemes employed by Db2 (EBCDIC, ASCII, and UNICODE) in DSNHDECP (above) at subsystem installation. In our example above the defined defaults are:

ASCCSID=1115,

AMCCSID=1381,

AGCCSID=1380,

SCCSID=836,

MCCSID=935,

GCCSID=837,

USCCSID=367,

UMCCSID=1208,

UGCCSID=1200,

Along with these default CCSID triplets, three other related parameters are assigned in DSNHDECP:

ENSCHEME=EBCDIC,

APPENSCH=EBCDIC,

MIXED=YES,

Here is a summary for each of the DSNHDECP parameters of interest in this discussion (in bold blue text above):

ASCCSIDThe ASCII SBCS default CCSID

AMCCSIDThe ASCII MIXED default CCSID

AGCCSIDThe ASCII DBCS default CCSID

SCCSIDThe EBCDIC SBCS default CCSID

MCCSIDThe EBCDIC MIXED default CCSID

GCCSIDThe EBCDIC DBCS default CCSID

USCCSIDThe UNICODE SBCS default CCSID (always 367 in Db2)

UMCCSIDThe UNICODE MIXED default CCSID(always 1208 in Db2)

UGCCSIDThe UNICODE DBCS default CCSID (always 1200 in Db2)

ENSCHEMEThis is the default Encoding Scheme if not specified with the

CCSID parameter in the Db2 SQL CREATE statement.

APPENSCHThis is the default Encoding Scheme (or CCSID)if not specified

with the ENCODING parameter in the Db2BIND command.

MIXEDIs mixed character data (SBCS and DBCS together) allowed (YES|NO)?

This isdiscussed in greater detail below.

There are a variety of CCSID triples supported by Db2 for z/OS for both EBCDIC and ASCII. Tables of valid (supported) EBCDIC and ASCII CCSID triplets can be found in the Db2for z/OS Installation Guide. If you look at these tables, you will see that in our sample DSNHDECP above the defaults selected were for the language named “Simplified Chinese”:

Table 154. EBCDIC double-byte coded character set identifiers (CCSIDs)

National languageMCCSID SCCSID GCCSID

Simplified Chinese935 836 837

Table 155. ASCII double-byte coded character set identifiers (CCSIDs)

National language MCCSID SCCSID GCCSID

Simplified Chinese1381 1115 1380

The UNICODE triplet is always as follows in Db2 for z/OS:

USCCSID=367,

UMCCSID=1208,

UGCCSID=1200,

For each triplet (regardless of encoding scheme):

  • The SBCS CCSID (SCCSID, ASCCSID, and USCCSID) is a 1 byte CCSID (i.e. each character in the CCSID is a 1-byte character).
  • The DBCS CCSID (GCCSID, AGCCSID, and UGCCSID) is a 2-4 byte CCSID (i.e. each character in the CCSID is a 2-to-4 bytes character). Note that this CCSID is often referred to as the “GRAPHIC” CCSID in the triplet.
  • The MIXED CCSID (MCCSID, AMCCSID, and UMCCSID) is a 1-4 byte CCSID, since it contains all associated SBCS CCSID characters and all associated DBCS CCSID characters in the triplet.

A good practice is to set the EBCDIC triplet to the “green screen” 3270 emulator settings employed by the users of this Db2 subsystem.

A good practice is to set the ASCII triplet to the most commonly used workstation client settings for the users of this Db2 subsystem.

Note that in the above sample, MIXED=YES. The value of MIXED (YES|NO) indicates how the EBCDIC CCSID and ASCII CCSID fields are to be interpreted by Db2.

If MIXED=NO, then the EBCDIC (GCCSID and MCCSID) and ASCII (AGCCSID and AMCCSID) DBCS and MIXED CCSID’s are set to the default 65534, which means NO CCSID. This is because double-byte character data is not allowed for EBCDIC and ASCII character data in this Db2 subsystem, if MIXED=NO.

If MIXED=YES, then double byte character data is allowed for EBCDIC and ASCII character data in this Db2 subsystem. The EBCDIC (GCCSID and MCCSID) and ASCII (AGCCSID and AMCCSID) DBCS and MIXED CCSID’s are set to the appropriate CCSID’s to complete the requested triplet (the matching CCSID’s for the SBCS CCSID), as in our sample above.

Note: Do not confuse CCSID 65534 versus 65535:

CCSID 65534 means NO CCSID (used when MIXED = NO).

CCSID 65535 means NO CONVERSION (used to tag “FOR BIT DATA”data types … see

below for more on this).

UNICODE is always considered to be MIXED=YES, and as previously stated, is always set to USCCSID=367, UMCCSID=1208, andUGCCSID=1200in Db2. This means that regardless of the value for MIXED in DSNHDECP, UNICODE character data can always contain single-byte and double-byte data.

Side note: NEVER change the values for either MIXED or any CCSID in the DSNHDECP of an existing Db2 subsystem, without first consulting with IBM!!! Per IBM, once these values have been specified, they should not be changed without IBM consultation.

When configuring a Db2 subsystem, we are primarily interested in three (3) issues pertaining to the setting of the DSNHDECP parameters listed above:

  1. What CCSID is used to tag character data for storage within a Db2 table?
  2. What CCSID is used to interpret incoming character data to Db2?
  3. What CCSID is used to interpret incoming SQL during Db2 precompile (PC)?

Below is a discussion on each of the three issues of interest listed above.

  1. What CCSID is used to tag character data for storage within a Db2 table (i.e. how is character data “tagged” for storage in Db2)?

Note that in the explanation below, character data is said to have one of the character (e.g. CHAR) data types in Db2.

FIRST: Identify the ENCODING SCHEME that was assigned to the TABLEin the SQL CREATE statements used to create the DATABASE.TABLESPACE.TABLE (DB.TS.TB):

IF CREATE TABLE … includeskeywordCCSID UNICODE|EBCDIC|ASCII

THEN … it must match the value assigned at CREATE TS (below).

ELSE … it defaults to the value assigned at CREATE TS (below).

IF CREATE TABLESPACE … includes keywordCCSID UNICODE|EBCDIC|ASCII

THEN … it is assigned to the TS.

ELSE … it defaults to the value assigned at CREATE DB (below).

IF CREATE DATABASE … includes keywordCCSID UNICODE|EBCDIC|ASCII

THEN … it is the default if CREATE TS does not assign CCSID (above).

ELSE … it defaults to the value found in DSNHDECP parameterENSCHEME.

Note that the CCSID parameter is always optional in the three CREATE statements above, and defaults based on the rules above.

SECOND: Identify theCCSID to use (based on the encoding scheme identified above), the MIXED value inDSNHDECP, and the data type (SBCS, DBCS, MIXED) of the character column:

IF MIXED = NO

IF “FOR MIXED DATA” data type THEN the DB.TS must be UNICODE (CCSID UNICODE in CREATE TS) else you get an errorin the CREATE TABLE. The character data in this column is tagged with UNICODE MIXED CCSID1208.

IF “GRAPHIC” data type THEN the DB.TS must be UNICODE (CCSID UNICODE in CREATE TS) else you get an error in the CREATE TABLE. The character data in this column is tagged with UNICODE DBCS CCSID 1200.

ELSE (not “FOR MIXED DATA” and not “GRAPHIC”) THEN the character data in this column is tagged with theSBCS CCSID of the DB.TSabove.

IF MIXED = YES

IF “FOR SBCS DATA” data type THEN the character data in this column is tagged with the SBCS CCSID of the DB.TSabove.

IF “GRAPHIC” data type THEN the character data in this column is tagged with the DBCS CCSIDof the DB.TSabove.

ELSE (not “FOR SBCS DATA” and not “GRAPHIC”) THEN the character data in this column is tagged with the MIXED CCSID of the DB.TSidentified above.

  1. What CCSID is used to interpret incoming character data to Db2 (e.g. character data passed to Db2 from a PLAN/PACKAGE)?

IF SQL is STATIC THEN

IF PLAN/PKG bound with ENCODING UNICODE|EBCDIC|ASCII|ccsid … THEN use that CCSID or the triplet associated with that encodingscheme to

interpret incoming character data from the PLAN/PKG.

ELSE use the triplet associated with the default encoding scheme in

APPENSCH to interpret incoming character data from the PLAN/PKG.

ELSE (SQL is DYN) use the triplet associated with the encoding scheme specified in the special register DEFAULT APPLICATION ENCODING SCHEME to interpret incoming character data from the PLAN/PACKAGE. The default value for this special register is the BIND option ENCODING (or DSNHDECP parameter APPENSCH if ENCODING not specified in the BIND … see above).

The application sending the character data to Db2 can override the above defaults and explicitly specify the CCSID to be used by Db2 to interpretincoming character data from the application, in one of several ways:

  • EXEC SQLSET CURRENT APPLICATION ENCODING SCHEME = ‘…’;
  • Any host variable (:hv) in SQL can override the above for CCSID via:

EXEC SQLDECLARE :hv VARIABLECCSID UNICODE|EBCDIC|ASCII|ccsid;

  • There is also a way to do it in SQLDA … see IBM doc for details.
  1. What CCSID is used to interpret incoming SQL during Db2 precompile (PC)?
  • There is a CCSID keyword in the PC: This PC parm tells the PC how to interpret incoming SQL for PC. It must be an EBCDIC CCSID!!

The default (if not specified) is the EBCDIC triplet in DSNHDECP.