Collection of Raw Data
Task Force
meeting Nº 10
11 may 2005
XML Security for Foreign Trade Statistics
For information
XML Security for Foreign Trade Statistics
Introduction
As a format for exchanging information over the Internet, XML's popularity is continuing to grow -- and one of the key issues associated with information exchange is security. No information exchange format is complete without a mechanism for ensuring the security and reliability of the information.
In order to establish a complete data exchange scenario in the field of foreign trade statistics (using for example the already standardized INSTAT/XML messages) security related concepts such as digital signature, encryption, authentication and authorization must be taken into account. Their application extends beyond the XML schema and provides us with the tools to fully exploit such a practical deployment in the framework of FT statistics.
When an enterprise sends an Intrastat declaration to a collecting centre, the following steps are considered; they can be performed automatically with the help of a web service:
Step 1. Validate the certificate of the signature to ensure the declarant's identity (using the services of a TTP, for example).
Step 2. Use the signature to validate the integrity of the document.
Step 3. Access non-encrypted information to identify the nature of the submission without decrypting sensitive information.
Step 4. Authenticate the declarant (i.e. check that the declarant is known).
Step 5. Authorise the submission (i.e. check that the declarant has the right to make this particular submission).
Step 6. Route the document to the application or user authorised to decrypt, validate and further process the submission.
Step 7. Return exception and error codes (e.g. using an enriched INSRES/XML message).
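The steps above can be sketched as a small front-end routine. This is only an illustration under stated assumptions: the registries, the field names and the error codes are hypothetical, and the certificate and signature checks of steps 1 and 2 are represented by precomputed flags rather than real cryptographic verification.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Hypothetical registries; a real collecting centre would back these with
# its own declarant database and routing configuration.
AUTHORISED: Dict[str, Set[str]] = {"DECL-001": {"INSTAT"}}
ROUTES: Dict[str, str] = {"INSTAT": "intrastat-processing"}


@dataclass
class Submission:
    declarant_id: str
    message_type: str         # read from the unencrypted part (step 3)
    certificate_valid: bool   # stub for step 1 (e.g. result of a TTP lookup)
    signature_valid: bool     # stub for step 2
    encrypted_payload: bytes  # never decrypted by this front end


def handle(sub: Submission) -> Tuple[str, str]:
    """Return (status, detail); detail is a route name or an error code."""
    if not sub.certificate_valid:                              # step 1
        return ("REJECTED", "CERTIFICATE_INVALID")
    if not sub.signature_valid:                                # step 2
        return ("REJECTED", "SIGNATURE_INVALID")
    if sub.message_type not in ROUTES:                         # step 3
        return ("REJECTED", "UNKNOWN_MESSAGE_TYPE")
    if sub.declarant_id not in AUTHORISED:                     # step 4
        return ("REJECTED", "UNKNOWN_DECLARANT")
    if sub.message_type not in AUTHORISED[sub.declarant_id]:   # step 5
        return ("REJECTED", "NOT_AUTHORISED")
    return ("ROUTED", ROUTES[sub.message_type])                # step 6
```

The rejection tuples stand in for step 7: in a real deployment each code would be mapped to a response code carried by the response message. Note that the encrypted payload is passed along untouched, so only the routed application needs the decryption key.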
Nevertheless, specific amendments to the existing foreign trade statistics standard documents are needed in order to encompass security-related concepts:
- The business model part (use cases and sequence diagrams) has to be amended in order to include information security related processes (i.e. signature, encryption, authentication, authorisation, certification)
- The models (class diagrams) have to be amended in order to provide for digital signatures of declarant and counter-signatures of collecting centres.
- Sensitive message parts that require encryption have to be identified
- Message parts which must be excluded from encryption (e.g. for routing and intermediate processing) have to be identified
- Security related response codes in INSRES response messages must be included.
In the following sections we will discuss the technologies that play a crucial role in securing XML. This article focuses on the basic plumbing technologies, defining security in an XML context, XML canonicalization, and public key infrastructure (PKI), and providing a step-by-step guide to generating keys. Moreover, XML encryption and XML signature issues will be discussed.
Encryption and Digital Signatures Fundamentals
There are generally two primary requirements for sending XML data securely over the Internet: encryption to keep confidential information private; and digital signatures to provide authenticity, integrity and non-repudiation.
- encryption: the W3C XML Encryption standard specifies how to use XML (syntax and processing) to represent a digitally encrypted Web resource (including XML itself) with arbitrary encryption algorithms.
- digital signatures: a joint effort between W3C and IETF has led to the current working standards for XML Digital Signature using PKI. A key requirement is to allow XML document senders to sign just parts of an XML document while allowing other users to legitimately alter other parts of the document (e.g., a form that the user needs to fill in with data).
The very features that make XML so powerful for business transactions (e.g., semantically rich and structured data, text-based, and Web-ready nature) provide both challenges and opportunities for the application of encryption and digital signature operations to XML-encoded data. For example, in many workflow scenarios where an XML document flows stepwise between participants, and where a digital signature implies some sort of commitment or assertion, each participant may wish to sign only that portion for which they are responsible and assume a concomitant level of liability. Older standards for digital signatures provide neither syntax for capturing this sort of high-granularity signature nor mechanisms for expressing which portion a principal wishes to sign.
Two new security initiatives designed to both account for and take advantage of the special nature of XML data are XML Signature and XML Encryption. Both are currently progressing through the standardization process. XML Signature is a joint effort between the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), and XML Encryption is solely a W3C effort. This article presents a brief introduction to the XML Signature specification and the underlying cryptographic concepts.
What does security mean?
The term security encompasses all of the following:
- Confidentiality ensures that only the intended recipient reads the intended part of the XML. To ensure confidentiality, it's important to encrypt the XML -- XML Encryption is the standard for doing this.
- Integrity ensures that the XML is not altered during transit from the source to its final destination. The XML Signature standard allows the sender to attach a digital signature to the content whose integrity is to be ensured.
- Authenticity is the ability to ensure that the XML was actually sent by the person who claims to have sent it. XML signatures sent along with content help ensure the identity of the sender. The recipient of the payload can validate the digital signature using the public key of the sender. If the digital signatures are valid, then the identity is confirmed; otherwise it isn't.
- Non-repudiation is used to ensure that a sender cannot deny sending the XML. An XML signature generated from the private key of the sender and affixed to the XML ensures the non-refutability of the sender.
What is XML canonicalization?
One can create XML documents that appear to be different, but have identical data or identical semantic value. Differences may lie in entity structure, attribute ordering, character encoding, or insignificant whitespace. Because of such physical differences, equivalence testing cannot be done at byte level for arbitrary XML documents. Herein lies the problem: digital signatures rely on byte-level equivalence, whereas it is possible to have two XML documents that are logically the same, but contain different byte sequences.
Hence, if one digitally signs some XML markup and then tries to verify the digital signature after modifying the order of some attributes -- or adding or removing some insignificant whitespace without logically changing the XML -- the verification will fail. To ensure that verification succeeds every time for logically equivalent XML -- irrespective of its physical representation -- you must make sure that the XML is in an agreed-upon standard format. Producing that format is called canonicalization, and it is a standard mechanism for serializing XML. Canonicalization comes in two forms:
- Normal canonicalization: When a sub-part of the XML is serialized, the ancestor element's context and all namespace declarations and attributes in the xmlns namespace are included.
- Exclusive canonicalization: When a sub-part of the XML is serialized, the ancestor element's context is not included.
So which one should be used, normal or exclusive canonicalization? Consider this: With digital signatures, the digitally signed payload may have to be inserted into a different context after it is removed from its original message. If normal canonicalization is used, the payload will include the context of its original message's ancestor elements, and all namespace declarations and attributes in the xmlns namespace. The payload, thus extracted from the original message, may not be inserted faithfully into a different context.
- Exclusive canonicalization is required for digital signatures in which the ancestor element's context, attribute, and declaration of the xmlns namespace are excluded, thus making the digitally signed payload portable to different contexts.
- APIs that do digital signatures manage canonicalization in the background; you do not have to do anything extra for canonicalization while digitally signing the XML.
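The byte-level problem described above can be illustrated with a short sketch. It assumes Python 3.8 or later, whose standard library offers xml.etree.ElementTree.canonicalize; that function implements the later C14N 2.0 specification, so it is used here only to show the principle, not the exact algorithm a given signature library applies. The Item element and its attributes are invented for the example.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # available from Python 3.8

# Two logically equivalent documents: attribute order, quoting style,
# whitespace inside the tag and the empty-element form all differ.
doc_a = '<Item code="0101" qty="5"/>'
doc_b = "<Item   qty='5'  code='0101'></Item>"


def digest(text: str) -> str:
    """Hex SHA-1 digest of the UTF-8 bytes of the given text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Digests of the raw bytes differ, so a signature over doc_a would not
# verify against doc_b.
raw_equal = digest(doc_a) == digest(doc_b)

# After canonicalization both documents serialize identically, so their
# digests match.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)
canon_equal = digest(canon_a) == digest(canon_b)

print("raw digests equal:      ", raw_equal)
print("canonical digests equal:", canon_equal)
```

Signature libraries normally perform this step internally, which is why the canonicalization method is named inside the signature rather than applied by hand.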
I have covered enough of canonicalization to help you understand its importance and how it fits into the basic framework of the XML security infrastructure. The next topic focuses on another important component of the XML security infrastructure -- PKI.
PKI basics
Public Key Infrastructure, or PKI, helps make technologies like digital signature and XML encryption generally available to the public at large. At the heart of digital signature and XML encryption are keys. Keys are used for digitally signing documents and verifying signatures, and they help with the encryption and decryption processes. PKI is entrusted with managing everything related to the creation, manipulation, and management of these keys.
PKI ensures the following:
- Trusted and efficient management of public and private keys
- Any time you use a public key, you can be sure that the associated private key is indeed owned by the subject whose public key you are using
Keys are a critical element of PKI. Different types of keys can be generated using specially designed algorithms.
The public-private key pair is at the heart of Public Key Infrastructure (PKI), and is based on asymmetric ciphers. Asymmetric keys can be created and then used to encrypt and digitally sign XML data.
In encryption and digital signatures, everything ultimately boils down to the cryptography algorithms used to generate keys.
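To make the role of the key pair concrete, the following is a toy sketch of signing with a private key and verifying with the matching public key. It is an assumption-laden illustration, not a usable implementation: it uses textbook RSA without padding, and the two primes are publicly known Mersenne primes chosen only so that the example runs with the Python standard library (3.8 or later). Real keys would be generated by a vetted cryptographic library and certified through the PKI.

```python
import hashlib

# Toy parameters: two publicly known Mersenne primes. NOT secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # modulus, part of both keys
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse)


def _digest_as_int(message: bytes) -> int:
    """SHA-256 digest of the message, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sign with the private key (D, N)."""
    return pow(_digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verify with the public key (E, N)."""
    return pow(signature, E, N) == _digest_as_int(message)


declaration = b"<Declaration>...</Declaration>"
signature = sign(declaration)
print(verify(declaration, signature))          # original message
print(verify(declaration + b" ", signature))   # altered message
```

Only the holder of D can produce a signature that verifies under (E, N), which is what ties integrity and authenticity to the key pair; the PKI's job is to vouch for who holds it.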
XML encryption
The primary objectives of XML encryption are:
- Support the encryption of any arbitrary digital content, including XML documents
- Ensure that the encrypted data, whether it's in transit or in storage, cannot be accessed by unauthorized persons
- Maintain the security of the data even beyond one message hop -- meaning, the security of the data is persisted not only when the data is being transferred (which is what SSL guarantees), but also when the data is at rest at a particular node
- Represent the encrypted data in XML form
- Make it possible for portions of the XML to be selectively encrypted
Compare this with what SSL over HTTP (also known as HTTPS) has to offer. Using SSL over HTTP, the entire message gets encrypted; the whole message is then decrypted at the first destination and is open for snooping before it is encrypted again as a whole for the second hop. The encryption offered by SSL over HTTP only exists for the duration of transit and is not persistent.
One of the defined objectives clearly states that encrypted XML data should be represented in XML form. In the resulting XML, two important elements are worth understanding: <EncryptedData> and <EncryptedKey>. <EncryptedData> contains all of the encrypted content other than the encryption key. When the key is encrypted, the resulting content is placed inside the <EncryptedKey> element.
In addition to the encrypted content, XML encryption allows you to specify the algorithm used for encryption or the encryption key used as part of the two elements discussed above. This means you don't have to keep track of them separately for later reference, or send them to the receiving parties through some other transport mechanism.
Note: XML encryption does not define any new algorithms, but instead uses existing ones.
XML encryption example
The core element in the XML encryption syntax is the EncryptedData element, which derives from the EncryptedType abstract type. The EncryptedKey element, which derives from the same type, is used to transport encryption keys from the originator to a known recipient. Data to be encrypted can be arbitrary data, an XML document, an XML element, or XML element content; the result of encrypting data is an XML encryption element that contains or references the cipher data. When an element or element content is encrypted, the EncryptedData element replaces the element or content in the encrypted version of the XML document. When arbitrary data is encrypted, the EncryptedData element may become the root of a new XML document or it may become a child element. When an entire XML document is encrypted, the EncryptedData element may become the root of a new document. Further, EncryptedData cannot be the parent or child of another EncryptedData element, but the actual data encrypted can be anything, including existing EncryptedData or EncryptedKey elements.
The encryption working draft gives examples of how the granularity of encryption may differ according to different requirements and what the consequences might be. The code fragment in Listing 1 shows an unencrypted XML document with credit card and other personal information. In some cases (for example, concealing information on payment mechanisms) it may be desirable to encrypt everything other than the customer name, and the code fragment in Listing 2 shows how this can be done.
Code Listing 1. Information on John Smith showing his bank, limit of $5,000, card number, and expiration date
<?xml version='1.0'?>
<PaymentInfo xmlns='
<Name>John Smith</Name>
<CreditCard Limit='5,000' Currency='USD'>
<Number>4019 2445 0277 5567</Number>
<Issuer>Bank of the Internet</Issuer>
<Expiration>04/02</Expiration>
</CreditCard>
</PaymentInfo>
Code Listing 2. Encrypted document where everything other than the name is encrypted
<?xml version='1.0'?>
<PaymentInfo xmlns='
<Name>John Smith</Name>
<EncryptedData Type='
xmlns='
<CipherData><CipherValue>A23B45C56</CipherValue></CipherData>
</EncryptedData>
</PaymentInfo>
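The element-level replacement shown in Listing 2 can be sketched with the Python standard library. Two assumptions should be stressed: placeholder_cipher is a stand-in that merely XORs bytes and provides no confidentiality, so a real implementation would plug in a proper algorithm from a cryptographic library; and the PaymentInfo namespace is omitted because it is not legible in the listings above. The helper name encrypt_child is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

XENC_NS = "http://www.w3.org/2001/04/xmlenc#"  # XML Encryption namespace


def placeholder_cipher(data: bytes) -> bytes:
    """Stand-in for a real cipher: XOR with a fixed byte. NOT encryption."""
    return bytes(b ^ 0x5A for b in data)


def encrypt_child(parent: ET.Element, tag: str, cipher=placeholder_cipher) -> ET.Element:
    """Replace the child `tag` of `parent` with an EncryptedData element."""
    target = parent.find(tag)
    index = list(parent).index(target)
    cipher_bytes = cipher(ET.tostring(target))  # serialize, then "encrypt"

    enc = ET.Element(f"{{{XENC_NS}}}EncryptedData", {"Type": XENC_NS + "Element"})
    data = ET.SubElement(enc, f"{{{XENC_NS}}}CipherData")
    value = ET.SubElement(data, f"{{{XENC_NS}}}CipherValue")
    value.text = base64.b64encode(cipher_bytes).decode("ascii")

    parent.remove(target)
    parent.insert(index, enc)  # EncryptedData takes the element's place
    return enc


doc = ET.fromstring(
    "<PaymentInfo><Name>John Smith</Name>"
    "<CreditCard Limit='5,000' Currency='USD'>"
    "<Number>4019 2445 0277 5567</Number>"
    "</CreditCard></PaymentInfo>"
)
encrypt_child(doc, "CreditCard")
print(ET.tostring(doc, encoding="unicode"))
```

After the call, the Name element is still readable while the card details appear only as base64 text inside CipherValue, which is the property that lets an intermediary identify a submission without decrypting its sensitive parts.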
XML Signatures
XML signatures are digital signatures designed for use in XML transactions. The standard defines a schema for capturing the result of a digital signature operation applied to arbitrary (but often XML) data. Like non-XML-aware digital signatures (e.g., PKCS), XML signatures add authentication, data integrity, and support for non-repudiation to the data that they sign. However, unlike non-XML digital signature standards, XML signature has been designed to both account for and take advantage of the Internet and XML.
A fundamental feature of XML Signature is the ability to sign only specific portions of the XML tree rather than the complete document. This will be relevant when a single XML document may have a long history in which the different components are authored at different times by different parties, each signing only those elements relevant to itself. This flexibility will also be critical in situations where it is important to ensure the integrity of certain portions of an XML document, while leaving open the possibility for other portions of the document to change. Consider, for example, a signed XML form delivered to a user for completion. If the signature were over the full XML form, any change by the user to the default form values would invalidate the original signature.
An XML signature can sign more than one type of resource. For example, a single XML signature might cover character-encoded data (HTML), binary-encoded data (a JPG), XML-encoded data, and a specific section of an XML file.
Signature validation requires that the data object that was signed be accessible. The XML signature itself will generally indicate the location of the original signed object. The signed object can
- be referenced by a URI within the XML signature;
- reside within the same resource as the XML signature (the signature is a sibling);
- be embedded within the XML signature (the signature is the parent);
- have its XML signature embedded within itself (the signature is the child).
XML signature components
The main XML signature components are depicted in the following figure.
XML signature example
XML signatures can be applied to any arbitrary data content. Those that are applied to data within the same XML document as the signature are termed enveloping or enveloped signatures while those in which the data is external to the signature element are termed detached signatures. The following listing, taken from the signature candidate recommendation document, is an instance of a simple detached signature.
Code Listing 3. Example of a simple detached signature
[s01] <Signature Id="MyFirstSignature" xmlns="
[s02] <SignedInfo>
[s03] <CanonicalizationMethod Algorithm="
REC-xml-c14n-20010315"/>
[s04] <SignatureMethod Algorithm="
[s05] <Reference URI="
[s06] <Transforms>
[s07] <Transform Algorithm="
20010315"/>
[s08] </Transforms>
[s09] <DigestMethod Algorithm="
[s10] <DigestValue>j6lwx3rvEPO0vKtMup4NbeVu8nk=</DigestValue>
[s11] </Reference>
[s12] </SignedInfo>
[s13] <SignatureValue>MC0CFFrVLtRlk=...</SignatureValue>
[s14] <KeyInfo>
[s15a] <KeyValue>
[s15b] <DSAKeyValue>
[s15c] <P>...</P><Q>...</Q><G>...</G><Y>...</Y>
[s15d] </DSAKeyValue>
[s15e] </KeyValue>
[s16] </KeyInfo>
[s17] </Signature>
The information that is actually signed is that between lines s02 and s12, the SignedInfo element. Reference to the algorithms used in calculating the SignatureValue element is included within the signed section while that element itself is outside the signed section, on line s13. The SignatureMethod reference on line s04 is to the algorithm used to convert the canonicalized SignedInfo into the SignatureValue. It's a combination of a key-dependent algorithm and a digest algorithm, here DSA and SHA-1, possibly with other manipulation such as padding. The KeyInfo element (here lines s14 to s16 -- this element is optional) indicates the key that's used to validate the signature.
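The DigestValue on line s10 is the base64 encoding of the digest of the referenced data after its transforms have been applied. The following sketch shows how a recipient might recompute and compare it. It rests on assumptions: the Declaration document and the helper names are invented, and the standard-library canonicalize function (Python 3.8 or later) implements C14N 2.0 rather than the canonicalization algorithm named in the listing. Checking SignatureValue over SignedInfo is a separate step that needs the signer's public key and is not shown.

```python
import base64
import hashlib
from xml.etree.ElementTree import canonicalize  # available from Python 3.8


def digest_value(xml_text: str) -> str:
    """Base64 SHA-1 digest of the canonical form, as stored in DigestValue."""
    canonical = canonicalize(xml_text)
    raw = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.b64encode(raw).decode("ascii")


def reference_is_valid(xml_text: str, expected: str) -> bool:
    """Recompute the digest of the received data and compare it."""
    return digest_value(xml_text) == expected


# Sender side: digest of the (hypothetical) referenced document.
signed = '<Declaration period="200505" flow="A"/>'
stored = digest_value(signed)

# Recipient side: a physically different but logically equal copy passes,
# while a copy with changed content does not.
reordered = "<Declaration flow='A' period='200505'></Declaration>"
tampered = '<Declaration period="200506" flow="A"/>'
print(reference_is_valid(reordered, stored))
print(reference_is_valid(tampered, stored))
```

A SHA-1 digest is 20 bytes long, so its base64 form always has 28 characters, the same length as the DigestValue in Listing 3.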