Using XML for Tax Transactions

White Paper

Abstract

Tax XML is a sample schema for tax data. The schema is a model to be used by standards organizations, industry partners, and government agencies in their investigation of the use of XML in a tax setting. This schema is further intended to provide thought leadership and the first steps toward creating an open standard for tax transactions.

The Tax XML Schema is central because it defines the data to be stored and processed by applications and the supporting infrastructure. Collaboration on an open standard will produce the shared specification on which implementations can then compete.

This paper is directed to those involved in fostering the use of XML who would like to see it applied to the field of taxation. It may also provide insight to those investigating the possibility of using XML in other business applications that are highly complex with a large number of distinct data items.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Other product and company names mentioned herein may be the trademarks of their respective owners.

Microsoft Corporation • One Microsoft Way • Redmond, WA 98052-6399 • USA


Introduction

Overview

Using XML for tax transactions is a strategy whose time has come! Already the United Kingdom’s Inland Revenue uses an XML schema for the filing of individual returns – and has done so since 1999.

Extensible Markup Language (XML) is a technology that promises to free business and tax data from application infrastructure. The data-centric approach of XML allows the communication of data regardless of the platform, operating system, or underlying technology of existing systems.

The keystone of using XML for tax is the Tax XML Schema, which describes the specific tax data items and their relationships to each other. The long-term goal of this schema is to define an international standard with which all tax transactions would comply. The current thinking is that the schema would be composed of namespaces owned and maintained by governmental agencies and merged into a single schema. A standards organization similar to the World Wide Web Consortium (W3C) would be established to provide hosting services, version control, and guidance for the master schema to tax administrations around the world.

Completing the Tax XML Schema is urgent. As mentioned above, the UK is already using this methodology. Other national governments around the world are planning e-government systems that use XML. State and local tax administrations within the US also have development projects underway. With no standard in place, however, each of these groups will develop its own divergent schema. This will complicate the communication of information between groups and will hamper implementation efforts by requiring customization of data storage and handling at each site. To avoid such a proliferation of approaches, a standard should be pursued swiftly.

Current Scope of Tax XML

The current scope of Tax XML is to present a small sample of a schema based on US taxes in order to provide a starting point for collaborative work. The work done so far focuses on three areas: Federal individual, Federal corporate, and California individual. For details of the data included in this work, see the Taxonomy section below. This work has been done for modeling purposes, rather than as a specific example of the content to be included.

Also within the scope of this project is the investigation of the best type of schema to use. Several approaches are explored in this paper, including:

· The first approach is to use a schema composed of elements distinctly named to represent the data contained in each element. This approach is exemplified by the UK schema and is consistent with the schema promulgated by the W3C.

· A second approach is demonstrated by the eXtensible Business Reporting Language (XBRL) organization in a schema used for financial reporting. This strategy uses a very limited number of elements and includes the identification of the data through attributes instead.

· Another approach investigated but not displayed focuses on an EDI-related schema that uses separate elements to identify the data.
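The difference between the first two approaches can be illustrated with a minimal sketch. It is not taken from the UK or XBRL schemas: the element names (WagesAmount, InterestIncomeAmount, group, item), the type attribute, and the amounts are all hypothetical and chosen only to show the contrast between naming data through elements and naming it through attributes.

```python
import xml.etree.ElementTree as ET

# Approach 1: each data item gets its own distinctly named element.
# (All element names and values here are hypothetical.)
named = ET.Element("IndividualTax")
ET.SubElement(named, "WagesAmount").text = "52000"
ET.SubElement(named, "InterestIncomeAmount").text = "310"

# Approach 2: a very small set of generic elements; the data item
# is identified by an attribute instead of by the element name.
generic = ET.Element("group", {"type": "IndividualTax"})
ET.SubElement(generic, "item", {"type": "WagesAmount"}).text = "52000"
ET.SubElement(generic, "item", {"type": "InterestIncomeAmount"}).text = "310"


def read_named(root, name):
    """Look up a value by element name (approach 1)."""
    node = root.find(name)
    return None if node is None else node.text


def read_generic(root, name):
    """Look up a value by its 'type' attribute (approach 2)."""
    for item in root.iter("item"):
        if item.get("type") == name:
            return item.text
    return None


print(ET.tostring(named, encoding="unicode"))
print(ET.tostring(generic, encoding="unicode"))
```

In the first form a schema can validate each item by its element name; in the second, new data items can be added without changing the set of elements, at the cost of moving the vocabulary into attribute values.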

Background

Consider the process of an individual preparing to file his federal income tax return for the year. Tax information must first be gathered from multiple sources. Employers must provide Form W-2 with wage information to the taxpayer and to the tax agency. Financial institutions must send Form 1099-INT with interest income, again both to the taxpayer and the tax agency. Another form contains mortgage interest expense, and once more is required to be sent to both places. Still other information is required from other payers, local governments, and state agencies.

Then this information must be accurately entered from these documents into the return, so that it matches the data sent by the same source to the tax agency. If the taxpayer plans to file electronically, extra data must be entered to ensure correct matching of these documents. For example, the taxpayer must re-enter his name and address for each Form W-2 exactly as it appears on the paper W-2 – even if it is incorrect! The same requirement also exists for Form 1099-R and Form W-2G.

The taxpayer must then send the tax return to the IRS, either as a printed return or electronically, using the IRS’s proprietary system. Because of the format of the forms, much of the data is included in the return more than once. For example, the total of itemized deductions appears on both Schedule A and Form 1040. Another example, mentioned above, is that the taxpayer’s name and address may be included in the return multiple times. These redundancies increase the data storage requirements at the IRS and make the data vulnerable to inconsistency and error. If the taxpayer resides in a state or city with its own income tax, the information must be sent there as well, often with a copy of the same information sent to the IRS.

Even within a tax agency, data that was received electronically from the taxpayer may be manually entered into a separate application due to incompatibilities of applications and platforms. Once stored, the information is usually difficult to access across various legacy systems of the agency.

Even just modernizing systems within a tax agency will not fully solve these issues. Replacing legacy systems with fully integrated systems may enable effective data transfer within the agency, but it does not address the redundancies within the data itself nor the issues of communicating data among the taxpayer, employer, financial institution, and tax administration.

The Tax XML Schema provides a solution to these issues. Standardization of data so that it can be communicated electronically without ambiguity will change the processes of tax preparation. Employers will be able to provide data electronically in a format that will be readable by taxpayers, income tax software providers, and the IRS. The information will be automatically entered into tax software to eliminate the errors that can occur during manual entry. Matching data in a tax return with the same information sent to the IRS becomes a nearly fail-safe task.

There are other efforts underway that can do some of these same tasks. For instance, Intuit has a process that allows information from participating companies to be downloaded into a tax return being prepared using its products. However, this is a proprietary system designed only for this specific transfer. Tax XML, on the other hand, offers complete sharing of data based on an open standard and XML technology, and it eliminates data redundancy by separating the data from the government forms.

Technical Overview

The technical overview summarizes the work that has been done for Tax XML. This section includes information about the taxonomy, schema, and instance files developed as part of Tax XML.

The Taxonomy

XML focuses on data. Therefore, the first work done for Tax XML was to create a hierarchy, or taxonomy, of the tax data to be included in the schema. The goal of the taxonomy is to reorganize tax data into a more logical order that is independent of its presentation on the existing government tax forms, and to eliminate redundancy.

To accomplish this, the data contained on the forms and, where available for a form, in the electronic filing record layouts was reviewed. The review focused on the following tasks:

Arrange data into logical groups. Data was moved from its location on the form and included where it would allow the data to flow functionally from a computational perspective.

Exclude computed amounts. Data was included if it was an entered field, but usually not if it was a computed amount. Computed amounts were sometimes included if they were key check totals or placeholders for parts of the taxonomy not yet completed.

Include data only once. Duplication increases the possibility of error and requires additional storage. If data appeared in more than one place, the data was captured in the hierarchy in the place that represented its source entry point.
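As a rough illustration of the second and third tasks, the sketch below stores only entered amounts, each exactly once, and derives the total when it is needed instead of carrying it in the data. The ItemizedDeductions structure, its child element names, and the amounts are assumptions made for this example and are not part of the Tax XML taxonomy.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical instance fragment: entered fields only, no stored total.
DOC = """
<ItemizedDeductions>
  <MortgageInterest>8400.00</MortgageInterest>
  <StateIncomeTax>2150.50</StateIncomeTax>
  <CharitableGifts>600.00</CharitableGifts>
</ItemizedDeductions>
"""


def total_itemized(xml_text):
    """Compute the total from the entered amounts on demand."""
    root = ET.fromstring(xml_text)
    return sum((Decimal(child.text) for child in root), Decimal("0"))


total = total_itemized(DOC)
print(total)
```

Because the total is never stored, it cannot disagree with the amounts it is computed from, which is the kind of inconsistency that appears when the same figure is carried on both Schedule A and Form 1040.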

Tax Forms Included in Tax XML

It is somewhat misleading to speak in terms of forms, but it is a helpful reference for identifying the scope of the work done to date. Tax XML includes the information from the following income tax forms and schedules, although the data is reorganized and computed amounts are not included.

Although Tax XML is intended to be a single schema, the work done has been grouped into individual income tax and corporate income tax for ease of analysis and is presented in those parts throughout this document.

Individual Forms

Form 1040 / Form 1040A / Form 8815
Schedule A / Schedules 1, 2, and 3 / Form 8828
Schedule B / Form 1040EZ / Form 8829
Schedule C / Form 2210-F / Form 8839
Schedule E (pg.1) / Form 2441 / Form 8863
Schedule EIC / Form 4255 / Form 9465
Schedule F / Form 4562 / Form W-2
Schedule H / Form 4797 / Form 1099-INT
Schedule J / Form 4835 / Form 1099-DIV
Schedule R / Form 8606 / Form 1099-MISC
Schedule SE / Form 8615 / Form 1099-R

The Forms 1099-INT, 1099-DIV, and 1099-MISC are included above, even though these forms are not currently included in the paper filing or electronic submission of forms to the IRS. The reason for this is to show an example of what the future might hold where financial institutions and employers would also use this schema to transmit their information to the government, and then on to the taxpayer for inclusion in the individual’s income tax return. The seamless integration of data transfer in such a system shows the extended possibilities of Tax XML beyond just the filing of income tax returns.

Corporate Forms

Form 1120 / Form 1120 (continued) / Form 4255
Schedule A / Schedule K / Form 4562
Schedule C / Schedule L / Form 4797
Schedule E / Schedule M-1
Schedule J / Schedule M-2

Form 1120 covers much of the data included so far. The additional forms listed are available from the work done on individual income tax. Because the goal of Tax XML is to define a data structure that crosses different types of tax returns, the tag names and data organization are consistent and reusable whether the tax type is corporate, individual, or another type.

Data Hierarchy for Tax XML

The beginning of the hierarchy is the same for all types of tax. This list shows the relationship of the individual and corporate tax branches to the entire tree.

TaxML
    Authentication
    Identification
    KeyID
    TaxYear
  + Version
  + IndividualTax
  + CorporateTax

The TaxML element serves as the top level, or root, of the hierarchy. Below TaxML are lower levels of the tree. The ‘+’ sign preceding an element indicates that there are levels below this item. If there is no ‘+’ sign, the element contains data rather than child elements.

The Authentication and Identification elements are shown only as placeholders. Security and verification of identity are beyond the current scope of Tax XML.
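The top of the hierarchy could be built as an XML instance along the following lines. This is a minimal sketch: the element names come from the list above, but the build_tax_ml function, its parameters, the sample values, and the choice to emit only one tax branch per document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_tax_ml(key_id, tax_year, branch="IndividualTax"):
    """Build the top levels of a TaxML instance (illustrative only)."""
    if branch not in ("IndividualTax", "CorporateTax"):
        raise ValueError("unknown tax branch: " + branch)
    root = ET.Element("TaxML")
    # Placeholders only: security and identity are out of scope.
    ET.SubElement(root, "Authentication")
    ET.SubElement(root, "Identification")
    # Elements without a '+' sign hold data directly.
    ET.SubElement(root, "KeyID").text = key_id
    ET.SubElement(root, "TaxYear").text = str(tax_year)
    # Elements with a '+' sign have further levels below them.
    ET.SubElement(root, "Version")
    ET.SubElement(root, branch)
    return root


doc = build_tax_ml("EXAMPLE-0001", 1999)
print(ET.tostring(doc, encoding="unicode"))
```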

Building a Data Hierarchy for Individual Taxes

The Tax XML project began with individual income taxes. A data hierarchy was developed that had many levels and gave the data elements descriptive names designed to clearly identify the information they held. Element structures, such as Name, were developed to hold a set of elements that could easily be reused. Groups of elements that could occur more than once were labeled as lists for easy identification. Some of these structures were not included in the best practices that are shown below, but they still represent a viable approach and are included for that reason.
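The idea of a reusable element structure and a labeled list can be sketched as follows. Only the Name structure is mentioned in the text; its child elements (First, Last), the Taxpayer, DependentList, and Dependent elements, and the sample values are hypothetical and stand in for whatever the actual hierarchy defines.

```python
import xml.etree.ElementTree as ET


def make_name(first, last):
    """Reusable Name structure; child element names are assumed."""
    name = ET.Element("Name")
    ET.SubElement(name, "First").text = first
    ET.SubElement(name, "Last").text = last
    return name


# The same structure is reused wherever a person's name is needed.
taxpayer = ET.Element("Taxpayer")
taxpayer.append(make_name("Pat", "Example"))

# A repeating group is wrapped in an element labeled as a list.
dependents = ET.SubElement(taxpayer, "DependentList")
for first, last in [("Sam", "Example"), ("Lee", "Example")]:
    dependent = ET.SubElement(dependents, "Dependent")
    dependent.append(make_name(first, last))

print(ET.tostring(taxpayer, encoding="unicode"))
```

Defining Name once means every occurrence has the same shape, so the same code can read a name wherever it appears in the tree.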