1

Analytics and Data Warehousing Solutions Standard

CONTENTS

1.CONTEXT

1.1.Background

1.2.Purpose

1.3.Scope and application

1.4.Policy context3

1.5.The ICT Services Catalogue

2.KEY PRINCIPLES

3.REQUIREMENTS

3.1.Analytics and data warehousing

3.2.Service level and complexity

3.3.Requirements tables

3.3.1 Silver (standard) – Use Cases / Scenarios

3.3.2 Gold (complex) – Use Cases / Scenarios

3.4.Elements of this standard

DOCUMENT CONTROL

APPENDIX A – DEFINITIONS

APPENDIX B – ABBREVIATIONS

APPENDIX C – REFERENCES

APPENDIX D – STANDARDS

Developing technical standards

Management and implementation

  1. CONTEXT

1.1.Background

This is a technical standard developed through the NSW ICT Procurement and Technical Standards Working Group. The standard contains minimum technical and functional requirements that agencies should consider when procuring ICT services for analytics anddata warehousingsolutions.

By defining the necessary and common elements across agencies the standard provides an opportunity to leverage the buying power of Government as a whole,improve procurement efficiency, and increase interoperability.

1.2.Purpose

The purpose of this standard is to assist NSW Government agencies to develop, procure and implement analytics and data warehousingservice solutions and tools, as well as take full advantage of their benefits. This standard also helps agencies procure in a strategic manner that reflects the NSW Government’s priorities as outlined in the NSW Government ICT Strategy.

This standard details the issues that need to be considered so each agency can identify the available options that best suit their business requirements, helping them achieve value for money through cost savings and improved flexibility of service offerings.

1.3.Scope and application

This standard applies to all NSW Government departments, statutory bodies and shared service providers. It does not apply to state owned corporations, but is recommended for their adoption.

For the purposes of this standard, analytics means data analysis services (including business intelligence (BI)),and data warehousing means services for storage of data extracted from transaction systems, operational data stores and external sources.

This standard sets out service definitions as minimum requirements that vendors must meet, for inclusion in the NSW ICT Services Catalogue. Agencies should consider any specific operational or regulatory factors that impact their requirements, and specific requirements they have in addition to those detailed in this standard.

1.4.Policy context

The NSW Government ICT Strategy and Digital+ 2015 Final Updateset out the Government’s plan to: build capability across the NSW public sector to deliver better, more customer-focused services that are available anywhere, anytime; and to derive increased value from the Government’s annual investment in ICT.

Information sharing, open data and reuse of technology are priority initiatives of the ICT Strategy, to maximise the return on government investments, support better policy development and service delivery. The NSW Government ICT Investment Policy and Guidelines establishes these requirements for all new ICT projects, particular to make better use of the functionality in existing systems.

The NSW Government Enterprise Architecture(NSW GEA) provides direction and practical guidance to accelerate the development of agency EA capability and enabling a common, intra and inter agency approach to the design of digital government. It encompasses all aspects of enterprise architecture activity at the business, information, application and technology infrastructure layers. The NSW GEA is mapping the landscape of Whole of Government systems available across the sector, highlighting opportunities for reuse and where APIs can add value.

NSW Government, along with many governments in other jurisdictions, has moved towards opening up previously protected databases and applications, so that data and functionality can be accessed across agency boundaries or reused in new systems. Within NSW this has been reflected in the development of the NSW Government Open Data Policy, which provides clear direction for agencies to make their data available to the public in machine readable forms, including through the availability of APIs.

Developing whole of NSW Government ICT technical standards is a key initiative of the NSW Government ICT Strategy, driven by the ICT Procurement and Technical Standards Working Group. These standards leverage principles defined in the NSW Government ICT Strategy and the NSW Government Cloud Policy, and they support the NSW ICT Services Catalogue.

The standards set out service definitions as minimum requirements that vendors must meet to be able to offer their services through the NSW Services Catalogue. This helps achieve consistency across service offerings, emphasising a move to as a service sourcing strategies in line with the NSW Government ICT Strategy, and it signals government procurement priorities to industry.

This standard should be applied along with existing NSW Government policies and guidance, including the NSW Digital Information Security Policy. More information on the process for the development of standards that populate the ICT Services Catalogue is at Appendix D – Standards.

1.5.The ICT Services Catalogue

This catalogue provides suppliers with a showcase for their products and services, and an opportunity to outline how their offerings meet or exceed standard government requirements. The standards, together with supplier service offerings, help to reduce red tape and duplication of effort by allowing suppliers to submit service details only once against the standards. The offerings are then available to all potential buyers, simplifying procurement processes for government agencies.

Implementing this category management approach embeds common approaches, technologies and systems to maintain currency, improve interoperability, and provide better value ICT investment across NSW Government.

  1. KEY PRINCIPLES

This standard is based on the following principles:

  • End-to-end digital:Analytics and data warehousing solutions should enable end-to-end digital business processes and management.
  • Data quality: Data should possess characteristics indicating data quality, includingin relation to: accessibility, the institutional environment, relevance, timeliness, accuracy, coherence and interpretability (see the NSW Government Standard for Data Quality Reportingfor more details).
  • Single source of truth: Where applicable, agencies should establish a dataset as the single source of truth, to help eliminate duplication, inconsistency, and to support data currency. This will also be the authoritative version of the dataset, andits custodian will also be accountable for data information and quality (see the NSW Government Data and Information Custodianship Policyfor more details).
  • Timeliness where practical: Analytics and BI should be current and timely, to enhance its value for real-time monitoring, assessing progress, feedback and future predictions – particularly for service delivery.
  • The data warehouse as an ‘enterprise memory’: The data warehouse should be a repository for agency knowledge, and an ongoing resource for the agency. The data can be used for ongoing analytics and developing BI, informing future operations, increasing productivity and efficiency, and enhancing service delivery.
  • Facilitating as a service:Analytics and data warehousing solutions should facilitate the agency transition to as a service.
  • Interoperability: Analytics and data warehousing solutions should meet applicable recognised open standards.
  • Managing data as an asset: Datasets should be viewed as a strategic asset for agencies, and analytics and data warehousing solutions should help ensure agencies achieve maximum value and strategic business intelligence from their data.
  • Data sharing: Analytics and data warehousing solutions should support data and information sharing across NSW Government to enhance service delivery, and comply with principles set out in the NSW Open Data Policy.
  • Data accessibility: Where appropriate, and in line with the NSW Open Data Policy, data should meet accessibility requirements.
  1. REQUIREMENTS

3.1.Analytics and data warehousing

When assessing analytics and data warehousing solutions, additional elements to be considered include BI and service management. For the purpose of this standard data analytics includes BI.

3.2.Service level and complexity

Analytics and data warehousingcan be provided in a range of ways. For example, the supplier of the service may manage some of the service or environment during the course of the contract, or the supplier of the service may manage the entire service for course of the contract. Analytics and data warehousing services may be provided as one or two separate solutions.

The following requirements are set out in use case tables, separated into two service levels–Silver and Gold, reflecting the complexity of the analytics and data warehousingsolution required:

Silver:A Silver standard analytics or data warehousing service is the provision of a ‘Platform as a Service’, where the required tools are deployed into a cloud environment and made available to the buyer with minimal configuration. Further specific configuration is required by the buyer to allow successful delivery of the solution – for example, customisation of data models and configuration of ETL.

Gold: A Gold standard analytics or data warehousing service is the provision of a fully functional ‘Software as a Service’ where the solution is delivered as a fully configured and maintained service to meet the business needs of the buyer. Such a solution would be deployed wholly by the vendor to acquire and process data, with subsequent presentation to the user in a manner that can be consumed by the buyer using reporting or OLAP tools with little additional configuration effort.

3.3.Requirements tables

The following tables set out the recommended business and technical requirements for NSW Government. They provide a consistent approach for all NSW Government agencies regardless of their size. Explanations for each element of the following use cases are provided at section 3.4.

1

Analytics and Data Warehousing Solutions Standard

3.3.1 Silver (standard) – Use Cases / Scenarios

‘Use cases’ that are anticipated in agencies are set out in the table below. The corresponding requirement sections of this standard are ticked in the columns.

Use Case / Scenario
SILVER / Data profiling / Data quality / Data quality dashboards / Data quality assessment / Automated data cleansing / Retrospective analytics / Real-time analytics / Predictive analytics / Requirements analysis and design / Data modelling / Data lineage / Data history / maintenance of history / Data acquisition / ETL – operational metadata / ETL – transformation capabilities / ETL – exception management and reporting / ETL – reconciliation reporting / ETL – scheduling capabilities / ETL – performance / Resilience / Database support / Security / Business glossary / Performance
Analytics /  /  /  /  /  /  /  /  /  /  /  /  /  /  / 
Data warehousing /  /  /  /  /  /  /  /  /  /  /  /  /  /  /  /  / 
Use Case / Scenario
SILVER / Semantic Layer / OLAP / Predefined reports / Self-service administration / Full-service administration / Cloud compliant hosting facility / NSW Government Data Centre / Onshore/offshore management / Service level management / Multi-service broker provision / Data sovereignty / Backup and recoverability / Archive and retention / Audit
Analytics /  /  /  /  /  /  /  /  /  /  /  / 
Data warehousing /  /  /  /  /  /  /  /  /  /  /  / 

3.3.2 Gold (complex) – Use Cases / Scenarios

‘Use cases’ that are anticipated in agencies are set out in the table below. The corresponding requirement sections of this standard are ticked in the columns.

Use Case / Scenario
GOLD / Data profiling / Data quality / Data quality dashboards / Data quality assessment / Automated data cleansing / Retrospective analytics / Real-time analytics / Predictive analytics / Requirements analysis and design / Data modelling / Data lineage / Data history / maintenance of history / Data acquisition / ETL – operational metadata / ETL – transformation capabilities / ETL – exception management / ETL – reconciliation reporting and reporting / ETL – scheduling capabilities / ETL – performance / Resilience / Database support / Security / Business glossary / Performance
Analytics /  /  /  /  /  /  /  /  /  /  /  /  /  /  /  / 
Data warehousing /  /  /  /  /  /  /  /  /  /  /  /  /  /  /  /  / 
Use Case / Scenario
GOLD / Semantic layer / OLAP / Predefined reports / Self-service administration / Full-service administration / Cloud compliant hosting facility / NSW Government Data Centre / Onshore/offshore management / Service level management / Multi-service broker provision / Data sovereignty / Backup and recoverability / Archive and retention / Audit
Analytics /  /  /  /  /  /  /  /  /  /  /  /  / 
Data warehousing /  /  /  /  /  /  /  /  /  /  /  /  /  / 

1

Analytics and Data Warehousing Solutions Standard

3.4.Elements of this standard

Data profiling

The capability to profile the candidate data for inclusion within the data warehouse to understand quality in terms of consistency (e.g. Street vs. St) and completeness (e.g. % completed compared to real world).

Data quality

Provision of measure(s) to determine the quality of data being analysed.

Data quality dashboards

The ability to deploy continuous data quality monitoring of source data.

Data quality assessment

The ability to define rules to check valid values and relationships between data to ensure a complete and accurate picture is available, or understand steps needed to remediate quality issues.

Automated data cleansing

The ability to correct known quality issues automatically, including for example:

  • Standardisation: Correcting AU, AUS, Australia or ensuring phone numbers are a standard format;
  • Matching: Matching multiple records that represent the same real-world ‘thing’, e.g. John C = Mr. John Citizen; and/or
  • Merging: Creating a single ‘golden’ record.

Retrospective analytics

The ability to deliver analytics retrospectively based on available data sets within scope.

Real-time analytics

The ability to deliver analytics in real-time based on available data sets within scope.

Predictive analytics

The ability to deliver predictive analytics based on available data sets within scope.

Requirements analysis and design

Provision of specialist data warehousing business analysts to understand and document information requirements.

Data modelling

Provision of specialist data warehousing data modellers who are experienced with industry standard data warehousing techniques that capture business concepts, business change over time and provide flexibility for future reporting or extension. Dependent upon the application, vendors should have demonstrated experience in COTS data models, data vault, dimensional modelling or third normal form data models.

Data modelling is core to the success of data warehouse projects as it sets the foundation for future analysis and reporting capability, therefore it is desirable that the modeller has experience within the relevant sector.

Data lineage

All tools within the data warehouse should provide full traceability from reported information back to the specific data extracts from source systems to enable full audit and reconciliation of reported information.

Datahistory/ maintenance of history

Where required by the business, the data warehouse should provide the capability to retain the history of slowly changing attributes (e.g. marital status) over time to ensure historic reporting can be delivered.

Data acquisition

Desirable data acquisition techniques should include extract files, direct data base connection to a range of major databases, and database change data capture. More modern tools should also accept ‘stream’ data to a web service, e.g. social media feed content.

The data acquisition techniques are often separately licenced by vendors and should be selected based on source system impact, data volume, budget and data timeliness requirements.

ETL–operational metadata

The Extract, Transform, Load (ETL) tool should capture operational metadata including information on processing times, record counts and exception counts for the constant monitoring of the solution.

ETL – transformation capabilities

ETL should provide appropriate data transformation capabilities for the data being processed. Typical capabilities will include calculations, text manipulation, joins and lookups. Where possible, transformation rules should be ‘metadata-driven’, allowing update of the rules without extensive code changes.

ETL – exception management and reporting

ETL should provide the ability to inspect data that has been rejected or that breaks specific rules and provide options for remediating that data and including within the data warehouse.

ETL – reconciliation reporting

The solution should provide for end-to-end reconciliation of information for each load. This will enable the organisation to have confidence in the accuracy of information presented by the solution and identify issues.

ETL – scheduling capabilities

It should be possible to schedule processing activities to occur in required batch windows or in response to external events.

ETL – performance

Performance should be sufficient to run the required processing within available time windows and support the data timeliness requirements of the solution. Dependent upon specific requirements, this may be supported by capabilities to allow:

  • Scalability to multiple processing nodes; and
  • Parallel processing of the dataset (e.g. dividing the data up and using multiple processors to accelerate the end-to-end run).

Resilience

Solutions should meet criteria for:

  • Backup frequency (e.g. what is the acceptable information loss);
  • Availability;
  • Recoverability (e.g. in the event of failure); and
  • Restartability (e.g. can processing be restarted from where it was if there is a failure part-way through processing).

Database support

Where procured as Platform as a Service, appropriate relational or in-memory database technology should be available to the buyer. The use of specific database technology is less critical should the solution be delivered as a fully managed service.

Security

Security should be available throughout the analytics solution including security of the semantic layer at a ‘per-object’ level, report security and data-level security (e.g. ability to view only clients related to your work group).

Data warehouses are by their nature attractive targets due to the large volume of valuable data. Dependent upon the classification of data stored, all layers of the data warehouse should include: