customer FAQs

Hadoop in the Cloud

Hadoop in the Cloud UKC-GEN-160 • 08/2016
©UKCloud Ltd, 2016 • Open

General

Q: What is the service?

Hadoop in the Cloud is UKCloud’s highly secure PaaS implementation of Hadoop. It provides a cloud-based solution to help organisations address the challenges of big data storage and processing.

Q: Why deliver Hadoop as a cloud service?

The service enables organisations to explore a highly connected, secure, stable solution that's optimised for big data, from proof of concept through to production workloads, while minimising the investment, time and risk associated with buying, provisioning, configuring and maintaining Hadoop infrastructure, platforms and licences.

Q: What modules constitute the Hadoop core platform?

The Hadoop core platform consists of the following modules and associated supporting services:

  • Hadoop Common. The common utilities that support the other Hadoop modules
  • Hadoop Distributed File System (HDFS™). A distributed file system that provides high-throughput access to application data
  • Hadoop YARN. A framework for job scheduling and cluster resource management
  • Hadoop MapReduce v2. A YARN-based system for parallel processing of large datasets

Together, the modules in the Hadoop core platform support data ingress and egress, and the running of native MapReduce v2 applications.

Q: How did UKCloud define its Hadoop core platform?

To deliver a quality service, we identified the boundaries of Hadoop so that there is a clear delineation between the services UKCloud provides and supports, and customer or third-party services. We've adopted the industry definition of Hadoop as set out by the Apache Software Foundation.

Q: What is the HDFS data replication factor for Hadoop in the Cloud?

UKCloud has fixed the HDFS data replication factor at three, so every block of data is stored three times. This is in line with established Hadoop practice and helps keep the cost of the service to a minimum.
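As a rough sizing aid, the sketch below shows how a fixed replication factor translates user data into raw HDFS capacity. It assumes a factor of three (the stock HDFS default, set by the `dfs.replication` property in `hdfs-site.xml`) and ignores HDFS metadata, temporary job output and free-space headroom. The function names are illustrative, not part of any UKCloud or Hadoop API.

```python
# Minimal sizing sketch: logical data vs raw HDFS capacity under a fixed
# replication factor. Assumes a factor of 3 and ignores metadata, temporary
# job output and free-space headroom.

REPLICATION_FACTOR = 3


def raw_storage_needed(logical_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    """Return the raw capacity (TB) consumed by `logical_tb` of user data."""
    if logical_tb < 0:
        raise ValueError("logical_tb must be non-negative")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return logical_tb * replication


def usable_capacity(raw_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    """Return how much user data (TB) fits into `raw_tb` of raw disk."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return raw_tb / replication


if __name__ == "__main__":
    print(raw_storage_needed(10))  # 10 TB of data occupies 30 TB of raw disk
    print(usable_capacity(90))     # 90 TB of raw disk holds 30 TB of data
```

The same arithmetic works in reverse when planning a cluster: divide the raw capacity you are buying by three to see how much data it will actually hold.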

Q: How large can I grow my Hadoop cluster?

UKCloud is confident that our Hadoop in the Cloud service is capable of operating at a scale more than large enough to deal with the majority of Hadoop use cases and production workloads.

Q: What Hadoop distributions does Hadoop in the Cloud support?

This service currently supports Hortonworks® HDP and Cloudera Enterprise. We will continue to review support for additional distributions according to market demand.

Q: Is your Hadoop in the Cloud service extensible to offer additional analytics and visualisation tools?

We have engineered the service to enable customers to provision their own analytics, business intelligence and visualisation tools on our Compute service line with full, reduced-latency connectivity to Hadoop in the Cloud.

Billing

Q: How is Hadoop in the Cloud billed?

The service is a true cloud service, billed by the hour based on the storage consumed, with no upfront cost, minimum commitment or early exit fees.

Q: Does UKCloud offer a free trial?

We offer a 30-day free trial so that you can test and evaluate our service without commitment. Your trial provides you with a live environment on the UKCloud platform to test our services and verify whether they are suited to your needs.

Q: What are Velocity Packs?

Velocity Packs are the three Hadoop cluster types UKCloud offers, designed to maximise cost flexibility against user requirements. Customers choose a pack based on their initial Hadoop data requirements and the projected velocity of their future data ingest.

Q: Can I mix and match different Velocity Packs?

No. It is currently not possible to mix and match cluster node types within a single cluster; for example, a low-velocity cluster can scale out only with low-velocity worker nodes.

Security

Q: Where is the service hosted?

The service is delivered by a UK company from two tier 3 UK data centres separated by more than 100km, and securely connected by high-bandwidth, low-latency dedicated connectivity.

Q: Does my data leave the UK?

As the service is delivered from UK data centres by a UK company, your data does not leave the UK when at rest.

Q: Can I use Hadoop in the Cloud in the UKCloud Elevated (previously IL3) domain?

Yes, Hadoop in the Cloud is available in both the OFFICIAL Assured and Elevated domains.

Q: How do you ensure my data remains secure in a multi-tenant environment?

Hadoop in the Cloud was designed with data security as a priority. Each Hadoop cluster is deployed as its own entity and within its own virtualised environment from a storage, processing and management perspective. This, coupled with all HDFS data being stored on a physical drive exclusive to a single tenant’s virtual node, helps ensure the highest level of data security and assurance.

Q: Is the service Pan Government Accredited?

UKCloud's existing PGA still applies to the infrastructure underpinning our services, but since the move to the Government Security Classification Policy (GSCP), we are no longer able to seek PGA for new services such as Hadoop in the Cloud.

We are now required to self-assert our services, with customers then responsible for assessing and selecting the most appropriate cloud services which meet their individual security requirements.

To provide confidence that the service still meets the highest level of information assurance, we continue to conduct independent testing and validation of our platform and make the findings available to our customers and partners. This enables their Senior Information Risk Owners (SIROs) to make an informed decision about self-asserting any service they choose to consume.

Support

Q: How is Hadoop in the Cloud supported?

We manage and support the Hadoop core platform using our dedicated support team based in the UK. Support is available via helpdesk ticket, phone or email.

Q: Will UKCloud manage rolling point Hadoop releases?

We will monitor minor, major and security patch releases and test them on our own platforms. We won't apply updates automatically, but will share our test results, update packages and blueprints so that customers can apply patches at their own discretion.

Connectivity

Q: Can I use Hadoop in the Cloud over closed networks such as PSN and N3?

The service is accredited for use over PSN. Connectivity to the N3 network will be considered when an appropriate sponsor submits a requirement.

Data performance and resilience

Q: What is the underlying storage technology for the service?

We designed our platform to be optimised specifically for Hadoop, in line with best practices established by VMware and the Hadoop community.

Unlike some Hadoop cloud service providers, we give each node VM exclusive access to a physical drive attached directly to the host, helping to increase both performance and security.

Q: How do you minimise the traditional cloud data access latency for Hadoop?

Using localised, single-tenant physical drives attached directly to each virtual node overcomes the data access latency traditionally seen in the cloud, which occurs when a virtual machine has to pull data from a SAN across the network.

Q: How do you ensure the performance and resilience of Hadoop in a virtualised environment?

We've used VMware vSphere Big Data Extensions (BDE) and Hadoop Virtualization Extensions (HVE) to create rack, host and node awareness within our virtual data centre, helping to ensure the best placement of nodes for both performance and resilience.
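To illustrate what host and rack awareness buys, here is a simplified, deterministic sketch of replica placement. It follows the shape of the default HDFS policy (first replica on the writer's node, second on a different rack, third on another node in that second rack) and adds a physical-host check of the kind HVE introduces, so no two replicas share a host. The `Node` type and `place_replicas` function are hypothetical, and this is not a description of the exact placement logic on UKCloud's platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    name: str  # virtual node (VM) name
    host: str  # physical host the VM runs on
    rack: str  # rack that the physical host sits in


def place_replicas(nodes: List[Node], writer: Optional[Node] = None) -> List[Node]:
    """Choose three nodes for one block (simplified, deterministic).

    Replica 1: the writer's node, or the first node if no writer is given.
    Replica 2: the first node on a different rack from replica 1.
    Replica 3: a node on replica 2's rack but on a different physical host.
    """
    if not nodes:
        raise ValueError("no nodes available")
    first = writer if writer is not None else nodes[0]

    remote = [n for n in nodes if n.rack != first.rack]
    if not remote:
        raise ValueError("at least two racks are required")
    second = remote[0]

    # Host awareness: skip VMs that share a physical host with replica 2.
    candidates = [n for n in nodes if n.rack == second.rack and n.host != second.host]
    if not candidates:
        raise ValueError(f"rack {second.rack} needs a second physical host")
    return [first, second, candidates[0]]


if __name__ == "__main__":
    cluster = [
        Node("vm1", "hostA", "rack1"),
        Node("vm2", "hostA", "rack1"),
        Node("vm3", "hostB", "rack1"),
        Node("vm4", "hostC", "rack2"),
        Node("vm5", "hostC", "rack2"),
        Node("vm6", "hostD", "rack2"),
    ]
    print([n.name for n in place_replicas(cluster)])  # ['vm1', 'vm4', 'vm6']
```

Without the host check, "any other node in the same rack" could have picked `vm5`, which shares physical host `hostC` with `vm4`. The check steers the third replica to `vm6`, so the failure of one physical host cannot take out two copies of the same block.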

Q: Will my cluster's performance increase as I deploy more worker nodes?

Generally, yes. Hadoop spreads data across worker nodes and processes each block on or near the node that stores it, so the more worker nodes the cluster can spread its data across, the more work runs in parallel and the faster jobs complete.
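A back-of-the-envelope model makes the scaling argument concrete. The sketch assumes the data is spread evenly, each worker scans only its local share at a fixed rate, and every job pays a fixed start-up overhead. The throughput and overhead figures are hypothetical, not measured UKCloud numbers.

```python
def estimated_scan_seconds(data_gb: float, worker_nodes: int,
                           gb_per_node_per_s: float, overhead_s: float = 0.0) -> float:
    """Idealised time for a cluster to scan `data_gb` in parallel.

    Assumes the data is spread evenly across `worker_nodes`, each node reads only
    its local share at `gb_per_node_per_s`, and the job pays a fixed start-up
    cost of `overhead_s`. Shuffles, stragglers and data skew are ignored.
    """
    if worker_nodes < 1:
        raise ValueError("worker_nodes must be at least 1")
    if gb_per_node_per_s <= 0:
        raise ValueError("gb_per_node_per_s must be positive")
    share_gb = data_gb / worker_nodes
    return overhead_s + share_gb / gb_per_node_per_s


if __name__ == "__main__":
    # Hypothetical figures: 1,000 GB of data, 0.5 GB/s per node, 30 s job overhead.
    for nodes in (4, 8, 16):
        print(nodes, estimated_scan_seconds(1000, nodes, 0.5, overhead_s=30))
```

With these figures, going from 4 to 8 nodes cuts the estimate from 530 s to 280 s rather than to 265 s: the fixed overhead does not shrink, so each additional node buys a little less than the last.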

Q: Why is my cluster or newly added node faster after it has been running for a while?

When a cluster or node is first deployed, its disks may take some time to warm up. This is because the disks start out blank and there is an initial write penalty. Once the disk has been written to, performance improves.

Q: Does UKCloud offer any scheduled automated backup for Hadoop in the Cloud?

There is no scheduled automated backup for this service as Hadoop's storage engine, HDFS, is engineered with infrastructure failure in mind. That means localised component failures are tolerated within the infrastructure via data replication, eliminating single points of failure (including physical host failure or disk failure).
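The sketch below shows why replication can stand in for scheduled backups against component failure, and where its limits lie. It takes a hypothetical map of blocks to the physical hosts holding their replicas, removes a set of failed hosts, and reports which blocks are lost outright and which merely fall below the target replication factor (in real HDFS, the NameNode schedules re-replication of under-replicated blocks automatically). Block and host names are illustrative.

```python
from typing import Dict, List, Set, Tuple


def assess_failure(placement: Dict[str, Set[str]], failed_hosts: Set[str],
                   replication: int = 3) -> Tuple[List[str], List[str]]:
    """Return (lost_blocks, under_replicated_blocks) after `failed_hosts` go down.

    `placement` maps each block ID to the set of physical hosts holding a replica.
    """
    lost, under = [], []
    for block, hosts in sorted(placement.items()):
        live = len(hosts - failed_hosts)
        if live == 0:
            lost.append(block)   # no copy left anywhere: data loss
        elif live < replication:
            under.append(block)  # still readable; HDFS would re-replicate it
    return lost, under


if __name__ == "__main__":
    placement = {
        "blk_1": {"hostA", "hostC", "hostD"},
        "blk_2": {"hostB", "hostC", "hostD"},
        "blk_3": {"hostA", "hostB", "hostD"},
    }
    print(assess_failure(placement, {"hostD"}))
    print(assess_failure(placement, {"hostA", "hostB", "hostD"}))
```

Losing any single host leaves every block readable, but losing all the hosts that hold a block's replicas does lose data, which is why offline copies such as snapshots still have a role.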

Hadoop 2.4.1 and later support manually created HDFS snapshots, which can be stored offline using our Cloud Storage service.
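A minimal sketch of the manual snapshot workflow, using the standard HDFS shell commands `hdfs dfsadmin -allowSnapshot`, `hdfs dfs -createSnapshot` and `hdfs dfs -get` (HDFS exposes each snapshot under the directory's read-only `.snapshot` path). It assumes the `hdfs` client is on the `PATH` and that you have the superuser rights `-allowSnapshot` requires. The directory names are hypothetical, and the final upload from the staging directory to Cloud Storage is left out because it depends on the client you use. By default the script only prints the commands.

```python
import subprocess
from datetime import datetime, timezone
from typing import List


def snapshot_commands(hdfs_dir: str, snapshot_name: str, export_dir: str) -> List[List[str]]:
    """Build the HDFS CLI calls that snapshot `hdfs_dir` and copy the snapshot out."""
    base = hdfs_dir.rstrip("/") or "/"
    snapshot_path = f"{base.rstrip('/')}/.snapshot/{snapshot_name}"
    return [
        # One-off admin step: make the directory snapshottable (needs superuser rights).
        ["hdfs", "dfsadmin", "-allowSnapshot", base],
        # Create a read-only, point-in-time snapshot of the directory.
        ["hdfs", "dfs", "-createSnapshot", base, snapshot_name],
        # Copy the snapshot out of HDFS to a local staging directory for offline storage.
        ["hdfs", "dfs", "-get", snapshot_path, export_dir],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing command


if __name__ == "__main__":
    name = "manual-" + datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
    run(snapshot_commands("/data/sales", name, "/tmp/hdfs-export"))
```

Pass `dry_run=False` to `run` only once the printed commands look right for your cluster.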

Q: Does Hadoop in the Cloud support active/active replication of my cluster between your two data centres?

Currently our service offers only a single active cluster from either of our data centres.

Active/passive clusters could be configured using our low-latency dedicated connectivity to enable synchronous replication, but the customer or partner would be responsible for supporting this configuration.

Third-party tools for active/active Hadoop clusters are available, but we would not be responsible for the design, implementation, testing or support of these tools.
