Sivaprasad

Hadoop Developer

Professional Summary:

9 years of comprehensive IT experience in the Big Data and ETL domains with tools such as Hadoop, Spark, and other open-source technologies.

Hands-on experience developing Big Data projects using Hadoop, Hive, Sqoop, Oozie, Pig, Flume, Kafka, and MapReduce open-source tools/technologies.

Hands-on experience installing and configuring Hadoop and its ecosystem components, including MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, HDP, Cassandra, Sqoop, Pig, Flume, and Spark.

Good experience importing and exporting data between HDFS and relational database management systems using Sqoop.

Developed Pig Latin scripts.

Good knowledge of NoSQL databases such as HBase and MongoDB.

Strong knowledge of Spark for large-scale and streaming data processing with Scala/Python.

Strong knowledge of the ETL tool DataStage and reporting tools such as Cognos, QlikView, and Tableau for data warehouse (DWH) applications.

Experienced in loading large volumes of data from the local file system and HDFS into Hive, and in writing complex queries to load data into internal tables.

Experienced in loading data into partitioned and bucketed Hive tables.

Background in traditional databases such as Oracle, Teradata, and Netezza, ETL tools and processes, and data warehousing architectures.

Experience working with iterative Agile/Scrum and Waterfall methodologies.

Extensive knowledge of source code version control using SVN.

Strong development skills in Object Oriented and functional programming.

Ability to manage and deliver results on multiple tasks by effectively managing time and priority constraints.

Able to work within a team environment as well as independently.

Proficient in collaborating with team members, scoping and planning project estimates, requirements analysis, design and architecture reviews, development, and unit testing.

Strong customer focus and excellent interpersonal, verbal, and written communication skills.

Technical Skill Set:

Operating Systems: Linux, Windows

Programming Languages: Python, Scala, and Java

Databases: DB2, Oracle, Teradata, MySQL, and Hive

Version Control: RTC (IBM Rational Team Concert), SVN

Processes: SDLC, Agile Scrum

Tools: TSRM, ClearQuest, WinSCP, PuTTY

Qualification:

Bachelor’s degree in Computers, Nagarjuna University, India.

Professional Experience:

Project: Ecommerce Data Analytics
Client: Mondelez International (Kraft Foods), New Jersey / March 2017 – Present

Role: Hadoop Developer

Environment: Hortonworks Data Platform (HDP) 2.4, Tableau, Hive, Oozie, Python

Brief Description: The Ecommerce project answers eCommerce business questions by taking complex, decentralized data and translating it into a self-service, centralized data source that is easy to access, understand, and analyze. The solution can be expanded to leverage and integrate additional Mondelez internal and external data sources to provide a comprehensive view of brands, market-level data, and customer insights. Analyze all ecommerce sales data for Mondelez products.

Responsibilities:

  • Responsible for complete end-to-end project delivery.
  • Develop Python code to pick up email-sourced feed files and land them in the Hadoop file system (a sketch follows this list).
  • Pre-process all input files with Python code to make them ready to load into tables.
  • Design metadata for all sources, including Amazon, Walmart, Peapod, Walmart Pickup, Target.com, Jet.com, FreshDirect, Samsclub, Samsclub Pickup, Boxed.com, Quidsi, Costco, and Safeway Albertson.
  • Design the data model from staging tables to final tables for all sources.
  • Create Hive external tables to access daily data from sources.
  • Create partitioned and bucketed Hive internal tables to load data.
  • Develop a Hive UDF to normalize dates from all sources into a common format.
  • Create an Oozie workflow to execute all actions in sequence.
  • Refresh Tableau dashboards with fresh data after each load.
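
A minimal sketch of the file pickup step above, assuming a hypothetical local landing directory and HDFS target path; the real job also performed source-specific pre-processing that is not shown here.

  # Pick up feed files from a local landing directory and push them to HDFS.
  # Illustrative only: directory names and the dated-folder layout are assumptions.
  import glob
  import subprocess
  from datetime import date

  LANDING_DIR = "/data/landing/ecommerce"    # hypothetical local path
  HDFS_DIR = "/user/etl/ecommerce/incoming"  # hypothetical HDFS path

  def load_feed_files():
      target = "{}/{}".format(HDFS_DIR, date.today().isoformat())
      subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", target])
      for path in glob.glob(LANDING_DIR + "/*.csv"):
          # -put -f copies the local file into the dated HDFS directory
          subprocess.check_call(["hdfs", "dfs", "-put", "-f", path, target])

  if __name__ == "__main__":
      load_feed_files()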

Project: Emerging Mobile Engineering (Media Analytics - WC-T20 & IPL-2016)
Client: Akamai Technologies Inc., Cambridge, MA / Feb 2016 – March 2017

Role: Hadoop Developer

Environment: Cloudera, Spark, Scala, Kafka, Tableau, SVN

Brief Description: The aim of this project is to perform media analytics; with mobile advancing rapidly, media analytics plays a vital role. We have content providers around the globe and need to analyze data quickly and provide accurate statistics on internet traffic. With Akamai being a pioneer among content delivery operators, we have to store data and perform analytics on top of the data we host on our servers.

Responsibilities:

  • Configure Kafka to deliver the logs to HDFS.
  • Process the logs and extract the data the business requires for analysis.
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing.
  • Used RDDs to perform transformations on datasets as well as actions such as count, reduce, and first (see the sketch after this list).
  • Implemented checkpointing of RDDs to disk to handle job failures and aid debugging.
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
  • Created Autosys jobs to automate workflows.
  • Designed Tableau reports for the business users.
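
A minimal PySpark sketch of the RDD work described above. The project itself used Scala; this Python version is only illustrative, and the log location and field layout are assumptions.

  # Illustrative PySpark job: RDD transformations, actions, and checkpointing.
  # Paths and the assumed log format (field 10 = bytes served) are hypothetical.
  from pyspark import SparkContext

  sc = SparkContext(appName="media-analytics-sketch")
  sc.setCheckpointDir("hdfs:///tmp/checkpoints")          # hypothetical checkpoint dir

  logs = sc.textFile("hdfs:///data/akamai/logs/*.log")    # hypothetical log location
  bytes_served = (logs.map(lambda line: line.split())
                      .filter(lambda f: len(f) > 9 and f[9].isdigit())
                      .map(lambda f: int(f[9])))

  bytes_served.checkpoint()                               # persist lineage to disk for recovery

  total_lines = logs.count()                              # action: count
  total_bytes = bytes_served.reduce(lambda a, b: a + b)   # action: reduce
  first_line = logs.first()                               # action: first

  print(total_lines, total_bytes, first_line)
  sc.stop()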

Project: Marriott BI Migration
Client: IBM India / Marriott International, Bangalore, India / Nov 2015 – Feb 2016

Role: Hadoop Developer / Migration Engineer

Environment: IBM BigInsights 4.1

Brief Description: Marriott migrated its BigInsights platform from v3.0.0.2 to v4.1.0.2 to improve stability, use data encryption for new development, and take advantage of new tools and features. IBM provided the services required to move the application code from the old environment to the new one and to perform the necessary setup, configuration, application testing, deployment, and data validation.

Applications moved successfully: LZC, LZCDA, DPS and AW

Responsibilities:

  • Responsible for preparing the migration plan and activities.
  • Identify Hadoop scripts for all applications.
  • Create missing folders in the new environment.
  • Move the code from the old environment to the new environment.
  • Move data from the old servers to the new servers with the help of the operations team, and cross-check data volumes and record counts (see the sketch after this list).
  • Perform rounds of testing to ensure overall quality and deliver the project with zero defects.
  • Analyze SIT defects and provide solutions.
  • Accountable for understanding and analyzing defects and providing solutions.
  • Support and help the client fix issues after project deployment.
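
A hedged sketch of the post-migration data validation step, assuming hypothetical NameNode URIs and application paths; the real checks also compared record counts inside the files.

  # Compare file counts and total bytes per application path between the old and
  # new clusters. Cluster URIs and the path layout below are assumptions.
  import subprocess

  OLD = "hdfs://old-cluster-nn:8020"   # hypothetical old BigInsights NameNode
  NEW = "hdfs://new-cluster-nn:8020"   # hypothetical new BigInsights NameNode
  APP_PATHS = ["/apps/lzc", "/apps/lzcda", "/apps/dps", "/apps/aw"]  # assumed layout

  def hdfs_count(uri, path):
      # `hdfs dfs -count` prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
      out = subprocess.check_output(["hdfs", "dfs", "-count", uri + path]).decode()
      dirs, files, size, _ = out.split()
      return int(files), int(size)

  for path in APP_PATHS:
      old_files, old_size = hdfs_count(OLD, path)
      new_files, new_size = hdfs_count(NEW, path)
      status = "OK" if (old_files, old_size) == (new_files, new_size) else "MISMATCH"
      print(path, status, old_files, new_files, old_size, new_size)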

Project: Walk out and watch (WOW)
Client: IBM India / DirecTV (AT&T Company), Bangalore, India / Mar 2015 – Oct 2015

Role: Hadoop Developer

Environment: Cloudera 5.3.3, Unix, Hive, Impala, Sqoop, Oozie, Hue, XML, Autosys, QlikView

Brief Description: “Walk Out and Watch” is an application that provides the ability to stream content on a web or mobile NewCo app to new customers. WOW allows a customer to stream or authenticate to third-party applications prior to install. The scope of this project is the use of streaming data flowing into BI for supplier payments, fraud management, reporting, and other business uses.

Responsibilities:

  • Analyze and understand the technical design document.
  • Develop shell scripts to check files and move them from the edge node to the HDFS location.
  • Implement Hive external and internal tables and Hive queries to extract data for reports.
  • Develop Sqoop scripts according to business rules (see the sketch after this list).
  • Design Oozie workflows to execute actions in a controlled order.
  • Analyze SIT defects and provide solutions.
  • Accountable for understanding and analyzing defects and providing solutions.
  • Handle importing data from various data sources using Sqoop and perform transformations after loading the data into Hive.
  • Created UDFs to implement business logic properly in Hadoop.
  • Used Impala for faster querying.
  • Created Autosys jobs to automate workflows.
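
A minimal sketch of the Sqoop import step, wrapped in Python for consistency with the other examples; the JDBC connection string, credentials file, table name, and target directory are all assumptions.

  # Build and run a Sqoop import that lands a relational table in HDFS, where a
  # Hive external table can point at it. Everything named here is hypothetical.
  import subprocess

  def sqoop_import(table, target_dir):
      cmd = [
          "sqoop", "import",
          "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",  # hypothetical source DB
          "--username", "etl_user",
          "--password-file", "/user/etl/.sqoop_pw",             # password kept in HDFS
          "--table", table,
          "--target-dir", target_dir,
          "--fields-terminated-by", "\t",
          "--num-mappers", "4",
      ]
      subprocess.check_call(cmd)

  # Example: land a daily extract of streaming-session data.
  sqoop_import("STREAM_SESSIONS", "/data/wow/stream_sessions/2015-06-01")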

Project: Agent Answer Center
Client: IBM India / DirecTV (AT&T Company), Bangalore, India / Aug 2014 – Feb 2015

Role: Hadoop Developer

Environment: Cloudera 5.3.3, Unix, Hive, Impala, Sqoop, Oozie, Hue, XML, Autosys, QlikView

Brief Description: The Agent Answer Center (AAC) is an enterprise web application used by call center agents to troubleshoot and resolve DIRECTV customer calls. This project will establish baseline metrics and ongoing reporting capabilities to align subsequent project work directly with departmental and corporate goals, based on input data from the WEBTRENDS vendor. The analytics will drive measurable and focused changes to the application based on customer call data going forward.

Responsibilities:

  • Analyze and understand the technical design document.
  • Develop shell scripts to check files and move them from the edge node to the HDFS location.
  • Implement Hive external and internal tables and Hive queries to extract data for reports.
  • Develop Sqoop scripts according to business rules.
  • Design Oozie workflows to execute actions in a controlled order.
  • Analyze SIT defects and provide solutions.
  • Accountable for understanding and analyzing defects and providing solutions.
  • Handle importing data from various data sources using Sqoop and perform transformations after loading the data into Hive.
  • Created UDFs to implement business logic properly in Hadoop (see the sketch after this list).
  • Played the Hadoop SME role for the customer in the absence of the onshore SME.
  • Used Impala for faster querying.
  • Created Autosys jobs to automate workflows.
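
The project’s UDFs were written as standard Hive UDFs; as an illustrative alternative that stays in Python, the same kind of row-level logic can be expressed through Hive’s TRANSFORM clause. The column layout and disposition codes below are assumptions.

  # Hypothetical Hive usage (shown as comments so the block stays in Python):
  #   ADD FILE normalize_disposition.py;
  #   SELECT TRANSFORM(call_id, disposition)
  #     USING 'python normalize_disposition.py'
  #     AS (call_id, disposition_clean)
  #   FROM aac_calls;
  import sys

  def normalize(value):
      # Collapse vendor-specific disposition codes into a small, consistent set.
      value = value.strip().upper()
      return {"RSLVD": "RESOLVED", "ESCL": "ESCALATED"}.get(value, value)

  # Hive streams rows in as tab-separated lines on stdin and reads rows back
  # from stdout in the same format.
  for line in sys.stdin:
      call_id, disposition = line.rstrip("\n").split("\t")
      print("{}\t{}".format(call_id, normalize(disposition)))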

Project: EE mData Analytics Platform
Client: IBM India / Everything Everywhere Limited, Bangalore, India / Aug 2013 – Jul 2014

Role: Hadoop Developer

Environment: IBM BigInsights, QlikView

Brief Description: This is a development project in which the EE client wants to analyze CDR and web logs with the help of reference data: to provide valuable knowledge that helps EE build and run their network and serve EE’s customers, ensuring they remain the leading UK network with the best digital life experience, and to monetize their data and provide a new revenue stream outside their normal business focus.

Responsibilities:

  • Analyze and understand the technical design document.
  • Develop shell scripts to check files and move them from the edge node to the HDFS location.
  • Played a key role in the design of Hadoop scripts for the application.
  • Develop Pig scripts according to business rules.
  • Develop mapper and reducer jobs wherever required (see the sketch after this list).
  • Used UDFs to implement business logic in Hadoop.
  • Design Oozie workflows to execute actions.
  • Analyze SIT defects and provide solutions.
  • Accountable for understanding and analyzing defects and providing solutions.
  • Analyse the data by performing Hive queries and running Pig scripts to study customer behaviour.
  • Played the Hadoop SME role for the customer in the absence of the onshore SME.
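
A minimal mapper/reducer sketch, written as a single Python script runnable with Hadoop Streaming; the project’s MapReduce jobs may have been written differently, and the web-log field positions, paths, and streaming-jar location are assumptions.

  # Count hits per requested URL in the web logs. Invoked roughly as:
  #   hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  #     -input /data/ee/weblogs -output /data/ee/hits_by_url \
  #     -mapper "python hits_by_url.py map" \
  #     -reducer "python hits_by_url.py reduce" \
  #     -file hits_by_url.py
  import sys

  def mapper():
      # Emit "url<TAB>1" for every request line; assume field 7 holds the URL.
      for line in sys.stdin:
          fields = line.split()
          if len(fields) > 6:
              print("{}\t1".format(fields[6]))

  def reducer():
      # Input arrives grouped by key, so sum runs of identical URLs.
      current, count = None, 0
      for line in sys.stdin:
          url, n = line.rstrip("\n").split("\t")
          if url != current:
              if current is not None:
                  print("{}\t{}".format(current, count))
              current, count = url, 0
          count += int(n)
      if current is not None:
          print("{}\t{}".format(current, count))

  if __name__ == "__main__":
      mapper() if sys.argv[1] == "map" else reducer()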

Project: IACRR (International Assignment Cross Reporting Repository)
Client: IBM India / IBM USA, Bangalore, India / Aug 2011 – Jul 2013

Role: ETL DataStage Developer

Environment: IBM DataStage, DB2, Cognos

Brief Description: The existing process of cross-reporting International Assignees’ (IAs’) compensation data between home and host countries is manual and error-prone. It leads to late and inaccurate tax payments in many countries, resulting in heavy penalties due to non-compliance with tax laws. A system is needed to assist the IA Tax team as well as the local country payrolls to automate and simplify the process and ensure accurate information. The objective is to build a common repository of all IA compensation data (earnings, deductions, taxes, equity transactions, expenses, benefits); this data will be collected from the assignees’ home countries as well as their host/work countries. Having this data in one place will allow users to generate timely, more complete, and more accurate cross-country monthly compensation reports.

Responsibilities:

  • Requirements gathering, analysis, and feasibility assessment.
  • Design the ETL processes using the DataStage tool to load data into the target DB2 database.
  • Involved in the ETL process for data movement from transient to staging and finally to the warehouse.
  • Worked as an IBM DataStage developer designing jobs.
  • Conducted unit testing and was involved in preparing the test data.
  • Worked on upgrades for weekend changes scheduled for the DataStage server.
  • Part of the DataStage server migration team moving jobs and scripts.
  • Data extraction, transformation, and loading from source systems.
  • System testing of the data mart and data warehouse.
  • Extract external data from sources such as flat files and relational databases.
  • Analyze SIT defects and provide solutions.
  • Accountable for understanding and analyzing defects and providing solutions.

Project: eTOTALS (Electronic Terminal for Time and Reporting System)
Client: IBM India / IBM USA, Bangalore, India / Aug 2008 – Jul 2011

Role: Java Developer

Environment: Java, IBM z/VM, DB2, RTC

Brief Description: eTOTALS is the intranet web interface to the TOTALS application, used to record time and labor for payroll processing. eTOTALS is only available to IBM US employees and requires an IBM intranet ID and password for access. All regular exempt and non-exempt employees are required to submit an eTOTALS time card to record one or more of the time card and attendance codes. Time cards with codes for unpaid absence or additional compensation/premium hours must be reviewed by management and signed. Time cards with codes for personal illness or industrial accident require management approval and will require management post-review and approval.

Responsibilities:

  • Developed front-end screens using JSP, HTML, CSS, and JavaScript.
  • Developed Java and J2EE applications using Rapid Application Development (RAD) and Eclipse.
  • Followed Java and J2EE design patterns and coding guidelines to design and develop the application.
  • Interacted with the QA team to understand the information that goes into the QA weekly report and the desired layout.
  • Worked with the onsite team to come up with the design and implementation of the project.
  • Developed modules to create, view, delete, and search the QA team’s weekly reports using Java and JDBC.
  • Performed front-end validation using JavaScript.
  • Designed and created the database tables in DB2.
  • Developed the data access layer using JDBC for connecting to the database.
  • Used RTC version control for maintaining source code.

Certifications:

  • IBM InfoSphere DataStage Certified Professional.
  • IBM Cognos Certified Developer.
