Big Data Processing Using Apache Hadoop

Overview 

The world's demand for storage keeps growing: every day, people use up more space sharing their photos and saving their videos. Yet as end users we barely notice, because the internet seems to give us a practically unlimited amount of storage. How is that possible? A large part of the answer lies in distributed technologies such as Hadoop, which spreads data and computation across clusters of ordinary computers. In this course you will learn how to build your own data center with minimal effort: you will create and deploy your own cluster of many computers, in other words, your own “SUPERCOMPUTER”.

Prerequisite 

There are no mandatory prerequisites for this course.

Course Fee: 12,000 INR

What is Big Data?

Hadoop – Why Hadoop, Scaling, Distributed Framework, Hadoop v/s RDBMS, Brief history of Hadoop.

Hadoop Architecture v1 and v2

Hadoop Cluster Configuration

Hadoop MapReduce Framework v1 –

  • Hadoop Data Types
  • Map and Reduce Tasks
  • MapReduce Execution Framework
  • Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs)
  • Output Formats (Text Output, Binary Output, Multiple Outputs)

Hadoop Project:

  • MapReduce Programming.

Hive and HiveQL

  • Hive Architecture and Installation, Comparison with Traditional Databases

HiveQL:

  • Data Types, Operators and Functions
  • Hive Tables (Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables)
  • Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins and Subqueries, Views, Map-side and Reduce-side Joins to Optimize Queries)

Advanced Hive, NoSQL Databases and HBase

  • Hive: Data Manipulation with Hive, User-Defined Functions, Appending Data to an Existing Hive Table, Custom Map/Reduce in Hive
  • Hadoop Project: Hive Scripting

HBase:

  • Introduction to HBase, Client APIs and Their Features, Available Clients, HBase Architecture, MapReduce Integration

HBase and ZooKeeper

  • HBase: Advanced Usage, Schema Design, Advanced Indexing, Coprocessors
  • Hadoop Project: HBase Tables
  • The ZooKeeper Service: Data Model, Operations, Implementation, Consistency, Sessions, States

Hadoop 2.0, MRv2 and YARN –

  • Schedulers: Fair and Capacity
  • Hadoop 2.0 New Features: NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN, Upgrading Existing MRv1 Code to MRv2, Programming in the YARN Framework

Hadoop Project Environment and Apache Oozie

  • In this module, you will learn how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. We will discuss multiple data sets and the specifications of the project. This module also covers Apache Oozie, the workflow scheduler for Hadoop jobs.

Overview of Other Hadoop Frameworks