The world produces more data every day, from shared photos to saved videos, yet as end users we rarely notice any limit: the internet seems to offer a practically infinite amount of storage. How is that possible? Part of the answer lies in Hadoop, a framework for storing and processing data across a cluster of many computers. In this course you will learn how to set up your own small data center with minimal effort and how to create and deploy your own cluster or, in other words, your own “SUPER COMPUTER”.
Our courses are planned so that even a non-technical person can start from scratch.
What is Big Data?
Hadoop – Why Hadoop, Scaling, Distributed Framework, Hadoop vs. RDBMS, Brief History of Hadoop.
Hadoop Architecture v1 and v2
Hadoop Cluster Configuration
Hadoop MapReduce framework v1 –
- Hadoop Data Types,
- Map and Reduce tasks,
- MapReduce Execution Framework,
- Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs),
- Output Formats (Text Output, Binary Output, Multiple Outputs),
- MapReduce Programming.
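As a taste of the Map and Reduce tasks listed above, here is a minimal sketch of the word-count pattern in plain Python. It does not use Hadoop or its Java API; it only imitates the map, shuffle, and reduce steps in a single process. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are illustrative assumptions, not part of the course material.

```python
from collections import defaultdict


def map_phase(line):
    """Map task: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, which a MapReduce
    framework does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce task: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in order on an in-memory list of records."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input records, standing in for lines of a text file.
    records = ["big data big cluster", "data node"]
    print(word_count(records))
```

On a real cluster the map and reduce tasks run in parallel on different machines and the framework handles the shuffle; this sketch only shows the shape of the computation.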
Hive and HiveQL –
- Hive Architecture and Installation, Comparison with Traditional Database
- Data Types, Operators and Functions
- Hive Tables (Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables)
- Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins and Subqueries, Views, Map-side and Reduce-side Joins to Optimize Queries).
Advanced Hive, NoSQL Databases and HBase –
- Hive: Data Manipulation with Hive, User-Defined Functions, Appending Data into an Existing Hive Table, Custom Map/Reduce in Hive, Hadoop Project: Hive Scripting
- HBase: Introduction to HBase, Client APIs and Their Features, Available Clients, HBase Architecture, MapReduce Integration.
HBase and ZooKeeper –
- HBase: Advanced Usage, Schema Design, Advanced Indexing, Coprocessors, Hadoop Project: HBase Tables
- The ZooKeeper Service: Data Model, Operations, Implementation, Consistency, Sessions, States.
Hadoop 2.0, MRv2 and YARN –
- Schedulers: Fair and Capacity
- Hadoop 2.0 New Features: NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN, Upgrading Existing MRv1 Code to MRv2, Programming in the YARN Framework.
Hadoop Project Environment and Apache Oozie –
- In this module, you will understand how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. We will discuss multiple data sets and the specifications of the project. This module also covers the Apache Oozie workflow scheduler for Hadoop jobs.
Overview of Other Hadoop Frameworks
- Apache Storm
- Cloudera Hadoop with Cloudera Manager
- Amazon EMR