Training Certification Duration: 2 Weeks | 4 Weeks

Internship Certification Duration: 6 Weeks | 4 Months | 6 Months

Title: Data Analysis and Big Data Processing with Apache Hadoop (3 Days)

Course Content Overview:

  • Build your own supercomputer for large-scale data processing

  • Implementing HDFS for data distribution

  • Writing Apache Pig scripts

  • Deep dive into Hive, the data warehouse service

  • MapReduce: the mapper and reducer framework

  • NoSQL services with HBase and Cassandra

  • Overview of Hadoop in the cloud

  • Data visualization with R

  • Data mining and the future of data science

Big Data Hadoop: The Idea Behind Supercomputing

Day 1: Linux – The Choice of Developers and Security Experts

Session 1:

  • Professional-level understanding of Linux

  • Linux as the key player in cloud technology

  • Understanding the difference between GUI and CLI

  • Security and hacking techniques

Session 2:

  • I/O redirection and pipelines

  • Soft and hard links and the inode table

  • User and account security management

  • Setting up backups and restoring them with tar and zip

  • Installing software with yum and rpm

  • Remote login with SSH and Telnet

  • Packet analysis with tcpdump and Wireshark
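Several of the Session 2 topics map directly onto Python's standard library. As a quick illustration of the backup-and-restore workflow covered above, this minimal sketch (file names are hypothetical) uses Python's `tarfile` module to mirror the shell commands `tar -czf backup.tar.gz notes.txt` and `tar -xzf backup.tar.gz`:

```python
import os
import tarfile
import tempfile

def backup_and_restore(content: str) -> str:
    """Archive a file with tar+gzip, then extract it and read it back."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        path = os.path.join(src, "notes.txt")
        with open(path, "w") as f:
            f.write(content)

        archive = os.path.join(src, "backup.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:   # create the backup
            tar.add(path, arcname="notes.txt")

        with tarfile.open(archive, "r:gz") as tar:   # restore the backup
            tar.extractall(dst)

        with open(os.path.join(dst, "notes.txt")) as f:
            return f.read()

print(backup_and_restore("backup me\n"))
```

In class the same workflow is driven from the shell; the Python version is handy when backups need to be scripted as part of a larger job.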

Day 2: Understanding Data Science and the Logic Behind It

Session 1:

  • Understanding the concept of Big Data

  • Cluster Management with HDFS

  • Basic operations with HDFS

  • Storage quota management

  • Multi-node clusters and DataNode management

  • Data graphs with R
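A useful exercise when discussing HDFS storage and quotas is working out how a file is physically laid out on the cluster. The sketch below assumes HDFS's usual defaults of a 128 MB block size and a replication factor of 3; only the last block of a file may be smaller than the block size:

```python
import math

def hdfs_storage_plan(file_size_mb: int, block_size_mb: int = 128,
                      replication: int = 3) -> tuple[int, int]:
    """Return (number of HDFS blocks, total raw storage used in MB).

    Each block is stored `replication` times across DataNodes, so the
    raw storage consumed is the file size times the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    raw_mb = file_size_mb * replication
    return blocks, raw_mb

print(hdfs_storage_plan(500))  # a 500 MB file -> (4, 1500)
```

This is why a storage quota on HDFS must account for replication: a 500 MB file consumes 1,500 MB of raw disk across the cluster.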

Session 2:

  • User-based quotas

  • Multi-node cluster setup with MapReduce

  • Job scheduling with MapReduce

  • Deep dive into jobs and their priority management

  • The Sudoku solver and random data generation examples

  • Word-count and pattern-matching jobs

  • Writing your own mapper and reducer programs in Python
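The word-count job above is the canonical first MapReduce program. The following is a minimal local simulation of the mapper and reducer you would write for Hadoop Streaming; in a real job each function lives in its own script, reading stdin and emitting tab-separated key/value lines, with the framework's shuffle/sort phase standing in for the `sorted()` call here:

```python
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) for every word -- the Hadoop Streaming mapper logic."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Sum the counts per word. Input must arrive sorted by key,
    which the shuffle/sort phase guarantees between map and reduce."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Local simulation of map -> sort (shuffle) -> reduce:
text = ["the quick brown fox", "the lazy dog"]
counts = dict(reducer(sorted(mapper(text))))
print(counts["the"])  # 2
```

A pattern-matching job follows the same shape: the mapper emits only the lines (or fields) that match a regex, and the reducer aggregates them.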

Day 3: Big Data – Live Data Streaming

Session 1:

  • Apache Spark and its use cases

  • The Hive SQL data warehouse

  • Running a Spark cluster on the Hadoop framework

  • Using Scala for analysis

  • Hadoop version 2 cluster setup
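Hive's appeal is that it exposes familiar SQL (HiveQL) over files stored in HDFS, compiling each query into distributed jobs. The query shape is the same as in any SQL engine, so as a rough sketch we can preview it with Python's built-in `sqlite3`; the `sales` table and its columns are hypothetical, and in Hive this SELECT would run over HDFS data rather than an in-memory database:

```python
import sqlite3

# Hypothetical sales table, standing in for a Hive table over HDFS files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("west", 50.0), ("east", 25.0)])

# A HiveQL-style aggregation: total sales per region.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
for region, total in conn.execute(query):
    print(region, total)
```

The point of the exercise is that analysts can keep writing GROUP BY queries while Hive handles distributing the scan and aggregation across the cluster.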

Session 2:

  • Using PySpark

  • Live data streaming from Twitter

  • An HBase cluster as a NoSQL database

  • Hadoop version 3 setup

  • YARN and its use cases
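In PySpark, word count is typically written as a chain of RDD transformations such as `rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. Since a Spark cluster is not always at hand for practice, the sketch below emulates the two key transformations, `flatMap` and `reduceByKey`, in plain Python to show what each one does to the data:

```python
from collections import defaultdict
from functools import reduce

def flat_map(func, data):
    # Like RDD.flatMap: apply func to each element and flatten the results.
    return [item for element in data for item in func(element)]

def reduce_by_key(func, pairs):
    # Like RDD.reduceByKey: merge all values that share a key with func.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}

lines = ["spark makes big data simple", "big data needs spark"]
pairs = [(word, 1) for word in flat_map(str.split, lines)]
counts = reduce_by_key(lambda a, b: a + b, pairs)
print(counts["spark"])  # 2
```

The real PySpark versions run the same logic in parallel across partitions, with `reduceByKey` also combining values locally on each node before shuffling, which is what makes it cheaper than a naive group-then-reduce.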
