Big Data Hadoop Training with Project in Bangalore

Master the skills of processing large-scale data using Hadoop and learn advanced components like MapReduce, YARN, Flume, Oozie, Impala and ZooKeeper while working on hands-on exercises and case studies

Key Features 

  • This is a combo course including:
  1. Hadoop Developer Training
  2. Hadoop Analyst Training
  3. Hadoop Administration Training
  4. Hadoop Testing Training
  • 70 hours of High-Quality in-depth Video E-Learning Sessions
  • 90 hours of Lab Exercises
  • Intellipaat Proprietary VM and free cloud access for 6 months for performing exercises
  • 70% of the learning happens through hands-on exercises, project work, assignments and quizzes
  • The training prepares you for the Cloudera CCDH and CCAH certifications, and also covers working with the Hortonworks and MapR distributions
  • 24X7 Lifetime Support with Rapid Problem Resolution Guaranteed
  • Lifetime Access to Videos, Tutorials and Course Material
  • Guidance on Resume Preparation and Job Assistance
  • Step-by-step Installation of Software
  • Course Completion Certificate from Intellipaat

About Hadoop Training Course

This is an all-in-one course designed to give a 360-degree overview of Hadoop architecture and its implementation in real-time projects. The major topics include Hadoop and its ecosystem, core concepts of MapReduce and HDFS, an introduction to HBase architecture, Hadoop cluster setup, and Hadoop administration and maintenance. The course further includes advanced modules like YARN, Flume, Hive, Oozie, Impala, ZooKeeper and Hue.

Learning Objectives

After completion of this Hadoop all-in-one course, you will be able to:

  • Excel in the concepts of Hadoop Distributed File System (HDFS)
  • Implement HBase and MapReduce Integration
  • Understand the Apache Hadoop 2.0 framework and architecture
  • Learn to write complex MapReduce programs in both MRv1 and MRv2
  • Design and develop applications involving large data using Hadoop Ecosystem
  • Set up Hadoop infrastructure with single- and multi-node clusters on Amazon EC2 (CDH4)
  • Monitor a Hadoop cluster and execute routine administration procedures
  • Learn ETL connectivity with Hadoop through real-time case studies
  • Learn to write Hive and Pig scripts and work with Sqoop
  • Perform data analytics using YARN
  • Schedule jobs through Oozie
  • Master Impala to run real-time queries on Hadoop
  • Handle Hadoop component failures and recoveries
  • Optimize Hadoop cluster for the best performance based on specific job requirements
  • Derive insight into the field of Data Science
  • Work on a Real Life Project on Big Data Analytics and gain hands-on Project Experience

Recommended Audience

  • Programming Developers and System Administrators
  • Project managers eager to learn new techniques for maintaining large data
  • Experienced working professionals aiming to become Big Data Analysts
  • Mainframe Professionals, Architects & Testing Professionals
  • Graduates, undergraduates and working professionals eager to learn the latest Big Data technology

Prerequisites:

Some prior experience in any programming language would be helpful, along with basic knowledge of UNIX commands and SQL scripting. Prior knowledge of Apache Hadoop is not required.

Why Take Big Data Hadoop Course?

  • Hadoop is a framework for running applications at very large scale on clusters of commodity hardware.
  • It is maintained by the Apache Software Foundation and helps store and process huge amounts of data in a cost-effective manner.
  • Big, multinational companies like Google, Yahoo, Apple, eBay, Facebook and many others are hiring skilled professionals capable of handling Big Data.
  • Experts in Hadoop can manage complete operations in an organization.
  • This course provides hands-on exercises on an end-to-end PoC using YARN (Hadoop 2).
  • You will be equipped with advanced MapReduce exercises, including Facebook sentiment analysis, the LinkedIn shortest-path algorithm and inverted indexing.

Module 1 – Introduction to Hadoop and its Ecosystem, MapReduce and HDFS

  • Big Data, Factors constituting Big Data
  • Hadoop and Hadoop Ecosystem
  • MapReduce – Concepts of Map, Reduce, Ordering, Concurrency and Shuffle (a minimal word-count sketch follows this list)
  • Hadoop Distributed File System (HDFS) Concepts and its Importance
  • Deep Dive in MapReduce – Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs
  • HDFS Deep Dive – Architecture, Data Replication, Name Node, Data Node, Data Flow
  • Parallel Copying with DISTCP, Hadoop Archives
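
To make the Map, Reduce, Combiner and Shuffle concepts above concrete, here is a minimal word-count sketch against the standard Hadoop 2.x Java MapReduce API (the class name and input/output paths are illustrative):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input line
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: all values for the same word arrive together after the shuffle
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // combiner: local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}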

Assignment – 1

Module 2 – Hands on Exercises

  • Installing Hadoop in Pseudo-Distributed Mode, understanding important configuration files, their properties and daemon threads
  • Accessing HDFS from the command line (a Java API equivalent is sketched after this list)
  • Map Reduce – Basic Exercises
  • Understanding Hadoop Eco-system
  • Introduction to Sqoop, use cases and Installation
  • Introduction to Hive, use cases and Installation
  • Introduction to Pig, use cases and Installation
  • Introduction to Oozie, use cases and Installation
  • Introduction to Flume, use cases and Installation
  • Introduction to YARN
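
The lab accesses HDFS from the command line; the same operations are also available through the Java FileSystem API. A minimal sketch is below, assuming a local pseudo-distributed cluster (the NameNode URL and paths are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTour {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.defaultFS would normally come from core-site.xml;
    // the host and port here are placeholders
    conf.set("fs.defaultFS", "hdfs://localhost:8020");
    FileSystem fs = FileSystem.get(conf);

    // Equivalent of: hadoop fs -mkdir -p /user/demo
    fs.mkdirs(new Path("/user/demo"));

    // Equivalent of: hadoop fs -put localfile.txt /user/demo/
    fs.copyFromLocalFile(new Path("localfile.txt"),
                         new Path("/user/demo/localfile.txt"));

    // Equivalent of: hadoop fs -ls /user/demo
    for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
      System.out.println(status.getPath() + "  " + status.getLen());
    }
    fs.close();
  }
}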

Assignment – 2 and 3

Mini Project – Importing MySQL Data using Sqoop and Querying it using Hive
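
As a sketch of the querying half of this mini project: once Sqoop has landed the MySQL data in HDFS, a Hive table over it can be queried from Java through the HiveServer2 JDBC driver. The connection URL, table and column names below are illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryDemo {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; URL host/port/database are placeholders
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {

      // Hypothetical table previously imported from MySQL via Sqoop
      ResultSet rs = stmt.executeQuery(
          "SELECT category, COUNT(*) AS cnt " +
          "FROM transactions GROUP BY category");
      while (rs.next()) {
        System.out.println(rs.getString("category") + "\t" + rs.getLong("cnt"));
      }
    }
  }
}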

Module 3 – Deep Dive in Map Reduce and Yarn

  • How to develop a MapReduce application and write unit tests
  • Best practices for developing, writing and debugging MapReduce applications
  • Joining data sets in MapReduce
  • Hadoop APIs
  • Introduction to Hadoop YARN
  • Differences between Hadoop 1.0 and 2.0

Module 3.1

  • Project 1 – Hands-on exercise – end-to-end PoC using YARN (Hadoop 2):
    1. Handling real-world bank transactions
    2. Moving data to HDFS using Sqoop
    3. Incremental updates of data in HDFS
    4. Running a MapReduce program
    5. Running Hive queries for data analytics
  • Project 2 – Hands-on exercise – end-to-end PoC using YARN (Hadoop 2.0):

Running MapReduce code on movie ratings to find each movie's fans and average rating

Assignment – 4 and 5

Module 4 – Deep Dive in Pig

1. Introduction to Pig

  • What Is Pig?
  • Pig’s Features
  • Pig Use Cases
  • Interacting with Pig

2. Basic Data Analysis with Pig

  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly-Used Functions
  • Hands-On Exercise: Using Pig for ETL Processing (see the sketch after this list)
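
A minimal sketch of the load-filter-sort-store pattern this exercise practises, driven from Java through Pig's PigServer API (the file paths and schema are hypothetical):

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigEtlDemo {
  public static void main(String[] args) throws Exception {
    // LOCAL mode for a quick test; use ExecType.MAPREDUCE on a cluster
    PigServer pig = new PigServer(ExecType.LOCAL);

    // Load, filter and sort web-log records (schema is hypothetical)
    pig.registerQuery(
        "logs = LOAD 'weblog.txt' USING PigStorage('\\t') " +
        "AS (ip:chararray, url:chararray, status:int);");
    pig.registerQuery("errors = FILTER logs BY status >= 400;");
    pig.registerQuery("sorted = ORDER errors BY ip;");

    // Write the result; equivalent to STORE ... INTO in a Pig script
    pig.store("sorted", "error_logs_out");
    pig.shutdown();
  }
}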

3. Processing Complex Data with Pig

  • Complex/Nested Data Types
  • Grouping
  • Iterating Grouped Data
  • Hands-On Exercise: Analyzing Data with Pig

4. Multi-Dataset Operations with Pig

  • Techniques for Combining Data Sets
  • Joining Data Sets in Pig
  • Set Operations
  • Splitting Data Sets
  • Hands-On Exercise

5. Extending Pig

  • Macros and Imports
  • UDFs
  • Using Other Languages to Process Data with Pig
  • Hands-On Exercise: Extending Pig with Streaming and UDFs

6. Pig Jobs

Case studies of two Fortune 500 companies, Electronic Arts and Walmart, using real data sets.

Assignment – 6

Module 5 – Deep Dive in Hive

1. Introduction to Hive

  • What Is Hive?
  • Hive Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive vs. Pig
  • Hive Use Cases
  • Interacting with Hive

2. Relational Data Analysis with Hive

  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Data Types
  • Joining Data Sets
  • Common Built-in Functions
  • Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue

3. Hive Data Management

  • Hive Data Formats
  • Creating Databases and Hive-Managed Tables
  • Loading Data into Hive
  • Altering Databases and Tables
  • Self-Managed Tables
  • Simplifying Queries with Views
  • Storing Query Results
  • Controlling Access to Data
  • Hands-On Exercise: Data Management with Hive

4. Hive Optimization

  • Understanding Query Performance
  • Partitioning
  • Bucketing
  • Indexing Data

5. Extending Hive

  • User-Defined Functions (a minimal UDF is sketched below)
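
A minimal sketch of a Hive UDF in Java, using the classic org.apache.hadoop.hive.ql.exec.UDF base class (the class and function names are illustrative; after packaging the class into a jar you would register it with ADD JAR and CREATE TEMPORARY FUNCTION):

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Usage in Hive (after ADD JAR / CREATE TEMPORARY FUNCTION):
//   SELECT lower_trim(name) FROM customers;
public class LowerTrimUDF extends UDF {
  public Text evaluate(Text input) {
    if (input == null) {
      return null; // Hive UDFs must tolerate NULLs
    }
    return new Text(input.toString().trim().toLowerCase());
  }
}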

6. Hands-on Exercises – Working with huge data sets and running extensive queries

7. User-Defined Functions, Optimizing Queries, Tips and Tricks for performance tuning

Assignment – 7

Module 6 – Introduction to HBase Architecture

  • What is HBase? (a client API sketch follows this list)
  • Where HBase fits in the Hadoop ecosystem
  • What is NoSQL?
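
A minimal sketch of reading and writing one HBase cell through the HBase 1.x-style Java client API (the table, column family and values are illustrative):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseHello {
  public static void main(String[] args) throws Exception {
    // Reads hbase-site.xml from the classpath for the ZooKeeper quorum etc.
    try (Connection conn = ConnectionFactory.createConnection(
             HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {

      // Write one cell: row "user1", column family "info", qualifier "city"
      Put put = new Put(Bytes.toBytes("user1"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"),
                    Bytes.toBytes("Bangalore"));
      table.put(put);

      // Read it back
      Result result = table.get(new Get(Bytes.toBytes("user1")));
      byte[] city = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
      System.out.println("city = " + Bytes.toString(city));
    }
  }
}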

Assignment -8

Module 7 – Hadoop Cluster Setup and Running MapReduce Jobs

  • Hadoop Multi-Node Cluster Setup using Amazon EC2 – creating a 4-node cluster
  • Running MapReduce jobs on the cluster

Module 8 – Major Project – Putting It All Together and Connecting the Dots

  • Working with large data sets and the steps involved in analyzing large data

Assignment – 9, 10

Module 9 – Advanced MapReduce

  • Delving Deeper Into The Hadoop API
  • More Advanced MapReduce Programming, Joining Data Sets in MapReduce
  • Graph Manipulation in Hadoop

Assignment – 11, 12

Module 10 – Impala

1. Introduction to Impala

  • What is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell

2. Choosing the Best (Hive, Pig, Impala)

Module 11 – ETL Connectivity with Hadoop Ecosystem

  • How ETL tools work in the Big Data industry
  • Connecting to HDFS from an ETL tool and moving data from a local system to HDFS
  • Moving data from a DBMS to HDFS
  • Working with Hive through an ETL tool
  • Creating a MapReduce job in an ETL tool
  • End-to-end ETL PoC showing Hadoop integration with an ETL tool

Module 12 – Hadoop Cluster Configuration

  • Hadoop configuration overview and important configuration files
  • Configuration parameters and values
  • HDFS parameters and MapReduce parameters
  • Hadoop environment setup
  • ‘Include’ and ‘Exclude’ configuration files

Lab: MapReduce Performance Tuning

Module 13 – Hadoop Administration and Maintenance

  • Namenode/Datanode directory structures and files
  • File system image and Edit log
  • The Checkpoint Procedure
  • Namenode failure and recovery procedure
  • Safe Mode
  • Metadata and Data backup
  • Potential problems and solutions / what to look for
  • Adding and removing nodes

Lab: MapReduce File system Recovery

Module 14 – Hadoop Monitoring and Troubleshooting

  • Best practices of monitoring a Hadoop cluster
  • Using logs and stack traces for monitoring and troubleshooting
  • Using open-source tools to monitor Hadoop cluster

Module 15 – Job Scheduling

  • How to schedule Hadoop Jobs on the same cluster
  • Default Hadoop FIFO Scheduler
  • Fair Scheduler and its configuration

Module 16 – Hadoop Multi-Node Cluster Setup and Running MapReduce Jobs on Amazon EC2

  • Hadoop Multi-Node Cluster Setup using Amazon EC2 – creating a 4-node cluster
  • Running MapReduce jobs on the cluster

Module 17 – ZooKeeper

  • ZooKeeper Introduction
  • ZooKeeper Use Cases
  • ZooKeeper Services
  • ZooKeeper Data Model
  • Znodes and their types
  • Znode operations (a client sketch follows this list)
  • Znode watches
  • Znode reads and writes
  • Consistency Guarantees
  • Cluster management
  • Leader Election
  • Distributed Exclusive Lock
  • Important points
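
A minimal sketch of basic znode operations (create, read, watch) with the ZooKeeper Java client; the connection string and znode path are placeholders:

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeDemo {
  public static void main(String[] args) throws Exception {
    CountDownLatch connected = new CountDownLatch(1);

    // Connect; the watcher fires once the session is established
    ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, new Watcher() {
      public void process(WatchedEvent event) {
        if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
          connected.countDown();
        }
      }
    });
    connected.await();

    // Create a persistent znode (fails if /demo already exists)
    zk.create("/demo", "hello".getBytes(),
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

    // Read it back and set a one-time watch for future changes
    byte[] data = zk.getData("/demo", event ->
        System.out.println("znode changed: " + event.getPath()), null);
    System.out.println(new String(data));

    zk.close();
  }
}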

Module 18 – Advanced Oozie

  • Why Oozie?
  • Installing Oozie
  • Running an example
  • Oozie workflow engine
  • Example MapReduce action
  • Word count example
  • Workflow application
  • Workflow submission (a client sketch follows this list)
  • Workflow state transitions
  • Oozie job processing
  • Oozie and Hadoop security
  • Why Oozie security?
  • Job submission to Hadoop
  • Multi-tenancy and scalability
  • Timeline of an Oozie job
  • Coordinator
  • Bundle
  • Layers of abstraction
  • Architecture
  • Use Case 1: time triggers
  • Use Case 2: data and time triggers
  • Use Case 3: rolling window
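
A minimal sketch of submitting and polling a workflow from Java with the OozieClient API; the Oozie URL and HDFS application path are placeholders, and the workflow.xml itself is assumed to already exist at that path:

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitDemo {
  public static void main(String[] args) throws Exception {
    // Oozie server URL is a placeholder
    OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

    // Job properties; the HDFS app path must contain a workflow.xml
    Properties conf = oozie.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH,
                     "hdfs://localhost:8020/user/demo/wordcount-wf");
    conf.setProperty("nameNode", "hdfs://localhost:8020");
    conf.setProperty("jobTracker", "localhost:8032"); // YARN RM on Hadoop 2

    // Submit and start the workflow, then poll its state
    String jobId = oozie.run(conf);
    while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
      Thread.sleep(10_000);
    }
    System.out.println("Workflow finished: " + oozie.getJobInfo(jobId).getStatus());
  }
}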

Module 19 – Advanced Flume

  • Apache Flume
  • Big data ecosystem
  • Physically distributed Data sources
  • Changing structure of Data
  • Closer look
  • Anatomy of Flume
  • Core concepts
  • Event
  • Clients
  • Agents
  • Source
  • Channels
  • Sinks
  • Interceptors
  • Channel selector
  • Sink processor
  • Data ingest
  • Agent pipeline
  • Transactional data exchange
  • Routing and replicating
  • Why channels?
  • Use case- Log aggregation
  • Adding flume agent
  • Handling a server farm
  • Data volume per agent
  • Example describing a single node flume deployment 

Module 20 – Advanced Hue

  • Hue introduction
  • Hue ecosystem
  • What is Hue?
  • Hue real-world view
  • Advantages of Hue
  • How to upload data in File Browser
  • Viewing content
  • Integrating users
  • Integrating HDFS
  • Fundamentals of the Hue frontend

Module 21 – Advanced Impala

  • Impala Overview: Goals
  • User view of Impala: Overview
  • User view of Impala: SQL
  • User view of Impala: Apache HBase
  • Impala architecture
  • Impala state store
  • Impala catalog service
  • Query execution phases
  • Comparing Impala to Hive

Testing

Module 22 – Hadoop Stack Integration Testing

  • Why Hadoop testing is important
  • Unit testing
  • Integration testing
  • Performance testing
  • Diagnostics
  • Nightly QA test
  • Benchmark and end-to-end tests
  • Functional testing
  • Release certification testing
  • Security testing
  • Scalability Testing
  • Commissioning and Decommissioning of Data Nodes Testing
  • Reliability testing
  • Release testing

Module 23 – Roles and Responsibilities in Hadoop Testing

  • Understanding the requirement; preparing the testing estimation, test cases, test data and test bed; test execution, defect reporting, defect retesting, daily status report delivery and test completion
  • ETL testing at every stage (HDFS, Hive, HBase) while loading the input (logs, files, records etc.) using Sqoop/Flume, including but not limited to data verification and reconciliation
  • User authorization and authentication testing (groups, users, privileges etc.)
  • Reporting defects to the development team or manager and driving them to closure
  • Consolidating all defects and creating defect reports
  • Validating new features and issues in core Hadoop

Module 24 – The MRUnit Framework for Testing MapReduce Programs

  • Unit-testing MapReduce programs with the MRUnit framework (see the sketch below)
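
A minimal MRUnit sketch that unit-tests the word-count mapper from the Module 1 example without needing a cluster (JUnit 4 style; the class names match that earlier sketch):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMapperTest {
  private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

  @Before
  public void setUp() {
    // Drives the mapper from the Module 1 word-count sketch
    mapDriver = MapDriver.newMapDriver(new WordCount.TokenizerMapper());
  }

  @Test
  public void emitsOnePairPerToken() throws Exception {
    mapDriver.withInput(new LongWritable(0), new Text("big data big"))
             .withOutput(new Text("big"), new IntWritable(1))
             .withOutput(new Text("data"), new IntWritable(1))
             .withOutput(new Text("big"), new IntWritable(1))
             .runTest(); // fails if the actual mapper output differs
  }
}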

Module 25 – Unit Testing

  • Automation testing using Oozie
  • Data validation using the Query Surge tool

Module 26 – Test Execution of Customized Hadoop

  • Test plan for HDFS upgrade
  • Test automation and results

Module 27 – Test Plan Strategy and Test Cases of Hadoop Testing

  • How to test installation and configuration

Module 28 – High Availability, Federation, YARN and Security

Module 29 – Job and Certification Support

  • Major project, Hadoop development, Cloudera certification tips and guidance, mock interview preparation, and practical development tips and techniques

Course Duration: 70 Hrs

High-quality interactive e-learning sessions for the self-paced course. For online instructor-led training, the total course is divided into sessions.

Hands on Exercise and Project Work: 90 Hrs

Each module is followed by practical assignments and lab exercises. Towards the end of the course, you will work on a project based on your learning. Our support team is available via email, phone or live support for any help you need.

Access Duration: Lifetime

You will get lifetime access to our high-quality interactive Learning Management System and the course material, including 24/7 access to video tutorials, along with online interactive session support from the trainer for issue resolution.

24 X 7 Support

We provide 24x7 support by email to resolve issues and clear doubts for self-paced training.

In online instructor-led training, the trainer is available to help you with queries regarding the course. If required, the support team can also provide live support by accessing your machine remotely. This ensures that all doubts and problems faced during labs and project work are clarified round the clock.

Get Certified

This course is designed to prepare you for the Cloudera Certified Developer for Apache Hadoop (CCDH) and Cloudera Certified Administrator for Apache Hadoop (CCAH) exams. At the end of the course there will be a quiz and project assignments; once you complete them, you will be awarded the NEXGEN Course Completion certificate.

Job Assistance

NEXGEN enjoys strong relationships with multiple staffing companies in the US and UK, and has 60+ clients across the globe. If you are looking to explore job opportunities, you can pass on your resume once you complete the course and we will help you with job assistance. We do not charge any extra fee for passing your resume to our partners and clients.

WHAT ARE VARIOUS BIG DATA HADOOP PROFESSIONAL TITLES?

  • Hadoop Architect: a professional who organizes, manages and governs Hadoop on very large clusters. A Hadoop Architect must have rich experience with Hive, HBase, MapReduce, Pig and so on.
  • Hadoop Developer: a person who loves programming and has knowledge of Core Java, SQL and other languages, along with remarkable coding skills.
  • Hadoop QA Professional: a person who tests and rectifies glitches in Hadoop.
  • Hadoop Administrator: a person who administers Hadoop and its database systems, with a good understanding of Hadoop principles and the underlying hardware.
  • Others: roles such as Hadoop trainer, Hadoop consultant, Hadoop engineer and senior Hadoop engineer, Big Data engineer, and Java engineers (DSE team).

WHAT PLATFORMS AND JAVA VERSIONS DOES HADOOP RUN ON?

Java 1.6.x or higher, preferably from Sun (see the Hadoop JavaVersions wiki page). Linux and Windows are the supported operating systems, but BSD, Mac OS X and OpenSolaris are known to work.

WHAT IS NEXGEN SELF-PACED TRAINING?

In the NEXGEN self-paced training program you will receive recorded sessions, course material, quizzes, related software and assignments. The courses are designed to give you real-world exposure and are focused on clearing the relevant certification exam. After completing the training you can take the quizzes, which let you check your knowledge, help you clear the relevant certification with higher marks/grades, and prepare you to work on the technology independently.

HOW LONG DO I HAVE ACCESS TO SELF-PACED COURSES?

Lifetime.

WHAT ARE THE BENEFITS OF NEXGEN SELF-PACED TRAINING?

All courses are highly interactive to provide good exposure, and you can learn at your own pace and in your leisure time. Self-paced training is priced 75% lower than online training. You will have lifetime access, so you can refer to the material anytime during your project work or job.

IS THERE ANY SAMPLE VIDEO I CAN SEE BEFORE ENROLLING TO THE COURSE?

Yes, you can see sample videos at the top of the course details page.

HOW SOON AFTER SIGNING UP WOULD I GET ACCESS TO THE LEARNING CONTENT?

As soon as you enroll in the course, your LMS (Learning Management System) access will be activated. You will immediately get access to our course content in the form of a complete set of previous class recordings, PPTs, PDFs and assignments, plus access to our 24x7 support team. You can start learning right away.

WILL I GET ASSISTANCE OR SUPPORT IN SELF-PACED COURSES?

You get 24/7 access to video tutorials and email support, along with online interactive session support with the trainer for issue resolution.

AT ANY STAGE, CAN I MOVE TO ONLINE TRAINING COURSE FROM SELF-PACED COURSE?

Yes, you can pay the difference between the online training and the self-paced course and be enrolled in the next online training batch.

WILL I GET THE SOFTWARE?

Yes, we will provide links to download the software, which is open source; for proprietary tools we will provide a trial version if one is available.

I AM NOT ABLE TO ACCESS THE ONLINE COURSE. WHOM SHOULD I CONTACT FOR A SOLUTION?

Please send us an email. You can also chat with us to get an instant solution.

HOW ARE YOUR VERIFIED CERTIFICATES AWARDED?

NEXGEN verified certificates are awarded based on successful completion of course projects. There is a set of quizzes after each course module that you need to go through. After successful submission, the official NEXGEN verified certificate will be given to you.

ARE THESE CLASSES CONDUCTED VIA LIVE VIDEO STREAMING?

Classes are conducted via live video streaming, where you get a chance to interact with the instructor by speaking, chatting and sharing your screen. You will always have access to the videos and PPTs. This gives you a clear insight into how the classes are conducted, the quality of the instructors and the level of interaction in class.

IS THERE ANY OFFER / DISCOUNT I CAN AVAIL?

Yes, we keep launching multiple offers; please see the offers page.

WHAT HAPPENS IF I DON'T CLEAR THE CERTIFICATION EXAM ON THE FIRST ATTEMPT?

We will help you with issues and doubts regarding the course, and you can attempt the quiz again.
