Big Data/Big Insight


Understanding Big Data and Hadoop



Learning Objectives – In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, the anatomy of a file write and read, and how the MapReduce framework works.

Hadoop Architecture and HDFS

Learning Objectives – In this module, you will learn the Hadoop HDFS architecture, the important configuration files in a Hadoop cluster, data loading techniques, how to set up a single-node Hadoop cluster, and how to work with local file system commands.


Why we need Sqoop, how to list the databases in MySQL (RDBMS) with Sqoop, how to list the tables in a MySQL database with Sqoop, how to import all tables from a specific MySQL database into HDFS (Hadoop), how to import individual tables from MySQL into HDFS, how to export data from HDFS to MySQL, and how to import part of a table from MySQL into HDFS.

Hadoop MapReduce Framework

Learning Objectives – In this module, you will understand the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will understand concepts such as input splits in MapReduce, the Combiner and Partitioner, the file input formats available in Hadoop MapReduce, what type of key-value pairs are generated when the input format is KeyValueTextInputFormat, whether the required number of mappers and reducers can be set, the Word Count job implementation in Hadoop, how to debug the Word Count job, the differences between the old and new MapReduce APIs, and the importance of the RecordReader in Hadoop.
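As a rough illustration of the Word Count job named above, the following is a minimal sketch in plain Python rather than the Hadoop Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example; the sketch imitates the map, shuffle, and reduce steps in memory and does not run on a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group emitted values by key and order the keys,
    imitating what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases over an in-memory list of lines (assumed input)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big insight", "Big Data"]))
```

In a real Hadoop job the mapper and reducer would be separate classes and the grouping step would be handled by the framework; here everything runs in one process purely to show the data flow.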


Learning Objectives – This module will help you understand Hive concepts, Hive data types, and loading and querying data in Hive. Topics include Hive background, about Hive, Hive vs. Pig, Hive architecture and components, the Metastore in Hive, limitations of Hive, comparison with traditional databases, Hive data types and data models, how to load data into a Hive table in Hadoop, partitions and buckets, Hive tables (managed tables and external tables), querying data, managing outputs, and single-table and multi-table insertion.


Learning Objectives – In this module, you will learn Pig, the types of use cases Pig is suited to, Pig Latin scripting, Pig running modes, Pig streaming, and testing Pig scripts. Topics include Pig data types, shell and utility commands, Pig Latin relational operators, file loaders, the GROUP operator, the COGROUP operator, and joins and COGROUP.

What is HBase, HBase Model, HBase Read, HBase Write, HBase MemStore, HBase Installation, RDBMS vs HBase, HBase Commands, HBase Example

Oozie and Hadoop Project

Learning Objectives – In this module, you will understand how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. We will discuss multiple data sets and the specifications of the project. This module will also cover a Flume and Sqoop demo, the Apache Oozie workflow scheduler for Hadoop jobs, and Hadoop and Talend integration.


Introduction to Spark – Getting started, what Spark is and what its purpose is, components of the Spark unified stack, Resilient Distributed Dataset (RDD), downloading and installing Spark standalone, Scala overview, launching and using Spark’s Scala shell.

Resilient Distributed Datasets and DataFrames – Understand how to create parallelized collections and external datasets, work with Resilient Distributed Dataset (RDD) operations, utilize shared variables and key-value pairs, Spark application programming, understand the purpose and usage of the SparkContext, initialize Spark with the various programming languages, describe and run some Spark examples.


Introduction to Flume, uses of Flume, Flume architecture, Flume Master, Flume Collectors, Flume Agents, Flume with a Twitter example.






Topics – Flume and Sqoop Demo, Oozie, Oozie Components, Oozie Workflow, Scheduling with Oozie, Pig, Hive, and Sqoop, Hadoop Project Demo.