Big Data

Big data is a field concerned with ways to analyze, systematically extract information from, or otherwise handle data sets that are too large or complex for traditional data-processing application software.

1. Intro with ZooKeeper

Limitations of existing solutions to the Big Data problem
How Hadoop solves the Big Data problem
IBM’s 4 Vs
Types of Data
Installation of Cloudera and VMware
How to set up a single-node Hadoop cluster
Describing the functions and features of HDP
Listing the IBM value-add components
Explaining what IBM Watson Studio is
Giving a brief description of the purpose of each of the value-add components
Exploring the lab environment
Describing and comparing the open-source programming languages Pig and Hive
Listing the characteristics of programming languages typically used by Data Scientists: R and Python
Understanding the challenges posed by distributed applications and how ZooKeeper is designed to handle them.
Explaining the role of ZooKeeper within the Apache Hadoop infrastructure and the realm of Big Data management.
Exploring generic use cases and some real-world scenarios for ZooKeeper.
Defining the ZooKeeper services that are used to manage distributed systems.
Exploring and using the ZooKeeper CLI to interact with ZooKeeper services.
Understanding how Apache Slider works in conjunction with YARN to deploy distributed applications and to monitor them.

2. HDFS Architecture

Hadoop Ecosystem
Linux-based commands (how to work with local file system commands)
Hadoop Commands
Sqoop
- Sqoop intro
- How to display tables from a MySQL RDBMS with Sqoop?
- How to display databases from a MySQL RDBMS with Sqoop?
- How to import all tables from a specific MySQL database into HDFS (Hadoop)?
- How to import data from a MySQL RDBMS into HDFS (Hadoop)?
- How to export data from HDFS to a MySQL RDBMS?
- How to import part of a table from a MySQL RDBMS into HDFS?
Hive
- Hive concepts
- Hive Data types
- Hive Background
- About Hive
- Hive Architecture and Components
- Metastore in Hive
- Limitations of Hive
- Comparison with Traditional Databases

3. PIG

What is Pig?
Pig Run Modes
Pig Latin Concepts
Pig Data Types
Pig Example
GROUP Operator
COGROUP Operator
Joins

4. HBASE

What is HBase?
HBase Model
HBase Read
HBase Write
HBase MemStore
RDBMS vs HBase
HBase Commands
HBase Example

5. MapReduce

Input Splits in MapReduce
Combiner & Partitioner
What are the file input formats in Hadoop (MapReduce)?
What type of key-value pair is generated when the input format is KeyValueTextInputFormat?
Can we set the required number of mappers and reducers?
Difference between the old and new APIs in MapReduce
What is the importance of the RecordReader in Hadoop?
MapReduce with Word Count Example
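
As a rough preview of the word count topic, the sketch below imitates the map, shuffle, and reduce phases in plain Python on a single machine. It is not Hadoop code and is not taken from the course material. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are illustrative assumptions only.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from HDFS.
    sample = ["big data is big", "hadoop handles big data"]
    print(word_count(sample))
```

In a real cluster the mapper and reducer run as separate tasks on different nodes, and the framework performs the grouping step between them. Here everything runs in one process purely to show the data flow.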

6. Big SQL

Overview of Big SQL
Understanding how Big SQL fits in the Hadoop architecture
Starting and stopping Big SQL using Ambari and the command line
Connecting to Big SQL using the command line
Connecting to Big SQL using IBM Data Server Manager
Configuring images
Starting Hadoop components
Starting up the Big SQL and DSM services
Connecting to Big SQL using JSqsh
Executing basic Big SQL statements