Big Data

Big data is the field concerned with ways to analyze, systematically extract information from, or otherwise handle data sets that are too large or complex for traditional data-processing application software.

1. Intro with ZooKeeper

  • The limitations of existing solutions to the Big Data problem
  • How Hadoop solves the Big Data problem
  • IBM’s 4V’s
  • Types of Data
  • Installation of Cloudera and VMware
  • How to set up a single-node Hadoop cluster
  • Describing the functions and features of HDP
  • Listing the IBM value-add components
  • Explaining what IBM Watson Studio is
  • Giving a brief description of the purpose of each of the value-add components
  • Exploring the lab environment
  • Describing and comparing the open source programming languages, Pig and Hive
  • Listing the characteristics of programming languages typically used by Data Scientists: R and Python
  • Understanding the challenges posed by distributed applications and how ZooKeeper is designed to handle them.
  • Explaining the role of ZooKeeper within the Apache Hadoop infrastructure and the realm of Big Data management.
  • Exploring generic use cases and some real-world scenarios for ZooKeeper.
  • Defining the ZooKeeper services that are used to manage distributed systems.
  • Exploring and using the ZooKeeper CLI to interact with ZooKeeper services.
  • Understanding how Apache Slider works in conjunction with YARN to deploy distributed applications and to monitor them.

2. HDFS Architecture

  • Hadoop Ecosystem
  • Linux-based commands (how to work with local file system commands)
  • Hadoop Commands
  • Sqoop
    • Sqoop intro
    • How to display tables from RDBMS (MySQL) using Sqoop?
    • How to display databases from RDBMS (MySQL) using Sqoop?
    • How to import all tables from a specific database from RDBMS (MySQL) to HDFS (Hadoop)?
    • How to import data from RDBMS (MySQL) to HDFS (Hadoop)?
    • How to export data from HDFS to RDBMS (MySQL)?
    • How to import part of a table from RDBMS (MySQL) to HDFS?
  • Hive
    • Hive concepts
    • Hive Data types
    • Hive Background
    • About Hive
    • Hive Architecture and Components
    • Metastore in Hive
    • Limitations of Hive
    • Comparison with Traditional Databases

3. PIG

  • What is Pig?
  • Pig Run Modes
  • Pig Latin Concepts
  • Pig Data Types
  • Pig Example
  • Group Operator
  • COGROUP Operator
  • Joins
  • COGROUP

4. HBASE

  • What is HBase
  • HBase Model
  • HBase Read
  • HBase Write
  • HBase MemStore
  • RDBMS vs HBase
  • HBase Commands
  • HBase Example

5. Map Reduce

  • Input Splits in MapReduce
  • Combiner & Partitioner
  • What are the file input formats in Hadoop (MapReduce)?
  • What type of key-value pair is generated when the file format is KeyValueTextInputFormat?
  • Can we set the required number of mappers and reducers?
  • Difference between the old and new APIs in MapReduce
  • What is the importance of the RecordReader in Hadoop?
  • MapReduce with a Word Count example

6. Big SQL

  • Overview of Big SQL
  • Understanding how Big SQL fits in the Hadoop architecture
  • Starting and stopping Big SQL using Ambari and the command line
  • Connecting to Big SQL using the command line
  • Connecting to Big SQL using IBM Data Server Manager
  • Configuring images
  • Starting Hadoop components
  • Starting up the Big SQL and DSM services
  • Connecting to Big SQL using JSqsh
  • Executing basic Big SQL statements