
Advanced MapReduce Programming

Duration: 5 Days

Course Background

This workshop explores patterns and paradigms used in more advanced MapReduce programming. The programming language used is Java. The course can be tailored to include Python and Ruby MapReduce techniques if needed. The emphasis is on understanding the algorithms used to explore very large data sets and how they are realised as MapReduce applications.

Course Prerequisites and Target Audience

Attendees are expected to have a good working knowledge of Java and Hadoop as well as a working knowledge of MapReduce programming.

Course Outline

  • Intensive overview of Hadoop and MapReduce
  • Functional Programming Aspects of MapReduce
    • Mappers and Reducers
    • The Execution Framework
    • Partitioners and Combiners
    • The Distributed File System
    • Hadoop Cluster Architecture
  • Designing MapReduce Algorithms - By Example
    • Local Aggregation
    • Pairs and Stripes
    • Computing Relative Frequencies
    • Secondary Sorting
    • Relational Joins - Reduce-Side, Map-Side and Memory-Backed Joins
  • Inverted Indexing and Text Retrieval
    • Crawling the Web
    • Introduction to Inverted Indexes
    • Implementation of Inverted Indexes
    • Index compression
  • Graph Algorithms and MapReduce
    • Graph Representations
    • Parallel Breadth-First Search
    • PageRank Applications
    • Limitations of Graph Processing using MapReduce
  • Text Data Mining and Expectation Maximisation (EM)
    • Overview of Expectation Maximisation - Concepts and Theory
    • Hidden Markov Models (HMMs)
    • HMM Training in MapReduce
    • Gradient-Based Optimisation and Log-Linear Models
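
As a flavour of the "Local Aggregation" topic in the outline above, the following is a minimal sketch of in-mapper combining applied to word counting. It is written in plain Java with no Hadoop dependency, so the class name, method names and sample input are illustrative assumptions rather than course material or Hadoop API calls. Each simulated mapper accumulates counts in a local map and emits one partial count per distinct word, rather than one (word, 1) pair per token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, framework-free illustration of local aggregation.
public class LocalAggregationSketch {

    // Simulated mapper with in-mapper combining: counts are accumulated
    // locally and returned once per input split.
    static Map<String, Integer> mapWithLocalAggregation(List<String> split) {
        Map<String, Integer> partial = new HashMap<>();
        for (String line : split) {
            for (String token : line.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    partial.merge(token, 1, Integer::sum);
                }
            }
        }
        return partial;
    }

    // Simulated reducer: sums the partial counts produced by all mappers.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two input splits, each handled by its own simulated mapper.
        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(mapWithLocalAggregation(List.of("to be or not to be")));
        partials.add(mapWithLocalAggregation(List.of("to see or not to see")));
        System.out.println(reduce(partials));
    }
}
```

In this sketch the first mapper emits four partial counts instead of six (word, 1) pairs. On a real cluster, the saving comes from shuffling fewer intermediate records across the network, at the cost of holding the local map in mapper memory.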