Automated problem diagnosis for Hadoop

Speaker: Soila Pertet
Date: Wednesday, October 21st
Time: 12:00 noon - 1:00 pm
Location: Gates Hillman Complex 4405

Abstract:
Performance problems in software frameworks such as Hadoop, which
support long-running, parallelized, data-intensive computations, can
hamper cost-management efforts in cloud-computing environments. Manual
diagnosis does not scale in such environments because of the large number
of nodes and of performance metrics that must be analyzed on each
node. This talk provides an overview of our group’s research in
automated problem diagnosis (what we call "fingerpointing") for
Hadoop. We discuss three aspects of our research: (i) a
diagnosis approach that synthesizes resource usage data from the OS
and task-execution flows from the Hadoop logs to diagnose problems,
(ii) an automated, online diagnosis framework that transparently
extracts different time-varying data sources and implements our
diagnosis algorithms as plug-in modules, and (iii) visualization tools
for Hadoop that give programmers insight into the execution
patterns of their jobs. Our visualization tools have been checked into
the Hadoop repository under the Chukwa project.