Upcoming Talks

The Large-Scale Lunch is a monthly lunchtime presentation and discussion about the use of cluster, distributed, parallel, and other large-scale computing methods to solve current applied computer science research problems. The LSL is intended to help spread knowledge about how to do research that involves massive amounts of data and/or computation on modern, multiprocessor computing equipment. The ideal talk provides two types of insight:

  • insight into the basic research problem, why it is important, and how it was addressed, and
  • insight into the large-scale implementation problem, and how it was addressed.

Talks are typically informal, and are followed (or perhaps interrupted) by questions and discussion. A typical talk has about 30 minutes of content, leaving time for discussion and getting lunch.

This series is sponsored by Yahoo! and open to all interested members of the SCS/ECE/etc. communities. We hope these events will be especially useful for those currently using or planning to use Hadoop on the M45 cluster.

We will not be sending regular announcements of future events to these lists, so if you'd like to be notified of upcoming events, please join our mailing list. To subscribe, send mail to

     large-scale-request@mailman.srv.cs.cmu.edu

with message body "subscribe". Alternatively, go to the list information page:

     https://mailman.srv.cs.cmu.edu/mailman/listinfo/large-scale

and subscribe there. Also, all information about upcoming events will be displayed on our website.


Automated Problem Diagnosis for Hadoop

Speaker: Prof. Priya Narasimhan
Date: Thursday, July 28
Time: 12:00 noon - 13:00
Location: University Center - Rangos 1
Abstract: Localizing performance problems (what we call "fingerpointing") is essential for distributed systems such as Hadoop that support long-running, parallelized, data-intensive computations over a large cluster of nodes. Manual fingerpointing does not scale in such environments because of the number of nodes and the number of performance metrics to be analyzed on each node. ASDF is an automated, online fingerpointing framework that transparently extracts and parses different time-varying data sources (e.g., sysstat, Hadoop logs) on each node, and implements multiple techniques (e.g., log analysis, correlation, clustering) to analyze these data sources jointly or in isolation. ASDF is intended to run transparently to, and not require any modifications of, both the hosted applications and the middleware (e.g., Hadoop) itself. ASDF should be deployable in production environments, where administrators might not have the luxury of instrumenting applications but could instead leverage other (black-box) data or existing system logs. We describe ASDF's online fingerpointing for documented performance problems in Hadoop, under different workloads. Our preliminary results indicate that ASDF incurs an average monitoring overhead of 0.38% of CPU time, and exhibits average online fingerpointing latencies of less than 1 minute with false-positive rates of less than 1%. Publications related to our fingerpointing research are available at http://www.ece.cmu.edu/~fingerpointing.
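
To give a rough feel for black-box fingerpointing, the sketch below flags nodes whose metric differs sharply from that of their peers. It is only an illustration and is not ASDF's algorithm: the function name, the node names, the sample values, and the choice of a median-absolute-deviation test are all assumptions made for this example.

```python
from statistics import median


def fingerpoint(metric_by_node, threshold=3.0):
    """Return the nodes whose metric deviates from the cluster median by
    more than `threshold` times the median absolute deviation (MAD).

    Hypothetical helper for illustration only; not part of ASDF.
    """
    values = list(metric_by_node.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half of the nodes report the same value, so any
        # node that differs from the median is treated as suspect.
        return [n for n, v in metric_by_node.items() if v != med]
    return [n for n, v in metric_by_node.items()
            if abs(v - med) / mad > threshold]


# Made-up CPU utilization samples (percent), one per node.
samples = {"node1": 41.0, "node2": 39.5, "node3": 40.2,
           "node4": 93.7, "node5": 38.9}
suspects = fingerpoint(samples)
print(suspects)
```

The sketch assumes that all nodes run a similar workload, so a healthy node should look like the median of its peers. The median and MAD are used rather than the mean and standard deviation so that a single misbehaving node does not distort the baseline it is compared against.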
