Sketching Techniques for Tracking Distributed Data Streams

Minos Garofalakis

There is growing interest in algorithms for querying and analyzing continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous sites needs to be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure.

This talk will provide a short overview of key small-space pseudo-random sketching techniques for effective query processing over streaming data, focusing on some of my recent work on communication-efficient continuous query tracking over distributed data streams.