Summarization and interpolation

A short description  
The distribution package  
Related publications 
Authors & Acknowledgement


A short description

Emerging real-life applications, such as environmental compliance, ecological studies and meteorology, are characterized by real-time data acquisition through a number of (wireless) remote sensors. In operation, the remote sensors are installed across a spatially distributed network; they gather information along a number of attribute dimensions and periodically feed a central server with the measured data. The server is required to monitor these data, issue alarms where needed, and compute fast aggregates. Because data analysis requests submitted to the server may concern both present and past data, the server would have to store the entire stream. With massive streams (large networks and/or frequent transmissions), however, the limited storage capacity of the server may make it necessary to reduce the amount of data stored on disk. One way to address the storage limits is to compute summaries of the data as they arrive, discard the raw data, and later use the summaries to interpolate the discarded values. When a later request requires analysis of discarded data, the server reconstructs the data from the summaries stored in the database and processes them as requested.
To this end, we have designed two techniques. The summarization technique, called SUMATRA, segments the stream into windows, computes summaries window by window, and stores these summaries in a database. The interpolation technique, called TRECI, uses the inverse distance weighting approach to approximate observed data and to estimate missing data from trend clusters. Trend clusters are discovered as the summaries of each window: they are clusters of georeferenced data which vary according to a similar trend along the window time horizon.
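To make the inverse distance weighting idea concrete, the following is a minimal sketch of plain IDW over georeferenced readings. It is not the TRECI implementation: TRECI applies the idea to trend cluster summaries, whereas this sketch works directly on point values. The class name IdwSketch, the method interpolate, the array-based layout and the power parameter are all assumptions introduced for illustration and do not come from the distributed jar.

```java
/** Illustrative inverse distance weighting (IDW); hypothetical names, not the TRECI code. */
final class IdwSketch {

    /**
     * Estimates the value at (x, y) as a weighted average of the known values,
     * where each weight is 1 / distance^power.
     * xs, ys and values are parallel arrays describing the known readings.
     */
    static double interpolate(double[] xs, double[] ys, double[] values,
                              double x, double y, double power) {
        if (xs.length == 0 || xs.length != ys.length || xs.length != values.length) {
            throw new IllegalArgumentException("need non-empty arrays of equal length");
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double distance = Math.hypot(xs[i] - x, ys[i] - y);
            if (distance == 0.0) {
                // The target coincides with a known reading: return it directly
                // instead of dividing by zero.
                return values[i];
            }
            double weight = 1.0 / Math.pow(distance, power);
            weightedSum += weight * values[i];
            weightTotal += weight;
        }
        return weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        // Two sensors at equal distance from the target contribute equally.
        double[] xs = {0.0, 2.0};
        double[] ys = {0.0, 0.0};
        double[] values = {10.0, 20.0};
        System.out.println(interpolate(xs, ys, values, 1.0, 0.0, 2.0)); // prints 15.0
    }
}
```

A larger power makes the estimate depend more strongly on the nearest readings; the value 2.0 used in the usage example is a common default for IDW, not a setting taken from TRECI.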


The distribution package

Both SUMATRA and TRECI are implemented in a Java system, which interfaces with a MySQL database to read the network structure (nodes and arcs).

jar Description
SUMATRA/TRECI This rar bundle contains: (1) SumatraTreci.jar, which allows us to (i) perform trend cluster discovery in order to summarize a geophysical data stream and (ii) compute interpolations of the unknown data from the trend cluster summarization; (2) setup files; and (3) a benchmark data stream (NDBC).

Warning: Both SUMATRA and TRECI are free for evaluation, research and teaching purposes, but not for commercial purposes. 
Please Acknowledge




Related publications

Annalisa Appice, Anna Ciampi, Donato Malerba
Summarizing numeric spatial data streams by trend cluster discovery. Data Min. Knowl. Discov. 29(1): 84-136 (Online 2013, Printed 2015) 

Annalisa Appice, Anna Ciampi, Donato Malerba, Pietro Guccione
Using trend clusters for spatiotemporal interpolation of missing data in a sensor network. J. Spatial Information Science 6(1): 119-153 (2013)

Annalisa Appice, Anna Ciampi, Fabio Fumarola, Donato Malerba
Data Mining Techniques in Sensor Networks - Summarization, Interpolation and Surveillance. Springer Briefs in Computer Science, Springer 2014, ISBN 978-1-4471-5453-2, pp. I-XIII, 1-105


Project team


Name Email address Tel. number Fax
Annalisa Appice +39 080 5443262 +39 080 5443262
