
Indexing and Mining Very Large Collections of Data Series with Varying Lengths

Michele Linardi - PhD student, michele.linardi at parisdescartes.fr
Themis Palpanas - Professor, themis at mi.parisdescartes.fr

Paris Descartes University

Data series are one of the most common data types and are present in virtually every scientific and social domain. Data series analytics (e.g., clustering, classification, frequent pattern mining, and outlier detection) is an important challenge for very large collections. Previous work has demonstrated that indexing techniques enable scalable analytics by providing fast similarity search. Nevertheless, these techniques only work when the indexed data series and the queries share a fixed, predetermined length, which is a major shortcoming.

In this work, we remove this constraint and present the first index that can efficiently support queries of varying length. The proposed index works for both normalized and non-normalized data series. We address two major challenges. First, we show how to effectively use the information that already resides within traditional indexes in order to answer queries of varying length without increasing the size of the index. Second, we provide an efficient method for dealing with normalized data series by grouping neighboring subsequences under a common representation, which leads to a small index footprint and fast query answering times. The empirical evaluation of the proposed technique demonstrates the effectiveness and efficiency of our solution.

In addition to the poster, which presents the theoretical background of our approach, we will demonstrate a prototype system that implements it. This system allows efficient exploration of big data series collections. Users can pose queries using their mouse (or touch screen), or select queries from a predefined set. The system can execute queries of varying lengths on large multi-gigabyte datasets in seconds, using a commodity laptop.
During our presentation/demonstration, we may also be able to show some relevant data and results from our collaboration with EDF (group of Dr. Georges Hebrail).
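
To make the problem setting concrete, the following is a minimal sketch of varying-length similarity search done by brute force, with and without z-normalization. It is not the index proposed in this work: it scans every subsequence, which is exactly the cost an index aims to avoid. The function names (`z_normalize`, `best_match`) and the choice of Euclidean distance are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def z_normalize(values: Sequence[float]) -> List[float]:
    """Rescale a subsequence to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # A constant subsequence has no shape; map it to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in values]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(series: Sequence[float],
               query: Sequence[float],
               normalize: bool = True) -> Tuple[int, float]:
    """Return (offset, distance) of the subsequence of `series` closest to `query`.

    The window length is taken from the query itself, so queries of any
    length up to len(series) are accepted. Ties go to the smallest offset.
    """
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query length must be between 1 and len(series)")
    q = z_normalize(query) if normalize else list(query)
    best_offset, best_dist = -1, math.inf
    for offset in range(len(series) - m + 1):
        window = series[offset:offset + m]
        w = z_normalize(window) if normalize else list(window)
        dist = euclidean(q, w)
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset, best_dist


if __name__ == "__main__":
    series = [0, 1, 2, 3, 10, 20, 30, 3, 2, 1, 0]
    # Same series, two different query lengths, raw values.
    print(best_match(series, [1, 2, 3], normalize=False))
    print(best_match(series, [10, 20, 30, 3], normalize=False))
    # With z-normalization, any evenly rising window of length 3 matches.
    print(best_match(series, [1, 2, 3], normalize=True))
```

With normalization enabled, every window must be rescaled separately for each query length, which hints at why normalized data series are called out as a distinct challenge above.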
