13 November

Alfred Inselberg: Visualization and Data Mining for High Dimensional Datasets

Invited by David Holcman

The seminar by Alfred Inselberg (School of Mathematical Sciences, Tel Aviv University) will take place from 2:00 pm to 3:30 pm in the Salle Favard, IBENS, 46 rue d’Ulm, 75005 Paris.

A dataset with M items has 2^M subsets, any one of which may be the one satisfying our objectives. With a good data display and interactivity, our fantastic pattern-recognition ability can not only cut great swaths through this combinatorial explosion but also extract insights from the visual patterns. These are the core reasons for data visualization. With parallel coordinates, the search for relations in multivariate datasets is transformed into a 2-D pattern recognition problem. Guidelines and strategies for knowledge discovery are illustrated on several real datasets (financial, process control, credit score, and one with hundreds of variables) with stunning results. A geometric classification algorithm with low computational complexity provides the classification rule explicitly and visually. The minimal set of variables (features) required to state the rule is found and ordered by predictive value. Multivariate relations can be modeled as hypersurfaces and used for decision support. A model of a (real) country’s economy reveals sensitivities, the impact of constraints, trade-offs, and economic sectors unknowingly competing for the same resources. A smart display for intensive care units determines the patient’s state from the interaction of many variables. An overview of the methodology provides foundational understanding: learning the patterns corresponding to various multivariate relations. These patterns are robust in the presence of errors, which is good news for applications. A topology of proximity emerges, opening the way for visualization in Big Data.
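
As a minimal sketch of the parallel-coordinates idea mentioned in the abstract (an illustration only, not code from the talk), the Python below maps an N-dimensional point to a polyline across N parallel vertical axes and checks the classical point-line duality in the 2-D case: all points on a line x2 = m·x1 + b, with m ≠ 1, become segments passing through one common point. The function names `to_polyline`, `dual_point` and `segment_y_at`, and the unit axis spacing, are assumptions made for this example.

```python
def to_polyline(point, spacing=1.0):
    """Vertices of the polyline representing an N-dimensional point.

    Axis i is the vertical line x = i * spacing; the i-th coordinate
    of the point is plotted as the height on that axis.
    """
    return [(i * spacing, value) for i, value in enumerate(point)]


def dual_point(m, b, spacing=1.0):
    """Common intersection point of the segments representing the
    points of the line x2 = m * x1 + b (two axes, m != 1)."""
    if m == 1:
        # Slope-1 lines map to parallel segments with no finite intersection.
        raise ValueError("slope 1 has no finite dual point")
    return (spacing / (1 - m), b / (1 - m))


def segment_y_at(point2d, x, spacing=1.0):
    """Height at horizontal position x of the (extended) segment
    joining (0, x1) on the first axis to (spacing, x2) on the second."""
    x1, x2 = point2d
    return x1 + (x2 - x1) * x / spacing


if __name__ == "__main__":
    # A 3-D point becomes a polyline with one vertex per axis.
    print(to_polyline((3, 1, 4)))

    # Points sampled on the line x2 = -2 * x1 + 3.
    m, b = -2.0, 3.0
    px, py = dual_point(m, b)
    for a in (0.0, 1.0, 2.0):
        y = segment_y_at((a, m * a + b), px)
        # Every segment reaches the same height py at x = px.
        print(a, abs(y - py) < 1e-9)
```

The linear relation between two variables thus shows up as a single visual feature (a crossing point), which is the kind of 2-D pattern the abstract refers to.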