SONN

Symbolic Objects K-Nearest Neighbour classifier

 

A short description  
Architecture of the system  
Experiment: “Mushroom data set”   
Experiment: “Dermatology data set”   
Experiment: “Adult data set”
Using SONN system
Related publications
Acknowledgments

 

A short description

SONN is a prototype system for the classification of Symbolic Objects (SOs) by means of the K-NN algorithm.

The problem solved by SONN can be formally stated as follows:

Given

  • a training set of SOs described by p symbolic variables V1,V2,...,Vp and by a single-value variable C (which represents the class),

  • a set of test SOs described by the same variables,

classify each test SO by finding its associated class C'.

SOs are aggregated data described by the triple (Y,R,d) where:

  • Y is the set of symbolic variables (Y1,Y2,...,Yp) describing the SO;

  • d, also called intension, is the set of symbolic descriptions (d1,d2,...,dp), each of which gives the value or the set of values taken by the corresponding variable;

  • R is the set of relations (R1,R2,...,Rp), each of which links a variable to its corresponding description by means of a comparison operator (=,<,>,...).
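As a rough illustration of the triple (Y,R,d), a Boolean SO could be held in a small Python structure like the one below. This is only a sketch: the names `Event`, `SymbolicObject`, `RELATIONS` and `covers`, the relation symbols, and the example variables are all hypothetical and are not taken from SONN.

```python
from dataclasses import dataclass
import operator
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping from relation symbols (R_i) to comparison functions.
RELATIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "in": lambda value, allowed: value in allowed,
}


@dataclass(frozen=True)
class Event:
    variable: str      # Y_i: name of the symbolic variable
    relation: str      # R_i: key into RELATIONS
    description: Any   # d_i: a value or a set of values


@dataclass(frozen=True)
class SymbolicObject:
    events: Tuple[Event, ...]

    def covers(self, individual: Dict[str, Any]) -> bool:
        """Return True if the individual satisfies every event [Y_i R_i d_i]."""
        return all(
            RELATIONS[e.relation](individual[e.variable], e.description)
            for e in self.events
        )


# Made-up example: cap colour is brown or white, stalk height is at most 12.
so = SymbolicObject((
    Event("cap_colour", "in", frozenset({"brown", "white"})),
    Event("stalk_height", "<=", 12.0),
))
print(so.covers({"cap_colour": "brown", "stalk_height": 9.5}))  # True
print(so.covers({"cap_colour": "red", "stalk_height": 9.5}))    # False
```

A probabilistic SO would additionally attach a probability distribution to each description; that case is not covered by this sketch.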

There are two main kinds of SOs: Boolean and Probabilistic. In the latter case, a probability distribution is associated with each description. The SONN system works on both types of SOs.

The classical K-NN algorithm has been extended so that it can be applied to this new kind of data. The most important features of the extended version with respect to the classical one are: the automated selection of the optimal K on the basis of cross-validation; local distance weighting; the use of non-Euclidean dissimilarity measures between SOs; and the output, for each test example, of the list of all classes with an associated probability (a symbolic modal variable) instead of a single class value.
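The extensions to K-NN described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not SONN's implementation: each Boolean SO is reduced to a dictionary of non-empty value sets, the mean Jaccard distance stands in for the dissimilarity measures shipped with SONN, the inverse-distance weighting is one common choice, and all names (`dissimilarity`, `class_probabilities`, `select_k`, `make_so`) and the toy data are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

SO = Dict[str, FrozenSet[str]]   # variable name -> non-empty set of admitted values
Example = Tuple[SO, str]         # (symbolic object, class label)


def dissimilarity(a: SO, b: SO) -> float:
    """Mean Jaccard distance over the variables (illustrative stand-in measure)."""
    return sum(1 - len(a[v] & b[v]) / len(a[v] | b[v]) for v in a) / len(a)


def class_probabilities(train: List[Example], query: SO, k: int) -> Dict[str, float]:
    """Distance-weighted vote of the k nearest neighbours, normalised to sum to 1."""
    nearest = sorted(
        ((dissimilarity(so, query), label) for so, label in train),
        key=lambda pair: pair[0],
    )[:k]
    votes: Dict[str, float] = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-9)   # closer neighbours weigh more
    total = sum(votes.values())
    return {label: weight / total for label, weight in votes.items()}


def select_k(train: List[Example], candidates: List[int], folds: int) -> int:
    """Return the candidate k with the highest cross-validated accuracy."""
    def accuracy(k: int) -> float:
        hits = 0
        for f in range(folds):
            held_out = train[f::folds]   # every folds-th example forms one fold
            rest = [ex for i, ex in enumerate(train) if i % folds != f]
            for so, label in held_out:
                probs = class_probabilities(rest, so, k)
                hits += max(probs, key=probs.get) == label
        return hits / len(train)
    return max(candidates, key=accuracy)


def make_so(colour, odour) -> SO:
    return {"colour": frozenset(colour), "odour": frozenset(odour)}


# Made-up training set, loosely inspired by the mushroom experiment.
train = [
    (make_so({"brown"}, {"none"}), "edible"),
    (make_so({"brown", "white"}, {"none", "almond"}), "edible"),
    (make_so({"white"}, {"almond"}), "edible"),
    (make_so({"red"}, {"foul"}), "poisonous"),
    (make_so({"red", "yellow"}, {"foul", "pungent"}), "poisonous"),
    (make_so({"yellow"}, {"pungent"}), "poisonous"),
]
k = select_k(train, candidates=[1, 3], folds=3)
probs = class_probabilities(train, make_so({"brown"}, {"almond"}), k)
print(k, probs)
```

The function returns a probability for every class found among the neighbours rather than a single label, which mirrors the modal output described above; a caller that needs one class can take the label with the largest probability.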

 



Architecture of the system

SONN's architecture is very simple, as is its operation. It is a wizard-style application that guides the user through the selection of all the parameters needed for the classification. The user first selects the input file containing the SOs, then the class variable, the symbolic variables of interest for the classification, the percentage of SOs that the system draws at random as training examples, the number of cross-validation folds used to find the optimal K and, finally, the dissimilarity measure used to identify the K nearest training examples.
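The parameters collected by the wizard, and the random choice of training examples, might be modelled as in the following Python sketch. Everything here is an assumption made for illustration: the `RunSettings` class, its field names, the `random_split` helper, the file name and the measure name are placeholders, and SONN's actual input format and internal structures are not described on this page.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RunSettings:
    input_file: str              # file containing the SOs
    class_variable: str          # single-valued variable holding the class
    variables: List[str]         # symbolic variables used for classification
    training_percentage: float   # share of SOs drawn at random for training
    folds: int                   # cross-validation folds used to search for K
    dissimilarity: str           # name of the chosen dissimilarity measure

    def validate(self) -> None:
        if not 0 < self.training_percentage < 100:
            raise ValueError("training_percentage must lie strictly between 0 and 100")
        if self.folds < 2:
            raise ValueError("cross-validation needs at least 2 folds")
        if self.class_variable in self.variables:
            raise ValueError("the class variable cannot also be a predictor")


def random_split(items: Sequence, training_percentage: float,
                 seed: int = 0) -> Tuple[list, list]:
    """Shuffle the items and cut them into (training, test) lists."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    cut = round(len(items) * training_percentage / 100)
    return ([items[i] for i in indices[:cut]],
            [items[i] for i in indices[cut:]])


settings = RunSettings(
    input_file="symbolic_objects.dat",     # placeholder file name
    class_variable="class",
    variables=["cap_colour", "odour"],
    training_percentage=70,
    folds=5,
    dissimilarity="placeholder_measure",   # placeholder measure name
)
settings.validate()
train, test = random_split(list(range(10)), settings.training_percentage)
print(len(train), len(test))  # 7 3
```

Fixing the seed makes the random split reproducible, which is convenient when comparing dissimilarity measures on the same training and test examples.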



Experiment: “Mushroom data set”

The problem is to classify different families of mushrooms into two categories: poisonous and non-poisonous.
There are 2 experiments, with different types of inputs:



Experiment: “Dermatology data set”

The problem is to classify groups of patients according to skin diseases.
There are 2 experiments, with different types of inputs:



Experiment: “Adult data set”

The problem is to classify groups of individuals according to two income bands.
There are 2 experiments, with different types of inputs:



Using SONN system

SONN.exe: the SONN system

DissDLL.dll: the library of dissimilarity measures

To use the SONN system, download the executable file and the library into the same folder. Then start the system and choose one of the downloaded input files.

Warning: the SONN system is free for evaluation, research and teaching purposes, but not for commercial purposes.
Please Acknowledge



Related publications

  • F. Esposito, D. Malerba, & V. Tamma (2000). Dissimilarity Measures for Symbolic Objects. Chapter 8.3 in H.-H. Bock and E. Diday (Eds.), Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data, Series: Studies in Classification, Data Analysis, and Knowledge Organization, vol. 15, Springer-Verlag, Berlin, 165-185.

  • D. Malerba, F. Esposito, V. Gioviale & V. Tamma (2001). Comparing dissimilarity measures in Symbolic Data Analysis. Proceedings of the Joint Conferences on "New Techniques and Technologies for Statistics" and "Exchange of Technology and Know-how" (ETK-NTTS'01), 473-481.

  • D. Malerba, F. Esposito, M. Monopoli (2002). Comparing dissimilarity measures for probabilistic symbolic objects. In A. Zanasi, C. A. Brebbia, N.F.F. Ebecken, P. Melli (Eds.) Data Mining III, Series Management Information Systems, Vol 6, 31-40, WIT Press, Southampton, UK.

  • C. D'Amato, D. Malerba, F. Esposito, M. Monopoli (2003). Extending the K-Nearest Neighbour classification algorithm to symbolic objects. Convegno Scientifico Intermedio SIS, 9-11 Giugno 2003, Università degli Studi di Napoli "Federico II".



Acknowledgments

The SONN system has been implemented within the context of the following projects:

 
