SONN

Symbolic Objects K-Nearest Neighbour classifier

 

A short description  
Architecture of the system  
Experiment: “Mushroom data set”   
Experiment: “Dermatology data set”   
Experiment: “Adult data set”
Using SONN system
Related publications
Acknowledgments

 

A short description

SONN is a prototype system that classifies Symbolic Objects (SOs) by means of the K-NN algorithm.

The problem solved by SONN can be formally stated as follows:

Given

Classify each test example SO by finding its associated class C'.

SOs are aggregated data described by the triple (Y,R,d) where:

There are two main kinds of SOs: Boolean and probabilistic. In the latter case, a probability distribution is associated with each description. The SONN system works on both types of SOs. The classical K-NN algorithm has been extended so that it can be applied to this new kind of data. The most important features of the extended version compared with the classical one are: automated selection of the optimal K by cross-validation; local distance weighting; the use of non-Euclidean dissimilarity measures between SOs; and the output, for each test example, of the list of all classes with an associated probability (a symbolic modal variable) instead of a single class value.
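The distance weighting and the probabilistic output described above can be illustrated with a minimal Python sketch. It is not SONN's implementation: the function names, the representation of a Boolean SO as a dictionary of value sets, the mean Jaccard dissimilarity and the 1/distance weighting are all assumptions chosen for illustration, and the measures actually shipped with SONN may differ.

```python
from collections import defaultdict


def jaccard_dissimilarity(a, b):
    """Mean Jaccard distance over the variables of two Boolean SOs.

    Each SO is assumed to be a dict mapping a variable name to the set of
    values it admits. Illustrative non-Euclidean measure only.
    """
    total = 0.0
    for var in a:
        union = a[var] | b[var]
        inter = a[var] & b[var]
        total += 1.0 - (len(inter) / len(union) if union else 1.0)
    return total / len(a)


def knn_class_probabilities(test_so, training, k, dissimilarity, eps=1e-9):
    """Return {class: probability} computed from the k nearest training SOs.

    `training` is a list of (so, class_label) pairs. Each neighbour votes
    with weight 1 / (distance + eps), so closer neighbours count more.
    """
    neighbours = sorted(
        ((dissimilarity(test_so, so), label) for so, label in training),
        key=lambda pair: pair[0],
    )[:k]
    weights = defaultdict(float)
    for dist, label in neighbours:
        weights[label] += 1.0 / (dist + eps)
    total = sum(weights.values())
    classes = {label for _, label in training}
    # Classes with no neighbour among the k nearest get probability 0.
    return {c: weights[c] / total for c in classes}


# Hypothetical toy data: four training SOs and one test SO.
training = [
    ({"cap": {"red", "brown"}, "odor": {"foul"}}, "poisonous"),
    ({"cap": {"red"}, "odor": {"foul", "musty"}}, "poisonous"),
    ({"cap": {"white"}, "odor": {"none"}}, "edible"),
    ({"cap": {"white", "brown"}, "odor": {"none", "almond"}}, "edible"),
]
test_so = {"cap": {"red"}, "odor": {"foul"}}
probs = knn_class_probabilities(test_so, training, 3, jaccard_dissimilarity)
```

Here `probs` plays the role of the modal output: every class receives a probability, and the probabilities sum to one, rather than a single class label being returned.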

 



Architecture of the system

The SONN architecture is very simple, as is its operation. It is a wizard application that guides the user through the selection of all the parameters needed for classification. First the user selects the input file containing the SOs, then the class variable, the symbolic variables of interest for the classification, the percentage of training examples that the system chooses randomly among all the SOs, the number of folds used to find the optimal K and, finally, the dissimilarity measure used to evaluate the K nearest training examples.
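The parameters collected by the wizard map onto a small pipeline: a random training split, followed by a cross-validated search for K. The Python sketch below shows one way this could look. The function names, the plain majority vote, the candidate values of K and the toy data are assumptions made for illustration and are not taken from SONN itself.

```python
import random
from collections import Counter


def jaccard(a, b):
    """Jaccard distance between two non-empty sets (illustrative measure)."""
    return 1.0 - len(a & b) / len(a | b)


def split_training(examples, train_fraction, rng):
    """Randomly pick the given fraction of examples as the training set."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def predict(query, training, k, dissimilarity):
    """Majority class among the k training examples nearest to `query`."""
    nearest = sorted(training, key=lambda ex: dissimilarity(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def select_k(training, candidate_ks, n_folds, dissimilarity):
    """Return (K, accuracy) for the K with the best cross-validated accuracy."""
    folds = [training[i::n_folds] for i in range(n_folds)]
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        correct = 0
        for i, held_out in enumerate(folds):
            # Train on every fold except the one currently held out.
            rest = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
            correct += sum(
                predict(x, rest, k, dissimilarity) == y for x, y in held_out
            )
        acc = correct / len(training)
        if acc > best_acc:  # ties keep the smaller, earlier K
            best_k, best_acc = k, acc
    return best_k, best_acc


# Hypothetical toy data: two classes whose descriptions never overlap.
examples = [
    ({"a", "b"}, "A"), ({"a", "c"}, "A"), ({"a", "b", "c"}, "A"),
    ({"a"}, "A"), ({"a", "d"}, "A"),
    ({"x", "y"}, "B"), ({"x", "z"}, "B"), ({"x"}, "B"),
    ({"x", "y", "z"}, "B"), ({"x", "w"}, "B"),
]
train, test = split_training(examples, 0.8, random.Random(0))
best_k, best_acc = select_k(train, [1, 3, 5], 4, jaccard)
```

On this cleanly separable toy data K = 1 already classifies every held-out example correctly, so the search settles on it; on real data the cross-validated accuracy would differ between candidate values of K.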



Experiment: “Mushroom data set”

The problem is to classify different mushroom families into two categories: poisonous and non-poisonous.
There are 2 experiments, with different types of inputs:



Experiment: “Dermatology data set”

The problem is to classify groups of patients according to skin diseases.
There are 2 experiments, with different types of inputs:



Experiment: “Adult data set”

The problem is to classify groups of individuals according to two income bands.
There are 2 experiments, with different types of inputs:



Using SONN system

SONN.exe: the SONN system

DissDLL.dll: the library of dissimilarity measures

To use the SONN system, download the executable file and the library into the same folder. Then start the system and choose one of the downloaded input files.

Warning: the SONN system is free for evaluation, research and teaching purposes, but not for commercial use.
Please Acknowledge



Related publications



Acknowledgments

The SONN system has been implemented within the context of the following projects: