Symbolic Objects K-Nearest Neighbour classifier
A short description
SONN is a prototype system for the classification of Symbolic Objects (SOs) by means of the K-NN algorithm.
The problem solved by SONN can be formally stated as follows:
Given
a training set of SOs described by p symbolic variables V1,V2,...,Vp and by a single-value variable C (which represents the class),
test example SOs described by the same variables,
Classify each test example SO by finding its associated class C'.
SOs are aggregated data described by the triple (Y,R,d) where:
Y is the symbolic variables set (Y1,Y2,...,Yp) describing the SO;
d, also called the intension, is the set of symbolic descriptions (d1,d2,...,dp), each of which indicates the value or the set of values taken by the corresponding variable;
R is the set of relations (R1,R2,...,Rp), each of which indicates, by means of a comparison operator (=,<,>,...), the link between a variable and its corresponding description.
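The triple (Y,R,d) can be illustrated with a small Python sketch. This is only an assumed in-memory representation for a Boolean SO, not the format SONN reads; the class name `SymbolicObject`, its field names and the mushroom variables are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolicObject:
    """A Boolean SO as the triple (Y, R, d). Hypothetical representation."""
    variables: tuple      # Y = (Y1, ..., Yp), variable names
    relations: tuple      # R = (R1, ..., Rp), comparison operators as strings
    descriptions: tuple   # d = (d1, ..., dp), each a frozenset of admissible values

    def __post_init__(self):
        # Y, R and d must describe the same p variables.
        if not (len(self.variables) == len(self.relations) == len(self.descriptions)):
            raise ValueError("Y, R and d must all have the same length p")

    def events(self):
        """Return the p elementary events as (Yi, Ri, di) tuples."""
        return list(zip(self.variables, self.relations, self.descriptions))


# Illustrative SO for a group of mushrooms (made-up variables and values).
so = SymbolicObject(
    variables=("cap_color", "odor"),
    relations=("=", "="),
    descriptions=(frozenset({"brown", "white"}), frozenset({"none"})),
)

for name, relation, values in so.events():
    print(name, relation, sorted(values))
```

A Probabilistic SO could be represented in the same way by replacing each frozenset with a mapping from values to probabilities.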
There are two main kinds of SOs: Boolean and Probabilistic. In the latter case, a probability distribution is associated with each description. The SONN system works on both types of SO. The classical K-NN algorithm has been extended so that it can be applied to this new kind of data. The most important features of the extended version compared with the classical one are: the automated selection of the optimal K by cross-validation; local distance weighting; the use of non-Euclidean dissimilarity measures between SOs; and the output, for each test example, of the list of all classes with an associated probability (a symbolic modal variable) instead of a single class value.
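As a rough illustration of distance weighting and of the probabilistic output, the following Python sketch classifies a Boolean SO and returns a probability for every class. It rests on assumptions that do not come from SONN: the dissimilarity is a simple sum of Jaccard distances between value sets (a stand-in for the measures in DissDLL.dll), neighbours are weighted by inverse dissimilarity, and all function names and data are hypothetical.

```python
def dissimilarity(a, b):
    """Stand-in measure: sum over variables of the Jaccard distance
    between the two value sets (not one of SONN's own measures)."""
    total = 0.0
    for var in a:
        union = a[var] | b[var]
        if union:
            total += 1.0 - len(a[var] & b[var]) / len(union)
    return total


def knn_class_probabilities(train, query, k, eps=1e-9):
    """train is a list of (description, class_label) pairs.
    Returns {class: probability} over all classes seen in train."""
    # The k training SOs closest to the query under the dissimilarity.
    neighbours = sorted(train, key=lambda item: dissimilarity(item[0], query))[:k]
    # Every class appears in the output, possibly with probability 0.
    weights = {label: 0.0 for _, label in train}
    for desc, label in neighbours:
        # Assumed weighting scheme: closer neighbours count more.
        weights[label] += 1.0 / (dissimilarity(desc, query) + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Made-up training SOs: each description maps a variable to a set of values.
train = [
    ({"cap": {"brown"}, "odor": {"none"}}, "edible"),
    ({"cap": {"brown", "white"}, "odor": {"none"}}, "edible"),
    ({"cap": {"red"}, "odor": {"foul"}}, "poisonous"),
]
query = {"cap": {"brown"}, "odor": {"none", "almond"}}

probs = knn_class_probabilities(train, query, k=3)
print(probs)
```

With k=3 the two edible neighbours are closer to the query than the poisonous one, so "edible" receives the larger share of the probability mass while "poisonous" still appears in the output.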
The SONN architecture is very simple, as is its operation. It is a wizard application that guides the user through the selection of all the parameters needed for the classification. First the user selects the input file containing the SOs, then the class, the symbolic variables of interest for the classification, the percentage of training examples that the system chooses at random among all the SOs, the number of folds used to find the optimal K and, finally, the dissimilarity measure used to evaluate the K nearest neighbours among the training examples.
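The role of the training percentage and of the number of folds can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SONN's implementation: the helper names are hypothetical, the candidate values of K are chosen arbitrarily, and a toy majority-vote classifier on plain numbers stands in for the SO classifier.

```python
import random
from collections import Counter


def split_train_test(items, train_fraction, rng):
    """Randomly pick the given fraction of items as training examples."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def make_folds(items, n_folds):
    """Partition items into n_folds disjoint folds."""
    return [items[i::n_folds] for i in range(n_folds)]


def select_k(train, n_folds, candidate_ks, classify):
    """Return (best_k, accuracy) estimated by n-fold cross-validation."""
    folds = make_folds(train, n_folds)
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        correct = 0
        for i, held_out in enumerate(folds):
            # Train on every fold except the held-out one.
            rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
            correct += sum(classify(rest, desc, k) == label
                           for desc, label in held_out)
        acc = correct / len(train)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


def toy_classify(train, query, k):
    """Stand-in classifier: majority vote among the k nearest numbers."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up data: numbers labelled "low" or "high".
items = [(x, "low" if x < 10 else "high") for x in range(20)]
train, test = split_train_test(items, 0.7, random.Random(0))
best_k, best_acc = select_k(train, n_folds=2, candidate_ks=[1, 3, 5],
                            classify=toy_classify)
print(len(train), len(test), best_k, best_acc)
```

In a real run the `classify` argument would be the SO classifier using the dissimilarity measure chosen in the wizard, and the selected K would then be used to classify the held-back test examples.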
Experiment: “Mushroom data set”
The problem is to classify different mushroom families into two categories: poisonous and non-poisonous.
There are 2 experiments, with different types of inputs:
Boolean SOs
Input files for SONN
Probabilistic SOs
Input files for SONN
Experiment: “Dermatology data set”
The problem is to classify groups of patients according to skin diseases.
There are 2 experiments, with different types of inputs:
Boolean SOs
Input files for SONN
Probabilistic SOs
Input files for SONN
The problem is to classify groups of individuals according to two income bands.
There are 2 experiments, with different types of inputs:
Boolean SOs
Input files for SONN
Probabilistic SOs
Input files for SONN
SONN.exe. The SONN system
DissDLL.dll The library of dissimilarity measures
To use the SONN system, download the executable file and the library into the same folder. Then start the system and choose one of the downloaded input files.
Warning: the SONN system is free for evaluation, research and teaching purposes, but not for commercial purposes.
Please Acknowledge
F. Esposito, D. Malerba, & V. Tamma (2000). Dissimilarity Measures for Symbolic Objects. Chapter 8.3 in H.-H. Bock and E. Diday (Eds.), Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data, Series: Studies in Classification, Data Analysis, and Knowledge Organization, vol. 15, Springer-Verlag: Berlin, 165-185.
D. Malerba, F. Esposito, V. Gioviale & V. Tamma (2001). Comparing dissimilarity measures in Symbolic Data Analysis. Proceedings of the Joint Conferences on "New Techniques and Technologies for Statistics" and "Exchange of Technology and Know-how" (ETK-NTTS'01), 473-481.
D. Malerba, F. Esposito, M. Monopoli (2002). Comparing dissimilarity measures for probabilistic symbolic objects. In A. Zanasi, C. A. Brebbia, N.F.F. Ebecken, P. Melli (Eds.) Data Mining III, Series Management Information Systems, Vol 6, 31-40, WIT Press, Southampton, UK.
C. D'Amato, D. Malerba, F. Esposito, M. Monopoli (2003). Extending the K-Nearest Neighbour classification algorithm to symbolic objects. Convegno Scientifico Intermedio SIS, 9-11 Giugno 2003, Università degli Studi di Napoli "Federico II".
The SONN system has been implemented within the context of the following projects: