MTA

MeSH Term Associator

 

LACAM @ Dipartimento di Informatica - Università degli Studi di Bari - Via Orabona, 4 -70126 Bari
 
A short description  
The functional architecture  
The distribution package  
Project team  
Related publications 

A short description

MTA is a data mining tool that discovers association rules in biomedical text corpora. It imports a selection of MeSH (Medical Subject Headings) taxonomies together with a set of abstracts published on MedLine, and discovers associations at different levels of abstraction (generalized association rules). Both automatic and semiautomatic approaches can be applied to structure the set of discovered rules and filter out uninteresting ones. In the automatic approach, rules are filtered without using user knowledge; in the semiautomatic approach, the user's domain knowledge guides the exploration of the set of discovered rules. Discovered association rules can be imported and exported in PMML. Similarities between discovered association rules can be explored visually through a multidimensional analysis technique.
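To make the idea of rules "at different levels of abstraction" concrete, the sketch below uses one common technique: each transaction is extended with the taxonomy ancestors of its items, so that support and confidence can also be computed for higher-level terms. Whether MTA follows this exact procedure is not stated on this page. The toy taxonomy, the sample abstracts and all function names are illustrative assumptions, and the parent links are simplified rather than copied from the real MeSH tree.

```python
# Hypothetical toy taxonomy (child -> parent); simplified, not the real MeSH tree.
PARENT = {
    "Breast Neoplasms": "Neoplasms",
    "Lung Neoplasms": "Neoplasms",
    "BRCA1 Protein": "Tumor Suppressor Proteins",
    "Tumor Suppressor Protein p53": "Tumor Suppressor Proteins",
}


def ancestors(term):
    """Return every ancestor of `term` in the toy taxonomy."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def extend(transaction):
    """Add the ancestors of each item to the transaction."""
    extended = set(transaction)
    for term in transaction:
        extended |= ancestors(term)
    return extended


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(body, head, transactions):
    """Estimate P(head | body) as support(body + head) / support(body)."""
    body_support = support(body, transactions)
    if body_support == 0:
        return 0.0
    return support(set(body) | set(head), transactions) / body_support


# One transaction per abstract: the MeSH terms annotated in it (made-up data).
ABSTRACTS = [
    {"Breast Neoplasms", "BRCA1 Protein"},
    {"Lung Neoplasms", "Tumor Suppressor Protein p53"},
    {"Breast Neoplasms", "Tumor Suppressor Protein p53"},
    {"Lung Neoplasms"},
]
EXTENDED = [extend(t) for t in ABSTRACTS]

# Flat rule: holds in only one abstract out of four.
flat = support({"Breast Neoplasms", "BRCA1 Protein"}, EXTENDED)
# Generalized rule: the same pattern one level up holds in three abstracts.
general = support({"Neoplasms", "Tumor Suppressor Proteins"}, EXTENDED)
```

In this toy data the specific rule is rare while its generalization is frequent, which is the kind of pattern that mining over a taxonomy is meant to surface.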



The functional architecture

The architecture developed in the MTA context follows the standard KDD (Knowledge Discovery in Databases) process. It consists of the following steps:

  • Data Collection. MTA is integrated in a distributed framework that interfaces with the remote PubMed database through the IBM Web Services for Life Sciences. A user query is run directly against PubMed, and the list of relevant abstracts is returned and downloaded.
  • Data Selection and Pre-processing. This step involves operations to prepare both data to be mined and data to be used as background knowledge.
    • Input data consist of sets of abstracts of scientific publications returned by PubMed queries. Texts are annotated by the BioTeKS Text Analysis Engine (TAE) provided within the IBM UIM Architecture, using a local dictionary of MeSH terms. Feature selection techniques are then used to choose the relevant items (i.e., MeSH terms). Each query generates a single table of a relational database, in which each transaction corresponds to an individual abstract and the attributes correspond to the selected MeSH terms.
    • Background knowledge consists of MeSH hierarchies. Supported operations include converting taxonomies to the MTA format and selecting the portions of a taxonomy that are of interest by means of pruning and recovering operations.
  • Data Mining. The mining step performs both flat and generalized association rule discovery among the abstracts returned by a PubMed query. Discovered association rules capture recurrent patterns in texts that may reveal relations among biomedical concepts.
  • Interpretation and Evaluation. Since the number of discovered association rules is usually high and most of them are not interesting enough to meet user expectations, some filtering and browsing techniques are available. There are four main criteria: rule templates, rule covers, statistical rating and specificity. Rule templates allow the end user to specify knowledge of interest that rules should or should not match. Rule covers select groups of redundant rules, while statistical rating identifies statistically interesting rules. Finally, specificity allows the set of discovered rules to be viewed as a set of subspaces of rules, where a representative rule can be identified for each subspace.
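As an illustration of the Interpretation and Evaluation step, the sketch below combines two of the criteria listed above: a rule template (terms that a rule must or must not contain) and a statistical rating. Lift is used here as an assumed stand-in for the rating, since this page does not say which measure MTA applies. The `Rule` class, the function names and the sample rules with their numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for an association rule: body -> head."""
    body: frozenset
    head: frozenset
    support: float        # fraction of abstracts matching body and head
    confidence: float     # support(body and head) / support(body)
    head_support: float   # fraction of abstracts matching the head alone

    @property
    def lift(self):
        # Lift > 1 means body and head co-occur more often than by chance.
        return self.confidence / self.head_support


def matches_template(rule, must_include=frozenset(), must_exclude=frozenset()):
    """True if the rule has every required term and no excluded term."""
    items = rule.body | rule.head
    return set(must_include) <= items and not (set(must_exclude) & items)


def filter_rules(rules, must_include=frozenset(), must_exclude=frozenset(),
                 min_lift=1.0):
    """Keep rules that match the template and reach `min_lift`, best first."""
    kept = [
        r for r in rules
        if matches_template(r, must_include, must_exclude) and r.lift >= min_lift
    ]
    return sorted(kept, key=lambda r: r.lift, reverse=True)


# Made-up rules and statistics, for illustration only.
RULES = [
    Rule(frozenset({"Neoplasms"}), frozenset({"Tumor Suppressor Proteins"}),
         support=0.30, confidence=0.60, head_support=0.40),
    Rule(frozenset({"Neoplasms"}), frozenset({"Humans"}),
         support=0.50, confidence=0.90, head_support=0.95),
    Rule(frozenset({"Breast Neoplasms"}), frozenset({"BRCA1 Protein"}),
         support=0.10, confidence=0.50, head_support=0.10),
]

# Template: ignore rules mentioning the very frequent term "Humans".
interesting = filter_rules(RULES, must_exclude={"Humans"}, min_lift=1.2)
```

Here the rule involving "Humans" is dropped by the template, and the remaining rules are ordered by decreasing lift. Rule covers and specificity are not shown, because they require comparing rules with each other rather than scoring them one at a time.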



  • MTA architecture

    A framework for MTA in PubMed query expansion tasks.



    The distribution package

    MTA is an application that runs under Windows 98 or later. 
    Download the distribution package (mta.zip, 60.5 MB) and unzip it into a temporary directory. 
    Sample datasets are available in an MS Access database. MeSH taxonomies are stored in a separate MS Access database. 
    See the User Guide for further details about system requirements, installation and usage of the system.

    Warning: MTA is free for evaluation, research and teaching purposes, but not for commercial use.

    Please Acknowledge



    Project team

        Project Leader

    Prof. Donato Malerba

        LACAM Staff

    Margherita Berardi
    Corrado Loglisci

        Previous members

    Saverio D'Alessandro



    Related publications

    (in reverse chronological order)

    • M. Berardi, A. Appice, C. Loglisci, & P. Leo (2006). Supporting Visual Exploration of Discovered Association Rules Through Multi-Dimensional Scaling. In: F. Esposito, Z. W. Ras, D. Malerba, G. Semeraro (Eds.): Foundations of Intelligent Systems, 16th International Symposium, ISMIS 2006, Bari, Italy, September 27-29, 2006. Springer, LNCS 4203, 369-378.
    • M. Berardi, M. Lapi, P. Leo, & C. Loglisci (2005). Mining Generalized Association Rules on Biomedical Literature. In: M. Ali, F. Esposito (Eds.): Innovations in Applied Artificial Intelligence, 18th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2005, Bari, Italy, June 22-24, 2005, Proceedings. Springer-Verlag, LNCS 3533, 500-509.
    • M. Berardi, D. Malerba, C. Marinelli, P. Leo, C. Loglisci, & G. Scioscia (2005). A Text-Mining application able to mine association rules from biomedical texts. Annual Meeting of the Bioinformatic Italian Society, BITS 2005. Milan, Italy, March 17-19, 2005.
    • M. Berardi, M. Lapi, P. Leo, D. Malerba, C. Marinelli, & G. Scioscia (2004). A data mining approach to PubMed query refinement. 2nd International Workshop on Biological Data Management (BIDM 2004), in conjunction with DEXA 2004, Zaragoza, Spain, September 2, 2004, IEEE Computer Society, 401-405.
    • M. Berardi, M. Lapi, P. Leo, D. Malerba, C. Marinelli, & G. Scioscia (2004). A data mining approach for disease-genes relationship discovery in biomedical literature. KDNet Symposium on Knowledge-Based Services for the Public Sector: workshop on "Knowledge-based systems and services for health care". Bonn, Germany, June 3-4, 2004.

     


    berardi@di.uniba.it


    Last Update: Tue Apr 10 2007 14:21:36 GMT+0200 (CEST)
