MTA

MeSH Term Associator

 

LACAM @ Dipartimento di Informatica - Università degli Studi di Bari - Via Orabona, 4 - 70126 Bari
 
A short description  
The functional architecture  
The distribution package  
Project team  
Related publications 

A short description

MTA is a data mining tool that discovers association rules in biomedical text corpora. It imports MeSH (Medical Subject Headings) taxonomies together with a set of abstracts published on MedLine, and discovers associations at different levels of abstraction (generalized association rules). The set of discovered rules can be structured, and uninteresting rules filtered out, either automatically or semi-automatically. The automatic approach filters rules without using user knowledge, while the semi-automatic approach exploits the user's domain knowledge to guide the exploration of the discovered rules. Discovered association rules can be imported and exported in PMML. Similarities between discovered association rules can be explored visually through a multidimensional analysis technique.
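To illustrate what "different levels of abstraction" can mean in practice, here is a minimal sketch of support and confidence for a flat rule and for a generalized rule. It is not MTA's implementation: the child-to-parent links in TAXONOMY, the sample ABSTRACTS and all function names are invented for the example, and the hierarchy is a toy one rather than the actual MeSH tree. The sketch assumes the common approach of extending each transaction with the ancestors of its terms before counting.

```python
# Toy child -> parent links (assumption for illustration; NOT the real MeSH tree).
TAXONOMY = {
    "Aspirin": "Anti-Inflammatory Agents",
    "Ibuprofen": "Anti-Inflammatory Agents",
    "Migraine": "Headache Disorders",
    "Tension Headache": "Headache Disorders",
}


def ancestors(term, taxonomy):
    """Return all ancestors of a term by following parent links upward."""
    found = []
    while term in taxonomy:
        term = taxonomy[term]
        found.append(term)
    return found


def extend(transaction, taxonomy):
    """Add every ancestor of every term to the transaction."""
    extended = set(transaction)
    for term in transaction:
        extended.update(ancestors(term, taxonomy))
    return frozenset(extended)


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_stats(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    sup_a = support(a, transactions)
    sup_both = support(both, transactions)
    confidence = sup_both / sup_a if sup_a else 0.0
    return sup_both, confidence


# Each abstract is reduced to the set of terms it mentions (made-up data).
ABSTRACTS = [
    {"Aspirin", "Migraine"},
    {"Ibuprofen", "Tension Headache"},
    {"Aspirin", "Tension Headache"},
    {"Ibuprofen"},
]
EXTENDED = [extend(t, TAXONOMY) for t in ABSTRACTS]

# Flat rule between leaf terms.
flat = rule_stats({"Aspirin"}, {"Migraine"}, EXTENDED)
# Generalized rule between their parent concepts.
general = rule_stats({"Anti-Inflammatory Agents"}, {"Headache Disorders"}, EXTENDED)

print("flat:", flat)        # (0.25, 0.5)
print("general:", general)  # (0.75, 0.75)
```

On this toy data the generalized rule has higher support than the flat one, because it aggregates evidence from several specific terms under the same parent concept.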



The functional architecture

The MTA architecture follows the standard KDD (Knowledge Discovery in Databases) process, which consists of the following steps:

  • Data Collection. MTA is integrated in a distributed framework which interfaces with the remote PubMed database through the IBM Web Services for Life Sciences. A user query is run directly, and the list of relevant abstracts is returned and downloaded.
  • Data Selection and Pre-processing. This step involves operations to prepare both data to be mined and data to be used as background knowledge.
  • Data Mining. The mining step performs both flat and generalized association rule discovery among the abstracts returned by a PubMed query. Discovered association rules capture recurrent patterns in texts that may reveal relations among biomedical concepts.
  • Interpretation and Evaluation. Since the number of discovered association rules is usually high and most of them do not meet user expectations, several filtering and browsing techniques are available. There are four main criteria: rule templates, rule covers, statistical rating and specificity. Rule templates let the end user specify knowledge of interest that rules should or should not match. Rule covers select groups of redundant rules, while statistical rating identifies statistically interesting rules. Finally, specificity lets the user view the set of discovered rules as a set of subspaces of rules, where a representative rule can be identified for each subspace.
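The filtering idea in the last step can be sketched as follows. This is only an illustration under stated assumptions, not MTA's actual filtering code: the Rule class, the template format (a pair of term sets that must appear in the antecedent and consequent), the use of a confidence threshold as a stand-in for statistical rating, and the sample rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float


def matches(rule, template):
    """A template is an (antecedent_terms, consequent_terms) pair.

    The rule matches when it contains the template's terms on the
    corresponding sides; an empty set places no constraint on that side.
    """
    ante, cons = template
    return ante <= rule.antecedent and cons <= rule.consequent


def filter_rules(rules, include=(), exclude=(), min_confidence=0.0):
    """Keep rules that match some inclusive template (if any are given),
    match no exclusive template, and reach the confidence threshold."""
    kept = []
    for rule in rules:
        if include and not any(matches(rule, t) for t in include):
            continue
        if any(matches(rule, t) for t in exclude):
            continue
        if rule.confidence < min_confidence:
            continue
        kept.append(rule)
    return kept


# Made-up rules over placeholder term names.
r1 = Rule(frozenset({"TermA"}), frozenset({"TermB"}), 0.30, 0.90)
r2 = Rule(frozenset({"TermA", "TermC"}), frozenset({"TermB"}), 0.20, 0.80)
r3 = Rule(frozenset({"TermD"}), frozenset({"TermB"}), 0.40, 0.40)
r4 = Rule(frozenset({"TermA"}), frozenset({"TermE"}), 0.10, 0.95)

# Rules should mention TermA in the antecedent...
include = [(frozenset({"TermA"}), frozenset())]
# ...and should not conclude TermE.
exclude = [(frozenset(), frozenset({"TermE"}))]

kept = filter_rules([r1, r2, r3, r4], include, exclude, min_confidence=0.5)
print(kept == [r1, r2])  # True
```

Here r3 is dropped because it matches no inclusive template, and r4 because it matches the exclusive one. A rule-cover step, which the sketch leaves out, would go further and treat rules such as r1 and r2 as candidates for redundancy.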



  [Figure: MTA architecture]

    A framework for MTA in PubMed query expansion tasks.



    The distribution package

    MTA runs under Windows 98 or later.
    Download the distribution package (mta.zip, 60.5 MB) and unzip it into a temporary directory.
    Sample datasets are provided in an MS Access database. MeSH taxonomies are stored in a separate MS Access database.
    See the User Guide for further details about system requirements, installation and usage of the system.

    Warning: MTA is free for evaluation, research and teaching purposes, but not for commercial use.

    Please Acknowledge



    Project team

    Prof. Donato Malerba

    Margherita Berardi

    Corrado Loglisci



    Related publications

    (in reverse chronological order)

     


    berardi@di.uniba.it


    Last Update: Tue Apr 10 2007 14:21:36 GMT+0200 (CEST)