BEE

Biomedical Entity Extractor

 

LACAM @ Dipartimento di Informatica - Università degli Studi di Bari - Via Orabona, 4 - 70126 Bari

 


A short description

BEE is a system for the automatic annotation of biomedical texts. Its tagging methods are general: domain dependency is limited to thesauri specific to the biomedical domain. Additional domain-specific knowledge is acquired automatically by an Inductive Logic Programming system (ATRE) that works on logical representations of the textual content. Text is preprocessed with ANNIE, a component of the GATE (General Architecture for Text Engineering) infrastructure for information extraction. The JAPE (Java Annotation Patterns Engine) language, which is available in GATE, is also used to recognize regular expressions.
BEE supports users in:



The functional architecture

The architecture of the BEE system is based on three main layers: data acquisition and persistence, text processing, and text annotation. Data can be loaded from both the file system and a database. The text processing layer provides text segmentation, text cleaning and normalization, feature extraction (statistical, morphological, lexical, semantic and structural features) and the generation of logic descriptions for learning tasks. To perform feature extraction, BEE exploits the language engineering modules of the GATE framework. The annotation layer allows end users to provide learning examples and expert users to manage learning processes, in order to obtain a knowledge base suitable for automatic annotation tasks. Persistence of annotation results and learning operations is supported.

Figure: Functional architecture of BEE.
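
The steps of the text processing layer can be illustrated with a minimal Python sketch. It is not BEE's implementation: BEE relies on GATE modules for feature extraction, whereas this example uses only the Python standard library. The function names, the chosen features and the predicate names (`token`, `length`, `capitalized`, `has_digit`, `next`) are all assumptions made for illustration, as is the sample sentence.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    index: int  # 1-based position of the token in the text
    text: str


def normalize(text: str) -> str:
    """Text cleaning: collapse runs of whitespace and strip both ends."""
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str) -> list[Token]:
    """Text segmentation: split into word tokens and punctuation tokens."""
    pieces = re.findall(r"\w+|[^\w\s]", text)
    return [Token(i, p) for i, p in enumerate(pieces, start=1)]


def features(token: Token) -> dict[str, object]:
    """Feature extraction: a few simple morphological/lexical features."""
    return {
        "length": len(token.text),
        "capitalized": token.text[:1].isupper(),
        "has_digit": any(c.isdigit() for c in token.text),
    }


def to_facts(tokens: list[Token]) -> list[str]:
    """Logic description: render tokens as Prolog-style ground facts.

    The predicate names are hypothetical; they are not ATRE's vocabulary.
    """
    facts = []
    for tok in tokens:
        name = f"t{tok.index}"
        quoted = tok.text.replace("'", "\\'")  # escape quotes inside atoms
        facts.append(f"token({name}, '{quoted}').")
        for key, value in features(tok).items():
            facts.append(f"{key}({name}, {str(value).lower()}).")
        if tok.index > 1:
            # Structural feature: adjacency between consecutive tokens.
            facts.append(f"next(t{tok.index - 1}, {name}).")
    return facts


facts = to_facts(segment(normalize("Mutation  A3243G was\nfound.")))
for fact in facts:
    print(fact)
```

Representing each token as a constant with facts attached to it keeps the description relational, which is the kind of input an Inductive Logic Programming learner works on.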



The HmtDB application

This application concerns the annotation of the HmtDB resource of variability and clinical data associated with mitochondrial pathological phenotypes. These data are mostly available in the literature, where they are reported in a completely free style. In this scenario, the goal is to identify occurrences of specific biological objects (i.e., mutations) and their features (e.g., position in the DNA, nucleotides involved), as well as the method of analysis and some information on the subjects from whom the DNA was extracted (e.g., age, gender, nationality, pathologies of the sample). This problem is cast as an information extraction problem. More precisely, the user is asked to define sets of annotation classes and to manually annotate data sets of interest. Domain dictionaries can be imported into the system to support text processing operations. Management of learning sessions is delegated to the expert user, who can set parameters and launch experiments through the BEE interface. Learned rules are used to automatically annotate new texts. The result of automatic annotation is presented visually to the user, who supervises and corrects the annotations so that the learner can improve its performance. The figure shows the BEE interface tailored to this annotation task.

Figure: BEE user interface.
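
To make the extraction target concrete, the following Python sketch finds mutation mentions and their features (position and nucleotides involved) with a hand-written regular expression. It is only an illustration: in BEE the annotation rules are learned from user-annotated examples, and regular expressions are recognized through JAPE rather than Python. The assumed notation `<base><position><base>`, the names `MUTATION`, `Mutation` and `find_mutations`, and the sample sentence are all hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: substitutions are written as <base><position><base>, e.g. "A3243G".
MUTATION = re.compile(r"\b([ACGT])(\d{1,5})([ACGT])\b")

# Length of the human mitochondrial reference sequence, used as a sanity bound.
MTDNA_LENGTH = 16569


class Mutation(NamedTuple):
    start: int      # character offset where the mention begins
    end: int        # character offset just past the mention
    reference: str  # nucleotide before the substitution
    position: int   # position in the mitochondrial DNA
    variant: str    # nucleotide after the substitution


def find_mutations(text: str) -> list[Mutation]:
    """Return every mutation mention whose position fits the mtDNA length."""
    found = []
    for m in MUTATION.finditer(text):
        position = int(m.group(2))
        if 1 <= position <= MTDNA_LENGTH:
            found.append(
                Mutation(m.start(), m.end(), m.group(1), position, m.group(3))
            )
    return found


# Invented sample sentence, not taken from the HmtDB literature.
sample = (
    "The A3243G mutation was detected in a 34-year-old female; "
    "T8993G was absent."
)
mutations = find_mutations(sample)
for mutation in mutations:
    print(mutation)
```

Keeping the character offsets alongside the extracted features lets an annotation be shown in place in the text, which is what a user needs in order to review and correct it.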



Project team

Donato Malerba

Margherita Berardi

 


Related publications

(in reverse chronological order)



berardi@di.uniba.it


Last Update: Wed Jun 13 2007 05:13:33 GMT+0200 (CEST)