Biomedical Entity Extractor
A short description
The functional architecture
The HmtDB application
BEE is a system for the automatic annotation of biomedical texts. The implemented tagging methods are general, and domain dependency is limited to specific thesauri of the biomedical domain. Additional domain-specific knowledge is acquired automatically by an Inductive Logic Programming system (ATRE) that works on logical representations of the textual content. Text is preprocessed with ANNIE, a component of the GATE (General Architecture for Text Engineering) infrastructure for information extraction. The JAPE (Java Annotation Patterns Engine) language, which is available in GATE, is also used to recognize patterns expressed as regular expressions.
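As a rough illustration of thesaurus-based tagging, the following minimal Python sketch looks up dictionary terms in a text and returns labelled spans. It is not BEE's implementation and does not use GATE or ANNIE; the thesaurus entries, the annotation class names and the function `tag_with_thesaurus` are all hypothetical.

```python
import re

# Hypothetical mini-thesaurus: surface term -> annotation class.
# Neither the terms nor the class names come from BEE.
THESAURUS = {
    "mitochondrial dna": "Molecule",
    "mutation": "BiologicalObject",
    "sequencing": "AnalysisMethod",
}


def tag_with_thesaurus(text, thesaurus=THESAURUS):
    """Return sorted (start, end, surface, label) tuples for every thesaurus hit."""
    annotations = []
    for term, label in thesaurus.items():
        # Word boundaries avoid matching a term inside a longer word.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append((match.start(), match.end(), match.group(0), label))
    return sorted(annotations)


sample = "A mutation in mitochondrial DNA was found by sequencing."
annotations = tag_with_thesaurus(sample)
for start, end, surface, label in annotations:
    print(f"{start:>3}-{end:<3} {label:<18} {surface}")
```

In this sketch only the dictionary carries domain knowledge, while the matching code is generic. That mirrors the statement that domain dependency is limited to the thesauri.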
BEE supports users in:
The architecture of the BEE system is based on three main layers: data acquisition/persistence, text processing and text annotation. Data can be loaded from either the file system or a database. The text processing layer includes functionalities for text segmentation, text cleaning and normalization, feature extraction (statistical, morphological, lexical, semantic and structural features) and the generation of logic descriptions for learning tasks. To perform feature extraction, BEE exploits the language engineering modules of the GATE framework. The annotation layer allows final users to provide learning examples and expert users to manage learning processes, in order to obtain a suitable knowledge base for automatic annotation tasks. Persistence of annotation results and learning operations is supported.
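The steps of the text processing layer can be sketched, under strong simplifications, as a small Python pipeline: segmentation, normalization, feature extraction and generation of a logic description. This is only an illustration. The predicate names (`token`, `length`, `has_digit`, `capitalized`, `follows`) and all function names are hypothetical, the output is not ATRE's actual input format, and BEE itself relies on GATE modules rather than on code like this.

```python
import re


def segment(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def normalize(sentence):
    """Collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", sentence).strip()


def token_features(token):
    """Compute a few simple morphological features of a token."""
    return {
        "length": len(token),
        "has_digit": any(c.isdigit() for c in token),
        "capitalized": token[0].isupper(),
    }


def logic_description(text):
    """Turn a text into a list of Prolog-like facts (hypothetical predicates)."""
    facts = []
    for s_idx, sentence in enumerate(segment(text), start=1):
        tokens = re.findall(r"\w+", normalize(sentence))
        for t_idx, token in enumerate(tokens, start=1):
            tid = f"s{s_idx}_t{t_idx}"
            feats = token_features(token)
            facts.append(f"token({tid}, '{token}')")
            facts.append(f"length({tid}, {feats['length']})")
            if feats["has_digit"]:
                facts.append(f"has_digit({tid})")
            if feats["capitalized"]:
                facts.append(f"capitalized({tid})")
            if t_idx > 1:
                # Structural feature: token order within the sentence.
                facts.append(f"follows({tid}, s{s_idx}_t{t_idx - 1})")
    return facts


facts = logic_description("Mutation A3243G  was found. Age 45.")
print("\n".join(facts))
```

Facts of this kind, together with user-provided annotations as examples, are the sort of input an inductive learner could generalize into annotation rules.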
This application concerns annotation for HmtDB, a resource of variability and clinical data associated with mitochondrial pathological phenotypes. These data are mostly available in the literature, where they are reported in a completely free style. In this scenario, the goal is to identify occurrences of specific biological objects (i.e., mutations) and their features (e.g., position in the DNA, involved nucleotides, etc.), as well as the method of analysis and some information on the subjects from which the DNA was extracted (e.g., age, gender, nationality, pathologies of the sample, etc.). This problem is translated into an information extraction problem. More precisely, the user is asked to define sets of annotation classes and to manually annotate data sets of interest. Domain dictionaries can be imported into the system to support text processing operations. Management of learning sessions is delegated to the expert user, who can set parameters and launch experiments through the BEE interface. Learned rules are used to automatically annotate new texts. The result of automatic annotation is presented visually to the user, who supervises and corrects the annotations so that the learner can improve its performance. The figure shows the BEE interface tailored to this annotation task.
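To make the extraction target concrete, the following Python sketch pulls a mutation mention and its features (reference nucleotide, position, variant nucleotide) out of free text. It assumes that substitutions are written in the compact form `A3243G`; the hand-written pattern and the function `extract_mutations` are hypothetical and stand in for the rules that, in BEE, are learned from annotated examples rather than coded by hand.

```python
import re

# Assumed notation: reference nucleotide, position, variant nucleotide (e.g. A3243G).
MUTATION = re.compile(r"\b([ACGT])(\d{1,5})([ACGT])\b")


def extract_mutations(text):
    """Return one record per mutation mention found in the text."""
    return [
        {
            "reference": match.group(1),
            "position": int(match.group(2)),
            "variant": match.group(3),
            "span": match.span(),
        }
        for match in MUTATION.finditer(text)
    ]


sentence = "The A3243G mutation was detected in a 45-year-old female patient."
mutations = extract_mutations(sentence)
print(mutations)
```

A fixed pattern like this misses mentions written in other styles, which is one reason why learning the rules from annotated examples and letting the user correct the output is attractive for free-style literature.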
BEE user interface.
Last Update: Wed Jun 13 2007 05:13:33 GMT+0200 (CEST)