Course of
201121 – ACTIVE DATA BASES
A.Y. 1999/2000
Prof. Donato Malerba
Objectives. This
course focuses on advanced concepts and methodologies for the development
of data-oriented software systems, whose main issues are how to organize,
manipulate, and access data, as well as how to extract information from
row data. The course is organized in two parts: databases and architectures
and paradigms for data analysis. The former focuses on the foundations
of relational databases (models and languages), on the conceptual, logical
and physical design of databases, as well as on some database evolutions
(active databases, relational object-relational databases, object-oriented
databases, multimedia databases, integration in the World Wide Web). The
second part of the course covers concepts on data warehousing, OLAP, knowledge
discovery process, and data mining methods (decision tree induction). Databases
considered for the practice are Oracle 8i and Microsoft Access 97.
Prerequisites. Some familiarity with algorithms and data structures,
databases and information systems.
Evaluation: written exam on relational database design and querying
+ homework assignment (course project) + oral exam on parts 5-12.
Program of the course A.Y. 1999-2000
1. Introduction to database systems.
Organizational systems. Information systems for production.
Components and evolution of information systems. Requirements of complex
information systems. Databases and DBMS. Data models. Abstraction levels
in a database. Database languages. DBMS interfaces. Database users. Database
control: integrity, reliability, security. Classification of DBMS. Modules
of a DBMS. Pros and cons in using DBMS.
2. Data models.
Different aspects of data modeling: ontological, abstract
linguistic, concrete linguistic, pragmatic. The object-oriented data model:
representation of structures of concrete, abstract and procedural knowledge,
representation of communication. Alternative data models: hierarchical,
network, relational.
3. The relational model.
Relations and tables. Relations with attributes. Relations
and databases. Incomplete information and null values. Integrity constraints.
Tuple constraints. Keys. Keys and null values. Referential constraints.
3. Languages for relational databases.
Relational algebra: primitive operators (union, difference,
renaming, selection, projection, Cartesian join), derived operators (intersection,
quotient, join, natural join, external join, semi-join), additional operators
(complement, aggregation functions, transitive closure). Algebraic properties
of relational operators.
Relational calculus: domain relational calculus, Tuple
calculus with range declarations.
Algebra and calculus with null values
Views.
SQL. Historical evolution. Simple queries in SQL. Set
queries. Group by queries. Data definition and administration in SQL: database
creation, table creation, domain definition, inserting/deleting/modifying
tuples, integrity constraints (intra-relational and inter-relational),
modifying a schema, views, assertions, physical models (parameters and
indices), relational catalogues, access control, tools for DB administration.
Programming applications in SQL: host languages, languages with API interface,
integrated languages (Oracle PL/SQL), transaction programming (explicit
repetition of transactions, transactions with different levels of isolation).
QBE: a graphical query language.
4. Database design.
Design techniques and models: the life cycle of information
systems, a methodology for database design, the Entity-Relationship model
(constructs and documentation of E-R schemas).
Conceptual design: requirements collection and analysis,
general criteria for data representation, design strategies (top-down,
bottom-up, inside-out, mixed), quality of a Conceptual schema, a comprehensive
method for conceptual design.
Logical design: performance analysis on E-R schemas,
restructuring of E-R schemas (analysis of redundancies, removing generalizations,
partitioning and merging of entities and relationships, selection of primary
identifiers), translation into the relational model (entities and many-to-many
relationships, one-to-many relationships, entities with external identifiers,
one-to-one relationships, graphical representation of translations, translation
of a complex schema, summary tables).
Normalization: redundancies and anomalies, functional
dependencies, Boyce - Codd normal form, decomposition properties (lossless
decomposition, preservation of dependencies, qualities of decompositions),
third normal form, database design and normalization (Verification of normalization
on entities and relationships, violation of normal forms and restructuring
of conceptual schemas).
5. Active databases.
Databases and production systems. Trigger behaviour
in a relational system. Definition and use of triggers in Oracle. Advanced
features of active rules. Properties of active rules (termination, confluence
e observational determinism). Design and implementation issues for active
databases. The IDEA methodology. Applications of active databases.
6. Object-oriented and object-relational databases.
Non-first normal form models. The relational object
model. SQL-3: tuples and objects, type hierarchies, abstract data types,
flattening and nesting queries. The third generation database manifesto.
An object relational DBMS: Illustra. Objects in Oracle 8: abstract data
types, collections, row objects, object views. Object-oriented databases:
types, classes, methods, generalization hierarchies, persistence, redefinition
of methods, refinement of properties and methods. The object-oriented database
manifesto. The ODMG standard for object-oriented databases: ODL e OQL.
An OODBMS: ObjectStore.
7. Object-oriented modeling in UML.
A conceptual Model of the UML: Structural thing (class,
interface, collaboration, use case, active class, component, node); Behavioral
thing (interaction; state machine); Grouping thing (package); Annotational
thing (note); Relationship: dependence; association, generalization, realization;
Diagram:
class diagram, object diagram, use case diagram, sequence
diagram, collaboration diagram, statechart diagram, activity diagram, component
diagram, deployment diagram. Common mechanisms in UML: specifications,
ornaments, extensions (stereotypes, labelled values and constraints). The
five views: use case view, design view, implementation view, process view,
deployment view. The Eastern State University: a case study.
8. Multimedia databases.
Multimedia information systems. Multimedia data. Multimedia
data management. Recall and precision. Information retrieval systems for
text. Manual and automatic indexing. Index creation. Functionalities and
design of an IRS: the case of WebClass.
9. Databases and the World Wide Web.
The Internet and the World Wide Web: nodes, addresses
and protocols, the World Wide Web, anchors and URL, HTML.
Database access through the Web: atabase access through
common gateway interface (CGI) programs, field-oriented interaction, advantages
and disadvantages of DBMS Web, the universal server.
Web and databases: differences and similarities.
Client-server architecture. Three-tier architecture.
JDBC. Servlets. Java Server Pages. Enterprise Java Beans.
10. Microsoft Access’97.
Main characteristics of the DBMS. Basic notions: relations,
masks, queries, reports, macros and modules. Creating a new database. Data
types. Relationships (one-to-one, one-to-many, many-to-many). Referential
constraints. Primary keys and indices. Definition of a selection query
(single and multiple tables), cross-fields queries, statement queries e
parameterized queries. Operators and expressions in Access. Updating tables
through queries. Queries with aggregation functions. Access’97 and the
WWW: hyperlinks, data export in HTML format, use of Access’97 HTML models,
data import from HTML tables, dynamic creation of Web pages through Internet
Database Connector (IDC). Creation of Active Server Pages (ASP).
11. Architectures and paradigms for data analysis.
Operational vs. business data. Business Intelligence
technologies. Decision support systems (DSS), Executive Information Systems
(EIS) and Management Informaiton Systems (MIS). Main characteristics of
a data warehouse. Data warehouse architectures. Data warehouse schemas:
star schema and snowflake schema. OLAP and operations for data analysis:
drill down and roll up. ROLAP and MOLAP.
12. Knowledge discovery in databases.
Knowledge discovery in databases: definition and issues.
The database knowledge discovery process: selection, preprocessing, transformation,
data mining, interpretation and evaluation. Data mining: objectives, tasks,
pattern representation, pattern evaluation, search methods. Decision tree
induction: the three fundamental problems (associating classes to leaves,
choosing the test, stopping criteria). Information gain and gain-ratio.
Decision tree pruning methods. From decision trees to production rules.
Texts and Papers
P. Atzeni, S. Ceri, S. Paraboschi & R. Torlone
Database Systems - Concepts, Languages and Architectures
McGraw-Hill,2000.
Chapters: 1, 2, 3 (3.15 excluded), 4, 5, 6, 7, 8, 11,
12, 13, 14, Appendices A and D.
A. Albano, G. Ghelli, R. Orsini
Basi di dati relazionali e a oggetti
Zanichelli, 1997
Chapters: 1, 2, 5, 7, 8, 9, 10, 11.1, 13.3, 13.6, 15.4.1
G. Lausen, G. Vossen
Models and Languages of Object-Oriented Databases
Addison-Wesley, International Computer Science Series,
1997
Chapters 1, 2, 3.
G. Booch, I. Jacobson, J. Rumbaugh
The Unified Modeling Language: User Guide
Addison-Wesley, 1999
Chapters 1, 2, 3
U. Fayyad, G. Piatesky-Shapiro, P. Smyth. From data mining
to knowledge discovery: an overview.
In U. Fayyad, G. Piatesky-Shapiro, P. Smyth, R. Uthurusamy
(Eds.), Advances in Knowledge Discovery and Data Mining,
AAAI/MIT Press, pp. 1-35, 1996.
W.J. Frawley, G. Piatesky-Shapito, C.J. Matheus. Knowledge
discovery in databases.
In W.J. Frawley, G. Piatesky-Shapito, C.J. Matheus, Knowledge
discovery in databases,
AAAI/MIT Press, pp. 1-27, 1991.
T. Mitchell
Machine Learning
McGraw Hill, 1997.
Chapters 1 e 3.
Further references for homeworks on database design:
L. Cabibbo, R. Torlone, C. Batini
Basi di dati: Progetti ed esercizi svolti.
Pitagora editrice, Bologna, 1995.
C. Francalanci, F. Schreiber, L. Tanca
Progetto di dati e funzioni
Società Editrice Esculapio, Bologna, 1995.
Texts, papers, and copies of transparencies are made available
in the Library of the Department of Informatics.