Data Model It includes conceptual, logical and physical data model. In the conceptual modeling phase, we need to
capture and understand the user requirements so that the concepts revealed can be accommodated into the
conceptual model. For the conceptual design of DW, we propose to use the multidimensional model rather than
relational model as it provides us a more clear view of data structure - facts and dimensions, which is better to
support analytical work. In the logical model design, the main issue is how to design the hierarchy levels of
dimensions as the multi-levels of dimensions may cause a number of tables joins when do the query operations, such as roll up. In database, join is an expensive operator, especially, for large tables. Hierarchy dimensions design can be
less hierarchy levels but with a bigger dimension table, e.g., many fields populated in one table or more hierarchy
levels but with a smaller dimension table. Thus, we need to considerate the impact of multi-joins, which the
dimension hierarchy levels bring, to the query speed when we do the logical design of data warehouse. In the
physical modeling, tables are declared for different dimensions and facts. Star or snowflake data schema is
generated for the dimensions and facts physically, indexes are built and data partitions are used for the large amount
of data populated in the fact tables. The partition technologies will be our main concern in the physical design.
Usually, the data in the warehouse can be vertically partitioned by measures or horizontally partitioned by
dimensions. In our design, we propose to horizontal partition according to the location dimension, which is also
indicated in our proposed architecture: the repository for each European region.
Data Source 3XL storage system will be used for the eGovMon DW data source, The
3XL system can automatically generate a specialized schema for the data based on Web Ontology Language (OWL)
descriptions of classes and their properties. It creates a table for each of the OWL classes. The data of instances is
hold in the table of the class. In order to achieve high performance, the inserted data will be buffered in the main
memory and only be flushed to the database when memory is needed or by committing. The experience from EIAO
DW, in which 3store system was used, is that 90% - 99% of used time was spent on extracting data from the 3store However, by the use of data buffer and bulk loading technologies, 3XL storage system can be expected to gain a
better performance than 3store storage system.
ETL The ETL is the way of extracting the data from different data sources, converting them into the uniform data
format and loading into the data warehouse. In the proposed architecture, there are several number of Right-Time
ETLs (RiTE) running in parallel to load the data from distributed 3XL data sources to the central data warehouse.
RiTE is an ETL technique that can make the new inserted source data quickly available to data consumers, while
still providing bulk-load insert speed. The trick is that catalyst, a middle-ware system, is used to achieve fast loading
and concurrency control.
eGovMon DW The data warehouse is the central data repository that stores the materialized view of source data. It
uses a multidimensional model where the data is stored as facts and dimensions. As the data volume will be very
huge in our data warehouse, data partition is necessary to achieve a better query performance. In this proposal, the
data partition by the location dimension, e.g., the different European regions, will be applied to our data warehouse
but logically it still functions as a whole data repository. Open source PostgreSQL is proposed to use as the DBMS
of eGovMon data warehouse as it provides several useful features that are crucial to data warehouse, such as good
extensibility, table partitioning and bit-mapped indexes etc. Besides, PostgreSQL performs very well for complex
queries on large databases.
OLAP A data warehouse stores and manages data. OLAP transforms data warehouse data into strategic information
providing users multidimensional views for their analysis. Open source tools, like JasperAnalysis, Pentaho etc, can
be chosen to fulfill the normal OLAP operations, such as slice and dice, pivot, filter, chart, drill-down, or roll-up a
cube of data in real-time.
In summary, we gave an overview and the general components' information of eGovMon DW architecture above.
The technologies used and architecture will evolve with the project maturity.
No comments:
Post a Comment