All the major relational database management system (RDBMS) vendors are adding features to address big data analytics from a relational point of view. The two most significant architectural developments have been the overtaking of the high end of the market by massively parallel processing (MPP) and the growing adoption of columnar storage. When MPP and columnar storage techniques are combined, a number of the system requirements in the above list can start to be addressed, including:
1. Exabytes (thousands of petabytes) of data distributed across thousands of geographically dispersed processors
2. Subsecond response time for highly constrained standard SQL queries
3. Updating data in place at full load speeds
4. Being configured without a single point of failure
5. Failover and process continuation when processing nodes fail
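To see why the columnar half of that combination matters, here is a minimal sketch in Python, with hypothetical table and column names, of the difference between a row-wise and a column-wise layout for a highly constrained query: the columnar layout only has to read the columns the query actually touches.

```python
# Minimal sketch (not any vendor's implementation): the same table held
# row-wise and column-wise, and a highly constrained query that only
# touches two of its columns.  Names and data are hypothetical.

row_store = [
    {"order_id": 1, "region": "EMEA", "product": "A17", "amount": 120.0, "notes": "..."},
    {"order_id": 2, "region": "APAC", "product": "B02", "amount": 75.5,  "notes": "..."},
    {"order_id": 3, "region": "EMEA", "product": "A17", "amount": 310.0, "notes": "..."},
]

# Columnar layout: one contiguous (and separately compressible) array per column.
column_store = {
    "order_id": [1, 2, 3],
    "region":   ["EMEA", "APAC", "EMEA"],
    "product":  ["A17", "B02", "A17"],
    "amount":   [120.0, 75.5, 310.0],
    "notes":    ["...", "...", "..."],
}

# SELECT SUM(amount) FROM orders WHERE region = 'EMEA'
# Row store: every row (all five fields) is read to evaluate the predicate.
total_rows = sum(r["amount"] for r in row_store if r["region"] == "EMEA")

# Column store: only the 'region' and 'amount' arrays are read; the wide
# 'notes' column is never touched, which is where the I/O savings come from.
total_cols = sum(
    amt for reg, amt in zip(column_store["region"], column_store["amount"])
    if reg == "EMEA"
)

assert total_rows == total_cols == 430.0
```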
Additionally, RDBMS vendors are adding complex user-defined functions (UDFs) to their syntax, but the kind of general-purpose procedural language computing required by big data analytics is not being satisfied in relational environments at this time.
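For illustration only, the following sketch registers a scalar UDF through SQLite's Python binding, used here purely as a stand-in for any vendor's UDF facility. It shows the one-value-at-a-time flavor of these SQL extensions, which is quite different from general-purpose procedural computing over whole data sets.

```python
# A minimal sketch of a scalar UDF, with SQLite standing in for a full RDBMS.
# The function extends SQL one value at a time; it is not the iterative,
# branching, whole-data-set procedural computing big data analytics needs.
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (100.0,), (1000.0,)])

# Register a scalar UDF callable from SQL.
conn.create_function("log10", 1, lambda x: math.log10(x))

for (value,) in conn.execute("SELECT log10(amount) FROM sales"):
    print(value)   # 1.0, 2.0, 3.0
```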
In a similar vein, RDBMS vendors are allowing complex data structures to be stored. These kinds of embedded complex data structures have been known as "blobs" (binary large objects), and blobs are related to "data bags." It's important to understand that relational databases have a hard time providing general support for interpreting blobs.
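A small sketch of that limitation, again with SQLite as a stand-in and a hypothetical payload: the blob round-trips intact, but the engine treats it as opaque bytes, so interpreting its contents falls back on application code rather than SQL.

```python
# A minimal sketch of the blob problem.  The embedded structure (a
# hypothetical sensor reading) round-trips faithfully, but the database
# engine sees only opaque bytes: no standard SQL predicate can reach
# inside it without application code to interpret it.
import pickle
import sqlite3

reading = {"sensor": "S-42", "samples": [0.1, 0.4, 0.9], "tags": {"unit": "kPa"}}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, payload BLOB)")
conn.execute("INSERT INTO readings (payload) VALUES (?)", (pickle.dumps(reading),))

# The application must deserialize the blob itself to answer even simple
# questions about its contents, e.g. "which readings have a sample above 0.5?"
(payload,) = conn.execute("SELECT payload FROM readings").fetchone()
decoded = pickle.loads(payload)
print(any(s > 0.5 for s in decoded["samples"]))   # True, but only in Python, not in SQL
```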
MPP implementations have never satisfactorily addressed the "big join," where dimension tables are to be joined to a trillion-row fact table without resorting to clustered storage. The big join crisis occurs when an ad hoc constraint is placed against the dimension table. Since the dimension keys are scattered randomly across the separate segments of the trillion-row fact table, it is very hard to avoid a lengthy download of the very large dimension table to every one of the fact table storage partitions. To be fair, the MapReduce/Hadoop architecture has not been able to address the big join problem either.
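The following sketch, with hypothetical data and no real MPP engine, simulates why the ad hoc constraint forces that download: fact rows are partitioned by their own key, so any given dimension key can appear on every node, and the qualifying dimension rows must be shipped to every fact partition before each node can join locally.

```python
# A minimal simulation of the "big join" pain point.  Fact rows are
# hash-partitioned across nodes by their own key, so dimension keys are
# scattered across every node.  An ad hoc constraint on the dimension
# therefore forces the qualifying dimension rows to be broadcast
# ("downloaded") to every fact partition before local joins can run.
NUM_NODES = 4

# Dimension table: customer_key -> attributes (tiny here, huge in practice).
customer_dim = {k: {"state": "CA" if k % 3 == 0 else "NY"} for k in range(100)}

# Fact table scattered across nodes by order number, not by customer_key.
fact_partitions = [[] for _ in range(NUM_NODES)]
for order_id in range(10_000):
    row = {"order_id": order_id, "customer_key": order_id % 100, "amount": 1.0}
    fact_partitions[hash(order_id) % NUM_NODES].append(row)

# Ad hoc constraint on the dimension: customers in California.
qualifying = {k: v for k, v in customer_dim.items() if v["state"] == "CA"}

# Every node needs its own copy of the qualifying dimension rows --
# this broadcast step is the expensive download in the big join.
shipped_copies = [dict(qualifying) for _ in range(NUM_NODES)]

total = sum(
    row["amount"]
    for node, partition in enumerate(fact_partitions)
    for row in partition
    if row["customer_key"] in shipped_copies[node]
)
print(total)  # revenue from constrained customers, assembled from per-node joins
```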
The standard RDBMS architecture for implementing an enterprise data warehouse based on dimensional modeling principles is simple and well understood.
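For reference, here is a minimal sketch of that standard architecture, with illustrative table and column names: a central fact table whose foreign keys point at surrounding dimension tables, queried by constraining the dimensions and aggregating the facts.

```python
# A minimal star schema sketch.  Table and column names are illustrative,
# not taken from any particular deployment.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim     (date_key INTEGER PRIMARY KEY, full_date TEXT, fiscal_quarter TEXT);
    CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, sku TEXT, category TEXT);

    -- Fact table: one row per sales event, with all context carried by dimension keys.
    CREATE TABLE sales_fact (
        date_key     INTEGER REFERENCES date_dim(date_key),
        customer_key INTEGER REFERENCES customer_dim(customer_key),
        product_key  INTEGER REFERENCES product_dim(product_key),
        quantity     INTEGER,
        amount       REAL
    );
""")

# A typical dimensional query: constrain the dimensions, aggregate the facts.
query = """
    SELECT d.fiscal_quarter, SUM(f.amount)
    FROM sales_fact f
    JOIN date_dim d     ON f.date_key = d.date_key
    JOIN customer_dim c ON f.customer_key = c.customer_key
    WHERE c.state = 'CA'
    GROUP BY d.fiscal_quarter
"""
print(conn.execute(query).fetchall())   # empty here; rows would come from the ETL pipeline
```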