CA2401348C - Multi-dimensional database and integrated aggregation server
- Publication number
- CA2401348C
- Authority
- CA
- Canada
- Prior art keywords
- data
- dbms
- aggregation
- query
- relational
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- C—CHEMISTRY; METALLURGY
- C03—GLASS; MINERAL OR SLAG WOOL
- C03B—MANUFACTURE, SHAPING, OR SUPPLEMENTARY PROCESSES
- C03B37/00—Manufacture or treatment of flakes, fibres, or filaments from softened glass, minerals, or slags
- C03B37/01—Manufacture of glass fibres or filaments
- C03B37/02—Manufacture of glass fibres or filaments by drawing or extruding, e.g. direct drawing of molten glass from nozzles; Cooling fins therefor
- C03B37/025—Manufacture of glass fibres or filaments by drawing or extruding, e.g. direct drawing of molten glass from nozzles; Cooling fins therefor from reheated softened tubes, rods, fibres or filaments, e.g. drawing fibres from preforms
- C03B37/027—Fibres composed of different sorts of glass, e.g. glass optical fibres
- C03B37/02718—Thermal treatment of the fibre during the drawing process, e.g. cooling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- C—CHEMISTRY; METALLURGY
- C03—GLASS; MINERAL OR SLAG WOOL
- C03B—MANUFACTURE, SHAPING, OR SUPPLEMENTARY PROCESSES
- C03B2203/00—Fibre product details, e.g. structure, shape
- C03B2203/36—Dispersion modified fibres, e.g. wavelength or polarisation shifted, flattened or compensating fibres (DSF, DFF, DCF)
-
- C—CHEMISTRY; METALLURGY
- C03—GLASS; MINERAL OR SLAG WOOL
- C03B—MANUFACTURE, SHAPING, OR SUPPLEMENTARY PROCESSES
- C03B2205/00—Fibre drawing or extruding details
- C03B2205/42—Drawing at high speed, i.e. > 10 m/s
-
- C—CHEMISTRY; METALLURGY
- C03—GLASS; MINERAL OR SLAG WOOL
- C03B—MANUFACTURE, SHAPING, OR SUPPLEMENTARY PROCESSES
- C03B2205/00—Fibre drawing or extruding details
- C03B2205/55—Cooling or annealing the drawn fibre prior to coating using a series of coolers or heaters
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99943—Generating database or data structure, e.g. via user interface
Abstract
Improved method of and apparatus for aggregating data elements in multidimensional databases (MDDB). In one aspect of the present invention, the apparatus is realized in the form of a high-performance stand-alone (i.e. external) aggregation server which can be plugged into conventional OLAP systems to achieve significant improvements in system performance. In accordance with the principles of the present invention, the stand-alone aggregation server contains a scalable MDDB and a high-performance aggregation engine that are integrated into the modular architecture of the aggregation server. The stand-alone aggregation server of the present invention can uniformly distribute data elements among a plurality of processors, for balanced loading and processing, and therefore is highly scalable. The stand-alone aggregation server of the present invention can be used to realize (i) an improved MDDB for supporting on-line analytical processing (OLAP) operations, (ii) an improved Internet URL Directory for supporting on-line information searching operations by Web-enabled client machines, as well as (iii) diverse types of MDDB-based systems for supporting real-time control of processes in response to complex states of information reflected in the MDDB. In another aspect of the present invention, the apparatus is integrated within a database management system (DBMS). The improved DBMS can be used to achieve a significant increase in system performance (e.g. decreased access/search time), user flexibility and ease of use. The improved DBMS system of the present invention can be used to realize an improved Data Warehouse for supporting on-line analytical processing (OLAP) operations or to realize an improved informational database system, operational database system, or the like.
Description
MULTI-DIMENSIONAL DATABASE AND INTEGRATED AGGREGATION SERVER
Brief Description Of The State Of The Art
The ability to act quickly and decisively in today's increasingly competitive marketplace is critical to the success of organizations. The volume of information that is available to corporations is rapidly increasing and frequently overwhelming. Those organizations that will effectively and efficiently manage these tremendous volumes of data, and use the information to make business decisions, will realize a significant competitive advantage in the marketplace.
Data warehousing, the creation of an enterprise-wide data store, is the first step towards managing these volumes of data. The Data Warehouse is becoming an integral part of many information delivery systems because it provides a single, central location where a reconciled version of data extracted from a wide variety of operational systems is stored. Over the last few years, improvements in price, performance, scalability, and robustness of open computing systems have made data warehousing a central component of Information Technology (IT) strategies. Details on methods of data integration and constructing data warehouses can be found in the white paper entitled "Data Integration: The Warehouse Foundation" by Louis Rolleigh and Joe Thomas, published at http://www.aodom.com/whitepapers/wp-11.asp.
Building a Data Warehouse has its own special challenges (e.g. using a common data model, a common business dictionary, etc.) and is a complex endeavor. However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing. To complete the supply chain from transactional systems to decision maker, organizations need to deliver systems that allow knowledge workers to make strategic and tactical decisions based on the information stored in these warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems.
OLAP systems allow knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms, in order to provide analytical insight into a particular problem or line of inquiry. For example, by using an OLAP system, decision makers can "slice and dice" information along a customer (or business) dimension, and view business metrics by product and through time. Reports can be defined from multiple perspectives that provide a high-level or detailed view of the performance of any aspect of the business. Decision makers can navigate throughout their database by drilling down on a report to view elements at finer levels of detail, or by pivoting to view reports from different perspectives.
To enable such full-functioned business analyses, OLAP systems need to (1) support sophisticated analyses, (2) scale to large numbers of dimensions, and (3) support analyses against large atomic data sets. These three key requirements are discussed further below.
Decision makers use key performance metrics to evaluate the operations within their domain, and OLAP systems need to be capable of delivering these metrics in a user-customizable format. These metrics may be obtained from the transactional databases, precalculated and stored in the database, or generated on demand during the query process.
Commonly used metrics include:
(1) Multidimensional Ratios (e.g. Percent to Total) - "Show me the contribution to weekly sales and category profit made by all items sold in the Northwest stores between July 1 and July 14."
(2) Comparisons (e.g. Actual vs. Plan, This Period vs. Last Period) - "Show me the sales to plan percentage variation for this year and compare it to that of the previous year to identify planning discrepancies."
(3) Ranking and Statistical Profiles (e.g. Top N/Bottom N, 70/30, Quartiles) - "Show me sales, profit and average call volume per day for my 20 most profitable salespeople, who are in the top 30% of the worldwide sales."
(4) Custom Consolidations - "Show me an abbreviated income statement by quarter for the last two quarters for my Western Region operations."
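The first three metric types listed above can be illustrated with a minimal Python sketch. The fact rows, the function names (`percent_to_total`, `top_n_items`, `variance_pct`) and all figures are hypothetical and chosen only for illustration; they do not come from the patent.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, item, sales). Values are illustrative only.
SALES = [
    ("Northwest", "widget", 120.0),
    ("Northwest", "gadget", 80.0),
    ("Southwest", "widget", 200.0),
    ("Southwest", "gadget", 100.0),
]


def percent_to_total(rows, region):
    """Multidimensional ratio: share of all sales contributed by one region."""
    total = sum(sales for _, _, sales in rows)
    part = sum(sales for r, _, sales in rows if r == region)
    return 100.0 * part / total


def top_n_items(rows, n):
    """Ranking: sum sales per item and return the n largest as (item, sales)."""
    by_item = defaultdict(float)
    for _, item, sales in rows:
        by_item[item] += sales
    return sorted(by_item.items(), key=lambda kv: kv[1], reverse=True)[:n]


def variance_pct(actual, plan):
    """Comparison: actual-versus-plan variation as a percentage of plan."""
    return 100.0 * (actual - plan) / plan
```

Each function reads only the atomic rows, which corresponds to the "generated on demand" case; a precalculated variant would store the per-item and per-region sums ahead of time.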
Knowledge workers analyze data from a number of different business perspectives or dimensions. As used hereinafter, a dimension is any element or hierarchical combination of elements in a data model that can be displayed orthogonally with respect to other combinations of elements in the data model. For example, if a report lists sales by week, promotion, store, and department, then the report would be a slice of data taken from a four-dimensional data model.
Target marketing and market segmentation applications involve extracting highly qualified result sets from large volumes of data. For example, a direct marketing organization might want to generate a targeted mailing list based on dozens of characteristics, including purchase frequency, size of the last purchase, past buying trends, customer location, age of customer, and gender of customer. These applications rapidly increase the dimensionality requirements for analysis.
The number of dimensions in OLAP systems ranges from a few orthogonal dimensions to hundreds of orthogonal dimensions. Orthogonal dimensions in an exemplary OLAP application might include Geography, Time, and Products.
Atomic data refers to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, "atomic data" may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch. Most organizations implementing OLAP systems find themselves needing systems that can scale to tens, hundreds, and even thousands of gigabytes of atomic information.
As OLAP systems become more pervasive and are used by the majority of the enterprise, more data over longer time frames will be included in the data store (i.e. data warehouse), and the size of the database will increase by at least an order of magnitude. Thus, OLAP systems need to be able to scale from present to near-future volumes of data.
In general, OLAP systems need to (1) support the complex analysis requirements of decision-makers, (2) analyze the data from a number of different perspectives (i.e. business dimensions), and (3) support complex analyses against large input (atomic-level) data sets from a Data Warehouse maintained by the organization using a relational database management system (RDBMS).
Vendors of OLAP systems classify OLAP Systems as either Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) based on the underlying architecture thereof. Thus, there are two basic architectures for On-Line Analytical Processing systems: the ROLAP architecture, and the MOLAP architecture.
The Relational OLAP (ROLAP) system accesses data stored in a Data Warehouse to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database, i.e. the Data Warehouse.
The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore supports optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.
A typical prior art ROLAP system has a three-tier or layer client/server architecture.
The "database layer" utilizes relational databases for data storage, access, and retrieval processes. The "application logic layer" is the ROLAP engine which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of "presentation layers," through which users perform OLAP analyses.
After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQL execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing precalculated results when they are available, or dynamically generating results from atomic information when necessary.
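The ROLAP flow described above (an SQL aggregation submitted to the relational database, followed by cross-tabulation of the flat result set) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard `sqlite3` module as a stand-in RDBMS, and the table name `fact_sales`, its columns, and the function `crosstab_sales` are hypothetical.

```python
import sqlite3


def crosstab_sales(rows):
    """Aggregate (store, week, sales) rows in SQL, then pivot the result
    into a nested mapping of store -> {week: total sales}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (store TEXT, week INTEGER, sales REAL)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

    # Database layer: the relational engine performs the aggregation.
    cursor = conn.execute(
        "SELECT store, week, SUM(sales) FROM fact_sales "
        "GROUP BY store, week ORDER BY store, week"
    )

    # Application logic layer: cross-tabulate the flat relational result.
    table = {}
    for store, week, total in cursor:
        table.setdefault(store, {})[week] = total

    conn.close()
    return table


# Hypothetical atomic rows, for illustration only.
ROWS = [("A", 1, 10.0), ("A", 1, 5.0), ("A", 2, 7.0), ("B", 1, 3.0)]
RESULT = crosstab_sales(ROWS)
```

A real ROLAP engine would generate the SQL text from the user's multidimensional request and would prefer a pre-aggregated table when one exists; both steps are omitted here.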
Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) to provide OLAP analyses. The MDDB is logically organized as a multidimensional array (typically referred to as a multidimensional cube or hypercube or cube) whose rows/columns each represent a different dimension (i.e., relation). A data value is associated with each combination of dimensions (typically referred to as a "coordinate").
The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally.
As shown in Fig. 1B, prior art MOLAP systems have an Aggregation, Access and Retrieval module which is responsible for all data storage, access, and retrieval processes, including data aggregation (i.e. preaggregation) in the MDDB. As shown in Fig. 1B, the base data loader feeds base data, at the most detailed level, from the Data Warehouse into the Multi-Dimensional Data Base (MDDB). On top of the base data, layers of aggregated data are built up by the Aggregation program, which is part of the Aggregation, Access and Retrieval module. As indicated in this figure, the application logic module is responsible for the execution of all OLAP requests/queries (e.g. ratios, ranks, forecasts, exception scanning, and slicing and dicing) of data within the MDDB. The presentation module integrates with the application logic module and provides an interface through which the end users view and request OLAP analyses on their client machines, which may be web-enabled through the infrastructure of the Internet. The client/server architecture of a MOLAP system allows multiple users to access the same multidimensional database (MDDB).
Information (i.e. basic data) from a variety of operational systems within an enterprise, comprising the Data Warehouse, is loaded into a prior art multidimensional database (MDDB) through a series of batch routines. The Express™ server by the Oracle Corporation is exemplary of a popular server which can be used to carry out the data loading process in prior art MOLAP systems. As shown in Fig. 2B, an exemplary 3-D MDDB is schematically depicted, showing geography, time and products as the "dimensions" of the database. The multidimensional data of the MDDB is logically organized in an array structure, as shown in Fig. 2C. Physically, the Express™ server stores data in pages (or records) of an information file. Pages contain 512, 2048, or 4096 bytes of data, depending on the platform and release of the Express™ server. In order to look up the physical record address from the database file recorded on a disk or other mass storage device, the Express™ server generates a data structure referred to as a "Page Allocation Table (PAT)". As shown in Fig. 2D, the PAT tells the Express™ server the physical record number that contains the page of data. Typically, the PAT is organized in pages. The simplest way to access a data element in the MDDB is by calculating the "offset" using the additions and multiplications expressed by a simple formula:
Offset = Months + Product * (# of Months) + City * (# of Months * # of Products)
During an OLAP session, the response time of a multidimensional query on a prior art MDDB depends on how many cells in the MDDB have to be added "on the fly". As the number of dimensions in the MDDB increases linearly, the number of cells in the MDDB increases exponentially. However, it is known that the majority of multidimensional queries deal with summarized high level data. Thus, as shown in Figs. 3A and 3B, once the atomic data (i.e. "basic data") has been loaded into the MDDB, the general approach is to perform a series of calculations in batch in order to aggregate (i.e. pre-aggregate) the data elements along the orthogonal dimensions of the MDDB and fill the array structures thereof. For example, revenue figures for all retail stores in a particular state (e.g. New York) would be added together to fill the state level cells in the MDDB. After the array structure in the database has been filled, integer-based indices are created and hashing algorithms are used to improve query access times. Pre-aggregation of dimension D0 is always performed along the cross-section of the MDDB along the D0 dimension.
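The offset formula given in this passage can be sketched in Python as follows. The zero-based indexing and the function names `cell_offset` and `cell_count` are assumptions made for the example; the patent text states only the formula itself.

```python
def cell_offset(month, product, city, n_months, n_products):
    """Linear offset of the cell (month, product, city) in a flat array.

    Assumes zero-based indices, with the month index varying fastest,
    then product, then city, as in the formula in the text.
    """
    return month + product * n_months + city * (n_months * n_products)


def cell_count(dimension_sizes):
    """Total number of cells in a dense array with the given dimension sizes."""
    count = 1
    for size in dimension_sizes:
        count *= size
    return count
```

With 12 months, 5 products and 4 cities, the dense array has 12 * 5 * 4 = 240 cells, and the last cell (month 11, product 4, city 3) falls at offset 239. Because the cell count is a product of the dimension sizes, adding one more dimension multiplies the cell count by that dimension's size.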
As shown in Figs. 3C1 and 3C2, the raw data loaded into the MDDB is primarily organized at its lowest dimensional hierarchy, and the results of the pre-aggregations are stored in the neighboring parts of the MDDB.
As shown in Fig. 3C2, along the TIME dimension, weeks are the aggregation results of days, months are the aggregation results of weeks, and quarters are the aggregation results of months. While not shown in the figures, along the GEOGRAPHY dimension, states are the aggregation results of cities, countries are the aggregation results of states, and continents are the aggregation results of countries. By pre-aggregating (i.e. consolidating or compiling) all logical subtotals and totals along all dimensions of the MDDB, it is possible to carry out real-time MOLAP operations using a multidimensional database (MDDB) containing both basic (i.e. atomic) and pre-aggregated data. Once this compilation process has been completed, the MDDB is ready for use. Users request OLAP reports by submitting queries through the OLAP Application interface (e.g. using web-enabled client machines), and the application logic layer responds to the submitted queries by retrieving the stored data from the MDDB for display on the client machine.
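The hierarchical pre-aggregation described above (cities into states, states into countries, and so on) amounts to repeatedly summing child members into their parents. A minimal sketch, assuming a single additive measure and a hypothetical parent mapping (`CITY_TO_STATE`, `roll_up` and the revenue figures are illustrative names and values, not taken from the patent):

```python
from collections import defaultdict

# Hypothetical one-level hierarchy: each city maps to its parent state.
CITY_TO_STATE = {
    "Albany": "New York",
    "Buffalo": "New York",
    "Newark": "New Jersey",
}


def roll_up(values, parent_of):
    """Sum each member's value into its parent, producing the next level up."""
    totals = defaultdict(float)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)


# Illustrative city-level revenue, rolled up to the state level.
CITY_REVENUE = {"Albany": 10.0, "Buffalo": 15.0, "Newark": 7.0}
STATE_REVENUE = roll_up(CITY_REVENUE, CITY_TO_STATE)
```

Applying `roll_up` again with a state-to-country mapping would produce the next level; a batch pre-aggregation pass repeats this for every level of every dimension and stores the results alongside the base data.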
Typically, in MDDB systems, the aggregated data is very sparse, tending to explode as the number of dimensions grows and dramatically slowing down the retrieval process (as described in the report entitled "Database Explosion: The OLAP Report", http://www.olapreport.com/DatabaseExplosion.htm). Quick, on-line retrieval of queried data is critical in delivering on-line response for OLAP queries. Therefore, the data structure of the MDDB, and the methods of storing, indexing and handling it, are dictated mainly by the need for fast retrieval of massive and sparse data.
Different solutions for this problem are disclosed in the following US Patents:
- 5,822,751 "Efficient Multidimensional Data Aggregation Operator Implementation"
- 5,805,885 "Method And System For Aggregation Objects"
- 5,781,896 "Method And System For Efficiently Performing Database Table Aggregation Using An Aggregation Index"
- 5,745,764 "Method And System For Aggregation Objects"
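The growth in addressable cells that underlies the explosion problem discussed above can be illustrated with a short calculation. This is a simplified sketch, not the cited report's model: it assumes that every hierarchy level of every dimension is materialized, and the function names `base_cells` and `potential_cells` are hypothetical.

```python
from math import prod


def base_cells(members_per_level):
    """Cells addressable at the base (atomic) level only.

    members_per_level holds one list per dimension, giving the member
    count at each hierarchy level with the base level first.
    """
    return prod(levels[0] for levels in members_per_level)


def potential_cells(members_per_level):
    """Cells addressable when all levels of all dimensions are materialized."""
    return prod(sum(levels) for levels in members_per_level)
```

For example, with a TIME dimension of 4 base members plus levels of 2 and 1 members, and a GEOGRAPHY dimension of 3 base members plus 1 aggregate, there are 4 * 3 = 12 base cells but 7 * 4 = 28 addressable cells. The ratio grows multiplicatively with each added dimension, and most of the additional cells are typically empty.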
In all the prior art OLAP servers, the process of storing, indexing and handling the MDDB utilizes complex data structures to largely improve the retrieval speed, as part of the querying process, at the cost of slowing down the storing and aggregation. The query-bounded structure, which must support fast retrieval of queries in a restricting environment of high sparsity and multi-hierarchies, is not the optimal one for fast aggregation.
In addition to the aggregation process, the Aggregation, Access and Retrieval module is responsible for all data storage, retrieval and access processes.
The Logic module is responsible for the execution of OLAP queries. The Presentation module intermediates between the user and the logic module and provides an interface through which the end users view and request OLAP analyses. The client/server architecture allows multiple users to simultaneously access the multidimensional database.
Knowledge workers analyze data from a number of different business perspectives or dimensions. As used hereinafter, a dimension is any element or hierarchical combination of elements in a data model that can be displayed orthogonally with respect to other combinations of elements in the data model. For example, if a report lists sales by week, promotion, store, and department, then the report would be a slice of data taken from a four-dimensional data model.
Target marketing and market segmentation applications involve extracting highly qualified result sets from large volumes of data. For example, a direct marketing organization might want to generate a targeted mailing list based on dozens of characteristics, including purchase frequency, size of the last purchase, past buying trends, customer location, age of customer, and gender of customer. These applications rapidly increase the dimensionality requirements for analysis.
The number of dimensions in OLAP systems range from a few orthogonal dimensions to hundreds of orthogonal dimensions. Orthogonal dimensions in an exemplary OLAP
application might include Geography, Time, and Products.
Atomic data refers to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, "atomic data" may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch. Most organizations implementing OLAP systems find themselves needing systems that can scale to tens, hundreds, and even thousands of gigabytes of atomic information.
As OLAP systems become more pervasive and are used by the majority of the enterprise, more data over longer time frames will be included in the data store (i.e. data warehouse), and the size of the database will increase by at least an order of magnitude. Thus, OLAP systems need to be able to scale from present to near-future volumes of data.
In general, OLAP systems need to (1) support the complex analysis requirements of decision-makers, (2) analyze the data from a number of different perspectives (i.e. business dimensions), and (3) support complex analyses against large input (atomic-level) data sets from a Data Warehouse maintained by the organization using a relational database management system (RDBMS).
Vendors of OLAP systems classify them as either Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP), based on the underlying architecture. Thus, there are two basic architectures for On-Line Analytical Processing systems: the ROLAP architecture and the MOLAP architecture.
The Relational OLAP (ROLAP) system accesses data stored in a Data Warehouse to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database, i.e. the Data Warehouse.
The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore supports optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.
A typical prior art ROLAP system has a three-tier or layer client/server architecture.
The "database layer" utilizes relational databases for data storage, access, and retrieval processes. The "application logic layer" is the ROLAP engine which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of "presentation layers," through which users perform OLAP analyses.
After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQL execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing precalculated results when they are available, or dynamically generating results from atomic information when necessary.
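The request-to-SQL flow described above can be illustrated with a minimal sketch. It is not taken from any particular ROLAP product: the fact table `fact`, its columns, the sample rows, and the helper names `build_sql` and `run_report` are all assumptions made for illustration, and an in-memory SQLite database stands in for the RDBMS.

```python
import sqlite3

# Hypothetical fact table: one row per (store, week, product) with a sales measure.
ROWS = [
    ("NYC", "W1", "Shoes", 10.0),
    ("NYC", "W2", "Shoes", 5.0),
    ("Boston", "W1", "Shoes", 7.0),
    ("Boston", "W1", "Hats", 3.0),
]

ALLOWED = {"store", "week", "product"}  # whitelist of dimension column names


def build_sql(row_dim, col_dim):
    """Translate a two-dimensional report request into a GROUP BY statement."""
    if row_dim not in ALLOWED or col_dim not in ALLOWED:
        raise ValueError("unknown dimension")
    return (
        f"SELECT {row_dim}, {col_dim}, SUM(sales) FROM fact "
        f"GROUP BY {row_dim}, {col_dim}"
    )


def run_report(row_dim, col_dim):
    """Run the generated SQL and cross-tabulate the relational result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact (store TEXT, week TEXT, product TEXT, sales REAL)"
    )
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", ROWS)
    crosstab = {}
    for r, c, total in conn.execute(build_sql(row_dim, col_dim)):
        # Pivot the flat (row, column, value) tuples into a nested mapping.
        crosstab.setdefault(r, {})[c] = total
    conn.close()
    return crosstab


report = run_report("store", "week")
```

Here `report` is a nested mapping keyed first by store and then by week, which plays the role of the multidimensional result data set returned to the end user.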
Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) to provide OLAP analyses. The MDDB is logically organized as a multidimensional array (typically referred to as a multidimensional cube or hypercube or cube) whose rows/columns each represent a different dimension (i.e., relation). A data value is associated with each combination of dimensions (typically referred to as a "coordinate").
The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally.
As shown in Fig. 1B, prior art MOLAP systems have an Aggregation, Access and Retrieval module which is responsible for all data storage, access, and retrieval processes, including data aggregation (i.e. pre-aggregation) in the MDDB. As shown in Fig. 1B, the base data loader feeds base data, at the most detailed level, from the Data Warehouse into the Multi-Dimensional Data Base (MDDB). On top of the base data, layers of aggregated data are built up by the Aggregation program, which is part of the Aggregation, Access and Retrieval module. As indicated in this figure, the application logic module is responsible for the execution of all OLAP requests/queries (e.g. ratios, ranks, forecasts, exception scanning, and slicing and dicing) of data within the MDDB. The presentation module integrates with the application logic module and provides an interface through which the end users view and request OLAP analyses on their client machines, which may be web-enabled through the infrastructure of the Internet. The client/server architecture of a MOLAP system allows multiple users to access the same multidimensional database (MDDB).
Information (i.e. basic data) from a variety of operational systems within an enterprise, comprising the Data Warehouse, is loaded into a prior art multidimensional database (MDDB) through a series of batch routines. The Express™ server by the Oracle Corporation is exemplary of a popular server which can be used to carry out the data loading process in prior art MOLAP systems. As shown in Fig. 2B, an exemplary 3-D MDDB is schematically depicted, showing geography, time and products as the "dimensions" of the database. The multidimensional data of the MDDB is logically organized in an array structure, as shown in Fig. 2C. Physically, the Express™ server stores data in pages (or records) of an information file. Pages contain 512, 2048, or 4096 bytes of data, depending on the platform and release of the Express™ server. In order to look up the physical record address from the database file recorded on a disk or other mass storage device, the Express™ server generates a data structure referred to as a "Page Allocation Table (PAT)". As shown in Fig. 2D, the PAT tells the Express™ server the physical record number that contains the page of data. Typically, the PAT is organized in pages. The simplest way to access a data element in the MDDB is by calculating the "offset" using the additions and multiplications expressed by a simple formula:
Offset = Months + Product * (# of Months) + City * (# of Months * # of Products)
During an OLAP session, the response time of a multidimensional query on a prior art MDDB depends on how many cells in the MDDB have to be added "on the fly". As the number of dimensions in the MDDB increases linearly, the number of cells in the MDDB increases exponentially. However, it is known that the majority of multidimensional queries deal with summarized high level data. Thus, as shown in Figs. 3A and 3B, once the atomic data (i.e. "basic data") has been loaded into the MDDB, the general approach is to perform a series of calculations in batch in order to aggregate (i.e. pre-aggregate) the data elements along the orthogonal dimensions of the MDDB and fill the array structures thereof.
For example, revenue figures for all retail stores in a particular state (e.g. New York) would be added together to fill the state level cells in the MDDB. After the array structure in the database has been filled, integer-based indices are created and hashing algorithms are used to improve query access times. Pre-aggregation of dimension D0 is always performed along the cross-section of the MDDB along the D0 dimension.
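As a worked illustration of the offset formula given above, the following sketch computes cell positions in a flat array for a three-dimensional cube. The dimension sizes and the use of zero-based member indices are assumptions for illustration; the text does not state how the Express™ server numbers its members.

```python
# Hypothetical dimension sizes for a 3-D cube (City x Product x Month).
N_CITIES, N_PRODUCTS, N_MONTHS = 3, 4, 12


def offset(month, product, city):
    """Zero-based offset of a cell, following the formula in the text."""
    return month + product * N_MONTHS + city * (N_MONTHS * N_PRODUCTS)


# A flat list stands in for the array structure; one slot per cell.
# The cell count is the product of the dimension sizes, which is why
# adding dimensions multiplies the size of the array.
cells = [0.0] * (N_CITIES * N_PRODUCTS * N_MONTHS)
cells[offset(month=5, product=2, city=1)] = 99.0
```

With these sizes, the cell for month 5, product 2, city 1 sits at offset 5 + 2 * 12 + 1 * 48 = 77, and the last cell of the cube sits at offset 143.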
As shown in Figs. 3C1 and 3C2, the raw data loaded into the MDDB is primarily organized at its lowest dimensional hierarchy, and the results of the pre-aggregations are stored in the neighboring parts of the MDDB.
As shown in Fig. 3C2, along the TIME dimension, weeks are the aggregation results of days, months are the aggregation results of weeks, and quarters are the aggregation results of months. While not shown in the figures, along the GEOGRAPHY dimension, states are the aggregation results of cities, countries are the aggregation results of states, and continents are the aggregation results of countries. By pre-aggregating (i.e. consolidating or compiling) all logical subtotals and totals along all dimensions of the MDDB, it is possible to carry out real-time MOLAP operations using a multidimensional database (MDDB) containing both basic (i.e. atomic) and pre-aggregated data. Once this compilation process has been completed, the MDDB is ready for use. Users request OLAP reports by submitting queries through the OLAP Application interface (e.g. using web-enabled client machines), and the application logic layer responds to the submitted queries by retrieving the stored data from the MDDB for display on the client machine.
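The level-by-level pre-aggregation along a dimension hierarchy described above can be sketched as follows. This is only an illustration under stated assumptions: the member names, the parent maps, and the helper `roll_up` are hypothetical, and a plain dictionary stands in for the MDDB array structure.

```python
from collections import defaultdict

# Hypothetical hierarchy maps: each member points to its parent one level up.
CITY_TO_STATE = {"NYC": "NY", "Albany": "NY", "Boston": "MA"}
STATE_TO_COUNTRY = {"NY": "USA", "MA": "USA"}

# Atomic data: revenue by city.
city_revenue = {"NYC": 100.0, "Albany": 20.0, "Boston": 50.0}


def roll_up(values, parent_of):
    """Sum child values into their parent members (one hierarchy level)."""
    totals = defaultdict(float)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)


# Each level is computed from the level directly below it.
state_revenue = roll_up(city_revenue, CITY_TO_STATE)
country_revenue = roll_up(state_revenue, STATE_TO_COUNTRY)

# The pre-aggregated store holds atomic and aggregated cells side by side,
# so a query at any level becomes a lookup rather than a summation.
cube = {**city_revenue, **state_revenue, **country_revenue}
```

After this batch step, a state-level query such as `cube["NY"]` is answered by retrieval alone, which is the trade the text describes: more storage and batch time in exchange for faster query response.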
Typically, in MDDB systems, the aggregated data is very sparse, tending to explode as the number of dimensions grows and dramatically slowing down the retrieval process (as described in the report entitled "Database Explosion: The OLAP Report", http://www.olapreport.com/DatabaseExplosion.htm). Quick, on-line retrieval of queried data is critical in delivering on-line response for OLAP queries. Therefore, the data structure of the MDDB, and the methods of storing, indexing and handling it, are dictated mainly by the need for fast retrieval of massive and sparse data.
Different solutions for this problem are disclosed in the following US Patents:
• 5,822,751 "Efficient Multidimensional Data Aggregation Operator Implementation"
• 5,805,885 "Method And System For Aggregation Objects"
• 5,781,896 "Method And System For Efficiently Performing Database Table Aggregation Using An Aggregation Index"
• 5,745,764 "Method And System For Aggregation Objects"
In all the prior art OLAP servers, the process of storing, indexing and handling the MDDB utilizes complex data structures to greatly improve the retrieval speed, as part of the querying process, at the cost of slowing down storing and aggregation. The query-bounded structure, which must support fast retrieval of queries in a restrictive environment of high sparsity and multiple hierarchies, is not the optimal one for fast aggregation.
In summary, general system requirements of OLAP systems include: (1) supporting sophisticated analysis, (2) scaling to a large number of dimensions, and (3) supporting analysis against large atomic data sets.
MOLAP system architecture is capable of providing analytically sophisticated reports and analysis functionality. However, requirements (2) and (3) fundamentally limit MOLAP's capability, because to be effective and to meet end-user requirements, MOLAP databases need a high degree of aggregation.
By contrast, the ROLAP system architecture allows the construction of systems requiring a low degree of aggregation, but such systems are significantly slower than systems based on MOLAP system architecture principles. The resulting long aggregation times of ROLAP systems impose severe limitations on their data volumes and dimensional capabilities.
The graphs plotted in Fig. 5 clearly indicate the computational demands that are created when searching an MDDB during an OLAP session, where queries are presented to the MOLAP system and answers thereto are solicited, often under real-time constraints. However, prior art MOLAP systems have limited capabilities to dynamically create data aggregations or to calculate business metrics that have not been precalculated and stored in the MDDB.
The large volumes of data and the high dimensionality of certain market segmentation applications are orders of magnitude beyond the limits of current multidimensional databases.
ROLAP is capable of higher data volumes. However, the ROLAP architecture, despite its high volume and dimensionality superiority, suffers from several significant drawbacks as compared to MOLAP:
• Full aggregation of large data volumes is very time consuming, while partial aggregation severely degrades query response.
• It has a slower query response.
• It requires developers and end users to know SQL.
• SQL is less capable of the sophisticated analytical functionality necessary for OLAP.
• ROLAP provides limited application functionality.
Thus, improved techniques for data aggregation within MOLAP systems would appear to allow the number of dimensions of and the size of atomic (i.e. basic) data sets in the MDDB to be significantly increased, and thus increase the usage of the MOLAP system architecture.
Also, improved techniques for data aggregation within ROLAP systems would appear to allow for maximized query performance on large data volumes, and reduce the time of partial aggregations that degrade query response, and thus generally benefit ROLAP system architectures.
Thus, there is a great need in the art for an improved way of and means for aggregating data elements within a multi-dimensional database (MDDB), while avoiding the shortcomings and drawbacks of prior art systems and methodologies.
Modern operational and informational database systems, as described above, typically use a database management system (DBMS) (such as an RDBMS system, object database system, or object/relational database system) as a repository for storing data and querying the data. FIG. 14 illustrates a data warehouse-OLAP domain that utilizes the prior art approaches described above.
The data warehouse is an enterprise-wide data store. It is becoming an integral part of many information delivery systems because it provides a single, central location where a reconciled version of data extracted from a wide variety of operational systems is stored. Details on methods of data integration and constructing data warehouses can be found in the white paper entitled "Data Integration: The Warehouse Foundation" by Louis Rolleigh and Joe Thomas, published at http://www.acxiom.com/whitepapers/wp-11.asp.
Building a Data Warehouse has its own special challenges (e.g. using a common data model, a common business dictionary, etc.) and is a complex endeavor. However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing. To complete the supply chain from transactional systems to the decision maker, organizations need to deliver systems that allow knowledge workers to make strategic and tactical decisions based on the information stored in these warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems. Such OLAP systems are commonly classified as Relational OLAP systems or Multi-Dimensional OLAP systems as described above.
The Relational OLAP (ROLAP) system accesses data stored in a relational database (which is part of the Data Warehouse) to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database. The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore supports optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.
As described above, a typical ROLAP system has a three-tier or layer client/server architecture. The "database layer" utilizes relational databases for data storage, access, and retrieval processes. The "application logic layer" is the ROLAP engine which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of "presentation layers," through which users perform OLAP analyses. After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQL execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing pre-calculated results when they are available, or dynamically generating results from the raw information when necessary.
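The "use pre-calculated results when available, otherwise aggregate dynamically" behavior attributed to ROLAP above can be sketched in a few lines. The data, the summary mapping `PRECALCULATED`, and the function `state_total` are hypothetical; a real ROLAP engine would make this choice while generating SQL against summary tables rather than over in-memory lists.

```python
# Hypothetical atomic rows: (state, city, sales).
ATOMIC = [("NY", "NYC", 100.0), ("NY", "Albany", 20.0), ("MA", "Boston", 50.0)]

# Hypothetical summary table holding only some pre-calculated state totals.
PRECALCULATED = {"NY": 120.0}


def state_total(state):
    """Return (total, source): use the summary if present, else aggregate."""
    if state in PRECALCULATED:
        return PRECALCULATED[state], "precalculated"
    # Fall back to summing the raw rows that match the request.
    total = sum(sales for s, _city, sales in ATOMIC if s == state)
    return total, "dynamic"
```

A request for "NY" is served from the summary, while a request for "MA" falls back to summing the raw rows; the cost of that fallback on large fact tables is the performance concern raised in the text.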
The Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) (or "cube") to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally. Such MOLAP systems provide an interface that enables users to query the MDDB data structure such that users can "slice and dice" the aggregated data. As shown in Fig. 15, such MOLAP systems have an aggregation engine which is responsible for all data storage, access, and retrieval processes, including data aggregation (i.e. pre-aggregation) in the MDDB, and an analytical processing and GUI module responsible for interfacing with a user to provide analytical analysis, query input, and reporting of query results to the user. In a relational database, data is stored in tables. In contrast, the MDDB is a non-relational data structure - it uses other data structures, either instead of or in addition to tables - to store data.
There are other application domains where there is a great need for improved methods of and apparatus for carrying out data aggregation operations. For example, modern operational and informational databases represent such domains. As described above, modern operational and informational databases typically utilize a relational database system (RDBMS) as a repository for storing data and querying data. FIG. 16A illustrates an exemplary table in an RDBMS; and FIGS. 16B and 16C illustrate operators (queries) on the table of FIG. 16A, and the result of such queries, respectively. The operators illustrated in FIGS. 16B and 16C are expressed as Structured Query Language (SQL) statements as is conventional in the art.
The choice of using an RDBMS as the data repository in information database systems naturally stems from the realities of SQL standardization, the wealth of RDBMS-related tools, and readily available expertise in RDBMS systems. However, the querying component of RDBMS technology suffers from performance and optimization problems stemming from the very nature of the relational data model. More specifically, during query processing, the relational data model requires a mechanism that locates the raw data elements that match the query. Moreover, to support queries that involve aggregation operations, such aggregation operations must be performed over the raw data elements that match the query. For large multi-dimensional databases, a naive implementation of these operations involves computationally intensive table scans that lead to unacceptable query response times.
In order to better understand how the prior art has approached this problem, it will be helpful to briefly describe the relational database model. According to the relational database model, a relational database is represented by a logical schema and tables that implement the schema. The logical schema is represented by a set of templates that define one or more dimensions (entities) and attributes associated with a given dimension. The attributes associated with a given dimension include one or more attributes that distinguish it from every other dimension in the database (a dimension identifier). Relationships amongst dimensions are formed by joining attributes. The data structure that represents the set of templates and relations of the logical schema is typically referred to as a catalog or dictionary. Note that the logical schema represents the relational organization of the database, but does not hold any fact data per se. This fact data is stored in tables that implement the logical schema.
Star schemas are frequently used to represent the logical structure of a relational database. The basic premise of star schemas is that information can be classified into two groups: facts and dimensions. Facts are the core data elements being analyzed. For example, units of individual items sold are facts, while dimensions are attributes about the facts. For example, dimensions are the product types purchased and the date of purchase. Business questions against this schema are asked by looking up specific facts (UNITS) through a set of dimensions (MARKETS, PRODUCTS, PERIOD). The central fact table is typically much larger than any of its dimension tables.
An exemplary star schema is illustrated in FIG. 17A for suppliers (the "Supplier" dimension) and parts (the "Parts" dimension) over time periods (the "Time-Period" dimension). It includes a central fact table "Supplied-Parts" that relates to multiple dimensions: the "Supplier", "Parts" and "Time-Period" dimensions. FIG. 17B illustrates the tables used to implement the star schema of FIG. 17A. More specifically, these tables include a central fact table and a dimension table for each dimension in the logical schema of FIG. 17A. A given dimension table stores rows (instances) of the dimension defined in the logical schema. For the sake of description, FIG. 17B illustrates the dimension table for the "Time-Period" dimension only. Similar dimension tables for the "Supplier" and "Parts" dimensions (not shown) are also included in such an implementation. Each row within the central fact table includes a multi-part key associated with a set of facts (in this example, a number representing a quantity). The multi-part key of a given row (values stored in the S#, P#, TP# fields as shown) points to rows (instances) stored in the dimension tables described above. A more detailed description of star schemas and the tables used to implement star schemas may be found in C. J. Date, "An Introduction to Database Systems," Seventh Edition, Addison-Wesley, 2000, pp. 711-715.
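A star schema of the kind described above might be set up as in the following sketch, which uses an in-memory SQLite database. The column names `s_no`, `p_no` and `tp_no` are stand-ins for the S#, P# and TP# fields, and the table contents are invented for illustration; none of this reproduces the actual tables of FIG. 17B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per instance of the dimension.
CREATE TABLE supplier (s_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE part (p_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_period (tp_no INTEGER PRIMARY KEY, month TEXT);

-- Central fact table: a multi-part key plus the fact (a quantity).
CREATE TABLE supplied_parts (
    s_no INTEGER REFERENCES supplier(s_no),
    p_no INTEGER REFERENCES part(p_no),
    tp_no INTEGER REFERENCES time_period(tp_no),
    qty INTEGER,
    PRIMARY KEY (s_no, p_no, tp_no)
);

INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO part VALUES (1, 'Bolt'), (2, 'Nut');
INSERT INTO time_period VALUES (1, 'Jan'), (2, 'Feb');
INSERT INTO supplied_parts VALUES (1, 1, 1, 100), (1, 2, 1, 50), (2, 1, 2, 30);
""")

# Answering "quantity by supplier and month" joins the fact table to two
# dimension tables and then aggregates the matching facts.
rows = conn.execute("""
    SELECT s.name, t.month, SUM(f.qty)
    FROM supplied_parts AS f
    JOIN supplier AS s ON s.s_no = f.s_no
    JOIN time_period AS t ON t.tp_no = f.tp_no
    GROUP BY s.name, t.month
    ORDER BY s.name, t.month
""").fetchall()
conn.close()
```

Even this small query needs two joins followed by an aggregation over the matching fact rows, which is the work that becomes expensive when the central fact table is very large.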
When processing a query, the tables that implement the schema are accessed to retrieve the facts that match the query. For example, in a star schema implementation as described above, the facts are retrieved from the central fact table and/or the dimension tables. Locating the facts that match a given query involves one or more join operations. Moreover, to support queries that involve aggregation operations, such aggregation operations must be performed over the facts that match the query. For large multi-dimensional databases, a naive implementation of these operations involves computationally intensive table scans that typically lead to unacceptable query response times. Moreover, since the fact tables are pre-summarized and aggregated along business dimensions, these tables tend to be very large. This point becomes an important consideration of the performance issues associated with star schemas. A more detailed discussion of the performance issues (and proposed approaches that address such issues) related to joining and aggregation of star schemas is now set forth.
The first performance issue arises from computationally intensive table scans that are performed by a naive implementation of data joining. Indexing schemes may be used to bypass these scans when performing joining operations. Such schemes include B-tree indexing, inverted list indexing and aggregate indexing. A more detailed description of such indexing schemes can be found in "The Art of Indexing", Dynamic Information Systems Corporation, October 1999, available at http://www.disc.com/artindex.pdf. All of these indexing schemes replace table scan operations (involved in locating the data elements that match a query) with one or more index lookup operations. Inverted list indexing associates an index with a group of data elements, and stores (at a location identified by the index) a group of pointers to the associated data elements. During query processing, in the event that the query matches the index, the pointers stored in the index are used to retrieve the corresponding data elements pointed to thereby.
Aggregation indexing integrates an aggregation index with an inverted list index to provide pointers to raw data elements that require aggregation, thereby providing for dynamic summarization of the raw data elements that match the user-submitted query.
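The idea behind inverted list indexing, and its use for dynamic summarization, can be sketched as follows. This is a simplified illustration rather than the scheme of any cited product: the fact rows, the dictionary-based `index`, and both helper functions are assumptions, and list positions stand in for stored pointers.

```python
from collections import defaultdict

# Hypothetical fact rows: (supplier, part, quantity).
facts = [
    ("Acme", "Bolt", 100),
    ("Acme", "Nut", 50),
    ("Globex", "Bolt", 30),
]

# Build the inverted list: each key value maps to the positions ("pointers")
# of the rows that carry it.
index = defaultdict(list)
for pos, (supplier, _part, _qty) in enumerate(facts):
    index[supplier].append(pos)


def total_by_scan(supplier):
    """Naive approach: examine every row of the table."""
    return sum(q for s, _p, q in facts if s == supplier)


def total_by_index(supplier):
    """Index lookup: follow stored pointers to only the matching rows."""
    return sum(facts[pos][2] for pos in index.get(supplier, []))
```

Both functions return the same total, but `total_by_index` touches only the rows listed under the requested key, whereas `total_by_scan` reads the whole table.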
These indexing schemes are intended to improve join operations by replacing table scan operations with one or more index lookup operations in order to locate the data elements that match a query. However, these indexing schemes suffer from various performance issues as follows:
• Since the tables in the star schema design typically contain the entire hierarchy of attributes (e.g. in a PERIOD dimension, this hierarchy could be day>week>month>quarter>year), a multipart key of day, week, month, quarter, year has to be created; thus, multiple meta-data definitions are required (one for each key component) to define a single relationship; this adds to the design complexity and makes performance sluggish.
The Relational OLAP (ROLAP) system accesses data stored in a relational database (which is part of the Data Warehouse) to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database.
The ROLAP
architecture was invented to enable direct access of data from Data Warehouses, and therefore support optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.
As described above, a typical ROLAP system has a three-tier or layer client/server architecture. The "database layer" utilizes relational databases for data storage, access, and retrieval processes. The "application logic layer" is the ROLAP engine which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of "presentation layers," through which users perform OLAP analyses. After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP
engine, which then dynamically transforms the requests into SQL execution plans. The SQL
execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing pre-calculated results when they are available, or dynamically generating results from the raw information when necessary.
The Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) (or "cube") to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally. Such MOLAP systems provide an interface that enables users to query the MDDB data structure such that users can "slice and dice" the aggregated data. As shown in Fig. 15, such MOLAP systems have an aggregation engine which is responsible for all data storage, access, and retrieval processes, including data aggregation (i.e. pre-aggregation) in the MDDB, and an analytical processing and GUI module responsible for interfacing with a user to provide analytical analysis, query input, and reporting of query results to the user. In a relational database, data is stored in tables. In contrast, the MDDB is a non-relational data structure - it uses other data structures, either instead of or in addition to tables - to store data.
There are other application domains where there is a great need for improved methods of and apparatus for carrying out data aggregation operations. For example, modern operational and informational databases represent such domains. As described above, modern operational and informational databases typically utilize a relational database system (RDBMS) as a repository for storing data and querying data. FIG. 16A illustrates an exemplary table in an RDBMS; and FIGS. 16B and 16C illustrate operators (queries) on the table of FIG. 16A, and the result of such queries, respectively. The operators illustrated in FIGS. 16B and 16C are expressed as Structured Query Language (SQL) statements as is conventional in the art.
The choice of using an RDBMS as the data repository in information database systems naturally stems from the realities of SQL standardization, the wealth of RDBMS-related tools, and readily available expertise in RDBMS systems. However, the querying component of RDBMS technology suffers from performance and optimization problems stemming from the very nature of the relational data model. More specifically, during query processing, the relational data model requires a mechanism that locates the raw data elements that match the query. Moreover, to support queries that involve aggregation operations, such aggregation operations must be performed over the raw data elements that match the query. For large multi-dimensional databases, a naive implementation of these operations involves computationally intensive table scans that lead to unacceptable query response times.
In order to better understand how the prior art has approached this problem, it will be helpful to briefly describe the relational database model. According to the relational database model, a relational database is represented by a logical schema and tables that implement the schema. The logical schema is represented by a set of templates that define one or more dimensions (entities) and attributes associated with a given dimension. The attributes associated with a given dimension include one or more attributes that distinguish it from every other dimension in the database (a dimension identifier). Relationships amongst dimensions are formed by joining attributes. The data structure that represents the set of templates and relations of the logical schema is typically referred to as a catalog or dictionary. Note that the logical schema represents the relational organization of the database, but does not hold any fact data per se. This fact data is stored in tables that implement the logical schema.
Star schemas are frequently used to represent the logical structure of a relational database. The basic premise of star schemas is that information can be classified into two groups: facts and dimensions. Facts are the core data elements being analyzed. For example, units of an individual item sold are facts, while dimensions are attributes about the facts. For example, dimensions are the product types purchased and the date of purchase. Business questions against this schema are asked by looking up specific facts (UNITS) through a set of dimensions (MARKETS, PRODUCTS, PERIOD). The central fact table is typically much larger than any of its dimension tables.
An exemplary star schema is illustrated in FIG. 17A for suppliers (the "Supplier" dimension) and parts (the "Parts" dimension) over time periods (the "Time-Period" dimension). It includes a central fact table "Supplied-Parts" that relates to multiple dimensions: the "Supplier", "Parts" and "Time-Period" dimensions. FIG. 17B illustrates the tables used to implement the star schema of FIG. 17A. More specifically, these tables include a central fact table and a dimension table for each dimension in the logical schema of FIG. 17A. A given dimension table stores rows (instances) of the dimension defined in the logical schema. For the sake of description, FIG. 17B illustrates the dimension table for the "Time-Period" dimension only. Similar dimension tables for the "Supplier" and "Part" dimensions (not shown) are also included in such an implementation. Each row within the central fact table includes a multi-part key associated with a set of facts (in this example, a number representing a quantity). The multi-part key of a given row (values stored in the S#, P#, TP# fields as shown) points to rows (instances) stored in the dimension tables described above. A more detailed description of star schemas and the tables used to implement star schemas may be found in C. J. Date, "An Introduction to Database Systems," Seventh Edition, Addison-Wesley, 2000, pp. 711-715.
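By way of illustration only, a star schema of the kind described above might be laid out as in the following sketch. The figures themselves are not reproduced here, so the column names `s_id`, `p_id` and `tp_id` (standing in for the S#, P# and TP# fields), the `qty` fact column, and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table per dimension of the logical schema.
CREATE TABLE supplier    (s_id  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE part        (p_id  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE time_period (tp_id INTEGER PRIMARY KEY, month TEXT);

-- Central fact table: a multi-part key pointing at the dimension rows,
-- plus the fact itself (a quantity).
CREATE TABLE supplied_parts (
    s_id  INTEGER REFERENCES supplier(s_id),
    p_id  INTEGER REFERENCES part(p_id),
    tp_id INTEGER REFERENCES time_period(tp_id),
    qty   INTEGER,
    PRIMARY KEY (s_id, p_id, tp_id)
);

INSERT INTO supplier    VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO part        VALUES (1, 'Bolt'), (2, 'Nut');
INSERT INTO time_period VALUES (1, '2000-01'), (2, '2000-02');
INSERT INTO supplied_parts VALUES
    (1, 1, 1, 100), (1, 2, 1, 50), (2, 1, 2, 70), (1, 1, 2, 30);
""")

# A business question asked "through" two dimensions: quantity supplied
# per supplier per month. Each dimension requires a join to the fact table.
rows = conn.execute("""
    SELECT s.name, t.month, SUM(f.qty)
    FROM supplied_parts AS f
    JOIN supplier    AS s ON s.s_id  = f.s_id
    JOIN time_period AS t ON t.tp_id = f.tp_id
    GROUP BY s.name, t.month
    ORDER BY s.name, t.month
""").fetchall()
```

Even in this toy case the query touches the fact table and joins it once per dimension involved, which is the cost pattern the surrounding discussion is concerned with.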
When processing a query, the tables that implement the schema are accessed to retrieve the facts that match the query. For example, in a star schema implementation as described above, the facts are retrieved from the central fact table and/or the dimension tables. Locating the facts that match a given query involves one or more join operations. Moreover, to support queries that involve aggregation operations, such aggregation operations must be performed over the facts that match the query. For large multi-dimensional databases, a naive implementation of these operations involves computationally intensive table scans that typically lead to unacceptable query response times. Moreover, since the fact tables are pre-summarized and aggregated along business dimensions, these tables tend to be very large. This point becomes an important consideration in the performance issues associated with star schemas. A more detailed discussion of the performance issues (and proposed approaches that address such issues) related to joining and aggregation of star schemas is now set forth.
The first performance issue arises from computationally intensive table scans that are performed by a naive implementation of data joining. Indexing schemes may be used to bypass these scans when performing joining operations. Such schemes include B-tree indexing, inverted list indexing and aggregate indexing. A more detailed description of such indexing schemes can be found in "The Art of Indexing", Dynamic Information Systems Corporation, October 1999, available at http://www.disc.com/artindex.pdf. All of these indexing schemes replace table scan operations (involved in locating the data elements that match a query) with one or more index lookup operations. Inverted list indexing associates an index with a group of data elements, and stores (at a location identified by the index) a group of pointers to the associated data elements. During query processing, in the event that the query matches the index, the pointers stored in the index are used to retrieve the corresponding data elements. Aggregation indexing integrates an aggregation index with an inverted list index to provide pointers to raw data elements that require aggregation, thereby providing for dynamic summarization of the raw data elements that match the user-submitted query.
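The inverted list idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the scheme of the cited reference: the sample rows, the use of list positions as "pointers", and the helper names `build_inverted_index` and `sum_units` are all hypothetical.

```python
from collections import defaultdict

# Raw data elements; a row's list position serves as its "pointer".
rows = [
    {"product": "Bolt", "month": "2000-01", "units": 100},
    {"product": "Nut",  "month": "2000-01", "units": 50},
    {"product": "Bolt", "month": "2000-02", "units": 30},
]

def build_inverted_index(rows, column):
    # Map each distinct value of `column` to the positions of matching rows.
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index

product_index = build_inverted_index(rows, "product")

def sum_units(index, key):
    # Follow the stored pointers instead of scanning every row, then
    # summarize the matching raw elements dynamically.
    return sum(rows[position]["units"] for position in index.get(key, []))

bolt_units = sum_units(product_index, "Bolt")
```

The lookup replaces a scan of all rows with a visit to only the rows the index points at; the aggregation itself is still computed at query time.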
These indexing schemes are intended to improve join operations by replacing table scan operations with one or more index lookup operations in order to locate the data elements that match a query. However, these indexing schemes suffer from various performance issues as follows:
• Since the tables in the star schema design typically contain the entire hierarchy of attributes (e.g. in a PERIOD dimension, this hierarchy could be day>week>month>quarter>year), a multipart key of day, week, month, quarter, year has to be created; thus, multiple meta-data definitions are required (one for each key component) to define a single relationship; this adds design complexity and degrades performance.
• Addition or deletion of levels in the hierarchy will require physical modification of the fact table, which is a time-consuming process that limits flexibility.
• Carrying all the segments of the compound dimensional key in the fact table increases the size of the index, thus impacting both performance and scalability.
Another performance issue arises from dimension tables that contain multiple hierarchies. In such cases, the dimensional table often includes a level-of-hierarchy indicator for every record. Every retrieval from a fact table that stores details and aggregates must use the indicator to obtain the correct result, which impacts performance. The best alternative to using the level indicator is the snowflake schema. In this schema aggregate tables are created separately from the detail tables. In addition to the main fact tables, the snowflake schema contains separate fact tables for each level of aggregation. Notably, the snowflake schema is even more complicated than a star schema, and often requires multiple SQL statements to get the results that are required.
Another performance issue arises from the pairwise join problem. Traditional RDBMS engines are not designed for the rich set of complex queries that are issued against a star schema. The ability to retrieve related information from several tables in a single query ("join processing") is severely limited. Many RDBMSs can join only two tables at a time. If a complex join involves more than two tables, the RDBMS needs to break the query into a series of pairwise joins.
Selecting the order of these joins has a dramatic performance impact. There are optimizers that spend a lot of CPU cycles to find the best order in which to execute those joins. Unfortunately, because the number of combinations to be evaluated grows exponentially with the number of tables being joined, the problem of selecting the best order of pairwise joins rarely can be solved in a reasonable amount of time.
Moreover, because the number of combinations is often too large, optimizers limit the selection on the basis of a criterion of directly related tables. In a star schema, the fact table is the only table directly related to most other tables, meaning that the fact table is a natural candidate for the first pairwise join. Unfortunately, the fact table is the very largest table in the query, so this strategy leads to selecting a pairwise join order that generates a very large intermediate result set, severely affecting query performance.
There is an optimization strategy, typically referred to as Cartesian Joins, that lessens the performance impact of the pairwise join problem by allowing joining of unrelated tables. The join to the fact table, which is the largest one, is deferred until the very end, thus reducing the size of intermediate result sets. In a join of two unrelated tables every combination of the two tables' rows is produced, a Cartesian product. Such a Cartesian product improves query performance. However, this strategy is viable only if the Cartesian product of dimension rows selected is much smaller than the number of rows in the fact table. The multiplicative nature of the Cartesian join makes the optimization helpful only for relatively small databases.
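The effect of join order on intermediate result size can be made concrete with a toy comparison. The sketch below is illustrative only: the table contents, sizes and variable names are assumptions, and plain Python filtering stands in for the join operators of an actual RDBMS.

```python
from itertools import product

# Two small dimension tables: (key, attribute).
markets = [("m1", "East"), ("m2", "West"), ("m3", "East")]
periods = [("t1", "Q1"), ("t2", "Q2"), ("t3", "Q1"), ("t4", "Q1")]

# Fact table: one row per (market, period, product), 600 rows in total.
facts = [(m, t, p, 1) for m, _ in markets for t, _ in periods for p in range(50)]

# Dimension rows selected by the query: region East, quarter Q1.
east = {m for m, region in markets if region == "East"}
q1 = {t for t, quarter in periods if quarter == "Q1"}

# Strategy 1: pairwise joins starting with the fact table.
# The first join already yields a large intermediate result.
step1 = [f for f in facts if f[0] in east]
result_fact_first = [f for f in step1 if f[1] in q1]

# Strategy 2: Cartesian join of the selected dimension rows first,
# deferring the join to the fact table until the end.
key_pairs = set(product(east, q1))
result_cartesian = [f for f in facts if (f[0], f[1]) in key_pairs]

intermediate_sizes = {"fact_first": len(step1), "cartesian": len(key_pairs)}
```

With these toy numbers the fact-first order carries 400 intermediate rows while the Cartesian product holds only 6 key pairs, and both strategies return the same 300 fact rows. The advantage disappears as the product of selected dimension rows approaches the size of the fact table.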
In addition, systems that exploit hardware and software parallelism have been developed that lessen the performance issues set forth above. Parallelism can help reduce the execution time of a single query (speed-up), or handle additional work without degrading execution time (scale-up). For example, Red Brick™ has developed STARjoin™ technology that provides high-speed, parallelizable multi-table joins in a single pass, thus allowing more than two tables to be joined in a single operation. The core technology is an innovative approach to indexing that accelerates multiple joins. Unfortunately, parallelism can only reduce, not eliminate, the performance degradation issues related to the star schema.
One of the most fundamental principles of the multidimensional database is the idea of aggregation. The most common aggregation is called a roll-up aggregation. This type is relatively easy to compute: e.g. taking daily sales totals and rolling them up into a monthly sales table. More difficult are analytical calculations: the aggregation of Boolean and comparative operators. However, these are also considered a subset of aggregation.
In a star schema, the results of aggregation are summary tables. Typically, summary tables are generated by database administrators who attempt to anticipate the data aggregations that the users will request, and then pre-build such tables. In such systems, when processing a user-generated query that involves aggregation operations, the pre-built aggregated data that matches the query is retrieved from the summary tables (if such data exists). FIGS. 18A and 18B illustrate a multi-dimensional relational database using a star schema and summary tables. In this example, the summary tables are generated over the "time" dimension storing aggregated data for "month", "quarter" and "year" time periods as shown in FIG. 18B. Summary tables are in essence additional fact tables, of higher levels. They are attached to the basic fact table creating a snowflake extension of the star schema. There are hierarchies among summary tables because users at different levels of management require different levels of summarization. Choosing the level of aggregation is accomplished via the "drill-down" feature.
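A roll-up over the "time" dimension, producing month, quarter and year summary tables from daily facts, can be sketched as follows. The daily figures, the string-based date keys and the helper names are assumptions made purely for illustration.

```python
from collections import defaultdict

# Base facts: daily sales totals keyed by ISO date string.
daily = {"2000-01-15": 10, "2000-01-20": 5, "2000-02-03": 7, "2000-04-10": 8}

def roll_up(table, parent_of):
    # Sum every entry of `table` into the bucket of its parent member.
    summary = defaultdict(int)
    for key, value in table.items():
        summary[parent_of(key)] += value
    return dict(summary)

def day_to_month(day):
    return day[:7]                       # "2000-01-15" -> "2000-01"

def month_to_quarter(month):
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

def quarter_to_year(quarter):
    return quarter[:4]                   # "2000-Q1" -> "2000"

# Each summary table is built from the level directly below it.
monthly = roll_up(daily, day_to_month)
quarterly = roll_up(monthly, month_to_quarter)
yearly = roll_up(quarterly, quarter_to_year)
```

Because each level is derived from the one beneath it, the tables must be built in sequence; this is one simple way to see why a hierarchy of summary tables constrains how much of the work can run in parallel.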
Summary tables containing pre-aggregated results typically provide for improved query response time with respect to on-the-fly aggregation. However, summary tables suffer from some disadvantages:
• summary tables require that database administrators anticipate the data aggregation operations that users will require; this is a difficult task in large multi-dimensional databases (for example, in data warehouses and data mining systems), where users always need to query in new ways looking for new information and patterns.
• summary tables do not provide a mechanism that allows efficient drill down to view the raw data that makes up the summary table; typically a table scan of one or more large tables is required.
• querying is delayed until pre-aggregation is completed.
• there is a heavy time overhead because the vast majority of the generated information remains unvisited.
• there is a need to synchronize the summary tables before use.
• the degree of viable parallelism is limited because the subsequent levels of summary tables must be built in a pipeline, due to their hierarchies.
• for very large databases, this option is not viable because of the time and storage space required.
Note that it is common to utilize both pre-aggregated results and on-the-fly aggregation in support of aggregation. In these systems, partial pre-aggregation of the facts results in a small set of summary tables. On-the-fly aggregation is used in the case where the required aggregated data does not exist in the summary tables.
Note that in the event that the aggregated data does not exist in the summary tables, table join operations and aggregation operations are performed over the raw facts in order to generate such aggregated data. This is typically referred to as on-the-fly aggregation.
In such instances, aggregation indexing is used to mitigate the performance cost of the multiple data joins associated with dynamic aggregation of the raw data. Even so, in large multi-dimensional databases, such dynamic aggregation may lead to unacceptable query response times.
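The combination of partial pre-aggregation with on-the-fly aggregation can be sketched as a lookup with a fallback. The sample facts, the single anticipated summary table and the function name `total_units` are assumptions for illustration; a real system would decide between the two paths in its query planner.

```python
# Raw facts: (month, product, units).
raw_facts = [
    ("2000-01", "Bolt", 100),
    ("2000-01", "Nut", 50),
    ("2000-02", "Bolt", 30),
]

# Partial pre-aggregation: only totals by month were anticipated.
summary_by_month = {"2000-01": 150, "2000-02": 30}

def total_units(month=None, product=None):
    # Use the summary table when the request matches what was pre-built.
    if month is not None and product is None and month in summary_by_month:
        return summary_by_month[month], "summary"
    # Otherwise aggregate over the raw facts at query time.
    total = sum(
        units for m, p, units in raw_facts
        if (month is None or m == month) and (product is None or p == product)
    )
    return total, "on-the-fly"

by_month = total_units(month="2000-01")
by_product = total_units(product="Bolt")
```

The first request is answered from the summary table, while the second, which was not anticipated, falls back to a pass over the raw facts. That pass is the part whose cost grows with the size of the database.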
In view of the problems associated with joining and aggregation within RDBMS, prior art ROLAP systems have suffered from essentially the same shortcomings and drawbacks as their underlying RDBMS.
While prior art MOLAP systems provide for improved access time to aggregated data within their underlying MDD structures, and have performance advantages when carrying out joining and aggregation operations, prior art MOLAP architectures have suffered from a number of shortcomings and drawbacks. More specifically, atomic (raw) data is moved, in a single transfer, to the MOLAP system for aggregation, analysis and querying. Importantly, the aggregation results are external to the DBMS. Thus, users of the DBMS cannot directly view these results. Such results are accessible only from the MOLAP system. Because the MDD query processing logic in prior art MOLAP systems is separate from that of the DBMS, users must procure rights to access the MOLAP system and be instructed (and be careful to conform to such instructions) to access the MDD (or the DBMS) under certain conditions. Such requirements can present security issues, which are highly undesirable for system administration. Satisfying such requirements is a costly and logistically cumbersome process. As a result, the widespread applicability of MOLAP systems has been limited.
Thus, there is a great need in the art for an improved mechanism for joining and aggregating data elements within a database management system (e.g., RDBMS), and for integrating the improved database management system (e.g., RDBMS) into informational database systems (including the data warehouse and OLAP domains), while avoiding the shortcomings and drawbacks of prior art systems and methodologies.
SUMMARY OF PRESENT INVENTION
Illustrative embodiments may provide an improved method of and system for managing data elements within a multidimensional database (MDDB) using a novel stand-alone (i.e. external), scalable data aggregation server, achieving a significant increase in system performance (e.g. decreased access/search time).
Other illustrative embodiments may provide such a system, wherein the stand-alone aggregation server includes an aggregation engine that is integrated with an MDDB, to provide a cartridge-style plug-in accelerator which can communicate with virtually any conventional OLAP server.
Other illustrative embodiments may provide such a stand-alone data aggregation server whose computational tasks are restricted to data aggregation, leaving all other OLAP functions to the MOLAP server and therefore complementing the OLAP server's functionality.
Other illustrative embodiments may provide such a system, wherein the stand-alone aggregation server carries out an improved method of data aggregation within the MDDB which enables the dimensions of the MDDB to be scaled up to large numbers and large atomic (i.e. base) data sets to be handled within the MDDB.
Other illustrative embodiments may provide such a stand-alone aggregation server, wherein the aggregation engine supports high-performance aggregation (i.e. data roll-up) processes to maximize query performance of large data volumes, and to reduce the time of partial aggregations that degrade the query response.
Other illustrative embodiments may provide such a stand-alone, external scalable aggregation server, wherein its integrated data aggregation (i.e. roll-up) engine speeds up the aggregation process by orders of magnitude, enabling larger database analysis by lowering the aggregation times.
Other illustrative embodiments may provide such a novel stand-alone scalable aggregation server for use in OLAP operations, wherein the scalability of the aggregation server enables (i) the speed of the aggregation process carried out therewithin to be substantially increased by distributing the computationally intensive tasks associated with data aggregation among multiple processors, and (ii) the large data sets contained within the MDDB of the aggregation server to be subdivided among multiple processors thus allowing the size of atomic (i.e. basic) data sets within the MDDB to be substantially increased.
Other illustrative embodiments may provide such a novel stand-alone scalable aggregation server, which provides for uniform load balancing among processors for high efficiency and best performance, and linear scalability for extending the limits by adding processors.
Other illustrative embodiments may provide a stand-alone, external scalable aggregation server, which is suitable for MOLAP as well as for ROLAP system architectures.
Other illustrative embodiments may provide a novel stand-alone scalable aggregation server, wherein an MDDB and aggregation engine are integrated and the aggregation engine carries out a high-performance aggregation algorithm and novel storing and searching methods within the MDDB.
Other illustrative embodiments may provide a novel stand-alone scalable aggregation server which can be supported on single-processor (i.e. sequential or serial) computing platforms, as well as on multi-processor (i.e. parallel) computing platforms.
Other illustrative embodiments may provide a novel stand-alone scalable aggregation server which can be used as a complementary aggregation plug-in to existing MOLAP and ROLAP databases.
Other illustrative embodiments may provide a novel stand-alone scalable aggregation server which carries out novel roll-up (i.e. bottom-up) and spread-down (i.e. top-down) aggregation algorithms.
Other illustrative embodiments may provide a novel stand-alone scalable aggregation server which includes an integrated MDDB and aggregation engine which carries out full pre-aggregation and/or "on-the-fly" aggregation processes within the MDDB.
Other illustrative embodiments may provide such a novel stand-alone scalable aggregation server which is capable of supporting an MDDB having a multi-hierarchy dimensionality.
Other illustrative embodiments may provide a novel method of aggregating multidimensional data of atomic data sets originating from a RDBMS Data Warehouse.
Other illustrative embodiments may provide a novel method of aggregating multidimensional data of atomic data sets originating from other sources, such as external ASCII files, MOLAP server, or other end user applications.
Other illustrative embodiments may provide a novel stand-alone scalable data aggregation server which can communicate with any MOLAP server via standard ODBC, OLE DB or DLL interface, in a completely transparent manner with respect to the (client) user, without any time delays in queries, equivalent to storage in MOLAP server's cache.
Other illustrative embodiments may provide a novel "cartridge-style" (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of MOLAP into large-scale applications including Banking, Insurance, Retail and Promotion Analysis.
Other illustrative embodiments may provide a novel "cartridge-style" (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of high-volatility type ROLAP applications such as, for example, the precalculation of data to maximize query performance.
Other illustrative embodiments may provide a generic plug-in cartridge-type data aggregation component, suitable for all MOLAP systems of different vendors, dramatically reducing their aggregation burdens.
Other illustrative embodiments may provide a novel high performance cartridge-type data aggregation server which, having standardized interfaces, can be plugged into the OLAP system of virtually any user or vendor.
Other illustrative embodiments may provide a novel "cartridge-style" (stand-alone) scalable data aggregation engine which has the capacity to convert long batch-type data aggregations into interactive sessions.
In another aspect, illustrative embodiments may provide an improved method of and system for joining and aggregating data elements integrated within a database management system (DBMS) using a non-relational multi-dimensional data structure (MDDB), achieving a significant increase in system performance (e.g. decreased access/search time), user flexibility and ease of use.
Other illustrative embodiments may provide such a DBMS wherein its integrated data aggregation module supports high-performance aggregation (i.e. data roll-up) processes to maximize query performance of large data volumes.
Other illustrative embodiments may provide such a DBMS system, wherein its integrated data aggregation (i.e. roll-up) module speeds up the aggregation process by orders of magnitude, enabling larger database analysis by lowering the aggregation times.
Other illustrative embodiments may provide such a novel DBMS system for use in OLAP operations.
Other illustrative embodiments may provide a novel DBMS system having an integrated aggregation module that carries out novel rollup (i.e. down-up) and spread down (i.e. top-down) aggregation algorithms.
Other illustrative embodiments may provide a novel DBMS system having an integrated aggregation module that carries out full pre-aggregation and/or "on-the-fly" aggregation processes.
Other illustrative embodiments may provide a novel DBMS system having an integrated aggregation module which is capable of supporting a MDDB having a multi-hierarchy dimensionality.
In accordance with another illustrative embodiment, there is provided a database management system (DBMS). The DBMS includes: a relational datastore storing data in tables; an aggregation module, operatively coupled to the relational datastore, for aggregating the data stored in the tables of the relational datastore and storing the resultant aggregated data in a non-relational datastore; a reference generating mechanism for generating a first reference to data stored in the relational datastore and a second reference to aggregated data generated by the aggregation module and stored in the non-relational datastore; and a query processing mechanism for processing query statements.
Upon identifying that a given query statement is on the second reference, the query processing mechanism communicates with the aggregation module to retrieve portions of aggregated data identified by the reference that are relevant to the given query statement.
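The routing behaviour described above, in which a query on the second reference is served by the aggregation module while a query on the first reference is served from the relational tables, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the class names, the reference names `sales` and `sales_agg`, and the per-region sum are all hypothetical.

```python
class AggregationModule:
    """Hypothetical stand-in for the non-relational store of aggregated data."""

    def __init__(self, base_rows):
        # Pre-aggregate the relational rows: total amount per region.
        self.totals = {}
        for region, amount in base_rows:
            self.totals[region] = self.totals.get(region, 0) + amount

    def fetch(self, region):
        """Return only the portion of aggregated data relevant to the query."""
        return self.totals.get(region)


class QueryProcessor:
    """Dispatches a query according to the reference it is written against."""

    def __init__(self, base_rows):
        self.relational = list(base_rows)              # relational datastore
        self.aggregation = AggregationModule(base_rows)
        # First reference -> relational data; second reference -> aggregated data.
        self.references = {"sales": "relational", "sales_agg": "aggregated"}

    def query(self, reference, region):
        if self.references[reference] == "aggregated":
            return self.aggregation.fetch(region)
        return [row for row in self.relational if row[0] == region]


qp = QueryProcessor([("East", 10), ("East", 5), ("West", 7)])
detail = qp.query("sales", "East")        # rows from the relational tables
summary = qp.query("sales_agg", "East")   # value from the aggregation module
```

From the caller's point of view both references are queried the same way; only the query processor knows which store answers.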
In accordance with another illustrative embodiment, there is provided a database management system (DBMS). The DBMS includes a relational datastore storing data in tables, and an integrated aggregation module, operatively coupled to the relational datastore, for aggregating the data stored in the tables of the relational datastore and storing the resultant aggregated data in a non-relational datastore.
In accordance with another illustrative embodiment, there is provided, in a database management system (DBMS) including a relational datastore storing data in tables, a method for aggregating the data stored in the tables of the relational datastore and providing query access to the aggregated data. The method involves: (a) providing an integrated aggregation module, operatively coupled to the relational datastore, for aggregating the data stored in the relational datastore and storing the resultant aggregated data in a non-relational datastore; (b) in response to user input, generating a reference to aggregated data generated by the aggregation module; and (c) processing a given query statement generated in response to user input, wherein, upon identifying that the given query statement is on the reference, retrieving from the integrated aggregation module portions of aggregated data identified by the reference that are relevant to the given query statement.
In accordance with another illustrative embodiment, there is provided a method of aggregating data. The method involves: (a) loading data from a data source into a multidimensional datastore, wherein the data is logically partitioned into N dimensions; (b) performing a first stage of data aggregation operations along a first dimension in the multi-dimensional datastore; and (c) performing a second stage of aggregation operations for a given slice in the first dimension along N-1 dimensions other than the first dimension in the multi-dimensional datastore.
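The two-stage method above can be illustrated for N = 3 with a small sketch. It makes simplifying assumptions that are not in the source: the datastore is a dense nested list indexed as `cube[d1][d2][d3]`, each dimension has a single "all members" level, and aggregation is summation. The function names are hypothetical.

```python
def aggregate_first_dimension(cube):
    """Stage 1: append an 'all' slice that sums every slice along dimension 1."""
    n2, n3 = len(cube[0]), len(cube[0][0])
    total = [[sum(s[j][k] for s in cube) for k in range(n3)] for j in range(n2)]
    return cube + [total]


def aggregate_slice(slice_2d):
    """Stage 2: within one dimension-1 slice, add totals along dimensions 2 and 3."""
    rows = [row + [sum(row)] for row in slice_2d]      # totals along dimension 3
    col_totals = [sum(col) for col in zip(*rows)]      # totals along dimension 2
    return rows + [col_totals]


def segmented_aggregation(cube):
    """Run stage 1 once, then stage 2 on each slice independently."""
    stage1 = aggregate_first_dimension(cube)
    return [aggregate_slice(s) for s in stage1]


# Hypothetical 2 x 2 x 2 base data.
cube = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
]
result = segmented_aggregation(cube)
```

Once stage 1 has finished, no slice needs data from any other slice, so each call to `aggregate_slice` in stage 2 could be run separately or handed to a different processor.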
These and other aspects and features will become apparent hereinafter and in the Claims to Invention set forth herein.
BRIEF DESCRIPTION OF THE DRAWINGS
The following Detailed Description of the Illustrative Embodiments should be read in conjunction with the accompanying Drawings, wherein:
Fig. 1A is a schematic representation of an exemplary prior art relational on-line analytical processing (ROLAP) system comprising a three-tier or layer client/server architecture, wherein the first tier has a database layer utilizing an RDBMS for data storage, access, and retrieval processes, the second tier has an application logic layer (i.e. the ROLAP engine) for executing the multidimensional reports from multiple users, and the third tier integrates the ROLAP engine with a variety of presentation layers, through which users perform OLAP analyses;
Fig. 1B is a schematic representation of a generalized embodiment of a prior art multidimensional on-line analytical processing (MOLAP) system comprising a base data loader for receiving atomic (i.e. base) data from a Data Warehouse realized by a RDBMS, an OLAP multidimensional database (MDDB), an aggregation, access and retrieval module, application logic module and presentation module associated with a conventional OLAP server (e.g. Oracle's Express Server) for supporting on-line transactional processing (OLTP) operations on the MDDB, to service database queries and requests from a plurality of OLAP client machines typically accessing the system from an information network (e.g. the Internet);
Fig. 2A is a schematic representation of the Data Warehouse shown in the prior art system of Fig. 1B comprising numerous data tables (e.g. T1, T2, ..., Tn) and data field links, and the OLAP multidimensional database shown in Fig. 1B, comprising a conventional page allocation table (PAT) with pointers pointing to the physical storage of variables in an information storage device;
Fig. 2B is a schematic representation of an exemplary three-dimensional MDDB organized as a 3-dimensional Cartesian cube and used in the prior art system of Fig. 2A, wherein the first dimension of the MDDB is representative of geography (e.g. cities, states, countries, continents), the second dimension of the MDDB is representative of time (e.g. days, weeks, months, years), the third dimension of the MDDB is representative of products (e.g. all products, by manufacturer), and the basic data element is a set of variables which are addressed by 3-dimensional coordinate values;
Fig. 2C is a schematic representation of a prior art array structure associated with an exemplary three-dimensional MDDB, arranged according to a dimensional hierarchy;
Fig. 2D is a schematic representation of a prior art page allocation table for an exemplary three-dimensional MDDB, arranged according to pages of data element addresses;
Fig. 3A is a schematic representation of a prior art MOLAP system, illustrating the process of periodically storing raw data in the RDBMS Data Warehouse thereof, serially loading of basic data from the Data Warehouse to the MDDB, and the process of serially pre-aggregating (or pre-compiling) the data in the MDDB along the entire dimensional hierarchy thereof;
Fig. 3B is a schematic representation illustrating that the Cartesian addresses listed in a prior art page allocation table (PAT) point to where physical storage of data elements (i.e. variables) occurs in the information recording media (e.g. storage volumes) associated with the MDDB, during the loading of basic data into the MDDB as well as during data preaggregation processes carried out therewithin;
Fig. 3C1 is a schematic representation of an exemplary three-dimensional database used in a conventional MOLAP system of the prior art, showing that each data element contained therein is physically stored at a location in the recording media of the system which is specified by the dimensions (and subdimensions within the dimensional hierarchy) of the data variables which are assigned integer-based coordinates in the MDDB, and also that data elements associated with the basic data loaded into the MDDB are assigned lower integer coordinates in MDDB Space than pre-aggregated data elements contained therewithin;
Fig. 3C2 is a schematic representation illustrating that a conventional hierarchy of the dimension of "time" typically contains the subdimensions "days, weeks, months, quarters, etc." of the prior art;
Fig. 3C3 is a schematic representation showing how data elements having higher subdimensions of time in the MDDB of the prior art are typically assigned increased integer addresses along the time dimension thereof;
Fig. 4 is a schematic representation illustrating that, for very large prior art MDDBs, very large page allocation tables (PATs) are required to represent the address locations of the data elements contained therein, and thus there is a need to employ address data paging techniques between the DRAM (e.g. program memory) and mass storage devices (e.g. recording discs or RAIDs) available on the serial computing platform used to implement such prior art MOLAP systems;
Fig. 5 is a graphical representation showing how search time in a conventional (i.e. prior art) MDDB increases in proportion to the amount of preaggregation of data therewithin;
Fig. 6A is a schematic representation of a generalized embodiment of a multidimensional on-line analytical processing (MOLAP) system of the present invention comprising a Data Warehouse realized as a relational database, a stand-alone Aggregation Server of the present invention having an integrated aggregation engine and MDDB, and an OLAP server supporting a plurality of OLAP clients, wherein the stand-alone Aggregation Server performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and multi-dimensional data storage functions;
Fig. 6B is a schematic block diagram of the stand-alone Aggregation Server of the illustrative embodiment shown in Fig. 6A, showing its primary components, namely, a base data interface (e.g. OLDB, OLE-DB, ODBC, SQL, JDBC, API, etc.) for receiving RDBMS flat files lists and other files from the Data Warehouse (RDBMS), a base data loader for receiving base data from the base data interface, configuration manager for managing the operation of the base data interface and base data loader, an aggregation engine and MDDB handler for receiving base data from the base loader, performing aggregation operations on the base data, and storing the base data and aggregated data in the MDDB; an aggregation client interface (e.g. OLDB, OLE-DB, ODBC, SQL, JDBC, API, etc.) and input analyzer for receiving requests from OLAP client machines, cooperating with the aggregation engine and MDDB handler to generate aggregated data and/or retrieve aggregated data from the MDDB that pertains to the received requests, and returning this aggregated data back to the requesting OLAP clients; and a configuration manager for managing the operation of the input analyzer and the aggregation client interface.
Fig. 6C is a schematic representation of the software modules comprising the aggregation engine and MDDB handler of the stand-alone Aggregation Server of the illustrative embodiment of the present invention, showing a base data list structure being supplied to a hierarchy analysis and reorder module, the output thereof being transferred to an aggregation management module, the output thereof being transferred to a storage module via a storage management module, and a Query Directed Roll-up (QDR) aggregation management module being provided for receiving database (DB) requests from OLAP client machines (via the aggregation client interface) and managing the operation of the aggregation and storage management modules of the present invention;
Fig. 6D is a flow chart representation of the primary operations carried out by the (DB) request serving mechanism within the QDR aggregation management module shown in Fig. 6C;
Fig. 7A is a schematic representation of a separate-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment of Fig. 6B and a conventional OLAP server supporting a plurality of client machines, wherein base data from a Data Warehouse is shown being received by the aggregation server, realized on a first hardware/software platform (i.e. Platform A) and the stand-alone Aggregation Server is shown serving the conventional OLAP server, realized on a second hardware/software platform (i.e. Platform B), as well as serving data aggregation requirements of other clients supporting diverse applications such as spreadsheet, GUI front end, and applications;
Fig. 7B is a schematic representation of a shared-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment of Fig. 6B and a conventional OLAP server supporting a plurality of client machines, wherein base data from a Data Warehouse is shown being received by the stand-alone Aggregation Server, realized on a common hardware/software platform and the aggregation server is shown serving the conventional OLAP server, realized on the same common hardware/software platform, as well as serving data aggregation requirements of other clients supporting diverse applications such as spreadsheet, GUI front end, and applications;
Fig. 8A is a data table setting forth information representative of performance benchmarks obtained by the shared-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment serving the conventional OLAP server (i.e. Oracle EXPRESS Server) shown in Fig. 7B, wherein the common hardware/software platform is realized using a Pentium II 450 MHz, 1 GB RAM, 18 GB Disk, running the Microsoft NT operating system (OS);
Fig. 9A is a schematic representation of the first stage in the method of segmented aggregation according to the principles of the present invention, showing initial aggregation along the 1st dimension;
Fig. 9B is a schematic representation of the next stage in the method of segmented aggregation according to the principles of the present invention, showing that any segment along dimension 1, such as the slice shown, can be separately aggregated along the remaining dimensions, 2 and 3, and that in general, for an N-dimensional system, the second stage involves aggregation in N-1 dimensions. The principle of segmentation can be applied in the first stage as well; however, only a sufficiently large data set will justify such a sliced procedure in the first dimension. Indeed, each segment can be considered as an (N-1)-dimensional cube, enabling recursive computation.
Fig. 9C1 is a schematic representation of the Query Directed Roll-up (QDR) aggregation method/procedure of the present invention, showing data aggregation starting from existing basic data or previously aggregated data in the first dimension (D1), and such aggregated data being utilized as a basis for QDR aggregation along the second dimension (D2);
Fig. 9C2 is a schematic representation of the Query Directed Roll-up (QDR) aggregation method/procedure of the present invention, showing initial data aggregation starting from existing previously aggregated data in the third dimension (D3), and continuing along the third dimension (D3), and thereafter continuing aggregation along the second dimension (D2);
Fig. 10A is a schematic representation of the "slice-storage" method of storing sparse data in the disk storage devices of the MDDB of Fig. 6B in accordance with the principles of the present invention, based on an ascending-ordered index along aggregation direction, enabling fast retrieval of data;
Fig. 10B is a schematic representation of the data organization of data files and the directory file used in the storages of the MDDB of Fig. 6B, and the method of searching for a queried data point therein using a simple binary search technique, made possible by the ascending order of the data files;
Fig. 11A is a schematic representation of three exemplary multi-hierarchical data structures for storage of data within the MDDB of Fig. 6B, having three levels of hierarchy, wherein the first level, representative of base data, is composed of items A, B, F and G, the second level is composed of items C, E, H and I, and the third level is composed of a single item D, which is common to all three hierarchical structures;
Fig. 11B is a schematic representation of an optimized multi-hierarchical data structure merged from all three hierarchies of Fig. 11A, in accordance with the principles of the present invention;
Fig. 11C(i) through 11C(ix) represent a flow chart description (and accompanying data structures) of the operations of an exemplary hierarchy transformation mechanism of the present invention that optimally merges multiple hierarchies into a single hierarchy that is functionally equivalent to the multiple hierarchies.
Fig. 12 is a schematic representation showing the levels of operations performed by the stand-alone Aggregation Server of Fig. 6B, summarizing the different enabling components for carrying out the method of segmented aggregation in accordance with the principles of the present invention;
Fig. 13 is a schematic representation of the stand-alone Aggregation Server of the present invention shown as a component of a central data warehouse, serving the data aggregation needs of URL directory systems, Data Marts, RDBMSs, ROLAP systems and OLAP systems alike;
Fig. 14 is a schematic representation of a prior art information database system, wherein the present invention may be embodied;
Fig. 15 is a schematic representation of the prior art data warehouse and OLAP system, wherein the present invention may be embodied;
Figs. 16A-16C are schematic representations of exemplary tables employed in a prior art Relational Database Management System (RDBMS); Figs. 16B and 16C illustrate operators (queries) on the table of Fig. 16A, and the result of such queries, respectively;
Fig. 17A is a schematic representation of an exemplary dimensional schema (star schema) of a relational database;
Fig. 17B is a schematic representation of tables used to implement the schema shown in Fig. 17A;
Fig. 18A is a schematic representation of an exemplary multidimensional schema (star schema);
Fig. 18B is a schematic representation of tables used to implement the schema of Fig. 18A, including summary tables storing results of aggregation operations performed on the facts of the central fact table along the time-period dimension, in accordance with conventional teachings;
Fig. 19A is a schematic representation of an exemplary embodiment of a DBMS (for example, an RDBMS as shown) of the present invention comprising a relational datastore and an integrated multidimensional (MDD) aggregation module supporting queries from a plurality of clients, wherein the aggregation engine performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and non-relational multi-dimensional data storage functions.
Fig. 19B is a schematic block diagram of the MDD aggregation module of the illustrative embodiment of the present invention shown in Fig. 6A.
Fig. 19C(i) and 19C(ii), taken together, set forth a flow chart representation of the primary operations carried out within the DBMS of the present invention when performing data aggregation and related support operations, including the servicing of user-submitted (e.g. natural language) queries made on such aggregated database of the present invention.
Fig. 19D is a flow chart representation of the primary operations carried out by the (DB) request serving mechanism within the MDD control module shown in Fig. 6B.
Fig. 19E is a schematic representation of the view mechanism of a DBMS that enables users to query on the aggregated data generated and/or stored in the MDD Aggregation module according to the present invention.
Fig. 19F is a schematic representation of the trigger mechanism of the DBMS that enables users to query on the aggregated data generated and/or stored in the MDD Aggregation module according to the present invention.
Fig. 19G is a schematic representation of the DBMS of the present invention, illustrating a logical partitioning into a relational part and a non-relational part. The relational part includes the relational data store (e.g., table(s) and dictionary) and support mechanisms (e.g., query handling services). The non-relational part includes the MDD Aggregation Module. Data flows bidirectionally between the relational part and the non-relational part as shown.
Fig. 20A shows a separate-platform type implementation of the DBMS system of the illustrative embodiment shown in Fig. 19A, wherein the relational datastore and support mechanisms (e.g., query handling, fact table(s) and dictionary of the DBMS) reside on a separate hardware platform and/or OS system from that used to run the MDD Aggregation Module of the present invention.
Fig. 20B shows a common-platform type implementation of the DBMS system of the illustrative embodiment shown in Fig. 19A, wherein the relational datastore and support mechanisms (e.g., query handling, fact table(s) and dictionary of the DBMS) share the same hardware platform and operating system (OS) that is used to run the MDD Aggregation Module of the present invention.
Fig. 21 is a schematic representation of the DBMS of the present invention shown as a component of a central data warehouse, serving the data storage and aggregation needs of a ROLAP system (or other OLAP system).
Fig. 22 is a schematic representation of the DBMS of the present invention shown as a component of a central data warehouse, wherein the DBMS includes integrated OLAP Analysis Logic (and preferably an integrated Presentation Module) that operates cooperatively with the query handling of the DBMS system and the MDD Aggregation Module to enable users of the DBMS system to execute multidimensional reports (e.g., ratios, ranks, transforms, dynamic consolidation, complex filtering, forecasts, query governing, scheduling, flow control, pre-aggregate inferencing, denormalization support, and/or table partitioning and joins) and preferably perform traditional OLAP analyses (grids, graphs, maps, alerts, drill-down, data pivot, data surf, slice and dice, print).
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE PRESENT INVENTION
Referring now to Figs. 6A through 13, the preferred embodiments of the method and system of the present invention will now be described in great detail hereinbelow, wherein like elements in the Drawings shall be indicated by like reference numerals.
Throughout this invention disclosure, the terms "aggregation" and "preaggregation" shall be understood to mean the process of summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.
In general, the stand-alone aggregation server and methods of and apparatus for data aggregation of the present invention can be employed in a wide range of applications, including MOLAP systems, ROLAP systems, Internet URL-directory systems, personalized on-line e-commerce shopping systems, Internet-based systems requiring real-time control of packet routing and/or switching, and the like.
For purposes of illustration, initial focus will be accorded to improvements in MOLAP
systems, in which knowledge workers are enabled to intuitively, quickly, and flexibly manipulate operational data within an MDDB using familiar business terms in order to provide analytical insight into a business domain of interest.
Fig. 6A illustrates a generalized embodiment of a multidimensional on-line analytical processing (MOLAP) system of the present invention comprising: a Data Warehouse realized as a relational database; a stand-alone cartridge-style Aggregation Server of the present invention having an integrated aggregation engine and an MDDB; and an OLAP server communicating with the Aggregation Server, and supporting a plurality of OLAP clients. In accordance with the principles of the present invention, the stand-alone Aggregation Server performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and multi-dimensional data storage functions.
Departing from conventional practices, the principles of the present invention teach moving the aggregation engine and the MDDB into a separate Aggregation Server having standardized interfaces so that it can be plugged into the OLAP server of virtually any user or vendor. This dramatic move ends the restricting dependency of aggregation on the analytical functions of OLAP. By applying novel and independent algorithms, the stand-alone data aggregation server enables efficient organization and handling of data, fast aggregation processing, and fast access to and retrieval of any data element in the MDDB.
As will be described in greater detail hereinafter, the Aggregation Server of the present invention can serve the data aggregation requirements of other types of systems besides OLAP systems such as, for example, URL directory management systems, Data Marts, RDBMS, or ROLAP systems.
The Aggregation Server of the present invention excels in performing two distinct functions, namely: the aggregation of data in the MDDB; and the handling of the resulting database in the MDDB, for "on demand" client use. In the case of serving an OLAP server, the Aggregation Server of the present invention focuses on performing these two functions in a high performance manner (i.e. aggregating and storing base data, originated at the Data Warehouse, in a multidimensional storage (MDDB), and providing the results of this data aggregation process "on demand" to the clients, such as the OLAP server, spreadsheet applications, and end user applications). As such, the Aggregation Server of the present invention frees each conventional OLAP server with which it interfaces from the need to perform data aggregations, and therefore allows the conventional OLAP server to concentrate on the primary functions of OLAP servers, namely: data analysis and supporting a graphical interface with the user client.
Fig. 6B shows the primary components of the stand-alone Aggregation Server of the illustrative embodiment, namely: a base data interface (e.g. OLDB, OLE-DB, ODBC, SQL, JDBC, API, etc.) for receiving RDBMS flat files lists and other files from the Data Warehouse (RDBMS); a base data loader for receiving base data from the base data interface; a configuration manager for managing the operation of the base data interface and base data loader; an aggregation engine for receiving base data from the base data loader; a multi-dimensional database (MDDB); an MDDB handler; an input analyzer; an aggregation client interface (e.g. OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.); and a configuration manager for managing the operation of the input analyzer and the aggregation client interface.
During operation, the base data originates at the Data Warehouse or other sources, such as external ASCII files, a MOLAP server, or others. The Configuration Manager, in order to enable proper communication with all possible sources and data structures, configures two blocks, the Base Data Interface and the Base Data Loader. Their configuration is matched with different standards such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.
As shown in Fig. 6B, the core of the data Aggregation Server of the present invention comprises: a data Aggregation Engine; a Multidimensional Data Handler (MDDB Handler); and a Multidimensional Data Storage (MDDB). The results of data aggregation are efficiently stored in the MDDB by the MDDB Handler.
As shown in Figs. 6A and 6B, the stand-alone Aggregation Server of the present invention serves the OLAP Server (or other requesting computing system) via an aggregation client interface, which preferably conforms to standard interface protocols such as OLDB, OLE-DB, ODBC, SQL, JDBC, an API, etc. Aggregation results required by the OLAP server are supplied on demand. Typically, the OLAP Server decomposes the query, via a parsing process, into a series of requests. Each such request, specifying an n-dimensional coordinate, is presented to the Aggregation Server. The Configuration Manager sets the Aggregation Client Interface and Input Analyzer for a proper communication protocol according to the client user. The Input Analyzer converts the input format to make it suitable for the MDDB Handler.
Illustrative embodiments may make the transfer of data completely transparent to the OLAP user, in a manner which is equivalent to the storing of data in the MOLAP server's cache and without any query delays. This requires that the stand-alone Aggregation Server have exceptionally fast response characteristics. This is enabled by providing the unique data structure and aggregation mechanism of the present invention.
Fig. 6C shows the software modules comprising the aggregation engine and MDDB handler components of the stand-alone Aggregation Server of the illustrative embodiment.
The base data list, as it arrives from RDBMS or text files, has to be analyzed and reordered to optimize hierarchy handling, according to the unique method of the present invention, as described later with reference to Figs. 11A and 11B.
The function of the aggregation management module is to administer the aggregation process according to the method illustrated in Figs. 9A and 9B.
In accordance with the principles of the present invention, data aggregation within the stand-alone Aggregation Server can be carried out either as a complete pre-aggregation process, where the base data is fully aggregated before commencing querying, or as a query directed roll-up (QDR) process, where querying is allowed at any stage of aggregation using the "on-the-fly" data aggregation process of the present invention. The QDR process will be described hereinafter in greater detail with reference to Figs. 9C1 and 9C2. The response to a request (i.e. a basic component of a client query) is produced either by calling the Aggregation management module for "on-the-fly" data aggregation, or by accessing pre-aggregated result data via the Storage management module. The query/request serving mechanism of the present invention within the QDR aggregation management module is illustrated in the flow chart of Fig. 6D.
The function of the Storage management module is to handle multidimensional data in the storage(s) module in a very efficient way, according to the novel method of the present invention, which will be described in detail hereinafter with reference to Figs. 10A and 10B.
The request serving mechanism shown in Fig. 6D is controlled by the QDR aggregation management module. Requests are queued and served one by one. If the required data is already pre-calculated, then it is retrieved by the storage management module and returned to the client. Otherwise, the required data is calculated "on-the-fly" by the aggregation management module, and the result is returned to the client while simultaneously being stored by the storage management module, shown in Fig. 6C.
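The request serving flow described above can be sketched as follows. This is a minimal illustration only, not the implementation of Fig. 6D: the class name `RequestServer`, its methods, and the use of an in-memory dictionary in place of the storage management module are all assumptions made for the example.

```python
from collections import deque


class RequestServer:
    """Hypothetical stand-in for the request serving mechanism of the
    QDR aggregation management module (names are illustrative)."""

    def __init__(self, aggregate_on_the_fly):
        self.queue = deque()   # pending requests, served one by one
        self.store = {}        # stand-in for the storage management module
        self.aggregate_on_the_fly = aggregate_on_the_fly

    def submit(self, coordinate):
        """Queue a request that specifies an n-dimensional coordinate."""
        self.queue.append(coordinate)

    def serve_next(self):
        """Serve the oldest queued request and report how it was answered."""
        coordinate = self.queue.popleft()
        if coordinate in self.store:
            # Already pre-calculated: retrieve from storage.
            return self.store[coordinate], "stored"
        # Otherwise calculate on the fly, and store the result as it is returned.
        value = self.aggregate_on_the_fly(coordinate)
        self.store[coordinate] = value
        return value, "computed"


# Usage: the second identical request is answered from storage.
server = RequestServer(lambda coordinate: sum(coordinate))
server.submit((1, 2))
server.submit((1, 2))
first = server.serve_next()
second = server.serve_next()
```

The aggregation callback is deliberately left abstract here; any roll-up routine that maps a coordinate to a value could be supplied.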
Figs. 7A and 7B outline two different implementations of the stand-alone (cartridge-style) Aggregation Server of the present invention. In both implementations, the Aggregation Server supplies aggregated results to a client.
Fig. 7A shows a separate-platform type implementation of the MOLAP system of the illustrative embodiment shown in Fig. 6A, wherein the Aggregation Server of the present invention resides on a separate hardware platform and operating system (OS) from that used to run the OLAP server. In this type of implementation, it is even possible to run the Aggregation Server and the OLAP Server on different-type operating systems (e.g. NT, Unix, MAC OS).
Fig. 7B shows a common-platform type implementation of the MOLAP system of the illustrative embodiment shown in Fig. 6B, wherein the Aggregation Server of the present invention and OLAP Server share the same hardware platform and operating system (OS).
Fig. 8A shows a table setting forth the benchmark results of an aggregation engine, implemented on a shared/common hardware platform and OS, in accordance with the principles of the present invention. The common platform and OS is realized using a Pentium II 450 MHz, 1GB RAM, 18GB Disk, running the Microsoft NT operating system. The six (6) data sets shown in the table differ in number of dimensions, number of hierarchies, measure of sparsity and data size. A comparison with ORACLE Express, a major OLAP server, is made.
It is evident that the aggregation engine of the present invention outperforms currently leading aggregation technology by more than an order of magnitude.
The segmented data aggregation method of the present invention is described in Figs. 9A through 9C2. These figures outline a simplified setting of three dimensions only; however, the following analysis applies to any number of dimensions as well.
The data is divided into autonomous segments to minimize the amount of simultaneously handled data. The initial aggregation is practiced on a single dimension only, while later on the aggregation process involves all other dimensions.
At the first stage of the aggregation method, an aggregation is performed along dimension 1. The first stage can be performed on more than one dimension. As shown in Fig. 9A, the space of the base data is expanded by the aggregation process.
In the next stage shown in Fig. 9B, any segment along dimension 1, such as the shown slice, can be separately aggregated along the remaining dimensions, 2 and 3.
In general, for an N dimensional system, the second stage involves aggregation in N-1 dimensions.
The principle of data segmentation can be applied on the first stage as well. However, only a large enough data set will justify such a sliced procedure in the first dimension.
Actually, it is possible to consider each segment as an N-1 cube, enabling recursive computation.
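The two stages described above, and the recursive treatment of each segment as an N-1 dimensional cube, can be sketched as follows. This is a simplified illustration under stated assumptions, not the aggregation engine itself: the cube is assumed to be a dictionary from coordinate tuples to numbers, each dimension's hierarchy is assumed to be a child-to-parent mapping, aggregation is assumed to be summation, and the names `roll_up` and `segmented_aggregate` are hypothetical.

```python
from collections import defaultdict


def roll_up(cube, dim, parent_of):
    """Add summed cells for every ancestor along one dimension.
    Assumes the input cells are not yet aggregated along `dim`."""
    out = defaultdict(float, cube)
    for coord, value in cube.items():
        member = parent_of.get(coord[dim])
        while member is not None:
            out[coord[:dim] + (member,) + coord[dim + 1:]] += value
            member = parent_of.get(member)
    return dict(out)


def segmented_aggregate(cube, parents):
    """Aggregate along the first dimension, then treat each resulting
    slice as an independent (N-1)-dimensional cube and recurse."""
    if not parents:
        return dict(cube)
    stage1 = roll_up(cube, 0, parents[0])      # first stage: dimension 1
    slices = defaultdict(dict)
    for coord, value in stage1.items():        # segment along dimension 1
        slices[coord[0]][coord[1:]] = value
    result = {}
    for member, sub_cube in slices.items():    # slices are independent
        for coord, value in segmented_aggregate(sub_cube, parents[1:]).items():
            result[(member,) + coord] = value
    return result


# Usage with two dimensions and one aggregate member per dimension.
base = {("A", "x"): 1, ("B", "x"): 2, ("A", "y"): 3}
hierarchies = [{"A": "All1", "B": "All1"}, {"x": "All2", "y": "All2"}]
full = segmented_aggregate(base, hierarchies)
```

Because each slice is processed without reference to any other slice, the loop over `slices` could be run in any order, which is the property the later discussion of query directed roll-up relies on.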
It is imperative to get aggregation results of a specific slice before the entire aggregation is completed, or alternatively, to have the roll-up done in a particular sequence. A novel feature of the aggregation method of the present invention is that it allows querying to begin even before the regular aggregation process is completed, while still providing fast response. Moreover, in relational OLAP and other systems requiring only partial aggregations, the QDR process dramatically speeds up the query response.
The QDR process is made feasible by the slice-oriented roll-up method of the present invention. After aggregating the first dimension(s), the multidimensional space is composed of independent multidimensional cubes (slices). These cubes can be processed in any arbitrary sequence.
Consequently, the aggregation process of the present invention can be monitored by means of files, shared memory, sockets, or queues to statically or dynamically set the roll-up order.
In order to satisfy a single query coming from a client, before the required aggregation result has been prepared, the QDR process of the present invention involves performing a fast on-the-fly aggregation (roll-up) involving only a thin slice of the multidimensional data.
Fig. 9C1 shows a slice required for building up a roll-up result of the 2nd dimension. In case 1, as shown, the aggregation starts from existing data, either basic or previously aggregated in the first dimension. This data is utilized as a basis for QDR aggregation along the second dimension. In case 2, due to lack of previous data, a QDR involves an initial slice aggregation along dimension 3, and thereafter aggregation along the 2nd dimension.
Fig. 9C2 shows two corresponding QDR cases for gaining results in the 3rd dimension. Cases 1 and 2 differ in the amount of initial aggregation required in the 2nd dimension.
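The query directed roll-up can be illustrated with a small sketch. It is an assumption-laden simplification rather than the method of the figures: the store is a dict keyed by coordinate tuples, aggregation is summation, a missing base cell (NA) contributes zero, and `children` is a hypothetical per-dimension map from an aggregated member to its direct children. When a previously aggregated cell is already in the store it is reused (as in case 1); otherwise the missing cell is first built from the next dimension (as in case 2).

```python
def qdr_lookup(coords, store, children):
    """Return the value at coords, rolling up only the thin slice it needs."""
    if coords in store:
        return store[coords]  # base data or a previously aggregated result
    for dim, member in enumerate(coords):
        kids = children[dim].get(member)
        if kids:
            # Expand the first aggregated coordinate; recursion handles the rest.
            total = 0
            for kid in kids:
                child_coords = coords[:dim] + (kid,) + coords[dim + 1:]
                total += qdr_lookup(child_coords, store, children)
            store[coords] = total  # keep the result for later queries
            return total
    return 0  # a base-level cell that is absent is treated as NA (zero here)
```

Only the cells on the path to the requested coordinate are computed and stored; the rest of the cube is left for the background aggregation.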
Fig. 10A illustrates the "Slice-Storage" method of storing sparse data on storage disks.
In general, this data storage method is based on the principle that an ascending-ordered index along the aggregation direction enables fast retrieval of data. Fig. 10A illustrates a unit-wide slice of the multidimensional cube of data. Since the data is sparse, only a few non-NA data points exist. These points are indexed as follows. The Data File consists of data records, in which each n-1 dimensional slice is stored in a separate record. These records have a varying length, according to the number of non-NA stored points. For each registered point in the record, INDk stands for an index in an n-dimensional cube, and Data stands for the value of a given point in the cube.
Fig. 10B illustrates a novel method for randomly searching for a queried data point in the MDDB of Fig. 6B by using a novel technique of organizing data files and the directory file used in the storages of the MDDB, so that a simple binary search technique can then be employed within the Aggregation Server of the present invention. According to this method, a metafile termed DIR File keeps pointers to Data Files as well as additional parameters such as the start and end addresses of the data record (IND0, INDn), its location within the Data File, record size (n), the file's physical address on disk (D_Path), and auxiliary information on the record (Flags).
A search for a queried data point is then performed by an access to the DIR file. The search along the file can be made using a simple binary search due to the file's ascending order.
When the record is found, it is then loaded into main memory to search for the required point, characterized by its index INDk. The attached Data field represents the queried value. In case the exact index is not found, it means that the point is an NA.
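The directory lookup just described can be sketched as a two-level binary search. The layout below is an assumption made for illustration: the DIR file is modelled as a list of dicts sorted by start index, each record as an ascending list of (index, data) pairs, and `load_record` is a hypothetical callback standing in for the disk read. The field names `ind0`, `indn`, and `record_id` are invented here.

```python
from bisect import bisect_left, bisect_right

def find_point(dir_entries, load_record, ind_k):
    """Return the value stored at index ind_k, or None if the point is NA."""
    # Binary search over the directory, which is sorted by ascending start index.
    starts = [entry["ind0"] for entry in dir_entries]
    pos = bisect_right(starts, ind_k) - 1
    if pos < 0 or ind_k > dir_entries[pos]["indn"]:
        return None  # no record covers this index
    # Load only the matching record, then binary-search inside it.
    record = load_record(dir_entries[pos])  # list of (index, data), ascending
    inds = [ind for ind, _ in record]
    i = bisect_left(inds, ind_k)
    if i < len(inds) and inds[i] == ind_k:
        return record[i][1]
    return None  # the exact index is absent, so the point is NA

# Hypothetical in-memory stand-ins for the Data File and the DIR file.
DATA_FILE = {
    0: [(3, 1.5), (7, 2.0)],
    1: [(12, 4.0), (15, 8.5), (19, 0.5)],
}
DIR_FILE = [
    {"ind0": 3, "indn": 7, "record_id": 0},
    {"ind0": 12, "indn": 19, "record_id": 1},
]

def load_record(entry):
    return DATA_FILE[entry["record_id"]]
```

In a real store the list of start indices would be kept alongside the directory rather than rebuilt on every call; it is rebuilt here only to keep the sketch short.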
In another aspect of the present invention, a novel method is provided for optimally merging multiple hierarchies in multi-hierarchical structures. The method, illustrated in Figs. 11A, 11B, and 11C, is preferably used by the Aggregation Server of the present invention in processing the table data (base data), as it arrives from RDBMS.
According to the devised method, the inner order of hierarchies within a dimension is optimized, to achieve efficient data handling for summations and other mathematical formulas (termed in general "Aggregation"). The order of hierarchy is defined externally. It is brought from a data source to the stand-alone aggregation engine, as a descriptor of data, before the data itself. In the illustrative embodiment, the method assumes hierarchical relations of the data, as shown in Fig. 11A. The way data items are ordered in the memory space of the Aggregation Server, with regard to the hierarchy, has a significant impact on its data handling efficiency.
Notably, when using prior art techniques, multiple handling of data elements, which occurs when a data element is accessed more than once during the aggregation process, has been hitherto unavoidable when the main concern is to effectively handle the sparse data. The data structures used in prior art data handling methods have been designed for fast access to non-NA data. According to prior art techniques, each access is associated with a time-consuming search and retrieval in the data structure. For the massive amount of data typically accessed from a Data Warehouse in an OLAP application, such multiple handling of data elements has significantly degraded the efficiency of prior art data aggregation processes. When using prior art data handling techniques, the data element D shown in Fig. 11A must be accessed three times, causing poor aggregation performance.
In accordance with the data handling method of the present invention, the data is being pre-ordered for a singular handling, as opposed to multiple handling taught by prior art methods. According to the present invention, elements of base data and their aggregated results are contiguously stored in a way that each element will be accessed only once. This particular order allows a forward-only handling, never backward. Once a base data element is stored, or aggregated result is generated and stored, it is never to be retrieved again for further aggregation. As a result the storage access is minimized. This way of singular handling greatly elevates the aggregation efficiency of large data bases. An efficient handling method as used in the present invention, is shown in Fig. 7A. The data element D, as any other element, is accessed and handled only once.
Fig. 11A shows an example of a multi-hierarchical database structure having 3 hierarchies. As shown, the base data has a dimension that includes items A, B, F, and G. The second level is composed of items C, E, H, and I. The third level has a single item D, which is common to all three hierarchical structures. In accordance with the method of the present invention, a minimal computing path is always taken. For example, according to the method of the present invention, item D will be calculated as part of structure 1, requiring two mathematical operations only, rather than as in structure 3, which would need four mathematical operations. Fig. 11B depicts an optimized structure merged from all three hierarchies.
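The idea of always taking the minimal computing path can be illustrated as follows. The hierarchy below is hypothetical and does not reproduce Fig. 11A; it only assumes that an item may be defined by several equivalent child groups, that aggregation is summation, and that the cheapest definition is the one with the fewest operands. The function names are invented for this sketch.

```python
def cheapest_plan(child_groups):
    """For every aggregated item, keep the child group with the fewest operands."""
    return {item: min(groups, key=len) for item, groups in child_groups.items()}

def evaluate(item, base, plan, cache):
    """Compute an item from its chosen children, computing each item only once."""
    if item in base:
        return base[item]
    if item not in cache:
        cache[item] = sum(evaluate(child, base, plan, cache) for child in plan[item])
    return cache[item]

# Hypothetical example: D can be built from two intermediate results
# or directly from four base items.
child_groups = {
    "C": [["A", "B"]],
    "E": [["F", "G"]],
    "D": [["C", "E"], ["A", "B", "F", "G"]],
}
plan = cheapest_plan(child_groups)
```

Here `plan["D"]` selects the two-operand definition, so D reuses the already computed values of C and E instead of revisiting the four base items.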
Fig. 11C(i) through 11C(ix) represent a flow chart description (and accompanying data structures) of the operations of an exemplary hierarchy transformation mechanism of the present invention that optimally merges multiple hierarchies into a single hierarchy that is functionally equivalent to the multiple hierarchies. For the sake of description, the data structures correspond to exemplary hierarchical structures described above with respect to Figs. 11(A) and 11(B). As illustrated in Fig. 11C(i), in step 1101, a catalogue is loaded from the DBMS system. As is conventional, the catalogue includes data ("hierarchy descriptor data") describing multiple hierarchies for at least one dimension of the data stored in the DBMS. In step 1103, this hierarchy descriptor data is extracted from the catalogue. A loop (steps 1105-1119) is performed over the items in the multiple hierarchy described by the hierarchy descriptor data.
In the loop 1105-1119, a given item in the multiple hierarchy is selected (step 1107); and, in step 1109, the parent(s) (if any) - including grandparents, great-grandparents, etc. - of the given item are identified and added to an entry (for the given item) in a parent list data structure, which is illustrated in Fig. 11C(v). Each entry in the parent list corresponds to a specific item and includes zero or more identifiers for items that are parents (or grandparents, or great-grandparents) of the specific item. In addition, an inner loop (steps 1111-1117) is performed over the hierarchies of the multiple hierarchies described by the hierarchy descriptor data, wherein in step 1113 one of the multiple hierarchies is selected. In step 1115, the child of the given item in the selected hierarchy (if any) is identified and added (if need be) to a group of identifiers in an entry (for the given item) in a child list data structure, which is illustrated in Fig. 11C(vi). Each entry in the child list corresponds to a specific item and includes zero or more groups of identifiers each identifying a child of the specific item. Each group corresponds to one or more of the hierarchies described by the hierarchy descriptor data.
The operation then continues to steps 1121 and 1123 as illustrated in Fig. 11C(ii) to verify the integrity of the multiple hierarchies described by the hierarchy descriptor data (step 1121) and fix (or report to the user) any errors discovered therein (step 1123). Preferably, the integrity of the multiple hierarchies is verified in step 1121 by iteratively expanding each group of identifiers in the child list to include the children, grandchildren, etc. of any item listed in the group. If the child(ren) for each group for a specific item do not match, a verification error is encountered, and such error is fixed (or reported to the user) in step 1123. The operation then proceeds to a loop (steps 1125 - 1133) over the items in the child list.
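The integrity check of step 1121 can be sketched as follows. This is one possible reading, not the patent's own code: each group in the child list is expanded down to its base-level items, and an item is flagged when its groups expand to different sets. The child list is modelled as a dict of lists of groups, the hierarchies are assumed to be acyclic, and the names are hypothetical.

```python
def expand_to_leaves(group, child_groups):
    """Expand a group of identifiers down to the base-level items it covers."""
    leaves = set()
    stack = list(group)
    while stack:
        item = stack.pop()
        groups = child_groups.get(item, [])
        if groups:
            # Any one group is enough here; mismatches between the groups of
            # this item are reported when the item itself is verified.
            stack.extend(groups[0])
        else:
            leaves.add(item)
    return leaves

def verify_hierarchies(child_groups):
    """Return the items whose child groups do not cover the same base items."""
    errors = []
    for item, groups in child_groups.items():
        expanded = {frozenset(expand_to_leaves(g, child_groups)) for g in groups}
        if len(expanded) > 1:
            errors.append(item)
    return errors
```

An empty result means every hierarchy defines each item over the same base data, so the hierarchies can be merged without changing any aggregated value.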
In the loop (steps 1125 - 1133), a given item in the child list is identified in step 1127.
In step 1129, the entry in the child list for the given item is examined to determine if the given item has no children (e.g., the corresponding entry is null). If so, the operation continues to step 1131 to add an entry for the item in level 0 of an ordered list data structure, which is illustrated in Fig. 11C(vii); otherwise the operation continues to process the next item of the child list in the loop. Each entry in a given level of the ordered list corresponds to a specific item and includes zero or more identifiers each identifying a child of the specific item. The levels of the ordered list describe the transformed hierarchy, as will readily become apparent in light of the following. Essentially, loop 1125-1133 builds the lowest level (level 0) of the transformed hierarchy.
After loop 1125-1133, operation continues to process the lowest level to derive the next higher level, and iterate over this process to build out the entire transformed hierarchy. More specifically, in step 1135, a "current level" variable is set to identify the lowest level. In step 1137, the items of the "current level" of the ordered list are copied to a work list. In step 1139, it is determined if the work list is empty. If so, the operation ends; otherwise operation continues to step 1141 wherein a loop (steps 1141 - 1159) is performed over the items in the work list.
In step 1143, a given item in the work list is identified and operation continues to an inner loop (steps 1145 - 1155) over the parent(s) of the given item (which are specified in the parent list entry for the given item). In step 1147 of the inner loop, a given parent of the given item is identified. In step 1149, it is determined whether any other parent (e.g., a parent other than the given parent) of the given item is a child of the given parent (as specified in the child list entry for the given parent). If so, operation continues to step 1155 to process the next parent of the given item in the inner loop; otherwise, operation continues to steps 1151 and 1153. In step 1151, an entry for the given parent is added to the next level (current level + 1) of the ordered list, if it does not exist there already. In step 1153, if no child of the given item (as specified in the entry for the given item in the current level of the ordered list) matches (e.g., is covered by) any child (or grandchild or great-grandchild, etc.) of item(s) in the entry for the given parent in the next level of the ordered list, the given item is added to the entry for the given parent in the next level of the ordered list. Levels 1 and 2 of the ordered list for the example described above are shown in Figs. 11C(viii) and 11C(ix), respectively. The children (including grandchildren and great-grandchildren, etc.) of an item in the entry for a given parent in the next level of the ordered list may be identified by the information encoded in the lower levels of the ordered list. After step 1153, operation continues to step 1155 to process the next parent of the given item in the inner loop (steps 1145 - 1155).
After processing the inner loop (steps 1145 - 1155), operation continues to step 1157 to delete the given item from the work list, and processing continues to step 1159 to process the next item of the work list in the loop (steps 1141 - 1159).
After processing the loop (steps 1141 - 1159), the ordered list (e.g., transformed hierarchy) has been built for the next higher level. The operation continues to step 1161 to increment the current level to the next higher level, and operation returns (in step 1163) to step 1138 to build the next higher level, until the highest level is reached (determined in step 1139) and the operation ends.
Fig. 12 summarizes the components of an exemplary aggregation module that takes advantage of the hierarchy transformation technique described above. More specifically, the aggregation module includes a hierarchy transformation module that optimally merges multiple hierarchies into a single hierarchy that is functionally equivalent to the multiple hierarchies. A second module loads and indexes the base data supplied from the DBMS using the optimal hierarchy generated by the hierarchy transformation module.
An aggregation engine performs aggregation operations on the base data. During the aggregation operations along the dimension specified by the optimal hierarchy, the results of the aggregation operations of the level 0 items may be used in the aggregation operations of the level 1 items, the results of the aggregation operations of the level 1 items may be used in the aggregation operations of the level 2 items, etc. Based on these operations, the loading and indexing operations of the base data, along with the aggregation, become very efficient, minimizing memory and storage access, and speeding up storing and retrieval operations.
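The level-by-level reuse of results can be illustrated with a short sketch. It assumes a merged hierarchy is already available as a list of levels, each a dict from an item to the children it is built from, and that aggregation is summation; the data layout, the read counter, and the function name are assumptions made for this example.

```python
from collections import Counter

def aggregate_by_level(base_values, levels):
    """Aggregate forward-only: each level reads only results of lower levels."""
    values = dict(base_values)  # level 0: the base data
    reads = Counter()           # how often each element is fetched
    for level in levels:        # level 1, level 2, ...
        for item, children in level.items():
            total = 0
            for child in children:
                total += values[child]
                reads[child] += 1
            values[item] = total
    return values, reads
```

With a merged hierarchy in which every item appears in exactly one parent entry, `reads` shows each element being fetched once, which corresponds to the singular, forward-only handling described above.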
Fig. 13 shows the stand-alone Aggregation Server of the present invention as a component of a central data warehouse, serving the data aggregation needs of URL directory systems, Data Marts, RDBMSs, ROLAP systems and OLAP systems alike.
The reason for the central multidimensional database's rise to corporate necessity is that it facilitates flexible, high-performance access and analysis of large volumes of complex and interrelated data.
A stand-alone specialized aggregation server, simultaneously serving many different kinds of clients (e.g. data mart, OLAP, URL, RDBMS), has the power of delivering an enterprise-wide aggregation in a cost-effective way. This kind of server eliminates the roll-up redundancy over the group of clients, delivering scalability and flexibility.
Performance associated with central data warehouse is an important consideration in the overall approach. Performance includes aggregation times and query response.
Effective interactive query applications require near real-time performance, measured in seconds. These application performances translate directly into the aggregation requirements.
In the prior art, in the case of MOLAP, a full pre-aggregation must be done before querying can start. In the present invention, in contrast to the prior art, the query directed roll-up (QDR) allows instant querying, while the full pre-aggregation is done in the background. In cases where a full pre-aggregation is preferred, the currently invented aggregation outperforms any prior art.
For the ROLAP and RDBMS clients, partial aggregations maximize query performance. In both cases fast aggregation process is imperative. The aggregation performance of the current invention is by orders of magnitude higher than that of the prior art.
The stand-alone scalable aggregation server of the present invention can be used in any MOLAP system environment for answering questions about corporate performance in a particular market, economic trends, consumer behaviors, weather conditions, population trends, or the state of any physical, social, biological or other system or phenomenon on which different types or categories of information, organizable in accordance with a predetermined dimensional hierarchy, are collected and stored within a RDBMS of one sort or another.
Regardless of the particular application selected, the address data mapping processes of the present invention will provide a quick and efficient way of managing an MDDB and will also enable decision support capabilities utilizing the same in diverse application environments.
The stand-alone "cartridge-style" plug-in features of the data aggregation server of the present invention provide freedom in designing an optimized multidimensional data structure and handling method for aggregation, provide freedom in designing a generic aggregation server matching all OLAP vendors, and enable enterprise-wide centralized aggregation.
The method of Segmented Aggregation employed in the aggregation server of the present invention provides flexibility, scalability, a condition for Query Directed Aggregation, and speed improvement.
The method of Multidimensional data organization and indexing employed in the aggregation server of the present invention provides fast storage and retrieval, a condition for Segmented Aggregation, improves the storing, handling, and retrieval of data in a fast manner, and contributes to structural flexibility to allow sliced aggregation and QDR.
It also enables the forwarding and single handling of data with improvements in speed performance.
The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention minimizes the data handling operations in multi-hierarchy data structures.
The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention eliminates the need to wait for full aggregation to be completed, and provides build-up aggregated data required for full aggregation.
In another aspect of the present invention, an improved DBMS system (e.g., RDBMS system, object oriented database system or object/relational database system) is provided that excels in performing two distinct functions, namely: the aggregation of data; and the handling of the resulting data for "on demand" client use. Moreover, because of improved data aggregation capabilities, the DBMS of the present invention can be employed in a wide range of applications, including Data Warehouses supporting OLAP systems and the like. For purposes of illustration, initial focus will be accorded to the DBMS of the present invention. Referring now to Figs. 19 through 21, the preferred embodiments of the method and system of the present invention will now be described in great detail herein below.
Throughout this document, the terms "aggregation" and "pre-aggregation" shall be understood to mean the process of summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc. It shall be understood that pre-aggregation operations occur asynchronously with respect to the traditional query processing operations. Moreover, the term "atomic data" shall be understood to refer to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, atomic data may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch.
FIG. 19A illustrates the primary components of an illustrative embodiment of a DBMS of the present invention, namely: support mechanisms including a query interface and query handler; a relational data store including one or more tables storing at least the atomic data (and possibly summary tables) and a meta-data store for storing a dictionary (sometimes referred to as a catalogue or directory); and an MDD Aggregation Module that stores atomic data and aggregated data in an MDDB. The MDDB is a non-relational data structure; it uses other data structures, either instead of or in addition to tables, to store data.
For illustrative purposes, Fig. 19A illustrates an RDBMS wherein the relational data store includes fact tables and a dictionary.
It should be noted that the DBMS typically includes additional components (not shown) that are not relevant to the present invention. The query interface and query handler service user-submitted queries (in the preferred embodiment, SQL query statements) forwarded, for example, from a client machine over a network as shown. The query handler and relational data store (tables and meta-data store) are operably coupled to the MDD Aggregation Module. Importantly, the query handler and integrated MDD Aggregation Module operate to provide for dramatically improved query response times for data aggregation operations and drill-downs. Moreover, illustrative embodiments may make user-querying of the non-relational MDDB no different than querying a relational table of the DBMS, in a manner that minimizes the delays associated with queries that involve aggregation or drill down operations. This is enabled by providing the novel DBMS system and integrated aggregation mechanism of the present invention.
FIG. 19B shows the primary components of an illustrative embodiment of the MDD Aggregation Module of FIG. 19A, namely: a base data loader for loading the directory and table(s) of the relational data store of the DBMS; an aggregation engine for receiving dimension data and atomic data from the base data loader; a multi-dimensional database (MDDB); an MDDB handler and an SQL handler that operate cooperatively with the query handler of the DBMS to provide users with query access to the MDD Aggregation Module; and a control module for managing the operation of the components of the MDD Aggregation Module. The base data loader may load the directory and table(s) of the relational data store over a standard interface (such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.). In this case, the DBMS and base data loader include components that provide communication of such data over these standard interfaces. Such interface components are well known in the art. For example, such interface components are readily available from Attunity Corporation, http://www.attunity.com.
During operation, base data originates from the table(s) of the DBMS. The core data aggregation operations are performed by the Aggregation Engine; a Multidimensional Data (MDDB) Handler; and a Multidimensional Data Storage (MDDB). The results of data aggregation are efficiently stored in the MDDB by the MDDB Handler. The SQL handler of the MDD Aggregation module services user-submitted queries (in the preferred embodiment, SQL query statements) forwarded from the query handler of the DBMS. The SQL handler of the MDD Aggregation module may communicate with the query handler of the DBMS over a standard interface (such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.). In this case, the support mechanisms of the RDBMS and SQL handler include components that provide communication of such data over these standard interfaces. Such interface components are well known in the art. Aggregation (or drill-down) results are retrieved on demand and returned to the user.
Typically, a user interacts with a client machine (for example, using a web-enabled browser) to generate a natural language query that is communicated to the query interface of the DBMS, for example over a network as shown. The query interface breaks the query down, via parsing, into a series of requests (in the preferred embodiment, SQL statements) that are communicated to the query handler of the DBMS. It should be noted that the functions of the query interface may be implemented in a module that is not part of the DBMS (for example, in the client machine). The query handler of the DBMS forwards requests that involve data stored in the MDDB of the MDD Aggregation module to the SQL handler of the MDD Aggregation module for servicing. Each request specifies a set of n-dimensions. The SQL handler of the MDD Aggregation Module extracts this set of dimensions and operates cooperatively with the MDD handler to address the MDDB using the set of dimensions, retrieve the addressed data from the MDDB, and return the results to the user via the query handler of the DBMS.
Figs. 19C(i) and 19C(ii) together form a flow chart illustrating the operations of an illustrative DBMS of the present invention. In step 601, the base data loader of the MDD Aggregation Module loads the dictionary (or catalog) from the meta-data store of the DBMS. In performing this function, the base data loader may utilize an adapter (interface) that maps the data types of the dictionary of the DBMS (or that maps a standard data type used to represent the dictionary of the DBMS) into the data types used in the MDD aggregation module. In addition, the base data loader extracts the dimensions from the dictionary and forwards the dimensions to the aggregation engine of the MDD Aggregation Module.
In step 603, the base data loader loads table(s) from the DBMS. In performing this function, the base data loader may utilize an adapter (interface) that maps the data types of the table(s) of the DBMS (or that maps a standard data type used to represent the fact table(s) of the DBMS) into the data types used in the MDD Aggregation Module. In addition, the base data loader extracts the atomic data from the table(s), and forwards the atomic data to the aggregation engine.
In step 605, the aggregation engine performs aggregation operations (i.e., roll-up operation) on the atomic data (provided by the base data loader in step 603) along at least one of the dimensions (extracted from the dictionary of the DBMS in step 601) and operates cooperatively with the MDD handler to store the resultant aggregated data in the MDDB. A more detailed description of exemplary aggregation operations according to a preferred embodiment of the present invention is set forth below with respect to the QDR process of Figs. 9A-9C.
In step 607, a reference is defined that provides users with the ability to query the data generated by the MDD Aggregation Module and/or stored in the MDDB of the MDD Aggregation Module. This reference is preferably defined using the Create View SQL statement, which allows the user to: i) define a table name (TN) associated with the MDDB stored in the MDD Aggregation Module, and ii) define a link used to route SQL statements on the table TN to the MDD Aggregation Module. In this embodiment, the view mechanism of the DBMS enables reference and linking to the data stored in the MDDB of the MDD Aggregation Engine as illustrated in FIG. 6(E). A more detailed description of the view mechanism and the Create View SQL statement may be found in C. J. Date, "An Introduction to Database Systems," Addison-Wesley, Seventh Edition, 2000, pp. 289-326.
Thus, the view mechanism enables the query handler of the DBMS system to forward any SQL query on table TN to the MDD aggregation module via the associated link.
In an alternative embodiment, a direct mechanism (e.g., NA trigger mechanism) may be used to enable the DBMS system to reference and link to the data generated by the MDD Aggregation Module and/or stored in the MDDB of the MDD Aggregation Engine as illustrated in FIG. 6F. A more detailed description of trigger mechanisms and methods may be found in C. J. Date, "An Introduction to Database Systems," Addison-Wesley, Seventh Edition, 2000, pp. 250, 266.
In step 609, a user interacts with a client machine to generate a query, and the query is communicated to the query interface. The query interface generates one or more SQL statements. These SQL statements may refer to data stored in tables of the relational datastore, or may refer to the reference defined in step 607 (this reference refers to the data stored in the MDDB of the MDD Aggregation Module). These SQL statement(s) are forwarded to the query handler of the DBMS.
In step 611, the query handler receives the SQL statement(s); and optionally transforms such SQL statement(s) to optimize the SQL statement(s) for more efficient query handling. Such transformations are well known in the art. For example, see Kimball, "Aggregation Navigation With (Almost) No MetaData", DBMS Data Warehouse Supplement, August 1996, available at http://www.dbmsmag.com/9608d54.html.
In step 613, the query handler determines whether the received SQL statement(s) [or transformed SQL statement(s)] is on the reference generated in step 607. If so, operation continues to step 615; otherwise normal query handling operations continue in step 625 wherein the relational datastore is accessed to extract, store, and/or manipulate the data stored therein as directed by the query, and results are returned back to the user via the client machine, if needed.
In step 615, the received SQL statement(s) [or transformed SQL statement(s)] is routed to the MDD aggregation engine for processing in step 617 using the link for the reference as described above with respect to step 607.
In step 617, the SQL statement(s) is received by the SQL handler of the MDD Aggregation Module, wherein a set of one or more N-dimensional coordinates is extracted from the SQL statement. In performing this function, the SQL handler may utilize an adapter (interface) that maps the data types of the SQL statement issued by the query handler of the DBMS (or that maps a standard data type used to represent the SQL statement issued by the query handler of the DBMS) into the data types used in the MDD aggregation module.
In step 619, the set of N-dimensional coordinates extracted in step 617 are used by the MDD handler to address the MDDB and retrieve the corresponding data from the MDDB.
Finally, in step 621, the retrieved data is returned to the user via the DBMS (for example, by forwarding the retrieved data to the SQL handler, which returns the retrieved data to the query handler of the DBMS system, which returns the results of the user-submitted query to the user via the client machine), and the operation ends.
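The routing decision of steps 613 through 621 can be sketched as follows. The sketch is deliberately naive and rests on several assumptions: the reference table name `sales_cube`, the dimension names, and the restriction to simple equality predicates are all hypothetical, the MDDB is modelled as a dict keyed by coordinate tuples, and regular expressions stand in for a real SQL parser.

```python
import re

VIEW_NAME = "sales_cube"               # hypothetical table name TN from step 607
DIMENSIONS = ("store", "day", "item")  # hypothetical dimension columns

def extract_coordinates(sql):
    """Step 617: pull one coordinate per dimension out of the WHERE clause."""
    pairs = dict(re.findall(r"(\w+)\s*=\s*'([^']*)'", sql))
    return tuple(pairs[dim] for dim in DIMENSIONS)

def handle_query(sql, mddb, run_relational):
    """Steps 613-621: route statements on the reference to the MDDB."""
    match = re.search(r"\bFROM\s+(\w+)", sql, re.IGNORECASE)
    if match and match.group(1).lower() == VIEW_NAME:
        coords = extract_coordinates(sql)  # step 617
        return mddb.get(coords)            # step 619: address the MDDB
    return run_relational(sql)             # step 625: normal query handling
```

A statement that omits one of the assumed dimensions raises `KeyError` in this sketch; a fuller handler would map the missing dimension to its top-level member or reject the query.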
It should be noted that the table data (base data), as it arrives from DBMS, may be analyzed and reordered to optimize hierarchy handling, according to the unique method of the present invention, as described above with reference to Figs. 11A, 11B and 11C.
Moreover, the MDD control module of the MDD Aggregation Module preferably administers the aggregation process according to the method illustrated in Figs. 9A and 9B. Thus, in accordance with the principles of the present invention, data aggregation within the DBMS can be carried out either as a complete pre-aggregation process, where the base data is fully aggregated before commencing querying, or as a query directed roll-up (QDR) process, where querying is allowed at any stage of aggregation using the "on-the-fly" data aggregation process of the present invention. The QDR process will be described hereinafter in greater detail with reference to Fig. 9C. The response to a request (i.e. a basic component of a client query) requiring "on-the-fly" data aggregation, or requiring access to pre-aggregated result data via the MDD handler, is provided by a query/request serving mechanism of the present invention within the MDD control module, the primary operations of which are illustrated in the flow chart of Fig. 6D. The function of the MDD Handler is to handle multidimensional data in the storage(s) module in a very efficient way, according to the novel method of the present invention, which will be described in detail hereinafter with reference to Figs. 10A and 10B.
The SQL handling mechanism shown in Fig. 6D is controlled by the MDD control module.
Requests are queued and served one by one. If the required data is already pre-calculated, then it is retrieved by the MDD handler and returned to the client. Otherwise, the required data is calculated "on-the-fly" by the aggregation engine, and the result is returned to the client while simultaneously being stored by the MDD handler, as shown in Fig. 6C.
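The request-serving loop can be sketched as follows. It is a minimal illustration under stated assumptions: requests are coordinate tuples, the MDDB is a dict, and `aggregate_on_the_fly` is a hypothetical callback standing in for the aggregation engine.

```python
from collections import deque

def serve_requests(requests, store, aggregate_on_the_fly):
    """Serve queued requests one by one, storing every on-the-fly result."""
    queue = deque(requests)
    responses = []
    while queue:
        coords = queue.popleft()
        if coords in store:
            value = store[coords]                 # already pre-calculated
        else:
            value = aggregate_on_the_fly(coords)  # computed on demand
            store[coords] = value                 # kept for later requests
        responses.append((coords, value))
    return responses
```

Storing each on-the-fly result means a repeated request is answered from the store, and the stored values also count toward the full aggregation.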
As illustrated in Fig. 19G, the DBMS of the present invention as described above may be logically partitioned into a relational part and a non-relational part. The relational part includes the relational datastore (e.g., table(s) and dictionary) and support mechanisms (e.g., query handling services). The non-relational part includes the MDD Aggregation Module. As described above, bi-directional data flow occurs between the relational part and the non-relational part as shown.
More specifically, during data load operations, data is loaded from the relational part (i.e., the relational datastore) into the non-relational part, wherein it is aggregated and stored in the MDDB.
And during query servicing operations, when a given query references data stored in the MDDB, data pertaining to the query is generated by the non-relational part (e.g., generated and/or retrieved from the MDDB) and supplied to the relational part (e.g., query servicing mechanism) for communication back to the user. Such bi-directional data flow represents an important distinguishing feature with respect to the prior art. For example, in the prior art MOLAP
architecture as illustrated in Fig. 1B, unidirectional data flow occurs from the relational database (e.g., the Data Warehouse RDBMS system) into the MDDB during data loading operations.
Figs. 20A and 20B outline two different implementations of the DBMS of the present invention. In both implementations, the query handler of the DBMS system supplies aggregated results retrieved from the MDD to a client.
Fig. 20A shows a separate-platform implementation of the DBMS system of the illustrative embodiment shown in Fig. 19A, wherein the relational part of the DBMS resides on a separate hardware platform and/or OS from that used to run the non-relational part (MDD
Aggregation Module). In this type of implementation, it is even possible to run parts of the DBMS
system and the MDD Aggregation Module on different-type operating systems (e.g. NT, Unix, MAC OS).
Fig. 20B shows a common-platform implementation of the DBMS system of the illustrative embodiment shown in Fig. 20A, wherein the relational part of the DBMS shares the same hardware platform and operating system (OS) that is used to run the non-relational part (MDD Aggregation Module).
Fig. 21 shows the improved DBMS (e.g., RDBMS) of the present invention as a component of a data warehouse, serving the data storage and aggregation needs of a ROLAP
system (or other OLAP systems alike). Importantly, the improved DBMS of the present invention provides flexible, high-performance access and analysis of large volumes of complex and interrelated data.
Moreover, the improved Data Warehouse DBMS of the present invention can simultaneously serve many different kinds of clients (e.g. data mart, OLAP, URL) and has the power of delivering enterprise-wide data storage and aggregation in a cost-effective way. This kind of system eliminates redundancy over the group of clients, delivering scalability and flexibility. Moreover, the improved DBMS of the present invention can be used as the data store component of any informational database system as described above, including data analysis programs such as spreadsheet modeling programs, serving the data storage and aggregation needs of such systems.
Fig. 22 shows an embodiment of the present invention wherein the DBMS (e.g., RDBMS) of the present invention is a component of a data warehouse - OLAP system. The DBMS operates as a traditional data warehouse, serving the data storage and aggregation needs of an enterprise. In addition, the DBMS includes integrated OLAP Analysis Logic (and preferably an integrated Presentation Module not shown) that operates cooperatively with the query handling of the DBMS
system and the MDD Aggregation Module to enable users of the DBMS system to execute multidimensional reports (e.g., ratios, ranks, transforms, dynamic consolidation, complex filtering, forecasts, query governing, scheduling, flow control, pre-aggregate inferencing, denormalization support, and/or table partitioning and joins) and preferably perform traditional OLAP analyses (grids, graphs, maps, alerts, drill-down, data pivot, data surf, slice and dice, print). Importantly, the improved DBMS of the present invention provides flexible, high-performance access and analysis of large volumes of complex and interrelated data. Moreover, the improved DBMS of the present invention can simultaneously serve many different kinds of clients (e.g. data mart, other OLAP systems, URL-Directory Systems) and has the power of delivering enterprise-wide data storage and aggregation and OLAP analysis in a cost-effective way. This kind of system eliminates redundancy over the group of clients, delivering scalability and flexibility.
Moreover, the improved DBMS of the present invention can be used as the data store component of any informational database system as described above, serving the data storage and aggregation needs of such systems.
Functional Advantages Gained By The Improved DBMS Of The Present Invention
The features of the DBMS of the present invention provide for dramatically improved response time in handling queries issued to the DBMS that involve aggregation, thus enabling enterprise-wide centralized aggregation. Moreover, in the preferred embodiment of the present invention, users can query the aggregated data in a manner no different than traditional queries on the DBMS.
The method of Segmented Aggregation employed by the novel DBMS of the present invention provides flexibility, scalability, the capability of Query Directed Aggregation, and speed improvement.
Moreover, the method of Query Directed Aggregation (QDR) employed by the novel DBMS of the present invention minimizes the data handling operations in multi-hierarchy data structures, eliminates the need to wait for full aggregation to be complete, and provides for build-up of aggregated data required for full aggregation.
It is understood that the System and Method of the illustrative embodiments described herein above may be modified in a variety of ways which will become readily apparent to those skilled in the art having the benefit of the novel teachings disclosed herein. All such modifications and variations of the illustrative embodiments thereof shall be deemed to be within the scope and spirit of the present invention as defined by the Claims to Invention appended hereto.
Fig. 6A illustrates a generalized embodiment of a multidimensional on-line analytical processing (MOLAP) system of the present invention comprising: a Data Warehouse realized as a relational database; a stand-alone cartridge-style Aggregation Server of the present invention having an integrated aggregation engine and a MDDB; and an OLAP
server communicating with the Aggregation Server, and supporting a plurality of OLAP
clients. In accordance with the principles of the present invention, the stand-alone Aggregation Server performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and multi-dimensional data storage functions.
Departing from conventional practices, the principles of the present invention teach moving the aggregation engine and the MDDB into a separate Aggregation Server having standardized interfaces so that it can be plugged into the OLAP server of virtually any user or vendor. This move removes the restrictive dependency of aggregation on the analytical functions of OLAP and allows novel, independent algorithms to be applied. The stand-alone data aggregation server enables efficient organization and handling of data, fast aggregation processing, and fast access to and retrieval of any data element in the MDDB.
As will be described in greater detail hereinafter, the Aggregation Server of the present invention can serve the data aggregation requirements of other types of systems besides OLAP
systems such as, for example, URL directory management systems, Data Marts, RDBMS, or ROLAP
systems.
The Aggregation Server of the present invention excels in performing two distinct functions, namely: the aggregation of data in the MDDB; and the handling of the resulting data base in the MDDB, for "on demand" client use. In the case of serving an OLAP
server, the Aggregation Server of the present invention focuses on performing these two functions in a high performance manner (i.e. aggregating and storing base data, originated at the Data Warehouse, in a multidimensional storage (MDDB), and providing the results of this data aggregation process "on demand" to the clients, such as the OLAP server, spreadsheet applications, and end-user applications). As such, the Aggregation Server of the present invention frees each conventional OLAP server, with which it interfaces, from the need of making data aggregations, and therefore allows the conventional OLAP server to concentrate on the primary functions of OLAP servers, namely: data analysis and supporting a graphical interface with the user client.
Fig. 6B shows the primary components of the stand-alone Aggregation Server of the illustrative embodiment, namely: a base data interface (e.g. OLDB, OLE-DB, ODBC, SQL, JDBC, API, etc.) for receiving RDBMS flat files lists and other files from the Data Warehouse (RDBMS); a base data loader for receiving base data from the base data interface; a configuration manager for managing the operation of the base data interface and base data loader; an aggregation engine for receiving base data from the base data loader; a multi-dimensional database (MDDB); an MDDB handler; an input analyzer; an aggregation client interface (e.g. OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.); and a configuration manager for managing the operation of the input analyzer and the aggregation client interface.
During operation, the base data originates at the data warehouse or at other sources, such as external ASCII files, a MOLAP server, or others. The Configuration Manager, in order to enable proper communication with all possible sources and data structures, configures two blocks, the Base Data Interface and the Data Loader. Their configuration is matched with different standards such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.
As shown in Fig. 6B, the core of the data Aggregation Server of the present invention comprises: a data Aggregation Engine; a Multidimensional Data Handler (MDDB Handler); and a Multidimensional Data Storage (MDDB). The results of data aggregation are efficiently stored in the MDDB by the MDDB Handler.
As shown in Figs. 6A and 6B, the stand-alone Aggregation Server of the present invention serves the OLAP Server (or other requesting computing system) via an aggregation client interface, which preferably conforms to standard interface protocols such as OLDB, OLE-DB, ODBC, SQL, JDBC, an API, etc. Aggregation results required by the OLAP server are supplied on demand. Typically, the OLAP Server decomposes the query, via a parsing process, into a series of requests. Each such request, specifying an n-dimensional coordinate, is presented to the Aggregation Server. The Configuration Manager sets the Aggregation Client Interface and Input Analyzer for a proper communication protocol according to the client user. The Input Analyzer converts the input format to make it suitable for the MDDB Handler.
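As a rough illustration of how a query might be broken into requests that each specify one n-dimensional coordinate, the following Python sketch expands a per-dimension selection into individual coordinates. The function name `decompose_query` and the dictionary-based query shape are assumptions made for illustration only; they are not part of the disclosed interface.

```python
from itertools import product

def decompose_query(selection):
    """Expand a per-dimension member selection into n-dimensional coordinates.

    `selection` maps each dimension name to the members requested along it
    (a hypothetical input shape, assumed here for illustration).
    """
    dimensions = list(selection)
    return [dict(zip(dimensions, combo))
            for combo in product(*(selection[d] for d in dimensions))]

# One request per coordinate; each would be presented to the server in turn.
requests = decompose_query({
    "time": ["Q1", "Q2"],
    "product": ["all_products"],
    "region": ["east", "west"],
})
```

Each element of `requests` names exactly one member per dimension, which is the shape of request the paragraph above describes.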
Illustrative embodiments may make the transfer of data completely transparent to the OLAP user, in a manner which is equivalent to the storing of data in the MOLAP
server's cache and without any query delays. This requires that the stand-alone Aggregation Server have exceptionally fast response characteristics. This is enabled by providing the unique data structure and aggregation mechanism of the present invention.
Fig. 6C shows the software modules comprising the aggregation engine and MDDB
handler components of the stand-alone Aggregation Server of the illustrative embodiment.
The base data list, as it arrives from RDBMS or text files, has to be analyzed and reordered to optimize hierarchy handling, according to the unique method of the present invention, as described later with reference to Figs. 11A and 11B.
The function of the aggregation management module is to administrate the aggregation process according to the method illustrated in Figs. 9A and 9B.
In accordance with the principles of the present invention, data aggregation within the stand-alone Aggregation Server can be carried out either as a complete pre-aggregation process, where the base data is fully aggregated before commencing querying, or as a query directed roll-up (QDR) process, where querying is allowed at any stage of aggregation using the "on-the-fly" data aggregation process of the present invention. The QDR
process will be described hereinafter in greater detail with reference to Fig. 9C. A request (i.e.
a basic component of a client query) is answered either by calling the Aggregation management module for "on-the-fly" data aggregation, or by accessing pre-aggregated result data via the Storage management module. The query/request serving mechanism of the present invention within the QDR aggregation management module is illustrated in the flow chart of Fig. 6D.
The function of the Storage management module is to handle multidimensional data in the storage(s) module in a very efficient way, according to the novel method of the present invention, which will be described in detail hereinafter with reference to Figs. 10A and 10B.
The request serving mechanism shown in Fig. 6D is controlled by the QDR
aggregation management module. Requests are queued and served one by one. If the required data is already pre-calculated, then it is retrieved by the storage management module and returned to the client. Otherwise, the required data is calculated "on-the-fly" by the aggregation management module, and the result is returned to the client while simultaneously being stored by the storage management module, shown in Fig. 6C.
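The request serving behavior described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a plain dictionary stands in for the storage management module, a caller-supplied function stands in for on-the-fly aggregation, and the names `serve_requests` and `aggregate_on_the_fly` are hypothetical.

```python
from collections import deque

def serve_requests(requests, store, aggregate):
    """Serve queued requests one by one.

    `store` (a dict) stands in for the storage management module and
    `aggregate` for on-the-fly aggregation; both are illustrative stand-ins.
    """
    queue = deque(requests)
    results = []
    while queue:
        coord = queue.popleft()
        if coord in store:
            value = store[coord]          # already pre-calculated: retrieve it
        else:
            value = aggregate(coord)      # calculate "on-the-fly"
            store[coord] = value          # store the result for later requests
        results.append((coord, value))
    return results

calls = []

def aggregate_on_the_fly(coord):
    # Placeholder computation; a real engine would roll up base data.
    calls.append(coord)
    return sum(coord)

store = {(1, 1): 2}
answers = serve_requests([(1, 1), (2, 3), (2, 3)], store, aggregate_on_the_fly)
```

In this run the second request for `(2, 3)` is answered from the store, so the placeholder aggregation is invoked only once.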
Figs. 7A and 7B outline two different implementations of the stand-alone (cartridge-style) Aggregation Server of the present invention. In both implementations, the Aggregation Server supplies aggregated results to a client.
Fig. 7A shows a separate-platform type implementation of the MOLAP system of the illustrative embodiment shown in Fig. 6A, wherein the Aggregation Server of the present invention resides on a separate hardware platform and OS system from that used to run the OLAP server. In this type of implementation, it is even possible to run the Aggregation Server and the OLAP Server on different-type operating systems (e.g. NT, Unix, MAC
OS).
Fig. 7B shows a common-platform type implementation of the MOLAP system of the illustrative embodiment shown in Fig. 6B, wherein the Aggregation Server of the present invention and OLAP Server share the same hardware platform and operating system (OS).
Fig. 8A shows a table setting forth the benchmark results of an aggregation engine, implemented on a shared/common hardware platform and OS, in accordance with the principles of the present invention. The common platform and OS is realized using a Pentium II 450 MHz, 1GB RAM, 18GB Disk, running the Microsoft NT operating system. The six (6) data sets shown in the table differ in number of dimensions, number of hierarchies, measure of sparsity and data size. A comparison with ORACLE Express, a major OLAP server, is made.
It is evident that the aggregation engine of the present invention outperforms currently leading aggregation technology by more than an order of magnitude.
The segmented data aggregation method of the present invention is described in Figs.
9A through 9C2. These figures outline a simplified setting of three dimensions only; however, the following analysis applies to any number of dimensions as well.
The data is divided into autonomous segments to minimize the amount of simultaneously handled data. The initial aggregation is practiced on a single dimension only, while later on the aggregation process involves all other dimensions.
At the first stage of the aggregation method, an aggregation is performed along dimension 1. The first stage can be performed on more than one dimension. As shown in Fig.
9A, the space of the base data is expanded by the aggregation process.
In the next stage shown in Fig. 9B, any segment along dimension 1, such as the shown slice, can be separately aggregated along the remaining dimensions, 2 and 3.
In general, for an N dimensional system, the second stage involves aggregation in N-1 dimensions.
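The two stages can be sketched as follows for a three-dimensional example. This is an illustrative sketch only: the cube is held as a dictionary keyed by coordinate tuples, each dimension's hierarchy is given as a hypothetical member-to-parent mapping, and summation is used as the aggregation function. None of these representation choices come from the figures.

```python
from collections import defaultdict

def ancestors(member, parent):
    """Yield the parent, grandparent, ... of `member` within one dimension."""
    while member in parent:
        member = parent[member]
        yield member

def roll_up(cells, axis, parent):
    """Add every cell's value to its ancestors along one axis.

    Assumes `cells` holds only base-level members along `axis`.
    """
    out = dict(cells)
    for coord, value in cells.items():
        for anc in ancestors(coord[axis], parent):
            target = coord[:axis] + (anc,) + coord[axis + 1:]
            out[target] = out.get(target, 0) + value
    return out

def segmented_aggregation(base, parents):
    # Stage 1: aggregate along dimension 1 (axis 0) over all base data.
    stage1 = roll_up(base, 0, parents[0])
    # Split the expanded space into independent slices along dimension 1.
    slices = defaultdict(dict)
    for coord, value in stage1.items():
        slices[coord[0]][coord] = value
    # Stage 2: aggregate each slice separately along the remaining N-1 axes.
    cube = {}
    for cells in slices.values():
        for axis in range(1, len(parents)):
            cells = roll_up(cells, axis, parents[axis])
        cube.update(cells)
    return cube

base = {("jan", "tv", "east"): 1, ("feb", "tv", "west"): 2, ("jan", "radio", "east"): 4}
parents = [
    {"jan": "Q1", "feb": "Q1"},
    {"tv": "all_products", "radio": "all_products"},
    {"east": "all_regions", "west": "all_regions"},
]
cube = segmented_aggregation(base, parents)
```

Because each slice is processed on its own in stage 2, only one slice's cells need to be held at a time, which is the point of the segmentation.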
The principle of data segmentation can be applied on the first stage as well. However, only a large enough data set will justify such a sliced procedure in the first dimension.
Actually, it is possible to consider each segment as an N-1 cube, enabling recursive computation.
It is imperative to get aggregation results of a specific slice before the entire aggregation is completed, or alternatively, to have the roll-up done in a particular sequence.
A novel feature of the aggregation method of the present invention is that it allows querying to begin even before the regular aggregation process is completed, while still providing fast response. Moreover, in relational OLAP and other systems requiring only partial aggregations, the QDR process dramatically speeds up the query response.
The QDR process is made feasible by the slice-oriented roll-up method of the present invention. After aggregating the first dimension(s), the multidimensional space is composed of independent multidimensional cubes (slices). These cubes can be processed in any arbitrary sequence.
Consequently the aggregation process of the present invention can be monitored by means of files, shared memory, sockets, or queues to statically or dynamically set the roll-up order.
In order to satisfy a single query coming from a client, before the required aggregation result has been prepared, the QDR process of the present invention involves performing a fast on-the-fly aggregation (roll-up) involving only a thin slice of the multidimensional data.
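A minimal sketch of this behavior follows, assuming the first-dimension aggregation has already produced independent slices. The class name `QueryDirectedRollup`, the dictionary representation of slices, and the member-to-parent mappings are all hypothetical choices made for illustration; summation stands in for the aggregation function.

```python
def ancestors(member, parent):
    """Yield the parent, grandparent, ... of `member` within one dimension."""
    while member in parent:
        member = parent[member]
        yield member

def aggregate_slice(cells, parents):
    """Roll up one slice along each of its remaining dimensions in turn."""
    for axis, parent in enumerate(parents):
        out = dict(cells)
        for coord, value in cells.items():
            for anc in ancestors(coord[axis], parent):
                target = coord[:axis] + (anc,) + coord[axis + 1:]
                out[target] = out.get(target, 0) + value
        cells = out
    return cells

class QueryDirectedRollup:
    """Answers queries at any stage; slices are aggregated in any order."""

    def __init__(self, slices, parents):
        self.pending = dict(slices)   # slices not yet aggregated
        self.done = {}                # slices already aggregated
        self.parents = parents

    def _aggregate(self, key):
        self.done[key] = aggregate_slice(self.pending.pop(key), self.parents)

    def background_step(self):
        # Regular roll-up: take the next pending slice, if any.
        if self.pending:
            self._aggregate(next(iter(self.pending)))

    def query(self, key, coord):
        # A query pulls its slice forward and aggregates only that slice.
        if key not in self.done:
            if key not in self.pending:
                return None
            self._aggregate(key)
        return self.done[key].get(coord)   # None stands for NA

slices = {"Q1": {("tv", "east"): 3, ("radio", "east"): 4}, "Q2": {("tv", "west"): 5}}
parents = [{"tv": "all_products", "radio": "all_products"},
           {"east": "all_regions", "west": "all_regions"}]
qdr = QueryDirectedRollup(slices, parents)
first = qdr.query("Q2", ("all_products", "all_regions"))
pending_after_query = sorted(qdr.pending)
qdr.background_step()
second = qdr.query("Q1", ("all_products", "east"))
```

The first query is answered after aggregating only slice `Q2`; slice `Q1` remains pending until the background roll-up reaches it.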
Fig. 9C1 shows a slice required for building-up a roll-up result of the 2nd dimension. In case 1, as shown, the aggregation starts from existing data, either basic or previously aggregated in the first dimension. This data is utilized as a basis for QDR
aggregation along the second dimension. In case 2, due to lack of previous data, a QDR involves an initial slice aggregation along dimension 3, and thereafter aggregation along the 2nd dimension.
Fig. 9C2 shows two corresponding QDR cases for gaining results in the 3rd dimension.
Cases 1 and 2 differ in the amount of initial aggregation required in the 2nd dimension.
Fig. 10A illustrates the "Slice-Storage" method of storing sparse data on storage disks.
In general, this data storage method is based on the principle that an ascending-ordered index along the aggregation direction enables fast retrieval of data. Fig. 10A
illustrates a unit-wide slice of the multidimensional cube of data. Since the data is sparse, only a few non-NA data points exist. These points are indexed as follows. The Data File consists of data records, in which each (n-1)-dimensional slice is stored in a separate record. These records have a varying length, according to the number of non-NA stored points. For each registered point in the record, INDk stands for an index in an n-dimensional cube, and Data stands for the value of a given point in the cube.
Fig. 10B illustrates a novel method for randomly searching for a queried data point in the MDDB of Fig. 6B by using a novel technique of organizing data files and the directory file used in the storages of the MDDB, so that a simple binary search technique can then be employed within the Aggregation Server of the present invention. According to this method, a metafile termed DIR File keeps pointers to Data Files as well as additional parameters such as the start and end addresses of the data record (IND0, INDn), its location within the Data File, record size (n), the file's physical address on disk (D Path), and auxiliary information on the record (Flags).
A search for a queried data point is then performed by an access to the DIR
file. The search along the file can be made using a simple binary search due to the file's ascending order.
When the record is found, it is then loaded into main memory to search for the required point, characterized by its index INDk. The attached Data field represents the queried value. If the exact index is not found, the point is NA.
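The two-step lookup described above can be sketched with in-memory lists standing in for the Data File and the DIR file. This is an illustration only: the tuple layouts, the sample index values, and the name `lookup` are assumptions, and fields such as the disk path and Flags are omitted.

```python
from bisect import bisect_left, bisect_right

# Each record holds the non-NA points of one slice as (index, value) pairs,
# kept in ascending index order (sample values are invented).
data_file = [
    [(3, 1.5), (7, 2.0)],
    [(12, 4.0)],
    [(20, 0.5), (21, 9.0), (29, 3.0)],
]

# Directory entries: (first index, last index, record position), ascending.
dir_file = [(rec[0][0], rec[-1][0], pos) for pos, rec in enumerate(data_file)]
dir_starts = [entry[0] for entry in dir_file]

def lookup(index):
    """Return the stored value for `index`, or None if the point is NA."""
    # Step 1: binary search of the directory for the candidate record.
    i = bisect_right(dir_starts, index) - 1
    if i < 0:
        return None
    first, last, pos = dir_file[i]
    if index > last:
        return None
    # Step 2: load the record and search it for the exact index.
    record = data_file[pos]
    keys = [k for k, _ in record]
    j = bisect_left(keys, index)
    if j < len(keys) and keys[j] == index:
        return record[j][1]
    return None
```

For example, `lookup(21)` finds the third record through the directory and returns `9.0`, while `lookup(25)` falls inside that record's index range but matches no stored point, so it is reported as NA (`None`).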
In another aspect of the present invention, a novel method is provided for optimally merging multiple hierarchies in multi-hierarchical structures. The method, illustrated in Figs.
11A, 11B, and 11C is preferably used by the Aggregation Server of the present invention in processing the table data (base data), as it arrives from RDBMS.
According to the devised method, the inner order of hierarchies within a dimension is optimized, to achieve efficient data handling for summations and other mathematical formulas (termed in general "Aggregation"). The order of hierarchy is defined externally. It is brought from a data source to the stand-alone aggregation engine, as a descriptor of data, before the data itself. In the illustrative embodiment, the method assumes hierarchical relations of the data, as shown in Fig. 11A. The way data items are ordered in the memory space of the Aggregation Server, with regard to the hierarchy, has a significant impact on its data handling efficiency.
Notably, when using prior art techniques, multiple handling of data elements, which occurs when a data element is accessed more than once during the aggregation process, has been hitherto unavoidable when the main concern is to effectively handle the sparse data. The data structures used in prior art data handling methods have been designed for fast access to non-NA data. According to prior art techniques, each access is associated with a time-consuming search and retrieval in the data structure. For the massive amount of data typically accessed from a Data Warehouse in an OLAP application, such multiple handling of data elements has significantly degraded the efficiency of prior art data aggregation processes. When using prior art data handling techniques, the data element D shown in Fig. 11A must be accessed three times, causing poor aggregation performance.
In accordance with the data handling method of the present invention, the data is pre-ordered for singular handling, as opposed to the multiple handling taught by prior art methods. According to the present invention, elements of base data and their aggregated results are contiguously stored in a way that each element will be accessed only once. This particular order allows forward-only handling, never backward. Once a base data element is stored, or an aggregated result is generated and stored, it is never retrieved again for further aggregation. As a result, storage access is minimized. This singular handling greatly improves the aggregation efficiency of large data bases. An efficient handling method as used in the present invention is shown in Fig. 7A. The data element D, like any other element, is accessed and handled only once.
Fig. 11A shows an example of a multi-hierarchical database structure having 3 hierarchies. As shown, the base data has a dimension that includes items A, B, F, and G. The second level is composed of items C, E, H and I. The third level has a single item D, which is common to all three hierarchical structures. In accordance with the method of the present invention, a minimal computing path is always taken. For example, according to the method of the present invention, item D will be calculated as part of structure 1, requiring two mathematical operations only, rather than as in structure 3, which would need four mathematical operations. Fig. 11B depicts an optimized structure merged from all three hierarchies.
Fig. 11C(i) through 11C(ix) represent a flow chart description (and accompanying data structures) of the operations of an exemplary hierarchy transformation mechanism of the present invention that optimally merges multiple hierarchies into a single hierarchy that is functionally equivalent to the multiple hierarchies. For the sake of description, the data structures correspond to exemplary hierarchical structures described above with respect to Figs.
11(A) and 11(B). As illustrated in Fig. 11C(i), in step 1101, a catalogue is loaded from the DBMS system. As is conventional, the catalogue includes data ("hierarchy descriptor data") describing multiple hierarchies for at least one dimension of the data stored in the DBMS. In step 1103, this hierarchy descriptor data is extracted from the catalogue. A
loop (steps 1105-1119) is performed over the items in the multiple hierarchy described by the hierarchy descriptor data.
In the loop 1105-1119, a given item in the multiple hierarchy is selected (step 1107);
and, in step 1109, the parent(s) (if any) - including grandparents, great-grandparents, etc. - of the given item are identified and added to an entry (for the given item) in a parent list data structure, which is illustrated in Fig. 11C(v). Each entry in the parent list corresponds to a specific item and includes zero or more identifiers for items that are parents (or grandparents, or great-grandparents) of the specific item. In addition, an inner loop (steps 1111-1117) is performed over the hierarchies of the multiple hierarchies described by the hierarchy descriptor data, wherein in step 1113 one of the multiple hierarchies is selected. In step 1115, the child of the given item in the selected hierarchy (if any) is identified and added (if need be) to a group of identifiers in an entry (for the given item) in a child list data structure, which is illustrated in Fig. 11C(vi). Each entry in the child list corresponds to a specific item and includes zero or more groups of identifiers each identifying a child of the specific item. Each group corresponds to one or more of the hierarchies described by the hierarchy descriptor data.
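The construction of the parent list and child list can be sketched as follows. This is a sketch under stated assumptions: each hierarchy is represented as a hypothetical item-to-direct-parent mapping, the two sample hierarchies are invented for illustration and are not those of Fig. 11A, and the function name `build_lists` is not taken from the disclosure.

```python
def build_lists(hierarchies):
    """Build a parent list and a child list from several hierarchies.

    Each hierarchy maps an item to its direct parent (assumed acyclic).
    """
    items = set()
    for h in hierarchies:
        items.update(h)
        items.update(h.values())

    parent_list = {item: set() for item in items}   # all ancestors per item
    child_list = {item: [] for item in items}       # groups of direct children

    for item in sorted(items):                      # loop over the items
        for h in hierarchies:                       # inner loop over hierarchies
            # Collect parents, grandparents, ... of the item in this hierarchy.
            node = item
            while node in h:
                node = h[node]
                parent_list[item].add(node)
            # Collect the item's children in this hierarchy as one group,
            # adding the group only if an identical one is not already present.
            group = {child for child, parent in h.items() if parent == item}
            if group and group not in child_list[item]:
                child_list[item].append(group)
    return parent_list, child_list

# Two invented hierarchies over the same leaves A, B and F.
h1 = {"A": "C", "B": "C", "C": "D", "F": "D"}
h2 = {"A": "E", "F": "E", "E": "D", "B": "D"}
parent_list, child_list = build_lists([h1, h2])
```

Here item `A` ends up with parents `C`, `E` and `D`, and item `D` has two child groups, one per hierarchy.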
The operation then continues to steps 1121 and 1123 as illustrated in Fig.
11C(ii) to verify the integrity of the multiple hierarchies described by the hierarchy descriptor data (step 1121) and fix (or report to the user) any errors discovered therein (step 1123). Preferably, the integrity of the multiple hierarchies is verified in step 1121 by iteratively expanding each group of identifiers in the child list to include the children, grandchildren, etc. of any item listed in the group. If the child(ren) for each group for a specific item do not match, a verification error is encountered, and such error is fixed (or reported to the user) (step 1123). The operation then proceeds to a loop (steps 1125 - 1133) over the items in the child list.
In the loop (steps 1125 - 1133), a given item in the child list is identified in step 1127.
In step 1129, the entry in the child list for the given item is examined to determine if the given item has no children (e.g., the corresponding entry is null). If so, the operation continues to step 1131 to add an entry for the item in level 0 of an ordered list data structure, which is illustrated in Fig. 11C(vii); otherwise the operation continues to process the next item of the child list in the loop. Each entry in a given level of the ordered list corresponds to a specific item and includes zero or more identifiers each identifying a child of the specific item. The levels of the ordered list describe the transformed hierarchy as will readily become apparent in light of the following. Essentially, loop 1125-1133 builds the lowest level (level 0) of the transformed hierarchy.
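The level 0 selection of this loop can be sketched briefly. The child list shape (item mapped to a list of child groups) and the sample entries are assumptions made for illustration, as is the name `build_level_zero`.

```python
def build_level_zero(child_list):
    """Return level 0 of the ordered list: every item without children."""
    level_zero = {}
    for item in sorted(child_list):        # loop over the child list
        if not child_list[item]:           # null entry: the item has no children
            level_zero[item] = []          # level 0 entries list no children
    return level_zero

# Invented child list: A, B and F are leaves; C and D have children.
child_list = {
    "A": [], "B": [], "F": [],
    "C": [{"A", "B"}],
    "D": [{"C", "F"}],
}
level_zero = build_level_zero(child_list)
```

Only the childless items `A`, `B` and `F` appear in `level_zero`; higher levels are derived from it in the steps that follow.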
After loop 1125-1133, operation continues to process the lowest level to derive the next higher level, and iterate over this process to build out the entire transformed hierarchy. More specifically, in step 1135, a "current level" variable is set to identify the lowest level. In step 1137, the items of the "current level" of the ordered list are copied to a work list. In step 1139, it is determined if the work list is empty. If so, the operation ends;
otherwise operation continues to step 1141 wherein a loop (steps 1141 - 1159) is performed over the items in the work list.
In step 1143, a given item in the work list is identified and operation continues to an inner loop (steps 1145 - 1155) over the parent(s) of the given item (which are specified in the parent list entry for the given item). In step 1147 of the inner loop, a given parent of the given item is identified. In step 1149, it is determined whether any other parent (e.g., a parent other than the given parent) of the given item is a child of the given parent (as specified in the child list entry for the given parent). If so, operation continues to step 1155 to process the next parent of the given item in the inner loop; otherwise, operation continues to steps 1151 and 1153. In step 1151, an entry for the given parent is added to the next level (current level + 1) of the ordered list, if it does not exist there already. In step 1153, if no children of the given item (as specified in the entry for the given item in the current level of the ordered list) matches (e.g., is covered by) any child (or grandchild or great grandchild etc.) of item(s) in the entry for the given parent in the next level of the ordered list, the given item is added to the entry for the given parent in the next level of the ordered list. Levels 1 and 2 of the ordered list for the example described above are shown in Figs. 11C(viii) and 11C(ix), respectively. The children (including grandchildren and great grandchildren, etc.) of an item in the entry for a given parent in the next level of the ordered list may be identified by the information encoded in the lower levels of the ordered list. After step 1153, operation continues to step 1155 to process the next parent of the given item in the inner loop (steps 1145 - 1155).
After processing the inner loop (steps 1145 - 1155), operation continues to step 1157 to delete the given item from the work list, and processing continues to step 1159 to process the next item of the work list in the loop (steps 1141 - 1159).
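The test applied in step 1149 can be illustrated in isolation. This sketch assumes that the child list entries have already been expanded to include all descendants (as in the verification of step 1121) and represents them as a hypothetical `descendants` mapping; the name `nearest_parents` and the sample data are invented. It covers only the filtering of parents, not steps 1151 and 1153.

```python
def nearest_parents(item, parent_list, descendants):
    """Return the parents of `item` that pass the step 1149 test.

    A parent is skipped when some other parent of the item lies beneath it.
    """
    kept = []
    for parent in sorted(parent_list[item]):
        others = parent_list[item] - {parent}
        if others & descendants[parent]:
            continue                      # another parent sits below this one
        kept.append(parent)
    return kept

# Invented example: A rolls up to C and to E, and both roll up to D.
parent_list = {"A": {"C", "E", "D"}, "C": {"D"}, "E": {"D"}, "D": set()}
descendants = {
    "C": {"A", "B"},
    "E": {"A", "F"},
    "D": {"A", "B", "C", "E", "F"},
}
parents_of_a = nearest_parents("A", parent_list, descendants)
parents_of_c = nearest_parents("C", parent_list, descendants)
```

For item `A`, parent `D` is skipped because `C` and `E` are both below it, so only `C` and `E` would receive entries in the next level at this point.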
After processing the loop (steps 1141 - 1159), the ordered list (e.g., transformed hierarchy) has been built for the next higher level. The operation continues to step 1161 to increment the current level to the next higher level, and operation returns (in step 1163) to step 1138 to build the next higher level, until the highest level is reached (determined in step 1139) and the operation ends.
Fig. 12 summarizes the components of an exemplary aggregation module that takes advantage of the hierarchy transformation technique described above. More specifically, the aggregation module includes a hierarchy transformation module that optimally merges multiple hierarchies into a single hierarchy that is functionally equivalent to the multiple hierarchies. A second module loads and indexes the base data supplied from the DBMS using the optimal hierarchy generated by the hierarchy transformation module.
An aggregation engine performs aggregation operations on the base data. During the aggregation operations along the dimension specified by the optimal hierarchy, the results of the aggregation operations of the level 0 items may be used in the aggregation operations of the level 1 items, the results of the aggregation operations of the level 1 items may be used in the aggregation operations of the level 2 items, etc. Based on these operations, the loading and indexing operations of the base data, along with the aggregation become very efficient, minimizing memory and storage access, and speeding up storing and retrieval operations.
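The level-by-level reuse of results can be sketched as a single forward pass over an ordered list. The ordered list below is a hypothetical example (it is not the structure of Fig. 11B), summation stands in for the aggregation function, and the name `aggregate_by_levels` is invented for illustration.

```python
def aggregate_by_levels(base, levels):
    """Aggregate one dimension level by level in a single forward pass.

    `levels[k]` maps each item of level k to the children it is computed
    from; children always belong to lower levels, so each result is
    available by the time it is needed.
    """
    totals = dict(base)                       # level 0 values are the base data
    for level in levels[1:]:
        for item, children in level.items():
            totals[item] = sum(totals[child] for child in children)
    return totals

# Invented ordered list with three levels.
levels = [
    {"A": [], "B": [], "F": [], "G": []},     # level 0: base items
    {"C": ["A", "B"], "H": ["F", "G"]},       # level 1: built from level 0
    {"D": ["C", "H"]},                        # level 2: built from level 1
]
base = {"A": 1, "B": 2, "F": 3, "G": 4}
totals = aggregate_by_levels(base, levels)
```

In this example `D` is obtained from the already computed `C` and `H` with a single addition, rather than by revisiting the four base items.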
Fig. 13 shows the stand-alone Aggregation Server of the present invention as a component of a central data warehouse, serving the data aggregation needs of URL directory systems, Data Marts, RDBMSs, ROLAP systems and OLAP systems alike.
The reason for the central multidimensional database's rise to corporate necessity is that it facilitates flexible, high-performance access and analysis of large volumes of complex and interrelated data.
A stand-alone specialized aggregation server, simultaneously serving many different kinds of clients (e.g. data mart, OLAP, URL, RDBMS), has the power of delivering an enterprise-wide aggregation in a cost-effective way. This kind of server eliminates the roll-up redundancy over the group of clients, delivering scalability and flexibility.
Performance associated with a central data warehouse is an important consideration in the overall approach. Performance includes aggregation times and query response.
Effective interactive query applications require near real-time performance, measured in seconds. These application performance requirements translate directly into aggregation requirements.
In the prior art, in the case of MOLAP, a full pre-aggregation must be done before querying can start. In the present invention, in contrast to the prior art, the query directed roll-up (QDR) allows instant querying, while the full pre-aggregation is done in the background. In cases where a full pre-aggregation is preferred, the aggregation of the present invention outperforms any prior art.
For the ROLAP and RDBMS clients, partial aggregations maximize query performance. In both cases a fast aggregation process is imperative. The aggregation performance of the present invention is orders of magnitude higher than that of the prior art.
The stand-alone scalable aggregation server of the present invention can be used in any MOLAP system environment for answering questions about corporate performance in a particular market, economic trends, consumer behaviors, weather conditions, population trends, or the state of any physical, social, biological or other system or phenomenon on which different types or categories of information, organizable in accordance with a predetermined dimensional hierarchy, are collected and stored within a RDBMS of one sort or another.
Regardless of the particular application selected, the address data mapping processes of the present invention will provide a quick and efficient way of managing a MDDB and also enabling decision support capabilities utilizing the same in diverse application environments.
The stand-alone "cartridge-style" plug-in features of the data aggregation server of the present invention provide freedom in designing an optimized multidimensional data structure and handling method for aggregation, provide freedom in designing a generic aggregation server matching all OLAP vendors, and enable enterprise-wide centralized aggregation.
The method of Segmented Aggregation employed in the aggregation server of the present invention provides flexibility, scalability, a condition for Query Directed Aggregation, and speed improvement.
The method of Multidimensional data organization and indexing employed in the aggregation server of the present invention provides fast storage and retrieval, a condition for Segmented Aggregation, improves the storing, handling, and retrieval of data in a fast manner, and contributes to structural flexibility to allow sliced aggregation and QDR.
It also enables the forwarding and single handling of data with improvements in speed performance.
The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention minimizes the data handling operations in multi-hierarchy data structures.
The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention eliminates the need to wait for full aggregation to be completed, and provides build-up aggregated data required for full aggregation.
In another aspect of the present invention, an improved DBMS system (e.g., RDBMS system, object oriented database system or object/relational database system) is provided that excels in performing two distinct functions, namely: the aggregation of data; and the handling of the resulting data for "on demand" client use. Moreover, because of improved data aggregation capabilities, the DBMS of the present invention can be employed in a wide range of applications, including Data Warehouses supporting OLAP systems and the like. For purposes of illustration, initial focus will be accorded to the DBMS of the present invention. Referring now to Figs. 19 through 21, the preferred embodiments of the method and system of the present invention will now be described in great detail hereinbelow.
Throughout this document, the terms "aggregation" and "pre-aggregation" shall be understood to mean the process of summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division, etc. It shall be understood that pre-aggregation operations occur asynchronously with respect to the traditional query processing operations. Moreover, the term "atomic data" shall be understood to refer to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, atomic data may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch.
FIG. 19A illustrates the primary components of an illustrative embodiment of a DBMS of the present invention, namely: support mechanisms including a query interface and query handler; a relational data store including one or more tables storing at least the atomic data (and possibly summary tables) and a meta-data store for storing a dictionary (sometimes referred to as a catalogue or directory); and an MDD Aggregation Module that stores atomic data and aggregated data in a MDDB. The MDDB is a non-relational data structure: it uses other data structures, either instead of or in addition to tables, to store data.
For illustrative purposes, Fig. 19A illustrates an RDBMS wherein the relational data store includes fact tables and a dictionary.
It should be noted that the DBMS typically includes additional components (not shown) that are not relevant to the present invention. The query interface and query handler service user-submitted queries (in the preferred embodiment, SQL query statements) forwarded, for example, from a client machine over a network as shown. The query handler and relational data store (tables and meta-data store) are operably coupled to the MDD Aggregation Module. Importantly, the query handler and integrated MDD Aggregation Module operate to provide for dramatically improved query response times for data aggregation operations and drill-downs. Moreover, illustrative embodiments may make user-querying of the non-relational MDDB no different than querying a relational table of the DBMS, in a manner that minimizes the delays associated with queries that involve aggregation or drill down operations. This is enabled by providing the novel DBMS system and integrated aggregation mechanism of the present invention.
FIG. 19B shows the primary components of an illustrative embodiment of the MDD Aggregation Module of FIG. 19A, namely: a base data loader for loading the directory and table(s) of the relational data store of the DBMS; an aggregation engine for receiving dimension data and atomic data from the base data loader; a multi-dimensional database (MDDB); an MDDB handler and an SQL handler that operate cooperatively with the query handler of the DBMS to provide users with query access to the MDD Aggregation Module; and a control module for managing the operation of the components of the MDD Aggregation Module. The base data loader may load the directory and table(s) of the relational data store over a standard interface (such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.). In this case, the DBMS and base data loader include components that provide communication of such data over these standard interfaces. Such interface components are well known in the art. For example, such interface components are readily available from Attunity Corporation, http://www.attunity.com.
During operation, base data originates from the table(s) of the DBMS. The core data aggregation operations are performed by the Aggregation Engine; a Multidimensional Data (MDDB) Handler; and a Multidimensional Data Storage (MDDB). The results of data aggregation are efficiently stored in the MDDB by the MDDB Handler. The SQL handler of the MDD Aggregation Module services user-submitted queries (in the preferred embodiment, SQL query statements) forwarded from the query handler of the DBMS. The SQL handler of the MDD Aggregation Module may communicate with the query handler of the DBMS over a standard interface (such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.). In this case, the support mechanisms of the RDBMS and SQL handler include components that provide communication of such data over these standard interfaces. Such interface components are well known in the art. Aggregation (or drill-down) results are retrieved on demand and returned to the user.
Typically, a user interacts with a client machine (for example, using a web-enabled browser) to generate a natural language query that is communicated to the query interface of the DBMS, for example over a network as shown. The query interface breaks the query down, via parsing, into a series of requests (in the preferred embodiment, SQL statements) that are communicated to the query handler of the DBMS. It should be noted that the functions of the query interface may be implemented in a module that is not part of the DBMS (for example, in the client machine). The query handler of the DBMS forwards requests that involve data stored in the MDD of the MDD Aggregation Module to the SQL handler of the MDD Aggregation Module for servicing. Each request specifies a set of n dimensions. The SQL handler of the MDD Aggregation Module extracts this set of dimensions and operates cooperatively with the MDD handler to address the MDDB using the set of dimensions, retrieve the addressed data from the MDDB, and return the results to the user via the query handler of the DBMS.
Figs. 19C(i) and 19C(ii) show a flow chart illustrating the operations of an illustrative DBMS of the present invention. In step 601, the base data loader of the MDD Aggregation Module loads the dictionary (or catalog) from the meta-data store of the DBMS. In performing this function, the base data loader may utilize an adapter (interface) that maps the data types of the dictionary of the DBMS (or that maps a standard data type used to represent the dictionary of the DBMS) into the data types used in the MDD Aggregation Module. In addition, the base data loader extracts the dimensions from the dictionary and forwards the dimensions to the aggregation engine of the MDD Aggregation Module.
In step 603, the base data loader loads table(s) from the DBMS. In performing this function, the base data loader may utilize an adapter (interface) that maps the data types of the table(s) of the DBMS (or that maps a standard data type used to represent the fact table(s) of the DBMS) into the data types used in the MDD Aggregation Module. In addition, the base data loader extracts the atomic data from the table(s), and forwards the atomic data to the aggregation engine.
In step 605, the aggregation engine performs aggregation operations (i.e., roll-up operations) on the atomic data (provided by the base data loader in step 603) along at least one of the dimensions (extracted from the dictionary of the DBMS in step 601) and operates cooperatively with the MDD handler to store the resultant aggregated data in the MDDB. A more detailed description of exemplary aggregation operations according to a preferred embodiment of the present invention is set forth below with respect to the QDR process of Figs. 9A-9C.
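Steps 601 through 605 can be sketched in Python as follows. This is an illustrative simplification, not the patented method: an in-memory SQLite database stands in for the relational data store, the catalogue is read with SQLite's `PRAGMA table_info`, every column but the last is assumed to be a flat dimension (no hierarchies), summation is the only operator, and a plain dictionary keyed by coordinate tuples stands in for the MDDB. The table `sales`, the marker `ALL`, and all function names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker meaning "rolled up over this dimension"


def load_dimensions(conn, table):
    """Step 601 (sketch): read the catalogue and derive the dimensions."""
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return columns[:-1], columns[-1]  # assumption: last column is the measure


def load_atomic_data(conn, table, dims, measure):
    """Step 603 (sketch): extract the atomic rows from the fact table."""
    column_list = ", ".join(dims + [measure])
    return conn.execute(f"SELECT {column_list} FROM {table}").fetchall()


def aggregate(rows, n_dims):
    """Step 605 (sketch): roll every row up along every subset of dimensions."""
    mddb = defaultdict(float)
    for *coords, value in rows:
        for mask in product((False, True), repeat=n_dims):
            key = tuple(ALL if rolled else c for c, rolled in zip(coords, mask))
            mddb[key] += value
    return dict(mddb)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("s1", "mon", "pen", 10), ("s1", "tue", "pen", 5), ("s2", "mon", "ink", 7)],
)
dims, measure = load_dimensions(conn, "sales")
mddb = aggregate(load_atomic_data(conn, "sales", dims, measure), len(dims))
```

Computing every roll-up combination eagerly corresponds to the complete pre-aggregation mode; the sketch makes no attempt to model segmented aggregation or the storage layout of the MDDB.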
In step 607, a reference is defined that provides users with the ability to query the data generated by the MDD Aggregation Module and/or stored in the MDDB of the MDD Aggregation Module. This reference is preferably defined using the Create View SQL statement, which allows the user to: i) define a table name (TN) associated with the MDDB stored in the MDD Aggregation Module, and ii) define a link used to route SQL statements on the table TN to the MDD Aggregation Module. In this embodiment, the view mechanism of the DBMS enables reference and linking to the data stored in the MDDB of the MDD Aggregation Engine as illustrated in FIG. 6(E). A more detailed description of the view mechanism and the Create View SQL statement may be found in C. J. Date, "An Introduction to Database Systems," Addison-Wesley, Seventh Edition, 2000, pp. 289-326.
Thus, the view mechanism enables the query handler of the DBMS system to forward any SQL query on table TN to the MDD aggregation module via the associated link.
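The effect of the reference and its link on query routing (steps 607, 613 and 615) can be sketched as below. This is a hedged illustration rather than the DBMS's actual view mechanism: the class `QueryHandler`, its methods, and the naive `FROM` clause matching are all hypothetical, and the two callables merely stand in for the relational query path and the link to the MDD Aggregation Module.

```python
import re


class QueryHandler:
    """Routes SQL statements either to the relational store or over a link."""

    def __init__(self, relational_executor):
        self._relational = relational_executor
        self._links = {}  # table name TN -> callable standing in for the link

    def create_reference(self, table_name, link):
        """Step 607 (sketch): associate table name TN with a link."""
        self._links[table_name.lower()] = link

    def handle(self, sql):
        """Steps 613/615 (sketch): route statements on TN over the link."""
        match = re.search(r"\bFROM\s+([A-Za-z_][A-Za-z0-9_]*)", sql, re.IGNORECASE)
        table = match.group(1).lower() if match else None
        if table in self._links:
            return self._links[table](sql)
        return self._relational(sql)  # normal query handling (step 625)


handler = QueryHandler(relational_executor=lambda sql: ("relational", sql))
handler.create_reference("TN", link=lambda sql: ("mdd_module", sql))

routed = handler.handle("SELECT amount FROM TN WHERE store = 's1'")
normal = handler.handle("SELECT * FROM customers")
```

A real query handler would resolve the reference from its parsed query plan rather than from a regular expression; the pattern here only keeps the example short.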
In an alternative embodiment, a direct mechanism (e.g., a native trigger mechanism) may be used to enable the DBMS system to reference and link to the data generated by the MDD Aggregation Module and/or stored in the MDDB of the MDD Aggregation Engine as illustrated in FIG. 6F. A more detailed description of trigger mechanisms and methods may be found in C. J. Date, "An Introduction to Database Systems," Addison-Wesley, Seventh Edition, 2000, pp. 250, 266.
In step 609, a user interacts with a client machine to generate a query, and the query is communicated to the query interface. The query interface generates one or more SQL statements. These SQL statements may refer to data stored in tables of the relational datastore, or may refer to the reference defined in step 607 (this reference refers to the data stored in the MDDB of the MDD Aggregation Module). These SQL statement(s) are forwarded to the query handler of the DBMS.
In step 611, the query handler receives the SQL statement(s) and optionally transforms such SQL statement(s) to optimize them for more efficient query handling. Such transformations are well known in the art. For example, see Kimball, "Aggregation Navigation With (Almost) No MetaData", DBMS Data Warehouse Supplement, August 1996, available at http://www.dbmsmag.com/9608d54.html.
In step 613, the query handler determines whether the received SQL statement(s) [or transformed SQL statement(s)] is on the reference generated in step 607. If so, operation continues to step 615; otherwise normal query handling operations continue in step 625 wherein the relational datastore is accessed to extract, store, and/or manipulate the data stored therein as directed by the query, and results are returned back to the user via the client machine, if needed.
In step 615, the received SQL statement(s) [or transformed SQL statement(s)] is routed to the MDD aggregation engine for processing in step 617 using the link for the reference as described above with respect to step 607.
In step 617, the SQL statement(s) is received by the SQL handler of the MDD Aggregation Module, wherein a set of one or more N-dimensional coordinates are extracted from the SQL statement. In performing this function, the SQL handler may utilize an adapter (interface) that maps the data types of the SQL statement issued by the query handler of the DBMS (or that maps a standard data type used to represent the SQL statement issued by the query handler of the DBMS) into the data types used in the MDD Aggregation Module.
In step 619, the set of N-dimensional coordinates extracted in step 617 is used by the MDD handler to address the MDDB and retrieve the corresponding data from the MDDB.
Finally, in step 621, the retrieved data is returned to the user via the DBMS (for example, by forwarding the retrieved data to the SQL handler, which returns the retrieved data to the query handler of the DBMS system, which returns the results of the user-submitted query to the user via the client machine), and the operation ends.
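Steps 617 through 621 can be sketched in Python under strong simplifying assumptions: the SQL statement contains only `column = 'value'` equality predicates, a dimension not mentioned in the statement is treated as rolled up, and the MDDB is a dictionary keyed by coordinate tuples. The names `extract_coordinates`, `serve`, `ALL`, and the sample data are hypothetical.

```python
import re

ALL = "*"  # hypothetical marker meaning "rolled up over this dimension"


def extract_coordinates(sql, dims):
    """Step 617 (sketch): pull `dim = 'value'` predicates out of the statement."""
    found = dict(re.findall(r"(\w+)\s*=\s*'([^']*)'", sql))
    return tuple(found.get(dim, ALL) for dim in dims)


def serve(sql, dims, mddb):
    """Steps 619/621 (sketch): address the store and return the value found."""
    return mddb.get(extract_coordinates(sql, dims))


dims = ("store", "day", "item")
mddb = {("s1", ALL, "pen"): 15.0, (ALL, ALL, ALL): 22.0}

sql = "SELECT amount FROM TN WHERE store = 's1' AND item = 'pen'"
coords = extract_coordinates(sql, dims)
result = serve(sql, dims, mddb)
```

The point of the sketch is that the dimension values in the statement form an address into the multidimensional store, so no join or scan of a fact table is needed to answer the request.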
It should be noted that the table data (base data), as it arrives from DBMS, may be analyzed and reordered to optimize hierarchy handling, according to the unique method of the present invention, as described above with reference to Figs. 11A, 11B and 11C.
Moreover, the MDD control module of the MDD Aggregation Module preferably administers the aggregation process according to the method illustrated in Figs. 9A and 9B. Thus, in accordance with the principles of the present invention, data aggregation within the DBMS can be carried out either as a complete pre-aggregation process, where the base data is fully aggregated before commencing querying, or as a query directed roll-up (QDR) process, where querying is allowed at any stage of aggregation using the "on-the-fly" data aggregation process of the present invention. The QDR process will be described hereinafter in greater detail with reference to Fig. 9C. The response to a request (i.e. a basic component of a client query) requiring "on-the-fly" data aggregation, or requiring access to pre-aggregated result data via the MDD handler, is provided by a query/request serving mechanism of the present invention within the MDD control module, the primary operations of which are illustrated in the flow chart of Fig. 6D. The function of the MDD Handler is to handle multidimensional data in the storage(s) module in a very efficient way, according to the novel method of the present invention, which will be described in detail hereinafter with reference to Figs. 10A and 10B.
The SQL handling mechanism shown in Fig. 6D is controlled by the MDD control module. Requests are queued and served one by one. If the required data is already pre-calculated, then it is retrieved by the MDD handler and returned to the client. Otherwise, the required data is calculated "on-the-fly" by the aggregation engine, and the result is returned to the client while simultaneously being stored by the MDD handler, as shown in Fig. 6C.
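The request-serving behavior described above (queue, serve, reuse pre-calculated results, otherwise aggregate on the fly and store) can be sketched as follows. The class `MddControl`, its attributes, and the sample rows are hypothetical; summation is assumed as the operator, and storing and returning happen sequentially here rather than simultaneously.

```python
from collections import deque

ALL = "*"  # hypothetical marker meaning "rolled up over this dimension"


class MddControl:
    """Queues requests and serves them one by one (illustrative sketch)."""

    def __init__(self, atomic_rows):
        self._rows = atomic_rows   # tuples of (coordinates..., value)
        self._store = {}           # stands in for the MDDB
        self._queue = deque()
        self.computed_on_the_fly = 0

    def submit(self, coords):
        self._queue.append(coords)

    def serve_next(self):
        coords = self._queue.popleft()
        if coords in self._store:  # already pre-calculated: just retrieve
            return self._store[coords]
        # Otherwise aggregate "on the fly" over the matching atomic rows.
        total = sum(
            row[-1]
            for row in self._rows
            if all(c == ALL or c == r for c, r in zip(coords, row))
        )
        self._store[coords] = total  # keep the result for later requests
        self.computed_on_the_fly += 1
        return total


rows = [("s1", "mon", "pen", 10), ("s1", "tue", "pen", 5), ("s2", "mon", "ink", 7)]
control = MddControl(rows)
control.submit(("s1", ALL, "pen"))
control.submit(("s1", ALL, "pen"))
first = control.serve_next()
second = control.serve_next()
```

The second request is answered from the stored result, so each on-the-fly aggregation also builds up part of the data that a full aggregation would eventually require.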
As illustrated in Fig. 19G, the DBMS of the present invention as described above may be logically partitioned into a relational part and a non-relational part. The relational part includes the relational datastore (e.g., table(s) and dictionary) and support mechanisms (e.g., query handling services). The non-relational part includes the MDD Aggregation Module. As described above, bi-directional data flow occurs between the relational part and the non-relational part as shown.
More specifically, during data load operations, data is loaded from the relational part (i.e., the relational datastore) into the non-relational part, wherein it is aggregated and stored in the MDDB.
And during query servicing operations, when a given query references data stored in the MDDB, data pertaining to the query is generated by the non-relational part (e.g., generated and/or retrieved from the MDDB) and supplied to the relational part (e.g., query servicing mechanism) for communication back to the user. Such bi-directional data flow represents an important distinguishing feature with respect to the prior art. For example, in the prior art MOLAP architecture as illustrated in Fig. 1B, unidirectional data flow occurs from the relational data base (e.g., the Data Warehouse RDBMS system) into the MDDB during data loading operations.
Figs. 20A and 20B outline two different implementations of the DBMS of the present invention. In both implementations, the query handler of the DBMS system supplies aggregated results retrieved from the MDD to a client.
Fig. 20A shows a separate-platform implementation of the DBMS system of the illustrative embodiment shown in Fig. 19A, wherein the relational part of the DBMS resides on a separate hardware platform and/or operating system (OS) from that used to run the non-relational part (MDD Aggregation Module). In this type of implementation, it is even possible to run parts of the DBMS system and the MDD Aggregation Module on different-type operating systems (e.g. NT, Unix, MAC OS).
Fig. 20B shows a common-platform implementation of the DBMS system of the illustrative embodiment shown in Fig. 20A, wherein the relational part of the DBMS shares the same hardware platform and operating system (OS) that is used to run the non-relational part (MDD Aggregation Module).
Fig. 21 shows the improved DBMS (e.g., RDBMS) of the present invention as a component of a data warehouse, serving the data storage and aggregation needs of a ROLAP system (or other OLAP systems alike). Importantly, the improved DBMS of the present invention provides flexible, high-performance access and analysis of large volumes of complex and interrelated data.
Moreover, the improved Data Warehouse DBMS of the present invention can simultaneously serve many different kinds of clients (e.g. data mart, OLAP, URL) and has the power of delivering enterprise-wide data storage and aggregation in a cost-effective way. This kind of system eliminates redundancy over the group of clients, delivering scalability and flexibility. Moreover, the improved DBMS of the present invention can be used as the data store component of any informational database system as described above, including data analysis programs such as spread-sheet modeling programs, serving the data storage and aggregation needs of such systems.
Fig. 22 shows an embodiment of the present invention wherein the DBMS (e.g., RDBMS) of the present invention is a component of a data warehouse - OLAP system. The DBMS operates as a traditional data warehouse, serving the data storage and aggregation needs of an enterprise. In addition, the DBMS includes integrated OLAP Analysis Logic (and preferably an integrated Presentation Module not shown) that operates cooperatively with the query handling of the DBMS system and the MDD Aggregation Module to enable users of the DBMS system to execute multidimensional reports (e.g., ratios, ranks, transforms, dynamic consolidation, complex filtering, forecasts, query governing, scheduling, flow control, pre-aggregate inferencing, denormalization support, and/or table partitioning and joins) and preferably perform traditional OLAP analyses (grids, graphs, maps, alerts, drill-down, data pivot, data surf, slice and dice, print). Importantly, the improved DBMS of the present invention provides flexible, high-performance access and analysis of large volumes of complex and interrelated data. Moreover, the improved DBMS of the present invention can simultaneously serve many different kinds of clients (e.g. data mart, other OLAP systems, URL-Directory Systems) and has the power of delivering enterprise-wide data storage and aggregation and OLAP analysis in a cost-effective way. This kind of system eliminates redundancy over the group of clients, delivering scalability and flexibility.
Moreover, the improved DBMS of the present invention can be used as the data store component of any informational database system as described above, serving the data storage and aggregation needs of such systems.
Functional Advantages Gained By The Improved DBMS Of The Present Invention
The features of the DBMS of the present invention provide for dramatically improved response time in handling queries issued to the DBMS that involve aggregation, thus enabling enterprise-wide centralized aggregation. Moreover, in the preferred embodiment of the present invention, users can query the aggregated data in a manner no different than traditional queries on the DBMS.
The method of Segmented Aggregation employed by the novel DBMS of the present invention provides flexibility, scalability, the capability of Query Directed Aggregation, and speed improvement.
Moreover, the method of Query Directed Aggregation (QDR) employed by the novel DBMS of the present invention minimizes the data handling operations in multi-hierarchy data structures, eliminates the need to wait for full aggregation to be complete, and provides for build-up of aggregated data required for full aggregation.
It is understood that the System and Method of the illustrative embodiments described hereinabove may be modified in a variety of ways which will become readily apparent to those skilled in the art having the benefit of the novel teachings disclosed herein. All such modifications and variations of the illustrative embodiments thereof shall be deemed to be within the scope and spirit of the present invention as defined by the Claims to Invention appended hereto.
Claims (45)
1. A database management system (DBMS) comprising:
a relational datastore storing data in tables;
an aggregation module, operatively coupled to the relational datastore, for aggregating the data stored in the tables of the relational datastore and storing the resultant aggregated data in a non-relational datastore;
a reference generating mechanism for generating a first reference to data stored in the relational datastore and a second reference to aggregated data generated by the aggregation module and stored in the non-relational datastore; and
a query processing mechanism for processing query statements, wherein, upon identifying that a given query statement is on said second reference, the query processing mechanism communicates with said aggregation module to retrieve portions of aggregated data identified by said reference that are relevant to said given query statement.
2. The DBMS of claim 1, for use as a relational database management system (RDBMS) wherein the relational datastore stores fact data.
3. The DBMS of claim 2, wherein the reference generating mechanism comprises a view mechanism.
4. The DBMS of claim 1, wherein the reference generating mechanism comprises a native trigger mechanism.
5. The DBMS of claim 1, wherein the non-relational datastore comprises a multi-dimensional database.
6. The DBMS of claim 1, wherein the reference generating mechanism is part of a query servicing mechanism for servicing user submitted query statements.
7. The DBMS of claim 6, wherein said aggregation module includes a query handling mechanism for receiving query statements, and wherein communication between said query processing mechanism and said query handling mechanism is accomplished by forwarding the given query statement to the query handling mechanism of the aggregation module.
8. The DBMS of claim 7, wherein said query handling mechanism extracts at least one dimension from the received query statement and forwards the at least one dimension to a storage handler, and wherein the storage handler accesses locations of the non-relational datastore based upon the forwarded at least one dimension and returns the portions of aggregated data retrieved back to the query servicing mechanism for communication to the user.
9. The DBMS of claim 6, wherein said aggregation module includes a data loading mechanism for loading data from the relational datastore, an aggregation engine for aggregating the data loaded from the relational datastore, and a storage handler for storing in the non-relational datastore the data loaded from the relational datastore and the aggregated data generated by the aggregation engine.
10. The DBMS of claim 9, wherein said aggregation module includes control logic that, upon determining that the non-relational datastore does not contain data required to service a given query statement, controls the aggregation engine to generate aggregated data required to service the given query statement and controls the aggregation module to return the aggregated data back to the query servicing mechanism for communication to the user.
11. The DBMS of claim 1, further comprising OLAP analysis logic integral to the DBMS.
12. The DBMS of claim 1, further comprising OLAP presentation logic integral to the DBMS.
13. The DBMS of claim 1, for use as an enterprise wide data warehouse that interfaces to a plurality of information technology systems.
14. The DBMS of claim 1, for use as a database store in an informational database system.
15. The DBMS of claim 14, wherein the informational database system requires aggregation and calculations on basic detailed data.
16. The DBMS of claim 11, for use as a database store in an operational database system.
17. The DBMS of claim 16, wherein the operational database system is part of one of the following systems: a Customer Relations Management System, an Enterprise Resource Planning System, a Customer Data Record Database System.
18. The DBMS of claim 1, wherein said query statements are generated by a query interface in response to communication of a natural language query communicated from a client machine.
19. The DBMS of claim 18, wherein said client machine comprises a web-enabled browser to communicate said natural language query to the query interface.
20. The DBMS of claim 7, wherein the query processing mechanism of the DBMS and the query handling mechanism of the aggregation module communicate over a standard interface.
21. The DBMS of claim 20, wherein the standard interface comprises one of: OLDB, OLE-DB, ODBC, SQL, and JDBC.
22. The DBMS of claim 1, for use as an object database management system (ODBMS).
23. The DBMS of claim 1, for use as an object-relational database management system (ORDBMS).
24. The DBMS of claim 5, further comprising a relational part that includes the relational datastore and support mechanisms.
25. The DBMS of claim 24, wherein bi-directional data flow occurs between the relational part and the aggregation module whereby data stored in the relational datastore is loaded into the aggregation module and aggregated data stored in the multidimensional database of the aggregation module is communicated to the relational part.
26. The DBMS of claim 5, wherein user operations in querying the multidimensional database and the relational datastore are the same.
27. In a database management system (DBMS) comprising a relational datastore storing data in tables, a method for aggregating the data stored in the tables of the relational datastore and providing query access to the aggregated data, the method comprising the steps of:
(a) aggregating the data stored in the relational datastore and storing the resultant aggregated data in a non-relational datastore, wherein the aggregation is performed by an integrated aggregation module, operatively coupled to the relational datastore;
(b) in response to user input, generating a reference to aggregated data generated by the integrated aggregation module; and
(c) processing a given query statement generated in response to user input, wherein, upon identifying that the given query statement is on said reference, retrieving from the integrated aggregation module portions of aggregated data identified by said reference that are relevant to said given query statement.
28. The method of claim 27, wherein step (c) further comprises the step of extracting at least one dimension from the given query statement, accessing locations of the non-relational datastore based upon the extracted at least one dimension, and returning the portions of aggregated data retrieved back to the user.
29. The method of claim 27, wherein step (a) further comprises the steps of loading data from the relational datastore, aggregating the data loaded from the relational datastore, and storing in the non-relational datastore the data loaded from the relational datastore and resultant aggregated data.
30. The method of claim 29, wherein said integrated aggregation module, upon determining that the non-relational datastore does not contain data required to service the given query statement, controls the aggregation engine to generate aggregated data required to service the given query statement and returns the aggregated data back to the user.
31. The method of claim 27, wherein the DBMS comprises a relational database management system (RDBMS) storing fact data in the relational datastore.
32. The method of claim 27, wherein the non-relational datastore comprises a multi-dimensional database.
33. The method of claim 27, wherein the DBMS includes OLAP analysis logic integral to the DBMS.
34. The method of claim 33, wherein the DBMS includes OLAP presentation logic integral to the DBMS.
35. The method of claim 27, wherein the DBMS is used as an enterprise wide data warehouse that interfaces to a plurality of information technology systems.
36. The method of claim 27, wherein the DBMS is used as a database store in an informational database system.
37. The method of claim 36, wherein the informational database system requires aggregation and calculations on basic detailed data.
38. The method of claim 27, wherein the DBMS is used as a database store in an operational database system.
39. The method of claim 38, wherein the operational database system is part of one of the following systems: a Customer Relations Management System, an Enterprise Resource Planning System, a Customer Data Record Database System.
40. The method of claim 27, wherein user operations in querying the relational datastore and non-relational datastore generate natural language queries communicated from a client machine.
41. The method of claim 40, wherein said client machine comprises a web-enabled browser to generate said natural language queries.
42. The method of claim 27, wherein communication with the integrated aggregation module occurs over a standard interface.
43. The method of claim 42, wherein the standard interface comprises one of: OLDB, OLE-DB, ODBC, SQL, and JDBC.
44. The method of claim 27, wherein the DBMS comprises an object database management system (ODBMS).
45. The method of claim 27, wherein the DBMS comprises an object-relational database management system (ORDBMS).
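The method of claims 27 to 30 can be illustrated with a minimal sketch. It is not the patented implementation: the class names `AggregationModule` and `DBMS`, the `ALL` roll-up marker, the `eager` flag, and the use of summation as the only aggregation function are all assumptions made for illustration. The sketch shows data being loaded from relational tables and aggregated into a coordinate-addressed store (step (a), claim 29), a named reference to that store being created (step (b)), queries on the reference being routed to the store by the dimensions they mention (step (c), claim 28), and missing aggregates being computed on demand (claim 30).

```python
from itertools import product

ALL = "*"  # hypothetical marker meaning "rolled up over this dimension"


class AggregationModule:
    """Toy stand-in for the integrated aggregation module of claim 27."""

    def __init__(self, dimensions, measure):
        self.dimensions = tuple(dimensions)
        self.measure = measure
        self.base_rows = []  # data loaded from the relational datastore (claim 29)
        self.cells = {}      # non-relational store: coordinate tuple -> aggregate

    def load(self, rows):
        self.base_rows = [dict(r) for r in rows]
        self.cells = {}

    def preaggregate(self):
        """Step (a): sum the measure for every roll-up of every loaded row."""
        self.cells = {}
        for row in self.base_rows:
            values = [row[d] for d in self.dimensions]
            for mask in product((False, True), repeat=len(values)):
                coord = tuple(ALL if rolled else v for v, rolled in zip(values, mask))
                self.cells[coord] = self.cells.get(coord, 0) + row[self.measure]

    def lookup(self, coord):
        """Return the aggregate at coord, computing it on demand if absent (claim 30)."""
        if coord not in self.cells:
            matches = [
                row[self.measure]
                for row in self.base_rows
                if all(c == ALL or row[d] == c for d, c in zip(self.dimensions, coord))
            ]
            self.cells[coord] = sum(matches) if matches else None
        return self.cells[coord]


class DBMS:
    def __init__(self):
        self.tables = {}      # relational datastore: table name -> list of row dicts
        self.references = {}  # reference name -> AggregationModule

    def create_reference(self, name, table, dimensions, measure, eager=True):
        """Step (b): name a reference to the aggregated view of a table."""
        module = AggregationModule(dimensions, measure)
        module.load(self.tables[table])
        if eager:
            module.preaggregate()
        self.references[name] = module

    def query(self, source, where=None):
        """Step (c): route queries on a reference to the aggregation module."""
        where = where or {}
        if source in self.references:
            module = self.references[source]
            # Claim 28: extract dimensions from the query and address the store.
            coord = tuple(where.get(d, ALL) for d in module.dimensions)
            return module.lookup(coord)
        # Any other source is answered from the relational tables directly.
        return [r for r in self.tables[source]
                if all(r[k] == v for k, v in where.items())]


db = DBMS()
db.tables["sales"] = [
    {"region": "East", "product": "A", "amount": 10},
    {"region": "East", "product": "B", "amount": 5},
    {"region": "West", "product": "A", "amount": 7},
]
db.create_reference("sales_cube", "sales", ("region", "product"), "amount")
east_total = db.query("sales_cube", {"region": "East"})
grand_total = db.query("sales_cube")
```

With the sample rows above, `east_total` is 15 and `grand_total` is 22. Passing `eager=False` to `create_reference` skips the up-front aggregation, so every aggregate is produced by the on-demand path in `lookup` the first time a query asks for it.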
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/514,611 | 2000-02-28 | ||
US09/514,611 US6434544B1 (en) | 1999-08-04 | 2000-02-28 | Stand-alone cartridge-style data aggregation server providing data aggregation for OLAP analyses |
US09/634,748 US6385604B1 (en) | 1999-08-04 | 2000-08-09 | Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements |
US09/634,748 | 2000-08-09 | ||
PCT/US2001/006316 WO2001067303A1 (en) | 2000-02-28 | 2001-02-28 | Multi-dimensional database and integrated aggregation server |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2401348A1 (en) | 2001-09-13 |
CA2401348C (en) | 2013-08-27 |
Family
ID=27058253
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2401348A Expired - Fee Related CA2401348C (en) | 2000-02-28 | 2001-02-28 | Multi-dimensional database and integrated aggregation server |
Country Status (6)
Country | Link |
---|---|
US (7) | US6385604B1 (en) |
EP (1) | EP1266308A4 (en) |
JP (1) | JP5242875B2 (en) |
AU (1) | AU2001239919A1 (en) |
CA (1) | CA2401348C (en) |
WO (1) | WO2001067303A1 (en) |
Families Citing this family (315)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DK173451B1 (en) * | 1999-04-16 | 2000-11-20 | Targit As | Method, apparatus and data carrier for processing queries to a database |
US7146354B1 (en) * | 1999-06-18 | 2006-12-05 | F5 Networks, Inc. | Method and system for network load balancing with a compound data structure |
US6408292B1 (en) * | 1999-08-04 | 2002-06-18 | Hyperroll, Israel, Ltd. | Method of and system for managing multi-dimensional databases using modular-arithmetic based address data mapping processes on integer-encoded business dimensions |
US6385604B1 (en) * | 1999-08-04 | 2002-05-07 | Hyperroll, Israel Limited | Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements |
EP1087306A3 (en) * | 1999-09-24 | 2004-11-10 | Xerox Corporation | Meta-documents and method of managing them |
FR2806183B1 (en) * | 1999-12-01 | 2006-09-01 | Cartesis S A | DEVICE AND METHOD FOR INSTANT CONSOLIDATION, ENRICHMENT AND "REPORTING" OR BACKGROUND OF INFORMATION IN A MULTIDIMENSIONAL DATABASE |
US6529953B1 (en) * | 1999-12-17 | 2003-03-04 | Reliable Network Solutions | Scalable computer network resource monitoring and location system |
US6901406B2 (en) * | 1999-12-29 | 2005-05-31 | General Electric Capital Corporation | Methods and systems for accessing multi-dimensional customer data |
US20020029207A1 (en) * | 2000-02-28 | 2002-03-07 | Hyperroll, Inc. | Data aggregation server for managing a multi-dimensional database and database management system having data aggregation server integrated therein |
CA2403716A1 (en) * | 2000-03-22 | 2001-09-27 | Arac Management Services, Inc. | Apparatus and methods for interactive rental information retrieval and management |
US7222130B1 (en) * | 2000-04-03 | 2007-05-22 | Business Objects, S.A. | Report then query capability for a multidimensional database model |
US6768986B2 (en) * | 2000-04-03 | 2004-07-27 | Business Objects, S.A. | Mapping of an RDBMS schema onto a multidimensional data model |
AU2001257077A1 (en) * | 2000-04-17 | 2001-10-30 | Brio Technology, Inc. | Analytical server including metrics engine |
US7136821B1 (en) | 2000-04-18 | 2006-11-14 | Neat Group Corporation | Method and apparatus for the composition and sale of travel-oriented packages |
US7072897B2 (en) * | 2000-04-27 | 2006-07-04 | Hyperion Solutions Corporation | Non-additive measures and metric calculation |
US6748394B2 (en) | 2000-04-27 | 2004-06-08 | Hyperion Solutions Corporation | Graphical user interface for relational database |
US6941311B2 (en) | 2000-04-27 | 2005-09-06 | Hyperion Solutions Corporation | Aggregate navigation system |
US6732115B2 (en) | 2000-04-27 | 2004-05-04 | Hyperion Solutions Corporation | Chameleon measure and metric calculation |
US7167859B2 (en) * | 2000-04-27 | 2007-01-23 | Hyperion Solutions Corporation | Database security |
US7080090B2 (en) | 2000-04-27 | 2006-07-18 | Hyperion Solutions Corporation | Allocation measures and metric calculations in star schema multi-dimensional data warehouse |
US7269786B1 (en) | 2000-05-04 | 2007-09-11 | International Business Machines Corporation | Navigating an index to access a subject multi-dimensional database |
US6915289B1 (en) * | 2000-05-04 | 2005-07-05 | International Business Machines Corporation | Using an index to access a subject multi-dimensional database |
US20010044796A1 (en) * | 2000-05-19 | 2001-11-22 | Hiroyasu Fujiwara | Totalization system and recording medium |
AU2001261702B2 (en) * | 2000-05-22 | 2004-04-29 | International Business Machines Corporation | Revenue forecasting and sales force management using statistical analysis |
US7117215B1 (en) | 2001-06-07 | 2006-10-03 | Informatica Corporation | Method and apparatus for transporting data for data warehousing applications that incorporates analytic data interface |
US7043457B1 (en) | 2000-06-28 | 2006-05-09 | Probuild, Inc. | System and method for managing and evaluating network commodities purchasing |
US20050119980A1 (en) * | 2000-06-29 | 2005-06-02 | Neat Group Corporation | Electronic negotiation systems |
US6826564B2 (en) * | 2000-07-10 | 2004-11-30 | Fastforward Networks | Scalable and programmable query distribution and collection in a network of queryable devices |
US7165065B1 (en) * | 2000-07-14 | 2007-01-16 | Oracle Corporation | Multidimensional database storage and retrieval system |
US7130822B1 (en) * | 2000-07-31 | 2006-10-31 | Cognos Incorporated | Budget planning |
US6704740B1 (en) * | 2000-08-10 | 2004-03-09 | Ford Motor Company | Method for analyzing product performance data |
US6850947B1 (en) * | 2000-08-10 | 2005-02-01 | Informatica Corporation | Method and apparatus with data partitioning and parallel processing for transporting data for data warehousing applications |
US6801921B2 (en) * | 2000-09-08 | 2004-10-05 | Hitachi, Ltd. | Method and system for managing multiple database storage units |
JP3827936B2 (en) * | 2000-10-18 | 2006-09-27 | シャープ株式会社 | Information providing control device, information providing method, recording medium recording information providing program, and information providing system |
US7039871B2 (en) * | 2000-10-27 | 2006-05-02 | Swiftknowledge, Inc. | Secure data access in a multidimensional data environment |
US7257596B1 (en) * | 2000-11-09 | 2007-08-14 | Integrated Marketing Technology | Subscription membership marketing application for the internet |
US6842904B1 (en) * | 2000-11-21 | 2005-01-11 | Microsoft Corporation | Extensible architecture for versioning APIs |
US6748384B1 (en) * | 2000-12-13 | 2004-06-08 | Objective Systems Integrators Inc. | System and method for dynamically summarizing data stores |
US6687693B2 (en) * | 2000-12-18 | 2004-02-03 | Ncr Corporation | Architecture for distributed relational data mining systems |
US7143099B2 (en) * | 2001-02-08 | 2006-11-28 | Amdocs Software Systems Limited | Historical data warehousing system |
WO2002069300A1 (en) * | 2001-02-22 | 2002-09-06 | Koyo Musen America, Inc. | Collecting, analyzing, consolidating, delivering and utilizing data relating to a current event |
GB2372600B (en) * | 2001-02-27 | 2003-02-19 | 3Com Corp | Network area storage block and file aggregation |
US7240285B2 (en) * | 2001-03-01 | 2007-07-03 | Sony Corporation | Encoding and distribution of schema for multimedia content descriptions |
US20020129145A1 (en) * | 2001-03-06 | 2002-09-12 | Accelerate Software Inc. | Method and system for real-time querying, retrieval and integration of data from database over a computer network |
US20020129342A1 (en) * | 2001-03-07 | 2002-09-12 | David Kil | Data mining apparatus and method with user interface based ground-truth tool and user algorithms |
US6931418B1 (en) * | 2001-03-26 | 2005-08-16 | Steven M. Barnes | Method and system for partial-order analysis of multi-dimensional data |
US7415438B1 (en) * | 2001-06-12 | 2008-08-19 | Microstrategy, Incorporated | System and method for obtaining feedback from delivery of informational and transactional data |
US7162643B1 (en) | 2001-06-15 | 2007-01-09 | Informatica Corporation | Method and system for providing transfer of analytic application data over a network |
US7720842B2 (en) | 2001-07-16 | 2010-05-18 | Informatica Corporation | Value-chained queries in analytic applications |
US6965886B2 (en) * | 2001-11-01 | 2005-11-15 | Actimize Ltd. | System and method for analyzing and utilizing data, by executing complex analytical models in real time |
US7062479B2 (en) * | 2001-11-02 | 2006-06-13 | Cognos Incorporated | Calculation engine for use in OLAP environments |
US7937363B2 (en) * | 2001-11-02 | 2011-05-03 | International Business Machines Corporation | Calculation engine for use in OLAP environments |
EA200400873A1 (en) * | 2001-12-28 | 2005-12-29 | Джеффри Джэймс Джонас | REAL-TIME DATA STORAGE |
MXPA04008142A (en) | 2002-02-22 | 2006-03-10 | Abel Noser Corp | Systems and methods for analysis of portfolio returns and trade cost measurement based on fiduciary roles. |
US6820077B2 (en) | 2002-02-22 | 2004-11-16 | Informatica Corporation | Method and system for navigating a large amount of data |
US7171427B2 (en) * | 2002-04-26 | 2007-01-30 | Oracle International Corporation | Methods of navigating a cube that is implemented as a relational object |
US7366730B2 (en) | 2002-04-26 | 2008-04-29 | Oracle International Corporation | Registration of solved cubes within a relational database management system |
US7415457B2 (en) * | 2002-04-26 | 2008-08-19 | Oracle International Corporation | Using a cache to provide cursor isolation |
US8868544B2 (en) * | 2002-04-26 | 2014-10-21 | Oracle International Corporation | Using relational structures to create and support a cube within a relational database system |
US7149983B1 (en) * | 2002-05-08 | 2006-12-12 | Microsoft Corporation | User interface and method to facilitate hierarchical specification of queries using an information taxonomy |
US7548935B2 (en) * | 2002-05-09 | 2009-06-16 | Robert Pecherer | Method of recursive objects for representing hierarchies in relational database systems |
US8001112B2 (en) * | 2002-05-10 | 2011-08-16 | Oracle International Corporation | Using multidimensional access as surrogate for run-time hash table |
US7447687B2 (en) * | 2002-05-10 | 2008-11-04 | International Business Machines Corporation | Methods to browse database query information |
BR0312989A (en) * | 2002-07-26 | 2008-03-04 | Ron Everett | database and knowledge operation system |
US7257612B2 (en) * | 2002-09-30 | 2007-08-14 | Cognos Incorporated | Inline compression of a network communication within an enterprise planning environment |
US7072822B2 (en) * | 2002-09-30 | 2006-07-04 | Cognos Incorporated | Deploying multiple enterprise planning models across clusters of application servers |
US6768995B2 (en) * | 2002-09-30 | 2004-07-27 | Adaytum, Inc. | Real-time aggregation of data within an enterprise planning environment |
CN1685351A (en) * | 2002-09-30 | 2005-10-19 | 厄得塔姆公司 | Node-level modification during execution of an enterprise planning model |
US7370270B2 (en) | 2002-10-23 | 2008-05-06 | Aol Llc A Delaware Limited Liability Company | XML schema evolution |
US7900052B2 (en) * | 2002-11-06 | 2011-03-01 | International Business Machines Corporation | Confidential data sharing and anonymous entity resolution |
US7703028B2 (en) * | 2002-12-12 | 2010-04-20 | International Business Machines Corporation | Modifying the graphical display of data entities and relational database structures |
US7467125B2 (en) * | 2002-12-12 | 2008-12-16 | International Business Machines Corporation | Methods to manage the display of data entities and relational database structures |
US7472127B2 (en) * | 2002-12-18 | 2008-12-30 | International Business Machines Corporation | Methods to identify related data in a multidimensional database |
US7716167B2 (en) * | 2002-12-18 | 2010-05-11 | International Business Machines Corporation | System and method for automatically building an OLAP model in a relational database |
US20040122814A1 (en) * | 2002-12-18 | 2004-06-24 | International Business Machines Corporation | Matching groupings, re-aggregation avoidance and comprehensive aggregate function derivation rules in query rewrites using materialized views |
US7181450B2 (en) * | 2002-12-18 | 2007-02-20 | International Business Machines Corporation | Method, system, and program for use of metadata to create multidimensional cubes in a relational database |
US7305410B2 (en) * | 2002-12-26 | 2007-12-04 | Rocket Software, Inc. | Low-latency method to replace SQL insert for bulk data transfer to relational database |
US8620937B2 (en) * | 2002-12-27 | 2013-12-31 | International Business Machines Corporation | Real time data warehousing |
CN100541443C (en) * | 2002-12-31 | 2009-09-16 | 国际商业机器公司 | The method and system that is used for deal with data |
US7953694B2 (en) | 2003-01-13 | 2011-05-31 | International Business Machines Corporation | Method, system, and program for specifying multidimensional calculations for a relational OLAP engine |
US7200602B2 (en) * | 2003-02-07 | 2007-04-03 | International Business Machines Corporation | Data set comparison and net change processing |
US7756901B2 (en) | 2003-02-19 | 2010-07-13 | International Business Machines Corporation | Horizontal enterprise planning in accordance with an enterprise planning model |
US7155398B2 (en) * | 2003-02-19 | 2006-12-26 | Cognos Incorporated | Cascaded planning of an enterprise planning model |
US20040181518A1 (en) * | 2003-03-14 | 2004-09-16 | Mayo Bryan Edward | System and method for an OLAP engine having dynamic disaggregation |
US7962757B2 (en) * | 2003-03-24 | 2011-06-14 | International Business Machines Corporation | Secure coordinate identification method, system and program |
US20040193633A1 (en) * | 2003-03-28 | 2004-09-30 | Cristian Petculescu | Systems, methods, and apparatus for automated dimensional model definitions and builds utilizing simplified analysis heuristics |
US7895191B2 (en) | 2003-04-09 | 2011-02-22 | International Business Machines Corporation | Improving performance of database queries |
US20040215656A1 (en) * | 2003-04-25 | 2004-10-28 | Marcus Dill | Automated data mining runs |
US7765211B2 (en) * | 2003-04-29 | 2010-07-27 | International Business Machines Corporation | System and method for space management of multidimensionally clustered tables |
US8200612B2 (en) * | 2003-05-07 | 2012-06-12 | Oracle International Corporation | Efficient SQL access to multidimensional data |
US8612421B2 (en) * | 2003-05-07 | 2013-12-17 | Oracle International Corporation | Efficient processing of relational joins of multidimensional data |
US8209280B2 (en) * | 2003-05-07 | 2012-06-26 | Oracle International Corporation | Exposing multidimensional calculations through a relational database server |
US7530012B2 (en) * | 2003-05-22 | 2009-05-05 | International Business Machines Corporation | Incorporation of spreadsheet formulas of multi-dimensional cube data into a multi-dimensional cube |
US20040267746A1 (en) * | 2003-06-26 | 2004-12-30 | Cezary Marcjan | User interface for controlling access to computer objects |
JP4330941B2 (en) * | 2003-06-30 | 2009-09-16 | 株式会社日立製作所 | Database divided storage management apparatus, method and program |
JP4186987B2 (en) | 2003-07-11 | 2008-11-26 | 日本電信電話株式会社 | Database access control method, database access control device, database access control program, and recording medium storing the program |
US7299223B2 (en) | 2003-07-16 | 2007-11-20 | Oracle International Corporation | Spreadsheet to SQL translation |
US7707548B2 (en) * | 2003-07-22 | 2010-04-27 | Verizon Business Global Llc | Integration of information distribution systems |
US9230007B2 (en) * | 2003-10-03 | 2016-01-05 | Oracle International Corporation | Preserving sets of information in rollup tables |
US7421458B1 (en) | 2003-10-16 | 2008-09-02 | Informatica Corporation | Querying, versioning, and dynamic deployment of database objects |
US20050108204A1 (en) * | 2003-11-13 | 2005-05-19 | International Business Machines | System and method for managing OLAP summary tables |
US7657516B2 (en) * | 2003-12-01 | 2010-02-02 | Siebel Systems, Inc. | Conversion of a relational database query to a query of a multidimensional data source by modeling the multidimensional data source |
US7254590B2 (en) * | 2003-12-03 | 2007-08-07 | Informatica Corporation | Set-oriented real-time data processing based on transaction boundaries |
US7756739B2 (en) * | 2004-02-12 | 2010-07-13 | Microsoft Corporation | System and method for aggregating a measure over a non-additive account dimension |
US7263520B2 (en) * | 2004-02-27 | 2007-08-28 | Sap Ag | Fast aggregation of compressed data using full table scans |
US8478668B2 (en) * | 2004-03-12 | 2013-07-02 | Sybase, Inc. | Hierarchical entitlement system with integrated inheritance and limit checks |
US7797239B2 (en) * | 2004-03-12 | 2010-09-14 | Sybase, Inc. | Hierarchical entitlement system with integrated inheritance and limit checks |
US9940374B2 (en) * | 2004-04-26 | 2018-04-10 | Right90, Inc. | Providing feedback in a operating plan data aggregation system |
US9684703B2 (en) | 2004-04-29 | 2017-06-20 | Precisionpoint Software Limited | Method and apparatus for automatically creating a data warehouse and OLAP cube |
EP1747548A4 (en) * | 2004-05-17 | 2009-08-05 | Visible Path Corp | System and method for enforcing privacy in social networks |
US8572221B2 (en) | 2004-05-26 | 2013-10-29 | Facebook, Inc. | System and method for managing an online social network |
US20060149739A1 (en) * | 2004-05-28 | 2006-07-06 | Metadata, Llc | Data security in a semantic data model |
US7076493B2 (en) * | 2004-05-28 | 2006-07-11 | Metadata, Llc | Defining a data dependency path through a body of related data |
US7707143B2 (en) | 2004-06-14 | 2010-04-27 | International Business Machines Corporation | Systems, methods, and computer program products that automatically discover metadata objects and generate multidimensional models |
US20050283494A1 (en) * | 2004-06-22 | 2005-12-22 | International Business Machines Corporation | Visualizing and manipulating multidimensional OLAP models graphically |
US7480663B2 (en) * | 2004-06-22 | 2009-01-20 | International Business Machines Corporation | Model based optimization with focus regions |
US7213199B2 (en) * | 2004-07-16 | 2007-05-01 | Cognos Incorporated | Spreadsheet user-interface for an enterprise planning system having multi-dimensional data store |
US20060036641A1 (en) * | 2004-07-28 | 2006-02-16 | Antony Brydon | System and method for using social networks for the distribution of communications |
US8131472B2 (en) * | 2004-09-28 | 2012-03-06 | International Business Machines Corporation | Methods for hierarchical organization of data associated with medical events in databases |
US8276150B2 (en) * | 2004-10-12 | 2012-09-25 | International Business Machines Corporation | Methods, systems and computer program products for spreadsheet-based autonomic management of computer systems |
US8892571B2 (en) | 2004-10-12 | 2014-11-18 | International Business Machines Corporation | Systems for associating records in healthcare database with individuals |
JP4463661B2 (en) * | 2004-11-01 | 2010-05-19 | 株式会社日立製作所 | Computer system, computer, database access method and database system |
US7505888B2 (en) * | 2004-11-30 | 2009-03-17 | International Business Machines Corporation | Reporting model generation within a multidimensional enterprise software system |
US7418438B2 (en) * | 2004-11-30 | 2008-08-26 | International Business Machines Corporation | Automated default dimension selection within a multidimensional enterprise software system |
US7610300B2 (en) * | 2004-11-30 | 2009-10-27 | International Business Machines Corporation | Automated relational schema generation within a multidimensional enterprise software system |
US7593955B2 (en) * | 2004-11-30 | 2009-09-22 | International Business Machines Corporation | Generation of aggregatable dimension information within a multidimensional enterprise software system |
US20060136380A1 (en) * | 2004-12-17 | 2006-06-22 | Purcell Terence P | System and method for executing a multi-table query |
US7580922B2 (en) * | 2005-01-04 | 2009-08-25 | International Business Machines Corporation | Methods for relating data in healthcare databases |
JP4159099B2 (en) | 2005-05-16 | 2008-10-01 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Dimension table processing apparatus, dimension hierarchy extraction apparatus, dimension table processing method, dimension hierarchy extraction method, and program |
US7584205B2 (en) * | 2005-06-27 | 2009-09-01 | Ab Initio Technology Llc | Aggregating data with complex operations |
US8099674B2 (en) | 2005-09-09 | 2012-01-17 | Tableau Software Llc | Computer systems and methods for automatically viewing multidimensional databases |
US20070088706A1 (en) * | 2005-10-17 | 2007-04-19 | Goff Thomas C | Methods and devices for simultaneously accessing multiple databases |
US7464083B2 (en) | 2005-10-24 | 2008-12-09 | Wolfgang Otter | Combining multi-dimensional data sources using database operations |
US7702615B1 (en) | 2005-11-04 | 2010-04-20 | M-Factor, Inc. | Creation and aggregation of predicted data |
US20070174091A1 (en) * | 2006-01-26 | 2007-07-26 | International Business Machines Corporation | Methods, data structures, systems and computer program products for identifying obsure patterns in healthcare related data |
US8200501B2 (en) * | 2006-01-26 | 2012-06-12 | International Business Machines Corporation | Methods, systems and computer program products for synthesizing medical procedure information in healthcare databases |
US20070174318A1 (en) | 2006-01-26 | 2007-07-26 | International Business Machines Corporation | Methods and apparatus for constructing declarative componentized applications |
US8566113B2 (en) * | 2006-02-07 | 2013-10-22 | International Business Machines Corporation | Methods, systems and computer program products for providing a level of anonymity to patient records/information |
US8285752B1 (en) | 2006-03-20 | 2012-10-09 | Symantec Operating Corporation | System and method for maintaining a plurality of summary levels in a single table |
US7970735B2 (en) * | 2006-03-20 | 2011-06-28 | Microsoft Corporation | Cross varying dimension support for analysis services engine |
US7961189B2 (en) * | 2006-05-16 | 2011-06-14 | Sony Corporation | Displaying artists related to an artist of interest |
US7774288B2 (en) * | 2006-05-16 | 2010-08-10 | Sony Corporation | Clustering and classification of multimedia data |
US20070271286A1 (en) * | 2006-05-16 | 2007-11-22 | Khemdut Purang | Dimensionality reduction for content category data |
US7750909B2 (en) * | 2006-05-16 | 2010-07-06 | Sony Corporation | Ordering artists by overall degree of influence |
US7840568B2 (en) * | 2006-05-16 | 2010-11-23 | Sony Corporation | Sorting media objects by similarity |
US7698257B2 (en) * | 2006-05-16 | 2010-04-13 | Business Objects Software Ltd. | Apparatus and method for recursively rationalizing data source queries |
US9330170B2 (en) * | 2006-05-16 | 2016-05-03 | Sony Corporation | Relating objects in different mediums |
WO2007136825A2 (en) * | 2006-05-19 | 2007-11-29 | Lehman Brothers Inc. | Trust information management system |
US7831617B2 (en) * | 2006-07-25 | 2010-11-09 | Microsoft Corporation | Re-categorization of aggregate data as detail data and automated re-categorization based on data usage context |
US20080033919A1 (en) * | 2006-08-04 | 2008-02-07 | Yan Arrouye | Methods and systems for managing data |
US8104048B2 (en) | 2006-08-04 | 2012-01-24 | Apple Inc. | Browsing or searching user interfaces and other aspects |
US20080066067A1 (en) * | 2006-09-07 | 2008-03-13 | Cognos Incorporated | Enterprise performance management software system having action-based data capture |
US9202184B2 (en) * | 2006-09-07 | 2015-12-01 | International Business Machines Corporation | Optimizing the selection, verification, and deployment of expert resources in a time of chaos |
US20080294459A1 (en) * | 2006-10-03 | 2008-11-27 | International Business Machines Corporation | Health Care Derivatives as a Result of Real Time Patient Analytics |
US8055603B2 (en) * | 2006-10-03 | 2011-11-08 | International Business Machines Corporation | Automatic generation of new rules for processing synthetic events using computer-based learning processes |
US8145582B2 (en) * | 2006-10-03 | 2012-03-27 | International Business Machines Corporation | Synthetic events for real time patient analysis |
US8204831B2 (en) * | 2006-11-13 | 2012-06-19 | International Business Machines Corporation | Post-anonymous fuzzy comparisons without the use of pre-anonymization variants |
US7649853B1 (en) * | 2007-01-22 | 2010-01-19 | Narus, Inc. | Method for keeping and searching network history for corrective and preventive measures |
US20080201303A1 (en) * | 2007-02-20 | 2008-08-21 | International Business Machines Corporation | Method and system for a wizard based complex filter with realtime feedback |
US7970759B2 (en) | 2007-02-26 | 2011-06-28 | International Business Machines Corporation | System and method for deriving a hierarchical event based database optimized for pharmaceutical analysis |
US7853611B2 (en) | 2007-02-26 | 2010-12-14 | International Business Machines Corporation | System and method for deriving a hierarchical event based database having action triggers based on inferred probabilities |
US7792774B2 (en) | 2007-02-26 | 2010-09-07 | International Business Machines Corporation | System and method for deriving a hierarchical event based database optimized for analysis of chaotic events |
US8086593B2 (en) * | 2007-03-01 | 2011-12-27 | Microsoft Corporation | Dynamic filters for relational query processing |
US7720837B2 (en) * | 2007-03-15 | 2010-05-18 | International Business Machines Corporation | System and method for multi-dimensional aggregation over large text corpora |
US8782075B2 (en) * | 2007-05-08 | 2014-07-15 | Paraccel Llc | Query handling in databases with replicated data |
US8185839B2 (en) * | 2007-06-09 | 2012-05-22 | Apple Inc. | Browsing or searching user interfaces and other aspects |
US8201096B2 (en) | 2007-06-09 | 2012-06-12 | Apple Inc. | Browsing or searching user interfaces and other aspects |
US7765216B2 (en) * | 2007-06-15 | 2010-07-27 | Microsoft Corporation | Multidimensional analysis tool for high dimensional data |
US7747988B2 (en) * | 2007-06-15 | 2010-06-29 | Microsoft Corporation | Software feature usage analysis and reporting |
US7870114B2 (en) * | 2007-06-15 | 2011-01-11 | Microsoft Corporation | Efficient data infrastructure for high dimensional data analysis |
US7739666B2 (en) * | 2007-06-15 | 2010-06-15 | Microsoft Corporation | Analyzing software users with instrumentation data and user group modeling and analysis |
EP2026264A2 (en) | 2007-08-17 | 2009-02-18 | Searete LLC | Effectively documenting irregularities in a responsive user's environment |
US7930262B2 (en) * | 2007-10-18 | 2011-04-19 | International Business Machines Corporation | System and method for the longitudinal analysis of education outcomes using cohort life cycles, cluster analytics-based cohort analysis, and probabilistic data schemas |
US9058337B2 (en) * | 2007-10-22 | 2015-06-16 | Apple Inc. | Previewing user interfaces and other aspects |
US9292567B2 (en) | 2007-12-12 | 2016-03-22 | Oracle International Corporation | Bulk matching with update |
US8943057B2 (en) * | 2007-12-12 | 2015-01-27 | Oracle America, Inc. | Method and system for distributed bulk matching and loading |
US7779051B2 (en) * | 2008-01-02 | 2010-08-17 | International Business Machines Corporation | System and method for optimizing federated and ETL'd databases with considerations of specialized data structures within an environment having multidimensional constraints |
US9477702B1 (en) | 2009-01-22 | 2016-10-25 | Joviandata, Inc. | Apparatus and method for accessing materialized and non-materialized values in a shared nothing system |
US9177079B1 (en) * | 2009-01-22 | 2015-11-03 | Joviandata, Inc. | Apparatus and method for processing multi-dimensional queries in a shared nothing system through tree reduction |
US8838652B2 (en) * | 2008-03-18 | 2014-09-16 | Novell, Inc. | Techniques for application data scrubbing, reporting, and analysis |
US8458285B2 (en) | 2008-03-20 | 2013-06-04 | Post Dahl Co. Limited Liability Company | Redundant data forwarding storage |
US9203928B2 (en) | 2008-03-20 | 2015-12-01 | Callahan Cellular L.L.C. | Data storage and retrieval |
WO2009120617A2 (en) | 2008-03-24 | 2009-10-01 | Jda Software, Inc. | Linking discrete dimensions to enhance dimensional analysis |
US8121858B2 (en) * | 2008-03-24 | 2012-02-21 | International Business Machines Corporation | Optimizing pharmaceutical treatment plans across multiple dimensions |
US8195712B1 (en) | 2008-04-17 | 2012-06-05 | Lattice Engines, Inc. | Lattice data set-based methods and apparatus for information storage and retrieval |
US9659073B2 (en) * | 2008-06-18 | 2017-05-23 | Oracle International Corporation | Techniques to extract and flatten hierarchies |
US20100036873A1 (en) * | 2008-08-05 | 2010-02-11 | Richard Bruce Diehl | Processing Metadata Along With Alphanumeric Data |
US9727628B2 (en) * | 2008-08-11 | 2017-08-08 | Innography, Inc. | System and method of applying globally unique identifiers to relate distributed data sources |
US8463739B2 (en) * | 2008-08-28 | 2013-06-11 | Red Hat, Inc. | Systems and methods for generating multi-population statistical measures using middleware |
US8495007B2 (en) * | 2008-08-28 | 2013-07-23 | Red Hat, Inc. | Systems and methods for hierarchical aggregation of multi-dimensional data sources |
US7970728B2 (en) * | 2008-10-23 | 2011-06-28 | International Business Machines Corporation | Dynamically building and populating data marts with data stored in repositories |
US8170931B2 (en) | 2008-10-28 | 2012-05-01 | Dell Products L.P. | Configuring user-customized services for networked devices |
US8255246B2 (en) * | 2008-10-31 | 2012-08-28 | Demandtec, Inc. | Method and apparatus for creating compound due-to reports |
US8244575B2 (en) * | 2008-10-31 | 2012-08-14 | Demandtec, Inc. | Method and apparatus for creating due-to reports for activities that may not have reference value |
US20100114658A1 (en) * | 2008-10-31 | 2010-05-06 | M-Factor, Inc. | Method and apparatus for creating a consistent hierarchy of decomposition of a business metric |
US8209216B2 (en) * | 2008-10-31 | 2012-06-26 | Demandtec, Inc. | Method and apparatus for configurable model-independent decomposition of a business metric |
US9613123B2 (en) * | 2009-04-13 | 2017-04-04 | Hewlett Packard Enterprise Development Lp | Data stream processing |
US8793701B2 (en) * | 2009-05-26 | 2014-07-29 | Business Objects Software Limited | Method and system for data reporting and analysis |
US8566341B2 (en) * | 2009-11-12 | 2013-10-22 | Oracle International Corporation | Continuous aggregation on a data grid |
US8543535B2 (en) * | 2010-02-24 | 2013-09-24 | Oracle International Corporation | Generation of star schemas from snowflake schemas containing a large number of dimensions |
US8447754B2 (en) | 2010-04-19 | 2013-05-21 | Salesforce.Com, Inc. | Methods and systems for optimizing queries in a multi-tenant store |
US10162851B2 (en) * | 2010-04-19 | 2018-12-25 | Salesforce.Com, Inc. | Methods and systems for performing cross store joins in a multi-tenant store |
US9535965B2 (en) | 2010-05-28 | 2017-01-03 | Oracle International Corporation | System and method for specifying metadata extension input for extending data warehouse |
CN102314460B (en) * | 2010-07-07 | 2014-05-14 | 阿里巴巴集团控股有限公司 | Data analysis method and system and servers |
US8817053B2 (en) | 2010-09-30 | 2014-08-26 | Apple Inc. | Methods and systems for opening a file |
US10318877B2 (en) | 2010-10-19 | 2019-06-11 | International Business Machines Corporation | Cohort-based prediction of a future event |
US9292575B2 (en) * | 2010-11-19 | 2016-03-22 | International Business Machines Corporation | Dynamic data aggregation from a plurality of data sources |
US8996463B2 (en) | 2012-07-26 | 2015-03-31 | Mongodb, Inc. | Aggregation framework system architecture and method |
US10997211B2 (en) | 2010-12-23 | 2021-05-04 | Mongodb, Inc. | Systems and methods for database zone sharding and API integration |
US11615115B2 (en) | 2010-12-23 | 2023-03-28 | Mongodb, Inc. | Systems and methods for managing distributed database deployments |
US9740762B2 (en) | 2011-04-01 | 2017-08-22 | Mongodb, Inc. | System and method for optimizing data migration in a partitioned database |
US10262050B2 (en) | 2015-09-25 | 2019-04-16 | Mongodb, Inc. | Distributed database systems and methods with pluggable storage engines |
US10346430B2 (en) | 2010-12-23 | 2019-07-09 | Mongodb, Inc. | System and method for determining consensus within a distributed database |
US10614098B2 (en) | 2010-12-23 | 2020-04-07 | Mongodb, Inc. | System and method for determining consensus within a distributed database |
US11544288B2 (en) | 2010-12-23 | 2023-01-03 | Mongodb, Inc. | Systems and methods for managing distributed database deployments |
US10713280B2 (en) | 2010-12-23 | 2020-07-14 | Mongodb, Inc. | Systems and methods for managing distributed database deployments |
US8572031B2 (en) | 2010-12-23 | 2013-10-29 | Mongodb, Inc. | Method and apparatus for maintaining replica sets |
US10740353B2 (en) | 2010-12-23 | 2020-08-11 | Mongodb, Inc. | Systems and methods for managing distributed database deployments |
US9881034B2 (en) | 2015-12-15 | 2018-01-30 | Mongodb, Inc. | Systems and methods for automating management of distributed databases |
US10366100B2 (en) * | 2012-07-26 | 2019-07-30 | Mongodb, Inc. | Aggregation framework system architecture and method |
US10977277B2 (en) | 2010-12-23 | 2021-04-13 | Mongodb, Inc. | Systems and methods for database zone sharding and API integration |
US9805108B2 (en) | 2010-12-23 | 2017-10-31 | Mongodb, Inc. | Large distributed database clustering systems and methods |
WO2012095839A2 (en) | 2011-01-10 | 2012-07-19 | Optier Ltd. | Systems and methods for performing online analytical processing |
US8473507B2 (en) | 2011-01-14 | 2013-06-25 | Apple Inc. | Tokenized search suggestions |
US9355145B2 (en) | 2011-01-25 | 2016-05-31 | Hewlett Packard Enterprise Development Lp | User defined function classification in analytical data processing systems |
CN103262076A (en) * | 2011-01-25 | 2013-08-21 | 惠普发展公司,有限责任合伙企业 | Analytical data processing |
US9229984B2 (en) | 2011-01-25 | 2016-01-05 | Hewlett Packard Enterprise Development Lp | Parameter expressions for modeling user defined function execution in analytical data processing systems |
US8856151B2 (en) | 2011-01-25 | 2014-10-07 | Hewlett-Packard Development Company, L.P. | Output field mapping of user defined functions in databases |
CN102226897A (en) * | 2011-05-13 | 2011-10-26 | 南京烽火星空通信发展有限公司 | Comprehensive indexing and querying method and device |
US20130042008A1 (en) | 2011-08-12 | 2013-02-14 | Splunk Inc. | Elastic scaling of data volume |
CN103930887B (en) | 2011-11-18 | 2017-11-07 | 惠普发展公司,有限责任合伙企业 | The inquiry stored using raw column data collects generation |
US9886474B2 (en) | 2011-11-22 | 2018-02-06 | Microsoft Technology Licensing, Llc | Multidimensional grouping operators |
US9348874B2 (en) * | 2011-12-23 | 2016-05-24 | Sap Se | Dynamic recreation of multidimensional analytical data |
CN103294525A (en) | 2012-02-27 | 2013-09-11 | 国际商业机器公司 | Method and system for inquiring database with user defined function |
US10872095B2 (en) | 2012-07-26 | 2020-12-22 | Mongodb, Inc. | Aggregation framework system architecture and method |
US11544284B2 (en) | 2012-07-26 | 2023-01-03 | Mongodb, Inc. | Aggregation framework system architecture and method |
US11403317B2 (en) | 2012-07-26 | 2022-08-02 | Mongodb, Inc. | Aggregation framework system architecture and method |
US8812488B2 (en) | 2012-08-16 | 2014-08-19 | Oracle International Corporation | Constructing multidimensional histograms for complex spatial geometry objects |
US9507825B2 (en) | 2012-09-28 | 2016-11-29 | Oracle International Corporation | Techniques for partition pruning based on aggregated zone map information |
US9430550B2 (en) | 2012-09-28 | 2016-08-30 | Oracle International Corporation | Clustering a table in a relational database management system |
US8996544B2 (en) | 2012-09-28 | 2015-03-31 | Oracle International Corporation | Pruning disk blocks of a clustered table in a relational database management system |
CN103714086A (en) | 2012-09-29 | 2014-04-09 | 国际商业机器公司 | Method and device used for generating non-relational data base module |
US9633076B1 (en) * | 2012-10-15 | 2017-04-25 | Tableau Software Inc. | Blending and visualizing data from multiple data sources |
US9471628B2 (en) | 2013-03-04 | 2016-10-18 | Mastercard International Incorporated | Methods and systems for calculating and retrieving analytic data |
US10642837B2 (en) | 2013-03-15 | 2020-05-05 | Oracle International Corporation | Relocating derived cache during data rebalance to maintain application performance |
CN103235793A (en) * | 2013-04-01 | 2013-08-07 | 华为技术有限公司 | On-line data processing method, equipment and system |
US9390162B2 (en) | 2013-04-25 | 2016-07-12 | International Business Machines Corporation | Management of a database system |
US10275484B2 (en) * | 2013-07-22 | 2019-04-30 | International Business Machines Corporation | Managing sparsity in a multidimensional data structure |
US9317529B2 (en) | 2013-08-14 | 2016-04-19 | Oracle International Corporation | Memory-efficient spatial histogram construction |
CN104376006A (en) * | 2013-08-14 | 2015-02-25 | 沈阳中科博微自动化技术有限公司 | Technical method for integrating data of multiple regional devices of integrated circuit production line |
US9740718B2 (en) | 2013-09-20 | 2017-08-22 | Oracle International Corporation | Aggregating dimensional data using dense containers |
US9836519B2 (en) | 2013-09-20 | 2017-12-05 | Oracle International Corporation | Densely grouping dimensional data |
US9990398B2 (en) | 2013-09-20 | 2018-06-05 | Oracle International Corporation | Inferring dimensional metadata from content of a query |
US10210197B2 (en) * | 2013-10-18 | 2019-02-19 | New York Air Brake Corporation | Dynamically scalable distributed heterogenous platform relational database |
US20150112953A1 (en) * | 2013-10-22 | 2015-04-23 | Omnition Analytics, LLC | Expandable method and system for storing and using fact data structure for use with dimensional data structure |
US9396246B2 (en) | 2013-11-08 | 2016-07-19 | International Business Machines Corporation | Reporting and summarizing metrics in sparse relationships on an OLTP database |
US9547834B2 (en) | 2014-01-08 | 2017-01-17 | Bank Of America Corporation | Transaction performance monitoring |
US9992090B2 (en) | 2014-01-08 | 2018-06-05 | Bank Of America Corporation | Data metrics analytics |
US9442996B2 (en) | 2014-01-15 | 2016-09-13 | International Business Machines Corporation | Enabling collaborative development of a database application across multiple database management systems |
US9348870B2 (en) | 2014-02-06 | 2016-05-24 | International Business Machines Corporation | Searching content managed by a search engine using relational database type queries |
US9530226B2 (en) * | 2014-02-18 | 2016-12-27 | Par Technology Corporation | Systems and methods for optimizing N dimensional volume data for transmission |
US9373263B2 (en) | 2014-02-19 | 2016-06-21 | Pearson Education, Inc. | Dynamic and individualized scheduling engine for app-based learning |
US9368042B2 (en) * | 2014-02-19 | 2016-06-14 | Pearson Education, Inc. | Educational-app engine for representing conceptual understanding using student populations' electronic response latencies |
WO2015148739A1 (en) * | 2014-03-26 | 2015-10-01 | Systems Imagination, Inc. | System and methods for data integration in n-dimensional space |
US9619769B2 (en) * | 2014-04-01 | 2017-04-11 | Sap Se | Operational leading indicator (OLI) management using in-memory database |
US10635645B1 (en) * | 2014-05-04 | 2020-04-28 | Veritas Technologies Llc | Systems and methods for maintaining aggregate tables in databases |
GB2541616A (en) * | 2014-06-30 | 2017-02-22 | Cronus Consulting Group Pty Ltd | Data processing system and method for financial or non-financial data |
GB2531537A (en) | 2014-10-21 | 2016-04-27 | Ibm | Database Management system and method of operation |
EP3040284B1 (en) * | 2014-12-31 | 2017-12-06 | Airbus Group SAS | Device for recovering thermal energy dissipated by a satellite placed in a vacuum |
US20180075117A1 (en) * | 2015-03-17 | 2018-03-15 | Matthew E. Wong | System and method of providing a platform for enabling drill-down analysis of tabular data |
JP5847344B1 (en) * | 2015-03-24 | 2016-01-20 | 株式会社ギックス | Data processing system, data processing method, program, and computer storage medium |
US10262024B1 (en) | 2015-05-19 | 2019-04-16 | Amazon Technologies, Inc. | Providing consistent access to data objects transcending storage limitations in a non-relational data store |
US10713275B2 (en) | 2015-07-02 | 2020-07-14 | Mongodb, Inc. | System and method for augmenting consensus election in a distributed database |
US10673623B2 (en) | 2015-09-25 | 2020-06-02 | Mongodb, Inc. | Systems and methods for hierarchical key management in encrypted distributed databases |
US10394822B2 (en) | 2015-09-25 | 2019-08-27 | Mongodb, Inc. | Systems and methods for data conversion and comparison |
US10846411B2 (en) | 2015-09-25 | 2020-11-24 | Mongodb, Inc. | Distributed database systems and methods with encrypted storage engines |
US10423626B2 (en) | 2015-09-25 | 2019-09-24 | Mongodb, Inc. | Systems and methods for data conversion and comparison |
US10678792B2 (en) | 2015-10-23 | 2020-06-09 | Oracle International Corporation | Parallel execution of queries with a recursive clause |
US10642831B2 (en) | 2015-10-23 | 2020-05-05 | Oracle International Corporation | Static data caching for queries with a clause that requires multiple iterations to execute |
US10783142B2 (en) | 2015-10-23 | 2020-09-22 | Oracle International Corporation | Efficient data retrieval in staged use of in-memory cursor duration temporary tables |
KR101706252B1 (en) * | 2016-02-29 | 2017-02-13 | 주식회사 티맥스데이터 | Method, server and computer program stored in computer readable medium for synchronizing query result |
US10164990B2 (en) | 2016-03-11 | 2018-12-25 | Bank Of America Corporation | Security test tool |
US10671496B2 (en) | 2016-05-31 | 2020-06-02 | Mongodb, Inc. | Method and apparatus for reading and writing committed data |
US10621050B2 (en) | 2016-06-27 | 2020-04-14 | Mongodb, Inc. | Method and apparatus for restoring data from snapshots |
US10558659B2 (en) | 2016-09-16 | 2020-02-11 | Oracle International Corporation | Techniques for dictionary based join and aggregation |
US11086895B2 (en) | 2017-05-09 | 2021-08-10 | Oracle International Corporation | System and method for providing a hybrid set-based extract, load, and transformation of data |
US10866868B2 (en) | 2017-06-20 | 2020-12-15 | Mongodb, Inc. | Systems and methods for optimization of database operations |
KR102277728B1 (en) * | 2017-07-31 | 2021-07-14 | 삼성전자주식회사 | A system and method for data storage, and a method of manufacturing a ssd using this |
US10853349B2 (en) | 2017-08-09 | 2020-12-01 | Vmware, Inc. | Event based analytics database synchronization |
US10489225B2 (en) | 2017-08-10 | 2019-11-26 | Bank Of America Corporation | Automatic resource dependency tracking and structure for maintenance of resource fault propagation |
US10909134B2 (en) | 2017-09-01 | 2021-02-02 | Oracle International Corporation | System and method for client-side calculation in a multidimensional database environment |
US11687567B2 (en) * | 2017-09-21 | 2023-06-27 | Vmware, Inc. | Trigger based analytics database synchronization |
US11086876B2 (en) | 2017-09-29 | 2021-08-10 | Oracle International Corporation | Storing derived summaries on persistent memory of a storage device |
US11620315B2 (en) | 2017-10-09 | 2023-04-04 | Tableau Software, Inc. | Using an object model of heterogeneous data to facilitate building data visualizations |
CN108551478B (en) | 2018-03-29 | 2020-12-18 | 中国银联股份有限公司 | Transaction processing method, server and transaction processing system |
US10838964B2 (en) * | 2018-03-30 | 2020-11-17 | International Business Machines Corporation | Supporting a join operation against multiple NoSQL databases |
US10740333B1 (en) * | 2018-06-27 | 2020-08-11 | Cloudera, Inc. | Apparatus and method for accelerated query processing using eager aggregation and analytical view matching |
WO2020019000A1 (en) * | 2018-07-20 | 2020-01-23 | Benanav Dan | Automatic object inference in a database system |
US11537276B2 (en) | 2018-10-22 | 2022-12-27 | Tableau Software, Inc. | Generating data visualizations according to an object model of selected data sources |
US11966406B2 (en) | 2018-10-22 | 2024-04-23 | Tableau Software, Inc. | Utilizing appropriate measure aggregation for generating data visualizations of multi-fact datasets |
US10996835B1 (en) | 2018-12-14 | 2021-05-04 | Tableau Software, Inc. | Data preparation user interface with coordinated pivots |
CN110413620A (en) * | 2019-07-31 | 2019-11-05 | 四川长虹电器股份有限公司 | Visualized data structure configuration method and system |
US11138204B2 (en) | 2019-08-02 | 2021-10-05 | Salesforce.Com, Inc. | Metric determination for an interaction data stream using multiple databases |
US11256709B2 (en) | 2019-08-15 | 2022-02-22 | Clinicomp International, Inc. | Method and system for adapting programs for interoperability and adapters therefor |
US10657018B1 (en) * | 2019-08-26 | 2020-05-19 | Coupang Corp. | Systems and methods for dynamic aggregation of data and minimization of data loss |
US11222018B2 (en) | 2019-09-09 | 2022-01-11 | Oracle International Corporation | Cache conscious techniques for generation of quasi-dense grouping codes of compressed columnar data in relational database systems |
US11126401B2 (en) | 2019-09-18 | 2021-09-21 | Bank Of America Corporation | Pluggable sorting for distributed databases |
US11016978B2 (en) * | 2019-09-18 | 2021-05-25 | Bank Of America Corporation | Joiner for distributed databases |
US11030256B2 (en) | 2019-11-05 | 2021-06-08 | Tableau Software, Inc. | Methods and user interfaces for visually analyzing data visualizations with multi-row calculations |
US10997217B1 (en) | 2019-11-10 | 2021-05-04 | Tableau Software, Inc. | Systems and methods for visualizing object models of database tables |
US11366858B2 (en) | 2019-11-10 | 2022-06-21 | Tableau Software, Inc. | Data preparation using semantic roles |
US11281668B1 (en) | 2020-06-18 | 2022-03-22 | Tableau Software, LLC | Optimizing complex database queries using query fusion |
US11899665B2 (en) * | 2020-11-20 | 2024-02-13 | AtScale, Inc. | Data aggregation and pre-positioning for multi-store queries |
CN113641669B (en) * | 2021-06-30 | 2023-08-01 | 北京邮电大学 | Multi-dimensional data query method and device based on hybrid engine |
WO2024063757A1 (en) * | 2022-09-20 | 2024-03-28 | Rakuten Mobile, Inc. | Inventory management system for managing functions, resources and services of a telecommunications network |
Family Cites Families (277)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US77107A (en) * | 1868-04-21 | Improvement in wrenches | ||
US76983A (en) * | 1868-04-21 | Byron Boardman | ||
US236655A (en) * | 1881-01-11 | Post-hole | ||
US114243A (en) * | 1871-04-25 | Improvement in mirror-reflectors | ||
US4590465A (en) | 1982-02-18 | 1986-05-20 | Henry Fuchs | Graphics display system using logic-enhanced pixel memory cells |
US4598400A (en) | 1983-05-31 | 1986-07-01 | Thinking Machines Corporation | Method and apparatus for routing message packets |
US4641351A (en) * | 1984-07-25 | 1987-02-03 | Preston Jr Kendall | Logical transform image processor |
US6545589B1 (en) | 1984-09-14 | 2003-04-08 | Aspect Communications Corporation | Method and apparatus for managing telecommunications |
US4685144A (en) | 1984-10-29 | 1987-08-04 | Environmental Research Institute Of Michigan | Image processing system with transformation detection |
JPS61220027A (en) | 1985-03-27 | 1986-09-30 | Hitachi Ltd | Information memory system |
US5696916A (en) | 1985-03-27 | 1997-12-09 | Hitachi, Ltd. | Information storage and retrieval system and display method therefor |
US5404506A (en) | 1985-03-27 | 1995-04-04 | Hitachi, Ltd. | Knowledge based information retrieval system |
US5553226A (en) | 1985-03-27 | 1996-09-03 | Hitachi, Ltd. | System for displaying concept networks |
JPH0814795B2 (en) | 1986-01-14 | 1996-02-14 | 株式会社日立製作所 | Multiprocessor virtual computer system |
US6182062B1 (en) | 1986-03-26 | 2001-01-30 | Hitachi, Ltd. | Knowledge based information retrieval system |
US4814980A (en) * | 1986-04-01 | 1989-03-21 | California Institute Of Technology | Concurrent hypercube system with improved message passing |
CA1258923A (en) * | 1986-04-14 | 1989-08-29 | Robert A. Drebin | Methods and apparatus for imaging volume data |
US5189608A (en) * | 1987-06-01 | 1993-02-23 | Imrs Operations, Inc. | Method and apparatus for storing and generating financial information employing user specified input and output formats |
US4989141A (en) * | 1987-06-01 | 1991-01-29 | Corporate Class Software | Computer system for financial analyses and reporting |
JPH01123320A (en) | 1987-10-30 | 1989-05-16 | Internatl Business Mach Corp <Ibm> | Method and apparatus for forming searching command |
US5055999A (en) * | 1987-12-22 | 1991-10-08 | Kendall Square Research Corporation | Multiprocessor digital data processing system |
US5222237A (en) | 1988-02-02 | 1993-06-22 | Thinking Machines Corporation | Apparatus for aligning the operation of a plurality of processors |
US5089985A (en) | 1988-04-07 | 1992-02-18 | International Business Machines Corporation | System and method for performing a sort operation in a relational database manager to pass results directly to a user without writing to disk |
US5202985A (en) | 1988-04-14 | 1993-04-13 | Racal-Datacom, Inc. | Apparatus and method for displaying data communication network configuration after searching the network |
US4987554A (en) * | 1988-08-24 | 1991-01-22 | The Research Foundation Of State University Of New York | Method of converting continuous three-dimensional geometrical representations of polygonal objects into discrete three-dimensional voxel-based representations thereof within a three-dimensional voxel-based system |
US4985856A (en) | 1988-11-10 | 1991-01-15 | The Research Foundation Of State University Of New York | Method and apparatus for storing, accessing, and processing voxel-based data |
US4985834A (en) * | 1988-11-22 | 1991-01-15 | General Electric Company | System and method employing pipelined parallel circuit architecture for displaying surface structures of the interior region of a solid body |
SE466029B (en) * | 1989-03-06 | 1991-12-02 | Ibm Svenska Ab | DEVICE AND PROCEDURE FOR ANALYSIS OF NATURAL LANGUAGES IN A COMPUTER-BASED INFORMATION PROCESSING SYSTEM |
US5101475A (en) * | 1989-04-17 | 1992-03-31 | The Research Foundation Of State University Of New York | Method and apparatus for generating arbitrary projections of three-dimensional voxel-based data |
US5197005A (en) * | 1989-05-01 | 1993-03-23 | Intelligent Business Systems | Database retrieval system having a natural language interface |
US5280474A (en) * | 1990-01-05 | 1994-01-18 | Maspar Computer Corporation | Scalable processor to processor and processor-to-I/O interconnection network and method for parallel processing arrays |
US5257365A (en) | 1990-03-16 | 1993-10-26 | Powers Frederick A | Database system with multi-dimensional summary search tree nodes for reducing the necessity to access records |
US5278966A (en) * | 1990-06-29 | 1994-01-11 | The United States Of America As Represented By The Secretary Of The Navy | Toroidal computer memory for serial and parallel processors |
US5293615A (en) | 1990-11-16 | 1994-03-08 | Amada Carlos A | Point and shoot interface for linking database records to spreadsheets whereby data of a record is automatically reformatted and loaded upon issuance of a recalculation command |
US5379419A (en) | 1990-12-07 | 1995-01-03 | Digital Equipment Corporation | Methods and apparatus for accesssing non-relational data files using relational queries |
US5299321A (en) * | 1990-12-18 | 1994-03-29 | Oki Electric Industry Co., Ltd. | Parallel processing device to operate with parallel execute instructions |
US5307484A (en) | 1991-03-06 | 1994-04-26 | Chrysler Corporation | Relational data base repository system for managing functional and physical data structures of nodes and links of multiple computer networks |
US5222216A (en) | 1991-07-12 | 1993-06-22 | Thinking Machines Corporation | High performance communications interface for multiplexing a plurality of computers to a high performance point to point communications bus |
US5297280A (en) * | 1991-08-07 | 1994-03-22 | Occam Research Corporation | Automatically retrieving queried data by extracting query dimensions and modifying the dimensions if an extract match does not occur |
US5359724A (en) | 1992-03-30 | 1994-10-25 | Arbor Software Corporation | Method and apparatus for storing and retrieving multi-dimensional data in computer memory |
US5361385A (en) * | 1992-08-26 | 1994-11-01 | Reuven Bakalash | Parallel computing system for volumetric modeling, data processing and visualization |
US5867501A (en) | 1992-12-17 | 1999-02-02 | Tandem Computers Incorporated | Encoding for communicating data and commands |
US5805885A (en) | 1992-12-24 | 1998-09-08 | Microsoft Corporation | Method and system for aggregating objects |
US5918225A (en) * | 1993-04-16 | 1999-06-29 | Sybase, Inc. | SQL-based database system with improved indexing methodology |
US5852821A (en) * | 1993-04-16 | 1998-12-22 | Sybase, Inc. | High-speed data base query method and apparatus |
US5794229A (en) | 1993-04-16 | 1998-08-11 | Sybase, Inc. | Database system with methodology for storing a database table by vertically partitioning all columns of the table |
US5794228A (en) | 1993-04-16 | 1998-08-11 | Sybase, Inc. | Database system with buffer manager providing per page native data compression and decompression |
US5519859A (en) | 1993-11-15 | 1996-05-21 | Grace; John A. | Method and apparatus for automatic table selection and generation of structured query language instructions |
US5410693A (en) | 1994-01-26 | 1995-04-25 | Wall Data Incorporated | Method and apparatus for accessing a database |
US5742806A (en) | 1994-01-31 | 1998-04-21 | Sun Microsystems, Inc. | Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system |
US5706503A (en) | 1994-05-18 | 1998-01-06 | Etak Inc | Method of clustering multi-dimensional related data in a computer database by combining the two verticles of a graph connected by an edge having the highest score |
US5537589A (en) * | 1994-06-30 | 1996-07-16 | Microsoft Corporation | Method and system for efficiently performing database table aggregation using an aggregation index |
US5915257A (en) * | 1994-10-11 | 1999-06-22 | Brio Technology, Inc. | Cross tab analysis and reporting method |
CA2176165A1 (en) | 1995-05-19 | 1996-11-20 | Hosagrahar Visvesvaraya Jagadish | Method for querying incrementally maintained transactional databases |
US5701451A (en) | 1995-06-07 | 1997-12-23 | International Business Machines Corporation | Method for fulfilling requests of a web browser |
US6047323A (en) * | 1995-10-19 | 2000-04-04 | Hewlett-Packard Company | Creation and migration of distributed streams in clusters of networked computers |
FR2740884B1 (en) | 1995-11-03 | 1997-12-19 | Bull Sa | ADMINISTRATOR INTERFACE FOR A DATABASE IN A DISTRIBUTED COMPUTING ENVIRONMENT |
US5761652A (en) | 1996-03-20 | 1998-06-02 | International Business Machines Corporation | Constructing balanced multidimensional range-based bitmap indices |
US5832475A (en) | 1996-03-29 | 1998-11-03 | International Business Machines Corporation | Database system and method employing data cube operator for group-by operations |
JP3952518B2 (en) * | 1996-03-29 | 2007-08-01 | 株式会社日立製作所 | Multidimensional data processing method |
US5901287A (en) * | 1996-04-01 | 1999-05-04 | The Sabre Group Inc. | Information aggregation and synthesization system |
US6041103A (en) * | 1996-04-16 | 2000-03-21 | Lucent Technologies, Inc. | Interactive call identification |
US5999192A (en) | 1996-04-30 | 1999-12-07 | Lucent Technologies Inc. | Interactive data exploration apparatus and methods |
US5857184A (en) | 1996-05-03 | 1999-01-05 | Walden Media, Inc. | Language and method for creating, organizing, and retrieving data from a database |
US5706495A (en) | 1996-05-07 | 1998-01-06 | International Business Machines Corporation | Encoded-vector indices for decision support and warehousing |
US5765028A (en) | 1996-05-07 | 1998-06-09 | Ncr Corporation | Method and apparatus for providing neural intelligence to a mail query agent in an online analytical processing system |
US5721910A (en) | 1996-06-04 | 1998-02-24 | Exxon Research And Engineering Company | Relational database system containing a multidimensional hierachical model of interrelated subject categories with recognition capabilities |
US5767854A (en) | 1996-09-27 | 1998-06-16 | Anwar; Mohammed S. | Multidimensional data display and manipulation system and methods for using same |
US5848424A (en) * | 1996-11-18 | 1998-12-08 | Toptier Software, Inc. | Data navigator interface with navigation as a function of draggable elements and drop targets |
US5799300A (en) | 1996-12-12 | 1998-08-25 | International Business Machines Corporation | Method and system for performing range-sum queries on a data cube |
US5822751A (en) | 1996-12-16 | 1998-10-13 | Microsoft Corporation | Efficient multidimensional data aggregation operator implementation |
US5850547A (en) | 1997-01-08 | 1998-12-15 | Oracle Corporation | Method and apparatus for parallel processing aggregates using intermediate aggregate values |
US6034697A (en) * | 1997-01-13 | 2000-03-07 | Silicon Graphics, Inc. | Interpolation between relational tables for purposes of animating a data visualization |
US5852819A (en) | 1997-01-30 | 1998-12-22 | Beller; Stephen E. | Flexible, modular electronic element patterning method and apparatus for compiling, processing, transmitting, and reporting data and information |
US5884299A (en) * | 1997-02-06 | 1999-03-16 | Ncr Corporation | Optimization of SQL queries involving aggregate expressions using a plurality of local and global aggregation operations |
US5926820A (en) | 1997-02-27 | 1999-07-20 | International Business Machines Corporation | Method and system for performing range max/min queries on a data cube |
JPH10333953A (en) | 1997-04-01 | 1998-12-18 | Kokusai Zunou Sangyo Kk | Integrated data base system and computer-readable recording medium recording program for managing its data base structure |
JP3155991B2 (en) * | 1997-04-09 | 2001-04-16 | 日本アイ・ビー・エム株式会社 | Aggregate operation execution method and computer system |
US5978788A (en) | 1997-04-14 | 1999-11-02 | International Business Machines Corporation | System and method for generating multi-representations of a data cube |
US6182060B1 (en) | 1997-04-15 | 2001-01-30 | Robert Hedgcock | Method and apparatus for storing, retrieving, and processing multi-dimensional customer-oriented data sets |
US5794246A (en) * | 1997-04-30 | 1998-08-11 | Informatica Corporation | Method for incremental aggregation of dynamically increasing database data sets |
US5946692A (en) | 1997-05-08 | 1999-08-31 | At & T Corp | Compressed representation of a data base that permits AD HOC querying |
US5890151A (en) | 1997-05-09 | 1999-03-30 | International Business Machines Corporation | Method and system for performing partial-sum queries on a data cube |
US6115705A (en) * | 1997-05-19 | 2000-09-05 | Microsoft Corporation | Relational database system and method for query processing using early aggregation |
US6324623B1 (en) | 1997-05-30 | 2001-11-27 | Oracle Corporation | Computing system for implementing a shared cache |
US6078994A (en) | 1997-05-30 | 2000-06-20 | Oracle Corporation | System for maintaining a shared cache in a multi-threaded computer environment |
US5946711A (en) | 1997-05-30 | 1999-08-31 | Oracle Corporation | System for locking data in a shared cache |
US6209036B1 (en) | 1997-06-06 | 2001-03-27 | International Business Machines Corporation | Management of and access to information and other material via the world wide web in an LDAP environment |
US5890154A (en) | 1997-06-06 | 1999-03-30 | International Business Machines Corp. | Merging database log files through log transformations |
US6397195B1 (en) * | 1997-06-27 | 2002-05-28 | Hyperion Solutions Corporation | System for managing accounting information in a multi-dimensional database |
US6205447B1 (en) * | 1997-06-30 | 2001-03-20 | International Business Machines Corporation | Relational database management of multi-dimensional data |
US5940818A (en) * | 1997-06-30 | 1999-08-17 | International Business Machines Corporation | Attribute-based access for multi-dimensional databases |
US5926818A (en) | 1997-06-30 | 1999-07-20 | International Business Machines Corporation | Relational database implementation of a multi-dimensional database |
US5978796A (en) * | 1997-06-30 | 1999-11-02 | International Business Machines Corporation | Accessing multi-dimensional data by mapping dense data blocks to rows in a relational database |
US5963936A (en) * | 1997-06-30 | 1999-10-05 | International Business Machines Corporation | Query processing system that computes GROUPING SETS, ROLLUP, and CUBE with a reduced number of GROUP BYs in a query graph model |
US5905985A (en) * | 1997-06-30 | 1999-05-18 | International Business Machines Corporation | Relational database modifications based on multi-dimensional database modifications |
US5943668A (en) * | 1997-06-30 | 1999-08-24 | International Business Machines Corporation | Relational emulation of a multi-dimensional database |
US5999924A (en) | 1997-07-25 | 1999-12-07 | Amazon.Com, Inc. | Method and apparatus for producing sequenced queries |
US6006216A (en) | 1997-07-29 | 1999-12-21 | Lucent Technologies Inc. | Data architecture for fetch-intensive database applications |
US6073140A (en) | 1997-07-29 | 2000-06-06 | Acxiom Corporation | Method and system for the creation, enhancement and update of remote data using persistent keys |
KR19990015003A (en) * | 1997-08-01 | 1999-03-05 | 윤종용 | Color registration adjustment method in the image forming apparatus |
US5987467A (en) | 1997-08-15 | 1999-11-16 | At&T Corp. | Method of calculating tuples for data cubes |
US6094651A (en) | 1997-08-22 | 2000-07-25 | International Business Machines Corporation | Discovery-driven exploration of OLAP data cubes |
US6003029A (en) | 1997-08-22 | 1999-12-14 | International Business Machines Corporation | Automatic subspace clustering of high dimensional data for data mining applications |
US5995945A (en) | 1997-08-25 | 1999-11-30 | I2 Technologies, Inc. | System and process for inter-domain planning analysis and optimization using model agents as partial replicas of remote domains |
US5940822A (en) * | 1997-08-29 | 1999-08-17 | International Business Machines Corporation | Encoding method of members related by multiple concept or group hierarchies and identification of members in a corpus or a database that are descendants of one or more selected concepts or groups from the encoding |
US6141655A (en) | 1997-09-23 | 2000-10-31 | At&T Corp | Method and apparatus for optimizing and structuring data by designing a cube forest data structure for hierarchically split cube forest template |
US5937410A (en) | 1997-10-16 | 1999-08-10 | Johnson Controls Technology Company | Method of transforming graphical object diagrams to product data manager schema |
US5943677A (en) | 1997-10-31 | 1999-08-24 | Oracle Corporation | Sparsity management system for multi-dimensional databases |
US6691118B1 (en) * | 1997-10-31 | 2004-02-10 | Oracle International Corporation | Context management system for modular software architecture |
US6023695A (en) * | 1997-10-31 | 2000-02-08 | Oracle Corporation | Summary table management in a computer system |
US6122628A (en) | 1997-10-31 | 2000-09-19 | International Business Machines Corporation | Multidimensional data clustering and dimension reduction for indexing and searching |
US6134541A (en) * | 1997-10-31 | 2000-10-17 | International Business Machines Corporation | Searching multidimensional indexes using associated clustering and dimension reduction information |
US6023696A (en) * | 1997-10-31 | 2000-02-08 | Oracle Corporation | Summary table query routing |
US6275818B1 (en) | 1997-11-06 | 2001-08-14 | International Business Machines Corporation | Cost based optimization of decision support queries using transient views |
US6014670A (en) | 1997-11-07 | 2000-01-11 | Informatica Corporation | Apparatus and method for performing data transformations in data warehousing |
US6339775B1 (en) * | 1997-11-07 | 2002-01-15 | Informatica Corporation | Apparatus and method for performing data transformations in data warehousing |
US5974416A (en) | 1997-11-10 | 1999-10-26 | Microsoft Corporation | Method of creating a tabular data stream for sending rows of data between client and server |
US6151601A (en) | 1997-11-12 | 2000-11-21 | Ncr Corporation | Computer architecture and method for collecting, analyzing and/or transforming internet and/or electronic commerce data for storage into a data storage area |
US6151584A (en) | 1997-11-20 | 2000-11-21 | Ncr Corporation | Computer architecture and method for validating and collecting and metadata and data about the internet and electronic commerce environments (data discoverer) |
US6934687B1 (en) | 1997-11-20 | 2005-08-23 | Ncr Corporation | Computer architecture and method for supporting and analyzing electronic commerce over the world wide web for commerce service providers and/or internet service providers |
US5918232A (en) * | 1997-11-26 | 1999-06-29 | Whitelight Systems, Inc. | Multidimensional domain modeling method and system |
US6628312B1 (en) | 1997-12-02 | 2003-09-30 | Inxight Software, Inc. | Interactive interface for visualizing and manipulating multi-dimensional data |
US6418450B2 (en) * | 1998-01-26 | 2002-07-09 | International Business Machines Corporation | Data warehouse programs architecture |
US6078924A (en) | 1998-01-30 | 2000-06-20 | Aeneid Corporation | Method and apparatus for performing data collection, interpretation and analysis, in an information platform |
US6003036A (en) | 1998-02-12 | 1999-12-14 | Martin; Michael W. | Interval-partitioning method for multidimensional data |
US6363393B1 (en) * | 1998-02-23 | 2002-03-26 | Ron Ribitzky | Component based object-relational database infrastructure and user interface |
US6601034B1 (en) | 1998-03-05 | 2003-07-29 | American Management Systems, Inc. | Decision management system which is cross-function, cross-industry and cross-platform |
US6430545B1 (en) | 1998-03-05 | 2002-08-06 | American Management Systems, Inc. | Use of online analytical processing (OLAP) in a rules based decision management system |
US6321206B1 (en) | 1998-03-05 | 2001-11-20 | American Management Systems, Inc. | Decision management system for creating strategies to control movement of clients across categories |
US6405173B1 (en) * | 1998-03-05 | 2002-06-11 | American Management Systems, Inc. | Decision management system providing qualitative account/customer assessment via point in time simulation |
US6609120B1 (en) | 1998-03-05 | 2003-08-19 | American Management Systems, Inc. | Decision management system which automatically searches for strategy components in a strategy |
US6546545B1 (en) * | 1998-03-05 | 2003-04-08 | American Management Systems, Inc. | Versioning in a rules based decision management system |
US6115714A (en) | 1998-03-20 | 2000-09-05 | Kenan Systems Corp. | Triggering mechanism for multi-dimensional databases |
US6385301B1 (en) * | 1998-03-26 | 2002-05-07 | Bell Atlantic Services Network, Inc. | Data preparation for traffic track usage measurement |
US7260192B2 (en) * | 1998-03-26 | 2007-08-21 | Verizon Services Corp. | Internet user finder |
US6480842B1 (en) * | 1998-03-26 | 2002-11-12 | Sap Portals, Inc. | Dimension to domain server |
US6775674B1 (en) * | 1998-03-26 | 2004-08-10 | Sap Aktiengesellschaft | Auto completion of relationships between objects in a data model |
US6411681B1 (en) * | 1998-03-26 | 2002-06-25 | Bell Atlantic Network Services, Inc. | Traffic track measurements for analysis of network troubles |
US6441834B1 (en) * | 1998-03-26 | 2002-08-27 | Sap Portals, Inc. | Hyper-relational correlation server |
US6199063B1 (en) * | 1998-03-27 | 2001-03-06 | Red Brick Systems, Inc. | System and method for rewriting relational database queries |
US6594653B2 (en) * | 1998-03-27 | 2003-07-15 | International Business Machines Corporation | Server integrated system and methods for processing precomputed views |
US6078918A (en) | 1998-04-02 | 2000-06-20 | Trivada Corporation | Online predictive memory |
US6125624A (en) | 1998-04-17 | 2000-10-03 | Pratt & Whitney Canada Corp. | Anti-coking fuel injector purging device |
US6189004B1 (en) * | 1998-05-06 | 2001-02-13 | E. Piphany, Inc. | Method and apparatus for creating a datamart and for creating a query structure for the datamart |
US6212524B1 (en) * | 1998-05-06 | 2001-04-03 | E.Piphany, Inc. | Method and apparatus for creating and populating a datamart |
US6161103A (en) | 1998-05-06 | 2000-12-12 | Epiphany, Inc. | Method and apparatus for creating aggregates for use in a datamart |
US6212617B1 (en) * | 1998-05-13 | 2001-04-03 | Microsoft Corporation | Parallel processing method and system using a lazy parallel data type to reduce inter-processor communication |
US6108647A (en) | 1998-05-21 | 2000-08-22 | Lucent Technologies, Inc. | Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases |
US6324533B1 (en) * | 1998-05-29 | 2001-11-27 | International Business Machines Corporation | Integrated database and data-mining system |
US6289352B1 (en) | 1998-05-29 | 2001-09-11 | Crystal Decisions, Inc. | Apparatus and method for compound on-line analytical processing in databases |
US6157955A (en) * | 1998-06-15 | 2000-12-05 | Intel Corporation | Packet processing system including a policy engine having a classification unit |
JP2000011005A (en) * | 1998-06-17 | 2000-01-14 | Hitachi Ltd | Data analyzing method and its device and computer-readable recording medium recorded with data analytical program |
US6587857B1 (en) | 1998-06-30 | 2003-07-01 | Citicorp Development Center, Inc. | System and method for warehousing and retrieving data |
US6282546B1 (en) | 1998-06-30 | 2001-08-28 | Cisco Technology, Inc. | System and method for real-time insertion of data into a multi-dimensional database for network intrusion detection and vulnerability assessment |
US6009432A (en) | 1998-07-08 | 1999-12-28 | Required Technologies, Inc. | Value-instance-connectivity computer-implemented database |
JP3213585B2 (en) * | 1998-07-09 | 2001-10-02 | 株式会社インフォメックス | Data search method and apparatus, data search system, recording medium |
JP2000048087A (en) * | 1998-07-15 | 2000-02-18 | Internatl Business Mach Corp <Ibm> | View synthesizing system |
US6226647B1 (en) | 1998-07-24 | 2001-05-01 | Oracle Corporation | Method, article of manufacture, and apparatus for constructing a multi-dimensional view containing two-pass value measure results |
US6446061B1 (en) | 1998-07-31 | 2002-09-03 | International Business Machines Corporation | Taxonomy generation for document collections |
US6567814B1 (en) * | 1998-08-26 | 2003-05-20 | Thinkanalytics Ltd | Method and apparatus for knowledge discovery in databases |
US6535868B1 (en) * | 1998-08-27 | 2003-03-18 | Debra A. Galeazzi | Method and apparatus for managing metadata in a database management system |
US6826593B1 (en) | 1998-09-01 | 2004-11-30 | Lucent Technologies Inc. | Computer implemented method and apparatus for fulfilling a request for information content with a user-selectable version of a file containing that information content |
WO2000019340A1 (en) * | 1998-09-30 | 2000-04-06 | I2 Technologies, Inc. | Multi-dimensional data management system |
US6480850B1 (en) | 1998-10-02 | 2002-11-12 | Ncr Corporation | System and method for managing data privacy in a database management system including a dependently connected privacy data mart |
US6301579B1 (en) | 1998-10-20 | 2001-10-09 | Silicon Graphics, Inc. | Method, system, and computer program product for visualizing a data structure |
US6317750B1 (en) * | 1998-10-26 | 2001-11-13 | Hyperion Solutions Corporation | Method and apparatus for accessing multidimensional data |
US6249769B1 (en) | 1998-11-02 | 2001-06-19 | International Business Machines Corporation | Method, system and program product for evaluating the business requirements of an enterprise for generating business solution deliverables |
US6212515B1 (en) * | 1998-11-03 | 2001-04-03 | Platinum Technology, Inc. | Method and apparatus for populating sparse matrix entries from corresponding data |
US6256676B1 (en) | 1998-11-18 | 2001-07-03 | Saga Software, Inc. | Agent-adapter architecture for use in enterprise application integration systems |
US6738975B1 (en) * | 1998-11-18 | 2004-05-18 | Software Ag, Inc. | Extensible distributed enterprise application integration system |
US6532459B1 (en) * | 1998-12-15 | 2003-03-11 | Berson Research Corp. | System for finding, identifying, tracking, and correcting personal information in diverse databases |
JP4172559B2 (en) * | 1998-12-22 | 2008-10-29 | カシオ計算機株式会社 | Data analysis result notification device and recording medium |
US5991754A (en) | 1998-12-28 | 1999-11-23 | Oracle Corporation | Rewriting a query in terms of a summary based on aggregate computability and canonical format, and when a dimension table is on the child side of an outer join |
US6424979B1 (en) * | 1998-12-30 | 2002-07-23 | American Management Systems, Inc. | System for presenting and managing enterprise architectures |
US6363353B1 (en) * | 1999-01-15 | 2002-03-26 | Metaedge Corporation | System for providing a reverse star schema data model |
US6411961B1 (en) * | 1999-01-15 | 2002-06-25 | Metaedge Corporation | Apparatus for providing a reverse star schema data model |
US6377934B1 (en) * | 1999-01-15 | 2002-04-23 | Metaedge Corporation | Method for providing a reverse star schema data model |
US6487547B1 (en) | 1999-01-29 | 2002-11-26 | Oracle Corporation | Database appliance comprising hardware and software bundle configured for specific database applications |
US6330564B1 (en) * | 1999-02-10 | 2001-12-11 | International Business Machines Corporation | System and method for automated problem isolation in systems with measurements structured as a multidimensional database |
US6513019B2 (en) * | 1999-02-16 | 2003-01-28 | Financial Technologies International, Inc. | Financial consolidation and communication platform |
US6542886B1 (en) * | 1999-03-15 | 2003-04-01 | Microsoft Corporation | Sampling over joins for database systems |
US6532458B1 (en) * | 1999-03-15 | 2003-03-11 | Microsoft Corporation | Sampling for database systems |
US6154766A (en) | 1999-03-23 | 2000-11-28 | Microstrategy, Inc. | System and method for automatic transmission of personalized OLAP report output |
US6173310B1 (en) | 1999-03-23 | 2001-01-09 | Microstrategy, Inc. | System and method for automatic transmission of on-line analytical processing system report output |
US6694316B1 (en) * | 1999-03-23 | 2004-02-17 | Microstrategy Inc. | System and method for a subject-based channel distribution of automatic, real-time delivery of personalized informational and transactional data |
US6567796B1 (en) * | 1999-03-23 | 2003-05-20 | Microstrategy, Incorporated | System and method for management of an automatic OLAP report broadcast system |
US6260050B1 (en) | 1999-03-23 | 2001-07-10 | Microstrategy, Inc. | System and method of adapting automatic output of service related OLAP reports to disparate output devices |
US6460026B1 (en) | 1999-03-30 | 2002-10-01 | Microsoft Corporation | Multidimensional data ordering |
US6535872B1 (en) * | 1999-04-08 | 2003-03-18 | International Business Machines Corporation | Method and apparatus for dynamically representing aggregated and segmented data views using view element sets |
US6804714B1 (en) * | 1999-04-16 | 2004-10-12 | Oracle International Corporation | Multidimensional repositories for problem discovery and capacity planning of database applications |
US6549907B1 (en) * | 1999-04-22 | 2003-04-15 | Microsoft Corporation | Multi-dimensional database and data cube compression for aggregate query support on numeric dimensions |
US6167396A (en) | 1999-05-12 | 2000-12-26 | Knosys, Inc. | Method and apparatus for navigating and displaying data points stored in a multidimensional database |
US6560594B2 (en) * | 1999-05-13 | 2003-05-06 | International Business Machines Corporation | Cube indices for relational database management systems |
US6282544B1 (en) | 1999-05-24 | 2001-08-28 | Computer Associates Think, Inc. | Method and apparatus for populating multiple data marts in a single aggregation process |
US6163774A (en) | 1999-05-24 | 2000-12-19 | Platinum Technology Ip, Inc. | Method and apparatus for simplified and flexible selection of aggregate and cross product levels for a data warehouse |
US6285994B1 (en) | 1999-05-25 | 2001-09-04 | International Business Machines Corporation | Method and system for efficiently searching an encoded vector index |
US6381605B1 (en) * | 1999-05-29 | 2002-04-30 | Oracle Corporation | Heirarchical indexing of multi-attribute data by sorting, dividing and storing subsets |
US6470344B1 (en) | 1999-05-29 | 2002-10-22 | Oracle Corporation | Buffering a hierarchical index of multi-dimensional data |
US6411313B1 (en) * | 1999-06-14 | 2002-06-25 | Microsoft Corporation | User interface for creating a spreadsheet pivottable |
US6442560B1 (en) | 1999-06-22 | 2002-08-27 | Microsoft Corporation | Record for multidimensional databases |
US6477536B1 (en) * | 1999-06-22 | 2002-11-05 | Microsoft Corporation | Virtual cubes |
US6374234B1 (en) * | 1999-06-22 | 2002-04-16 | Microsoft Corporation | Aggregations performance estimation in database systems |
US6493728B1 (en) * | 1999-06-22 | 2002-12-10 | Microsoft Corporation | Data compression for records of multidimensional database |
US6446059B1 (en) | 1999-06-22 | 2002-09-03 | Microsoft Corporation | Record for a multidimensional database with flexible paths |
US6456999B1 (en) | 1999-06-22 | 2002-09-24 | Microsoft Corporation | Aggregations size estimation in database services |
US6366905B1 (en) * | 1999-06-22 | 2002-04-02 | Microsoft Corporation | Aggregations design in database services |
US6424972B1 (en) | 1999-06-22 | 2002-07-23 | Microsoft Corporation | Floating point conversion for records of multidimensional database |
US6438537B1 (en) | 1999-06-22 | 2002-08-20 | Microsoft Corporation | Usage based aggregation optimization |
US6223573B1 (en) | 1999-06-25 | 2001-05-01 | General Electric Company | Method for precision temperature controlled hot forming |
US6460031B1 (en) | 1999-06-28 | 2002-10-01 | Sap Aktiengesellschaft | System and method for creating and titling reports using an integrated title bar and navigator |
US6707454B1 (en) * | 1999-07-01 | 2004-03-16 | Lucent Technologies Inc. | Systems and methods for visualizing multi-dimensional data in spreadsheets and other data structures |
US6708155B1 (en) * | 1999-07-07 | 2004-03-16 | American Management Systems, Inc. | Decision management system with automated strategy optimization |
US6480848B1 (en) | 1999-07-19 | 2002-11-12 | International Business Machines Corporation | Extension of data definition language (DDL) capabilities for relational databases for applications issuing DML and DDL statements |
US6453322B1 (en) | 1999-07-19 | 2002-09-17 | International Business Machines Corporation | Extension of data definition language (DDL) capabilities for relational databases for applications issuing multiple units of work |
US6374263B1 (en) * | 1999-07-19 | 2002-04-16 | International Business Machines Corp. | System for maintaining precomputed views |
US6665682B1 (en) | 1999-07-19 | 2003-12-16 | International Business Machines Corporation | Performance of table insertion by using multiple tables or multiple threads |
US6836894B1 (en) | 1999-07-27 | 2004-12-28 | International Business Machines Corporation | Systems and methods for exploratory analysis of data for event management |
US6842758B1 (en) * | 1999-07-30 | 2005-01-11 | Computer Associates Think, Inc. | Modular method and system for performing database queries |
US6691140B1 (en) * | 1999-07-30 | 2004-02-10 | Computer Associates Think, Inc. | Method and system for multidimensional storage model with interdimensional links |
US6581054B1 (en) * | 1999-07-30 | 2003-06-17 | Computer Associates Think, Inc. | Dynamic query model and method |
US6408292B1 (en) * | 1999-08-04 | 2002-06-18 | Hyperroll, Israel, Ltd. | Method of and system for managing multi-dimensional databases using modular-arithmetic based address data mapping processes on integer-encoded business dimensions |
US6385604B1 (en) * | 1999-08-04 | 2002-05-07 | Hyperroll, Israel Limited | Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements |
US6442269B1 (en) | 1999-08-23 | 2002-08-27 | Aspect Communications | Method and apparatus for integrating business data and transaction data in a transaction processing environment |
US6546395B1 (en) * | 1999-08-30 | 2003-04-08 | International Business Machines Corporation | Multi-dimensional restructure performance by selecting a technique to modify a relational database based on a type of restructure |
US6542895B1 (en) * | 1999-08-30 | 2003-04-01 | International Business Machines Corporation | Multi-dimensional restructure performance when adding or removing dimensions and dimensions members |
US6658093B1 (en) * | 1999-09-13 | 2003-12-02 | Microstrategy, Incorporated | System and method for real-time, personalized, dynamic, interactive voice services for travel availability information |
US6493723B1 (en) | 1999-09-22 | 2002-12-10 | International Business Machines Corporation | Method and system for integrating spatial analysis and data mining analysis to ascertain warranty issues associated with transportation products |
US6430547B1 (en) | 1999-09-22 | 2002-08-06 | International Business Machines Corporation | Method and system for integrating spatial analysis and data mining analysis to ascertain relationships between collected samples and geology with remotely sensed data |
US6438538B1 (en) * | 1999-10-07 | 2002-08-20 | International Business Machines Corporation | Data replication in data warehousing scenarios |
US6493718B1 (en) | 1999-10-15 | 2002-12-10 | Microsoft Corporation | Adaptive database caching and data retrieval mechanism |
US6473764B1 (en) * | 1999-10-15 | 2002-10-29 | Microsoft Corporation | Virtual dimensions in databases and method therefor |
US6405207B1 (en) * | 1999-10-15 | 2002-06-11 | Microsoft Corporation | Reporting aggregate results from database queries |
US6898603B1 (en) * | 1999-10-15 | 2005-05-24 | Microsoft Corporation | Multi-dimensional data structure caching |
US6473750B1 (en) | 1999-10-15 | 2002-10-29 | Microsoft Corporation | Adaptive query execution in a distributed database system |
US6484179B1 (en) * | 1999-10-25 | 2002-11-19 | Oracle Corporation | Storing multidimensional data in a relational database management system |
US6677963B1 (en) * | 1999-11-16 | 2004-01-13 | Verizon Laboratories Inc. | Computer-executable method for improving understanding of business data by interactive rule manipulation |
FR2806183B1 (en) * | 1999-12-01 | 2006-09-01 | Cartesis S A | DEVICE AND METHOD FOR INSTANT CONSOLIDATION, ENRICHMENT AND "REPORTING" OR BACKGROUND OF INFORMATION IN A MULTIDIMENSIONAL DATABASE |
US6766325B1 (en) * | 1999-12-02 | 2004-07-20 | Microsoft Corporation | System and method for maintaining data for performing “what if” analysis |
US6557008B1 (en) * | 1999-12-07 | 2003-04-29 | International Business Machines Corporation | Method for managing a heterogeneous IT computer complex |
US6405208B1 (en) * | 1999-12-13 | 2002-06-11 | Hyperion Solutions Corporation | Dynamic recursive build for multidimensional databases and methods and apparatus thereof |
US6633875B2 (en) | 1999-12-30 | 2003-10-14 | Shaun Michael Brady | Computer database system and method for collecting and reporting real estate property and loan performance information over a computer driven network |
US6356900B1 (en) * | 1999-12-30 | 2002-03-12 | Decode Genetics Ehf | Online modifications of relations in multidimensional processing |
US6418427B1 (en) * | 1999-12-30 | 2002-07-09 | Decode Genetics Ehf | Online modifications of dimension structures in multidimensional processing |
US6434557B1 (en) | 1999-12-30 | 2002-08-13 | Decode Genetics Ehf. | Online syntheses programming technique |
US6671715B1 (en) | 2000-01-21 | 2003-12-30 | Microstrategy, Inc. | System and method for automatic, real-time delivery of personalized informational and transactional data to users via high throughput content delivery device |
US6615096B1 (en) | 2000-01-31 | 2003-09-02 | Ncr Corporation | Method using statistically analyzed product test data to control component manufacturing process |
US6947934B1 (en) | 2000-02-16 | 2005-09-20 | International Business Machines Corporation | Aggregate predicates and search in a database management system |
US6643608B1 (en) | 2000-02-22 | 2003-11-04 | General Electric Company | System and method for collecting and analyzing shipment parameter data affecting predicted statistical variables of shipped articles |
CA2327948A1 (en) * | 2000-02-25 | 2001-08-25 | International Business Machines Corporation | System and method for accessing non-relational data by relational access methods |
US20020029207A1 (en) * | 2000-02-28 | 2002-03-07 | Hyperroll, Inc. | Data aggregation server for managing a multi-dimensional database and database management system having data aggregation server integrated therein |
CA2407974A1 (en) * | 2000-03-16 | 2001-09-20 | Poly Vista, Inc. | A system and method for analyzing a query and generating results and related questions |
US6768986B2 (en) * | 2000-04-03 | 2004-07-27 | Business Objects, S.A. | Mapping of an RDBMS schema onto a multidimensional data model |
AU2001257077A1 (en) * | 2000-04-17 | 2001-10-30 | Brio Technology, Inc. | Analytical server including metrics engine |
US7167859B2 (en) * | 2000-04-27 | 2007-01-23 | Hyperion Solutions Corporation | Database security |
US6748394B2 (en) * | 2000-04-27 | 2004-06-08 | Hyperion Solutions Corporation | Graphical user interface for relational database |
US7080090B2 (en) * | 2000-04-27 | 2006-07-18 | Hyperion Solutions Corporation | Allocation measures and metric calculations in star schema multi-dimensional data warehouse |
US6941311B2 (en) * | 2000-04-27 | 2005-09-06 | Hyperion Solutions Corporation | Aggregate navigation system |
US6643661B2 (en) * | 2000-04-27 | 2003-11-04 | Brio Software, Inc. | Method and apparatus for implementing search and channel features in an enterprise-wide computer system |
US6732115B2 (en) * | 2000-04-27 | 2004-05-04 | Hyperion Solutions Corporation | Chameleon measure and metric calculation |
US7072897B2 (en) * | 2000-04-27 | 2006-07-04 | Hyperion Solutions Corporation | Non-additive measures and metric calculation |
US7096219B1 (en) | 2000-05-10 | 2006-08-22 | Teleran Technologies, Inc. | Method and apparatus for optimizing a data access customer service system |
US6594672B1 (en) * | 2000-06-01 | 2003-07-15 | Hyperion Solutions Corporation | Generating multidimensional output using meta-models and meta-outlines |
US6601062B1 (en) * | 2000-06-27 | 2003-07-29 | Ncr Corporation | Active caching for multi-dimensional data sets in relational database management system |
US6763357B1 (en) * | 2000-06-27 | 2004-07-13 | Ncr Corporation | Method for determining the computability of data for an active multi-dimensional cache in a relational database management system |
US6399775B1 (en) * | 2000-07-13 | 2002-06-04 | Thota Giridhar | Methods for the preparation of polymorphs of doxazosin mesylate |
US6829621B2 (en) * | 2000-10-06 | 2004-12-07 | International Business Machines Corporation | Automatic determination of OLAP cube dimensions |
US7054866B2 (en) | 2001-03-20 | 2006-05-30 | Mci, Inc. | Systems and methods for communicating from an integration platform to a provisioning server |
US6931418B1 (en) * | 2001-03-26 | 2005-08-16 | Steven M. Barnes | Method and system for partial-order analysis of multi-dimensional data |
US6801908B1 (en) | 2002-01-28 | 2004-10-05 | Supplychainge Inc | System and method for selectively presenting multi-dimensional data in two-dimensional form |
CA2371731A1 (en) | 2002-02-12 | 2003-08-12 | Cognos Incorporated | Database join disambiguation by grouping |
US20040247105A1 (en) | 2002-03-29 | 2004-12-09 | Karen Mullis | System and method for a network-based call reception limiter |
US7853508B2 (en) * | 2003-05-19 | 2010-12-14 | Serena Software, Inc. | Method and system for object-oriented management of multi-dimensional data |
US7778899B2 (en) | 2003-05-19 | 2010-08-17 | Serena Software, Inc. | Method and system for object-oriented workflow management of multi-dimensional data |
US7366725B2 (en) * | 2003-08-11 | 2008-04-29 | Descisys Limited | Method and apparatus for data validation in multidimensional database |
US6848758B1 (en) * | 2003-10-31 | 2005-02-01 | Chih-Cheng Yeh | Do it yourself (DIY) modular cabinet |
US20080129747A1 (en) * | 2003-11-19 | 2008-06-05 | Reuven Bakalash | Multi-mode parallel graphics rendering system employing real-time automatic scene profiling and mode control |
Date | Country | Application number | Publication | Status |
---|---|---|---|---|
2000-08-09 | US | US09/634,748 | US6385604B1 | Not active: Expired - Lifetime |
2001-02-28 | JP | JP2001565050A | JP5242875B2 | Not active: Expired - Fee Related |
2001-02-28 | CA | CA2401348A | CA2401348C | Not active: Expired - Fee Related |
2001-02-28 | AU | AU2001239919A | AU2001239919A1 | Not active: Abandoned |
2001-02-28 | WO | PCT/US2001/006316 | WO2001067303A1 | Active: Application Filing |
2001-02-28 | EP | EP01914545A | EP1266308A4 | Not active: Ceased |
2002-05-01 | US | US10/136,937 | US20020194167A1 | Not active: Abandoned |
2002-12-09 | US | US10/314,902 | US20030225752A1 | Not active: Abandoned |
2002-12-09 | US | US10/314,868 | US7392248B2 | Not active: Expired - Fee Related |
2004-04-06 | US | US10/818,697 | US20050091237A1 | Not active: Abandoned |
2006-06-22 | US | US11/473,299 | US20070192295A1 | Not active: Abandoned |
2009-03-31 | US | US12/384,093 | US8463736B2 | Not active: Expired - Fee Related |
Also Published As
Publication number | Publication date |
---|---|
US20090271384A1 (en) | 2009-10-29 |
CA2401348A1 (en) | 2001-09-13 |
US20070192295A1 (en) | 2007-08-16 |
US20030225752A1 (en) | 2003-12-04 |
JP2003526159A (en) | 2003-09-02 |
US7392248B2 (en) | 2008-06-24 |
EP1266308A4 (en) | 2004-09-01 |
AU2001239919A1 (en) | 2001-09-17 |
WO2001067303A1 (en) | 2001-09-13 |
US20030200221A1 (en) | 2003-10-23 |
US6385604B1 (en) | 2002-05-07 |
US20050091237A1 (en) | 2005-04-28 |
US8463736B2 (en) | 2013-06-11 |
JP5242875B2 (en) | 2013-07-24 |
EP1266308A1 (en) | 2002-12-18 |
US20020194167A1 (en) | 2002-12-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2401348C (en) | | Multi-dimensional database and integrated aggregation server |
US7315849B2 (en) | | Enterprise-wide data-warehouse with integrated data aggregation engine |
US8041670B2 (en) | | Data aggregation module supporting dynamic query responsive aggregation during the servicing of database query requests provided by one or more client machines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | EEER | Examination request | |
| | MKLA | Lapsed | Effective date: 20200228 |