What is data warehouse?
Non-volatile means that the data once loaded in the warehouse will not get deleted later. Time-variant means the data will change with respect to time.
What is the benefits of data warehouse?
Why Data Warehouse is used?
What is the difference between OLTP and OLAP?
OLTP systems are optimized for INSERT, UPDATE operations and therefore highly normalized. On the other hand, OLAP systems are deliberately denormalized for fast data retrieval through SELECT operations.
In a departmental shop, when we pay the prices at the check-out counter, the sales person at the counter keys-in all the data into a "Point-Of-Sales" machine. That data is transaction data and the related system is a OLTP system. On the other hand, the manager of the store might want to view a report on out-of-stock materials, so that he can place purchase order for them. Such report will come out from OLAP system
What is data mart?
What is ER model?
What is dimensional modeling?
What is dimension?
What is fact?
What are additive, semi-additive and non-additive measures?
What is Star-schema?
What is snow-flake schema?
Snow-flake increases degree of normalization in the design.
What are the different types of dimension?
1. Conformed Dimension
2. Junk Dimension
3. Degenerated Dimension
4. Role Playing Dimension
1. Unchanging or static dimension (UCD)
2. Slowly changing dimension (SCD)
3. Rapidly changing Dimension (RCD)
What is a 'Conformed Dimension'?
What is degenerated dimension?
What is junk dimension?
What is a role-playing dimension?
What is SCD?
What is rapidly changing dimension?
Describe different types of slowly changing
Dimension (SCD)
A Type 0 dimension is where dimensional changes are not considered. This does not mean that the attributes of the dimension do not change in actual business situation. It just means that, even if the value of the attributes change, history is not kept and the table holds all the previous data.
A type 1 dimension is where history is not maintained and the table always shows the recent data. This effectively means that such dimension table is always updated with recent data whenever there is a change, and because of this update, we lose the previous values.
A type 2 dimension table tracks the historical changes by creating separate rows in the table with different surrogate keys. Consider there is a customer C1 under group G1 first and later on the customer is changed to group G2. Then there will be two separate records in dimension table like below,
Key | Customer | Group | Start Date | End Date |
1 | C1 | G1 | 1st Jan 2000 | 31st Dec 2005 |
2 | C1 | G2 | 1st Jan 2006 | NULL |
A type 3 dimension stored the history in a separate column instead of separate rows. So unlike a type 2 dimension which is vertically growing, a type 3 dimension is horizontally growing. See the example below,
Key | Customer | Previous Group | Current Group |
1 | C1 | G1 | G2 |
A type 6 dimension is a hybrid of type 1, 2 and 3 (1+2+3) which acts very similar to type 2, but only you add one extra column to denote which record is the current record.
Key | Customer | Group | Start Date | End Date | Current Flag |
1 | C1 | G1 | 1st Jan 2000 | 31st Dec 2005 | N |
2 | C1 | G2 | 1st Jan 2006 | NULL | Y |
What is a mini dimension?
What is a fact-less-fact?
Consider a school, where a single student may be taught by many teachers and a single teacher may have many students. To model this situation in dimensional model, one might introduce a fact-less-fact table joining teacher and student keys. Such a fact table will then be able to answer queries like,
1. Who are the students taught by a specific teacher.
2. Which teacher teaches maximum students.
3. Which student has highest number of teachers.etc. etc.
What is a coverage fact?
2. Which student was not taught by any teacher?
What are incident and snapshot facts
What is aggregation and what is the benefit of
aggregation?
What is slicing-dicing?
What is drill-through?
DIMENSION TABLE VS FACT TABLE
DIMENSION TABLE
|
FACT TABLE
|
It provides the context /descriptive information for a fact table measurements. | It provides measurement of an enterprise. |
Structure of Dimension - Surrogate key , one or more other fields that compose the natural key (nk) and set of Attributes. | Measurement is the amount determined by observation. |
Size of Dimension Table is smaller than Fact Table. | Structure of Fact Table - foreign key (fk), Degenerated Dimension and Measurements. |
. In a schema more number of dimensions are presented than Fact Table. | Size of Fact Table is larger than Dimension Table. |
Surrogate Key is used to prevent the primary key (pk) violation(store historical data). | In a schema less number of Fact Tables observed compared to Dimension Tables. |
Provides entry points to data. | Compose of Degenerate Dimension fields act as Primary Key. |
Values of fields are in numeric and text representation. | Values of the fields always in numeric or integer form. |
OLTP VS OLAP
OLTP
|
OLAP
|
On Line Transaction processing | On Line Analytical processing |
Continuously updates data | Read Only Data |
Tables are in normalized form | Partially Normalized / Denormalized Tables |
Single record access | Multiple records for analysis purpose |
Holds current data | Holds current and historical data |
Records are maintained using Primary key feild | Records are baased on surogate keyfield |
Delete the table or record | Cannot delete the records |
Complex data model | Simplified data model |
DATAMART VS DATA WAREHOUSE
DATAMART
|
DATA WAREHOUSE
|
A scaled - down version of the Data Warehouse that addresses only one subject like Sales Department, HR Department etc., | It is a database management system that facilitates on-line analytical processing by allowing the data to be viewed in different dimensions or perspectives to provide business intelligence. |
One fact table with multiple dimension tables. | More than one fact table and multiple dimension tables. |
[Sales Department] [HR Department] [Manufacturing Department] | [Sales Department , HR Department , Manufacturing Department] |
Small Organizations prefer DATAMART | Bigger Organization prefer DATA WAREHOUSE |
DATA MINING VS WEB MINING
DATA MINING
|
WEB MINING
|
Data mining involves using techniques to find underlying structure and relationships in large amounts of data. | Web mining involves the analysis of Web server logs of a Web site. |
Data mining products tend to fall into five categories: neural networks, knowledge discovery, data visualization, fuzzy query analysis and case-based reasoning. | The Web server logs contain the entire collection of requests made by a potential or current customer through their browser and responses by the Web server |