Informatica,OBIEE,Oracle Blog: December 2016

Friday, 23 December 2016

Data Modeling

Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow.

Model: Model is business representation of information in one or more data sources.

In OLTP data model is designed with the help of Entity Relation model. This ER Model can be modified during the ETL process to fit into the data warehousing project.

In data warehousing data modeling can be achieved with the help of dimensional modeling based on facts and measures

Data Modeling Types:-

Conceptual Data Modeling

Conceptual data model includes all major entities and relationships and but does not contain much detailed level of information about attributes and is often used in the INITIAL PLANNING PHASE

Conceptual data model is created by gathering business requirements from various sources like business documents, discussion with functional teams, business analysts, smart management experts and end users who do the reporting on the database. Data modelers create conceptual data model and forward that model to functional team for their review.

Conceptual data modeling gives an idea to the functional and technical team about how business requirements would be projected in the logical data model.

Logical Data Modeling

This is the actual implementation and extension of a conceptual data model. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.

Physical Data Modeling

Physical data model includes all required tables, columns, data types, relationships, database properties for the physical implementation of databases. Database performance, indexing strategy, physical storage and demoralization are important parameters of a physical model.

The three levels of data modeling, conceptual data model, logical data model, and physical data model, were discussed in prior sections. Here we compare these three types of data models. The table below compares the different features:

Feature	Conceptual	Logical	Physical
Entity Names	✓	✓
Entity Relationships	✓	✓
Attributes		✓
Primary Keys		✓	✓
Foreign Keys		✓	✓
Table Names			✓
Column Names			✓
Column Data Types			✓

Below we show the conceptual, logical, and physical versions of a single data model.

Dimensional Data Modeling

· Dimension model consists of fact and dimension tables

· It is an approach to develop the schema DB designs

Dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3rd normal form model.

To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:

Dimension: A category of information. For example, the time dimensions.

Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.

Hierarchy: The specification of levels that represents relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.

Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A date column, a store column, and a sales amount column.

Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields, one for the unique ID that identifies the quarter, and one or more additional fields that specifies how that particular quarter is represented on a report (for example, first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").

A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables.

Data Modeling is a process of designing the data warehouse with its following types of schema

· Star schema

· Snowflake schema

Wednesday, 21 December 2016

Time Series Functions in OBIEE

Time series functions operate on time-oriented dimensions. The time series functions calculate AGO, TODATE, and PERIODROLLING functions based on user supplied calendar tables, not on standard SQL date manipulation functions.

These functions let you use Expression Builder to call a logical function to perform time series calculations instead of aliasing physical tables and modeling logically.

To use time series functions on a particular dimension, you must designate the dimension as a Time dimension and set one or more keys at one or more levels as chronological keys.

Functions include:

· AGO

· PERIODROLLING

· TODATE

In this topic you create time series calculation measures using Oracle BI time series functions.

Time series functions include AGO, TODATE, and PERIODROLLING. These functions let you use Expression Builder to call a logical function to perform time series calculations instead of creating aliases for physical tables and modeling logically. The time series functions calculate AGO, TODATE, and PERIODROLLING functions based on the calendar tables in your data warehouse, not on standard SQL date manipulation functions.

To create time series measures, you perform the following steps:
• Identify a Logical Dimension as a Time Dimension
• Identify Level Keys as Chronological Keys
• Create a Measure Using the AGO Function
• Create a Measure Using the TODATE Function
• Create a Measure Using the PERIODROLLING Function
• Test Your Work

Time Series Functions in OBIEE

1) Ago

Ago is a time series function that calculates the aggregated value from the current time back to a specified time period. this is useful for comparison such as amount compared to amount a month ago.

Syntax: AGO(<Measure >,<Level>,<number of period>)

Measure: A measure column.

Level: On which level you want to calculate this measure such as Year,Quarter,Month etc.

Number of period: how many level you want to go back from current time such as 2 year back,2 months back etc.

Example : AGO(Amount,Year,1) this will give the last year amount .

2) ToDate

To Date is a time series function that calculates a measure attributes from the beginning of a specified time period to the currently displayed time period.

Syntax : ToDate(<Measure>,<Level>)

Measure: A measure column.

Level: On which level you want to calculate this measure such as Year,Quarter,Month etc.

Example : ToDate(Amount,Year) this will give the amount from the beginning of the year to Currently displayed time.

3) Period Rolling

Period Rolling is a new time series function in 11g .It computes the sum of a measure over the period starting x units of time and ending y units of time from the current time .This function allow us to create a aggregated measures across a specified set of query grain period rather than within a fixed time series grain.

Syntax: PeriodRolling(<Measure>,<Starting Period Offset>, < Starting Period Offset>)

Measure: A measure column.

Starting Period Offset: Starting period used in rolling aggregation.

Ending Period Offset: Ending period used in rolling aggregation.

Example : PeriodRolling(Amount,-2,0) this will give three months rolling sum of column Amount.It includes past two months with the current month. Numerical value -2 indicates May-14,Jun-14 if current month is Jul-14.Numerical value 0 indicates current month.

Note : This function has no time series grain ,the length of the rolling sequence is determined by query grain for example it will give the last three Monthamount if the query grain is month , if the query grain is year then it will give last three Year Amount.

Slowly Changing Dimension (SCD)

Slowly changing dimensions are the dimensions in which the data changes slowly, rather than changing regularly on a time basis.

For example, you may have a customer dimension in a retail domain. Let say the customer is in India and every month he does some shopping. Now creating the sales report for the customers is easy. Now assume that the customer is transferred to United States and he does shopping there.

There are many approaches how to deal with SCD. The most popular are:

 Type 0 - The passive method

 Type 1 - Overwriting the old value

 Type 2 - Creating a new additional record

 Type 3 - Adding a new column

 Type 4 - Using historical table

 Type 6 - Combine approaches of types 1,2,3 (1+2+3=6)

Type 0 - The passive method. In this method no special action is performed upon dimensional changes. Some dimension data can remain the same as it was first time inserted, others may be overwritten.

How to record such a change in your customer dimension?
You could sum or average the sales done by the customers. In this case you won't get the exact comparison of the sales done by the customers. As the customer salary is increased after the transfer, he/she might do more shopping in United States compared to in India. If you sum the total sales, then the sales done by the customer might look stronger even if it is good. You can create a second customer record and treat the transferred customer as the new customer. However this will create problems too.

Handling these issues involves 3 main SCD management methodologies.They are:

Slowly changing dimension Type 1

Slowly changing dimension Type 2

Slowly changing dimension Type 3

Slowly changing dimension Type 1: SCD type 1 methodology is used when there is no need to store historical data in the dimension table. This method overwrites the old data in the dimension table with the new data. It is used to correct data errors in the dimension.

As an example, i have the customer table with the below data.

surrogate_key customer_id customer_name Location

------------------------------------------------------------------------------

1 1 Marspton Illions

Here the customer name is misspell. It should be Marston instead of Marspton. If you use type1 method, it just simply overwrites the data. The data in the updated table will be.

surrogate_key customer_id customer_name Location

------------------------------------------------------------------------------

1 1 Marston Illions

The advantage of type1 is ease of maintenance and less space occupied. The disadvantage is that there is no historical data kept in the data warehouse.

Slowly changing dimension Type 2: SCD type 2 stores the entire history the data in the dimension table. With type 2 we can store unlimited history in the dimension table. In type 2, you can store the data in three different ways. They are

Versioning
Flagging
Effective Date

SCD Type 2 Versioning: In versioning method, a sequence number is used to represent the change. The latest sequence number always represents the current row and the previous sequence numbers represents the past data.

As an example, let’s use the same example of customer who changes the location. Initially the customer is in Illions location and the data in dimension table will look as.

surrogate_key customer_id customer_name Location Version

-------------------------------------------------------------------------------------------

1 1 Marston Illions 1

The customer moves from Illions to Seattle and the version number will be incremented. The dimension table will look as

surrogate_key customer_id customer_name Location Version

-------------------------------------------------------------------------------------------

1 1 Marston Illions 1

2 1 Marston Seattle 2

Now again if the customer is moved to another location, a new record will be inserted into the dimension table with the next version number.

SCD Type 2 Flagging: In flagging method, a flag column is created in the dimension table. The current record will have the flag value as 1 and the previous records will have the flag as 0.

Now for the first time, the customer dimension will look as.

surrogate_key customer_id customer_name Location Flag

---------------------------------------------------------------------------------------

1 1 Marston Illions 1

Now when the customer moves to a new location, the old records will be updated with flag value as 0 and the latest record will have the flag value as 1.

surrogate_key customer_id customer_name Location Version

-------------------------------------------------------------------------------------------

1 1 Marston Illions 0
2 1 Marston Seattle 1

SCD Type 2 Effective Date: In Effective Date method, the period of the change is tracked using the start_date and end_date columns in the dimension table.

surrogate_key customer_id customer_name Location Start_date End_date

----------------------------------------------------------------------------------------------------------------

1 1 Marston Illions 01/03/2010 20/02/2011

2 1 Marston Seattle 21/02/2011 NULL

The NULL in the End_Date indicates the current version of the data and the remaining records indicate the past data.

Slowly changing dimension Type 3: In type 3 method, only the current status and previous status of the row is maintained in the table. To track these changes two separate columns are created in the table. The customer dimension table in the type 3 method will look as

surrogate_key customer_id customer_name Current_Location previous_location

---------------------------------------------------------------------------------------------------------------------

1 1 Marston Illions NULL

Let say, the customer moves from Illions to Seattle and the updated table will look as

surrogate_key customer_id customer_name Current_Location previous_location

---------------------------------------------------------------------------------------------------------------------

1 1 Marston Seattle Illions

Now again if the customer moves from seattle to NewYork, then the updated table will be

surrogate_key customer_id customer_name Current_Location previous_location

---------------------------------------------------------------------------------------------------------------------

1 1 Marston NewYork Seattle

The type 3 method will have limited history and it depends on the number of columns you create.

Type 4 - Using historical table. In this method a separate historical table is used to track all dimension's attribute historical changes for each of the dimension. The 'main' dimension table keeps only the current data e.g. customer and customer_history tables.

Current table:

Customer_ID	Customer_Name	Customer_Type
1	Cust_1	Corporate

Historical table:

Customer_ID	Customer_Name	Customer_Type	Start_Date	End_Date
1	Cust_1	Retail	01-01-2010	21-07-2010
1	Cust_1	Oher	22-07-2010	17-05-2012
1	Cust_1	Corporate	18-05-2012	31-12-9999

Type 6 - Combine approaches of types 1,2,3 (1+2+3=6). In this type we have in dimension table such additional columns as:

 current_type - for keeping current value of the attribute. All history records for given item of attribute have the same current value.

 historical_type - for keeping historical value of the attribute. All history records for given item of attribute could have different values.

 start_date - for keeping start date of 'effective date' of attribute's history.

 end_date - for keeping end date of 'effective date' of attribute's history.

 current_flag - for keeping information about the most recent record.
In this method to capture attribute change we add a new record as in type 2. The current_type information is overwritten with the new one as in type 1. We store the history in a historical_column as in type 3.

Customer_ID	Customer_Name	Current_Type	Historical_Type	Start_Date	End_Date	Current_Flag
1	Cust_1	Corporate	Retail	01-01-2010	21-07-2010	N
2	Cust_1	Corporate	Other	22-07-2010	17-05-2012	N
3	Cust_1	Corporate	Corporate	18-05-2012	31-12-9999	Y

Informatica,OBIEE,Oracle Blog

Friday, 23 December 2016

Data Modeling

Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow.

Wednesday, 21 December 2016

Time Series Functions in OBIEE

Slowly Changing Dimension (SCD)

Big Data FQA

Live Traffic Feed