A
data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making process.
Subject-Oriented:
A data warehouse can be used to analyze
a particular subject area. For example, "sales" can be a particular
subject.
Integrated:
A data warehouse integrates data from
multiple data sources. For example, source A and source B may have different
ways of identifying a product, but in a data warehouse, there will be only a
single way of identifying a product.
Time-Variant:
Historical data is kept in a data
warehouse. For example, one can retrieve data from 3 months, 6 months, 12
months, or even older data from a data warehouse. This contrasts with a
transactions system, where often only the most recent data is kept.
For example, a transaction system may hold the most recent address of a
customer, where a data warehouse can hold all addresses associated with a
customer.
Non-volatile:
Once data is in the data warehouse, it will not
change. So, historical data in a data warehouse should never be altered.
How many stages in Data warehousing?
Data
warehouse generally includes two stages
1. ETL
2. Report
Generation
ETL
Short
for extract, transform, load, three database functions that are combined into
one tool
·
Extract — the process of reading data from a
source database.
·
Transform — the process of converting the
extracted data from its previous form into required form.
·
Load — the process of writing the
data into the target database.
ETL
is used to migrate data from one database to another, to form data marts and
data warehouses and also to convert databases from one format to another format
It
is used to retrieve the data from various operational databases and is
transformed into useful information and finally loaded into Data warehousing
system
1.
Informatica
2.
Abinito
3.
Datastage
4.
Bodi
5.
Oracle
Warehouse Builders
Report
generation
In
report generation, OLAP is used (i.e.) online analytical processing. It is a
set of specification which allows the client applications in retrieving the
data for analytical processing
It
is a specialized tool that sits between a database and user in order to provide
various analyses of the data stored in the database.
OLAP
Tool is a reporting tool which generates the reports that are useful
for Decision support for top level management
1.
Business
Objects
2.
Cognos
3.
Micro
strategy
4.
Hyperion
5.
Oracle
Express
6.
Microsoft
Analysis Services
What are the types of datawarehousing?
EDW
(Enterprise data warehousing)
·
It
provides a central database for decision support throughout the enterprise
·
It
is a collection of DATAMARTS
DATAMART
·
It
is a subset of Data warehousing.
·
It
is a subject oriented database which supports the needs of
individuals depts. in an organizations.
·
It
is called high performance query structure.
·
It
supports particular line of business like sales, marketing etc…
ODS
(Operational data store)
·
It
is defined as an integrated view of operational database designed to support
operational monitoring
·
It
is a collection of operational data sources designed to support Transaction
processing
·
Data
is refreshed near real-time and used for business activity
·
It
is an intermediate between the OLTP and OLAP which helps to create an
instance reports.
What are the types of Approach
in DWH?
Top
down approach: first
we need to develop EDW then form that EDW we develop data mart
OLTP —>
ETL —> DWH —> Data
mart —> OLAP
Advantages
·
Cost
of initial planning & design is high
·
Takes
longer duration of more than an year
Bottom
up approach: first
we need to develop data mart then we integrate these data mart into EDW
OLTP
—> ETL —> Data mart —>
DWH —> OLAP
Advantages
·
Planning
& Designing the Data Marts without waiting for the Global warehouse design
·
Immediate
results from the data marts
·
Tends
to take less time to implement
·
Errors
in critical modules are detected earlier.
·
Benefits
are realized in the early phases.
·
It
is a Best Approach.
.
Why need staging area for DWH?
1.
Staging area needs to clean operational data before loading into data
warehouse.
2.
Cleaning in the sense your merging data which comes from different source.
3.
it’s the area where most of the ETL is done
Data
Cleansing
·
It
is used to remove duplication’s
·
It
is used to correct wrong email addresses
·
It
is used to identify missing data
·
It
used to convert the data types
·
It
is used to capitalize name & addresses.
Types of systems
Data Mart: - A data mart is a simple form of a data warehouse that is
focused on a single subject (or functional area), such as Sales, Finance, or
Marketing. Data marts are often built and controlled by a single department
within an organization. Given their single-subject focus, data marts usually
draw data from only a few sources. The sources could be internal operational
systems, a central data warehouse, or external data.
- It is a high performance query structure(HPQS)
- Fast retrieval of data is possible through Datamart
Differences
between a Data Warehouse and a Data Mart
Category
|
Data
Warehouse
|
Data
Mart
|
Scope
|
Corporate
|
Line of
Business (LOB)
|
Subject
|
Multiple
|
Single subject
|
Data Sources
|
Many
|
Few
|
Size (typical)
|
100 GB-TB+
|
< 100 GB
|
Implementation
Time
|
Months to years
|
Months
|
Metadata is data that describes
other data. Meta is a prefix that in most information technology
usages means "an underlying definition or
description." Metadata summarizes basic information about
data, which can make finding and working with particular instances
of data easier.
Data
integration means
combining Data coming from different sources and providing users
with a unified view of these data.
Dimensional vs. normalized approach for
storage of data
There are three or more leading
approaches to storing data in a data warehouse — the most important approaches
are the dimensional approach and the normalized approach.
The dimensional approach, whose
supporters are referred to as “Kimballites”, believe in Ralph Kimball’s
approach in which it is stated that the data warehouse should be modeled using
a Dimensional Model/star schema. The normalized approach, also called the 3NF
model (Third Normal Form), whose supporters are referred to as “Inmonites”,
believe in Bill Inmon's approach in which it is stated that the data warehouse
should be modeled using an E-R model/normalized model.
In a dimensional approach, transaction
data are partitioned into "facts", which are generally numeric
transaction data, and "dimensions", which are the reference
information that gives context to the facts. For example, a sales transaction
can be broken up into facts such as the number of products ordered and the
price paid for the products, and into dimensions such as order date, customer
name, product number, order ship-to and bill-to locations, and salesperson
responsible for receiving the order.
Features of Data Warehouse:
- A DWH is designed to support decision making process. Hence it is known as Decision Support System(DSS)
- It analyzes the business transactions in order to support decision making
- DWH is a container data
- DWH is the process of developing a data warehouse
- DWH is a read only database because it is designed to read the data for analysis but not for transactional processing
- DWH is a historical database because it can store historical business information
- Father of Data warehousing W.H.Inmon. In 1987, he designed a data warehouse.
No comments:
Post a Comment