I have attached some basics about Datawarehouse concepts here: it would be helpful for beginners:
(I'll be adding the Analysing Part soon: OLAP, HOLAP, MOLAP) and then will discuss about the current tool i'm working with Pentaho BI opensource, Kettle ETL tool, Mondrian OLAP and JFREE reporting tool and Shark Workflow engine tool soon)....
Courtesy: my reading from oracle Data warehouse consulting methodologies, Business Intelligence roadmap and internet.
Let's Start!!
Management Information Systems and Decision Support Systems
For your business to be successful, fast and accurate access to information is critical. You extract information from existing data. Important decisions are based on the information available at any point in time. In order to get the right information at right time; you need to gather data from internal and external resources. A typical Online Transaction Processing (OLTP) system has many users adding new data to the database while fewer users generate reports from the database. As the volume of data increases, reports takes longer to generate.
Early Management Information Systems
Early Management Information Systems (MIS) provided management with reports to assess the performance the business. Report requirements were submitted as a request to the MIS development team, who developed the report and made it available to the user some time afterward –days, weeks or even months later. The data in the reports was made available in a way that was difficult to use for analysis and forecasting.
Personal computing
With advent of personal computing and 4GL programming techniques, MIS became decision support (Decision Support Systems or DSS). DSS was judged to support business users better, by giving them direct access to the operational data for additional ad-hoc querying, which provided more flexible reporting as the information was needed.
Although decision support tools are friendly, intuitive, and easy to use, often the structure of data in the OLTP systems does not support the user’s analytical requirements
A data warehouse is often used as the basis for a decision-support system(also referred to from an analytical perspective as a business intelligence system). It is designed to overcome some of the problems encountered when an organisation attempts to perform strategic analysis using the same database that is used to perform online transaction processing (OLTP).
Definition of Data Warehouse
Bill Inmon defines data warehouse as:
“ An enterprise structured repository of subject oriented, time variant historical data used for information retrieval and decision support the data warehouse stories atomic and summary data.”
Subject Oriented:
Subject oriented data is organized around major subject areas of an enterprise and is useful for an enterprise wide understanding of those subjects, for example:
Customer financial information
Toll calls made in telecommunication industry
Airline passenger booking information
Insurance claim data
Healthcare analysis etc..
Integrated
In many organizations data resides in diverse independent systems, making it difficult to integrate into one set of meaningful information for analysis. A key characteristic of a DW is the data is completed integrated.
Data consistency: you must deal with data inconsistencies and anomalies before the data loaded into warehouse. Consistency is applied to naming conventions, measurements, encoding structures and physical attributes of data.
Data redundancy: data redundancy at the detail level in the warehouse environment is eliminated.
Time variant
Warehouse data is by name historical; it does not usually contain the current transactional data. Data is represented over a long time horizon, from two to ten years, compared with one to three months of data for a typical operational system. The data allows for analysis of past and present trends and for forecasting using “what If” scenarios.
Nonvolatile
Typically, data in the data warehouse is read-only. Data is loaded into the data warehouse for the first time load, and then refreshed regularly.
Warehousing operation typically involves:
Loading the initial set of warehouse data. (often called first-time load)
Refreshing the data regularly (called the refresh cycle)
Data warehouse versus Data mart
Data mart is the subset of data warehouse. A data mart is a simpler form of a data warehouse designed for a single line of business (LOB) or functional area such as sales, finance or marketing. Data marts are typically smaller and less complex than the data warehouse and therefore are typically easier to build and maintain. A data warehouse typically assembles data from multiple source systems. A data mart typically assembles data from fewer sources.
Dependent and independent Data Marts
Data marts can be categorized into two types: dependent and independent.
Dependent data marts source is data warehouse. Dependent data marts rely on the data ware house for content. The extraction, transformation and loading (ETL) process is easy. Dependent data marts draw data from central data ware house that has already been created.
Independent data marts are stand alone systems built from scratch that the data directly from operational and external source of data. ETL process is difficult because independent data marts draw data from unclean or inconsistent data sources, efforts are directed toward error processing and integration of data. The creation of independent data marts is often driven by the need for quick solution to analysis demands especially Pilot data warehouse projects.
The Heart of DataWarehouse
Extraction, Transformation and Loading (ETL)
These processes are fundamental to the creation of quality information in the data warehouse. You take the data from source systems; clean, verify, validate, and convert it into a consistent state; then move it into the warehouse.
Extraction: the process of selecting specific operational attributes from the various operational systems.
Transformation: The process of integrating, verifying, validating, cleaning and time stamping the selected data into a consistent (star schema model-will discuss with this later) and uniform format (renormalized format) for the target database. (This stage is called as Staging)
Loading (Transformation): The process of moving data from an intermediate storage area into the target warehouse database. This stage is called as Data ware house stage.
Financial justification (Return On Investment customer perspective)
The project is big investment in resources and finances. Management must be able to report on how the data warehouse benefits the business. Justification is divided into three main areas:
The tangible benefits are that the business can remain competitive, respond to changing business conditions, and support reorganization.
Better data and decision making reduce information technology costs. Provide better response times and provide rigorous reporting
Productivity or return on investment (ROI) benefit internal and external users
Return on Investment
The financial justification must set out a strong case that clearly establishes measurements such as cost versus return on investment, and increased efficiency and profit. It must also set clearly defined objectives that can be monitored and measured.
Associated costs: along with cost justification, you should provide a plan that specifies other factors that will impact the cost of the project and other aspects of the business.
The cost of developing ETT or purchasing the ETT tools,
The actual time required for data cleansing, transformation, and extraction, which may impact day to day operations
Storage requirement for extract, summarization, workspace, log space, backup, recovery and maintenance.
The cost of redundant data
Hardware and software costs
The cost of server and system software licenses.
Labor cost.
Obtaining business commitment for DW
A data warehouse implementation requires the total support of those who control the business and make the decisions that drive the business forward. The warehouse is a business driven project, not an information technology drive for the latest hardware, software, tools and techniques.
Business objectives must be clear, well defined, measurable and achievable:
Research and study the business problem; identify the business vision, goals and priorities
Research the solution and define what the warehouse solution may do
Identify the benefit of the solution, such as efficiency, people power, customer satisfaction and returns.
Identify the constraints such as schedule, costs and experience.
0 comments:
Post a Comment