Data warehouse architecture is a design for storing complex business data in a central domain for data mining, business intelligence, and general access. It includes reporting, data management, security, bandwidth, and archiving requirements. The architecture should be broken into specific domains, including source system access, staging area process, data enrichment process, data architecture, business intelligence process, and storage requirements. The ETL process is used to transfer data from source systems to the data warehouse. The staging area holds source data for validation and cleansing before it is loaded into the business rules layer. The data architecture layer defines the schema of the enterprise data warehouse. The business intelligence layer contains predefined reports, ad-hoc reporting capabilities, and business dashboards or alerts. Data storage must be managed and maintained as the warehouse grows.
Data warehouse architecture is a design that encapsulates all aspects of data warehousing for an enterprise environment. Data warehousing is the creation of a central domain to store complex and decentralized business data in a logical unit that enables data mining, business intelligence and general access to all relevant data within an organization. The data warehouse architecture includes all reporting requirements, data management, security requirements, bandwidth requirements, and archiving requirements.
When creating a data warehouse architecture, it is important to break the architecture into specific domains which are merged into a holistic final design. This design should be considered the blueprint for enterprise data architecture. In particular, several primary areas should be developed when considering data warehouse architecture. These areas are source system access, staging area process, data enrichment process, data architecture, business intelligence process, and storage requirements.
Data warehousing requires that source data be transferred from a records or transactional database to the data warehouse. This process is summarized by the term Extract, Transform, and Load (ETL), which covers the areas of source system access, data enrichment, and data architecture. For the sake of clarity, it is best to plan each of these architectural areas in detail, outlining how the ETL process will be accomplished. While some data is required from the source systems, loading all of it is undesirable because it would overwhelm the corporate warehouse. The primary areas of concern when addressing the source system level are data access methodologies, the data required from the source system, and update requirements.
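The extraction concerns above (selecting only the required data and honoring update requirements) can be illustrated with a minimal sketch. It assumes a hypothetical `customers` table with an `updated_at` column holding ISO-formatted dates, and uses an in-memory SQLite database as a stand-in for a real transactional source system; none of these names come from a specific product.

```python
import sqlite3

# Hypothetical column list: only these fields are needed by the warehouse.
REQUIRED_COLUMNS = ["customer_id", "name", "updated_at"]

def extract_changed_rows(conn, last_load_time):
    """Pull only the required columns for rows changed since the last load."""
    query = (
        f"SELECT {', '.join(REQUIRED_COLUMNS)} FROM customers "
        "WHERE updated_at > ? ORDER BY customer_id"
    )
    return conn.execute(query, (last_load_time,)).fetchall()

# An in-memory SQLite database stands in for the transactional source system.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (customer_id INTEGER, name TEXT, "
    "internal_notes TEXT, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "not needed downstream", "2024-01-01"),
        (2, "Grace", "not needed downstream", "2024-03-15"),
    ],
)

# Only rows updated after the previous load are extracted,
# and the internal_notes column is never pulled.
rows = extract_changed_rows(source, "2024-02-01")
print(rows)
```

Filtering on a last-load timestamp is one common way to meet update requirements without re-reading the whole source table; other approaches, such as change logs, may suit a given source system better.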
The next data warehousing architectural layer to consider is the staging area process. Since most data from source systems requires data validation and cleansing, it is important to create a landing zone where the source data resides before loading into the business rules layer of the data warehouse. The staging area maintains raw data feeds from source systems which are typically timestamped to ensure data is current.
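As a rough sketch of such a landing zone, the following keeps each raw record untouched and attaches a load timestamp so that the most recent feed can be identified. The function names, field names, and sample records are illustrative assumptions, not part of any particular warehouse product.

```python
from datetime import datetime, timezone

def stage_records(raw_records, source_name, loaded_at=None):
    """Store raw records unchanged, tagged with their source and load time."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    return [
        {"source": source_name, "loaded_at": loaded_at, "payload": dict(record)}
        for record in raw_records
    ]

def latest_batch(staged):
    """Return only the rows that belong to the most recent load."""
    if not staged:
        return []
    newest = max(row["loaded_at"] for row in staged)
    return [row for row in staged if row["loaded_at"] == newest]

# Two feeds from the same hypothetical source, loaded a day apart.
old_feed = stage_records(
    [{"id": 1, "city": "Bostn"}], "crm",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
new_feed = stage_records(
    [{"id": 1, "city": "Boston"}], "crm",
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)

current = latest_batch(old_feed + new_feed)
print(current[0]["payload"])
```

Because the staged payload is never modified, the raw feed remains available for auditing even after later layers clean or reject records.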
The data enrichment or business rule process is where the data is cleaned to meet the desired outcome of the data warehouse. A good example of this approach is the use of address cleanup tools: if the source system contains bad addresses, the data enrichment process runs each address from the raw dataset through a business rule system that corrects it. This is also the stage at which inaccurate data is deleted or changed to ensure accuracy and completeness within the data warehouse.
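A toy version of such a business rule might look like the sketch below. The abbreviation table and the rule for rejecting records are simplifying assumptions made for illustration; real address cleanup tools rely on much richer reference data.

```python
import re

# Hypothetical rule table; a real address cleanup tool would be far richer.
ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "rd": "Road"}

def clean_address(raw):
    """Normalize whitespace and expand a few street-type abbreviations."""
    words = re.sub(r"\s+", " ", raw.strip()).split(" ")
    cleaned = []
    for word in words:
        key = word.rstrip(".").lower()
        if key in ABBREVIATIONS:
            cleaned.append(ABBREVIATIONS[key])
        elif word.isalpha():
            cleaned.append(word.title())
        else:
            cleaned.append(word)  # leave house numbers and the like as-is
    return " ".join(cleaned)

def enrich(records):
    """Apply the address rule; set aside records with no usable address."""
    accepted, rejected = [], []
    for record in records:
        address = (record.get("address") or "").strip()
        if not address:
            rejected.append(record)
            continue
        accepted.append({**record, "address": clean_address(address)})
    return accepted, rejected

raw_rows = [
    {"id": 1, "address": "  12  main st. "},
    {"id": 2, "address": ""},
]
accepted, rejected = enrich(raw_rows)
print(accepted)
print(rejected)
```

Keeping rejected records in a separate list, rather than silently dropping them, makes it possible to review what the business rules removed.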
The next layer to consider is the data architecture layer. This area is where the actual design or schema of the enterprise data warehouse is completed. Data warehousing is not a combination of all datasets within a company, but instead is a newly defined database created to allow an overview of all business entities within the company.
This requires that the data architecture answer the questions that the company will ask in the areas of business intelligence and data mining. By building the data architecture this way, raw datasets are transformed into fact tables that allow users to run ad hoc reports on the entire business view rather than on a specific database. This is also the area that holds metadata about the data from the raw systems, which could include the source system name or primary keys.
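To make the idea concrete, here is a small sketch that merges raw rows from two source systems into one fact table, carrying the source system name and source primary key as metadata. The system names, field names, and amounts are invented for illustration.

```python
from collections import defaultdict

def build_sales_facts(raw_by_source):
    """Flatten raw rows from every source into one fact table with lineage."""
    facts = []
    for source_name, rows in raw_by_source.items():
        for row in rows:
            facts.append({
                "customer": row["customer"],
                "amount": row["amount"],
                "source_system": source_name,  # metadata: where the row came from
                "source_key": row["id"],       # metadata: primary key in the source
            })
    return facts

def total_by_customer(facts):
    """An ad hoc report across all sources: total sales per customer."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact["customer"]] += fact["amount"]
    return dict(totals)

# Hypothetical raw datasets from two separate source systems.
raw = {
    "web_orders": [{"id": 10, "customer": "Acme", "amount": 120.0}],
    "store_pos": [
        {"id": 7, "customer": "Acme", "amount": 30.0},
        {"id": 8, "customer": "Birch", "amount": 45.5},
    ],
}

facts = build_sales_facts(raw)
print(total_by_customer(facts))
```

The report spans both source systems in a single query, while the `source_system` and `source_key` fields let any fact be traced back to its original record.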
The next area to consider is business intelligence and reporting requirements. This level can be thought of as the user-facing requirement for data warehousing. Typically, this area contains predefined reports, ad-hoc reporting capabilities, and business dashboards or alerts. The business intelligence layer typically receives the most consideration, as it is the only outward-facing component within the data warehouse.
The final level to consider is maintenance and general data storage requirements. As a data warehouse and its user base continue to grow, data storage must be rigorously managed and maintained. Also, when creating the data warehouse architecture, the project should make realistic estimates of the data storage capacity, bandwidth, and data access capacity that will be required. These requirements will be critical as the data warehouse becomes widely used throughout the enterprise.
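A back-of-the-envelope estimate of this kind can be sketched as follows. The compound growth model and every figure used here (starting size, growth rate, nightly load volume, and load window) are assumptions chosen purely for illustration; a real project would substitute its own measurements.

```python
def projected_storage_gb(current_gb, annual_growth_rate, years):
    """Project storage needs assuming steady compound annual growth."""
    return current_gb * (1 + annual_growth_rate) ** years

def required_throughput_mb_per_s(load_gb, window_hours):
    """Average transfer rate needed to finish a load inside its window."""
    return (load_gb * 1024) / (window_hours * 3600)

# Illustrative inputs only: 500 GB today, growing 40% per year for 3 years.
storage = projected_storage_gb(500, 0.40, 3)

# Illustrative inputs only: a 90 GB nightly load in a 2.5 hour window.
throughput = required_throughput_mb_per_s(90, 2.5)

print(f"Projected storage after 3 years: {storage:.0f} GB")
print(f"Required average throughput: {throughput:.2f} MB/s")
```

Estimates like these are only a starting point, since actual growth is rarely steady, but they give the project a defensible baseline for capacity and bandwidth planning.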