Exam 70-458 – Objective 4 – Design and implement a data warehouse

This entry is part 4 of 8 in the series Exam 70-458

Exam 70-458 – Objective 1 – Manage Data
Exam 70-458 – Objective 2 – Implement Security
Exam 70-458 – Objective 3 – Implement high availability
Exam 70-458 – Objective 4 – Design and implement a data warehouse
Exam 70-458 – Objective 5 – Extract and transform data
Exam 70-458 – Objective 6 – Load data
Exam 70-458 – Objective 7 – Configure and deploy SSIS solutions
Exam 70-458 – Objective 8 – Build Data Quality solutions

Design and implement dimensions

Design shared/conformed dimensions
- Conformed dimensions have same meaning to each fact table (http://www.jamesserra.com/archive/2011/11/conformed-dimensions/)
Determine whether you need support for slowly changing dimensions (http://msdn.microsoft.com/en-us/library/ms141715.aspx)
- If old values must be retained then you need this
Determine attributes (http://msdn.microsoft.com/en-us/library/ms174760.aspx)
Design hierarchies (http://msdn.microsoft.com/en-us/library/ms174935.aspx)
Determine whether you need star or snowflake schema (http://dwhlaureate.blogspot.com/2012/06/difference-between-star-snowflake.html)
- Star schema
  - Table for each dimension
  - Each table has all attributes for that dimension
  - More de-normalized
  - Every dimension is tied to the fact tables
  - Usually takes hit performance during ETL loads
- Snowflake schema
  - More normalized
  - Contains multiple tables per dimension
  - Usually takes hit performance on Analysis Services cubes
Determine the granularity of relationship with fact tables
- Fact tables hold the measures that the business uses
- Granularity defined as the the level of detail a fact has
- Should be the lowest level of detail that needs to examined, never load more detail than you need
Determine the need for auditing or lineage
- Can audit each update, when it was made, and can save load times
- Lineage – record on table where data came from or when it was added
Determine keys (business transactional or your own data warehouse/surrogate keys)
- Don’t change business keys, when from multiple sources use a surrogate key, when only one source can use business keys
Implement dimensions (http://www.jamesserra.com/archive/category/dimensions/)
- Conformed – Shared by multiple fact tables
- Non-Conformed – Table targeted for a single fact table, Used when the business uses different attributes
- Shared dimension – Used by multiple facts
- Degenerate dimension – Used by a single fact table, stored directly in fact table (http://technet.microsoft.com/en-us/library/ms175669.aspx)
- Time Dimensions – based on standard calendar or fiscal year (http://technet.microsoft.com/en-us/library/ms174832.aspx)
- Junk Dimensions – represent a dimensional value maintained as part of a fact table that does not have any other meaningful attributes
- Determining Dimension Granularity (http://msdn.microsoft.com/en-us/library/ms166573.aspx)
Implement data lineage of a dimension table
- Slowing Changing – The attributes are retained if updated (http://blogs.msdn.com/b/mattm/archive/2009/11/13/handling-slowly-changing-dimensions-in-ssis.aspx)
  - Type 1 – The history is not retain
  - Type 2 – Attribute change creates a new record, maintain complete historical record changes
  - Type 3 – Original attribute value recorded and latest value recorded with an effective date

Design and implement fact tables

Design a data warehouse that supports many to many relationship (http://msdn.microsoft.com/en-us/library/ms170463.aspx)
- Remember cannot be used in SSAS, will need to create intermediate table
Appropriately index a fact table using columnstore indexes (http://msdn.microsoft.com/en-us/library/gg492088.aspx)
- Used for read-mostly tables, updates that append data, partitioning permitted and index drop/rebuild, queries that reads lots of data
- Do not use if updated frequently, partition switching and rebuild indexes don’t fit your situation, or for small queries
- Only one per table
Partitioning (http://msdn.microsoft.com/en-us/library/ms188730(v=sql.120).aspx)
- Creating New Filegroups in DW (http://msdn.microsoft.com/en-us/library/ee796978)
- How to Implement Automatic Sliding Window in a Partitioned Table (http://msdn.microsoft.com/en-us/library/aa964122(SQL.90).aspx)
  - Create new empty partition for new data
  - Switch the first partition in the fact table into a temporary table one the same filegroup
  - Insert temp data into the archive data
  - Merge the deleted partition to remove it from the filegroup
  - Drop the temp table and remove the temporary filegroup
- Partitioning create function then schema
Additive measures– Can be summarized across all dimensions
Semi-additive measure – Can be summarized across some dimensions (http://technet.microsoft.com/en-us/library/ms175356.aspx)
Non-additive measures – Normally a calculated measure
Implement fact tables
- Grouping facts – make sure facts have the same dimension relationships in the fact table, no nulls to the dimension tables
- Fact tables should have all keys to the relating dimension tables
- Primary key, composite of dimension relationships
Determine the loading method for the fact tables
Implement data lineage of a fact table (see notes on dimension tables)
Design summary aggregation tables
- Record the data need to give you aggregations, ex. record count and sum, if you need an average

Preparation resources

Series Navigation<< Exam 70-458 – Objective 3 – Implement high availabilityExam 70-458 – Objective 5 – Extract and transform data >>

Exam 70-458 – Objective 4 – Design and implement a data warehouse

Like this:

Related

Leave a CommentCancel reply

Exam 70-458 – Objective 4 – Design and implement a data warehouse

Share this:

Like this:

Related

Related Posts

Exam 70-458 – Objective 7 – Configure and deploy SSIS solutions

Exam 70-458 – Objective 8 – Build Data Quality solutions

Leave a CommentCancel reply