- Exam 70-458 – Objective 1 – Manage Data
- Exam 70-458 – Objective 2 – Implement Security
- Exam 70-458 – Objective 3 – Implement high availability
- Exam 70-458 – Objective 4 – Design and implement a data warehouse
- Exam 70-458 – Objective 5 – Extract and transform data
- Exam 70-458 – Objective 6 – Load data
- Exam 70-458 – Objective 7 – Configure and deploy SSIS solutions
- Exam 70-458 – Objective 8 – Build Data Quality solutions
Design and implement dimensions
- Design shared/conformed dimensions
- Conformed dimensions have same meaning to each fact table (http://www.jamesserra.com/archive/2011/11/conformed-dimensions/)
- Determine whether you need support for slowly changing dimensions (http://msdn.microsoft.com/en-us/library/ms141715.aspx)
- If old values must be retained then you need this
- Determine attributes (http://msdn.microsoft.com/en-us/library/ms174760.aspx)
- Design hierarchies (http://msdn.microsoft.com/en-us/library/ms174935.aspx)
- Determine whether you need star or snowflake schema (http://dwhlaureate.blogspot.com/2012/06/difference-between-star-snowflake.html)
- Star schema
- Table for each dimension
- Each table has all attributes for that dimension
- More de-normalized
- Every dimension is tied to the fact tables
- Usually takes hit performance during ETL loads
- Snowflake schema
- More normalized
- Contains multiple tables per dimension
- Usually takes hit performance on Analysis Services cubes
- Star schema
- Determine the granularity of relationship with fact tables
- Fact tables hold the measures that the business uses
- Granularity defined as the the level of detail a fact has
- Should be the lowest level of detail that needs to examined, never load more detail than you need
- Determine the need for auditing or lineage
- Can audit each update, when it was made, and can save load times
- Lineage – record on table where data came from or when it was added
- Determine keys (business transactional or your own data warehouse/surrogate keys)
- Don’t change business keys, when from multiple sources use a surrogate key, when only one source can use business keys
- Implement dimensions (http://www.jamesserra.com/archive/category/dimensions/)
- Conformed – Shared by multiple fact tables
- Non-Conformed – Table targeted for a single fact table, Used when the business uses different attributes
- Shared dimension – Used by multiple facts
- Degenerate dimension – Used by a single fact table, stored directly in fact table (http://technet.microsoft.com/en-us/library/ms175669.aspx)
- Time Dimensions – based on standard calendar or fiscal year (http://technet.microsoft.com/en-us/library/ms174832.aspx)
- Junk Dimensions – represent a dimensional value maintained as part of a fact table that does not have any other meaningful attributes
- Determining Dimension Granularity (http://msdn.microsoft.com/en-us/library/ms166573.aspx)
- Implement data lineage of a dimension table
- Slowing Changing – The attributes are retained if updated (http://blogs.msdn.com/b/mattm/archive/2009/11/13/handling-slowly-changing-dimensions-in-ssis.aspx)
- Type 1 – The history is not retain
- Type 2 – Attribute change creates a new record, maintain complete historical record changes
- Type 3 – Original attribute value recorded and latest value recorded with an effective date
- Slowing Changing – The attributes are retained if updated (http://blogs.msdn.com/b/mattm/archive/2009/11/13/handling-slowly-changing-dimensions-in-ssis.aspx)
Design and implement fact tables
- Design a data warehouse that supports many to many relationship (http://msdn.microsoft.com/en-us/library/ms170463.aspx)
- Remember cannot be used in SSAS, will need to create intermediate table
- Appropriately index a fact table using columnstore indexes (http://msdn.microsoft.com/en-us/library/gg492088.aspx)
- Used for read-mostly tables, updates that append data, partitioning permitted and index drop/rebuild, queries that reads lots of data
- Do not use if updated frequently, partition switching and rebuild indexes don’t fit your situation, or for small queries
- Only one per table
- Partitioning (http://msdn.microsoft.com/en-us/library/ms188730(v=sql.120).aspx)
- Creating New Filegroups in DW (http://msdn.microsoft.com/en-us/library/ee796978)
- How to Implement Automatic Sliding Window in a Partitioned Table (http://msdn.microsoft.com/en-us/library/aa964122(SQL.90).aspx)
- Create new empty partition for new data
- Switch the first partition in the fact table into a temporary table one the same filegroup
- Insert temp data into the archive data
- Merge the deleted partition to remove it from the filegroup
- Drop the temp table and remove the temporary filegroup
- Partitioning create function then schema
- Additive measures– Can be summarized across all dimensions
- Semi-additive measure – Can be summarized across some dimensions (http://technet.microsoft.com/en-us/library/ms175356.aspx)
- Non-additive measures – Normally a calculated measure
- Implement fact tables
- Grouping facts – make sure facts have the same dimension relationships in the fact table, no nulls to the dimension tables
- Fact tables should have all keys to the relating dimension tables
- Primary key, composite of dimension relationships
- Determine the loading method for the fact tables
- Implement data lineage of a fact table (see notes on dimension tables)
- Design summary aggregation tables
- Record the data need to give you aggregations, ex. record count and sum, if you need an average
Preparation resources