- Exam 70-458 – Objective 1 – Manage Data
- Exam 70-458 – Objective 2 – Implement Security
- Exam 70-458 – Objective 3 – Implement high availability
- Exam 70-458 – Objective 4 – Design and implement a data warehouse
- Exam 70-458 – Objective 5 – Extract and transform data
- Exam 70-458 – Objective 6 – Load data
- Exam 70-458 – Objective 7 – Configure and deploy SSIS solutions
- Exam 70-458 – Objective 8 – Build Data Quality solutions
Design data flow
- Define data sources and destinations (http://technet.microsoft.com/en-us/library/ms140080%28v=sql.110%29.aspx)
- Distinguish blocking and non-blocking transformations
- Use different methods to pull out changed data from data sources
-
Change Data Capture
- Run sys.sp_cdc_enable_db on database then for each table run sys.sp_cdc_enable_table
- There are CDC transforms in SSIS
- Change Tracking
- Only keeps track that record has changed, no details
- ALTER DATABASE <db_name> SET CHANGE_TRACKING = ON
- Alter table <tablename> ENABLE CHANGE_TRACKING
- Uses extraction log to keep versions
-
- Determine appropriate data flow components (http://msdn.microsoft.com/en-us/library/ms137612(v=sql.105).aspx)
- Determine the need for supporting Slowly Changing Dimensions (SCD) (http://msdn.microsoft.com/en-us/library/ms141715.aspx)
- Determine whether to use SQL Joins or SSIS lookup or merge join transformations
-
Merge Tranforms
- Merge (similar to UNION)
- Input must be sorted in same order
- Sort in data source when possible
- Mark source as sorted in properties
- Input column metadata must be same
- Merge join
- Supports inner, left, and full
- Requires both inputs to sorted in the same order
- Join columns must have same metadata
- Merge (similar to UNION)
- Lookup Transforms
- Can use OLEDB connection or cache connection
- Can treat unmatched rows as errors or regular output
- Can use caching for rows matched or to pre-load to reduce trips to data source
- Data Modification Transforms
- Derived column
- Data conversion
- Character map
-
- Batch processing vs. row by row processing
- Determine the appropriate transform to use for a specific task
- Slowly Changing Dimension Transformation (http://msdn.microsoft.com/en-us/ms141715)
- Changing attribute – overwrite existing records
- Historical attribute – create new records for changes
- Fixed attribute – column cannot change
- Inferred member – reference dimension value not loaded yet
- Slowly Changing Dimension Transformation (http://msdn.microsoft.com/en-us/ms141715)
- Determine the need and method for identity mapping and deduplicating
- Fuzzy lookup, fuzzy grouping, and Data Quality Services (DQS) transformation
- Fuzzy Lookup Transformation (http://msdn.microsoft.com/en-us/library/ms137786.aspx)
- Data cleaning tasks such as standardizing data, correcting data, and providing missing values.
- Fuzzy Grouping Transformation (http://msdn.microsoft.com/en-us/library/ms141764.aspx)
- Data cleaning tasks by identifying rows of data that are likely to be duplicates and selecting a canonical row of data to use in standardizing the data
- DQS Cleansing Transformation (http://msdn.microsoft.com/en-us/library/ee677619.aspx)
- Correct data from a connected data source, by applying approved rules that were created for the connected data source or a similar data source.
- Fuzzy Lookup Transformation (http://msdn.microsoft.com/en-us/library/ms137786.aspx)
- Determine the need for text mining
- Determine the need for custom data sources, destinations, and transforms
- Determine what to do with erroneous rows (http://msdn.microsoft.com/en-us/library/ms140083.aspx)
- Every transform can output an error
- Types of errors include data truncation, no matches, and type conversions
- Each type of error can be handled with different responses
- OnError Event
- Determine auditing needs
- Determine sampling needs for data mining
- Trusted/authoritative data sources, including warehouse metadata
Implement data flow
- Debug data flow (http://msdn.microsoft.com/en-us/library/ms137944.aspx)
- Data Viewers – display data between two components
- Row Counts
- Progress Reporting – Color coded for success, errors, running, or waiting
- Use the appropriate data flow component
- SQL/SSIS data transformation
- Split
- Mulit-cast – allows you to send the same data to multiple places
- Conditional – allows split data to different destinations
- Split
- Create SSIS packages that support slowly changing dimensions (http://msdn.microsoft.com/en-us/library/ms141662.aspx)
- Use the Lookup task in SSIS
- Map identities using SSIS Fuzzy Lookup
- Specify a data source and destination
- Use data flows
- Different categories of transformations (http://msdn.microsoft.com/en-us/library/ms141713(v=sql.110).aspx)
- Business Intelligence
- Row
- Rowset
- Split and Join
- Auditing
- Custom
- Read, transform, and load data
- Understand which transforms to use to accomplish a specific business task
- Data correction transformation
- Performance tune an SSIS dataflow
- Optimize Integration Services packages for speed of execution
- Maintain data integrity, including good data flow
Implement script tasks in SSIS
- Determine whether it is appropriate to use a script task (http://msdn.microsoft.com/en-us/library/ms141752.aspx)
- Use when you need to use data sources not available in Connection Manager
- Extend the capability of a control flow (http://msdn.microsoft.com/en-us/library/ms136127.aspx)
- Perform a custom action as needed (not on every row) during a control flow
Preparation resources