Exam 70-458 – Objective 5 – Extract and transform data

This entry is part 5 of 8 in the series Exam 70-458

 

Design data flow

  • Define data sources and destinations (http://technet.microsoft.com/en-us/library/ms140080%28v=sql.110%29.aspx)
  • Distinguish blocking and non-blocking transformations
  • Use different methods to pull out changed data from data sources
    • Change Data Capture

      • Run sys.sp_cdc_enable_db on database then for each table run sys.sp_cdc_enable_table
      • There are CDC transforms in SSIS
    • Change Tracking
      • Only keeps track that record has changed, no details
      • ALTER DATABASE <db_name> SET CHANGE_TRACKING = ON
      • Alter table <tablename> ENABLE CHANGE_TRACKING
      • Uses extraction log to keep versions 
  • Determine appropriate data flow components (http://msdn.microsoft.com/en-us/library/ms137612(v=sql.105).aspx)
  • Determine the need for supporting Slowly Changing Dimensions (SCD) (http://msdn.microsoft.com/en-us/library/ms141715.aspx)
  • Determine whether to use SQL Joins or SSIS lookup or merge join transformations
    • Merge Tranforms

      • Merge (similar to UNION)
        • Input must be sorted in same order
        • Sort in data source when possible
        • Mark source as sorted in properties
        • Input column metadata must be same
      • Merge join
        • Supports inner, left, and full
        • Requires both inputs to sorted in the same order
        • Join columns must have same metadata
    • Lookup Transforms
      • Can use OLEDB connection or cache connection
      • Can treat unmatched rows as errors or regular output
      • Can use caching for rows matched or to pre-load to reduce trips to data source
    • Data Modification Transforms
      • Derived column
      • Data conversion
      • Character map 
  • Batch processing vs. row by row processing
  • Determine the appropriate transform to use for a specific task
    • Slowly Changing Dimension Transformation (http://msdn.microsoft.com/en-us/ms141715)
      • Changing attribute – overwrite existing records
      • Historical attribute – create new records for changes
      • Fixed attribute – column cannot change
      • Inferred member – reference dimension value not loaded yet
  • Determine the need and method for identity mapping and deduplicating
  • Fuzzy lookup, fuzzy grouping, and Data Quality Services (DQS) transformation
  • Determine the need for text mining
  • Determine the need for custom data sources, destinations, and transforms
  • Determine what to do with erroneous rows (http://msdn.microsoft.com/en-us/library/ms140083.aspx)
    • Every transform can output an error
    • Types of errors include data truncation, no matches, and type conversions
    • Each type of error can be handled with different responses
    • OnError Event 
  • Determine auditing needs
  • Determine sampling needs for data mining
  • Trusted/authoritative data sources, including warehouse metadata

 

Implement data flow

  • Debug data flow (http://msdn.microsoft.com/en-us/library/ms137944.aspx)
    • Data Viewers – display data between two components
    • Row Counts
    • Progress Reporting – Color coded for success, errors, running, or waiting
  • Use the appropriate data flow component
  • SQL/SSIS data transformation
    • Split
      • Mulit-cast – allows you to send the same data to multiple places
      • Conditional – allows split data to different destinations
  • Create SSIS packages that support slowly changing dimensions (http://msdn.microsoft.com/en-us/library/ms141662.aspx)
  • Use the Lookup task in SSIS
  • Map identities using SSIS Fuzzy Lookup
  • Specify a data source and destination
  • Use data flows
  • Different categories of transformations (http://msdn.microsoft.com/en-us/library/ms141713(v=sql.110).aspx)
    • Business Intelligence
    • Row
    • Rowset
    • Split and Join
    • Auditing
    • Custom
  • Read, transform, and load data
  • Understand which transforms to use to accomplish a specific business task
  • Data correction transformation
  • Performance tune an SSIS dataflow
  • Optimize Integration Services packages for speed of execution
  • Maintain data integrity, including good data flow

 

Implement script tasks in SSIS

 

Preparation resources

Series Navigation<< Exam 70-458 – Objective 4 – Design and implement a data warehouseExam 70-458 – Objective 6 – Load data >>

Related Posts

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.