Task Factory Dimension Merge SCD Transform
Dimension Merge SCD Transform
Description
Transform Icon | Transform Description |
---|---|
The Dimension Merge SCD Transform is a custom data flow component used within SSIS to handle slowly changing dimension processing commonly used in data warehouses. |
While SSDT/BIDS provides a standard SCD component, you may experience the following limitations:
- Slow performance
- Re-running the SCD Wizard can destroy parts of the data flow
- The component can't be used against all data providers
- Difficulty in troubleshooting the standard component’s decision making abilities
Task Factory’s Dimension Merge SCD Transform maintains the functionality of the original SCD component while also providing:
- Better performance
- Non-destructive editing capabilities
- The ability to use a variety of data providers
- Improved UI that empowers you to customize the component’s configuration and view/troubleshoot decision-making
- Input/Output auditing
- Improved logging capabilities
Setup
When using the Dimension Merge SCD Transform, you begin by connecting two of the following:
Connector | Description |
---|---|
Existing Dimension | This input provides data from a data warehouse's dimension table and must include the housekeeping columns (SCD2, effective, and expiry). Note: Although optional, it's recommended that you sort this input on the business key. |
Source System | This input provides data from the source OLTP system needed to compare and insert/update to the data warehouse's dimension table. Source system and existing dimension data types must match. Note: Although optional, it's recommended that you sort this input on the business key. |
Special Members | (Optional) In typical dimensions, special or unknown members may be used to link fact table rows to NULL or unknown dimension values. The special members input allows you to specify and maintain these special members in a data warehouse dimension table. (Special member rows should be treated as Type 1 changes to avoid duplicate versions of unknown values). Like the previous two inputs, it's recommended the Special Members input also be sorted on the business key. |
Memory Optimized Property
As of version 4.2.0.402, the Dimension Merge Slowly Changing Dimensions component includes a new Memory Optimized property. This new property was introduced due to the retention of a large number of records that remained in memory in the original behavior. This caused memory usage of the package to grow significantly over the package execution time. By requiring the inputs to be sorted, this new model can process and remove rows from memory much earlier which reduces the overall memory footprint, and speeds up execution.
Note: The following additional information about the Memory Optimized property:
- When set to false, the component operates as it did in the past. Upgraded packages have this property set to False by default.
- New packages have this property set to True and have to meet certain restrictions (listed below). Otherwise you may encounter errors.
- If the inputs are not marked as sorted, the component uses the original behavior, even if Memory Optimized is set to True.
- If the property is set to True, all input rows must be sorted by the Business Key using a binary sort order. For example:
Existing Dimension Input Column Definitions Tab
From this tab, you can identify how each column in the existing dimension participates in the SCD transform. Columns can be defined as one of the following:
Column | Description |
---|---|
Not Used | Declares the column to not be used in the SCD processing. |
Business Key | Identifies the business key column. |
Surrogate Key | Identifies the surrogate key column. |
SCD0 (Send To Invalid Input) | Identifies a column for Type 0 changes. Any change is sent to the Invalid Input's output. |
SCD0 (Ignore Changes) | Identifies a column for Type 0 changes. Any change is ignored and sent to the Unchanged output. |
SCD1 | Identifies a column for Type 1 changes (dimension is overwritten with new values). |
SCD2 | Identifies a column for Type 2 changes (retains historical data). |
SCD2 Current Record | A boolean column that can be used to identify the current record among related historical data. |
Audit Column - Last Changed | A column that can be used to identify the moment a row was last changed. |
Audit Column - Row Added | A column that can be used to identify a row that has been added. |
Audit Column - SCD1 Update | A column that can be used to identify an SCD1 updated row. |
Audit Column - SCD2 Change | A column that can be used to identify a row that has undergone an SCD2 change. |
Inferred Member Indicator | A DT_WSTR column used to identify fact table references to a dimension that hasn't been loaded. |
Column Mapping Tab
In the Column Mapping tab, you can specify the mapping between the existing dimension and source system.
Row Change Detection Tab
In the Row Change Detection tab, you can define how each column detects changes including case sensitivity, leading/trailing space, and columns with NULL values.
SCD2 Date Handling Tab
In the SCD2 Date Handling tab, you are given a variety of options for handling SCD2 dates including:
- Assigning now and expiry to system or user-created variables.
- The ability to select a granularity that ranges from day to millisecond (whose precision is also configurable).
- The ability to define a fixed-input expiry date.
For example: setting the granularity to millisecond with a last record's expiry date set to MS SQL DateTime Max requires a datetime entry of 9999/12/31 23:59:59.997. If a developer mistakenly uses .999, this causes a mismatch and sends unchanged records to the New output.
Surrogate Key Handling Tab
In this tab, you are given the option to manage surrogate key assignments to new rows or pass through the existing surrogate key and let the SQL identity column manage the assignment.
Inferred Member Behavior Tab
Use the Inferred Member Behaviour tab to confirm any inferred members that you have defined in the Existing Dimension Input Column Definitions tab. In SSIS, an inferred member acts as a placeholder value for currently unknown, or unloaded information. Inferred members are often used to replace these unknown or unloaded values that you don't want to treat as a NULL value. Inferred member values are replaced once the desired data loads into the package.
If you have tagged any column as an inferred member, those columns will display in the Inferred Member Behavior tab. Enter Y to indicate the value as True and confirm that the selected column is an inferred member, or select enter N to indicate that the value is False and that the column is not an inferred member.
Output Column Selection Tab
In the Output Column Selection Tab, you can select which outputs to attach to the component and which columns are included/excluded in the output.
The final step in configuring the Dimension Merge SCD is to attach all of the outputs configured in the Output Column Selection tab. The following is a list of all available outputs as well as a brief description:
Output | Description |
---|---|
Auditing | The Auditing output processes Input/output counter information configured in the Auditing tab. If you don't want to include the Auditing output, you can dismiss the warning by selecting the Auditing output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
Deleted | The Deleted output manages the records that are present and active within the Existing Dimension, but no longer appear within the Source System data stream. Records passed down the Deleted output can either be expired or deleted from the Existing Dimension. Expiration/deletion from the Existing Dimension are based on the Surrogate Key column(s). If you want your Deleted output to be expired, you can redirect your deleted rows to the Expired SCD2 with SCD1 Changes output. Additionally, if you don't want to include the Deleted output, you can dismiss the warning by selecting the Deleted output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
Expired SCD2 with SCD1 Changes | The Expired SCD2 with SCD1 Changes output expires current SCD2 records that are being updated in the existing dimension. Records passed down the Expired SCD2 with SCD1 Changes are expired from the Existing Dimension based on the Surrogate Key column(s). If you don't want to include the Expired SCD2 with SCD1 Changes output, you can dismiss the warning by selecting the Expired SCD2 with SCD1 Changes output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
Invalid Input | The Invalid Input output handles any invalid rows that match the configured rules found on the Row Change Detection tab (such as NULL values in columns not configured as Nullable). If you don't want to include the Invalid Input output, you can dismiss the warning by selecting the Invalid Input output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
New | The New output handles records that are not present within the Existing Dimension, but found within the Source System. Any record passed down the New output needs to be inserted into the Existing Dimension. If you don't want to include the New output, you can dismiss the warning by selecting the New output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
New SCD2 | The New SCD2 output works along side the Expired SCD2 with SCD1 Changes output to insert a new record for historical records. If you don't want to have an additional output, you can redirect the New SCD2 output to the New output. If you don't want to include the New SCD2 output, you can dismiss the warning by selecting the New SCD2 output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
Statistics | The Statistics output processes statistical information for the current package execution such as the number of Source System Input records, Existing Dimension Input records, and Interval Start/End times. If you don't want to include the Statistics output, you can dismiss the warning by selecting the Statistics output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
Unchanged | The Unchanged output is an optional output that outputs all rows that have no changes between the Existing Dimension and the Source System data stream, or rows, that do not appear in the Source System data stream and are expired in the Existing Dimension. If you don't want to include the Unchanged output, you can dismiss the warning by selecting the Unchanged output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
Updated SCD1 | The Updated SCD1 output records that contain updated values to non-historical columns. If you don't want to include the Updated SCD1 output, you can dismiss the warning by selecting the Updated SCD1 output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button. |
Auditing Tab
In the Auditing tab, you can set variables to display dimension, source system, and output counters that can be analyzed post-execution.
Logging Tab
Use the Logging tab to provide more descriptive information of the execution process viewed in the Execution Results/Progress tab within SSDT/BIDS.
Performance Tab
In this tab, you can alter performance settings including:
Option | Description | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Warning Suppression | Dismisses warnings when circumstances require a condition be ignored (such as not sorting input business keys). | ||||||||||
Timeout | Developers can define a time (in seconds) to end execution for debugging. | ||||||||||
Threading | Sets the number of threads available in two pools - one for matching keys and the other for processing rows. Users can maximize processing speed by configuring the component as follows:
|
||||||||||
Hashing | Security feature that allows you to determine which hash algorithm to use during processing. | ||||||||||
Use Memory Optimized Mode | Sets the Memory Optimized property. For more information, see Memory Optimized Property under the Setup tab. | ||||||||||
Guid Key Sort Algorithm | Identifies the method in which the input data is sorted when using Guid as relationship keys. (This option only appears when Guid keys are used.) The following methods can be used:
|
Support Articles
Table Locking with SCD Merge
The above article provides a solution for slowly changing Dimension Merge components that appear to hang and not complete execution.
Additional Videos
Dimension Merge SCD - Configure Native SCD Example
Dimension Merge SCD - How to use it
Dimension Merge SCD - Intro to Sample Package
Dimension Merge SCD - Intro to test data
Dimension Merge SCD - T-SQL Merge Example