Documentation forTask Factory

Task Factory Dimension Merge SCD Transform

Dimension Merge SCD Transform

Description

Note:  Version 4.2.0.405 introduced the memory optimized property. This is a significant change, and we encourage you to read the memory optimized property section below before configuring the component.
Transform Icon Transform Description
Task Factory Dimension Merge SCD Transform Icon The Dimension Merge SCD Transform is a custom data flow component used within SSIS to handle slowly changing dimension processing commonly used in data warehouses.

While SSDT/BIDS provides a standard SCD component, you may experience  the following limitations:

  • Slow performance
  • Re-running the SCD Wizard can destroy parts of the data flow
  • The component can't be used against all data providers
  • Difficulty in troubleshooting the standard component’s decision making abilities

Task Factory’s Dimension Merge SCD Transform maintains the functionality of the original SCD component while also providing:

  • Better performance
  • Non-destructive editing capabilities
  • The ability to use a variety of data providers
  • Improved UI that empowers you to customize the component’s configuration and view/troubleshoot decision-making
  • Input/Output auditing
  • Improved logging capabilities

Setup

When using the Dimension Merge SCD Transform, you begin by connecting two of the following:

Connector  Description
Existing Dimension This input provides data from a data warehouse's dimension table and must include the housekeeping columns (SCD2, effective, and expiry).

Note:  Although optional, it's recommended that you sort this input on the business key.
Source System This input provides data from the source OLTP system needed to compare and insert/update to the data warehouse's dimension table. Source system and existing dimension data types must match.

Note:  Although optional, it's recommended that you sort this input on the business key.
Special Members (Optional) In typical dimensions, special or unknown members may be used to link fact table rows to NULL or unknown dimension values. The special members input allows you to specify and maintain these special members in a data warehouse dimension table. (Special member rows should be treated as Type 1 changes to avoid duplicate versions of unknown values). Like the previous two inputs, it's recommended the Special Members input also be sorted on the business key.

Task Factory Input Output Selection window

Memory Optimized Property

Task Factory Dimension Merge Slowly Changing Dimension Memory Optimized

As of version 4.2.0.402, the Dimension Merge Slowly Changing Dimensions component includes a new Memory Optimized property. This new property was introduced due to the retention of a large number of records that remained in memory in the original behavior. This caused memory usage of the package to grow significantly over the package execution time. By requiring the inputs to be sorted, this new model can process and remove rows from memory much earlier which reduces the overall memory footprint, and speeds up execution.

Note:  The following additional information about the Memory Optimized property:

  • When set to false, the component operates as it did in the past. Upgraded packages have this property set to False by default.
  • New packages have this property set to True and have to meet certain restrictions (listed below). Otherwise you may encounter errors.
  • If the inputs are not marked as sorted, the component uses the original behavior, even if Memory Optimized is set to True.
  • If the property is set to True, all input rows must be sorted by the Business Key using a binary sort order. For example:
    Task Factory Binary sort order example
Note:  The ORDER BY and COLLATE Latin1_general_bin2 clauses on this product dimension. This should be used when sorting within the source query and setting the Is Sorted/Sort Key properties in the Advanced Editor. When using a file as input, you need to add the SSIS sort component between the source and Dimension Merge component. This is a significant processing change, and it's recommended that you test your packages with both settings before deployment to ensure performance is optimized based on your dataset and configuration.

Existing Dimension Input Column Definitions Tab

From this tab, you can identify how each column in the existing dimension participates in the SCD transform. Columns can be defined as one of the following:

Note:  You may see an error at the bottom stating there is no Business Key selected, the Business Key can be selected from the SCD Column Type column.
Column Description
Not Used Declares the column to not be used in the SCD processing.
Business Key Identifies the business key column.
Surrogate Key Identifies the surrogate key column.
SCD0 (Send To Invalid Input) Identifies a column for Type 0 changes. Any change is sent to the Invalid Input's output.
SCD0 (Ignore Changes) Identifies a column for Type 0 changes. Any change is ignored and sent to the Unchanged output.
SCD1 Identifies a column for Type 1 changes (dimension is overwritten with new values).
SCD2 Identifies a column for Type 2 changes (retains historical data).
SCD2 Current Record A boolean column that can be used to identify the current record among related historical data.
Audit Column - Last Changed A column that can be used to identify the moment a row was last changed.
Audit Column - Row Added A column that can be used to identify a row that has been added.
Audit Column - SCD1 Update A column that can be used to identify an SCD1 updated row.
Audit Column - SCD2 Change A column that can be used to identify a row that has undergone an SCD2 change.
Inferred Member Indicator A DT_WSTR column used to identify fact table references to a dimension that hasn't been loaded.

Task Factory Dimension Merge SCD Existing Dimension Input Column Definitions

Column Mapping Tab

In the Column Mapping tab, you can specify the mapping between the existing dimension and source system. Task Factory Dimension Merge SCD Column Mapping

Row Change Detection Tab

In the Row Change Detection tab, you can define how each column detects changes including case sensitivity, leading/trailing space, and columns with NULL values. 

Note:  The Dimension Merge SCD redirects columns containing NULL values to the Invalid Input output unless that column is configured as Nullable.

Task Factory Dimension Merge SCD Row Change Detection

SCD2 Date Handling Tab

In the SCD2 Date Handling tab, you are given a variety of options for handling SCD2 dates including:

  • Assigning now and expiry to system or user-created variables.
  • The ability to select a granularity that ranges from day to millisecond (whose precision is also configurable).
  • The ability to define a fixed-input expiry date.
Important:  You should pay careful attention to the example located to the right of the last record's expiry date. If the expiry column located in your dimension does not match the format of the example, this can cause unchanged rows to be sent to the New output due to the mismatch. 

For example: setting the granularity to millisecond with a last record's expiry date set to MS SQL DateTime Max requires a datetime entry of 9999/12/31 23:59:59.997. If a developer mistakenly uses .999, this causes a mismatch and sends unchanged records to the New output.

Task Factory Dimension Merge SCD SCD 2 Date Handling

Surrogate Key Handling Tab

In this tab, you are given the option to manage surrogate key assignments to new rows or pass through the existing surrogate key and let the SQL identity column manage the assignment. 

Note:  It's recommended to manage the surrogate key assignment somewhere else (first option) for optimal performance.

Task Factory Dimension Merge SCD Surrogate Key Handling

Inferred Member Behavior Tab

Use the Inferred Member Behaviour tab to confirm any inferred members that you have defined in the Existing Dimension Input Column Definitions tab. In SSIS, an inferred member acts as a placeholder value for currently unknown, or unloaded information. Inferred members are often used to replace these unknown or unloaded values that you don't want to treat as a NULL value. Inferred member values are replaced once the desired data loads into the package.

If you have tagged any column as an inferred member, those columns will display in the Inferred Member Behavior tab. Enter Y  to indicate the value as True and confirm that the selected column is an inferred member, or select enter to indicate that the value is False and that the column is not an inferred member. 

Additional Information: For more information about Inferred Members in SSIS, see the Inferred Dimension Members MSDN article.

Task Factory Dimension Merge SCD Inferred Member Behaviour

Output Column Selection Tab

In the Output Column Selection Tab, you can select which outputs to attach to the component and which columns are included/excluded in the output.

Task Factory Dimension Merge SCD Output Column Selection

The final step in configuring the Dimension Merge SCD is to attach all of the outputs configured in the Output Column Selection tab. The following is a list of all available outputs as well as a brief description:

Note:  Once outputs are attached, re-open the Dimension Merge SCD Transform to dismiss warnings referencing unattached outputs. Keep in mind that warnings remain for any output that has been configured in the Output Column Selection tab that does not have the Don't warn that this output isn't attached button selected.
Output Description
Auditing The Auditing output processes Input/output counter information configured in the Auditing tab. If you don't want to include the Auditing output, you can dismiss the warning by selecting the Auditing output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.
Deleted The Deleted output manages the records that are present and active within the Existing Dimension, but no longer appear within the Source System data stream. Records passed down the Deleted output can either be expired or deleted from the Existing Dimension. Expiration/deletion from the Existing Dimension are based on the Surrogate Key column(s). If you want your Deleted output to be expired, you can redirect your deleted rows to the Expired SCD2 with SCD1 Changes output. Additionally, if you don't want to include the Deleted output, you can dismiss the warning by selecting the Deleted output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.
Expired SCD2 with SCD1 Changes The Expired SCD2 with SCD1 Changes output expires current SCD2 records that are being updated in the existing dimension. Records passed down the Expired SCD2 with SCD1 Changes are expired from the Existing Dimension based on the Surrogate Key column(s). If you don't want to include the Expired SCD2 with SCD1 Changes output, you can dismiss the warning by selecting the Expired SCD2 with SCD1 Changes output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.
Invalid Input The Invalid Input output handles any invalid rows that match the configured rules found on the Row Change Detection tab (such as NULL values in columns not configured as Nullable). If you don't want to include the Invalid Input output, you can dismiss the warning by selecting the Invalid Input output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.
New The New output handles records that are not present within the Existing Dimension, but found within the Source System. Any record passed down the New output needs to be inserted into the Existing Dimension. If you don't want to include the New output, you can dismiss the warning by selecting the New output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.
New SCD2 The New SCD2 output works along side the Expired SCD2 with SCD1 Changes output to insert a new record for historical records. If you don't want to have an additional output, you can redirect the New SCD2 output to the New output. If you don't want to include the New SCD2 output, you can dismiss the warning by selecting the New SCD2 output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.
Statistics The Statistics output processes statistical information for the current package execution such as the number of Source System Input records, Existing Dimension Input records, and Interval Start/End times. If you don't want to include the Statistics output, you can dismiss the warning by selecting the Statistics output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.
Unchanged The Unchanged output is an optional output that outputs all rows that have no changes between the Existing Dimension and the Source System data stream, or rows, that do not appear in the Source System data stream and are expired in the Existing Dimension. If you don't want to include the Unchanged output, you can dismiss the warning by selecting the Unchanged output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.
Updated SCD1 The Updated SCD1 output records that contain updated values to non-historical columns. If you don't want to include the Updated SCD1 output, you can dismiss the warning by selecting the Updated SCD1 output found on the Output tab, select the Exclude All button, and then select the Don't warn that this output isn't attached button.

Task Factory Dimension Merge Slowly Changing Dimension Package

Auditing Tab

In the Auditing tab, you can set variables to display dimension, source system, and output counters that can be analyzed post-execution.

Task Factory Dimension Merge SCD Auditing

Logging Tab

Use the Logging tab to provide more descriptive information of the execution process viewed in the Execution Results/Progress tab within SSDT/BIDS.

Task Factory Dimension Merge SCD Logging

Performance Tab

In this tab, you can alter performance settings including:

Option Description
Warning Suppression Dismisses warnings when circumstances require a condition be ignored (such as not sorting input business keys).
Timeout Developers can define a time (in seconds) to end execution for debugging.
Threading Sets the number of threads available in two pools - one for matching keys and the other for processing rows. Users can maximize processing speed by configuring the component as follows:
Setting Description
Disabled Sets the value to 1.
Automatically manage thread countsValue is set to a number based on the processor cores of the machine it is being run on. Note:  This option should be considered for smaller environments.
Manually specify thread count per workload typeShould be considered when running on an environment with multiple core processors. Additionally, the automatic option doesn't take into account other processes or packages that might be running, therefore, manually specifying the thread count might be a better option for larger or busy environments.
Hashing Security feature that allows you to determine which hash algorithm to use during processing.
Use Memory Optimized Mode Sets the Memory Optimized property. For more information, see Memory Optimized Property under the Setup tab.
Guid Key Sort Algorithm Identifies the method in which the input data is sorted when using Guid as relationship keys. (This option only appears when Guid keys are used.) The following methods can be used:
SettingDescription
SQL Server Order ByPerformed by using an ORDER BY clause within a query. Additionally, the IsSorted property needs to be set to true and the key sort order set. These are found in the sources Advanced Editors (eg Ole DB Source and ADO.Net Source.)
Sort TransformConfigured when using the SSIS native Sort Transform component between the source(s) and Dimension Merge SCD transform. (Sorting is done by adding an ORDER BY clause, however, the IsSorted and SortKeyPosition should not be set within the advanced editor.)
.NET GuidConfigured when using .NET Guid sorting within an upstream Script Task.
StringSort Guids as strings. Note:  Does not work with Memory Optimized mode set to true.

Task Factory Dimension Merge SCD Performance

Support Articles

Table Locking with SCD Merge

The above article provides a solution for slowly changing Dimension Merge components that appear to hang and not complete execution. 

Additional Videos

Dimension Merge SCD - Configure Native SCD Example

Dimension Merge SCD - How to use it

Dimension Merge SCD - Intro to Sample Package

Dimension Merge SCD - Intro to test data

Dimension Merge SCD - T-SQL Merge Example