Share post
Share post
Share post
Blog: Technical Guides
Omnata has you covered with SQL Server replication strategies
Omnata has you covered with SQL Server replication strategies
Snowflake |
Database connectors
By: Chris Chandler
16 January 2025
Omnata’s SQL Server Plugin is architected to be flexible for a wide range of use cases when pulling data into Snowflake.
In our last post, we outlined the expanded database connectivity options between Snowflake and SQL Server to now include AWS and Azure Private link, secure tunnels by ngrok and direct connections.
In this post, we’ll talk through the different sync strategies supported by Omnata and provide a decision tree on when you should choose them. *Scroll down to the bottom to see the diagram*
Start with your SQL Server
How is it structured?
How much data are you trying to move? How often?
What do you have available for your use case? CDC, CT, ID columns?
What will be the most efficient for data replication?
In general, there are trade-offs with replication strategies and bottlenecks can appear in different places depending on the use case. CDC and CT are more efficient than direct table/view access syncs, but these features are not available in many scenarios.
Sync Strategies
Change Data Capture (CDC)
Change Data Capture (CDC) is a feature in SQL Server designed to capture insert, update, and delete activity in a table. CDC leverages the SQL Server transaction log to record changes, allowing users to retrieve these changes in a structured format.
Changes are captured asynchronously, and the details are stored in system tables with the key benefits:
Minimal impact on the source system’s performance.
Granular change data with a historical perspective.
Easy integration with external data pipelines.
However, CDC requires additional configuration and can consume extra storage, as it maintains system tables to track changes. In addition, CDC processes need to be restarted when change to the table are made so it may not suit deployments undergoing frequent changes.
Change Tracking (CT)
Change Tracking (CT) is a lightweight mechanism for identifying changes to data without capturing the specific details of each change. It provides a straightforward way to determine whether a row has been inserted, updated, or deleted since the last sync.
Unlike CDC, CT does not record the actual data changes or maintain a historical trail. Instead, it maintains version information for each table and row, allowing applications to query for changes since a specific version.
While CT is efficient and easy to set up, its limitations include the lack of historical data and the inability to capture details about the changes themselves.
Direct table access: High Watermark and Full-Refresh
When CDC or CT are not suitable, Omnata can directly query tables and views using a high-watermark or a full-refresh. Both of these are less efficient sync strategies, and they may be the only method available for some deployments.
High Watermark
This strategy uses a column (cursor field), such as a timestamp or a sequential ID, to identify changes. Omnata keeps track of the maximum value (the "high watermark") from the previous run to retrieve only new or updated rows. This method doesn't require any extra database config, but you do need an appropriate column, so it can require involve some extra data work on the source tables. We recommend you have an index on the cursor field for optimal performance.
Benefits:
No additional database configuration is required.
Flexible and adaptable to various schemas.
Drawbacks:
Relies on well-maintained timestamp or ID columns.
Cannot easily detect deletes unless additional mechanisms are used.
Full-Refresh
As a final fall-back, the entire table can be queried and reloaded by each sync run. While straightforward, this method can be resource-intensive and is best suited for smaller datasets or situations where the overhead of tracking changes is too high.
Benefits:
Simple to set up and maintain.
Ensures complete consistency between source and target.
Captures deletes
Drawbacks:
Inefficient for large datasets due to high data transfer and processing costs.
May result in unnecessary overwrites and increased load on the source system.
What about the deletes? Are they urgent or not?
Hybrid: High-Watermark with Periodic Full-Refresh
A hybrid approach combines high-watermark with periodic full-refreshes to capture deletes and ensure full consistency. This can be configured using two syncs running on different schedules, for example, the high-watermark sync runs intraday, with a full-refresh running overnight to capture deletes for the day.
Benefits:
Ensures deleted records are eventually captured.
Combines the efficiency of incremental updates with the completeness of a full-refresh.
Reduces the overhead of frequent full-refresh queries.
Drawbacks:
Adds complexity to the implementation, requiring both incremental and full-refresh logic.
Full-refresh queries can still be resource-intensive, though less frequent.
Pick your strategy (or strategies) and configure it in Omnata
The choice of replication strategy depends on several factors, including the size of the dataset, the need for detailed change information, source database performance, and the target latency.
CDC when detailed change tracking and historical data are required.
Change Tracking as a fall-back from CDC.
High-watermark or full-refresh as a fall-back for every other scenario.
To make it easier for you, we built this decision tree that lines up to the Ommata Sync configuration flow:
Pick your connection (that you’ve already created)
Configure Source - This is where you set the Sync Type (the blue circles)
Configure Sync - This is where you choose the fetch and storage behaviours, called Sync Strategy (purple circles).
CDC and CT can only be set for all tables in the sync using ‘Auto’ and the option to Keep or Merge Changes, otherwise known as, Append or Merge.
Direct table queries are usually customized for each table (streams) in the sync, but you can also set one strategy for all tables.
When customizing sync strategy for each table, they are named Incremental and Full-Refresh, with storage behaviours Append and Merge.
Deploy sync - This is where you pick the sync scheduling method and complete the flow.
As with all of our connectors, you can configure as many syncs and run it as often as you'd like for the same flat fee. So you're free to run as many combinations as you need to suit your use case.
Try Omnata's SQL Server Plugin
You can install the Omnata Sync Engine for free and start a 7-day free trial of the SQL Server connector on the Snowflake Marketplace, or get in touch to learn more and arrange a demo or POC.
Omnata’s SQL Server Plugin is architected to be flexible for a wide range of use cases when pulling data into Snowflake.
In our last post, we outlined the expanded database connectivity options between Snowflake and SQL Server to now include AWS and Azure Private link, secure tunnels by ngrok and direct connections.
In this post, we’ll talk through the different sync strategies supported by Omnata and provide a decision tree on when you should choose them. *Scroll down to the bottom to see the diagram*
Start with your SQL Server
How is it structured?
How much data are you trying to move? How often?
What do you have available for your use case? CDC, CT, ID columns?
What will be the most efficient for data replication?
In general, there are trade-offs with replication strategies and bottlenecks can appear in different places depending on the use case. CDC and CT are more efficient than direct table/view access syncs, but these features are not available in many scenarios.
Sync Strategies
Change Data Capture (CDC)
Change Data Capture (CDC) is a feature in SQL Server designed to capture insert, update, and delete activity in a table. CDC leverages the SQL Server transaction log to record changes, allowing users to retrieve these changes in a structured format.
Changes are captured asynchronously, and the details are stored in system tables with the key benefits:
Minimal impact on the source system’s performance.
Granular change data with a historical perspective.
Easy integration with external data pipelines.
However, CDC requires additional configuration and can consume extra storage, as it maintains system tables to track changes. In addition, CDC processes need to be restarted when change to the table are made so it may not suit deployments undergoing frequent changes.
Change Tracking (CT)
Change Tracking (CT) is a lightweight mechanism for identifying changes to data without capturing the specific details of each change. It provides a straightforward way to determine whether a row has been inserted, updated, or deleted since the last sync.
Unlike CDC, CT does not record the actual data changes or maintain a historical trail. Instead, it maintains version information for each table and row, allowing applications to query for changes since a specific version.
While CT is efficient and easy to set up, its limitations include the lack of historical data and the inability to capture details about the changes themselves.
Direct table access: High Watermark and Full-Refresh
When CDC or CT are not suitable, Omnata can directly query tables and views using a high-watermark or a full-refresh. Both of these are less efficient sync strategies, and they may be the only method available for some deployments.
High Watermark
This strategy uses a column (cursor field), such as a timestamp or a sequential ID, to identify changes. Omnata keeps track of the maximum value (the "high watermark") from the previous run to retrieve only new or updated rows. This method doesn't require any extra database config, but you do need an appropriate column, so it can require involve some extra data work on the source tables. We recommend you have an index on the cursor field for optimal performance.
Benefits:
No additional database configuration is required.
Flexible and adaptable to various schemas.
Drawbacks:
Relies on well-maintained timestamp or ID columns.
Cannot easily detect deletes unless additional mechanisms are used.
Full-Refresh
As a final fall-back, the entire table can be queried and reloaded by each sync run. While straightforward, this method can be resource-intensive and is best suited for smaller datasets or situations where the overhead of tracking changes is too high.
Benefits:
Simple to set up and maintain.
Ensures complete consistency between source and target.
Captures deletes
Drawbacks:
Inefficient for large datasets due to high data transfer and processing costs.
May result in unnecessary overwrites and increased load on the source system.
What about the deletes? Are they urgent or not?
Hybrid: High-Watermark with Periodic Full-Refresh
A hybrid approach combines high-watermark with periodic full-refreshes to capture deletes and ensure full consistency. This can be configured using two syncs running on different schedules, for example, the high-watermark sync runs intraday, with a full-refresh running overnight to capture deletes for the day.
Benefits:
Ensures deleted records are eventually captured.
Combines the efficiency of incremental updates with the completeness of a full-refresh.
Reduces the overhead of frequent full-refresh queries.
Drawbacks:
Adds complexity to the implementation, requiring both incremental and full-refresh logic.
Full-refresh queries can still be resource-intensive, though less frequent.
Pick your strategy (or strategies) and configure it in Omnata
The choice of replication strategy depends on several factors, including the size of the dataset, the need for detailed change information, source database performance, and the target latency.
CDC when detailed change tracking and historical data are required.
Change Tracking as a fall-back from CDC.
High-watermark or full-refresh as a fall-back for every other scenario.
To make it easier for you, we built this decision tree that lines up to the Ommata Sync configuration flow:
Pick your connection (that you’ve already created)
Configure Source - This is where you set the Sync Type (the blue circles)
Configure Sync - This is where you choose the fetch and storage behaviours, called Sync Strategy (purple circles).
CDC and CT can only be set for all tables in the sync using ‘Auto’ and the option to Keep or Merge Changes, otherwise known as, Append or Merge.
Direct table queries are usually customized for each table (streams) in the sync, but you can also set one strategy for all tables.
When customizing sync strategy for each table, they are named Incremental and Full-Refresh, with storage behaviours Append and Merge.
Deploy sync - This is where you pick the sync scheduling method and complete the flow.
As with all of our connectors, you can configure as many syncs and run it as often as you'd like for the same flat fee. So you're free to run as many combinations as you need to suit your use case.
Try Omnata's SQL Server Plugin
You can install the Omnata Sync Engine for free and start a 7-day free trial of the SQL Server connector on the Snowflake Marketplace, or get in touch to learn more and arrange a demo or POC.
Omnata’s SQL Server Plugin is architected to be flexible for a wide range of use cases when pulling data into Snowflake.
In our last post, we outlined the expanded database connectivity options between Snowflake and SQL Server to now include AWS and Azure Private link, secure tunnels by ngrok and direct connections.
In this post, we’ll talk through the different sync strategies supported by Omnata and provide a decision tree on when you should choose them. *Scroll down to the bottom to see the diagram*
Start with your SQL Server
How is it structured?
How much data are you trying to move? How often?
What do you have available for your use case? CDC, CT, ID columns?
What will be the most efficient for data replication?
In general, there are trade-offs with replication strategies and bottlenecks can appear in different places depending on the use case. CDC and CT are more efficient than direct table/view access syncs, but these features are not available in many scenarios.
Sync Strategies
Change Data Capture (CDC)
Change Data Capture (CDC) is a feature in SQL Server designed to capture insert, update, and delete activity in a table. CDC leverages the SQL Server transaction log to record changes, allowing users to retrieve these changes in a structured format.
Changes are captured asynchronously, and the details are stored in system tables with the key benefits:
Minimal impact on the source system’s performance.
Granular change data with a historical perspective.
Easy integration with external data pipelines.
However, CDC requires additional configuration and can consume extra storage, as it maintains system tables to track changes. In addition, CDC processes need to be restarted when change to the table are made so it may not suit deployments undergoing frequent changes.
Change Tracking (CT)
Change Tracking (CT) is a lightweight mechanism for identifying changes to data without capturing the specific details of each change. It provides a straightforward way to determine whether a row has been inserted, updated, or deleted since the last sync.
Unlike CDC, CT does not record the actual data changes or maintain a historical trail. Instead, it maintains version information for each table and row, allowing applications to query for changes since a specific version.
While CT is efficient and easy to set up, its limitations include the lack of historical data and the inability to capture details about the changes themselves.
Direct table access: High Watermark and Full-Refresh
When CDC or CT are not suitable, Omnata can directly query tables and views using a high-watermark or a full-refresh. Both of these are less efficient sync strategies, and they may be the only method available for some deployments.
High Watermark
This strategy uses a column (cursor field), such as a timestamp or a sequential ID, to identify changes. Omnata keeps track of the maximum value (the "high watermark") from the previous run to retrieve only new or updated rows. This method doesn't require any extra database config, but you do need an appropriate column, so it can require involve some extra data work on the source tables. We recommend you have an index on the cursor field for optimal performance.
Benefits:
No additional database configuration is required.
Flexible and adaptable to various schemas.
Drawbacks:
Relies on well-maintained timestamp or ID columns.
Cannot easily detect deletes unless additional mechanisms are used.
Full-Refresh
As a final fall-back, the entire table can be queried and reloaded by each sync run. While straightforward, this method can be resource-intensive and is best suited for smaller datasets or situations where the overhead of tracking changes is too high.
Benefits:
Simple to set up and maintain.
Ensures complete consistency between source and target.
Captures deletes
Drawbacks:
Inefficient for large datasets due to high data transfer and processing costs.
May result in unnecessary overwrites and increased load on the source system.
What about the deletes? Are they urgent or not?
Hybrid: High-Watermark with Periodic Full-Refresh
A hybrid approach combines high-watermark with periodic full-refreshes to capture deletes and ensure full consistency. This can be configured using two syncs running on different schedules, for example, the high-watermark sync runs intraday, with a full-refresh running overnight to capture deletes for the day.
Benefits:
Ensures deleted records are eventually captured.
Combines the efficiency of incremental updates with the completeness of a full-refresh.
Reduces the overhead of frequent full-refresh queries.
Drawbacks:
Adds complexity to the implementation, requiring both incremental and full-refresh logic.
Full-refresh queries can still be resource-intensive, though less frequent.
Pick your strategy (or strategies) and configure it in Omnata
The choice of replication strategy depends on several factors, including the size of the dataset, the need for detailed change information, source database performance, and the target latency.
CDC when detailed change tracking and historical data are required.
Change Tracking as a fall-back from CDC.
High-watermark or full-refresh as a fall-back for every other scenario.
To make it easier for you, we built this decision tree that lines up to the Ommata Sync configuration flow:
Pick your connection (that you’ve already created)
Configure Source - This is where you set the Sync Type (the blue circles)
Configure Sync - This is where you choose the fetch and storage behaviours, called Sync Strategy (purple circles).
CDC and CT can only be set for all tables in the sync using ‘Auto’ and the option to Keep or Merge Changes, otherwise known as, Append or Merge.
Direct table queries are usually customized for each table (streams) in the sync, but you can also set one strategy for all tables.
When customizing sync strategy for each table, they are named Incremental and Full-Refresh, with storage behaviours Append and Merge.
Deploy sync - This is where you pick the sync scheduling method and complete the flow.
As with all of our connectors, you can configure as many syncs and run it as often as you'd like for the same flat fee. So you're free to run as many combinations as you need to suit your use case.
Try Omnata's SQL Server Plugin
You can install the Omnata Sync Engine for free and start a 7-day free trial of the SQL Server connector on the Snowflake Marketplace, or get in touch to learn more and arrange a demo or POC.
subscribe
Deliver the goods to your inbox
subscribe
Deliver the goods to your inbox
subscribe