AWS Redshift Data sharing: Cluster-to-Cluster / Sharing to a Unified Cluster

Abdul Rafee Wahab
Apr 24, 2023

Background

Amazon Redshift is a cloud-based data warehousing solution that allows organizations to store and analyze large volumes of data efficiently.

One of the key offerings of Redshift is its ability to share data between different Redshift clusters without having to copy or move it, which is critical for organizations that need to share data between multiple teams or business units.

This is a game-changer, since it removes the need to re-copy or move data each time the underlying data or schema evolves.

Approaches for Redshift Data Sharing

From my experience working with data sharing for the last 8 months, there are two main approaches to data sharing via Redshift:

  1. Cluster-to-Cluster data sharing
  2. Sharing data to a Unified Redshift cluster

In this post, we’ll explore the advantages of each approach and provide guidance on when to implement each one.

Cluster-to-Cluster Data Sharing

Arch. Diagram Source: Abdul R. Wahab

Cluster-to-Cluster data sharing in Redshift involves setting up one or more “producer” clusters that share data with one (or more) “consumer” clusters.

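Producer clusters do not push copies of the data. Instead, a producer exposes selected schemas, tables, and views through a datashare, and consumer clusters query that data live and read-only, so changes on the producer become visible to consumers in near real-time. Datashares can be granted within an AWS account, across accounts, and across regions. Note that this is a different mechanism from snapshot sharing, which copies data, and that data sharing requires RA3 node types or Redshift Serverless.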
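To make the producer/consumer relationship concrete, here is a minimal Python sketch that only builds the SQL statements involved in Redshift's native datashare feature; it does not connect to any cluster. The share name, schema, table, database name, and namespace GUIDs are all hypothetical placeholders, and the helper functions are illustrative rather than part of any AWS library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Datashare:
    """Producer-side datashare definition (all names are illustrative)."""
    name: str
    schema: str
    tables: List[str] = field(default_factory=list)


def producer_statements(share: Datashare, consumer_namespace: str) -> List[str]:
    """SQL to run on the producer: create the share, add objects, grant usage."""
    stmts = [
        f"CREATE DATASHARE {share.name};",
        f"ALTER DATASHARE {share.name} ADD SCHEMA {share.schema};",
    ]
    for table in share.tables:
        stmts.append(
            f"ALTER DATASHARE {share.name} ADD TABLE {share.schema}.{table};"
        )
    # The consumer is identified by its cluster namespace GUID (placeholder here).
    stmts.append(
        f"GRANT USAGE ON DATASHARE {share.name} "
        f"TO NAMESPACE '{consumer_namespace}';"
    )
    return stmts


def consumer_statements(share_name: str, producer_namespace: str,
                        local_db: str) -> List[str]:
    """SQL to run on the consumer: expose the share as a local database."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share_name} "
        f"OF NAMESPACE '{producer_namespace}';"
    ]


if __name__ == "__main__":
    share = Datashare(name="sales_share", schema="public", tables=["orders"])
    for stmt in producer_statements(share, "consumer-namespace-guid"):
        print(stmt)
    for stmt in consumer_statements("sales_share", "producer-namespace-guid",
                                    "sales_db"):
        print(stmt)
```

Once the consumer database exists, shared tables are read with three-part names such as sales_db.public.orders. No data is copied; the consumer reads the producer's data in place.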

Advantages of Cluster-to-Cluster Data Sharing

Performance

Cluster-to-Cluster data sharing allows for the highest performance because data is shared directly between clusters without the need for an intermediary.

Security

Cluster-to-Cluster data sharing provides a high level of security because data is encrypted in transit and at rest.

Flexibility

Cluster-to-Cluster data sharing allows for greater flexibility because each cluster can have its own schema, security settings, and data retention policies.

Use Cases to Implement Cluster-to-Cluster Data Sharing

High-Volume Data Sharing

Cluster-to-Cluster data sharing is best suited for organizations that need to share large volumes of data between clusters in near real-time.

Multiple Teams/Business Units

Cluster-to-Cluster data sharing is ideal for organizations with multiple teams or business units that need access to the same data.

High Performance Requirements

Cluster-to-Cluster data sharing is best suited for organizations with high performance requirements, such as real-time analytics or machine learning applications.

Sharing Data to a Unified Redshift Cluster

Arch. Diagram Source: Abdul R. Wahab

Sharing data to a Unified Redshift cluster involves setting up a single, unified cluster that serves as a data repository for all interested business areas and data producers.

This approach involves extracting data from various data producer clusters and loading it into the Unified Redshift cluster.
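If the unified cluster receives its data through datashares rather than a separate extract-and-load job (an assumption made for this sketch), the setup on the unified cluster might look like the following. The code only generates SQL strings. The share names, namespace GUIDs, database names, and the analysts group are hypothetical, and the helper itself is illustrative.

```python
from typing import List, NamedTuple


class ProducerShare(NamedTuple):
    """One datashare offered by a producer cluster (illustrative names)."""
    share_name: str
    producer_namespace: str
    local_db: str


def unified_cluster_statements(shares: List[ProducerShare],
                               reader_group: str) -> List[str]:
    """SQL to run on the unified cluster for every producer datashare."""
    stmts: List[str] = []
    for s in shares:
        # Expose the producer's datashare as a local database.
        stmts.append(
            f"CREATE DATABASE {s.local_db} FROM DATASHARE {s.share_name} "
            f"OF NAMESPACE '{s.producer_namespace}';"
        )
        # Access is controlled in one place: the unified cluster.
        stmts.append(
            f"GRANT USAGE ON DATABASE {s.local_db} TO GROUP {reader_group};"
        )
    return stmts


if __name__ == "__main__":
    shares = [
        ProducerShare("sales_share", "sales-namespace-guid", "sales_db"),
        ProducerShare("hr_share", "hr-namespace-guid", "hr_db"),
    ]
    for stmt in unified_cluster_statements(shares, "analysts"):
        print(stmt)
```

Because every grant is issued on the unified cluster, access rules for all producer data live in one place, which is the unified-control argument made below.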

Advantages of Sharing Data to a Unified Redshift Cluster

Unified Control

Sharing data to a Unified Redshift cluster provides unified control over data access, security, and governance.

Cost Savings

Sharing data to a Unified Redshift cluster can be more cost-effective because it eliminates the need for multiple clusters and reduces the amount of data processing.

Data Consistency

Sharing data to a Unified Redshift cluster can improve data consistency because all interested parties are accessing the same data.

Use Cases to Implement Sharing Data to a Unified Redshift Cluster

Data Governance

Sharing data to a Unified Redshift cluster is best suited for organizations that require strict data governance and control over data access.

Cost-Effectiveness

Sharing data to a Unified Redshift cluster is ideal for organizations that want to minimize costs by reducing the number of clusters and data movement.

Data Consistency

Sharing data to a Unified Redshift cluster is best suited for organizations that require data consistency across different teams or business units.

Conclusion

When it comes to data sharing via Redshift, there is no one-size-fits-all solution.

Cluster-to-Cluster data sharing is best suited for organizations that need to share large volumes of data in near real-time and have high-performance requirements.

Sharing data to a Unified Redshift cluster is ideal for organizations that require unified control over data access, security, and governance, and want to minimize costs and improve data consistency.

Ultimately, the decision on whether to implement Cluster-to-Cluster data sharing or sharing data to a Unified Redshift cluster depends on the use case requirements and the data strategy direction of the organization.
