AWS Redshift - INSERT & COPY Commands

Abdul Rafee Wahab
2 min readJun 4, 2023

Background

In Redshift, there are 2 primary methods for loading data into a Redshift table: INSERT and COPY commands.

One might ask, what’s the difference? When is it best to use which one?

INSERT Command

Per the official AWS docs, INSERT is a SQL command used to, well, insert data row-by-row into a Redshift table.

When using INSERT, every individual row is inserted as a separate transaction, which can result in slower performance compared to COPY for large data sets.

When to use INSERT?

INSERT is typically used for small-scale data updates / inserts, or for inserting data incrementally.

COPY Command

COPY is a command used to load data in bulk from one or more data sources into a Redshift table.

COPY supports loading data from several sources, including S3, Amazon EMR, DynamoDB, and more.

  • It can handle various file formats such as CSV, JSON, Avro, and more.
  • COPY can perform parallel data loading, leveraging the Redshift cluster's resources for efficient data ingestion.
  • It automatically manages data distribution and parallelism to optimize the loading process.

When to use COPY?

It’s optimized for high-speed data ingestion, and for larger datasets is generally much faster than INSERT statements.

Key Factors when deciding between INSERT & COPY

When deciding between INSERT and COPY, consider the following factors:

Data size

For large data sets, COPY is generally the preferred method due to its faster performance. INSERT is better suited for smaller data volumes.

Data source

If the data is already in a format that Redshift can directly ingest from, such as CSV or JSON files in S3, COPY can streamline the process.

Transactional integrity

If transactional integrity is critical and data needs to be inserted in a row-by-row manner, INSERT provides more control to ensure data integrity.

Data transformation

If data requires transformation or preprocessing before loading it into Redshift, INSERT allows for more flexibility in applying custom logic.

Closing thoughts 👏

In summary, COPY is the recommended method for high-performance bulk data loading into Redshift, especially for larger data sets.

For smaller-scale or incremental data inserts, INSERT provides more control and flexibility.

Rule of 👍 is to consider the specific requirements and characteristics of your data loading process when choosing between INSERT and COPY.

--

--

Abdul Rafee Wahab

Tech guy. I like building cool software, & also leading others in building cool things. All views shared are my own.