AWS Data Wrangler

9 - Redshift - Append, Overwrite and Upsert

Wrangler’s copy/to_sql function has three different mode options for Redshift.

1 - append

2 - overwrite

3 - upsert

[2]:
import awswrangler as wr
import pandas as pd
from datetime import date

con = wr.redshift.connect("aws-data-wrangler-redshift")

Enter your bucket name:

[3]:
import getpass
bucket = getpass.getpass()
path = f"s3://{bucket}/stage/"
 ···········································

Enter your IAM ROLE ARN:

[4]:
iam_role = getpass.getpass()
 ····················································································

Creating the table (Overwriting if it exists)

[10]:
df = pd.DataFrame({
    "id": [1, 2],
    "value": ["foo", "boo"],
    "date": [date(2020, 1, 1), date(2020, 1, 2)]
})

wr.redshift.copy(
    df=df,
    path=path,
    con=con,
    schema="public",
    table="my_table",
    mode="overwrite",
    iam_role=iam_role,
    primary_keys=["id"]
)

wr.redshift.read_sql_table(table="my_table", schema="public", con=con)
[10]:
id value date
0 2 boo 2020-01-02
1 1 foo 2020-01-01

Appending

[11]:
df = pd.DataFrame({
    "id": [3],
    "value": ["bar"],
    "date": [date(2020, 1, 3)]
})

wr.redshift.copy(
    df=df,
    path=path,
    con=con,
    schema="public",
    table="my_table",
    mode="append",
    iam_role=iam_role,
    primary_keys=["id"]
)

wr.redshift.read_sql_table(table="my_table", schema="public", con=con)
[11]:
id value date
0 1 foo 2020-01-01
1 2 boo 2020-01-02
2 3 bar 2020-01-03

Upserting

[12]:
df = pd.DataFrame({
    "id": [2, 3],
    "value": ["xoo", "bar"],
    "date": [date(2020, 1, 2), date(2020, 1, 3)]
})

wr.redshift.copy(
    df=df,
    path=path,
    con=con,
    schema="public",
    table="my_table",
    mode="upsert",
    iam_role=iam_role,
    primary_keys=["id"]
)

wr.redshift.read_sql_table(table="my_table", schema="public", con=con)
[12]:
id value date
0 1 foo 2020-01-01
1 2 xoo 2020-01-02
2 3 bar 2020-01-03

Cleaning Up

[13]:
with con.cursor() as cursor:
    cursor.execute("DROP TABLE public.my_table")
con.close()