awswrangler.redshift.to_sql¶
- awswrangler.redshift.to_sql(df: DataFrame, con: Connection, table: str, schema: str, mode: str = 'append', overwrite_method: str = 'drop', index: bool = False, dtype: Optional[Dict[str, str]] = None, diststyle: str = 'AUTO', distkey: Optional[str] = None, sortstyle: str = 'COMPOUND', sortkey: Optional[List[str]] = None, primary_keys: Optional[List[str]] = None, varchar_lengths_default: int = 256, varchar_lengths: Optional[Dict[str, int]] = None, use_column_names: bool = False, lock: bool = False, chunksize: int = 200, commit_transaction: bool = True) Any ¶
Write records stored in a DataFrame into Redshift.
Note
For large DataFrames (1K+ rows) consider the function wr.redshift.copy().
Note
This function has arguments which can be configured globally through wr.config or environment variables:
chunksize
Check out the Global Configurations Tutorial for details.
- Parameters
df (pandas.DataFrame) – Pandas DataFrame https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
con (redshift_connector.Connection) – Use redshift_connector.connect() to use ” “credentials directly or wr.redshift.connect() to fetch it from the Glue Catalog.
table (str) – Table name
schema (str) – Schema name
mode (str) – Append, overwrite or upsert.
overwrite_method (str) –
Drop, cascade, truncate, or delete. Only applicable in overwrite mode.
”drop” -
DROP ... RESTRICT
- drops the table. Fails if there are any views that depend on it. “cascade” -DROP ... CASCADE
- drops the table, and all views that depend on it. “truncate” -TRUNCATE ...
- truncates the table, but immediatly commits current transaction & starts a new one, hence the overwrite happens in two transactions and is not atomic. “delete” -DELETE FROM ...
- deletes all rows from the table. Slow relative to the other methods.index (bool) – True to store the DataFrame index as a column in the table, otherwise False to ignore it.
dtype (Dict[str, str], optional) – Dictionary of columns names and Redshift types to be casted. Useful when you have columns with undetermined or mixed data types. (e.g. {‘col name’: ‘VARCHAR(10)’, ‘col2 name’: ‘FLOAT’}) diststyle : str Redshift distribution styles. Must be in [“AUTO”, “EVEN”, “ALL”, “KEY”]. https://docs.aws.amazon.com/redshift/latest/dg/t_Distributing_data.html
distkey (str, optional) – Specifies a column name or positional number for the distribution key.
sortstyle (str) – Sorting can be “COMPOUND” or “INTERLEAVED”. https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html
sortkey (List[str], optional) – List of columns to be sorted.
primary_keys (List[str], optional) – Primary keys.
varchar_lengths_default (int) – The size that will be set for all VARCHAR columns not specified with varchar_lengths.
varchar_lengths (Dict[str, int], optional) – Dict of VARCHAR length by columns. (e.g. {“col1”: 10, “col5”: 200}).
use_column_names (bool) – If set to True, will use the column names of the DataFrame for generating the INSERT SQL Query. E.g. If the DataFrame has two columns col1 and col3 and use_column_names is True, data will only be inserted into the database columns col1 and col3.
lock (bool) – True to execute LOCK command inside the transaction to force serializable isolation.
chunksize (int) – Number of rows which are inserted with each SQL query. Defaults to inserting 200 rows per query.
commit_transaction (bool) – Whether to commit the transaction. True by default.
- Returns
None.
- Return type
None
Examples
Writing to Redshift using a Glue Catalog Connections
>>> import awswrangler as wr >>> con = wr.redshift.connect("MY_GLUE_CONNECTION") >>> wr.redshift.to_sql( ... df=df, ... table="my_table", ... schema="public", ... con=con ... ) >>> con.close()