awswrangler.db.unload_redshift

awswrangler.db.unload_redshift(sql: str, path: str, con: sqlalchemy.engine.base.Engine, iam_role: str, categories: List[str] = None, chunked: bool = False, keep_files: bool = False, use_threads: bool = True, boto3_session: Optional[boto3.session.Session] = None, s3_additional_kwargs: Optional[Dict[str, str]] = None) → Union[pandas.core.frame.DataFrame, Iterator[pandas.core.frame.DataFrame]]

Load a Pandas DataFrame from an Amazon Redshift query result, using Parquet files on S3 as a stage.

This is a HIGH latency and HIGH throughput alternative to wr.db.read_sql_query()/wr.db.read_sql_table() for extracting large Amazon Redshift datasets into a Pandas DataFrame through the UNLOAD command.

This strategy has more overhead and requires more IAM privileges than the regular wr.db.read_sql_query()/wr.db.read_sql_table() functions, so it is recommended only when fetching one million rows or more at once.

https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

Note

If use_threads=True, the number of threads spawned will be obtained from os.cpu_count().

Parameters
  • sql (str) – SQL query.

  • path (str) – S3 path to write stage files (e.g. s3://bucket_name/any_name/).

  • con (sqlalchemy.engine.Engine) – SQLAlchemy Engine. Please use wr.db.get_engine(), wr.db.get_redshift_temp_engine() or wr.catalog.get_engine().

  • iam_role (str) – AWS IAM role with the related permissions.

  • categories (List[str], optional) – List of column names that should be returned as pandas.Categorical. Recommended for memory-restricted environments.

  • chunked (bool) – If True, breaks the data into smaller DataFrames (non-deterministic number of rows); otherwise returns a single DataFrame with the whole data.

  • keep_files (bool) – Whether to keep the stage files in S3 after loading.

  • use_threads (bool) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() will be used as the maximum number of threads.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receives None.

  • s3_additional_kwargs (Dict[str, str], optional) – Forwarded to s3fs; useful for server-side encryption (https://s3fs.readthedocs.io/en/latest/#serverside-encryption). See the sketch after this list.
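
For instance, a minimal sketch of encrypting the stage files with SSE-KMS; the table, connection, role and key ARN below are placeholders, and the kwargs are standard S3 API parameters as accepted by s3fs:

>>> import awswrangler as wr
>>> df = wr.db.unload_redshift(  # all names here are hypothetical
...     sql="SELECT * FROM public.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     con=wr.catalog.get_engine(connection="my_glue_connection"),
...     iam_role="arn:aws:iam::XXX:role/XXX",
...     s3_additional_kwargs={
...         "ServerSideEncryption": "aws:kms",
...         "SSEKMSKeyId": "YOUR_KMS_KEY_ARN"
...     }
... )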

Returns

Result as a Pandas DataFrame or, if chunked=True, an iterator of Pandas DataFrames.

Return type

Union[pandas.DataFrame, Iterator[pandas.DataFrame]]

Examples

>>> import awswrangler as wr
>>> df = wr.db.unload_redshift(
...     sql="SELECT * FROM public.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     con=wr.catalog.get_engine(connection="my_glue_connection"),
...     iam_role="arn:aws:iam::XXX:role/XXX"
... )
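
If chunked=True, the call returns an iterator of DataFrames instead of a single frame; a minimal sketch of consuming it (same hypothetical table, connection and role as above):

>>> import awswrangler as wr
>>> dfs = wr.db.unload_redshift(
...     sql="SELECT * FROM public.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     con=wr.catalog.get_engine(connection="my_glue_connection"),
...     iam_role="arn:aws:iam::XXX:role/XXX",
...     chunked=True
... )
>>> for df in dfs:  # each chunk is a regular pandas.DataFrame
...     print(len(df.index))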