awswrangler.db.unload_redshift_to_files

awswrangler.db.unload_redshift_to_files(sql: str, path: str, con: sqlalchemy.engine.base.Engine, iam_role: str, use_threads: bool = True, manifest: bool = False, partition_cols: Optional[List] = None, boto3_session: Optional[boto3.session.Session] = None) → List[str]

Unload the result of an Amazon Redshift query to Parquet files on S3 (through the UNLOAD command).

https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

Note

If use_threads=True, the number of threads that will be spawned will be obtained from os.cpu_count().

Parameters
  • sql (str) – SQL query.

  • path (str) – S3 path to write stage files (e.g. s3://bucket_name/any_name/).

  • con (sqlalchemy.engine.Engine) – SQLAlchemy Engine. Please use wr.db.get_engine(), wr.db.get_redshift_temp_engine() or wr.catalog.get_engine().

  • iam_role (str) – AWS IAM role with the related permissions.

  • use_threads (bool) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() will be used as the maximum number of threads.

  • manifest (bool) – Whether to also unload a manifest file to S3.

  • partition_cols (List[str], optional) – Specifies the partition keys for the unload operation.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receives None.

Returns

List of paths to all unloaded files.

Return type

List[str]

Examples

>>> import awswrangler as wr
>>> paths = wr.db.unload_redshift_to_files(
...     sql="SELECT * FROM public.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     con=wr.catalog.get_engine(connection="my_glue_connection"),
...     iam_role="arn:aws:iam::XXX:role/XXX"
... )
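
The optional parameters documented above can be combined. The following sketch partitions the unloaded files by a column and also writes a manifest, with multithreading disabled; the bucket, connection name, IAM role, and the "event_date" column are placeholders, not values from the original example.

>>> import awswrangler as wr
>>> paths = wr.db.unload_redshift_to_files(
...     sql="SELECT * FROM public.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     con=wr.catalog.get_engine(connection="my_glue_connection"),
...     iam_role="arn:aws:iam::XXX:role/XXX",
...     partition_cols=["event_date"],  # hypothetical partition column
...     manifest=True,
...     use_threads=False
... )

The returned list of S3 paths can then be consumed downstream, for instance with wr.s3.read_parquet(path=paths), assuming the same session has read access to the bucket.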