awswrangler.db.unload_redshift_to_files

awswrangler.db.unload_redshift_to_files(sql: str, path: str, con: sqlalchemy.engine.base.Engine, iam_role: str, region: Optional[str] = None, max_file_size: Optional[float] = None, kms_key_id: Optional[str] = None, use_threads: bool = True, manifest: bool = False, partition_cols: Optional[List] = None, boto3_session: Optional[boto3.session.Session] = None) → List[str]

Unload Parquet files from a Amazon Redshift query result to parquet files on s3 (Through UNLOAD command).

https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

Note

In case of use_threads=True the number of threads that will be spawned will be get from os.cpu_count().

Parameters
  • sql (str) – SQL query.

  • path (Union[str, List[str]]) – S3 path to write stage files (e.g. s3://bucket_name/any_name/)

  • con (sqlalchemy.engine.Engine) – SQLAlchemy Engine. Please use, wr.db.get_engine(), wr.db.get_redshift_temp_engine() or wr.catalog.get_engine()

  • iam_role (str) – AWS IAM role with the related permissions.

  • region (str, optional) – Specifies the AWS Region where the target Amazon S3 bucket is located. REGION is required for UNLOAD to an Amazon S3 bucket that isn’t in the same AWS Region as the Amazon Redshift cluster. By default, UNLOAD assumes that the target Amazon S3 bucket is located in the same AWS Region as the Amazon Redshift cluster.

  • max_file_size (float, optional) – Specifies the maximum size (MB) of files that UNLOAD creates in Amazon S3. Specify a decimal value between 5.0 MB and 6200.0 MB. If None, the default maximum file size is 6200.0 MB.

  • kms_key_id (str, optional) – Specifies the key ID for an AWS Key Management Service (AWS KMS) key to be used to encrypt data files on Amazon S3.

  • use_threads (bool) – True to enable concurrent requests, False to disable multiple threads. If enabled os.cpu_count() will be used as the max number of threads.

  • manifest (bool) – Unload a manifest file on S3.

  • partition_cols (List[str], optional) – Specifies the partition keys for the unload operation.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receive None.

Returns

Paths list with all unloaded files.

Return type

List[str]

Examples

>>> import awswrangler as wr
>>> paths = wr.db.unload_redshift_to_files(
...     sql="SELECT * FROM public.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     con=wr.catalog.get_engine(connection="my_glue_connection"),
...     iam_role="arn:aws:iam::XXX:role/XXX"
... )