awswrangler.athena.unload

awswrangler.athena.unload(sql: str, path: str, database: str, file_format: str = 'PARQUET', compression: str | None = None, field_delimiter: str | None = None, partitioned_by: list[str] | None = None, workgroup: str = 'primary', encryption: str | None = None, kms_key: str | None = None, boto3_session: Session | None = None, data_source: str | None = None, params: dict[str, Any] | list[str] | None = None, paramstyle: Literal['qmark', 'named'] = 'named', athena_query_wait_polling_delay: float = 1.0) _QueryMetadata

Write query results from a SELECT statement to the specified data format using UNLOAD.

https://docs.aws.amazon.com/athena/latest/ug/unload.html

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

  • athena_query_wait_polling_delay

  • workgroup

Check out the Global Configurations Tutorial for details.

Parameters:
  • sql (str) – SQL query.

  • path (str, optional) – Amazon S3 path.

  • database (str) – AWS Glue/Athena database name - It is only the origin database from where the query will be launched. You can still using and mixing several databases writing the full table name within the sql (e.g. database.table).

  • file_format (str) – File format of the output. Possible values are ORC, PARQUET, AVRO, JSON, or TEXTFILE

  • compression (str, optional) – This option is specific to the ORC and Parquet formats. For ORC, possible values are lz4, snappy, zlib, or zstd. For Parquet, possible values are gzip or snappy. For ORC, the default is zlib, and for Parquet, the default is gzip.

  • field_delimiter (str) – A single-character field delimiter for files in CSV, TSV, and other text formats.

  • partitioned_by (list[str], optional) – An array list of columns by which the output is partitioned.

  • workgroup (str) – Athena workgroup. Primary by default.

  • encryption (str, optional) – Valid values: [None, ‘SSE_S3’, ‘SSE_KMS’]. Notice: ‘CSE_KMS’ is not supported.

  • kms_key (str, optional) – For SSE-KMS, this is the KMS key ARN or ID.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receive None.

  • data_source (str, optional) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ will be used by default.

  • params (Dict[str, any] | List[str], optional) –

    Parameters that will be used for constructing the SQL query. Only named or question mark parameters are supported. The parameter style needs to be specified in the paramstyle parameter.

    For paramstyle="named", this value needs to be a dictionary. The dict needs to contain the information in the form {'name': 'value'} and the SQL query needs to contain :name. The formatter will be applied client-side in this scenario.

    For paramstyle="qmark", this value needs to be a list of strings. The formatter will be applied server-side. The values are applied sequentially to the parameters in the query in the order in which the parameters occur.

  • paramstyle (str, optional) –

    Determines the style of params. Possible values are:

    • named

    • qmark

  • athena_query_wait_polling_delay (float, default: 0.25 seconds) – Interval in seconds for how often the function will check if the Athena query has completed.

Returns:

Query metadata including query execution id, dtypes, manifest & output location.

Return type:

_QueryMetadata

Examples

>>> import awswrangler as wr
>>> res = wr.athena.unload(
...     sql="SELECT * FROM my_table WHERE name=:name AND city=:city",
...     params={"name": "filtered_name", "city": "filtered_city"}
... )