awswrangler.athena.start_query_execution

awswrangler.athena.start_query_execution(sql: str, database: Optional[str] = None, s3_output: Optional[str] = None, workgroup: Optional[str] = None, encryption: Optional[str] = None, kms_key: Optional[str] = None, params: Optional[Dict[str, Any]] = None, boto3_session: Optional[Session] = None, max_cache_seconds: int = 0, max_cache_query_inspections: int = 50, max_remote_cache_entries: int = 50, max_local_cache_entries: int = 100, data_source: Optional[str] = None, wait: bool = False) → Any

Start a SQL Query against AWS Athena.

Note

Creates the default Athena bucket (e.g. s3://aws-athena-query-results-ACCOUNT-REGION/) if it does not exist and s3_output is None.

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

  • max_cache_query_inspections

  • max_cache_seconds

  • max_remote_cache_entries

  • max_local_cache_entries

  • workgroup

Check out the Global Configurations Tutorial for details.

Parameters
  • sql (str) – SQL query.

  • database (str, optional) – AWS Glue/Athena database name.

  • s3_output (str, optional) – AWS S3 path.

  • workgroup (str, optional) – Athena workgroup.

  • encryption (str, optional) – None, ‘SSE_S3’, ‘SSE_KMS’, ‘CSE_KMS’.

  • kms_key (str, optional) – For SSE-KMS and CSE-KMS, this is the KMS key ARN or ID.

  • params (Dict[str, Any], optional) – Dictionary of parameters used to construct the SQL query. Only named parameters are supported. The dictionary must be in the form {‘name’: ‘value’} and the SQL query must contain :name;. Note that for varchar columns and similar, you must surround the value in single quotes (see the parameterized query example below).

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session receives None.

  • max_cache_seconds (int) – Wrangler can look up in Athena’s history whether this query has been run before. If so, and its completion time is less than max_cache_seconds before now, Wrangler skips query execution and returns the same results as the previous run. When cached results are used, the s3_output, encryption and kms_key parameters are ignored. If reading cached data fails for any reason, execution falls back to the usual query run path. See the caching example below.

  • max_cache_query_inspections (int) – Max number of queries inspected from the history to find a result to reuse. The larger the number of inspections, the higher the latency for non-cached queries. Only takes effect if max_cache_seconds > 0.

  • max_remote_cache_entries (int) – Max number of queries retrieved from AWS for cache inspection. The larger the number, the higher the latency for non-cached queries. Only takes effect if max_cache_seconds > 0. Default value is 50.

  • max_local_cache_entries (int) – Max number of queries for which metadata is cached locally. This reduces latency and allows keeping more than max_remote_cache_entries available for the cache. This value should not be smaller than max_remote_cache_entries. Only takes effect if max_cache_seconds > 0. Default value is 100.

  • data_source (str, optional) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ will be used by default.

  • wait (bool, default False) – Indicates whether to wait for the query to finish and return a dictionary with the query execution response (see the example below).

Returns

Query execution ID if wait is set to False, dictionary with the get_query_execution response otherwise.

Return type

Union[str, Dict[str, Any]]

Examples

Querying into the default data source (Amazon S3 - ‘AwsDataCatalog’)

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(sql='...', database='...')
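
Running a parameterized query. This is a minimal sketch: the table name, column and parameter value below are illustrative, and the value is wrapped in single quotes because it targets a varchar column.

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(
...     sql='SELECT * FROM my_table WHERE name=:name;',  # 'my_table' and ':name;' are illustrative
...     params={'name': "'my_value'"},  # varchar values must be surrounded by single quotes
...     database='...')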

Querying into another data source (PostgreSQL, Redshift, etc.)

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(sql='...', database='...', data_source='...')
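
Waiting for the query to finish. A minimal sketch: with wait=True the call blocks until completion and returns the get_query_execution response as a dictionary instead of the execution ID.

>>> import awswrangler as wr
>>> res = wr.athena.start_query_execution(sql='...', database='...', wait=True)  # res is a dict, not an ID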
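
Reusing cached results. A minimal sketch: the 900-second window below is an arbitrary illustrative value; if a matching query completed within that window, its results are reused instead of re-running the query.

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(
...     sql='...',
...     database='...',
...     max_cache_seconds=900)  # reuse results from a matching query run in the last 15 minutes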
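
Encrypting the query results with SSE-KMS. A minimal sketch: pass your own KMS key ARN or ID as kms_key.

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(
...     sql='...',
...     database='...',
...     encryption='SSE_KMS',
...     kms_key='...')  # KMS key ARN or ID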