AWS Data Wrangler

21 - Global Configurations

Wrangler offers two ways to set global configurations that override the default arguments declared in function signatures.

  • Environment variables

  • wr.config

P.S. Check the function API doc to see whether your function has an argument that can be set through global configurations.

P.P.S. One exception to the rules above is the ``botocore_config`` property. It cannot be set through environment variables, only via ``wr.config``. It is used as the ``botocore.config.Config`` for all underlying ``boto3`` calls. The default config is ``botocore.config.Config(retries={"max_attempts": 5}, connect_timeout=10, max_pool_connections=10)``. If you only want to change the retry behavior, you can use the environment variables ``AWS_MAX_ATTEMPTS`` and ``AWS_RETRY_MODE`` (see the Boto3 documentation).
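As a minimal sketch of the retry-only route, the two variables named above can be set from Python before any boto3 client is created. The values used here are illustrative assumptions, not Wrangler defaults.

```python
import os

# Illustrative values (assumptions, not Wrangler defaults).
os.environ["AWS_MAX_ATTEMPTS"] = "10"      # total attempts, including the first call
os.environ["AWS_RETRY_MODE"] = "standard"  # one of: legacy, standard, adaptive

# Environment variables are always strings; botocore parses them itself.
retry_settings = {
    name: os.environ[name] for name in ("AWS_MAX_ATTEMPTS", "AWS_RETRY_MODE")
}
print(retry_settings)
```

Timeouts and pool size are not covered by these variables, so changing them still requires ``wr.config.botocore_config``.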

Environment Variables

[1]:
%env WR_DATABASE=default
%env WR_CTAS_APPROACH=False
%env WR_MAX_CACHE_SECONDS=900
%env WR_MAX_CACHE_QUERY_INSPECTIONS=500
%env WR_MAX_REMOTE_CACHE_ENTRIES=50
%env WR_MAX_LOCAL_CACHE_ENTRIES=100
env: WR_DATABASE=default
env: WR_CTAS_APPROACH=False
env: WR_MAX_CACHE_SECONDS=900
env: WR_MAX_CACHE_QUERY_INSPECTIONS=500
env: WR_MAX_REMOTE_CACHE_ENTRIES=50
env: WR_MAX_LOCAL_CACHE_ENTRIES=100
[2]:
import awswrangler as wr
import botocore.config
[3]:
wr.athena.read_sql_query("SELECT 1 AS FOO")
[3]:
   foo
0    1
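The ``%env`` magic only exists in IPython and Jupyter. In a plain Python script, a rough equivalent is to populate ``os.environ`` first. This sketch assumes the ``WR_*`` variables are read when ``awswrangler`` is imported, so they are set before the import (which is left commented out so the snippet runs without the library installed).

```python
import os

# Same settings as the %env cell above; environment values must be strings.
wrangler_env = {
    "WR_DATABASE": "default",
    "WR_CTAS_APPROACH": "False",
    "WR_MAX_CACHE_SECONDS": "900",
    "WR_MAX_CACHE_QUERY_INSPECTIONS": "500",
    "WR_MAX_REMOTE_CACHE_ENTRIES": "50",
    "WR_MAX_LOCAL_CACHE_ENTRIES": "100",
}
os.environ.update(wrangler_env)

# Assumption: import only after the variables are in place.
# import awswrangler as wr
# wr.athena.read_sql_query("SELECT 1 AS FOO")
```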

Resetting

[4]:
# Specific
wr.config.reset("database")
# All
wr.config.reset()
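To make the override and reset behaviour concrete, here is a toy model written with the standard library only. ``ToyConfig``, ``apply_toy_config`` and ``describe`` are hypothetical names, and this is not Wrangler's implementation. The sketch also assumes that an explicitly passed argument takes precedence over the global value.

```python
import functools
import inspect


class ToyConfig:
    """Toy stand-in for a global configuration store (not wr.config)."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        self._values[name] = value

    def has(self, name):
        return name in self._values

    def get(self, name):
        return self._values[name]

    def reset(self, name=None):
        # reset("x") clears one setting; reset() clears all of them.
        if name is None:
            self._values.clear()
        else:
            self._values.pop(name, None)


config = ToyConfig()


def apply_toy_config(func):
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind_partial(*args, **kwargs)
        for name in signature.parameters:
            # Fill in only the arguments the caller did not pass explicitly.
            if name not in bound.arguments and config.has(name):
                bound.arguments[name] = config.get(name)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@apply_toy_config
def describe(sql, database="signature_default"):
    return f"{sql} @ {database}"


before = describe("SELECT 1")                       # signature default applies
config.set("database", "default")
with_config = describe("SELECT 1")                  # global value replaces the default
explicit = describe("SELECT 1", database="other")   # explicit argument is kept
config.reset("database")
after_reset = describe("SELECT 1")                  # back to the signature default
```

After the reset, the function falls back to the default in its own signature, which is the behaviour the two ``reset`` calls above are meant to restore.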

wr.config

[5]:
wr.config.database = "default"
wr.config.ctas_approach = False
wr.config.max_cache_seconds = 900
wr.config.max_cache_query_inspections = 500
wr.config.max_remote_cache_entries = 50
wr.config.max_local_cache_entries = 100
# Set botocore.config.Config that will be used for all boto3 calls
wr.config.botocore_config = botocore.config.Config(
    retries={"max_attempts": 10},
    connect_timeout=20,
    max_pool_connections=20
)
[6]:
wr.athena.read_sql_query("SELECT 1 AS FOO")
[6]:
   foo
0    1

Visualizing

[7]:
wr.config
[7]:
    name                         Env. Variable                   type                              nullable  enforced  configured  value
 0  catalog_id                   WR_CATALOG_ID                   <class 'str'>                     True      False     False       None
 1  concurrent_partitioning      WR_CONCURRENT_PARTITIONING      <class 'bool'>                    False     False     False       None
 2  ctas_approach                WR_CTAS_APPROACH                <class 'bool'>                    False     False     True        False
 3  database                     WR_DATABASE                     <class 'str'>                     True      False     True        default
 4  max_cache_query_inspections  WR_MAX_CACHE_QUERY_INSPECTIONS  <class 'int'>                     False     False     True        500
 5  max_cache_seconds            WR_MAX_CACHE_SECONDS            <class 'int'>                     False     False     True        900
 6  max_remote_cache_entries     WR_MAX_REMOTE_CACHE_ENTRIES     <class 'int'>                     False     False     True        50
 7  max_local_cache_entries      WR_MAX_LOCAL_CACHE_ENTRIES      <class 'int'>                     False     False     True        100
 8  s3_block_size                WR_S3_BLOCK_SIZE                <class 'int'>                     False     True      False       None
 9  workgroup                    WR_WORKGROUP                    <class 'str'>                     False     True      False       None
10  s3_endpoint_url              WR_S3_ENDPOINT_URL              <class 'str'>                     True      True      True        None
11  athena_endpoint_url          WR_ATHENA_ENDPOINT_URL          <class 'str'>                     True      True      True        None
12  sts_endpoint_url             WR_STS_ENDPOINT_URL             <class 'str'>                     True      True      True        None
13  glue_endpoint_url            WR_GLUE_ENDPOINT_URL            <class 'str'>                     True      True      True        None
14  redshift_endpoint_url        WR_REDSHIFT_ENDPOINT_URL        <class 'str'>                     True      True      True        None
15  kms_endpoint_url             WR_KMS_ENDPOINT_URL             <class 'str'>                     True      True      True        None
16  emr_endpoint_url             WR_EMR_ENDPOINT_URL             <class 'str'>                     True      True      True        None
17  botocore_config              WR_BOTOCORE_CONFIG              <class 'botocore.config.Config'>  True      False     True        <botocore.config.Config object at 0x000002D3A362E760>
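Environment variables are always strings, while the type column above lists ``bool`` and ``int`` settings, so some conversion has to happen when a ``WR_*`` variable is loaded. The helper below, ``coerce_env_value``, is a hypothetical sketch of such a conversion and not Wrangler's actual parsing code. In particular, which spellings count as true is an assumption.

```python
def coerce_env_value(raw: str, target_type: type):
    """Convert a raw environment string to the declared setting type."""
    if target_type is bool:
        # Assumption: accept common truthy spellings, case-insensitively.
        # bool("False") would evaluate to True, hence the explicit comparison.
        return raw.strip().lower() in ("true", "1")
    return target_type(raw)


ctas_approach = coerce_env_value("False", bool)
max_cache_seconds = coerce_env_value("900", int)
database = coerce_env_value("default", str)
```

The explicit string comparison for booleans matters because any non-empty string, including ``"False"``, is truthy in Python.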