AWS Data Wrangler

21 - Global Configurations

Wrangler offers two ways to set global configurations that override the default arguments defined in function signatures:

  • Environment variables

  • wr.config
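The override behaviour described above can be pictured with a small, self-contained sketch. This is not Wrangler's implementation: `GLOBAL_CONFIG`, `apply_global_config`, and `read_table` are hypothetical names, and the sketch assumes that an argument passed explicitly by the caller wins over a globally configured value. It ignores details such as the `enforced` flag.

```python
import functools
import inspect

# Hypothetical stand-in for a global configuration store (not awswrangler's).
GLOBAL_CONFIG = {}


def apply_global_config(func):
    """Fill in arguments the caller omitted with values from GLOBAL_CONFIG."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # bind_partial records only the parameters the caller actually passed.
        bound = signature.bind_partial(*args, **kwargs)
        for name in signature.parameters:
            if name not in bound.arguments and name in GLOBAL_CONFIG:
                bound.arguments[name] = GLOBAL_CONFIG[name]
        return func(*bound.args, **bound.kwargs)

    return wrapper


@apply_global_config
def read_table(table, database="sandbox"):
    """Hypothetical function whose `database` default can be overridden."""
    return f"{database}.{table}"


print(read_table("orders"))                        # signature default is used
GLOBAL_CONFIG["database"] = "default"
print(read_table("orders"))                        # global configuration is used
print(read_table("orders", database="analytics"))  # explicit argument is used
```

In this model the signature default applies only when neither the caller nor the global configuration supplies a value.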

P.S. Check a function's API documentation to see which of its arguments can be set through global configurations.

P.P.S. One exception to the above-mentioned rules is the ``botocore_config`` property. It cannot be set through environment variables, only via ``wr.config``. It is used as the ``botocore.config.Config`` for all underlying ``boto3`` calls. The default config is ``botocore.config.Config(retries={"max_attempts": 5}, connect_timeout=10, max_pool_connections=10)``. If you only want to change the retry behavior, you can use the environment variables ``AWS_MAX_ATTEMPTS`` and ``AWS_RETRY_MODE`` (see the Boto3 documentation).
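As a minimal sketch of the retry-only route, the two variables can be set from Python before any AWS client is created. The values `10` and `standard` are illustrative choices, not recommendations from this tutorial.

```python
import os

# Retry settings that botocore reads from the environment.
# Set them before any boto3 client or session is created.
os.environ["AWS_MAX_ATTEMPTS"] = "10"      # total attempts, including the first call
os.environ["AWS_RETRY_MODE"] = "standard"  # one of: legacy, standard, adaptive
```

Timeouts and pool sizes are not covered by these variables, so changing them still requires ``wr.config.botocore_config``.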

Environment Variables

[1]:
%env WR_DATABASE=default
%env WR_CTAS_APPROACH=False
%env WR_MAX_CACHE_SECONDS=900
%env WR_MAX_CACHE_QUERY_INSPECTIONS=500
%env WR_MAX_REMOTE_CACHE_ENTRIES=50
%env WR_MAX_LOCAL_CACHE_ENTRIES=100
env: WR_DATABASE=default
env: WR_CTAS_APPROACH=False
env: WR_MAX_CACHE_SECONDS=900
env: WR_MAX_CACHE_QUERY_INSPECTIONS=500
env: WR_MAX_REMOTE_CACHE_ENTRIES=50
env: WR_MAX_LOCAL_CACHE_ENTRIES=100
[2]:
import awswrangler as wr
import botocore.config  # explicit submodule import, needed for botocore.config.Config
[3]:
wr.athena.read_sql_query("SELECT 1 AS FOO")
[3]:
foo
0 1
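The ``%env`` magic only exists in IPython and Jupyter. In a plain Python script the same variables can be set through ``os.environ``, as sketched below. The sketch assumes that Wrangler reads the ``WR_*`` variables when it is imported, so the import is placed after the assignments and left commented out to keep the snippet runnable without the library.

```python
import os

# Same settings as the %env cell, expressed for a plain Python script.
# Environment variable values are always strings.
wrangler_settings = {
    "WR_DATABASE": "default",
    "WR_CTAS_APPROACH": "False",
    "WR_MAX_CACHE_SECONDS": "900",
    "WR_MAX_CACHE_QUERY_INSPECTIONS": "500",
    "WR_MAX_REMOTE_CACHE_ENTRIES": "50",
    "WR_MAX_LOCAL_CACHE_ENTRIES": "100",
}
os.environ.update(wrangler_settings)

# Assumption: the variables must be in place before the library is imported.
# import awswrangler as wr
```

Outside Python, the same variables can be exported by the shell or the job scheduler that launches the script.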

Resetting

[4]:
# Reset a single configuration item
wr.config.reset("database")
# Reset all configuration items
wr.config.reset()
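One way to keep a setting from leaking into later code is to pair each assignment with a reset. The sketch below wraps that pattern in a context manager. `temporary_config` and `StandInConfig` are hypothetical names introduced for illustration: the stand-in only mimics attribute assignment and ``reset(name)`` so the snippet runs without Wrangler or AWS access.

```python
from contextlib import contextmanager


@contextmanager
def temporary_config(config, **overrides):
    """Set attributes on `config`, then reset each of them on exit."""
    for name, value in overrides.items():
        setattr(config, name, value)
    try:
        yield config
    finally:
        for name in overrides:
            config.reset(name)


class StandInConfig:
    """Hypothetical stand-in for a config object with a reset(name) method."""

    def reset(self, name):
        # Drop the attribute if present, mimicking "no value configured".
        self.__dict__.pop(name, None)


cfg = StandInConfig()
with temporary_config(cfg, database="default"):
    inside = cfg.database             # the override is visible inside the block
after = hasattr(cfg, "database")      # the item is cleared once the block exits
```

With Wrangler, the call would presumably look like ``with temporary_config(wr.config, database="default"): ...``. Note that this resets the item rather than restoring whatever value it held before the block.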

wr.config

[5]:
wr.config.database = "default"
wr.config.ctas_approach = False
wr.config.max_cache_seconds = 900
wr.config.max_cache_query_inspections = 500
wr.config.max_remote_cache_entries = 50
wr.config.max_local_cache_entries = 100
# Set botocore.config.Config that will be used for all boto3 calls
wr.config.botocore_config = botocore.config.Config(
    retries={"max_attempts": 10},
    connect_timeout=20,
    max_pool_connections=20
)
[6]:
wr.athena.read_sql_query("SELECT 1 AS FOO")
[6]:
foo
0 1

Visualizing

[7]:
wr.config
[7]:
    name                         Env. Variable                   type                              nullable  enforced  configured  value
 0  catalog_id                   WR_CATALOG_ID                   <class 'str'>                     True      False     False       None
 1  concurrent_partitioning      WR_CONCURRENT_PARTITIONING      <class 'bool'>                    False     False     False       None
 2  ctas_approach                WR_CTAS_APPROACH                <class 'bool'>                    False     False     True        False
 3  database                     WR_DATABASE                     <class 'str'>                     True      False     True        default
 4  max_cache_query_inspections  WR_MAX_CACHE_QUERY_INSPECTIONS  <class 'int'>                     False     False     True        500
 5  max_cache_seconds            WR_MAX_CACHE_SECONDS            <class 'int'>                     False     False     True        900
 6  max_remote_cache_entries     WR_MAX_REMOTE_CACHE_ENTRIES     <class 'int'>                     False     False     True        50
 7  max_local_cache_entries      WR_MAX_LOCAL_CACHE_ENTRIES      <class 'int'>                     False     False     True        100
 8  s3_block_size                WR_S3_BLOCK_SIZE                <class 'int'>                     False     True      False       None
 9  workgroup                    WR_WORKGROUP                    <class 'str'>                     False     True      False       None
10  s3_endpoint_url              WR_S3_ENDPOINT_URL              <class 'str'>                     True      True      True        None
11  athena_endpoint_url          WR_ATHENA_ENDPOINT_URL          <class 'str'>                     True      True      True        None
12  sts_endpoint_url             WR_STS_ENDPOINT_URL             <class 'str'>                     True      True      True        None
13  glue_endpoint_url            WR_GLUE_ENDPOINT_URL            <class 'str'>                     True      True      True        None
14  redshift_endpoint_url        WR_REDSHIFT_ENDPOINT_URL        <class 'str'>                     True      True      True        None
15  kms_endpoint_url             WR_KMS_ENDPOINT_URL             <class 'str'>                     True      True      True        None
16  emr_endpoint_url             WR_EMR_ENDPOINT_URL             <class 'str'>                     True      True      True        None
17  botocore_config              WR_BOTOCORE_CONFIG              <class 'botocore.config.Config'>  True      False     True        <botocore.config.Config object at 0x000002D3A362E760>