AWS Data Wrangler

21 - Global Configurations

Wrangler offers two ways to set global configurations that override the default argument values defined in function signatures:

  • Environment variables

  • wr.config

P.S. Check the API documentation of each function to see which of its arguments can be set through global configurations.

P.P.S. One exception to the rules above is the ``botocore_config`` property: it cannot be set through environment variables, only via ``wr.config``. It is used as the ``botocore.config.Config`` for all underlying ``boto3`` calls. The default config is ``botocore.config.Config(retries={"max_attempts": 5}, connect_timeout=10, max_pool_connections=10)``. If you only want to change the retry behavior, you can use the environment variables ``AWS_MAX_ATTEMPTS`` and ``AWS_RETRY_MODE`` instead (see the Boto3 documentation).
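For the retry-only case, a minimal sketch of setting the two environment variables from Python is shown here. The helper name ``set_retry_env`` and its validation are illustrative assumptions, not part of Wrangler or Boto3; the variable names and the retry modes (``legacy``, ``standard``, ``adaptive``) are the ones Boto3 documents. The variables should be set before any client is created.

```python
import os

# Retry modes accepted by AWS_RETRY_MODE according to the Boto3 documentation.
VALID_RETRY_MODES = {"legacy", "standard", "adaptive"}


def set_retry_env(max_attempts, retry_mode="standard", environ=os.environ):
    """Hypothetical helper: write the Boto3 retry settings into the environment."""
    if retry_mode not in VALID_RETRY_MODES:
        raise ValueError(f"unknown retry mode: {retry_mode!r}")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Environment variables are always strings.
    environ["AWS_MAX_ATTEMPTS"] = str(max_attempts)
    environ["AWS_RETRY_MODE"] = retry_mode


set_retry_env(10, "standard")
```

This leaves ``connect_timeout`` and ``max_pool_connections`` at their defaults; changing those still requires ``wr.config.botocore_config``.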

Environment Variables

[1]:
%env WR_DATABASE=default
%env WR_CTAS_APPROACH=False
%env WR_MAX_CACHE_SECONDS=900
%env WR_MAX_CACHE_QUERY_INSPECTIONS=500
%env WR_MAX_REMOTE_CACHE_ENTRIES=50
%env WR_MAX_LOCAL_CACHE_ENTRIES=100
env: WR_DATABASE=default
env: WR_CTAS_APPROACH=False
env: WR_MAX_CACHE_SECONDS=900
env: WR_MAX_CACHE_QUERY_INSPECTIONS=500
env: WR_MAX_REMOTE_CACHE_ENTRIES=50
env: WR_MAX_LOCAL_CACHE_ENTRIES=100
[1]:
import awswrangler as wr
import botocore
[3]:
wr.athena.read_sql_query("SELECT 1 AS FOO")
[3]:
foo
0 1
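
The ``%env`` magic used above only exists in IPython/Jupyter. In a plain Python script, the same variables can be set through ``os.environ``. The sketch below keeps the order used in the cells above (variables first, then the import); the helper name ``apply_env_settings`` is an illustrative assumption.

```python
import os

# Same options as the %env cell above; environment values are always strings.
WR_ENV_SETTINGS = {
    "WR_DATABASE": "default",
    "WR_CTAS_APPROACH": "False",
    "WR_MAX_CACHE_SECONDS": "900",
    "WR_MAX_CACHE_QUERY_INSPECTIONS": "500",
    "WR_MAX_REMOTE_CACHE_ENTRIES": "50",
    "WR_MAX_LOCAL_CACHE_ENTRIES": "100",
}


def apply_env_settings(settings, environ=os.environ):
    """Hypothetical helper: copy settings into environ, converting values to str."""
    for name, value in settings.items():
        environ[name] = str(value)


apply_env_settings(WR_ENV_SETTINGS)

# Import only after the variables are set, matching the cell order above.
# import awswrangler as wr
```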

Resetting

[4]:
# Specific
wr.config.reset("database")
# All
wr.config.reset()
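
When an override is only needed for a few calls, setting a value and resetting it afterwards can be wrapped in a context manager. The following is a sketch, not a Wrangler feature: ``temporary_config`` is a hypothetical helper, and it assumes that reading an option that has not been configured raises ``AttributeError`` and that ``reset(name)`` clears a single option as shown above.

```python
from contextlib import contextmanager

_UNSET = object()  # marks options that had no value before the override


@contextmanager
def temporary_config(config, **overrides):
    """Hypothetical helper: apply overrides, then restore the previous state."""
    # getattr with a default absorbs the AttributeError of unconfigured options.
    previous = {name: getattr(config, name, _UNSET) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(config, name, value)
        yield config
    finally:
        for name, old in previous.items():
            if old is _UNSET:
                config.reset(name)  # option was not configured before
            else:
                setattr(config, name, old)


# Intended usage ("analytics" is a placeholder database name):
# with temporary_config(wr.config, database="analytics"):
#     wr.athena.read_sql_query("SELECT 1 AS FOO")
```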

wr.config

[5]:
wr.config.database = "default"
wr.config.ctas_approach = False
wr.config.max_cache_seconds = 900
wr.config.max_cache_query_inspections = 500
wr.config.max_remote_cache_entries = 50
wr.config.max_local_cache_entries = 100
# Set botocore.config.Config that will be used for all boto3 calls
wr.config.botocore_config = botocore.config.Config(
    retries={"max_attempts": 10},
    connect_timeout=20,
    max_pool_connections=20
)
[6]:
wr.athena.read_sql_query("SELECT 1 AS FOO")
[6]:
foo
0 1

Visualizing

[2]:
wr.config
[2]:
    name                         Env. Variable                   type                              nullable  enforced  configured  value
 0  catalog_id                   WR_CATALOG_ID                   <class 'str'>                     True      False     False       None
 1  concurrent_partitioning      WR_CONCURRENT_PARTITIONING      <class 'bool'>                    False     False     False       None
 2  ctas_approach                WR_CTAS_APPROACH                <class 'bool'>                    False     False     False       None
 3  database                     WR_DATABASE                     <class 'str'>                     True      False     False       None
 4  max_cache_query_inspections  WR_MAX_CACHE_QUERY_INSPECTIONS  <class 'int'>                     False     False     False       None
 5  max_cache_seconds            WR_MAX_CACHE_SECONDS            <class 'int'>                     False     False     False       None
 6  max_remote_cache_entries     WR_MAX_REMOTE_CACHE_ENTRIES     <class 'int'>                     False     False     False       None
 7  max_local_cache_entries      WR_MAX_LOCAL_CACHE_ENTRIES      <class 'int'>                     False     False     False       None
 8  s3_block_size                WR_S3_BLOCK_SIZE                <class 'int'>                     False     True      False       None
 9  workgroup                    WR_WORKGROUP                    <class 'str'>                     False     True      False       None
10  chunksize                    WR_CHUNKSIZE                    <class 'int'>                     False     True      False       None
11  s3_endpoint_url              WR_S3_ENDPOINT_URL              <class 'str'>                     True      True      True        None
12  athena_endpoint_url          WR_ATHENA_ENDPOINT_URL          <class 'str'>                     True      True      True        None
13  sts_endpoint_url             WR_STS_ENDPOINT_URL             <class 'str'>                     True      True      True        None
14  glue_endpoint_url            WR_GLUE_ENDPOINT_URL            <class 'str'>                     True      True      True        None
15  redshift_endpoint_url        WR_REDSHIFT_ENDPOINT_URL        <class 'str'>                     True      True      True        None
16  kms_endpoint_url             WR_KMS_ENDPOINT_URL             <class 'str'>                     True      True      True        None
17  emr_endpoint_url             WR_EMR_ENDPOINT_URL             <class 'str'>                     True      True      True        None
18  lakeformation_endpoint_url   WR_LAKEFORMATION_ENDPOINT_URL   <class 'str'>                     True      True      True        None
19  dynamodb_endpoint_url        WR_DYNAMODB_ENDPOINT_URL        <class 'str'>                     True      True      True        None
20  secretsmanager_endpoint_url  WR_SECRETSMANAGER_ENDPOINT_URL  <class 'str'>                     True      True      True        None
21  botocore_config              WR_BOTOCORE_CONFIG              <class 'botocore.config.Config'>  True      False     True        None
22  verify                       WR_VERIFY                       <class 'str'>                     True      False     True        None
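
In every row of the table, the environment variable name is the option name in upper case with a ``WR_`` prefix. Based on that observed pattern, the sketch below reports which options currently have a matching variable in the process environment, which can help when a value is overridden unexpectedly. Both function names are illustrative assumptions. Note that the table lists ``WR_BOTOCORE_CONFIG``, although, per the note at the top, ``botocore_config`` can only be set via ``wr.config``.

```python
import os


def env_var_for(option_name):
    """Derive the variable name following the pattern seen in the table."""
    return "WR_" + option_name.upper()


def options_set_in_env(option_names, environ=os.environ):
    """Return {option: raw string value} for options whose variable is set."""
    found = {}
    for name in option_names:
        variable = env_var_for(name)
        if variable in environ:
            found[name] = environ[variable]
    return found


# Example with an explicit mapping instead of the real environment:
sample_env = {"WR_DATABASE": "default", "WR_WORKGROUP": "primary"}
print(options_set_in_env(["database", "workgroup", "chunksize"], sample_env))
```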