awswrangler.catalog.add_parquet_partitions

awswrangler.catalog.add_parquet_partitions(database: str, table: str, partitions_values: Dict[str, List[str]], catalog_id: Optional[str] = None, compression: Optional[str] = None, boto3_session: Optional[boto3.session.Session] = None, columns_types: Optional[Dict[str, str]] = None) → Any

Add partitions (metadata) to a Parquet Table in the AWS Glue Catalog.

Note

This function has arguments that can have default values configured globally through wr.config or environment variables:

  • catalog_id

  • database

Check out the Global Configurations Tutorial for details.
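For example, a minimal sketch of setting the database argument globally via wr.config (the 'default' database name here is illustrative, not required):

>>> import awswrangler as wr
>>> wr.config.database = 'default'  # calls below may now omit database=
>>> wr.catalog.add_parquet_partitions(
...     table='my_table',
...     partitions_values={'s3://bucket/prefix/y=2020/m=10/': ['2020', '10']}
... )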

Parameters
  • database (str) – Database name.

  • table (str) – Table name.

  • partitions_values (Dict[str, List[str]]) – Dictionary with keys as S3 path locations and values as lists of partition values as str (e.g. {'s3://bucket/prefix/y=2020/m=10/': ['2020', '10']}).

  • catalog_id (str, optional) – The ID of the Data Catalog from which to retrieve Databases. If none is provided, the AWS account ID is used by default.

  • compression (str, optional) – Compression style (None, snappy, gzip, etc.).

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receives None.

  • columns_types (Optional[Dict[str, str]]) – Only required for Hive compatibility. Dictionary with keys as column names and values as data types (e.g. {'col0': 'bigint', 'col1': 'double'}). Include only materialized columns, not partition columns.

Returns

None.

Return type

None

Examples

>>> import awswrangler as wr
>>> wr.catalog.add_parquet_partitions(
...     database='default',
...     table='my_table',
...     partitions_values={
...         's3://bucket/prefix/y=2020/m=10/': ['2020', '10'],
...         's3://bucket/prefix/y=2020/m=11/': ['2020', '11'],
...         's3://bucket/prefix/y=2020/m=12/': ['2020', '12']
...     }
... )
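
A further sketch passing the optional arguments; the compression codec, column types, and region below are illustrative assumptions, not values required by the API:

>>> import awswrangler as wr
>>> import boto3
>>> wr.catalog.add_parquet_partitions(
...     database='default',
...     table='my_table',
...     partitions_values={'s3://bucket/prefix/y=2020/m=10/': ['2020', '10']},
...     compression='snappy',  # assumed codec of the written Parquet files
...     columns_types={'col0': 'bigint', 'col1': 'double'},  # materialized columns only
...     boto3_session=boto3.Session(region_name='us-east-1')  # illustrative region
... )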