awswrangler.catalog.create_parquet_table(database: str, table: str, path: str, columns_types: Dict[str, str], partitions_types: Optional[Dict[str, str]] = None, compression: Optional[str] = None, description: Optional[str] = None, parameters: Optional[Dict[str, str]] = None, columns_comments: Optional[Dict[str, str]] = None, mode: str = 'overwrite', catalog_versioning: bool = False, boto3_session: Optional[boto3.session.Session] = None) → None

Create a Parquet Table (Metadata Only) in the AWS Glue Catalog.

Parameters

  • database (str) – Database name.

  • table (str) – Table name.

  • path (str) – Amazon S3 path (e.g. s3://bucket/prefix/).

  • columns_types (Dict[str, str]) – Dictionary with keys as column names and values as data types (e.g. {‘col0’: ‘bigint’, ‘col1’: ‘double’}).

  • partitions_types (Dict[str, str], optional) – Dictionary with keys as partition names and values as data types (e.g. {‘col2’: ‘date’}).

  • compression (str, optional) – Compression style (None, snappy, gzip, etc).

  • description (str, optional) – Table description.

  • parameters (Dict[str, str], optional) – Key/value pairs to tag the table.

  • columns_comments (Dict[str, str], optional) – Column names mapped to their comments (e.g. {‘col0’: ‘Column 0.’, ‘col1’: ‘Column 1.’, ‘col2’: ‘Partition.’}).

  • mode (str) – ‘overwrite’ to recreate the table if it already exists, or ‘append’ to keep an existing table.

  • catalog_versioning (bool) – If True and mode=‘overwrite’, creates an archived version of the table catalog before updating it.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.
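
To make "metadata only" concrete, the sketch below shows how these parameters could map onto a Glue table definition. It is a minimal sketch, assuming the table is described as a `TableInput` dict of the shape accepted by the boto3 Glue client's `create_table(DatabaseName=..., TableInput=...)`. The helper `build_parquet_table_input` and the `classification`/`compressionType` parameter keys are illustrative assumptions, not necessarily what awswrangler builds internally. The code only assembles a dict and makes no AWS calls.

```python
from typing import Any, Dict, List, Optional

# Hive class names conventionally used for Parquet tables in the Glue Catalog.
PARQUET_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
PARQUET_OUTPUT_FORMAT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"


def build_parquet_table_input(
    table: str,
    path: str,
    columns_types: Dict[str, str],
    partitions_types: Optional[Dict[str, str]] = None,
    compression: Optional[str] = None,
    description: Optional[str] = None,
    parameters: Optional[Dict[str, str]] = None,
    columns_comments: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a Glue TableInput dict (no AWS calls)."""
    comments = columns_comments or {}

    def to_glue_columns(types: Dict[str, str]) -> List[Dict[str, str]]:
        # Glue columns are dicts with Name, Type and an optional Comment.
        columns = []
        for name, dtype in types.items():
            column = {"Name": name, "Type": dtype}
            if name in comments:
                column["Comment"] = comments[name]
            columns.append(column)
        return columns

    # Assumed table parameters; user-supplied parameters are merged on top.
    table_params = {"classification": "parquet", "compressionType": str(compression).lower()}
    table_params.update(parameters or {})

    table_input: Dict[str, Any] = {
        "Name": table,
        "TableType": "EXTERNAL_TABLE",
        "PartitionKeys": to_glue_columns(partitions_types or {}),
        "Parameters": table_params,
        "StorageDescriptor": {
            "Columns": to_glue_columns(columns_types),
            "Location": path,
            "InputFormat": PARQUET_INPUT_FORMAT,
            "OutputFormat": PARQUET_OUTPUT_FORMAT,
            "Compressed": compression is not None,
            "SerdeInfo": {"SerializationLibrary": PARQUET_SERDE},
        },
    }
    if description is not None:
        table_input["Description"] = description
    return table_input


# Arguments mirror the doctest example for create_parquet_table.
table_input = build_parquet_table_input(
    table="my_table",
    path="s3://bucket/prefix/",
    columns_types={"col0": "bigint", "col1": "double"},
    partitions_types={"col2": "date"},
    compression="snappy",
    description="My own table!",
    parameters={"source": "postgresql"},
    columns_comments={"col0": "Column 0.", "col1": "Column 1.", "col2": "Partition."},
)
```

Note that partitions end up in `PartitionKeys`, separate from the regular columns in the storage descriptor. With a boto3 Glue client, such a dict could be passed as `glue.create_table(DatabaseName='default', TableInput=table_input)`; that call does reach AWS and is not part of this sketch.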

Return type

None

Examples

>>> import awswrangler as wr
>>> wr.catalog.create_parquet_table(
...     database='default',
...     table='my_table',
...     path='s3://bucket/prefix/',
...     columns_types={'col0': 'bigint', 'col1': 'double'},
...     partitions_types={'col2': 'date'},
...     compression='snappy',
...     description='My own table!',
...     parameters={'source': 'postgresql'},
...     columns_comments={'col0': 'Column 0.', 'col1': 'Column 1.', 'col2': 'Partition.'}
... )
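
Because `columns_comments` can describe partitions as well as regular columns (`col2` in the example above is a partition), a quick local check before calling the function can catch typos in dictionary keys. The helper `find_schema_problems` below is hypothetical and not part of awswrangler; it also assumes the Hive convention that one name cannot be both a regular column and a partition key.

```python
from typing import Dict, List, Optional


def find_schema_problems(
    columns_types: Dict[str, str],
    partitions_types: Optional[Dict[str, str]] = None,
    columns_comments: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Hypothetical pre-flight check on the dicts passed to create_parquet_table."""
    partitions_types = partitions_types or {}
    problems: List[str] = []
    # Assumed Hive rule: a partition key must not repeat a regular column name.
    for name in sorted(set(columns_types) & set(partitions_types)):
        problems.append(f"{name!r} is both a column and a partition")
    # Every comment key should name a declared column or partition.
    known = set(columns_types) | set(partitions_types)
    for name in columns_comments or {}:
        if name not in known:
            problems.append(f"comment for unknown column {name!r}")
    return problems


# The doctest arguments above pass the check (an empty list means no problems).
problems = find_schema_problems(
    columns_types={"col0": "bigint", "col1": "double"},
    partitions_types={"col2": "date"},
    columns_comments={"col0": "Column 0.", "col1": "Column 1.", "col2": "Partition."},
)
```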