awswrangler.catalog.create_parquet_table

awswrangler.catalog.create_parquet_table(database: str, table: str, path: str, columns_types: Dict[str, str], partitions_types: Optional[Dict[str, str]] = None, catalog_id: Optional[str] = None, compression: Optional[str] = None, description: Optional[str] = None, parameters: Optional[Dict[str, str]] = None, columns_comments: Optional[Dict[str, str]] = None, mode: str = 'overwrite', catalog_versioning: bool = False, projection_enabled: bool = False, projection_types: Optional[Dict[str, str]] = None, projection_ranges: Optional[Dict[str, str]] = None, projection_values: Optional[Dict[str, str]] = None, projection_intervals: Optional[Dict[str, str]] = None, projection_digits: Optional[Dict[str, str]] = None, boto3_session: Optional[boto3.session.Session] = None) → Any

Create a Parquet Table (Metadata Only) in the AWS Glue Catalog.

https://docs.aws.amazon.com/athena/latest/ug/data-types.html

Note

This functions has arguments that can has default values configured globally through wr.config or environment variables:

  • catalog_id

  • database

Check out the Global Configurations Tutorial for details.

Parameters
  • database (str) – Database name.

  • table (str) – Table name.

  • path (str) – Amazon S3 path (e.g. s3://bucket/prefix/).

  • columns_types (Dict[str, str]) – Dictionary with keys as column names and values as data types (e.g. {‘col0’: ‘bigint’, ‘col1’: ‘double’}).

  • partitions_types (Dict[str, str], optional) – Dictionary with keys as partition names and values as data types (e.g. {‘col2’: ‘date’}).

  • catalog_id (str, optional) – The ID of the Data Catalog from which to retrieve Databases. If none is provided, the AWS account ID is used by default.

  • compression (str, optional) – Compression style (None, snappy, gzip, etc).

  • description (str, optional) – Table description

  • parameters (Dict[str, str], optional) – Key/value pairs to tag the table.

  • columns_comments (Dict[str, str], optional) – Columns names and the related comments (e.g. {‘col0’: ‘Column 0.’, ‘col1’: ‘Column 1.’, ‘col2’: ‘Partition.’}).

  • mode (str) – ‘overwrite’ to recreate any possible existing table or ‘append’ to keep any possible existing table.

  • catalog_versioning (bool) – If True and mode=”overwrite”, creates an archived version of the table catalog before updating it.

  • projection_enabled (bool) – Enable Partition Projection on Athena (https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html)

  • projection_types (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections types. Valid types: “enum”, “integer”, “date”, “injected” https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘enum’, ‘col2_name’: ‘integer’})

  • projection_ranges (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections ranges. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘0,10’, ‘col2_name’: ‘-1,8675309’})

  • projection_values (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections values. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘A,B,Unknown’, ‘col2_name’: ‘foo,boo,bar’})

  • projection_intervals (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections intervals. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘1’, ‘col2_name’: ‘5’})

  • projection_digits (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections digits. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘1’, ‘col2_name’: ‘2’})

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receive None.

Returns

None.

Return type

None

Examples

>>> import awswrangler as wr
>>> wr.catalog.create_parquet_table(
...     database='default',
...     table='my_table',
...     path='s3://bucket/prefix/',
...     columns_types={'col0': 'bigint', 'col1': 'double'},
...     partitions_types={'col2': 'date'},
...     compression='snappy',
...     description='My own table!',
...     parameters={'source': 'postgresql'},
...     columns_comments={'col0': 'Column 0.', 'col1': 'Column 1.', 'col2': 'Partition.'}
... )