awswrangler.catalog.create_csv_table

awswrangler.catalog.create_csv_table(database: str, table: str, path: str, columns_types: Dict[str, str], partitions_types: Optional[Dict[str, str]] = None, bucketing_info: Optional[Tuple[List[str], int]] = None, compression: Optional[str] = None, description: Optional[str] = None, parameters: Optional[Dict[str, str]] = None, columns_comments: Optional[Dict[str, str]] = None, mode: str = 'overwrite', catalog_versioning: bool = False, sep: str = ',', skip_header_line_count: Optional[int] = None, serde_library: Optional[str] = None, serde_parameters: Optional[Dict[str, str]] = None, boto3_session: Optional[boto3.session.Session] = None, projection_enabled: bool = False, projection_types: Optional[Dict[str, str]] = None, projection_ranges: Optional[Dict[str, str]] = None, projection_values: Optional[Dict[str, str]] = None, projection_intervals: Optional[Dict[str, str]] = None, projection_digits: Optional[Dict[str, str]] = None, catalog_id: Optional[str] = None)Any

Create a CSV Table (Metadata Only) in the AWS Glue Catalog.

https://docs.aws.amazon.com/athena/latest/ug/data-types.html

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • catalog_id

  • database

Check out the Global Configurations Tutorial for details.

Parameters
  • database (str) – Database name.

  • table (str) – Table name.

  • path (str) – Amazon S3 path (e.g. s3://bucket/prefix/).

  • columns_types (Dict[str, str]) – Dictionary with keys as column names and values as data types (e.g. {‘col0’: ‘bigint’, ‘col1’: ‘double’}).

  • partitions_types (Dict[str, str], optional) – Dictionary with keys as partition names and values as data types (e.g. {‘col2’: ‘date’}).

  • bucketing_info (Tuple[List[str], int], optional) – Tuple consisting of the column names used for bucketing as the first element and the number of buckets as the second element. Only str, int and bool are supported as column data types for bucketing.

  • compression (str, optional) – Compression style (None, gzip, etc).

  • description (str, optional) – Table description

  • parameters (Dict[str, str], optional) – Key/value pairs to tag the table.

  • columns_comments (Dict[str, str], optional) – Columns names and the related comments (e.g. {‘col0’: ‘Column 0.’, ‘col1’: ‘Column 1.’, ‘col2’: ‘Partition.’}).

  • mode (str) – ‘overwrite’ to recreate any possible axisting table or ‘append’ to keep any possible axisting table.

  • catalog_versioning (bool) – If True and mode=”overwrite”, creates an archived version of the table catalog before updating it.

  • sep (str) – String of length 1. Field delimiter for the output file.

  • skip_header_line_count (Optional[int]) – Number of Lines to skip regarding to the header.

  • serde_library (Optional[str]) – Specifies the SerDe Serialization library which will be used. You need to provide the Class library name as a string. If no library is provided the default is org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.

  • serde_parameters (Optional[str]) – Dictionary of initialization parameters for the SerDe. The default is {“field.delim”: sep, “escape.delim”: “\”}.

  • projection_enabled (bool) – Enable Partition Projection on Athena (https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html)

  • projection_types (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections types. Valid types: “enum”, “integer”, “date”, “injected” https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘enum’, ‘col2_name’: ‘integer’})

  • projection_ranges (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections ranges. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘0,10’, ‘col2_name’: ‘-1,8675309’})

  • projection_values (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections values. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘A,B,Unknown’, ‘col2_name’: ‘foo,boo,bar’})

  • projection_intervals (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections intervals. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘1’, ‘col2_name’: ‘5’})

  • projection_digits (Optional[Dict[str, str]]) – Dictionary of partitions names and Athena projections digits. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘1’, ‘col2_name’: ‘2’})

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receive None.

  • catalog_id (str, optional) – The ID of the Data Catalog from which to retrieve Databases. If none is provided, the AWS account ID is used by default.

Returns

None.

Return type

None

Examples

>>> import awswrangler as wr
>>> wr.catalog.create_csv_table(
...     database='default',
...     table='my_table',
...     path='s3://bucket/prefix/',
...     columns_types={'col0': 'bigint', 'col1': 'double'},
...     partitions_types={'col2': 'date'},
...     compression='gzip',
...     description='My own table!',
...     parameters={'source': 'postgresql'},
...     columns_comments={'col0': 'Column 0.', 'col1': 'Column 1.', 'col2': 'Partition.'}
... )