awswrangler.catalog.extract_athena_types

awswrangler.catalog.extract_athena_types(df: DataFrame, index: bool = False, partition_cols: Optional[List[str]] = None, dtype: Optional[Dict[str, str]] = None, file_format: str = 'parquet') → Tuple[Dict[str, str], Dict[str, str]]

Extract column and partition types (Amazon Athena) from a pandas DataFrame.

https://docs.aws.amazon.com/athena/latest/ug/data-types.html

Parameters
  • df (pandas.DataFrame) – Pandas DataFrame.

  • index (bool) – Whether to treat the DataFrame index as a column.

  • partition_cols (List[str], optional) – List of partition names.

  • dtype (Dict[str, str], optional) – Dictionary of column names and the Athena/Glue types they should be cast to. Useful when you have columns with undetermined or mixed data types (e.g. {'col name': 'bigint', 'col2 name': 'int'}).

  • file_format (str, optional) – File format to be considered when placing the index column: "parquet" | "csv".
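The way partition_cols and dtype shape the result can be sketched in plain Python. This is a simplified mental model, not the library's implementation: split_types is a hypothetical helper, the inferred dictionary stands in for whatever types would be derived from the DataFrame, and the assumption that dtype overrides are applied before the partition split is ours.

```python
from typing import Dict, List, Optional, Tuple


def split_types(
    inferred: Dict[str, str],
    partition_cols: Optional[List[str]] = None,
    dtype: Optional[Dict[str, str]] = None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical helper (not part of awswrangler) modelling the split."""
    overrides = dtype or {}
    # Assumption: an explicit dtype entry replaces the inferred type.
    merged = {col: overrides.get(col, typ) for col, typ in inferred.items()}
    partitions = set(partition_cols or [])
    # Columns named in partition_cols go to the second dictionary.
    columns_types = {c: t for c, t in merged.items() if c not in partitions}
    partitions_types = {c: t for c, t in merged.items() if c in partitions}
    return columns_types, partitions_types


# Stand-in for types inferred from a DataFrame (illustrative values only).
inferred = {"col0": "bigint", "col1": "double", "col2": "string"}
columns_types, partitions_types = split_types(
    inferred, partition_cols=["col2"], dtype={"col2": "date"}
)
```

In this sketch col2 ends up only in partitions_types, with the overridden type rather than the inferred one.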

Returns

  • columns_types: Dictionary with keys as column names and values as data types (e.g. {'col0': 'bigint', 'col1': 'double'}).

  • partitions_types: Dictionary with keys as partition names and values as data types (e.g. {'col2': 'date'}).

Return type

Tuple[Dict[str, str], Dict[str, str]]
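One possible use of the two returned dictionaries is to assemble an Athena CREATE EXTERNAL TABLE statement by hand. The sketch below assumes a Parquet-backed table; build_create_table_ddl, the table name and the S3 location are hypothetical and not part of awswrangler, and the dictionaries are hard-coded with the example values shown above rather than taken from a real call.

```python
from typing import Dict


def build_create_table_ddl(
    table: str,
    location: str,
    columns_types: Dict[str, str],
    partitions_types: Dict[str, str],
) -> str:
    """Hypothetical helper: render a minimal Athena DDL statement."""
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns_types.items())
    ddl = f"CREATE EXTERNAL TABLE `{table}` (\n  {cols}\n)"
    if partitions_types:
        # Partition columns are declared separately from regular columns.
        parts = ", ".join(f"`{name}` {typ}" for name, typ in partitions_types.items())
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{location}'"
    return ddl


# Example values copied from the return description; names are placeholders.
columns_types = {"col0": "bigint", "col1": "double"}
partitions_types = {"col2": "date"}
ddl = build_create_table_ddl(
    "my_table", "s3://my-bucket/my_table/", columns_types, partitions_types
)
print(ddl)
```

Keeping the partition columns in their own dictionary mirrors the DDL, where they appear in the PARTITIONED BY clause and not in the main column list.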

Examples

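>>> import awswrangler as wr
>>> import pandas as pd
>>> # Illustrative DataFrame; "par0" and "par1" are the partition columns
>>> df = pd.DataFrame({"col0": [1, 2], "par0": ["a", "b"], "par1": [True, False]})
>>> columns_types, partitions_types = wr.catalog.extract_athena_types(
...     df=df, index=False, partition_cols=["par0", "par1"], file_format="csv"
... )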