awswrangler.catalog.sanitize_dataframe_columns_names

awswrangler.catalog.sanitize_dataframe_columns_names(df: DataFrame, handle_duplicate_columns: Optional[str] = 'warn') → DataFrame

Normalize all column names to be compatible with Amazon Athena.

https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html

Possible transformations:
  • Strip accents
  • Remove non-alphanumeric characters
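The two transformations can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the helper names `strip_accents` and `sketch_sanitize` are hypothetical, and replacing non-alphanumeric runs with an underscore (rather than deleting them) is an assumption made for the example. Lowercasing follows the note in this page that “A” is sanitized to “a”.

```python
import re
import unicodedata


def strip_accents(name: str) -> str:
    """Decompose accented characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def sketch_sanitize(name: str) -> str:
    """Illustrative only: strip accents, replace runs of characters outside
    [A-Za-z0-9_] with a single underscore (an assumption), then lowercase."""
    name = strip_accents(name)
    name = re.sub(r"[^A-Za-z0-9_]+", "_", name)
    return name.lower()


print(sketch_sanitize("Preço Médio"))  # preco_medio
print(sketch_sanitize("A"))            # a
```

For the real behavior, call the function on a DataFrame and inspect the resulting column names.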

Note

After transformation, some column names might no longer be unique. For example, the columns [“A”, “a”] will be sanitized to [“a”, “a”].

Parameters
  • df (pandas.DataFrame) – Original Pandas DataFrame.

  • handle_duplicate_columns (str, optional) – How to handle duplicate columns. Can be “warn” or “drop” or “rename”. “drop” will drop all but the first duplicated column. “rename” will rename all duplicated columns with an incremental number. Defaults to “warn”.
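The three modes can be sketched on a plain list of already-sanitized names. This is a rough illustration under stated assumptions, not the library's code: `sketch_handle_duplicates` is a hypothetical helper, and the `_1`, `_2` suffix scheme used for “rename” is an assumption about what "an incremental number" looks like. The exact names the library produces may differ.

```python
import warnings
from typing import Dict, List, Set


def sketch_handle_duplicates(names: List[str], mode: str = "warn") -> List[str]:
    """Illustrative only: apply a duplicate-handling mode to column names."""
    if mode == "warn":
        # Keep every name, but signal that duplicates exist.
        if len(set(names)) != len(names):
            warnings.warn("Duplicate column names after sanitization", UserWarning)
        return list(names)

    seen: Set[str] = set()
    if mode == "drop":
        # Keep only the first occurrence of each name.
        kept: List[str] = []
        for name in names:
            if name not in seen:
                seen.add(name)
                kept.append(name)
        return kept

    if mode == "rename":
        # Append an incremental suffix to later occurrences (assumed scheme),
        # skipping any suffix that would collide with an existing name.
        taken: Set[str] = set(names)
        counts: Dict[str, int] = {}
        renamed: List[str] = []
        for name in names:
            if name not in seen:
                seen.add(name)
                renamed.append(name)
                continue
            i = counts.get(name, 0) + 1
            while f"{name}_{i}" in taken:
                i += 1
            counts[name] = i
            candidate = f"{name}_{i}"
            taken.add(candidate)
            seen.add(candidate)
            renamed.append(candidate)
        return renamed

    raise ValueError(f"Unsupported mode: {mode!r}")


print(sketch_handle_duplicates(["a", "a", "b"], "drop"))    # ['a', 'b']
print(sketch_handle_duplicates(["a", "a", "b"], "rename"))  # ['a', 'a_1', 'b']
```

In a DataFrame, “drop” removes the later columns together with their data, so “rename” is the safer choice when every column must be preserved.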

Returns

Original Pandas DataFrame with column names normalized.

Return type

pandas.DataFrame

Examples

>>> import awswrangler as wr
>>> import pandas as pd
>>> df_normalized = wr.catalog.sanitize_dataframe_columns_names(df=pd.DataFrame({"A": [1, 2]}))
>>> df_normalized_drop = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4]}), handle_duplicate_columns="drop"
... )
>>> df_normalized_rename = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4], "a_1": [4, 6]}), handle_duplicate_columns="rename"
... )