awswrangler.catalog.sanitize_dataframe_columns_names

awswrangler.catalog.sanitize_dataframe_columns_names(df: DataFrame, handle_duplicate_columns: str | None = 'warn') → DataFrame

Normalize all column names to be compatible with Amazon Athena.

https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html

Possible transformations:

- Strip accents
- Remove non-alphanumeric characters
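To make these transformations concrete, here is a minimal standard-library sketch. The helper `approx_sanitize` is hypothetical and is not awswrangler's implementation; in particular, replacing each run of disallowed characters with an underscore and lowercasing the result are assumptions made for illustration, and the library's actual rules may differ.

```python
import re
import unicodedata


def approx_sanitize(name: str) -> str:
    """Hypothetical approximation of column-name sanitization (not the library's code)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Assumption: each run of characters outside [A-Za-z0-9_] becomes one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", without_accents)
    # Assumption: names are lowercased, matching the "A" -> "a" example in the Note.
    return cleaned.lower()


columns = ["Café", "Total Price ($)", "A"]
sanitized = [approx_sanitize(c) for c in columns]
print(sanitized)
```

For real data, call `wr.catalog.sanitize_dataframe_columns_names` itself rather than a stand-in like this one.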

Note

After transformation, some column names might no longer be unique. For example, the columns ["A", "a"] will be sanitized to ["a", "a"].
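One way to spot such collisions before sanitizing is to group the original names by their normalized form. The sketch below is illustrative only: `find_collisions` is a hypothetical helper, and `str.lower` stands in for the library's real normalization, which does more than lowercase.

```python
def find_collisions(columns, normalize=str.lower):
    """Return {normalized_name: [original names]} for names that collide.

    `normalize` is a stand-in for the real sanitization logic (assumption).
    """
    groups = {}
    for col in columns:
        groups.setdefault(normalize(col), []).append(col)
    # Keep only the normalized names shared by more than one original column.
    return {key: names for key, names in groups.items() if len(names) > 1}


collisions = find_collisions(["A", "a", "b"])
print(collisions)
```

A non-empty result suggests choosing a `handle_duplicate_columns` strategy deliberately rather than relying on the default.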

Parameters:

df (pandas.DataFrame) – Original pandas DataFrame.
handle_duplicate_columns (str, optional) – How to handle duplicate columns. Can be "warn", "drop", or "rename". "drop" drops all but the first duplicated column. "rename" renames all duplicated columns with an incremental number. Defaults to "warn".

Returns:

The original pandas DataFrame with column names normalized.

Return type:

pandas.DataFrame

Examples

>>> import awswrangler as wr
>>> import pandas as pd
>>> # Default behavior ("warn")
>>> df_normalized = wr.catalog.sanitize_dataframe_columns_names(df=pd.DataFrame({"A": [1, 2]}))
>>> # "drop": keep only the first of the duplicated columns
>>> df_normalized_drop = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4]}), handle_duplicate_columns="drop"
... )
>>> # "rename": rename duplicated columns with an incremental number
>>> df_normalized_rename = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4], "a_1": [4, 6]}), handle_duplicate_columns="rename"
... )