awswrangler.catalog.sanitize_dataframe_columns_names
- awswrangler.catalog.sanitize_dataframe_columns_names(df: DataFrame, handle_duplicate_columns: str | None = 'warn') → DataFrame
Normalize all column names to be compatible with Amazon Athena.
https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html
Possible transformations:
- Strip accents
- Remove non-alphanumeric characters
Note
After transformation, some column names may no longer be unique. Example: the columns [“A”, “a”] will be sanitized to [“a”, “a”].
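To see which names would collide before calling the function, a rough check can be sketched in plain pandas. The helper `simple_sanitize` below is hypothetical and only approximates the transformations listed above (accent stripping, non-alphanumeric handling, lowercasing); it is not the library's implementation, and replacing characters with an underscore is an assumption made for illustration.

```python
import re
import unicodedata

import pandas as pd


def simple_sanitize(name: str) -> str:
    """Hypothetical stand-in for illustration; NOT awswrangler's implementation."""
    # Strip accents: decompose characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: replace each run of non-alphanumeric characters with "_", then lowercase.
    return re.sub(r"[^0-9a-zA-Z]+", "_", no_accents).lower()


df = pd.DataFrame([[1, 2, 3]], columns=["A", "a", "Café"])
sanitized = [simple_sanitize(c) for c in df.columns]

# Names that occur more than once after sanitization.
collisions = sorted({n for n in sanitized if sanitized.count(n) > 1})

print(sanitized)   # ['a', 'a', 'cafe']
print(collisions)  # ['a']
```

A non-empty `collisions` list suggests that `handle_duplicate_columns` deserves an explicit choice rather than the default.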
- Parameters:
df (pandas.DataFrame) – Original Pandas DataFrame.
handle_duplicate_columns (str, optional) – How to handle duplicate columns. One of “warn”, “drop”, or “rename”. “drop” drops all but the first duplicated column. “rename” renames all duplicated columns with an incremental number. Defaults to “warn”.
- Returns:
Original pandas DataFrame with column names normalized.
- Return type:
pandas.DataFrame
Examples
>>> import awswrangler as wr
>>> import pandas as pd
>>> df_normalized = wr.catalog.sanitize_dataframe_columns_names(df=pd.DataFrame({"A": [1, 2]}))
>>> df_normalized_drop = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4]}), handle_duplicate_columns="drop"
... )
>>> df_normalized_rename = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4], "a_1": [4, 6]}), handle_duplicate_columns="rename"
... )
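For intuition about what “drop” and “rename” mean, the two strategies can be imitated with plain pandas on a frame whose column names already collide. This is a minimal sketch, not the library's code: the `_<n>` suffix format used for renaming is an assumption, and the names produced by awswrangler may differ.

```python
import pandas as pd

# A frame whose column names already collide (as after sanitizing "A" and "a").
df = pd.DataFrame([[1, 3, 5]], columns=["a", "a", "b"])

# "drop"-like behavior: keep only the first column for each repeated name.
dropped = df.loc[:, ~df.columns.duplicated()]

# "rename"-like behavior: append an incremental suffix to later repeats.
# Assumption: the "_<n>" format is illustrative; the library's naming may differ.
seen = {}
renamed_cols = []
for name in df.columns:
    count = seen.get(name, 0)
    renamed_cols.append(name if count == 0 else f"{name}_{count}")
    seen[name] = count + 1

renamed = df.copy()
renamed.columns = renamed_cols

print(list(dropped.columns))  # ['a', 'b']
print(list(renamed.columns))  # ['a', 'a_1', 'b']
```

Dropping discards the data in the later duplicate columns, while renaming keeps every column, so “rename” is the safer choice when the duplicated columns hold different values.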