awswrangler.catalog.sanitize_dataframe_columns_names
- awswrangler.catalog.sanitize_dataframe_columns_names(df: DataFrame, handle_duplicate_columns: Optional[str] = 'warn') → DataFrame
Normalize all column names so that they are compatible with Amazon Athena.
https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html
Possible transformations:
- Strip accents
- Remove non-alphanumeric characters
Note
After transformation, some column names may no longer be unique. Example: the columns [“A”, “a”] will be sanitized to [“a”, “a”].
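The transformations and the collision described in the note can be illustrated with a minimal standard-library sketch. `sketch_sanitize` is a hypothetical helper, not the library's implementation. It assumes that accents are stripped through Unicode decomposition, that non-alphanumeric characters are replaced with underscores rather than deleted outright, and that names are lowercased (as the [“A”, “a”] example implies). The real function may apply additional rules.

```python
import re
import unicodedata


def sketch_sanitize(name: str) -> str:
    """Hypothetical approximation of Athena-friendly column name normalization."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: each character outside [A-Za-z0-9_] becomes an underscore.
    replaced = re.sub(r"[^A-Za-z0-9_]", "_", stripped)
    # Assumption: names are lowercased, matching the ["A", "a"] example.
    return replaced.lower()


columns = ["A", "a", "Café Price"]
sanitized = [sketch_sanitize(c) for c in columns]

# Distinct input names can collide once they are normalized.
has_duplicates = len(set(sanitized)) != len(sanitized)
```

Here "A" and "a" both normalize to "a", so `has_duplicates` is true even though the input names were distinct.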
- Parameters
df (pandas.DataFrame) – Original Pandas DataFrame.
handle_duplicate_columns (str, optional) – How to handle duplicate columns. One of “warn”, “drop”, or “rename”. “drop” keeps the first of each set of duplicated columns and drops the rest. “rename” renames duplicated columns by appending an incremental number. Defaults to “warn”.
- Returns
Original Pandas DataFrame with column names normalized.
- Return type
pandas.DataFrame
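The “drop” and “rename” strategies can be sketched on a plain list of already-sanitized names. Both helpers below (`keep_first`, `rename_repeats`) are hypothetical and only illustrate the idea. The `_1`, `_2` suffix format and the way the sketch avoids clashing with existing names are assumptions; the library's exact numbering may differ.

```python
from typing import List


def keep_first(names: List[str]) -> List[int]:
    """Positions of the first occurrence of each name (the "drop" idea)."""
    seen = set()
    positions = []
    for i, name in enumerate(names):
        if name not in seen:
            seen.add(name)
            positions.append(i)
    return positions


def rename_repeats(names: List[str]) -> List[str]:
    """Give repeated names an incremental suffix (the "rename" idea)."""
    used = set(names)  # every existing name is reserved up front
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            # First occurrence keeps its name unchanged.
            seen.add(name)
            result.append(name)
            continue
        # Assumption: suffix is "_<n>", skipping any name already in use.
        n = 1
        candidate = f"{name}_{n}"
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


names = ["a", "a", "a_1"]  # e.g. columns "A", "a", "a_1" after sanitizing
kept_positions = keep_first(names)
renamed = rename_repeats(names)
```

With pandas, the positions could be applied through `df.iloc[:, kept_positions]`, and the renamed list could be assigned to `df.columns`.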
Examples
>>> import awswrangler as wr
>>> import pandas as pd
>>> df_normalized = wr.catalog.sanitize_dataframe_columns_names(df=pd.DataFrame({"A": [1, 2]}))
>>> df_normalized_drop = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4]}), handle_duplicate_columns="drop"
... )
>>> df_normalized_rename = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4], "a_1": [4, 6]}), handle_duplicate_columns="rename"
... )