awswrangler.data_quality.create_recommendation_ruleset¶
- awswrangler.data_quality.create_recommendation_ruleset(database: str, table: str, iam_role_arn: str, name: str | None = None, catalog_id: str | None = None, connection_name: str | None = None, additional_options: dict[str, Any] | None = None, number_of_workers: int = 5, timeout: int = 2880, client_token: str | None = None, boto3_session: Session | None = None) DataFrame ¶
Create a recommendation Data Quality ruleset.
Note
This function has arguments which can be configured globally through wr.config or environment variables:
catalog_id
database
Check out the Global Configurations Tutorial for details.
- Parameters:
database (str) – Glue database name.
table (str) – Glue table name.
iam_role_arn (str) – IAM Role ARN.
name (str, optional) – Ruleset name.
catalog_id (str, optional) – Glue Catalog id.
connection_name (str, optional) – Glue connection name.
additional_options (dict, optional) – Additional options for the table. Supported keys: pushDownPredicate, to filter partitions without listing and reading all the files in your dataset; catalogPartitionPredicate, to apply server-side partition pruning using partition indexes in the Glue Data Catalog.
number_of_workers (int, optional) – The number of G.1X workers to be used in the run. The default is 5.
timeout (int, optional) – The timeout for a run in minutes. The default is 2880 (48 hours).
client_token (str, optional) – Random ID used for idempotency. Generated automatically if not provided.
boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used.
- Returns:
Data frame with recommended ruleset details.
- Return type:
pd.DataFrame
Examples
>>> import awswrangler as wr
>>> df_recommended_ruleset = wr.data_quality.create_recommendation_ruleset(
...     database="database",
...     table="table",
...     iam_role_arn="arn:...",
... )
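As a sketch of the additional_options parameter described above (the partition predicate values here are hypothetical; substitute your table's actual partition keys), the dict can be shaped like this:

```python
# Hypothetical partition predicates for illustration only;
# adjust keys/values to match your table's partition columns.
additional_options = {
    # pushDownPredicate: filter on partitions without having to
    # list and read all the files in the dataset.
    "pushDownPredicate": "year='2024' AND month='01'",
    # catalogPartitionPredicate: server-side partition pruning using
    # partition indexes in the Glue Data Catalog.
    "catalogPartitionPredicate": "year='2024'",
}
```

Pass this dict as additional_options= when calling create_recommendation_ruleset to restrict which partitions are scanned during the recommendation run.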