awswrangler.db.read_sql_query(sql: str, con: sqlalchemy.engine.base.Engine, index_col: Union[str, List[str], None] = None, params: Union[List, Tuple, Dict, None] = None, chunksize: Optional[int] = None, dtype: Optional[Dict[str, pyarrow.lib.DataType]] = None) → Union[pandas.core.frame.DataFrame, Iterator[pandas.core.frame.DataFrame]]

Return a DataFrame corresponding to the result set of the query string.

Support for Redshift, PostgreSQL and MySQL.


Redshift: For large extractions (1MM+ rows) consider the function wr.db.unload_redshift().

  • sql (str) – Pandas DataFrame

  • con (sqlalchemy.engine.Engine) – SQLAlchemy Engine. Please use, wr.db.get_engine(), wr.db.get_redshift_temp_engine() or wr.catalog.get_engine()

  • index_col (Union[str, List[str]], optional) – Column(s) to set as index(MultiIndex).

  • params (Union[List, Tuple, Dict], optional) – List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent. Check your database driver documentation for which of the five syntax styles, described in PEP 249’s paramstyle, is supported. Eg. for psycopg2, uses %(name)s so use params={‘name’ : ‘value’}.

  • chunksize (int, optional) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

  • dtype (Dict[str, pyarrow.DataType], optional) – Specifying the datatype for columns. The keys should be the column names and the values should be the PyArrow types.


Result as Pandas DataFrame(s).

Return type

Union[pandas.DataFrame, Iterator[pandas.DataFrame]]


Reading from Redshift with temporary credentials

>>> import awswrangler as wr
>>> df = wr.db.read_sql_query(
...     sql="SELECT * FROM public.my_table",
...     con=wr.db.get_redshift_temp_engine(cluster_identifier="...", user="...")
... )

Reading from Redshift from Glue Catalog Connections

>>> import awswrangler as wr
>>> df = wr.db.read_sql_query(
...     sql="SELECT * FROM public.my_table",
...     con=wr.catalog.get_engine(connection="...")
... )