AWS Data Wrangler

1 - Introduction

What is AWS Data Wrangler?

An open-source Python package that extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services (Amazon Redshift, AWS Glue, Amazon Athena, Amazon Timestream, Amazon EMR, etc.).

Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions for common ETL tasks such as loading and unloading data from Data Lakes, Data Warehouses and Databases.

Check out our list of functionalities.

How to install?

AWS Data Wrangler runs almost anywhere on Python 3.6, 3.7 and 3.8, so there are several ways to install it in your target environment.
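A quick way to confirm that the current interpreter is one of the versions listed above is to compare `sys.version_info` against them. This is a minimal sketch: the `SUPPORTED_PYTHONS` set and the `is_supported_python` helper are hypothetical names that simply mirror the versions stated in this tutorial (written against release 2.0.0), and other Wrangler releases may support a different range.

```python
import sys

# (major, minor) Python versions named in this tutorial; other releases may differ.
SUPPORTED_PYTHONS = {(3, 6), (3, 7), (3, 8)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) part of version_info is in SUPPORTED_PYTHONS."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "listed as supported" if is_supported_python() else "not among the listed versions"
    print(f"Python {major}.{minor} is {status}")
```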

Some good practices that apply to most installation methods:

- Use a new, separate virtual environment for each project (venv).
- In notebooks, always restart your kernel after installing packages.
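A per-project environment can also be created from Python with the standard-library `venv` module, the same module behind `python -m venv`. This is a minimal sketch: the `create_project_env` helper and the `.venv` directory name are illustrative choices, and you still need to activate the environment (or register it as your notebook kernel) before running `pip install awswrangler` in it.

```python
import venv
from pathlib import Path


def create_project_env(project_dir: str, env_name: str = ".venv", with_pip: bool = True) -> Path:
    """Create an isolated virtual environment at project_dir/env_name and return its path."""
    env_dir = Path(project_dir) / env_name
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(str(env_dir), with_pip=with_pip)
    return env_dir


# Usage (hypothetical project path):
# env = create_project_env("my-wrangler-project")
```

Keeping one environment per project means a Wrangler upgrade in one project cannot break the pinned Pandas or Boto3 versions of another.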

Let’s install it!

[ ]:
!pip install awswrangler

Restart your kernel after the installation!
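The `!pip` shell escape runs whichever `pip` comes first on the shell's PATH, which is not always the one belonging to the notebook's kernel. A common alternative is to run pip as a module of the interpreter that is executing the code, via `sys.executable`. This is a minimal sketch; `pip_install_command` and `pip_install` are hypothetical helpers, not part of AWS Data Wrangler.

```python
import subprocess
import sys
from typing import List


def pip_install_command(*packages: str) -> List[str]:
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(*packages: str) -> None:
    """Install packages into the current interpreter's environment; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(*packages))


# Usage (downloads from PyPI; restart the kernel afterwards):
# pip_install("awswrangler")
```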

[1]:
import awswrangler as wr

wr.__version__
[1]:
'2.0.0'
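To check which release is installed without importing the package, you can query the distribution metadata instead. This is a minimal sketch: `installed_version` is a hypothetical helper, and `importlib.metadata` is only in the standard library from Python 3.8, so on 3.6 and 3.7 it assumes the `importlib_metadata` backport (a separate `pip install importlib_metadata`), which exposes the same `version` function and `PackageNotFoundError`.

```python
from typing import Optional

try:  # Python 3.8+
    from importlib import metadata
except ImportError:  # Python 3.6/3.7: requires the importlib_metadata backport
    import importlib_metadata as metadata


def installed_version(dist_name: str = "awswrangler") -> Optional[str]:
    """Return the installed version string of dist_name, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())  # None if awswrangler is not installed in this environment
```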