AWS Data Wrangler

20 - Spark Table Interoperability

Wrangler has no difficulty inserting into, overwriting, or otherwise interacting with a table created by Apache Spark.

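For that first direction, a minimal sketch of the Wrangler side (assuming a Glue database named "database", a table named "table" created by Spark, and a hypothetical S3 path) could append rows with wr.s3.to_parquet:

import awswrangler as wr
import pandas as pd

# Hypothetical DataFrame matching the schema of the Spark-created table
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "bar"]})

# Append to the existing table; mode="overwrite" would replace its contents
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/path/",  # assumed location of the table's data
    dataset=True,
    database="database",
    table="table",
    mode="append",
)
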
But if you want to do the opposite (Spark interacting with a table created by Wrangler), you should be aware that Wrangler follows the Hive format, so you must be explicit when using Spark's saveAsTable method:

[1]:
spark_df.write.format("hive").saveAsTable("database.table")

Or simply use the insertInto alternative:

[2]:
spark_df.write.insertInto("database.table")
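
Putting the Spark side together, a minimal end-to-end sketch (assuming a SparkSession configured against the same metastore/Glue Catalog with Hive support enabled, and a hypothetical database.table already created by Wrangler):

from pyspark.sql import SparkSession

# Hive support is required for the "hive" format used above
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical DataFrame matching the target table's schema
spark_df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "value"])

# Explicit Hive format with saveAsTable...
spark_df.write.format("hive").mode("append").saveAsTable("database.table")

# ...or insertInto, which writes by column position into the existing table
spark_df.write.insertInto("database.table")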