Overview of the Snowflake Connector for Spark

The Snowflake Connector for Spark enables you to use Snowflake as a Spark data source, similar to other data sources (PostgreSQL, HDFS, S3, etc.).

Figure: Snowflake as a data source for Spark

Interaction Between Snowflake and Spark

The connector supports bidirectional data movement between a Snowflake cluster and a Spark cluster. Using the connector, you can perform the following operations (see the sketch after this list):

  • Populate a Spark DataFrame from a table (or query) in Snowflake.
  • Write the contents of a Spark DataFrame to a table in Snowflake.
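A minimal Scala sketch of both directions is shown below. The source name net.snowflake.spark.snowflake and the sfURL/sfUser/dbtable options are the connector's standard ones; the account, credentials, and table names are placeholders to replace with your own.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object SnowflakeRoundTrip {
      // Placeholder connection parameters; substitute your own account values.
      val sfOptions = Map(
        "sfURL"       -> "myaccount.snowflakecomputing.com",
        "sfUser"      -> "MY_USER",
        "sfPassword"  -> "MY_PASSWORD",
        "sfDatabase"  -> "MY_DB",
        "sfSchema"    -> "PUBLIC",
        "sfWarehouse" -> "MY_WH"
      )
      val SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("snowflake-example")
          .master("local[*]") // for local testing; omit under spark-submit
          .getOrCreate()

        // Populate a Spark DataFrame from a table in Snowflake (pass SQL text
        // through the "query" option instead of "dbtable" to read from a query).
        val df = spark.read
          .format(SNOWFLAKE_SOURCE)
          .options(sfOptions)
          .option("dbtable", "SOURCE_TABLE")
          .load()

        // Write the contents of the DataFrame to a table in Snowflake.
        df.write
          .format(SNOWFLAKE_SOURCE)
          .options(sfOptions)
          .option("dbtable", "TARGET_TABLE")
          .mode(SaveMode.Overwrite)
          .save()

        spark.stop()
      }
    }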
Figure: Interaction between Snowflake and Spark

The connector uses Scala 2.1x to perform these operations and communicates with Snowflake through the Snowflake JDBC driver.
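In an sbt project, the dependencies might be declared as in the sketch below. The coordinates net.snowflake:spark-snowflake and net.snowflake:snowflake-jdbc are the published ones, but the version strings are placeholders; match them to your Spark and Scala versions.

    // build.sbt (sketch): %% appends the Scala version (e.g., _2.12) to the
    // artifact name. The version strings here are placeholders.
    libraryDependencies ++= Seq(
      "net.snowflake" %% "spark-snowflake" % "2.9.0-spark_3.1",
      "net.snowflake" %  "snowflake-jdbc"  % "3.13.14"
    )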

Note

For optimal performance when transferring non-trivial amounts of data, we recommend using the Snowflake Connector for Spark rather than Spark's generic JDBC data source.
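For contrast, a read through the generic JDBC source looks like the sketch below (URL, credentials, and table name are placeholders, and the `spark` session is the one from the earlier sketch). Without explicit partitioning options, every row flows through a single JDBC connection instead of being bulk-transferred through a stage.

    // Sketch: reading Snowflake via Spark's generic JDBC data source.
    // net.snowflake.client.jdbc.SnowflakeDriver is the JDBC driver class.
    val jdbcDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:snowflake://myaccount.snowflakecomputing.com/")
      .option("driver", "net.snowflake.client.jdbc.SnowflakeDriver")
      .option("user", "MY_USER")
      .option("password", "MY_PASSWORD")
      .option("dbtable", "SOURCE_TABLE")
      .load()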

Data Exchange

The exchange of data between the two systems is facilitated through a Snowflake internal stage that the connector creates and manages (see the sketch after this list):

  • Upon connecting to Snowflake and initializing a session, the connector creates the internal stage.
  • For the duration of the session, the connector uses the stage to store data while transferring it to its destination.
  • At the end of the session, the connector drops the stage, thereby removing all the temporary data in the stage.
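This lifecycle is invisible to application code, but conceptually it resembles the hedged sketch below, expressed as SQL issued over JDBC; the stage name and the exact statements are illustrative, not the connector's actual internals.

    // Conceptual sketch only: the connector automates steps like these.
    import java.sql.DriverManager

    val conn = DriverManager.getConnection(
      "jdbc:snowflake://myaccount.snowflakecomputing.com/", "MY_USER", "MY_PASSWORD")
    val stmt = conn.createStatement()

    // 1. Session start: create a session-scoped internal stage.
    stmt.execute("CREATE TEMPORARY STAGE spark_connector_stage")

    // 2. During the session: stage data on its way to its destination,
    //    e.g., unload a table into the stage before Spark fetches it.
    stmt.execute("COPY INTO @spark_connector_stage FROM source_table")

    // 3. Session end: drop the stage, removing all temporary data in it.
    stmt.execute("DROP STAGE spark_connector_stage")
    conn.close()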