Overview of the Snowflake Connector for Spark¶
The Snowflake Connector for Spark enables you to use Snowflake as a Spark data source, similar to other data sources (PostgreSQL, HDFS, S3, etc.).
Interaction Between Snowflake and Spark¶
The connector supports bi-directional data movement between a Snowflake cluster and a Spark cluster. Using the connector, you can perform the following operations:
- Populate a Spark DataFrame from a table (or query) in Snowflake.
- Write the contents of a Spark DataFrame to a table in Snowflake.
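Both operations above follow the standard Spark data source API. The sketch below, hedged as an illustration, shows one common way to read from and write to Snowflake with the connector; the source name `net.snowflake.spark.snowflake` and the `sfURL`/`sfUser`/`dbtable`/`query` option keys are the connector's documented option names, while the table names and the bracketed connection values are placeholders you would replace with your own.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object SnowflakeConnectorExample {
  // Fully qualified name of the data source registered by the connector.
  val SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("snowflake-example").getOrCreate()

    // Placeholder connection options; substitute your account's values.
    val sfOptions = Map(
      "sfURL"       -> "<account_identifier>.snowflakecomputing.com",
      "sfUser"      -> "<user_name>",
      "sfPassword"  -> "<password>",
      "sfDatabase"  -> "<database>",
      "sfSchema"    -> "<schema>",
      "sfWarehouse" -> "<warehouse>"
    )

    // Populate a Spark DataFrame from a Snowflake table ...
    val df = spark.read
      .format(SNOWFLAKE_SOURCE_NAME)
      .options(sfOptions)
      .option("dbtable", "SOURCE_TABLE")
      .load()

    // ... or from an arbitrary query.
    val queryDf = spark.read
      .format(SNOWFLAKE_SOURCE_NAME)
      .options(sfOptions)
      .option("query", "SELECT ID, NAME FROM SOURCE_TABLE WHERE ID > 100")
      .load()

    // Write the contents of a DataFrame to a Snowflake table.
    df.write
      .format(SNOWFLAKE_SOURCE_NAME)
      .options(sfOptions)
      .option("dbtable", "TARGET_TABLE")
      .mode(SaveMode.Overwrite)
      .save()
  }
}
```

Note that `query` and `dbtable` are mutually exclusive: specify one or the other per read.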
The connector is implemented in Scala 2.1x and uses the Snowflake JDBC driver to communicate with Snowflake.
When transferring non-trivial amounts of data, we recommend using the Snowflake Connector for Spark rather than the generic Apache Spark JDBC data source, because the connector bulk-transfers data through a stage instead of moving it row by row over JDBC.
The exchange of data between the two systems is facilitated through a Snowflake internal stage that the connector creates and manages:
- Upon connecting to Snowflake and initializing a session, the connector creates the internal stage.
- Throughout the duration of the Snowflake session, the connector uses the stage to store data while transferring it to its destination.
- At the end of the Snowflake session, the connector drops the stage, thereby removing all the temporary data in the stage.