Configuring Snowflake for Spark in Qubole
To configure Snowflake for Spark in Qubole, add Snowflake as a Qubole data store. This topic provides step-by-step instructions for doing this in the Qubole Data Service (QDS) UI.
You can also use the QDS REST API to add Snowflake as a data store. For step-by-step instructions, see Adding a Snowflake Data Warehouse as a Data Store (Qubole Documentation).
Before you begin, note the following requirements:
- You must be a QDS system administrator to add a data store.
- You must have a Qubole Enterprise edition account.
Preparing an AWS S3 Location
If your jobs regularly run longer than 36 hours, consider preparing an AWS S3 bucket and path for exchanging data between Snowflake and Spark. For details, see Preparing an AWS S3 Location.
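The decision described above can be sketched as a small helper that adds external S3 transfer options only for long-running jobs. This is a hypothetical illustration, not part of QDS: the function name, the threshold constant, and the error handling are assumptions, and the option names tempdir, awsAccessKey, and awsSecretKey are assumed to be the Snowflake Connector for Spark's external data transfer settings. Check the connector documentation for your version, and note that how these options reach the connector on Qubole may differ.

```python
# Hypothetical helper: decide whether a job should exchange data through an
# S3 location you prepared. Option names below are assumed to match the
# Snowflake Connector for Spark's external data transfer settings.

LONG_JOB_THRESHOLD_HOURS = 36  # jobs longer than this should use an S3 location


def transfer_options(expected_hours, s3_path=None, access_key=None, secret_key=None):
    """Return extra connector options for exchanging data through S3, if needed.

    Jobs at or under the threshold get {} and keep the connector's default
    (internal) transfer. Longer jobs must supply an S3 path and credentials.
    """
    if expected_hours <= LONG_JOB_THRESHOLD_HOURS:
        return {}
    if not (s3_path and access_key and secret_key):
        raise ValueError("jobs over 36 hours need an S3 path and AWS credentials")
    return {
        "tempdir": s3_path,          # the S3 bucket/path prepared for the exchange
        "awsAccessKey": access_key,  # credentials with access to that path
        "awsSecretKey": secret_key,
    }
```

A short job returns an empty dictionary, so existing connector options are left untouched; only jobs expected to exceed the threshold pick up the S3 settings.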
Adding Snowflake as a Data Store in the QDS UI
1. From the Home menu, click Explore.
2. In the dropdown list on the Explore page, select + Add Data Store.
3. Enter the required information in the following fields:
   - Data Store Name: Enter the name of the data store to be created.
   - Database Type: Select ‘Snowflake’.
   - Catalog Name: Enter the name of the Snowflake catalog.
   - Database Name: Enter the name of the database in Snowflake where the data is stored.
   - Warehouse Name: Enter the name of the Snowflake virtual warehouse to use for queries.
   - Host Address: Enter the base URL of your Snowflake account.
   - Username: Enter the login name for your Snowflake user (used to connect to the host).
   - Password: Enter the password for your Snowflake user (used to connect to the host).
   All values are case-sensitive except Host Address.
4. Click Save to create the data store.
Repeat these steps for each Snowflake database that you want to add as a data store. Alternatively, you can edit an existing data store to change its Snowflake database or any other property (e.g. the virtual warehouse used for queries).
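The form fields above correspond roughly to standard Snowflake Connector for Spark options (sfURL, sfUser, sfPassword, sfDatabase, sfWarehouse). The sketch below illustrates that correspondence under stated assumptions; it is not how QDS stores a data store internally, and the SnowflakeDataStore class and connector_options function are hypothetical. Catalog Name has no counterpart here. Because only Host Address is case-insensitive, the sketch normalizes the host and passes every other value through unchanged; building one dictionary per database mirrors creating one data store per database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnowflakeDataStore:
    """Hypothetical mirror of the fields in the QDS Add Data Store form."""
    name: str
    database: str
    warehouse: str
    host: str
    username: str
    password: str


def connector_options(store):
    """Map data store fields to Snowflake Connector for Spark options.

    Only the host is normalized (trimmed, lower-cased, scheme and trailing
    slash removed), since Host Address is the one value that is not
    case-sensitive. All other values are passed through as entered.
    """
    host = store.host.strip().lower()
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    return {
        "sfURL": host.rstrip("/"),
        "sfUser": store.username,        # case-sensitive
        "sfPassword": store.password,    # case-sensitive
        "sfDatabase": store.database,    # case-sensitive
        "sfWarehouse": store.warehouse,  # case-sensitive
    }
```

To cover several databases, create one SnowflakeDataStore per database and call connector_options on each.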
If you are using a Spark cluster that is already running, restart it after adding a Snowflake data store. Restarting the cluster installs the .jar files for the Snowflake Connector for Spark and the Snowflake JDBC Driver.
Verifying the Snowflake Data Store in Qubole
To verify that the Snowflake data store was created and activated, click the dropdown list in the upper-left corner of the Explore page. A green dot next to the data store indicates that it has been activated.
You should also verify that the table explorer widget in the left pane of the Explore page displays all of the tables in the Snowflake database specified in the data store.
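To run a similar check from a Spark notebook instead of the Explore page, you could query the database's INFORMATION_SCHEMA and compare the result with the table explorer. This is a sketch that assumes the connector .jar is installed on the cluster and that options holds connector options such as sfURL, sfUser, sfPassword, sfDatabase, and sfWarehouse. The data source name net.snowflake.spark.snowflake and the query option come from the Snowflake Connector for Spark; the function name list_snowflake_tables is hypothetical.

```python
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Lists user tables in the configured database, skipping Snowflake's
# built-in INFORMATION_SCHEMA views.
LIST_TABLES_SQL = (
    "SELECT table_schema, table_name "
    "FROM information_schema.tables "
    "WHERE table_schema <> 'INFORMATION_SCHEMA'"
)


def list_snowflake_tables(spark, options):
    """Return a DataFrame of the tables visible in the configured database.

    `spark` is an active SparkSession; `options` is a dict of connector
    options (sfURL, sfUser, sfPassword, sfDatabase, sfWarehouse, ...).
    """
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("query", LIST_TABLES_SQL)
        .load()
    )
```

Calling list_snowflake_tables(spark, options).show() in a notebook should print the same tables that appear in the table explorer widget.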
Enabling Query Pushdown in Qubole
By default, Snowflake query pushdown for Spark is disabled in Qubole. Query pushdown can significantly improve performance because it allows large and complex Spark logical plans to be processed in Snowflake.
To enable pushdown in a Spark session, invoke the static method SnowflakeConnectorUtils.enablePushdownSession(), passing it your SparkSession object. For example, in Python, instantiate a SparkSession object first, and then invoke the method through the JVM gateway of your SparkContext (sc).
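A minimal PySpark sketch of the call described above follows. It assumes the connector's utility class is net.snowflake.spark.snowflake.SnowflakeConnectorUtils (the package name may differ between connector versions) and that its counterpart disablePushdownSession is available; the wrapper function names are hypothetical. PySpark reaches JVM classes through the SparkContext's Py4J gateway (sc._jvm), and spark._jsparkSession is the JVM-side SparkSession that the Scala method expects.

```python
# Assumption: the utility class lives in net.snowflake.spark.snowflake
# (verify against the connector version installed on your cluster).


def _connector_utils(spark):
    # Py4J exposes JVM classes through the SparkContext's gateway (sc._jvm).
    sc = spark.sparkContext
    return sc._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils


def enable_pushdown(spark):
    """Enable Snowflake query pushdown for this SparkSession."""
    # Pass the JVM SparkSession, not the Python wrapper object.
    _connector_utils(spark).enablePushdownSession(spark._jsparkSession)


def disable_pushdown(spark):
    """Turn Snowflake query pushdown off again for this SparkSession."""
    _connector_utils(spark).disablePushdownSession(spark._jsparkSession)
```

In a Qubole notebook where spark is already defined, calling enable_pushdown(spark) once after the session is created applies to subsequent queries in that session.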
For more details about query pushdown, see Pushing Spark Query Processing to Snowflake (Snowflake Blog).