Create an Apache Spark external data source connection

This feature is experimental and not ready for production use. It is available only as part of an Early Access Program and may undergo breaking changes until general availability.

  1. Navigate to BigQuery SQL Workspace Explorer and click Add.

    Figure 1. Add new external data source
  2. Enter the details for your new data source connection and create it. Make sure to set the connection type to Apache Spark and choose a region close to your AuraDS or Neo4j GDS instance.

    Figure 2. Enter data source details
  3. The new connection is listed under External Connections in the BigQuery Explorer. Make a note of both the "Connection Name" (1) and the "Service Account ID" (2), as they are needed later.

    Figure 3. View created data source
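The same connection can also be created from the command line with the bq tool instead of the SQL Workspace UI. The following is a sketch only: PROJECT_ID, REGION, and CONNECTION_ID are placeholders to replace with your own values, and it assumes the Google Cloud SDK is installed and authenticated.

```shell
# Create an Apache Spark connection in the same region as your
# AuraDS or Neo4j GDS instance (placeholders in CAPS).
bq mk --connection \
  --project_id=PROJECT_ID \
  --location=REGION \
  --connection_type=SPARK \
  CONNECTION_ID

# Display the connection details, including the generated
# Service Account ID used in the role grants that follow.
bq show --connection PROJECT_ID.REGION.CONNECTION_ID
```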

Grant roles to Service Account ID

Next, grant the Service Account ID the permissions it needs to access the external data source created above.

Select Add principal, then add the following roles for the Service Account ID:

  1. Artifact Registry Reader

  2. BigQuery Data Viewer (if read-only) or BigQuery Data Editor (if writing back to BigQuery)

  3. BigQuery Read Session User

  4. Secret Manager Secret Accessor

  5. Storage Object Viewer

Additionally, grant the BigQuery Data Viewer or BigQuery Data Editor role on your BigQuery dataset to the same service account, through the Dataset Permissions page.
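The project-level role grants above can also be scripted with gcloud. This is a sketch only: PROJECT_ID and SERVICE_ACCOUNT_ID are placeholders for the values noted in the previous steps, and it assumes an authenticated Google Cloud SDK.

```shell
# Grant each required role to the connection's service account
# (placeholders in CAPS). Swap roles/bigquery.dataViewer for
# roles/bigquery.dataEditor if you write back to BigQuery.
for ROLE in \
    roles/artifactregistry.reader \
    roles/bigquery.dataViewer \
    roles/bigquery.readSessionUser \
    roles/secretmanager.secretAccessor \
    roles/storage.objectViewer
do
  gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_ID" \
    --role="$ROLE"
done
```

The dataset-level Data Viewer or Data Editor grant is still done through the Dataset Permissions page as described above.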

Figure 4. Dataset permissions