Docker image and Artifact Registry

This feature is experimental and not ready for production use. It is only available as part of an Early Access Program and may undergo breaking changes until general availability.

Graph Data Science for BigQuery is based on Stored Procedures for Apache Spark and requires a container image to be published to Artifact Registry in the same Google Cloud project in which your stored procedure will run. For this reason, a Docker image must be pulled or built and then pushed to your Artifact Registry repository.

Configure Artifact Registry

Create a new standard repository in Artifact Registry to push the Docker images to. The new repository must be of Docker format. Select the rest of the properties based on your environment, but ideally the new repository’s region should be close to where your BigQuery resources will reside.
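If you prefer the command line over the Cloud Console, a Docker-format repository can also be created with the gcloud CLI; the placeholders below stand in for your own region, project, and repository names.

Create the repository with gcloud (example)
gcloud artifacts repositories create <repository-name> --repository-format=docker --location=<region> --project=<gcp-project-id>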

Once the new repository is created, follow the Setup Instructions on the repository details page and have your local Docker tooling ready to push images.

gcloud auth login

gcloud auth configure-docker <region>-docker.pkg.dev/<gcp-project-id>/<repository-name>

The new repository’s path will be similar to <region>-docker.pkg.dev/<gcp-project-id>/<repository-name>, and these placeholders are used throughout the rest of this documentation.
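For example, a repository named neo4j in the europe-west1 region of a project called my-gcp-project (hypothetical names, substitute your own) would resolve the placeholders to the following path.

Example repository path
europe-west1-docker.pkg.dev/my-gcp-project/neo4j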

Getting the Docker image

The source code for this connector resides in this GitHub repository. To get the Docker image ready, you can either pull one of the pre-built images or build the image from source in your own environment.

Stored Procedures for Apache Spark spawn a serverless Spark environment on the linux/amd64 platform. Your Docker images must be built or pulled specifically for this platform.
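Once you have pulled or built an image locally (as described in the following sections), one way to double-check that it targets the right platform is to inspect its OS and architecture with Docker; the image reference below uses the same placeholders as the rest of this page, and the output should read linux/amd64.

Verify the image platform
docker image inspect --format '{{.Os}}/{{.Architecture}}' <region>-docker.pkg.dev/<gcp-project-id>/<repository-name>/neo4j-bigquery-connector:<version>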

Pulling pre-built images

To make onboarding easier, we publish a pre-built Docker image that can be pulled from either of the following locations.

Pull from Docker Hub
docker image pull --platform linux/amd64 neo4j/bigquery-connector:<version>
docker image tag neo4j/bigquery-connector:<version> <region>-docker.pkg.dev/<gcp-project-id>/<repository-name>/neo4j-bigquery-connector:<version>
Pull from GitHub Packages
docker image pull --platform linux/amd64 ghcr.io/neo4j-field/bigquery-connector:<version>
docker image tag ghcr.io/neo4j-field/bigquery-connector:<version> <region>-docker.pkg.dev/<gcp-project-id>/<repository-name>/neo4j-bigquery-connector:<version>

Building it yourself

If you prefer to build the image from source in your own environment, execute the following commands in order.

git clone -b <version> https://github.com/neo4j-field/bigquery-connector
cd bigquery-connector
docker build --tag <region>-docker.pkg.dev/<gcp-project-id>/<repository-name>/neo4j-bigquery-connector:<version> --platform linux/amd64 .

Pushing the Docker image

Now that the required Docker image is ready in your local environment, you can push it to Artifact Registry.

Push Docker image
docker image push <region>-docker.pkg.dev/<gcp-project-id>/<repository-name>/neo4j-bigquery-connector:<version>
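To confirm that the push succeeded, one option is to list the images in the repository with the gcloud CLI, using the same placeholders as above.

List images in the repository
gcloud artifacts docker images list <region>-docker.pkg.dev/<gcp-project-id>/<repository-name>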