From Zero to Hero with Databricks on Google Cloud (GCP)

Frank Munz
Google Cloud - Community
3 min read · Nov 22, 2021


Databricks has become the favorite platform of many data engineers, data scientists, and ML experts. It combines data, analytics, and AI. It's multi-cloud, and now you can also use it on GCP.

This article walks you through the main steps to become productive with Databricks on Google Cloud.

Databricks on GCP: From Zero to Hero

1. Get the Foundation Right — From Subscription to User Creation

To get started, let me link to a step-by-step video tutorial that covers everything: creating a subscription, the prerequisites, creating a Databricks workspace, adding users to the workspace, and running your first job.

Make sure to get this right, even if you are like me and never read the instructions for IKEA furniture. Getting the basics, e.g. the quotas, set correctly from the very beginning will save you trouble later.

Also, check the official documentation.
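If you prefer scripting over clicking, the user setup can also be automated. Here is a minimal sketch, assuming you use the Databricks SDK for Python; the workspace URL, token, and email address are placeholders, not values from the tutorial:

```python
# Minimal sketch using the Databricks SDK for Python (pip install databricks-sdk).
from databricks.sdk import WorkspaceClient

# Authenticate against your GCP-hosted workspace (placeholder host and token).
w = WorkspaceClient(
    host="https://1234567890123456.7.gcp.databricks.com",
    token="dapiXXXXXXXXXXXXXXXX",
)

# Add a user to the workspace, mirroring the "adding users" step from the tutorial.
w.users.create(user_name="new.colleague@example.com", display_name="New Colleague")

# Sanity check: list the user names now registered in the workspace.
for u in w.users.list():
    print(u.user_name)
```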

2. The Persona View

All of your Databricks assets are accessed using the sidebar. The sidebar's contents depend on the selected persona, e.g. Data Science & Engineering, or Machine Learning.

By default, the sidebar appears in a collapsed state and only the icons are visible. Move your cursor over the sidebar to expand it to the full view.

3. Explore the Quickstart Notebook

OK, so you sailed through the setup steps, but you are not a seasoned programmer and wouldn't know how to write code in a notebook? No worries, not everyone is a data engineer or data scientist.

From every Databricks workspace on GCP, you can start exploring a quickstart notebook. Quickstart notebooks are a great way to explore and run short snippets of easy-to-understand code. For aspiring data scientists, this is a great way to learn how to implement core functionality.
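To give you a feel for what such a snippet looks like, here is a minimal sketch in the style of a quickstart notebook cell (not the actual notebook content; the table name is made up for illustration):

```python
# A minimal sketch of the kind of cell you will find in a quickstart notebook.
# The `spark` session is predefined in every Databricks notebook.

# Create a tiny DataFrame with some sample data.
df = spark.createDataFrame(
    [("2021-11-01", 42), ("2021-11-02", 17), ("2021-11-03", 23)],
    ["date", "visitors"],
)

# Save it as a Delta table (the name is just an example) and query it back with SQL.
df.write.format("delta").mode("overwrite").saveAsTable("demo_visitors")
spark.sql("SELECT date, visitors FROM demo_visitors ORDER BY date").show()
```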

4. Notebook Gallery

The Databricks notebook gallery showcases some of the possibilities through notebooks that can easily be imported into your own Databricks environment.
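If you would rather script the import than use the UI, here is a minimal sketch, assuming you downloaded a gallery notebook as an .ipynb file and have the Databricks SDK for Python configured; the file name and target path are placeholders:

```python
# Minimal sketch: import a downloaded gallery notebook into your workspace.
import base64

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat

w = WorkspaceClient()  # picks up host/token from env vars or ~/.databrickscfg

# Read the downloaded notebook and base64-encode it, as the import API expects.
with open("gallery_notebook.ipynb", "rb") as f:
    payload = base64.b64encode(f.read()).decode("utf-8")

# Import it into your workspace so it shows up next to your own notebooks.
w.workspace.import_(
    path="/Users/you@example.com/gallery_notebook",
    content=payload,
    format=ImportFormat.JUPYTER,
    overwrite=True,
)
```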

5. Solution Accelerators

Solution accelerators are Databricks notebooks that tackle common, high-impact use cases. They are designed to help Databricks customers go from idea to PoC in less than two weeks. Check them out and discuss them with your solution architecture team, or watch the quick YouTube introduction.

6. Technical Resources You Should Know

There are many more technical articles that will help you get up to speed with Databricks on Google Cloud:

Please clap for this article if you enjoyed reading it. For more cloud-based data science, data engineering, and AI/ML, follow me on Twitter (or LinkedIn).

Big thanks to Silviu Tofan for supporting this article and Databricks on GCP.
Shoutout to Jon Tyson on Unsplash for the photo used in this article. Great shot!
