Monday, December 3, 2018

Databricks/Spark Hands-on lab

This past week I organized a 4-day technical training for internal teams called LearnAI. I had some time to spare in the agenda on one of the days, so I hacked together a simple challenge using Azure Databricks/Spark and Python.

Being a fan of Astronomy, I based this on a personal pet project of mine: exploring ESA's Gaia satellite data using Spark. A few months back I completed Coursera's Data-driven Astronomy course, and felt this would be an amazing way of exploring big-data challenges while also learning some Astronomy along the way.

Anyway, I have put the resulting Notebooks and a Word document with setup instructions on GitHub. Each exercise is a Notebook where you fill in the missing Python code, and I've also included a solutions notebook. The exercises are mostly introductory and should take at most 3 hours to complete. You'll also need access to an Azure subscription, as I'm using Azure Blob Storage to store the data.
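To give a flavour of the setup, here is a minimal sketch of how a notebook might point Spark at data in Azure Blob Storage. It is not taken from the lab itself: the storage account name, container name and file path are hypothetical, and the column names in the comments assume the public Gaia source catalogue. The helper functions only build strings, so they run anywhere; the Spark calls are left as comments because they need a Databricks cluster and a real storage key.

```python
def blob_url(account: str, container: str, path: str) -> str:
    """Build a wasbs:// URL for a path inside an Azure Blob Storage container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account: str) -> str:
    """Name of the Spark config entry that holds the storage account key."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


# Hypothetical names, for illustration only.
ACCOUNT = "mystorageaccount"
GAIA_URL = blob_url(ACCOUNT, "gaia", "/source/*.csv")
print(GAIA_URL)

# In a Databricks notebook, where `spark` is already defined, the next steps
# would look roughly like this (assumed, not the lab's actual code):
#
#   spark.conf.set(account_key_conf(ACCOUNT), "<storage-account-key>")
#   df = spark.read.csv(GAIA_URL, header=True, inferSchema=True)
#   df.select("ra", "dec", "parallax").show(5)
```

In the lab the actual connection details come from the setup instructions, so treat the names above as placeholders.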

PS: it feels good to code once in a while ;-)
