Today’s pick refers to a recent Databricks acquisition, that of 8080labs and their bamboolib product. You may be familiar with Python libraries like Pandas Profiling, which creates a profile of a tabular dataset: think of Azure Machine Learning's Dataset profile, but with more detailed information, good for getting a first feel of what’s in the data.
Bamboolib is also a Python library, with a community (free) tier and a paid tier, that provides an interactive UI for exploring pandas dataframes. Its capabilities include creating calculated columns, applying filters, charting, generating Python code from what you configure in the UI, and more. Parts of it remind me of the data reshaping capabilities in Power BI.
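To give a rough idea of the kind of pandas code that a calculated column plus a filter boils down to, here is a minimal sketch. It is plain pandas written by hand, not actual bamboolib output, and the sample data, column names, and threshold are all made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
df = pd.DataFrame(
    {
        "product": ["A", "B", "C", "D"],
        "units": [10, 3, 7, 12],
        "unit_price": [2.5, 10.0, 4.0, 1.5],
    }
)

# Calculated column: revenue per row.
df["revenue"] = df["units"] * df["unit_price"]

# Filter: keep only rows whose revenue is at least 25 (arbitrary threshold).
filtered = df.loc[df["revenue"] >= 25].reset_index(drop=True)

print(filtered)
```

The appeal of a UI that emits code like this is that the exploration stays reproducible: the clicks end up as ordinary pandas statements you can paste into a notebook or script.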
A short video on how it works is here: https://youtu.be/Qni8kX4hSOM , and the homepage is https://bamboolib.8080labs.com/ . I assume this will be fully integrated with Databricks/Spark DataFrames/Koalas soon, but in the meantime, for tabular data exploration, it’s worth checking out the free tier.
A final note about the Databricks acquisition of 8080labs, described here: https://databricks.com/blog/2021/10/06/bringing-lakehouse-to-the-citizen-data-scientist-announcing-the-acquisition-of-8080-labs.html. Databricks/Spark at heart does not target citizen data scientists, but this acquisition (much like Redash before it – now “Databricks SQL”) shows them clearly trying to go into that space.