Wednesday, December 8, 2021

Jota’s ML Advent Calendar – 08/December

 The topic – or reflection – of the day is MLOps, a term whose popularity probably comes from the fact that it's a natural step when setting up enterprise ML/DS solutions. Unlike DevOps, which may be considered a solved problem, MLOps sits at the intersection of three different skill sets, and that makes implementations longer and more complex:

  • Data science – not only ML/DS knowledge, but, on platforms like Azure Machine Learning and others, also the ability to set up Machine Learning pipelines that go beyond "a set of consecutive calls of Python functions", plus packages like MLflow for model and metadata tracking.
  • DevOps – I'm thinking here of the ability to set up tools like GitHub or Azure DevOps, from doing source control (commits to merges – a nightmare with Jupyter notebooks, by the way) to building multi-stage train/build/deploy pipelines across the different environments. As there are specificities to doing this for data science, whoever does this must also have some of the previous skills – it's not just about building and deploying code, but also about training/deploying/monitoring models.
  • Infrastructure and Security – deployments of the environments must be protected against things like data exfiltration, and network-level security, firewalls, access control, etc. have to be in place. Skills in Infrastructure/Security for DevOps deployments do exist – e.g., setting up private build agents inside virtual networks – but add the specificities of securing Machine Learning services, and they become much rarer.

In my view, part of the complexity of MLOps deployments comes from the fact that these are different skill sets – people speaking different languages, with limited knowledge of each other's worlds. One more thing that makes it complex is that data/code/models are "source controlled" in different repositories: data lives in data lakes or databases, models go to model repos, code goes to source control. But a model can be trained and its metrics stored along with it while the code is still sitting in a Jupyter notebook, not checked in and with no connection to the model – and data is rarely versioned at all. There goes reproducibility.
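As a minimal illustration of what reconnecting the three would take, here's a sketch that ties a model to the exact data and code it was trained on – a stand-in for what MLflow and proper dataset versioning give you; all names here are invented:

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone

def provenance_record(data_path: str, model_name: str, metrics: dict) -> dict:
    """Link one trained model to the data file and code commit behind it."""
    with open(data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()   # a "version" of the data
    try:
        commit = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True).stdout.strip()
    except OSError:
        commit = ""
    return {
        "model": model_name,
        "data_sha256": data_hash,
        "git_commit": commit or "not-checked-in",   # the Jupyter problem, made visible
        "metrics": metrics,
        "trained_at": datetime.now(timezone.utc).isoformat(),
    }

# Store this JSON next to the model artifact, e.g.:
# json.dumps(provenance_record("train.csv", "churn-v3", {"auc": 0.91}))
```

If a record like this travels with the model, anyone can later check whether the data file still hashes to the same value – a poor man's reproducibility check.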

There may be approaches in the market that already solve this with a 5-click setup, but in my view we're not yet in "simple and quick" territory.

To close, I could mention the set of best-of-breed complementary and interconnected services for the above, from GitHub to Azure Machine Learning and Functions or Kubernetes, and all the security mechanisms of an Enterprise Cloud. :-) But I'll leave you instead with an excellent post on MLOps from the point of view of a Data Scientist, written by a colleague, Maggie Mhanna.

Tuesday, December 7, 2021

Jota’s ML Advent Calendar – 07/December

Today's note is about data, the base of the pyramid on top of which [most] AI/ML sits. I'll start by saying I'm not a fan of the "data is the new oil" expression, which leaves a lot unsaid (try searching for 'data is not the new oil' to find many different reasons), but the topic is actually inspired by a post by Rachel Thomas that covers several perspectives I also hold – on having context on how the data was collected and what it means, on data work being undervalued, and on the need to consider how people are impacted. A good read, and it led me to two additional thoughts.

One – in my limited experience in "solving" tabular machine learning problems, it has always paid off to invest time in data exploration/feature generation. Using AutoML or ensembles of models is probably what gets you over the edge to the best possible model, but understanding the data and creating new features that 'help' the models do their work is more likely to get you a real bump in quality. This is painstaking work, though, like a crime mystery to be cracked – and for some mysteries to be solved at all you'll have to interview the witnesses again, i.e., collect more data. Many people are also familiar with research like Anaconda's (reported by Datanami), according to which 45% of data science time goes into data preparation tasks, versus 21% on model selection/training/scoring. I won't be surprised to see this last number go down with further improvements in AutoML.
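To make "creating new features that help the models" concrete, here's a tiny hand-rolled example – the column names and the two derived features are invented for illustration:

```python
# Toy loan-application rows -- invented data and feature names.
rows = [
    {"income": 4000, "debt": 1000, "n_late_payments": 0},
    {"income": 2500, "debt": 2400, "n_late_payments": 3},
]

def add_features(row: dict) -> dict:
    """Derive features a model would otherwise have to learn on its own."""
    out = dict(row)
    out["debt_to_income"] = row["debt"] / row["income"]   # ratio the raw columns only imply
    out["ever_late"] = int(row["n_late_payments"] > 0)    # binary flag
    return out

featured = [add_features(r) for r in rows]
```

Trivial to write – but knowing *which* ratios and flags carry signal is exactly the painstaking, domain-driven part.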

Second – Rachel's post includes this quote from Professor Berk Ustun: "when someone asks why their loan was denied, usually what they want is not just an explanation but to know what they could change in order to get a loan." This again made me think of the Responsible AI Toolbox and EconML, which allow you to identify the minimum change that leads to a different model prediction. It's almost as if some ML problems are now easier to solve, but what happens before and after the models are trained (data collection/preparation, and exploring/understanding the predictions) takes ever more time.
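Tools like the Responsible AI Toolbox do this properly; the underlying question – "what is the smallest change that flips the decision?" – can be sketched with a brute-force search over a toy scoring rule (both the rule and the candidate values below are invented):

```python
from itertools import product

def predict(applicant: dict) -> str:
    """Stand-in scoring rule -- not a real credit model."""
    score = 0.5 * (applicant["income"] / 1000) - 0.8 * applicant["n_late_payments"]
    return "approved" if score >= 2.0 else "denied"

def minimal_counterfactual(applicant: dict, grid: dict):
    """Find the change touching the fewest features that flips the prediction."""
    best_changed, best = None, None
    keys = list(grid)
    for values in product(*grid.values()):
        candidate = dict(applicant, **dict(zip(keys, values)))
        if predict(candidate) == "approved":
            n_changed = sum(candidate[k] != applicant[k] for k in keys)
            if best is None or n_changed < best_changed:
                best_changed, best = n_changed, candidate
    return best

applicant = {"income": 3000, "n_late_payments": 2}
advice = minimal_counterfactual(applicant,
                                {"income": [3000, 5000, 8000],
                                 "n_late_payments": [0, 1, 2]})
# -> here, raising income to 8000 alone is enough to flip the decision
```

Real counterfactual methods search continuous spaces with proper distance metrics and feasibility constraints, but the shape of the answer – "change X and you'd be approved" – is the same.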

PS: something new on the Responsible AI Toolbox just came out today.

Monday, December 6, 2021

Jota’s ML Advent Calendar – 06/December

 My highlight for the day is a publication that came out a few weeks ago, the 2021 State of AI Report, an attempt at compiling the most interesting developments in AI/ML of the year in (mostly) not too much technical detail. It still runs to 188 slides, so there's a lot to see. A couple of the topics that especially interest me are the uses in biology, the impact of transformer models not only on NLP but also on Computer Vision, and the discussion of the trend towards larger and larger language models. AI chips also make an appearance, though – possibly because of the global chip shortage – with less prominence than usual.

As a somewhat related note, something I found missing was any mention of either Causal Reasoning (think Judea Pearl's "The Book of Why") or hybrid approaches to AI – e.g. the "Symbolic + Connectionist" approach (think Gary Marcus and Ernest Davis' "Rebooting AI", as a popular recent reference). Maybe we're just still amazed by the achievements of modern large language models.

On the above topic: on December 23rd, if the time and timezones work for you, it could be a good idea to register for the free "AI Debate #3", co-organized by MONTREAL.AI and Gary Marcus and with a great list of speakers.

As a very final note – when I was in Uni and briefly studied AI, Neural Networks were just a theoretical concept; I had to hand-draw a SNePS network to represent a domain of knowledge, and code A* in LISP for a checkers game. How far things have come.

Saturday, December 4, 2021

Jota’s ML Advent Calendar – 04/December

 Today's post is a short note on adversarial attacks in Computer Vision, prompted specifically by a new dataset and ArXiv article from researchers at Scale AI, the Allen Institute for AI, and ML Collective: "Natural Adversarial Objects" [2111.04204v1]. The word "Natural" comes from the fact that these 7,934 photos were not artificially/intentionally created to cause detection problems, but were selected because they are mislabeled by 7 object detection models (including YOLOv3). And "Objects" in the title refers to the fact that the analysis focuses on object detection scenarios – not image classification.

The authors then measured mean average precision (mAP) on this dataset versus the MSCOCO dataset, and the difference in performance is huge (e.g., 74.5% worse)!

The NAO dataset is available here, for anyone interested: Natural Adversarial Objects - Google Drive.

Continuing on to a related topic, what the above also shows is that when evaluating trained models it isn't enough to see how good a certain metric is (mAP/F1/AUC/...); it's also important to look at the distribution of the errors. And to look at this (surprise!) Microsoft actually has a Python package – the Error Analysis component of the Responsible AI Widgets – including capabilities for visual analysis/exploration of the errors, with more information and sample notebooks available online.
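The idea behind error analysis – slicing errors by cohort rather than staring at one aggregate number – fits in a few lines; the cohort names and counts below are invented:

```python
from collections import defaultdict

# (cohort, prediction_was_correct) pairs for a hypothetical detector.
results = [
    ("daylight", True), ("daylight", True), ("daylight", True), ("daylight", False),
    ("night", True), ("night", False), ("night", False), ("night", False),
]

counts = defaultdict(lambda: [0, 0])          # cohort -> [wrong, total]
for cohort, correct in results:
    counts[cohort][1] += 1
    counts[cohort][0] += not correct          # True/False counts as 1/0

error_rate = {c: wrong / total for c, (wrong, total) in counts.items()}
overall = sum(w for w, _ in counts.values()) / len(results)
# overall error is 0.50, but it splits into 0.25 (daylight) vs 0.75 (night)
```

Two models with the same overall metric can hide very different failure modes – which is exactly what the Error Analysis tooling makes visible over many dimensions at once.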

Friday, December 3, 2021

Jota’s ML Advent Calendar – 03/December

Today's post is about Model Explainability: techniques that try to understand why models – often blackbox/neural networks – give certain results, and which features matter most.

There are two reasons I post this: one, we have first-class/leading approaches in use that were either created or are owned by people currently at Microsoft; and two, not many people actually know this.

As one example, check out an article from June this year looking at 7 packages for Explainability in Python. It features at #1 SHAP, probably the most popular of all, but also at #2 LIME, and also EBM/InterpretML, the latter described as "an open source package from Microsoft". What it doesn't say, however, is that the creators of both SHAP (Scott Lundberg) and LIME (Marco Tulio Ribeiro) are researchers at MSR. So out of a list of 7 packages, 3 were either created at Microsoft or are owned by current employees. And if we add Fairlearn in a related area, and the recent developments on top of SHAP for Explainability in NLP and Vision, we are (in my view) unbeatable in this area.
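For intuition on what SHAP computes, here's an exact Shapley calculation that enumerates every feature coalition – feasible only at toy scale (SHAP's contribution is approximating this efficiently for real models); the two-feature model below is invented:

```python
from itertools import combinations
from math import factorial

def shapley_values(features: dict, value_fn) -> dict:
    """Exact Shapley values over all coalitions (exponential -- toy scale only)."""
    names = list(features)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for r in range(n):
            for coalition in combinations(others, r):
                # Classic Shapley weight for a coalition of size r
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                with_i = value_fn({f: features[f] for f in (*coalition, i)})
                without = value_fn({f: features[f] for f in coalition})
                total += weight * (with_i - without)
        phi[i] = total
    return phi

# Additive model where absent features contribute 0; each phi recovers its own term.
model = lambda present: 2 * present.get("x", 0) + 3 * present.get("y", 0)
phi = shapley_values({"x": 1, "y": 2}, model)   # phi["x"] == 2, phi["y"] == 6
```

The attributions sum to the model's output (8 here) – the additivity property that makes Shapley-based explanations so appealing.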

Because EBMs are probably the least well known of these, it's worth reading an article focusing just on them – and what I'd suggest is this: next time you are training a model with XGBoost or LightGBM, also try EBMs. You may be surprised by how good they are.

Thursday, December 2, 2021

Jota’s ML Advent Calendar – 02/December

 Today's pick refers to a recent Databricks acquisition: 8080labs and their bamboolib product. You may be familiar with Python libraries like Pandas Profiling, which creates a profile of a tabular dataset – think Azure Machine Learning's Dataset profile, but with more sophisticated information – good for a first feel of what's in the data.

Bamboolib is also a Python library, with a community (free) and a paid tier, that provides an interactive UI for exploring pandas dataframes, including capabilities like creating calculated columns, applying filters, generating Python code from UI configurations, charting, and more. Parts of it remind me of the data reshaping capabilities in Power BI.

There's a short video online showing how it works. I assume bamboolib will be fully integrated with Databricks/Spark DataFrames/Koalas soon, but in the meantime, for tabular data exploration, the free tier is worth checking out.

A final note about the Databricks acquisition of 8080labs: Databricks/Spark at heart does not target citizen data scientists, but this acquisition (much like Redash before it – now "Databricks SQL") shows them clearly trying to move into that space.

Wednesday, December 1, 2021

Jota's ML Advent Calendar - 01/December

 Hopping on the idea of the "advent calendar" that is very typical in Germany (at least in Bavaria), during this month I'll share every day one link/piece of info I've recently read in the field of AI/ML and found interesting.

For a start, my suggestion of the day is FLAML. This is an AutoML Python library available on GitHub, created by people at MSR and based on research published at the end of last year. Why another one, especially considering we already have AutoML on Azure Machine Learning? A few reasons I like it:

  • It’s fully Python based with a super-simple API. You have control of all the parameters in your experiments in code that you can source control.
  • In my experiments (for tabular data + classification/regression), I got consistently good results.
  • You can set a training budget – for how long you want it to train.
  • You can pick the algorithms you want to use in the training (the most common being LightGBM, XGBoost and CatBoost) – and if you pick only one, you're effectively doing hyper-parameter tuning.
  • It supports sklearn pipelines (i.e., you can, for example, do AutoML as the final step of a training pipeline).
  • You can do an optimization run based on a previous run, to further optimize results you've already obtained.
  • It has support for ensembles/stacks, where a set of models is trained to make a first prediction, and a final estimator then builds on the outputs of the previous predictors to make the final prediction.
  • And obviously, it runs in AML (albeit without benefiting from clusters/parallelization).
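The training-budget idea from the list above can be sketched independently of FLAML – below is a plain random search that stops when the budget is spent (FLAML's actual search strategy is cost-frugal, not random; the objective and search space here are invented):

```python
import random
import time

def tune_with_budget(objective, space: dict, budget_s: float, seed: int = 0):
    """Try random configurations until the time budget runs out; keep the best."""
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_s
    best_cfg, best_loss = None, float("inf")
    while time.monotonic() < deadline:
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        loss = objective(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Toy objective with its minimum at n_estimators=200, learning_rate=0.1.
objective = lambda c: abs(c["n_estimators"] - 200) / 100 + abs(c["learning_rate"] - 0.1)
best, loss = tune_with_budget(objective,
                              {"n_estimators": [50, 100, 200],
                               "learning_rate": [0.01, 0.1, 0.3]},
                              budget_s=0.1)
```

The appeal of a time budget over a fixed trial count is that "spend 30 minutes, give me the best you found" maps directly onto how compute is actually paid for.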

Two other relevant links I'd also include are Optuna, a library specifically for hyperparameter tuning (similar to Hyperopt), and LightAutoML – both widely used in Kaggle competitions.


Monday, October 5, 2020

Adding data to Azure Synapse table from Azure Databricks

Recently I put together a prototype that uses Python code in Azure Databricks to clean up data and then efficiently insert it into Azure Synapse Analytics (previously known as Azure Data Warehouse) tables. This post documents the relevant context and then the technical detail on how to do it. There is plenty of published documentation on this integration, but I wanted to also introduce the two main mechanisms for it (PolyBase and the COPY statement), and specifically do appends of rows, instead of rewriting Synapse tables, which is what you find in most links.

Relevant links

  • Use virtual network service endpoints and rules for servers in Azure SQL Database (Microsoft docs)
    • This page covers the case where your data lake (Azure Storage) has access restricted to a VNet, and the section "Azure Synapse PolyBase and COPY statement" also gives relevant context: "PolyBase and the COPY statement is commonly used to load data into Azure Synapse Analytics from Azure Storage accounts for high throughput data ingestion. If the Azure Storage account that you are loading data from limits access only to a set of VNet-subnets, connectivity when using PolyBase and the COPY statement to the storage account will break". Further down, the example goes into creating External Tables – something you don't actually need when doing this from Databricks with the COPY statement.
  • Azure Synapse Analytics (Databricks documentation)
    • This is perhaps the most complete page in terms of explaining how this works, but also the most complex. Again it refers to PolyBase and the COPY statement, and includes code, but the provided code creates a new table instead of adding to an existing one. If you use it to write to an existing table, it will actually overwrite the table and its schema.
  • Loading from Azure Data Lake Store Gen 2 into Azure Synapse Analytics (Azure SQL DW) via Azure Databricks (medium post)
    • A good post, simpler to understand than the Databricks one, and including info on how to use OAuth 2.0 with Azure Storage instead of the Storage Key. Again, the code overwrites data/rewrites existing Synapse tables.
  • Tutorial: Extract, transform, and load data by using Azure Databricks (Microsoft docs)
    • Finally, this is a step-by-step tutorial on the end-to-end process. It uses Scala instead of Python, and again overwrites the destination tables. I followed the steps from here, specifically the section "Load data into Azure Synapse", and then made some modifications.

PolyBase and the COPY statement

The two fastest ways to insert data into Synapse tables are PolyBase and the newer COPY statement. PolyBase is the older technology, and COPY is its intended, simpler replacement: COPY requires fewer permissions and is simpler to use – with PolyBase you may need to create Synapse External Tables, for example, which you don't need with COPY.

The Databricks documentation page above includes an overview of the differences:

«In addition to PolyBase, the Azure Synapse connector supports the COPY statement. The COPY statement offers a more convenient way of loading data into Azure Synapse without the need to create an external table, requires fewer permissions to load data, and provides an improved performance for high-throughput data ingestion into Azure Synapse.» 

And right after it, it includes the code used to control whether to use PolyBase or COPY:

spark.conf.set("spark.databricks.sqldw.writeSemantics", "<write-semantics>")

Here you can use "polybase" or "copy" as the value to control how the data is written into Synapse. If you leave the line out, as I did in my code below, the default when using Azure Storage Gen2 + Databricks Runtime > 7.0 is "copy".

Finally, I specifically wanted to do the full ETL pipeline from Databricks, but the data transfer could also be orchestrated from Azure Data Factory – the documentation goes into detail on how to do that, using either of the above alternatives.

Code details

The example code I found typically overwrote any existing table instead of adding records to it, plus most of the examples use PolyBase + External Data Tables. In my scenario, I have the tables already created in Synapse and the Spark data frames filled with processed data, and just need to insert the rows into the destination tables. So what are the steps?

First, I used the code in the tutorial link above to set up the auxiliary variables, just converted from Scala to Python. This could be much more compact, but I kept the original format for clarity:

Setup code

And then simply insert the data in my data frame into the target table in Azure Synapse:
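A sketch of that write call in Python, with option names as in the Databricks Synapse connector documentation (the url/table/tempDir values are placeholders you fill in with your own):

```python
def append_to_synapse(df, jdbc_url: str, table: str, temp_dir: str) -> None:
    """Append a Spark DataFrame to an existing Synapse table (sketch)."""
    (df.write
       .format("com.databricks.spark.sqldw")
       .option("url", jdbc_url)                                # Synapse JDBC connection string
       .option("tempDir", temp_dir)                            # staging area in Azure Storage Gen2
       .option("forwardSparkAzureStorageCredentials", "true")  # reuse the storage key set in spark.conf
       .option("dbTable", table)                               # existing destination table
       .mode("append")                                         # append adds rows; "overwrite" recreates the table
       .save())
```

This is meant to run inside a Databricks notebook where `df` is an already-populated Spark DataFrame.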

In the call above, the first parameter of every option() is a keyword, so you really just need to change the respective database connection string (the "url" value) and the name of the table (the "dbTable" value). Also note that the connection string is not actually used to send the data to Synapse, but to tell Synapse to go fetch the data from Azure Storage Gen2.

And that's it. When you run this you get a message from Spark saying "Waiting for Azure Synapse Analytics to load intermediate data from wasbs://..... into "my-table-in-db" using COPY", and the data shows up in the table. The code above does not require any External Tables, and by using append you add to the table, while overwrite creates a new table from scratch (including overwriting the schema with that of the DataFrame).

One caveat to the above – it's your responsibility to make sure the types you use in your Spark data frame match those in your Azure Synapse table definition. As an example, I had some columns as IntegerType() in my Spark data frame, which you can't insert into TINYINT/SMALLINT columns, so the append failed.

One final note – while some parts of the documentation refer to COPY as being in preview, it actually came out of preview on September 23rd.

Thursday, June 25, 2020

Using the VisionAI DevKit to capture photos to Azure regularly


I've been playing lately with several IoT devices, from the Raspberry Pi to the Jetson Nano, and one (very simple) pet project I wanted to set up was to have one of these devices capture photos and upload them to an Azure Storage Account, to then do either time lapses or simple home security monitoring.
The issue with these small devices is that they're fiddly – they need boxes, power, correct positioning, can be fragile, etc. – so I decided to use my Vision AI DevKit for this.

The camera is sturdy and has a wide-angle lens, the position is adjustable and the base is heavy/stable; it has built-in Wifi, runs Azure IoT Edge, and can be powered over USB-C. It also has built-in ML capabilities (it can run neural network models on a Qualcomm Snapdragon chip pretty fast), but I don't actually want to run any ML, just upload photos regularly. It does use more power than my Raspberry Pi Zero with a camera – that's the main downside.

For my time-lapse use case I need photos at regular intervals, while for the security one I want photos uploaded as fast as they are taken (for which I assume both power and Wifi are on). For this reason I decided not to do any local processing – just upload the photos ASAP and process them in Azure later. I'd save bandwidth by doing processing on the camera, but that's not really an issue.

Starting point

I started with one of the Community Projects available for the camera, the Intelligent Alarm by Microsoft colleague Marek Lani. His project is entirely on GitHub, and he has a more complex setup than what I need – he's doing object recognition on the edge as a trigger for a photo upload, which I don't want to do. He has a separate repo on GitHub for the image capture part of his project. The reason this is relevant is that he captures images using ffmpeg over the camera's built-in RTSP video feed (calling it from NodeJS), instead of using the SDK's capabilities to take photos. Doing the latter can mess up local ML processing and require a reboot of the camera. So my work was simplified to adapting his code for my scenario.

Code changes

- Modify the capture code

The first thing I did was look at Marek's app.js file. His code captures a photo whenever a message is received from IoT Hub (more precisely, from another container/module running on the camera). I commented out all of this block and replaced it with a call to the function that invokes ffmpeg to capture the photo, TakeAndUploadCaptureFromStream(). In more detail, I commented out the pipeMessage function and the whole IoT Hub block starting with Client.fromEnvironment.

The second thing was to find a way to call this code regularly. The classical solution is cron, and that's what I used, following some hints from a thread on Stack Overflow. So here are the steps:

- Created a cronjobs file with content:

* * * * * node /app/app.js > /dev/stdout

This setup means the command is called once per minute. The redirect is actually not working – I want it to redirect the output to the docker log, but I get an error when I use "> /proc/1/fd/1 2>/proc/1/fd/2". Something to come back to.

- Modified the Dockerfile.arm32v7 to contain:

FROM arm32v7/node:12-alpine3.11


RUN apk add --no-cache ffmpeg

COPY package*.json ./

RUN npm install --production

COPY app.js ./

# copy crontabs for root user
COPY cronjobs /etc/crontabs/root

USER root

# start crond with log level 8 in foreground, output to stderr
CMD ["crond", "-f", "-d", "8"]

The changes were:
  • changed the base image to one with alpine (which contains cron)
  • use apk to install ffmpeg instead of apt-get
  • changed the startup command to run crond.
And that was it. I already had a provisioned IoT Hub and the camera registered as an IoT Edge device, as well as an Azure Container Registry to host my container images, and a Storage Account to drop the photos in, so I just had to:
  1. Build the container (docker build)
  2. Tag it with the ACR URL (docker tag)
  3. Push to my ACR (docker push)
  4. Add a module to the edge device configuration (Azure Portal > IoT Hub > my IoT Edge device > Set Modules), remembering to specify the required environment variables: RTSP_IP, RTSP_PORT, RTSP_PATH, STORAGE_CONTAINER and AZURE_STORAGE_CONNECTION_STRING.
After giving a few minutes for the IoT Edge runtime to download and get the container running, my Azure Storage Account now shows the incoming photos:

And this is what is running on the device:

Which matches the configuration on Azure IoT Edge:

Next Steps

After this base setup, my next step is to trigger the execution of an Azure Function or Azure Logic App on a schedule to compare the last two images to check for deltas, or to check if there's a missing photo (indicating camera is possibly off) and then triggering an email alarm. I already have some code to do image processing on an Azure Function (GitHub repo here), which will help.

Hope this helps anyone, and thanks to Marek Lani for his work on the Intelligent Alarm sample.


Turns out I had to iron out a couple of glitches over the last few days.

The first was this: after 6-7 hours of image capturing, the AIVisionDevKitGetStartedModule module would stop working – the container would die, and restarting it didn't change the situation. Because the capture module depends on the RTSP stream this module exposes, it would also stop. The problem turned out to be disk space – something was filling up the /tmp folder with core* files. My first thought was to again use a cronjob, but cron is read-only on the device, so I went the manual way:

- created a shell script in folder /run/user/0, with this content:


#!/bin/sh
while [ : ]
do
    rm -f /var/volatile/tmp/core*
    sleep 5m
done

This simply deletes the core* files every 5 minutes. I then did a chmod +x on the file and ran it in the background with ./ & . This is not perfect – I'll have to re-run it every time the device reboots.

The second change I made was to organize the files into folders and use the date (in format yyyymmdd-hhmmss) in their names. The changes were in app.js and included:

- in function TakeAndUploadCaptureFromStream(), use the following for the first few lines:

function TakeAndUploadCaptureFromStream() {
  var rightnow = new Date();
  var folder = rightnow.toISOString().replace('-', '').replace('-', '').replace('T', '-').replace(':', '').replace(':', '').split('.')[0]; // returns e.g. 20200627-112802

  // -rtsp_transport tcp parameter needed to obtain all the packets properly
  var fileName = `${folder}.jpg`;

and in uploadImageToBlob(), modify this block to calculate the right folder name and use it:

if (!error) {
      var justdate = fileName.split('-')[0]; // returns e.g. 20200627 from 20200627-112802.jpg
      blobService.createBlockBlobFromLocalFile(storageContainer, justdate + "/" + fileName, fileName, function(error, result, response) {

With these changes, I now have a new folder per day, and the files have names like 20200627-120401.jpg – much simpler to read and understand.
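For reference, the same naming scheme is a couple of lines in Python with strftime (the JavaScript above builds it by stripping an ISO timestamp):

```python
from datetime import datetime

shot_time = datetime(2020, 6, 27, 12, 4, 1)                 # when the photo was taken
file_name = shot_time.strftime("%Y%m%d-%H%M%S") + ".jpg"    # 20200627-120401.jpg
folder = file_name.split("-")[0]                            # 20200627 -> one folder per day
blob_path = folder + "/" + file_name                        # blob name inside the container
```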

Monday, June 8, 2020

Setting up and configuring the Nvidia Jetson Nano

I recently purchased an Nvidia Jetson Nano to experiment more with Edge ML and Azure IoT Edge, in particular for image/video processing, and in this post I'll describe my setup steps. There are some excellent tutorials on how to do this, but I still wanted to put this up – a) in case someone else bumps into the same issues I did, and b) in case I want to repeat this setup in the future.

Packing list
Here's what I bought:
  1. NVIDIA Jetson Nano Developer Kit B01 (latest version of the Nano)
  2. GST25E05-P1J Mean Well power supply, 5V DC 4.0A with barrel connector / 5.5x2.1mm (to power the Nano, as is widely recommended)
  3. RPI-2CAM Raspberry Pi Camera Module 8MP v2.1
  4. Intel Dual Band Wireless-AC 8265 M.2 network adapter (also has Bluetooth)
  5. SD Card: Samsung microSDXC UHS-I Card / Evo Plus 64GB
I quickly realized I should have bought an IPEX MHF4 antenna – the wireless adapter sits under the heat sink and has poor Wifi connectivity without one. I'll buy this together with a fan, and perhaps an enclosure/box.

Hardware setup
  1. Install the Wifi module as per the instructions here: 
  2. Write the OS Image to the SD Card with Balena Etcher, as described here:
    • Note: I skipped the "SD Card Formatter" part, my card didn't require this
  3. Insert the SD Card in the Jetson Nano with the contacts facing up, as shown here:
After doing this base setup I connected the Jetson Nano to a monitor over HDMI, plus USB keyboard and mouse, and plugged in the charger, only to find it didn't turn on. I had to use the micro-USB to power it, which proved unstable – the Jetson would just turn off after a few minutes. I'm not sure why, as I wasn't doing any compute, but I speculate it may be related to the monitor's high resolution.

I eventually found out there's a jumper that configures the device to use either the micro-USB or the external charger for power. This is jumper J48, which on the B01 sits behind the HDMI output and the barrel jack for the external charger.

As it turns out, there's a lot of information out there on this configuration and on how power for the Jetson Nano works. On the B01 the plastic piece is provided but just sits on one pin – I simply had to take it out and place it over both of them.

After doing this I was able to keep the Jetson Nano stably powered on, configured it using the first-boot user interface, and successfully tested SSH.

Before continuing, and while connected via SSH, I installed Jetson Stats. By running sudo jtop you can see interesting stats on the use of the chip's resources (RAM, CPU, etc.).

Configuring remote connection
I mostly want to use the Jetson Nano headless, i.e., without keyboard/mouse/monitor connected. Getting this to work took me down a path of some pain.

The first suggestion I tried was VNC with Vino. There is a text document on the Jetson Nano on how to do this, but those instructions don't work. In the meantime I also found out that VNC requires a running UI session on the device to connect to. This means that every time I shut down the device, I would have to plug in the HDMI, keyboard, etc., log in, and then disconnect. Scratch VNC.

After VNC I went the RDP route and installed xrdp. Considering I mostly use Windows 10, Remote Desktop is the perfect option. I followed the steps in a guide's section "Enabling Remote Desktop" – by the way, that article is one of the best step-by-step guides on setting up the Jetson Nano I've found.

When trying RDP, however, the session would start, show the Nvidia logo, and close again. Turns out there's an issue with Gnome, the desktop window manager, and one of the libraries it uses – which led me to just install xfce4, as suggested in that discussion. This solved the issue, and I could now connect using Remote Desktop.

The story wasn't over, however. Once connected, I noticed the terminal console wasn't opening, apparently because it was trying to use the Gnome terminal. The solution was to install xfce4-terminal and configure it as the default.

So to summarize:
  1. Install xrdp
  2. Install xfce4 and configure it as the default window manager
  3. Install xfce4-terminal and set it as the default terminal
Done? Not yet.

Wifi Connection Drops / Wifi Power Save
While doing the previous steps, I noticed my connection to the Jetson was very unstable – SSH kept dropping and pings would fail. Initially I thought this was due to the lack of antennas on my Wifi adapter, but it turned out the adapter was being put to sleep by the OS.

The first suggested solution was to change a setting in the file /etc/NetworkManager/conf.d/default-wifi-powersave-on.conf. This doesn't work.

The second was to run "sudo iw dev wlan0 set power_save off". This actually works, but it's not persistent across reboots.

The third and final solution was to follow a set of instructions written for the Nvidia Jetson TX2 that also work on the Jetson Nano. Note that those instructions assume gedit is used to create files, but I used vim instead – so replace as appropriate with whatever text editor you have installed.

I don't know if this issue is specific to the Wifi adapter I'm using, or if it also happens with USB Wifi adapters, but I found quite a few people asking for help with this and several of the previous challenges I had. I have to admit this wasn't the best user experience I've ever had.

So what's next? Setting up Azure IoT Edge and trying it with DeepStream. I'll write about how it goes.

Tuesday, January 7, 2020

Podcast – João Pedro Martins – AI and Machine Learning in Organizations

Note: this refers to a recording done in Portuguese.

Back in November I recorded a podcast episode (available here) focused on AI/ML and its impact on society and organizations.

AI/ML has undergone a transformation in the last few years, with increasing hype but also increasing problems and doubts being raised: algorithmic discrimination/bias, problems with "bad data", challenges in taking ML solutions live, incoming regulation, discussions on neural network/connectionist versus/with symbolic approaches (or "neuro-symbolic"), issues raised by face recognition, autonomous robots/warfare, etc. The field is very much in flux, but it's clear to me there's some backlash, and more caution and realism are coming to the field.

The good part of what seems to be happening is that interesting applications are making their way into our lives and organizations, in areas from industry to science to medicine, and sometimes as "minor" features in things like development tools (VS Code/Studio comes to mind) or Office. Maybe full level-5 autonomous driving is many years away, but the investment will help it progress and find new solutions. While this happens, the research field (just see the latest NeurIPS) seems to be diverging from industry -- the former is now tackling other types of problems that are very technical (not only in a CompSci sense, but also mathematical/statistical) and in many cases not related to Deep Neural Networks.

Anyway, all this to say that, when I reviewed the recording, I felt the more pessimistic side of what's happening in the industry took up more time than the possibilities/real impact the tech is having, which was annoying and doesn't actually reflect my thoughts.

If you understand Portuguese, give it a listen and let me know what you think.

To close off, a word of thanks to the team for inviting me and doing great work with the podcast!

Friday, December 6, 2019

Quick career update

When you keep a blog for fun since 2003, you get used to the fact that every so often you’ll end up writing a "Hey, this is what's changed", usually followed by "I'll start posting more often (promise)". Well, this is one of those :-).

After 3 great years with Microsoft UK, I've relocated to Microsoft in Germany and started a new role. In the UK, I started as a Cloud Solutions Architect in the finance industry, and 1.5 years later moved to CSA Manager for Data&AI, focusing on Retail, Travel and Transport. While in this role, I had the privilege of leading a team of fantastic data architects, of hiring a generation of new Data Architects for the Customer Success Unit, and of being the AI Lead for the CSU. I got much closer to the world of Machine Learning/Artificial Intelligence, from the technical to the societal/ethical sides.

Now in Germany, I am Global CSA in the US National CSA Team, meaning I work with US customers with presence in EMEA, supporting them in their transformation journeys to Azure. This meant getting back closer to technology and specifically to the Appdev space.

I still follow the AI/ML space every day, now seemingly entering the "disillusionment" phase of Gartner's hype cycle, with issues raised around bias, ethics, etc., while at the same time the technology keeps progressively entering our lives.

So to sum it up, and getting back to the promise I mentioned at the top: you can expect to see more activity here, and the topics will likely be a mix of Appdev on Azure (e.g., LogicApps, API Management, DevOps or Kubernetes) and AI/ML. On this latter topic, I post very often on Twitter as @lokijota about developments and topics I find more relevant or interesting - always trying to avoid the over-hype that plagues the industry.

Monday, December 17, 2018

Royal Society's You and AI panel (11/Nov) - part 2

(part 1 of this post can be read here, and the video of the panel discussion is here)

How can Society Benefit from AI (cont)
After the discussion on explainability, the discussion went to another hot topic: the concentration of data in a very small number of companies. Some points discussed were:
  • "I'm worried about that, but I prefer to have centralization and improved health care, than not having anything"
  • "how do I exert the rights over my data, or on AIs targeting me? Do I as an individual have to go for a lawsuit under GDPR? "
  • "These companies have the Data, but they also have the Talent and the Infrastructure, and these three imply concentration, not many companies are in the same position."
  • "They operate under their own laws, even if they are present in our homes [reference to digital assistants like Alexa]"
All of these are worrying. There is a constant stream of data breaches in the news (including from the big players, Google and Facebook), and there is a feeling that these companies are above the law. Having the data is just one issue; having that data used to make decisions which are not transparent is even more worrying. See this case, for example: access to our data, plus an unexplainable report, people's lives affected, and we have a dystopian sci-fi universe with us. I'm sure there are better uses of Data/AI than this.

How do we get there?
This last section was varied, with some points being:
  • "How do we regulate maths?" -- a rhetorical question obviously asked by Prof Saria when talking about reasons for AI/Machine Learning not to be regulated, quickly countered by Prof Donnelly, who said that it was all about the context of its use. [1]
  • Reference to the recent news about Amazon's AI-driven hiring failure, as an example of how software developed by a big company can still fail in a way that reveals obvious biases. "You have an army of young, smart, male people building a hammer, but they have never seen a house" [2]
  • Addressing the ongoing conversation about Neural Networks/current Machine Learning approaches not being enough (see here, here and also here), "You can put a Deep Neural Network listening to the BBC 24 hours a day, it will never wake up, or form an opinion"
  • "You don't need Artificial General Intelligence (AGI) to build a killer robot or to keep an autocracy in power"
You could say that the topic of this last section was addressed indirectly, via the discussion of the current concerns around the use of AI (really, Machine "Learning", or more correctly, Data Science). The conversation finished off back at AGI, and wrapped up with the Turing test (it actually made me go and read his original article).

All in all, an interesting night and conversation, if anything more focused on the risks and failures of AI than on the possibilities.

My Take
It's probably obvious from my summary that my opinions are very much on the side of Dr. Vivienne Ming.

In my view, we do have to worry about bias, explainability, fairness, regulation, impact on jobs, etc.. Working for one of the companies developing some of the technology mentioned in the panel, I really appreciate that it has a focus on Ethical AI, in some ways that are public and others which I can't share.

I also doubt the current "Artificial Intelligence" techniques will bring us Artificial General Intelligence. Following daily what comes out on Twitter or in blog aggregators, reading about fantastic advances in things like generating realistic faces (impressive and useless at the same time) and many articles describing tiny new advances, it does make me feel, quoting Judea Pearl, that "All the impressive achievements of deep learning amount to just curve fitting". It's probably a mistake to call it "Artificial Intelligence", or even "Machine Learning", when all it is is "Data Science": crunching abstract numbers until we have models that represent data in an adequate way. [3]

Admittedly, the panel's discussion was skewed to the side of caution and the problems of AI. There are fantastic possibilities in fields like Health care, Farming, Medical Research, Science, Energy, etc. Again with impacts that we have to worry about, but with clear potential benefits. I try not to blog or post about how "AI will unleash/unlock" whatever it is, but it seems obvious to me that several of these areas will see gains.

A final note to address the "big corporation concentration" part of the discussion -- in October I attended an Instant Expert: Artificial Intelligence panel organized by the New Scientist, where I heard a talk by David Runciman ("professor of politics, Leverhulme Centre for the Future of Intelligence, University of Cambridge. Author of How Democracy Ends"). One of his main points was that the powers these private companies don't have are a) the power of the law and b) the military. When Zuckerberg repeatedly refuses to come to the UK's House of Commons, the House could legislate to make this illegal and, as happened in Brazil with WhatsApp, shut down the social network in the country. Whether it would, and what Facebook would do to fight it, is a different story. But the State has that power, and it could use it.

I know I'll return to these topics, but for now I close with Turing's words: "We can only see a short distance ahead, but we can see plenty there that needs to be done."

[1] This made me think of Project Manhattan and the atom bomb in WW2.
[2] But the attempts will continue. And one of these days you may have to give access to your social network profile in addition to your CV when applying for a job...
[3] But hey, what do I know?

Friday, December 14, 2018

Royal Society's You and AI panel (11/Nov) - part 1

The Royal Society is hosting a series of discussion panels on You and AI (this link also has the recordings), intended as a reflection on different aspects of AI/Machine Learning and its impacts on society.

The most recent debate was hosted by well-known science communicator Professor Brian Cox, with Dr. Vivienne Ming, Prof Peter Donnelly and Prof Suchi Saria as panelists. What follows are my comments and personal highlights of the night, very much aligned with the views of Dr. Ming, who totally "stole the show" and whose more careful approach to AI is close to my own.

What is AI?

This was the first topic of the night, and it's a question that has been debated to exhaustion. The two main definitions presented were the general "autonomous systems that can make decisions under uncertainty" and the Machine Learning-specific "form of AI that learns from examples/by identifying patterns in data". The "under uncertainty" detail is curious -- my reading is that when there is certainty, a simpler deterministic rules-based system could be used instead.

Something else I found curious was the historical note that "proving theorems automatically (one of the first uses of "AI" in the 1960s) is simpler than determining if the people sitting here in the first row are smiling", the latter having only become feasible for computers in the last 5-6 years. For most of us, the second task is trivial and the first very complex -- and this does seem to imply that current Artificial Intelligence is very different from human intelligence. [1]

Who benefits from AI?

Things started to get more interesting from here onwards. Dr. Ming said that new technology typically starts by "benefiting those who need it the least", relativizing its initial impact: "If you put an app [with AI] in an AppStore, you've solved NOTHING in the world". It's easy to contradict this (just think of Facebook), but what she meant was that you won't solve world hunger, or poverty, or go to Mars, with an app using AI. And in that, she's right.

The discussion then went briefly to the obvious possibilities in healthcare and education, where the potential benefits are huge, but quickly steered into more sensitive topics, namely the impact on jobs. There are several books and frequent studies about this, usually with consulting companies predicting that more jobs will be created (Accenture, McKinsey), and scholars on the opposite side, predicting the need for deep societal adaptations to cope with the upcoming changes (such as a universal living wage). One thing is true: "Every CFO tries to reduce costs with wages [and if the opportunity is there to do it, s/he'll take it]".

(By this point in the discussion, it was clear there were two sides on stage: Dr. Ming on the side of the need for moderation, and Prof Saria on the side of absolute optimism.)

Another interesting point was again made by Dr. Ming: "It's not impossible to create a robot to pick berries, or to drive a car, but it's much simpler to replace a financial services analyst" (or a doctor?). The key message: AI will probably have more impact on middle-class qualified jobs than on lower-skilled jobs, simply because they are easier to replace. And in doing that, it will obviously increase social inequality. The argument is simple and obvious. It's not just the menial/mindless tasks that will be automated, but also many jobs for which people today spend years studying at university. And this does include software developers, by the way -- how much time is spent writing boilerplate code?

This section ended with something more speculative: "which jobs will be the last to be automated?" The suggested answer was those requiring creativity/creative problem solving (so not only artists, but engineers, etc.). But this may be anthropocentric optimism: we see creativity as something uniquely human, so naturally we see it as our last bastion "against the machines", even if animals also have it, just to a lesser degree. Today we have AIs winning games like Go or Chess using unique strategies we had never considered, or creating works of art or music. So we shouldn't bet too much on this answer -- maybe jobs dealing with the unexpected would be a better one.

How can Society benefit from AI?

This seemed to be a simpler part of the panel, but it went straight into the topic of explainability, a complicated if not impossible task for the more complex approaches to AI, such as Deep Neural Networks. Prof. Saria said she thought the need to explain should simply be replaced by trust. Prof. Donnelly then raised an interesting dilemma: if you suspected you had a health problem, and you could a) be seen by a doctor who gave you a diagnosis with 85% accuracy and explained all of it properly, or b) be diagnosed by an AI with 95% accuracy but with no explanation, which would you pick? Most of the audience picked the second, but a better option would be c) have an AI augment the human diagnosis, increasing its accuracy while providing the explainability.

It seemed clear that in many cases we'll need some form of explainability (such as when being considered for a job, getting a loan, or in healthcare -- and GDPR actually mandates it), and in others it's less relevant (like in face recognition or flying an airplane). My view is that if something seriously impacts people's lives, it should be explainable. But there is a contradiction in this position: as the books "Strangers to Ourselves" by Timothy D. Wilson and "Thinking, Fast and Slow" by Daniel Kahneman explore, our brains actually make up explanations on the fly; we're less rational than we think. So there's a double standard at stake when demanding it of machines. It may all come down to familiarity with humans vs AI, or simply to knowing a bit of how it works under the hood and being uncomfortable with the risk of blindly delegating medical diagnoses, trial decisions, or credit ratings to a complex number-crunching/statistical process.

This post is already long, so I'll continue in a part 2. In the meantime, the video of the debate is available here.

[1] Gödel's incompleteness theorems, proving that there are theorems that are true but impossible to prove, were not mentioned, but it doesn't change the argument.

Thursday, December 6, 2018

Microsoft QuantumML tutorial

[this post was written for the 2018 Q# advent calendar]

Two colleagues recently went to Microsoft's internal Machine Learning and Data Science conference, and recommended a tutorial they did on-site, on Quantum Machine Learning. The materials for this lab have just been published on GitHub, and the following are my learnings while doing it.

The lab is implemented with Microsoft's Quantum SDK, using Q# for the Quantum code and C# for the driver code. The goal is to implement a Classifier, a discriminator able to classify a value into one of two classes -- just like a Logistic Regression classifier. Or in other words, a Quantum Perceptron. It is simple to implement if you know the core concepts of Quantum Computing, and most of it is very guided -- you just have to fill in the blanks with the right Quantum primitives, following the instructions in the comments.

Simplifying, what the algorithm does is as follows: imagine you have a pizza cut in two halves at a given angle 𝜃, and the angles higher than that are of class One, and Zero otherwise:

You also have a labeled training data set, specifying for a large number of angle values what the class/label is:

Note that the angles are represented in radians (i.e., the full circle is 0 to 2*PI) instead of 0-360º, but that is a detail. This is equivalent to having normalized data between 0 and 1.

The goal of the lab is, first, to implement an algorithm that finds the separation angle 𝜃, and second, to classify new angle values as either class Zero or One. Finding the separation angle (which is equivalent to training a logistic regressor and finding its parameters) is achieved with a mix of C# and Q# code, while the classification itself is purely Q# quantum code.

Some of the issues that confused me while doing the lab were the following:

Finding the angle - the underlying logic of Quantum Computing features a lot of linear algebra (matrices) and trigonometry (angles). One important thing to keep in mind is that what the algorithm needs to find is not the separation angle at which the pizza was cut, but an angle perpendicular (at 90º) to it. In the following code snippet, the separation angle is 2.0 (equivalent to 114.6º), but the angle the algorithm needs to find is "correctAngle". By adding or subtracting PI/2 we get the perpendicular angle:

double separationAngle = 2.0; 
double correctAngle = separationAngle - Math.PI / 2;

The reason for this is related to the quantum transformations available to us. The slide deck in the GitHub repo talks about this, but it wasn't immediately clear to me when I read it.
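One way I found to convince myself of this is to work through the single-qubit math classically (this is my own sketch, not part of the lab's code): applying Ry(x) and then Ry(-alpha) to |0> leaves the prediction qubit measuring One with probability sin²((x - alpha)/2). That probability crosses 50% exactly at x = alpha ± PI/2, so for the decision boundary to sit on the separation angle 𝜃, alpha has to be 𝜃 - PI/2:

```python
import math

def p_one(x, alpha):
    # Probability of measuring One after Ry(x) then Ry(-alpha) on |0>
    return math.sin((x - alpha) / 2) ** 2

separation_angle = 2.0
correct_angle = separation_angle - math.pi / 2

# At the separation angle itself the classifier is maximally uncertain
print(round(p_one(separation_angle, correct_angle), 3))  # prints 0.5
```

Angles above the separation angle give p_one > 0.5 (class One), angles below give p_one < 0.5 (class Zero), which matches the pizza picture above.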

Success Rate - the Main() in the provided C# driver relies heavily on a Q# operator called QuantumClassifier_SuccessRate. What this does is measure how well the quantum algorithm classifies the data points in the training data, for the angle it is called with. The Q# operator returns this as a percentage.
The C# code then calls it multiple times with different angles using ternary search (like binary search, but splitting the interval into three parts), until the error rate is low enough. This is the bulk of the training process, and when it ends it has found a good approximation of the "correctAngle" mentioned above (note that it's not looking for the separation angle 𝜃).
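As a classical sketch of what the driver does (my own Python approximation, not the lab's C# code -- it replaces the sampled quantum success rate with the exact measurement probability, and assumes a search bracket over which the success rate is unimodal):

```python
import math

def success_rate(alpha, points, labels):
    # Fraction of training points whose most likely measurement matches the
    # label: P(measuring Zero) = cos^2((x - alpha) / 2) for encoded angle x
    hits = 0
    for x, y in zip(points, labels):
        predicted = 0 if math.cos((x - alpha) / 2) ** 2 > 0.5 else 1
        hits += predicted == y
    return hits / len(points)

def ternary_search(points, labels, lo=0.0, hi=math.pi, tol=1e-4):
    # Probe the bracket at its two inner thirds and discard the outer third
    # on the side of the worse probe, until the bracket is tiny
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if success_rate(m1, points, labels) < success_rate(m2, points, labels):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Toy training set: pizza cut at angle 2.0, class One above it
xs = [2 * math.pi * i / 400 for i in range(400)]
ys = [1 if x > 2.0 else 0 for x in xs]
alpha = ternary_search(xs, ys)  # lands near correctAngle = 2.0 - pi/2
```

The real driver does the same narrowing, except each success-rate evaluation is a noisy quantum estimate rather than an exact probability.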

The QuantumClassifier_SuccessRate calls two other operators:
  • EncodeDataInQubits - as an analogy to "classical" machine learning, this can be seen as a sort of data preparation step, where you initialize the Qubits and generate a sort of "quantum feature", dataQubit. The output label is also encoded in a Qubit.
  • Validate - again as an analogy, this can be seen as applying the Hypothesis and check if we're doing the right prediction. It can be useful to think of the CNOT "truth table" to understand this code:

    CNOT(dataQubit, labelQubit);

     Remembering that the CNOT flips the second qubit if the first one is 1 ( |1> really), we have the following "truth table":

    CNOT(0,0) -> 0
    CNOT(1,0) -> 1
    CNOT(0,1) -> 1
    CNOT(1,1) -> 0

    Or, in other words: we have 0 as an output when the dataQubit == labelQubit, i.e., we have done the right prediction. 
Finally, the logic of QuantumClassifier_SuccessRate itself includes two loops: one iterates over all the values in the training dataset (0..N-1), and the second repeats each Validate operation several times (1..nSamples, where nSamples = 201 by default) to account for the probabilistic nature of quantum measurement. Again, note that nSamples is a possibly misleading name -- it doesn't refer to data samples, but to repetitions of the circuit. You can reduce this number to 100, for example, and you'll see the quality of the predictions go down.
Doing predictions - as I mentioned in the beginning, a big part of the exercise is working on the training code, implementing the 3 operators mentioned above. For this second part, you have to implement:

a) C# code to generate a new dataset, the test dataset which you will ask your Quantum code to classify;

b) Q# code to actually do the classification. For both parts you can reuse/adapt code you have done before. This is what I ended up with:

operation QuantumClassifier (
    alpha : Double, 
    dataPoints : Double[]) : Int[] {

    let N = Length(dataPoints);
    mutable predictions = new Int[N];

    let nSamples = 201;

    // Allocate two qubits to be used in the classification
    using ((dataQubit, predictionQubit) = (Qubit(), Qubit())) {
        // Iterate over all points of the dataset
        for (i in 0 .. N - 1) {
            mutable zeroLabelCount = 0;
            // Classify i-th data point by running classification circuit nSamples times
            for (j in 1 .. nSamples) {

                // encode the data point
                Ry(dataPoints[i], dataQubit);
                // apply the classifier rotation
                Ry(-alpha, dataQubit);

                CNOT(dataQubit, predictionQubit);

                if (M(predictionQubit) == Zero) {
                    // count the number of Zero measurements
                    set zeroLabelCount = zeroLabelCount + 1;
                }

                // Clean up both qubits before the next iteration using library operation Reset
                Reset(dataQubit);
                Reset(predictionQubit);
            }

            if (zeroLabelCount > nSamples / 2) {
                // if the majority of measurements are Zero, we say it's a Zero
                set predictions[i] = 0;
            } else {
                set predictions[i] = 1;
            }
        }
    }
    return predictions;
}
This code then does the following correct predictions based on a new data set:

And that's it, your Quantum Perceptron is finished :). Now we only need hardware to run it!
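To check my understanding of the final circuit, I also wrote a small classical simulation of it (my own sketch, not part of the lab): each pass through the inner loop measures Zero with probability cos²((x - alpha)/2), and the majority vote over nSamples repetitions gives the prediction.

```python
import math
import random

def classify(x, alpha, n_samples=201, rng=None):
    # Simulate n_samples runs of the circuit: Ry(x), Ry(-alpha), CNOT, measure.
    # Each run measures Zero with probability cos^2((x - alpha) / 2);
    # the prediction is the majority outcome, as in the Q# code above.
    rng = rng or random.Random(42)
    p_zero = math.cos((x - alpha) / 2) ** 2
    zeros = sum(rng.random() < p_zero for _ in range(n_samples))
    return 0 if zeros > n_samples // 2 else 1

separation_angle = 2.0
alpha = separation_angle - math.pi / 2  # the perpendicular "correctAngle"

# Angles below the cut should come out as class Zero, above it as class One
predictions = [classify(x, alpha) for x in (0.5, 1.0, 1.5, 2.5, 3.0)]
```

Lowering n_samples makes the majority vote noisier near the boundary, which mirrors what happens when you reduce nSamples in the Q# code.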

While working on the lab I went looking around the web and found two articles that seem related to the approach followed here: "Quantum Perceptron Network" [paywall] and "Simulating a perceptron on a quantum computer", both possibly worth a look.

If you already have some basic knowledge of both Machine Learning and Q#, you should expect to spend maybe 2 hours on it.