Nick Arner

Visual Debugging Now! - Guest Post

Posted at — May 20, 2022

Guest post from Anton Troynikov

Over the last seven or so years of working in robotics and computer vision, the bitter lesson I’ve learned is that the bug is almost always in the part of the algorithm you didn’t visualize.

This is a call to fix that problem once and for all.

The Problem

Many domains, including robotics, fundamentally rely on high-performance numerical software. But humans are adapted to understand space visually, so we don’t have good intuitions for numerical data. Things that are obvious to human vision, like “does this have a realistic pose in space”, are very unclear in numerical representations. Can you picture a quaternion intuitively from its vector representation? Can you see from a Hessian matrix whether your optimization is converging?
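To make the quaternion question concrete, here is a minimal Python sketch (not from the original post) that converts a unit quaternion into an axis and an angle, which is about as close to "picturing" the rotation as plain numbers get. The function name `quaternion_to_axis_angle` and the (w, x, y, z) component ordering are assumptions chosen for illustration; libraries differ on ordering.

```python
import math


def quaternion_to_axis_angle(w, x, y, z):
    """Convert a quaternion (w, x, y, z) into (unit_axis, angle_in_degrees).

    Assumes scalar-first ordering; the input is normalized before use.
    """
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("cannot convert a zero quaternion")
    w, x, y, z = (c / norm for c in (w, x, y, z))

    # Clamp to guard against rounding pushing w just outside [-1, 1].
    w = max(-1.0, min(1.0, w))
    angle = 2.0 * math.acos(w)

    # sin(angle / 2); near zero means "no rotation", so any axis will do.
    s = math.sqrt(max(0.0, 1.0 - w * w))
    if s < 1e-9:
        return (1.0, 0.0, 0.0), 0.0
    return (x / s, y / s, z / s), math.degrees(angle)


# These four numbers say little on their own...
q = (0.7071067811865476, 0.0, 0.7071067811865476, 0.0)
# ...but they describe a rotation of roughly 90 degrees about the y axis.
axis, angle = quaternion_to_axis_angle(*q)
print(axis, angle)
```

Even this form is only a halfway point: an axis and an angle are easier to reason about than four raw components, but drawing the rotated coordinate frame would be more direct still.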

Visualizing this kind of data is the obvious way to make it understandable to humans, but doing so is very difficult for a variety of reasons.

Code in these domains is usually written in Python (prototyping) or C++ (production).

There are plenty of additional problems this causes:

How do teams get around this?

Rendering out just the final output of the algorithms (e.g. robot behavior), or ROS messages or whatever, is too far removed from the inner loop of development. You have to build the whole thing end-to-end before you can even start finding the bugs in the core loops, and those bugs might not appear for a long time if the behavior is sufficiently constrained.

The real world is complex, and bugs in numerical computing can be very subtle. The system can still ‘basically work’. I have horror stories about absolutely incredible bugs in perception systems discovered years after the system was ‘in production’. I have seen hundreds of engineer-hours wasted chasing a simple sign change which would have been obvious if it were easy to look into the tight inner optimization loops.
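As a toy illustration of the sign-change story (a sketch under made-up assumptions, not code from any real system), the snippet below runs the same gradient descent loop twice, once with the correct sign and once with it flipped, and records the iterate at every step. The names `minimize`, `sparkline`, and the quadratic cost are all hypothetical. Plotting the per-step cost, even as crude ASCII, makes the flipped sign obvious in a way the final number alone may not.

```python
def minimize(grad, x0, step=0.1, iters=20, sign=-1.0, trace=None):
    """Plain gradient descent on a scalar; `sign` should be -1.0.

    If `trace` is a list, every iterate is appended to it so the inner
    loop can be inspected afterwards.
    """
    x = x0
    for _ in range(iters):
        x = x + sign * step * grad(x)
        if trace is not None:
            trace.append(x)
    return x


def sparkline(values, ramp=" .:-=+*#"):
    """Map each value to one ASCII character, scaled between min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return "".join(ramp[int((v - lo) / span * (len(ramp) - 1))] for v in values)


def cost(x):
    return (x - 3.0) ** 2


def grad(x):
    return 2.0 * (x - 3.0)


good, bad = [], []
minimize(grad, 0.0, sign=-1.0, trace=good)  # correct update direction
minimize(grad, 0.0, sign=+1.0, trace=bad)   # the "simple sign change" bug

good_costs = [cost(x) for x in good]
bad_costs = [cost(x) for x in bad]
print("correct sign:", sparkline(good_costs))  # cost falls from left to right
print("flipped sign:", sparkline(bad_costs))   # cost rises from left to right
```

In a real system the loop is buried several layers down and the divergence is masked by clamping, outlier rejection, or a later stage, which is why being able to look inside the loop matters.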

Naturally, the situation creates a huge drag on velocity and is overall just a train wreck. Is my bug in my code or in the code for my viz? The bug is probably in the part we can’t see, so we have to infer something from the output about what’s broken. Hope you like printf.

What To Build

You know what doesn’t suck as an environment for numerical computing? MATLAB.

MATLAB kicks ass at viz. Every single object is a matrix, an array, or a struct which eventually recurses to either a matrix or an array at the leaves. MATLAB visualizes its basic data structures natively and interactively, and can display and update them in real-time. The execution environment is also the IDE and debugger, so you can add a breakpoint, pause execution conditionally, and call a viz function while your code is running, then change some variables (but not the code) and see what happens. It’s not perfect - you can’t alter code in-place while debugging, and it’s not performant as a production language. For prototyping spatial computing algorithms it beats the shit out of Python (and has faster numerics and better libraries), but ML people use Python so everyone uses Python.
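For a rough feel of the "pause conditionally and call a viz function" workflow in plain Python, here is a minimal sketch. It is only an approximation of the MATLAB experience described above, not the tool this post proposes; `render_matrix` and `probe` are hypothetical helpers, and the ASCII heat map stands in for a real renderer.

```python
def render_matrix(matrix, ramp=" .:-=+*#"):
    """Render a 2D list of numbers as an ASCII heat map, one row per line."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return "\n".join(
        "".join(ramp[int((v - lo) / span * (len(ramp) - 1))] for v in row)
        for row in matrix
    )


def probe(name, value, when=lambda v: True, show=print):
    """Hypothetical inline probe: visualize `value` only if `when(value)` holds.

    Returns `value` unchanged so it can be wrapped around an expression.
    """
    if when(value):
        show(f"[{name}]\n{render_matrix(value)}")
    return value


# Toy inner loop: a 4x4 diagonal matrix that doubles on every step.
grid = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
shown = []
for step in range(5):
    grid = [[v * 2.0 for v in row] for row in grid]
    # Only visualize once the largest entry exceeds a threshold.
    probe(
        f"grid@step{step}",
        grid,
        when=lambda g: max(map(max, g)) > 8.0,
        show=shown.append,
    )

print("\n\n".join(shown))
```

Swapping `show=shown.append` for the default `print`, or replacing the probe with the built-in `breakpoint()` under the same condition, gets closer to an interactive session, though still without MATLAB's native rendering of every data structure.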

So build a MATLAB-like environment but for Python and C++. Here’s how you do it:

Combined with a library of shared visualization transforms, this ‘visual numerics IDE’ will accelerate a large fraction of the programming work for some of the most important disciplines in technology, and will meet programmers where they are instead of forcing them to learn something new or creating another standard they must switch to.

Initially there is probably very little money in trying to sell this, but there could be huge and important developer adoption, so it must be built as an open source project. If you are interested in building this, you can find me on Twitter at @atroyn.