Augmented reality running in your browser - no app required! By combining some simple image processing algorithms and machine learning we can create something pretty cool.

If you found this video interesting please hit the subscribe button on the channel - there will be follow up videos on more machine learning topics.

A long time ago I wrote an app for the iPhone that let you grab a sudoku puzzle using your iPhone’s camera.

Recently when I was investigating my self-organising Christmas lights project I realised that the browser APIs and ecosystem had advanced to the point that it was probably possible to recreate the system running purely in the browser.

Self-organising lights: https://youtu.be/Ueim2Ko8VWo

Things like TensorFlow and TensorFlow.js make building the digit recogniser straightforward.

As you can see it works pretty well - you can try it out for yourself here: https://sudoku.cmgresearch.com/

And of course, all the code is on GitHub: https://github.com/atomic14/ar-browser-sudoku

Hopefully, this video will give you a good idea of how the system works and the thinking behind what I’ve done.

We’re taking a feed from the camera on the device. This comes into us as an RGB image. We’re not really interested in colour as we’re working with printed puzzles which will typically be printed in black and white.

So our first step is to convert from RGB to greyscale.

Convert to greyscale: 01:58
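The greyscale conversion is just a weighted sum of the colour channels. Here’s a minimal sketch in TypeScript, assuming an RGBA pixel buffer like the one the canvas getImageData call returns (the function name is my own, not from the project):

```typescript
// Convert an RGBA pixel buffer (as returned by canvas getImageData)
// to a single-channel greyscale image using the Rec. 601 luma weights.
function toGreyscale(
  rgba: Uint8ClampedArray,
  width: number,
  height: number
): Uint8ClampedArray {
  const grey = new Uint8ClampedArray(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    // green contributes most to perceived brightness, blue the least
    grey[i] = 0.299 * r + 0.587 * g + 0.114 * b;
  }
  return grey;
}
```

The Uint8ClampedArray takes care of rounding and clamping the weighted sum back into the 0–255 range for us.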

We’re using morphological operations to locate the puzzle - these typically work on black and white binary images, so our next step is to binarise our image.

Thresholding: 02:33
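A sketch of the simplest version of this step - a global threshold. Since the puzzle is dark ink on light paper, pixels darker than the threshold become foreground. (In practice an adaptive threshold that compares each pixel against its local neighbourhood copes much better with uneven lighting; this fixed-threshold version is just to show the idea.)

```typescript
// Binarise a greyscale image: pixels darker than the threshold become
// foreground (255), everything else background (0). We invert while
// thresholding because printed puzzles are dark ink on light paper.
function binarise(
  grey: Uint8ClampedArray,
  threshold: number
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(grey.length);
  for (let i = 0; i < grey.length; i++) {
    out[i] = grey[i] < threshold ? 255 : 0;
  }
  return out;
}
```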

Next, we need to identify the blob that is the puzzle and work out the coordinates of each corner of the puzzle grid.

Locating the puzzle: 03:42
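One way to sketch this step (a hypothetical simplification - the actual project may do it differently): flood fill to find the largest connected blob of foreground pixels, then take as corners the blob points that are extreme in x+y and x−y. The function assumes the image contains at least one foreground pixel.

```typescript
// Find the largest 4-connected blob of foreground pixels via flood fill,
// then approximate the blob's corners: min/max of x+y give the top-left
// and bottom-right, min/max of x-y give the bottom-left and top-right.
function findCorners(img: Uint8ClampedArray, width: number, height: number) {
  const seen = new Uint8Array(width * height);
  let bestBlob: number[] = [];
  for (let start = 0; start < img.length; start++) {
    if (img[start] === 0 || seen[start]) continue;
    const blob: number[] = [];
    const stack = [start];
    seen[start] = 1;
    while (stack.length) {
      const p = stack.pop()!;
      blob.push(p);
      const x = p % width;
      const y = (p / width) | 0;
      for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
        const nx = x + dx;
        const ny = y + dy;
        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
        const n = ny * width + nx;
        if (img[n] !== 0 && !seen[n]) {
          seen[n] = 1;
          stack.push(n);
        }
      }
    }
    if (blob.length > bestBlob.length) bestBlob = blob;
  }
  const pt = (p: number) => [p % width, (p / width) | 0];
  const extreme = (f: (x: number, y: number) => number, min: boolean) =>
    pt(
      bestBlob.reduce((a, b) => {
        const [ax, ay] = pt(a);
        const [bx, by] = pt(b);
        return (min ? f(bx, by) < f(ax, ay) : f(bx, by) > f(ax, ay)) ? b : a;
      })
    );
  return {
    topLeft: extreme((x, y) => x + y, true),
    bottomRight: extreme((x, y) => x + y, false),
    topRight: extreme((x, y) => x - y, false),
    bottomLeft: extreme((x, y) => x - y, true),
  };
}
```

This extreme-point trick works well when the puzzle isn’t rotated too far from upright; a more robust approach would trace and simplify the blob’s contour.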

Using these four corners we can compute a homography between our camera image and an “ideal” image of the puzzle.

Puzzle Extraction: 05:17

You can see more details on the algorithm used for this here: http://www.cse.psu.edu/~rtc12/CSE486/lecture16.pdf
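Once the 3x3 homography matrix H has been estimated (for example with the direct linear transform described in the lecture notes above), applying it to a point is simple: lift the point to homogeneous coordinates, multiply, and divide through by the resulting w. A sketch:

```typescript
// Apply a 3x3 homography H to a 2D point (x, y): lift to homogeneous
// coordinates [x, y, 1], multiply by H, then divide by the resulting w
// to get back to 2D.
function applyHomography(
  H: number[][],
  x: number,
  y: number
): [number, number] {
  const w = H[2][0] * x + H[2][1] * y + H[2][2];
  return [
    (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
    (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
  ];
}
```

To extract the square puzzle image you map each pixel of the “ideal” output back through the inverse homography and sample the camera frame at that position.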

Once we’ve got the square puzzle image we need to extract the contents of each individual cell. We examine the connected region inside the box and use the bounds of this to extract an image of the digit.

Digit Extraction: 06:34
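The core of this step is finding the tight bounding box of the foreground pixels inside a cell - a quick sketch (empty cells, which have no digit, return null):

```typescript
// Find the tight bounding box of the foreground (non-zero) pixels in a
// cell image. Returns null when the cell is empty - i.e. has no digit.
function digitBounds(
  cell: Uint8ClampedArray,
  width: number,
  height: number
): { minX: number; minY: number; maxX: number; maxY: number } | null {
  let minX = width, minY = height, maxX = -1, maxY = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (cell[y * width + x] !== 0) {
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
      }
    }
  }
  return maxX < 0 ? null : { minX, minY, maxX, maxY };
}
```

The cropped digit can then be centred and scaled to the fixed input size the neural network expects.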

We now run each digit image through a neural network in the browser using TensorFlow.js. The network was trained with TensorFlow in an interactive Jupyter notebook available at the GitHub link.

Training the neural network: 07:12

To solve the puzzle we use Donald Knuth’s Dancing Links and Algorithm X - https://en.wikipedia.org/wiki/Knuth%27s_Algorithm_X

To do this we encode the puzzle as an exact cover problem.

Solving the puzzle: 11:44
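In the exact cover encoding, each candidate placement “digit d in cell (r, c)” becomes a row, and the columns are the 324 constraints (each cell filled once, each digit once per row, column, and box); a solved puzzle is a set of rows covering every column exactly once. Dancing Links is Knuth’s fast linked-list implementation of the search - here is a much simpler set-based sketch of Algorithm X itself, on a generic exact cover problem (names are my own):

```typescript
// A minimal set-based Algorithm X sketch. Rows are candidate choices,
// each covering a set of constraint columns. Repeatedly pick the column
// with the fewest covering rows, try each such row, and recurse on the
// reduced problem.
function solveExactCover(
  rows: Map<string, number[]>,
  columns: Set<number>
): string[] | null {
  if (columns.size === 0) return []; // every constraint satisfied
  // choose the most constrained column (fewest candidate rows)
  let best: number | null = null;
  let bestRows: string[] = [];
  for (const col of columns) {
    const covering = [...rows.keys()].filter((r) =>
      rows.get(r)!.includes(col)
    );
    if (best === null || covering.length < bestRows.length) {
      best = col;
      bestRows = covering;
    }
  }
  for (const rowName of bestRows) {
    const chosen = rows.get(rowName)!;
    // remove every column the chosen row covers, and every row that
    // clashes with it (shares a covered column)
    const subRows = new Map<string, number[]>();
    for (const [name, cols] of rows) {
      if (!cols.some((c) => chosen.includes(c))) subRows.set(name, cols);
    }
    const subCols = new Set([...columns].filter((c) => !chosen.includes(c)));
    const rest = solveExactCover(subRows, subCols);
    if (rest !== null) return [rowName, ...rest];
  }
  return null; // dead end: the chosen column cannot be covered
}
```

Dancing Links gets its speed by undoing the row and column removals in O(1) with doubly-linked lists instead of copying the problem on every recursive call, but the search itself is the same.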

Finally, we can display the results back on top of the camera feed to give us our Augmented Reality display.

Displaying the results: 15:36

I hope you’ve enjoyed this video - please hit the subscribe button and leave any thoughts you might have in the comments.


Want to help support the channel? I’m accepting coffee on https://ko-fi.com/atomic14