I put my ESP32-S3 dev board from PCBWay through a quick performance workout by decoding a baked-in animated GIF with Larry Bank’s decoder and tweaking ESP-IDF settings. Cranking the CPU to 240MHz gave the expected ~1.5× bump, -Os beat -O2, switching flash from DIO to QIO shaved a bit more, and turning the caches up to 11 pushed it further. Best combo: 240MHz, -Os, QIO, max caches (with a larger partition and watchdog off). Nice little speed win.
The ESP32-S3 is a pretty amazing CPU.
It’s a dual core Xtensa LX7 processor.
I’m using my dev board that I got made by PCBWay.
It came out pretty well.
There’s a module on here.
It’s pretty normal.
There’s nothing special about it.
It does have eight megabytes of flash and eight megabytes of PSRAM.
I’ve been doing some projects recently where I really need to squeeze
the most bang for my buck out of the CPU.
I thought I’d try out some settings on the ESP-IDF
and see what gives us the most power out of the CPU.
I set up a project here.
It’s a very simple project.
All I’m doing is decoding a GIF.
I’ve baked this animated GIF into the flash
and I’m using Larry Bank’s excellent AnimatedGIF decoder.
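For anyone recreating this: baking a file into the app in ESP-IDF is done from the component’s CMakeLists.txt. A minimal sketch (the filenames here are just placeholders for whatever source file and GIF you use):

```cmake
# main/CMakeLists.txt
idf_component_register(
    SRCS "main.cpp"
    INCLUDE_DIRS "."
    # EMBED_FILES bakes the binary into the app image;
    # the data becomes accessible via generated linker symbols.
    EMBED_FILES "test.gif"
)
```

The embedded data then shows up in the code via the generated `_binary_test_gif_start` and `_binary_test_gif_end` symbols, which is what gets handed to the GIF decoder.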
So at the moment my configuration has been reset to the out-of-the-box defaults
you get with the Hello World application.
There’s just a few things I need to do to make sure our code actually runs.
The first thing is our animated GIF is actually quite big.
I need to set the partition table so it actually fits.
We’ll use the large single app partition table, and I also need to turn off the watchdog timer
because we’re doing some performance code
so I don’t want to actually trigger the watchdog timer and kill my application.
So I’ll just turn that off.
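In sdkconfig terms, those two changes look roughly like this (option names as in recent ESP-IDF; the watchdog option in particular has been renamed between IDF versions, so check yours):

```ini
# Large single factory app partition table, so the embedded GIF fits
CONFIG_PARTITION_TABLE_SINGLE_APP_LARGE=y

# Don't start the task watchdog, so the long-running decode loop
# isn't killed mid-benchmark
# CONFIG_ESP_TASK_WDT_INIT is not set
```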
If we save this and build and flash, we get a nice baseline measurement
of how fast our code actually runs.
That’s our test run.
We’ve got our initial baseline value so let’s just record that.
So the total time is about 1.4 seconds.
Look at the milliseconds.
That’s our baseline.
Now obviously the first thing you can do on the ESP32
is you can control the CPU clock speed.
An obvious thing to change is just switch to 240 megahertz instead of the default 160.
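That’s a one-line change in sdkconfig (set via menuconfig; names from ESP-IDF’s Kconfig, and the menu location varies slightly between IDF versions):

```ini
# Run the CPU at 240 MHz instead of the default 160 MHz
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ=240
```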
So let’s quickly run that and see how that works out.
Okay so we’ve got the results.
Let’s copy that into our spreadsheet.
240 megahertz and we should expect around a 1.5 times improvement which is what we see here.
Now there’s more options we can try so if we look at the compiler options
then we have our optimization level.
So currently we’re just compiling with the default debug settings (-Og).
Now there’s two interesting options here.
We have optimize for size (-Os) and optimize for performance (-O2).
Now I had endless debates in my first job around which was best.
There is a strong argument that optimize for size is the best thing to choose.
Firstly we’re trying to fit our program into a small amount of flash.
Obviously my board has 8 megabytes of flash, which is quite a lot, but most modules have around 2 megabytes.
So you might want to optimize for size just so you can fit your program on the actual device.
The other reason for choosing optimize for size is that many CPUs have instruction caches.
If you make your code smaller it’s more likely to fit in the cache and you’ll get fewer cache misses.
Let’s optimize for size first and see how well that works.
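For reference, the optimization level maps to mutually exclusive sdkconfig options (only one should be set; names from ESP-IDF’s Kconfig):

```ini
# Optimize for size (-Os)
CONFIG_COMPILER_OPTIMIZATION_SIZE=y
# The alternatives would be:
# CONFIG_COMPILER_OPTIMIZATION_DEFAULT=y   (debug, -Og)
# CONFIG_COMPILER_OPTIMIZATION_PERF=y      (performance, -O2)
```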
There we go another improvement.
We’re even faster.
Let’s try out the optimize for speed and see if that does something else.
Optimize for performance.
That’s pretty interesting.
Our time’s actually gone back up.
Maybe the arguments for optimize for size are actually quite valid.
So let’s switch back to optimize for size.
That seems to give us the best result.
Now the other interesting thing is that we are reading quite a lot of data from flash.
We have an embedded GIF and it’s quite large.
Now an interesting setting we can try changing is the flash SPI mode.
So by default this runs on DIO.
You can change this to QIO.
Now this is interesting because it controls not only how fast the chip is flashed,
but also how fast code and data are read from it, since QIO uses four data lines where DIO uses two.
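The flash mode is set via the esptool options in sdkconfig; switching from DIO to QIO looks like this (it only works if the flash chip and board wiring support quad I/O):

```ini
# Quad I/O flash mode instead of the default dual I/O
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
```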
So if we save this let’s see what effect it has.
So it’s actually had a measurable effect.
We’ve shaved off about 1.2 percent from our time.
That’s pretty impressive.
Now there are some even more esoteric options.
So if we look down at the CPU section.
If I can find it.
So there’s all of these cache configuration options.
So we can bump up the instruction cache size, and its associativity, which defaults to eight ways.
We can also bump the data cache size up to 64 kilobytes
and the data cache line size up to 64 bytes.
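These menuconfig entries correspond to ESP32-S3-specific sdkconfig options along these lines (names from recent ESP-IDF; worth verifying against your IDF version):

```ini
# Instruction cache: 32 KB, 8-way, 32-byte lines
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_ICACHE_ASSOCIATED_WAYS_8=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_32B=y
# Data cache: 64 KB, 64-byte lines
CONFIG_ESP32S3_DATA_CACHE_64KB=y
CONFIG_ESP32S3_DATA_CACHE_LINE_64B=y
```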
So let’s see what effect these settings have.
So we’ve turned it up to 11.
Very special, because if you look, the numbers all go to 11.
Look right across the board: 11, 11.
Most of the amps only go up to 10.
And it’s actually improved the performance even further.
So that’s pretty cool.
Let’s just fix my spelling of number.
Now I’m kind of intrigued: if we now switch back to -O2
with our new cache sizes,
does it improve things?
So let’s try that.
Go back to compiler options.
Optimize for performance.
So in theory, bigger cache sizes should mean we get the benefits
of optimize for performance without needing to optimize for size,
because our code should now fit in the larger caches.
But let’s see what actually happens.
So that’s very interesting.
I mean, it has improved our -O2 value.
Before it was 962 milliseconds.
It’s now 933 milliseconds.
But it’s still not as good as -Os plus all the other changes and the bigger caches.
Which seems to be our winner.
So that’s pretty interesting.
So I think that’s our best combination of things.
So optimize for size.
Turn the caches up to maximum.
Obviously 240 megahertz.
And switch on QIO if you can.
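Put together, the winning combination as an sdkconfig fragment would be something like this (option names as discussed, so double-check them against your ESP-IDF version):

```ini
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_COMPILER_OPTIMIZATION_SIZE=y
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_DATA_CACHE_64KB=y
CONFIG_ESP32S3_DATA_CACHE_LINE_64B=y
```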
So pretty interesting.
I think this is our number one winner,
and I’ll use these settings for my performance-critical code from now on.
Thanks for watching.
Well I hope you found it interesting.
It was a bit code heavy but interesting for me.