DFU flashing instability
In some cases (~3%), the download progress halts while the main application is being flashed through the DFU bootloader.
GDB reports the stack as corrupted, which makes the failure hard to debug.
Copying data from PC to DFU device
Download> [=========== ] 45% 13824 bytes
dfu-util: Error during download
dfu-util: can't detach
dnload(altif=1, offset=13824, len=512)
D Translated 0x00407600 to page=118 and offset=0
D USBD_RequestHandler
D type=0x1, recipient=0x1 val=0x0 len=6
I DFU: updstatus()
D handle_getstatus(0, 5)
D USBD_RequestHandler
D type=0x1, recipient=0x1 val=0x1c len=512
D COMPLETE
dnload(altif=1, offset=14336, len=512)
D Translated 0x00407800 to page=120 and offset=0
0x00406764 in ?? ()
#0 0x00406764 in ?? ()
#1 0x0040586a in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
#1 Updated by tsaitgaist about 1 year ago
- Status changed from New to Resolved
- % Done changed from 10 to 100
the context when this error happens is quite random:
- it occurs 3% of the time, but only on average (it is not deterministic)
- it occurs at different flashing progress levels
- the problem occurs more often when more debug messages are output over serial
- the problem occurs at different places in the code
- when JTAG is attached from the start it does not always occur
- when attached later on, the end state is never the same: sometimes the stack is broken, sometimes the PC is outside the DFU flash range, ...
any idea what causes it and how to debug this mess?
hint: it is related to an external factor which can't be controlled by the micro-controller.
resolution: it has to do with the USB activity. when there is little traffic on the USB hub that the SIMtrace is plugged into, the USB transactions queue up back to back. The micro-controller then spends all of its time in Interrupt Service Routines (the serial output is one of them, and it gives the USB host enough time to provide the next packet to be flashed). As a result the code never returns to the main loop, where the watchdog is reset.
After some time the watchdog triggers and causes a reset. This is why it can happen at any download progress level, and in any part of the code.
When debugging (JTAG attached from the start) the watchdog is not enabled.
Because the beginning of the application code has already been written, SIMtrace starts the application (not the DFU bootloader) after the reset. That explains why the PC points to code outside the DFU bootloader. And since the application has not been written completely, it runs garbage, which leads to the stack being corrupted.
The watchdog reset can be confirmed by reading the reset register after the error occurs.
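A minimal sketch of decoding the reset cause from a raw status value follows. It assumes a SAM3S-style reset controller status register, where the reset type is a 3-bit field in bits 10:8 and the value 2 means watchdog reset; the ticket does not name the register layout, so check these constants against the datasheet. The helper names reset_type_from_status() and reset_type_name() are hypothetical and are not taken from the SIMtrace code.

```c
#include <stdint.h>

/* ASSUMPTION: SAM3S-style RSTC_SR layout, RSTTYP in bits 10:8. */
#define RSTC_SR_RSTTYP_SHIFT 8u
#define RSTC_SR_RSTTYP_MASK  (0x7u << RSTC_SR_RSTTYP_SHIFT)

/* ASSUMPTION: reset type encoding of a SAM3S-style reset controller. */
enum reset_type {
    RESET_GENERAL  = 0, /* first power-up */
    RESET_BACKUP   = 1, /* return from backup mode */
    RESET_WATCHDOG = 2, /* watchdog fault */
    RESET_SOFTWARE = 3, /* reset requested by software */
    RESET_USER     = 4  /* NRST pin pulled low */
};

/* Hypothetical helper: extract the reset type from a raw status value. */
unsigned reset_type_from_status(uint32_t rstc_sr)
{
    return (rstc_sr & RSTC_SR_RSTTYP_MASK) >> RSTC_SR_RSTTYP_SHIFT;
}

/* Hypothetical helper: printable name for a debug message. */
const char *reset_type_name(unsigned type)
{
    switch (type) {
    case RESET_GENERAL:  return "general";
    case RESET_BACKUP:   return "backup";
    case RESET_WATCHDOG: return "watchdog";
    case RESET_SOFTWARE: return "software";
    case RESET_USER:     return "user";
    default:             return "unknown";
    }
}
```

On the target, the argument would be the value read from the reset controller status register early at boot (or through JTAG after the failure). Under the assumed layout, a raw value of 0x00000200 decodes to "watchdog".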
To prevent the watchdog from triggering, I increased the timeout from 500 ms to 2 seconds, and the watchdog is now restarted before each chunk is flashed (https://git.osmocom.org/simtrace2/commit/?h=kredon/simtrace&id=6b38297e20eb31d837024926033f0abc20ddea66).
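The idea behind the fix can be sketched as follows. This is not the code from the linked commit: wdt_ms_to_ticks(), wdt_restart(), flash_write_chunk() and flash_image() are hypothetical names, and the two hardware functions are host-side stand-ins that only count calls. The tick conversion assumes a 12-bit watchdog counter clocked at 256 Hz (a 32768 Hz slow clock divided by 128), which is an assumption about the hardware and should be checked against the datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 12-bit watchdog counter running at 32768 Hz / 128 = 256 Hz. */
#define WDT_TICKS_PER_SEC 256u
#define WDT_MAX_TICKS     0xFFFu

/* Convert a timeout in milliseconds to watchdog counter ticks (clamped). */
uint32_t wdt_ms_to_ticks(uint32_t ms)
{
    uint32_t ticks = (ms * WDT_TICKS_PER_SEC) / 1000u;
    return (ticks > WDT_MAX_TICKS) ? WDT_MAX_TICKS : ticks;
}

/* Host-side stand-ins for the hardware accesses; they only count calls. */
unsigned wdt_restart_count;
unsigned chunks_written;

void wdt_restart(void)
{
    wdt_restart_count++; /* on target: write the watchdog restart command */
}

int flash_write_chunk(uint32_t offset, const uint8_t *data, size_t len)
{
    (void)offset; (void)data; (void)len;
    chunks_written++;    /* on target: program the flash page(s) */
    return 0;
}

/* Restart the watchdog before every chunk, so the timeout only has to
 * cover one chunk even if the main loop is starved by interrupts. */
int flash_image(const uint8_t *image, size_t total, size_t chunk_len)
{
    if (chunk_len == 0)
        return -1;
    for (size_t off = 0; off < total; off += chunk_len) {
        size_t len = (total - off < chunk_len) ? (total - off) : chunk_len;
        wdt_restart();
        int rc = flash_write_chunk((uint32_t)off, image + off, len);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

Under the assumed 256 Hz counter clock, the old 500 ms timeout corresponds to 128 ticks and the new 2 second timeout to 512 ticks. With the 512-byte chunks seen in the log above, a 1300-byte image would restart the watchdog three times.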