Skip to content

samd21/51: Hang issue when running tests on the board #900

@notro

Description

@notro

I'm unable to reproduce the problem in #894, but I was able to get another one.

I order to get the tests running at all I first had to do a:

storage.erase_filesystem()

Otherwise I got the same hung error mentioned further down.

I've run the tests 3 times in a row, no problem.

.../tests $ CPBOARD_EXEC_MODE=disk python3 run-tests --device=metro_m4_express --target=pyboard -d basics circuitpython
<snip>

413 tests performed (12423 individual testcases)
401 tests passed
24 tests skipped: array_micropython builtin_compile builtin_delattr builtin_pow3 builtin_pow3_intbig class_delattr_setattr class_descriptor class_new class_notimpl class_reverse_op class_store_class class_super_object exception_chain fun_name namedtuple1_cpython_compat nvm_not_present nvm_present object_dict object_new ordereddict1 ordereddict_eq parser struct_micropython subclass_classmethod
12 tests failed: async_await async_await2 async_def async_for async_for2 async_with async_with2 blinky enumerate errno1 struct1 struct2

Then I press the board reset button and restart run-tests and it just hangs.
After a while I get this in the log:

[ 2947.012329] INFO: task scsi_eh_0:69 blocked for more than 120 seconds.
[ 2947.012343]       Tainted: G         C      4.14.34-v7+ #1110
[ 2947.012348] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2947.012356] scsi_eh_0       D    0    69      2 0x00000000
[ 2947.012398] [<8079a0b8>] (__schedule) from [<8079a730>] (schedule+0x50/0xa8)
[ 2947.012415] [<8079a730>] (schedule) from [<8079aba8>] (schedule_preempt_disabled+0x18/0x1c)
[ 2947.012435] [<8079aba8>] (schedule_preempt_disabled) from [<8079c4b0>] (__mutex_lock.constprop.3+0x190/0x58c)
[ 2947.012453] [<8079c4b0>] (__mutex_lock.constprop.3) from [<8079c9c8>] (__mutex_lock_slowpath+0x1c/0x20)
[ 2947.012469] [<8079c9c8>] (__mutex_lock_slowpath) from [<8079ca28>] (mutex_lock+0x5c/0x60)
[ 2947.012487] [<8079ca28>] (mutex_lock) from [<805ea52c>] (device_reset+0x28/0x54)
[ 2947.012505] [<805ea52c>] (device_reset) from [<80571424>] (scsi_try_bus_device_reset+0x34/0x58)
[ 2947.012521] [<80571424>] (scsi_try_bus_device_reset) from [<80572be8>] (scsi_eh_ready_devs+0x25c/0x88c)
[ 2947.012536] [<80572be8>] (scsi_eh_ready_devs) from [<805741dc>] (scsi_error_handler+0x3a0/0x3a4)
[ 2947.012553] [<805741dc>] (scsi_error_handler) from [<8013d860>] (kthread+0x13c/0x16c)
[ 2947.012572] [<8013d860>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[ 2947.012584] INFO: task usb-storage:71 blocked for more than 120 seconds.
[ 2947.012590]       Tainted: G         C      4.14.34-v7+ #1110
[ 2947.012595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2947.012600] usb-storage     D    0    71      2 0x00000000
[ 2947.012622] [<8079a0b8>] (__schedule) from [<8079a730>] (schedule+0x50/0xa8)
[ 2947.012637] [<8079a730>] (schedule) from [<8079e314>] (schedule_timeout+0x1d0/0x3f4)
[ 2947.012653] [<8079e314>] (schedule_timeout) from [<8079b3b4>] (wait_for_common+0xc0/0x184)
[ 2947.012669] [<8079b3b4>] (wait_for_common) from [<8079b498>] (wait_for_completion+0x20/0x24)
[ 2947.012685] [<8079b498>] (wait_for_completion) from [<805b3314>] (usb_sg_wait+0x174/0x188)
[ 2947.012704] [<805b3314>] (usb_sg_wait) from [<805eaffc>] (usb_stor_bulk_transfer_sglist.part.2+0x88/0xec)
[ 2947.012723] [<805eaffc>] (usb_stor_bulk_transfer_sglist.part.2) from [<805eb0bc>] (usb_stor_bulk_srb+0x5c/0x64)
[ 2947.012740] [<805eb0bc>] (usb_stor_bulk_srb) from [<805eb1e8>] (usb_stor_Bulk_transport+0x124/0x370)
[ 2947.012758] [<805eb1e8>] (usb_stor_Bulk_transport) from [<805ebaac>] (usb_stor_invoke_transport+0x30/0x4bc)
[ 2947.012776] [<805ebaac>] (usb_stor_invoke_transport) from [<805ea7b0>] (usb_stor_transparent_scsi_command+0x18/0x1c)
[ 2947.012794] [<805ea7b0>] (usb_stor_transparent_scsi_command) from [<805ecf7c>] (usb_stor_control_thread+0x198/0x2a0)
[ 2947.012811] [<805ecf7c>] (usb_stor_control_thread) from [<8013d860>] (kthread+0x13c/0x16c)
[ 2947.012827] [<8013d860>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[ 2947.012931] INFO: task python3:1658 blocked for more than 120 seconds.
[ 2947.012938]       Tainted: G         C      4.14.34-v7+ #1110
[ 2947.012942] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2947.012947] python3         D    0  1658   1657 0x00000000
[ 2947.012973] [<8079a0b8>] (__schedule) from [<8079a730>] (schedule+0x50/0xa8)
[ 2947.012988] [<8079a730>] (schedule) from [<8014b0b4>] (io_schedule+0x20/0x40)
[ 2947.013004] [<8014b0b4>] (io_schedule) from [<8021c9b4>] (wait_on_page_bit+0x110/0x130)
[ 2947.013019] [<8021c9b4>] (wait_on_page_bit) from [<8021cab4>] (__filemap_fdatawait_range+0xe0/0x114)
[ 2947.013033] [<8021cab4>] (__filemap_fdatawait_range) from [<8021cb54>] (filemap_fdatawait_keep_errors+0x34/0x60)
[ 2947.013050] [<8021cb54>] (filemap_fdatawait_keep_errors) from [<802ba9ec>] (sync_inodes_sb+0x20c/0x24c)
[ 2947.013068] [<802ba9ec>] (sync_inodes_sb) from [<802c02c0>] (sync_inodes_one_sb+0x24/0x28)
[ 2947.013088] [<802c02c0>] (sync_inodes_one_sb) from [<8028ddec>] (iterate_supers+0xf0/0x150)
[ 2947.013106] [<8028ddec>] (iterate_supers) from [<802c0664>] (sys_sync+0x44/0xb4)
[ 2947.013122] [<802c0664>] (sys_sync) from [<80108060>] (ret_fast_syscall+0x0/0x28)

Unplugging the board gives:

[ 5006.724981] usb 1-1.4: USB disconnect, device number 6
[ 5006.734420] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 5006.734468] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00
[ 5006.734483] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 00 00 00 05 00 00 01 00
[ 5006.734494] print_req_error: I/O error, dev sda, sector 5
[ 5006.734514] Buffer I/O error on dev sda1, logical block 4, lost async page write
[ 5006.734578] sd 0:0:0:0: rejecting I/O to offline device
[ 5006.734592] sd 0:0:0:0: killing request
[ 5006.734613] sd 0:0:0:0: rejecting I/O to offline device
[ 5006.734628] sd 0:0:0:0: [sda] killing request
[ 5006.734645] sd 0:0:0:0: rejecting I/O to offline device
[ 5006.734660] sd 0:0:0:0: [sda] UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[ 5006.734673] sd 0:0:0:0: [sda] CDB: opcode=0x2a 2a 00 00 00 00 08 00 00 01 00
[ 5006.734694] print_req_error: I/O error, dev sda, sector 8
[ 5006.734703] Buffer I/O error on dev sda1, logical block 7, lost async page write
[ 5006.735030] usb-storage: Error in queuecommand_lck: us->srb = b6ea0ad0
[ 5006.740689] sd 0:0:0:0: rejecting I/O to offline device
[ 5006.740725] sd 0:0:0:0: rejecting I/O to offline device
[ 5006.854473] usb-storage: Error in queuecommand_lck: us->srb = b6ea0ad0
[ 5006.974454] usb-storage: Error in queuecommand_lck: us->srb = b6ea0ad0
[ 5007.094695] cdc_acm 1-1.4:1.0: ttyACM0: USB ACM device
[ 5007.304584] blk_partition_remap: fail for partition 1
[ 5007.304604] FAT-fs (sda1): unable to read boot sector to mark fs as dirty

Plug it in:

[ 5072.424635] usb 1-1.4: new full-speed USB device number 7 using dwc_otg
[ 5072.558439] usb 1-1.4: New USB device found, idVendor=239a, idProduct=8021
[ 5072.558454] usb 1-1.4: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[ 5072.558463] usb 1-1.4: Product: Metro M4 Express
[ 5072.558472] usb 1-1.4: Manufacturer: Adafruit Industries LLC
[ 5072.558480] usb 1-1.4: SerialNumber: 1475264D15E47535020202D453B280FF
[ 5072.561215] cdc_acm 1-1.4:1.0: ttyACM0: USB ACM device
[ 5072.563321] usb-storage 1-1.4:1.2: USB Mass Storage device detected
[ 5072.572012] scsi host0: usb-storage 1-1.4:1.2
[ 5072.578317] input: Adafruit Industries LLC Metro M4 Express as /devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4/1-1.4:1.3/0003:239A:8021.0004/input/input3
[ 5072.645921] hid-generic 0003:239A:8021.0004: input,hidraw0: USB HID v1.11 Keyboard [Adafruit Industries LLC Metro M4 Express] on usb-3f980000.usb-1.4/input3
[ 5073.615766] scsi 0:0:0:0: Direct-Access                                    PQ: 0 ANSI: 3
[ 5073.619857] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 5073.621912] scsi 0:0:0:1: Direct-Access                                    PQ: 0 ANSI: 3
[ 5073.622787] sd 0:0:0:1: Attached scsi generic sg1 type 0
[ 5073.623302] sd 0:0:0:0: [sda] 4089 512-byte logical blocks: (2.09 MB/2.00 MiB)
[ 5073.625166] sd 0:0:0:0: [sda] Write Protect is off
[ 5073.625184] sd 0:0:0:0: [sda] Mode Sense: 00 00 00 00
[ 5073.626508] sd 0:0:0:0: [sda] Asking for cache data failed
[ 5073.626523] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 5073.627389] sd 0:0:0:1: [sdb] Attached SCSI removable disk
[ 5073.644089]  sda: sda1
[ 5073.647533] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 5074.751345] FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.

Now I can run the tests again.

I've had many different hang situations when working on #893, but I haven't paid attention to the details. I first press reset, if that doesn't work I unplug the board, if that doesn't work I upload the firmware anew (not sure if this has ever helped), and if that doesn't work I use the flash eraser.
Most of the testing for #893 was done on a Feather M0 Express, and some on a Metro M4 Express.

I can't spend more time working on this, but I've made a bug report after request by @tannewt.
Just close this if it can't be reproduced.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions