`python ../../test/stress_tests.py TaskTests.testGettingManyObjects` fails consistently on my branch (but not on master).
[interleaved binary garbage in the scheduler output omitted]
[WARN] (/Users/rkn/Workspace/ray/src/photon/photon_scheduler.c:134) Failed to give task to worker on fd -1940912352. The client may have hung up.
[WARN] (/Users/rkn/Workspace/ray/src/photon/photon_scheduler.c:134) Failed to give task to worker on fd -1941952336. The client may have hung up.
[WARN] (/Users/rkn/Workspace/ray/src/photon/photon_scheduler.c:134) Failed to give task to worker on fd 79083733. The client may have hung up.
photon_scheduler(21871,0x7fffc08ce3c0) malloc: *** error for object 0x7fef8c6001c8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6 (core dumped)
I've seen these strange file descriptor values (e.g., fd -1940912352) before, which is why I bring this up: a negative or enormous fd suggests the value is being read from memory that was freed or never initialized.
(lldb) bt
* thread #1: tid = 0x0000, 0x00007fffb7c0cdda libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
* frame #0: 0x00007fffb7c0cdda libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fffb7cf8787 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fffb7b72420 libsystem_c.dylib`abort + 129
frame #3: 0x00007fffb7c6cfb1 libsystem_malloc.dylib`szone_error + 626
frame #4: 0x00007fffb7c61c9e libsystem_malloc.dylib`tiny_malloc_from_free_list + 1148
frame #5: 0x00007fffb7c60522 libsystem_malloc.dylib`szone_malloc_should_clear + 400
frame #6: 0x00007fffb7c63d57 libsystem_malloc.dylib`szone_realloc + 2858
frame #7: 0x00007fffb7c631d5 libsystem_malloc.dylib`malloc_zone_realloc + 115
frame #8: 0x00007fffb7c630c4 libsystem_malloc.dylib`realloc + 256
frame #9: 0x0000000104b5df50 photon_scheduler`sdscatlen [inlined] sdsMakeRoomFor(s=<unavailable>, addlen=20) + 35 at sds.c:142 [opt]
frame #10: 0x0000000104b5df2d photon_scheduler`sdscatlen(s=<unavailable>, t=<unavailable>, len=20) + 45 at sds.c:241 [opt]
frame #11: 0x0000000104b5b9f7 photon_scheduler`redisvFormatCommand(target=<unavailable>, format=<unavailable>, ap=<unavailable>) + 871 at hiredis.c:264 [opt]
frame #12: 0x0000000104b61941 photon_scheduler`redisAsyncCommand [inlined] redisvAsyncCommand(format=<unavailable>, ap=<unavailable>) + 8 at async.c:654 [opt]
frame #13: 0x0000000104b61939 photon_scheduler`redisAsyncCommand(ac=0x00007fef8c600000, fn=(photon_scheduler`redis_task_table_add_task_callback at redis.c:768), privdata=0x000000000000018e, format=<unavailable>) + 153 at async.c:669 [opt]
frame #14: 0x0000000104b339d7 photon_scheduler`redis_task_table_add_task(callback_data=0x00007fef8da012a0) + 535 at redis.c:792
frame #15: 0x0000000104b36a11 photon_scheduler`init_table_callback(db_handle=0x00007fef8c4028b0, id=(id = "?xİ?`?fz?:???\x93???), label="task_table_add_task", data=0x00007fef8da011e0, retry=0x0000000104b6e948, done_callback=0x0000000000000000, retry_callback=(photon_scheduler`redis_task_table_add_task at redis.c:783), user_context=0x0000000000000000) + 1201 at table.c:52
frame #16: 0x0000000104b3923e photon_scheduler`task_table_add_task(db_handle=0x00007fef8c4028b0, task=0x00007fef8da011e0, retry=0x0000000104b6e948, done_callback=0x0000000000000000, user_context=0x0000000000000000) + 174 at task_table.c:20
frame #17: 0x0000000104b233fe photon_scheduler`give_task_to_global_scheduler(state=0x00007fef8c402810, algorithm_state=0x00007fef8c500390, spec=0x00007fef8d900000) + 654 at photon_algorithm.c:419
frame #18: 0x0000000104b23499 photon_scheduler`handle_task_submitted(state=0x00007fef8c402810, algorithm_state=0x00007fef8c500390, spec=0x00007fef8d900000) + 121 at photon_algorithm.c:438
frame #19: 0x0000000104b1dd70 photon_scheduler`process_message(loop=0x00007fef8c4027b0, client_sock=21, context=0x00007fef8c5010a0, events=1) + 288 at photon_scheduler.c:263
frame #20: 0x0000000104b3a4bb photon_scheduler`aeProcessEvents(eventLoop=0x00007fef8c4027b0, flags=3) + 539 at ae.c:412
frame #21: 0x0000000104b3ab6e photon_scheduler`aeMain(eventLoop=0x00007fef8c4027b0) + 94 at ae.c:455
frame #22: 0x0000000104b26d15 photon_scheduler`event_loop_run(loop=0x00007fef8c4027b0) + 21 at event_loop.c:56
frame #23: 0x0000000104b1eb10 photon_scheduler`start_server(node_ip_address="127.0.0.1", socket_name="/tmp/sched1", redis_addr="127.0.0.1", redis_port=6379, plasma_store_socket_name="/tmp/s1", plasma_manager_socket_name="/tmp/m1", plasma_manager_address="127.0.0.1:23894", global_scheduler_exists=true, start_worker_command=0x0000000000000000) + 608 at photon_scheduler.c:429
frame #24: 0x0000000104b1f3f2 photon_scheduler`main(argc=13, argv=0x00007fff5b0e39a0) + 2226 at photon_scheduler.c:520
frame #25: 0x00007fffb7ade255 libdyld.dylib`start + 1
So the abort is being triggered somewhere in hiredis (in the realloc from sdsMakeRoomFor, called via redisvFormatCommand). Note that malloc's checksum check only detects the corruption at the next allocation touching that block, so the actual write-after-free may have happened earlier, elsewhere.
I don't fully understand the problem here, and I'm unsure whether it was introduced in #245 or exposed by it.