Post-Mortem Analysis for Multiprocessing Program in C

Last updated on December 24, 2023

When a program crashes, the shell sometimes reports that a core was dumped:

make: *** [Makefile:33: run] Segmentation fault (core dumped)

This core dump file is extremely useful for post-mortem analysis.

Enable Core Dump

Here is how to enable core dumps on CentOS.

Set ulimit

If ulimit -c prints 0, core dumps are disabled. The following command removes the size limit:

ulimit -c unlimited

This only affects the current shell session, so it is better to add it to the user's .bashrc.

echo 'ulimit -c unlimited' >> ~/.bashrc
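The same limit can also be raised from inside the program, which helps when the shell configuration cannot be changed. The following is a minimal sketch, assuming a POSIX system; the function name enable_core_dumps is hypothetical. It only matches ulimit -c unlimited when the hard limit itself is unlimited, because an unprivileged process cannot raise its soft limit beyond the hard limit.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit of the calling
 * process to its hard limit. Returns 0 on success, -1 on failure (errno is
 * set by getrlimit or setrlimit). */
int enable_core_dumps(void)
{
    struct rlimit limit;

    if (getrlimit(RLIMIT_CORE, &limit) != 0) {
        return -1;
    }
    /* The soft limit may be raised up to, but not beyond, the hard limit. */
    limit.rlim_cur = limit.rlim_max;
    return setrlimit(RLIMIT_CORE, &limit);
}
```

Calling enable_core_dumps() early in main() would apply the limit to the process itself and to any children it starts afterwards.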

Configure the Dump File Location

On CentOS, the core dump destination is defined by /proc/sys/kernel/core_pattern. Modifying it requires root privileges.

echo 'dumps/core.%e.%t.%p' | sudo tee /proc/sys/kernel/core_pattern

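With this pattern, the dump file is written to the dumps directory under the crashing process's current working directory, with the filename core.%e.%t.%p. The dumps directory must already exist, because the kernel does not create it. The placeholders mean: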

  • %e: name of the executable
  • %t: timestamp of dumping, in seconds since the UNIX Epoch
  • %p: process ID of the task
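To see what filename a given pattern produces, the substitution can be sketched in C. This is an illustration only, not the kernel's implementation: the function name expand_core_pattern and its parameters are hypothetical, it handles only %e, %t, %p and %%, and it does not model kernel details such as truncation of long executable names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand %e, %t, %p and %% in pattern into out.
 * Returns 0 on success, -1 if out is too small or the pattern contains
 * a specifier this sketch does not cover. */
int expand_core_pattern(const char *pattern, const char *exe,
                        long timestamp, long pid,
                        char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0'; p++) {
        char piece[64];
        const char *text = piece;

        if (*p != '%') {
            /* Ordinary character: copy it through unchanged. */
            piece[0] = *p;
            piece[1] = '\0';
        } else {
            p++;
            switch (*p) {
            case 'e': text = exe; break;
            case 't': snprintf(piece, sizeof piece, "%ld", timestamp); break;
            case 'p': snprintf(piece, sizeof piece, "%ld", pid); break;
            case '%': piece[0] = '%'; piece[1] = '\0'; break;
            default:  return -1; /* unsupported specifier or trailing '%' */
            }
        }

        size_t len = strlen(text);
        if (used + len >= out_size) {
            return -1; /* not enough room for the piece and the terminator */
        }
        memcpy(out + used, text, len + 1);
        used += len;
    }
    return 0;
}
```

For example, expanding dumps/core.%e.%t.%p for a hypothetical executable myprog with timestamp 1701500000 and process ID 1234 gives dumps/core.myprog.1701500000.1234.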

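Inside a Docker container, /proc/sys/kernel/core_pattern is read-only by default. A container shares the kernel of its host, so the pattern has to be changed on the host instead. With Docker on Windows, which uses WSL2 as its backend, that means changing it inside WSL2. Like any value written under /proc/sys, the change is lost after a reboot.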

Analyze Core Dump

Now that core dumps are enabled, we can use gdb to examine the state of a crashed program at the moment it died.

gdb <executable> <core dump file>

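Here are some useful commands. Source-level output such as list and info locals requires the executable to be built with debug information (for example, gcc -g):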

  • bt: backtrace, show the stack trace
  • info locals: show the local variables of current stack frame
  • frame <frame id> or f <frame id>: switch to a specific stack frame which is shown in backtrace
  • list: show the source code of current stack frame
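To practice these commands, a small program with nested calls is enough. The functions below are a hypothetical example, not code from the original program: sum_values dereferences its pointer without a NULL check, so calling average(NULL, 3) would crash inside sum_values.

```c
#include <stddef.h>

/* Hypothetical helper: adds up count integers. It dereferences values
 * without checking for NULL, so a NULL argument causes a segmentation
 * fault inside this function. */
static long sum_values(const int *values, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Hypothetical caller: integer average of count values, 0 when count is 0. */
long average(const int *values, size_t count)
{
    long total = sum_values(values, count);

    return count ? total / (long)count : 0;
}
```

If a main() that calls average(NULL, 3) is built with gcc -g -O0 and run with core dumps enabled, bt on the resulting core should show sum_values as the innermost frame with average as its caller. From there, frame 1 followed by info locals inspects the caller's variables.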

POSIX Threads

  • info threads: show all threads’ information, current thread is marked with *
  • thread <thread id>: switch to a specific thread

For example:
(gdb) info threads
Id Target Id Frame
4 Thread 0x7f9f675be700 (LWP 9983) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 3 Thread 0x7f9f685c0700 (LWP 9981) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
2 Thread 0x7f9f68dc4740 (LWP 9980) 0x00007f9f68998017 in pthread_join (threadid=140322627389184, thread_return=0x0) at pthread_join.c:90
1 ...

Mutex

If a thread is stuck in __lll_lock_wait(), it is waiting for a mutex.

(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 ...

Use p (pthread_mutex_t) <mutex> to print a mutex’s value.

(gdb) p (pthread_mutex_t) my_mutex
$0 = {__data = {__lock = 2, __count = 0, __owner = 9982, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
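__next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\376&\000\000\001", '\000' <repeats 26 times>, __align = 2}

  • __lock: the lock word; in glibc, 0 means unlocked, 1 means locked with no waiters, and 2 means locked with possible waiters
  • __owner: the ID (LWP, as shown by info threads) of the thread that currently holds the mutex
  • __nusers: the number of threads currently using the mutex, that is, holding it; it is not the number of threads waiting to lock it
  • __kind: the type of the mutex, 0 stands for PTHREAD_MUTEX_NORMAL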
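The meaning of __owner and __nusers can be checked from inside a program. The sketch below assumes glibc on Linux and reads the private __data member directly, which is not a public API and may differ between glibc versions. The names mutex_snapshot and snapshot_locked_mutex are hypothetical.

```c
#include <pthread.h>

/* Copy of the glibc-internal fields of interest. */
struct mutex_snapshot {
    int owner;            /* value of __data.__owner while locked */
    unsigned int nusers;  /* value of __data.__nusers while locked */
};

/* Hypothetical helper: lock a default mutex on the calling thread, copy
 * the internal fields while the lock is held, then unlock. Relies on
 * glibc's private layout of pthread_mutex_t, so it is not portable. */
struct mutex_snapshot snapshot_locked_mutex(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct mutex_snapshot s;

    pthread_mutex_lock(&m);
    s.owner = m.__data.__owner;
    s.nusers = m.__data.__nusers;
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);

    return s;
}
```

When this is called from the main thread, whose thread ID equals the process ID, the snapshot is expected to contain owner equal to getpid() and nusers equal to 1, even though no other thread is waiting for the mutex.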

Condition Variable

If a thread is stuck in pthread_cond_wait(), it is waiting for a condition variable.

(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 ...

Similarly, use p (pthread_cond_t) <cond> to print a condition variable’s value.

(gdb) p (pthread_cond_t) my_cond1
$20 = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
__broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}
(gdb) p (pthread_cond_t) my_cond2
$21 = {__data = {__lock = 0, __futex = 1, __total_seq = 1, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x603160 <my_mutex2>,
__nwaiters = 2, __broadcast_seq = 0},
__size = "\000\000\000\000\001\000\000\000\001", '\000' <repeats 23 times>, "`1`\000\000\000\000\000\002\000\000\000\000\000\000",
__align = 4294967296}
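For context, a thread shows up in pthread_cond_wait() when it blocks the way the waiter below does. This is a minimal sketch with hypothetical names (demo_lock, demo_cond, demo_ready, run_wait_demo), not the program that produced the output above. It completes normally because the signalling side runs.

```c
#include <pthread.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cond = PTHREAD_COND_INITIALIZER;
static int demo_ready = 0;

/* Waiter thread: blocks in pthread_cond_wait() until demo_ready is set. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    while (!demo_ready) {
        /* Releases demo_lock while blocked, reacquires it before returning. */
        pthread_cond_wait(&demo_cond, &demo_lock);
    }
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Starts the waiter, sets the flag, signals, and joins.
 * Returns 0 on success, -1 if a thread call fails. */
int run_wait_demo(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, waiter, NULL) != 0) {
        return -1;
    }

    pthread_mutex_lock(&demo_lock);
    demo_ready = 1;
    pthread_cond_signal(&demo_cond);
    pthread_mutex_unlock(&demo_lock);

    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

If the code that sets demo_ready and signals were left out, the waiter would stay in pthread_cond_wait() and the caller would stay in pthread_join(), which is the kind of state the info threads listing earlier shows.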

Analyze Deadlocked Program

Although this post is mainly about analyzing core dumps, here is a short tip for when a running program is deadlocked.

In the steps below, replace <pid> with the process ID of the program, which can be found with the top command.

  1. Use gdb --pid <pid> to attach to the program and debug as usual.
  2. Save the current state of the program as a core dump file.
    1. Inside gdb, use gcore or gcore <name>.
    2. From the shell, use gcore <pid> or gcore -o <name> <pid>. With -o, the file is written as <name>.<pid>.

https://lingkang.dev/2023/12/02/core-dump/
Author: Lingkang
Posted on: December 2, 2023