OSC Lab Task 5 Reasoning

Last updated on October 31, 2023

This post is my analysis of OSC lab task 5. The full task sheet can be found here. Check the appendix for the source code and the output CSV files.

Task 5

Running the task 5 source code, task5.c, and then the visualization script, vis.py, produces a plot of how the processes were scheduled.

The following results are based on an experiment duration of 1 second and 8 processes. Output files are named <pid>.tmp.csv; run the Python script as python vis.py <min_pid> <max_pid> to get the visualization.

#define NUMBER_OF_PROCESSES 8
#define MAX_EXPERIMENT_DURATION 1

Note that writing to stdout, with or without redirection, may garble the output format, so each process writes to its own file to keep the output consistent.
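Since every process writes its own <pid>.tmp.csv, the two arguments for vis.py can be read off the file names instead of being looked up by hand. The following is a minimal sketch of that idea; the helper name pid_range is hypothetical and is not part of the original scripts.

```python
from pathlib import Path


def pid_range(directory="."):
    """Return (min_pid, max_pid) over all <pid>.tmp.csv files, or None."""
    pids = sorted(
        int(path.name.split(".")[0])
        for path in Path(directory).glob("*.tmp.csv")
        # Skip files whose prefix is not a plain number.
        if path.name.split(".")[0].isdigit()
    )
    return (pids[0], pids[-1]) if pids else None


# Inspect the current directory; prints None if no output files exist yet.
print(pid_range("."))
```

The returned pair can then be passed to the script as python vis.py <min_pid> <max_pid>. Note that vis.py opens every pid in that range, so this only helps when the pids are consecutive.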

First view

We can clearly see that even within one second, every process is scheduled so often and so rapidly that the plotted dots look like continuous lines.

Zoom in

Zooming in, we find that these lines are actually made up of dashes of varying length. This confirms that, while the experiment runs, the processes are scheduled in a time-sharing manner.
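The dashes can also be recovered numerically rather than by eye. The sketch below groups one process's timestamps into runs, starting a new run whenever two consecutive samples are further apart than a threshold. The function name run_segments and the threshold max_gap are assumptions made for illustration, and the sample timestamps are made up rather than taken from the real CSV files.

```python
def run_segments(times, max_gap=2):
    """Group sorted microsecond timestamps into (start, end) runs.

    A new run starts whenever two consecutive samples are more than
    max_gap microseconds apart (max_gap is an assumed threshold).
    """
    segments = []
    start = prev = None
    for t in times:
        if start is None:
            start = prev = t
        elif t - prev > max_gap:
            # The gap is too large: close the current run, open a new one.
            segments.append((start, prev))
            start = prev = t
        else:
            prev = t
    if start is not None:
        segments.append((start, prev))
    return segments


# Made-up timestamps for a single process.
sample = [10, 10, 11, 12, 40, 41, 41, 43, 90]
print(run_segments(sample))  # [(10, 12), (40, 43), (90, 90)]
```

Each tuple corresponds to one dash in the zoomed-in plot; a suitable max_gap depends on how densely the process manages to log while it is running.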

Zoom in further

However, things are not as simple as we thought. Zooming in further, we see that some dashes overlap in time. This is not what we would expect if only one process could run at a time. The explanation is that the machine has more than one CPU core or hardware thread. In fact, looking closely at the following figure, we can see that at time 372079 the running process changed from pid 258 to pid 260, while process 259 kept running throughout the context switch.

Closer to see dots and overlaps
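An overlap like this can be checked mechanically: two closed intervals intersect when each one starts before the other ends. The sketch below applies that test to every pair of processes. The function name overlapping_pairs is hypothetical. The interval for pid 259 and the switch at 372079 come from this post, while the start of the 258 run and the end of the 260 run are made-up placeholders.

```python
from itertools import combinations


def overlapping_pairs(segments_by_pid):
    """Return sorted (pid_a, pid_b) pairs whose run intervals intersect."""
    found = set()
    for (pid_a, segs_a), (pid_b, segs_b) in combinations(
        sorted(segments_by_pid.items()), 2
    ):
        for start_a, end_a in segs_a:
            for start_b, end_b in segs_b:
                # Closed intervals intersect iff each starts before the other ends.
                if start_a <= end_b and start_b <= end_a:
                    found.add((pid_a, pid_b))
    return sorted(found)


segments = {
    258: [(372070, 372078)],  # start time is a placeholder
    259: [(372036, 372097)],  # values quoted in the post
    260: [(372079, 372090)],  # end time is a placeholder
}
print(overlapping_pairs(segments))  # [(258, 259), (259, 260)]
```

Any non-empty result means at least two processes were running at the same moment, which is only possible on more than one logical CPU. On the machine that ran the experiment, os.cpu_count() reports how many logical CPUs are available.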

For example, process 259 was scheduled at time 372036 and ran until time 372097, at least 61 microseconds in total. Then it was preempted by process 254 at time 372101. A context-switch cost of about 4 microseconds is reasonable.
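The arithmetic behind these two figures is spelled out below, using only the timestamps quoted above. The variable names are illustrative. The gap is measured between the last sample of one process and the first sample of the next, so it is better read as a rough upper bound on the switch cost at microsecond resolution than as an exact measurement.

```python
# Timestamps in microseconds, as quoted in the text.
run_start = 372036   # first sample of pid 259 in this run
run_end = 372097     # last sample of pid 259 in this run
next_start = 372101  # first sample of pid 254 afterwards

run_length = next_gap = None
run_length = run_end - run_start   # how long pid 259 kept the CPU
switch_gap = next_start - run_end  # time with no sample from either process

print(run_length, switch_gap)  # 61 4
```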

From time 372075 to 372082

Even though multiple lines are printed every microsecond, they share the same timestamp and are therefore plotted as a single dot. (A zip archive of the CSV files used for this analysis, covering pids 251 to 260, can be found in the appendix under CSV.) Put simply, a microsecond is still too coarse a granularity for this experiment; in other words, modern CPUs are too fast to be measured at this resolution.
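How many printed lines collapse into one dot can be counted directly from the timestamp column. The following is a small sketch using collections.Counter; the sample values are made up for illustration and are not taken from the real CSV files.

```python
from collections import Counter

# Made-up timestamps for one process; repeated values are lines that
# were printed within the same microsecond.
sample = [372075, 372075, 372075, 372076, 372076, 372077]

counts = Counter(sample)
print(len(sample), len(counts))  # 6 3
print(counts.most_common(1))     # [(372075, 3)]
```

Here six printed lines would show up as only three dots. The larger the ratio between the two numbers on real data, the more detail the microsecond timestamp is hiding.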

Appendix

C

The following C program is used to generate the result of task 5.

/*
Some simple results with explanation are at ./README.md
*/

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

#define NUMBER_OF_PROCESSES 8
#define MAX_EXPERIMENT_DURATION 1

long int getDifferenceInMicroSeconds(struct timeval start, struct timeval end)
{
    long int seconds = end.tv_sec - start.tv_sec;
    long int useconds = end.tv_usec - start.tv_usec;
    return seconds * 1000000 + useconds;
}

long int getDifferenceInMilliSeconds(struct timeval start, struct timeval end)
{
    return getDifferenceInMicroSeconds(start, end) / 1000;
}

bool exceedDuration(struct timeval start, struct timeval end)
{
    return getDifferenceInMilliSeconds(start, end) > MAX_EXPERIMENT_DURATION * 1000;
}

int main(void)
{
    // get the base time
    struct timeval startTime;
    gettimeofday(&startTime, NULL);

    struct timeval currentTime;
    pid_t pid;
    // the counter must start at 0; each child leaves the loop right after fork()
    for (int i = 0; i < NUMBER_OF_PROCESSES; i++)
    {
        pid = fork();
        if (pid < 0)
        {
            perror("fork error");
            exit(EXIT_FAILURE);
        }
        else if (pid == 0)
        {
            break;
        }
    }
    /*
    IO may be disturbed by context switching, especially with stdout redirecting.
    Save the output to their own output file respectively, named with their pids.
    The parent falls through the loop too, so it logs alongside its children.
    */
    char file_name[32]; // large enough for any pid plus ".tmp.csv"
    snprintf(file_name, sizeof(file_name), "%d.tmp.csv", (int)getpid());
    FILE *f_ptr = fopen(file_name, "w");
    if (f_ptr == NULL)
    {
        perror("fopen error");
        exit(EXIT_FAILURE);
    }
    while (true)
    {
        gettimeofday(&currentTime, NULL);
        fprintf(
            f_ptr, "%ld, %d\n",
            getDifferenceInMicroSeconds(startTime, currentTime),
            (int)getpid());
        if (exceedDuration(startTime, currentTime))
        {
            fclose(f_ptr);
            exit(0);
        }
    }
}

// gcc task5.c -o a.out -Wall
// ./a.out

Python

The following Python script is used to visualize the result of task 5.

"""
Handle the visualization of the process scheduling in task 5
"""

import sys

from matplotlib import pyplot as plt
import pandas as pd

if __name__ == "__main__":
    argc = len(sys.argv)
    # argv[1] == <min_pid>
    # argv[2] == <max_pid>
    assert argc == 3, "Usage: python vis.py <min_pid> <max_pid>"

    min_pid = int(sys.argv[1])
    max_pid = int(sys.argv[2])

    # Read one CSV per pid and concatenate them once at the end.
    frames = []
    for pid in range(min_pid, max_pid + 1):
        with open(str(pid) + ".tmp.csv", "r", encoding="utf-8") as f:
            frames.append(
                pd.read_csv(
                    f,
                    header=None,
                    names=["time", "process"],
                )
            )
    df = pd.concat(frames, ignore_index=True)

    plt.scatter(df["time"].values, df["process"].values, s=1)
    plt.xlabel("time")
    plt.ylabel("process id")
    plt.ylim(min_pid - 1, max_pid + 1)
    plt.xlim(df["time"].min(), df["time"].max())
    plt.show()
    sys.exit(0)

CSV

CSV files used for analysis


OSC Lab Task 5 Reasoning
https://lingkang.dev/2023/10/31/osc-lab-task5/
Author
Lingkang
Posted on
October 31, 2023
Licensed under