-
Notifications
You must be signed in to change notification settings - Fork 5
/
index.html
executable file
·357 lines (283 loc) · 13.5 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
---
layout: default
title: StackTrack
---
<blah>
<div class="blurb">
<h1 style="display: inline">StackTrack</h1> — Call graph visualization and execution tracking
<div id="ToC_div"></div>
<h2 id="Introduction">Introduction</h2>
<p>This site presents a code flow visualization and execution tracking tool called StackTrack. An example code flow graph for an execution of the kexec_file_load linux system call is shown below.
<iframe id="demo_frame" src="html/demo.html?json=/json/sys_kexec_file_load_callees-trace.json&depth=4&y=8&x=200" scrolling="no" onclick="javascript:alert(1);"></iframe>
<p>
When browsing large codebases, it is often unclear how a certain function can be reached. StackTrack tries to solve this problem for the linux kernel by generating call graphs.
The graphs are represented as expandable trees.
Examples can be found <a href=#traced_syscalls>here</a></p>
</p>
<p>For the sake of brevity this page is lacking a lot of implementation details. Should you need more information regarding some of the details, please feel free to drop me an email.
</p>
<h2 id="Xrefs">Cross Reference Database, Graph Datastructure and Visualization</h2>
<p>
A function call cross reference database table was built by parsing the output of the <a href=http://www.gson.org/egypt/egypt.html>egypt</a> tool. This mysql database dump can be downloaded from the <a href="https://github.com/stacktrack/stacktrack.github.com/tree/master/database">github repository</a>. It consists of a single table with two columns: caller and callee.
</p>
<p>
The <a href="https://github.com/stacktrack/stacktrack.github.com/blob/master/src/Graph.py">Graph.py</a> python script creates a "Graph" datastructure based on this table. This graph consists of nodes and edges. The graph information can be exported in json format with the dump_node(s) method. The script can be invoked from the cli to retrieve a callgraph from the xrefs database in json format.
The command line options of this script are
<pre>
usage: Graph.py [-h] [-v VERBOSE] [-r] [-e] [-a] [-d DIRECTORY]
[nodes [nodes ...]]
positional arguments:
nodes
optional arguments:
-h, --help show this help message and exit
-v VERBOSE, --verbose VERBOSE
Logging level (10: debug, 20 : info, etc )
-r, --callers Export caller tree
-e, --callees Export caller tree
-a, --allnodes Dump all nodes
-d DIRECTORY, --directory DIRECTORY
Output directory
</pre>
The "nodes" cli arguments are function names.
The script is able to lookup codepaths in both caller and callee directions.
The following command for example generates a graph of all callees for the sys_kexec_file_load system call:
<pre>
$ python Graph.py -e sys_kexec_file_load
INFO:root:dumping callees of Node sys_kexec_file_load to /tmp/sys_kexec_file_load_callees.json
</pre>
The javascript renders <a href="/tree.html?json=/json/sys_kexec_file_load_callees.json&trace=/json/sys_kexec_file_load_callees-trace.json">this graph</a> using <a href="json/sys_kexec_file_load_callees.json">this json file</a> for the static call graph and <a href=json/sys_kexec_file_load_callees-trace.json>this file</a>for the dynamic call graph (execution trace) coloring</a> .
</p>
<p>
The json graph output is rendered using the <a>d3js</a> javascript visualization library.
</p>
<h2 id="Installation">Installation</h2>
<p>
The source code for this project is available in the <a href=https://github.com/stacktrack/stacktrack.github.com>github repository</a> . To test the execution tracking you will need a linux kernel with debug symbols and a <a href="https://www.kernel.org/doc/Documentation/gdb-kernel-deb>ugging.txt">gdb session</a>. gdb must by The graph generating code is written in python and uses a mysql server.
Following items are explained in the <a href="https://github.com/stacktrack/stacktrack.github.com/blob/master/src/README.md">Readme</a> file in the src directory from the repository:
<ul>
<li>Installing the database and running the Graph.py script to generate json call graphs.</li>
<li>Rendering the json graphs with javascript.</li>
<li>Generating Call traces.</li>
</ul>
</p>
<h2 id="Tracking">Tracking</h2>
<p>
To track execution we place breakpoints on the functions that are in the codepath of our interest. Every time a breakpoint is hit, we query the cross-reference database to retrieve all functions that can be called next (the callees) and we place breakpoints on these functions as well. This process allows us to track the program flow.
<p>
The tracking is implemented in the <a href="https://github.com/stacktrack/stacktrack.github.com/blob/master/src/breakpoint.py">breakpoint.py</a> python code which is called from within gdb.
This trace flow is also saved in a <a href="https://github.com/stacktrack/stacktrack.github.com/blob/master/src/Graph.py">Graph datastructure</a>.
</p>
<p>
<script>
function toggle_vis(id){
display = document.getElementById(id).style.display;
if(display == 'inline'){
document.getElementById(id).style.display = 'none';
}
else{
document.getElementById(id).style.display = 'inline';
}
}
</script>
<a href="javascript:toggle_vis('gdb_sample');"> Read more</a>
</p>
<div id="gdb_sample">
{% highlight bash session linenos=table%}
ubuntu pygdb # gdb -q vmlinux
Reading symbols from vmlinux...done.
(gdb) set pagination off
(gdb) target remote u:8864
Remote debugging using u:8864
native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
self.func_name = func_name
50 }
(gdb) python
>import breakpoint, importlib
>importlib.reload(breakpoint)
>breakpoint.KxrBreakpoint('sys_chdir')
>end
loaded
DEBUG:root:dumping Node hrtimer_cancel to /tmp/hrtimer_cancel.json
DEBUG:root:Ignoring Node hrtimer_cancel
DEBUG:root:dumping Node getname_flags to /tmp/getname_flags.json
DEBUG:root:Ignoring Node getname_flags
DEBUG:root:dumping Node __wake_up to /tmp/__wake_up.json
DEBUG:root:Ignoring Node __wake_up
...
{% endhighlight %}
The node(s) dumped contain the trace.
{% highlight python linenos=table %}
class STBreakpoint(gdb.Breakpoint):
>
loaded
Breakpoint 1 at 0xffffffff811f7dc0: file fs/open.c, line 422.
(gdb) c
Continuing.
Breakpoint 2 at 0xffffffff817c0990: file arch/x86/kernel/mcount_64.S, line 151.
Breakpoint 3 at 0xffffffff81209420: file fs/namei.c, line 2327.
Breakpoint 4 at 0xffffffff81203fd0: file fs/namei.c, line 459.
Breakpoint 5 at 0xffffffff81203560: path_put. (36 locations)
Breakpoint 6 at 0xffffffff8122d310: file fs/fs_struct.c, line 33.
Breakpoint 7 at 0xffffffff8107a090: file kernel/panic.c, line 493.
DEBUG:root:Creating Node sys_chdir 0ncel.json
DEBUG:root:Ignoring Node hrtimer_cancel
DEBUG:root:dumping Node getname_flags to /tmp/getname_flags.json
DEBUG:root:Ignoring Node getname_flags
DEBUG:root:dumping Node __wake_up to /tmp/__wake_up.json
DEBUG:root:Ignoring Node __wake_up
# node = self.node = self.graph.add_node(func_name)
# if not node in self.graph.nodes: self.graph.load_node(self.node)
self.parent = parent
STBreakpoint.bplist.add(func_name)
#print('ini %s'%str(func_name))
super(STBreakpoint, self).__init__(
func_name, gdb.BP_BREAKPOINT, internal=False
)
def _stop(self):
comm = get_current()['comm'].string()
if not comm.startswith('trinity'):
return
print(self.func_name)We
def stop(self):
comm = get_current()['comm'].string()
if not comm.startswith('trinity'):
DEBUG:root:Creating Node user_path_at_empty 1
Breakpoint 8 at 0xffffffff81208e60: file fs/namei.c, line 126.
Breakpoint 9 at 0xffffffff81209200: file fs/namei.c, line 2151.
DEBUG:root:Creating Node getname_flags 2
Breakpoint 10 at 0xffffffff811dad30: file mm/slub.c, line 2519.
Breakpoint 11 at 0xffffffff813ee620: file lib/strncpy_from_user.c, line 100.
Breakpoint 12 at 0xffffffff8111e430: file kernel/auditsc.c, line 1701.
Breakpoint 13 at 0xffffffff8111e490: file kernel/auditsc.c, line 1724.
Breakpoint 14 at 0xffffffff811da560: file mm/slub.c, line 2744.
Breakpoint 15 at 0xffffffff811da8e0: file mm/slub.c, line 2531.
...
{% endhighlight %}
{% highlight bash session linenos=table%}
(gdb) python
self.func_name = func_name
>breakpoint.KxrBreakpoint.graph.dump_nodes('/tmp')
>end
DEBUG:root:dumping Node fsnotify to /tmp/fsnotify.json
DEBUG:root:Ignoring Node fsnotify
DEBUG:root:dumping Node hrtimer_cancel to /tmp/hrtimer_cancel.json
DEBUG:root:Ignoring Node hrtimer_cancel
DEBUG:root:dumping Node getname_flags to /tmp/getname_flags.json
DEBUG:root:Ignoring Node getname_flags
DEBUG:root:dumping Node __wake_up to /tmp/__wake_up.json
DEBUG:root:Ignoring Node __wake_up
# node = self.node = self.graph.add_node(func_name)
# if not node in self.graph.nodes: self.graph.load_node(self.node)
self.parent = parent
STBreakpoint.bplist.add(func_name)
#print('ini %s'%str(func_name))
super(STBreakpoint, self).__init__(
func_name, gdb.BP_BREAKPOINT, internal=False
)
def _stop(self):
comm = get_current()['comm'].string()
if not comm.startswith('trinity'):
return
print(self.func_name)We
def stop(self):
comm = get_current()['comm'].string()
if not comm.startswith('trinity'):
return
</div>
...
{% endhighlight %}
The node(s) dumped contain the trace.
{% highlight python linenos=table %}
class STBreakpoint(gdb.Breakpoint):
>
bplist = set()
todelete = []
edges = []
graph = Graph()
def __init__(self, func_name, parent=None):
get_bt_start()
self.func_name = func_name
# node = self.node = self.graph.add_node(func_name)
# if not node in self.graph.nodes: self.graph.load_node(self.node)
self.parent = parent
STBreakpoint.bplist.add(func_name)
#print('ini %s'%str(func_name))
super(STBreakpoint, self).__init__(
func_name, gdb.BP_BREAKPOINT, internal=False
)
def _stop(self):
comm = get_current()['comm'].string()
if not comm.startswith('trinity'):
return
print(self.func_name)We
def stop(self):
comm = get_current()['comm'].string()
if not comm.startswith('trinity'):
return
</div>
if self.parent:
STBreakpoint.graph.add_edge(self.parent,self.func_name)
else:
get_bt_start()
for bp in STBreakpoint.todelete:
try:
if bp.func_name != self.func_name:
bp.delete()
except:
pass
for callee in get_callees(self.func_name):
if callee not in STBreakpoint.bplist:
STBreakpoint(callee,self.func_name)
STBreakpoint.todelete += [self]
{% endhighlight %}
<p>
<li>
<ul>Example with dynamic tracking of chdir("/")<a href=http://stacktrack.github.io/tree.html?json=sys_chdir.json>sys_chdir</a></ul>
</li>
</div>
<h2 id="traced_syscalls">Examples: Syscall Tracking</h2>
<p>Following syscall execution traces have been generated with the trinity syscall fuzzer:
<div id="syscall_list"></div>
<script src="js/traces.js"></script>
<script>
function print_syscall(syscall){
var link = " <span class=sc><a href=tree.html?json=/json/sys_" + syscall[0] + "_callees.json&trace=/json/" + syscall[1] + ">" + syscall[0] + "</a></span> ";
document.getElementById("syscall_list").innerHTML += link;
}
syscalls.forEach(print_syscall);
</script>
</p>
<h2 id="Future Work">Future Work</h2>
There are a lot of improvements to be made, for example
<!-- Time however ... :o -->
<ul>
<li>Source code integration</li>
<li>Argument tracking</li>
<li>Wider Coverage: lkm, android ...</li>
<li>Indirect Function Calls: Resolving indirect function calls is halting equivalent. I will try to approximate this using both semantic and dynamic analysis.</li>
<li>Dynamic Instrumentation:
In order to be able to provide maximum coverage, dynamic instrumentation has to be created to generate code to reach different parts of the code.</li>
<li>...</li>
</ul>
<script>
// Generate ToC
var newLine, el, title, link;
var ToC = "<nav role='navigation' class='table-of-contents'>" +
"<h2>Contents</h2>" +
"<ul id='ToC'>";
$("h2").each(function(){
el = $(this);
title = el.text();
link = "#" + el.attr("id");
newLine =
"<li>" +
"<a href='" + link + "'>" +
title +
"</a>" +
"</li>";
ToC += newLine;
});
ToC += "</ul>" + "</nav>";
//alert(ToC);
$("#ToC_div").html(ToC);
</script>