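We now look at the minimal JOS kernel in more detail. Like the boot loader, the kernel begins with some assembly code that sets things up so that C code can execute.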
1. Using virtual memory to work around position dependence
In Part 2 we saw that the kernel's load address (LMA) and link address (VMA) are very far apart:
obj/kern/kernel: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000019e9 f0100000 00100000 00001000 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
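Operating system kernels are usually linked and run at a very high virtual address, such as 0xf0100000, leaving the lower part of the virtual address space for user programs.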
In fact, in the next lab, we will map the entire bottom 256MB of the PC's physical address space, from physical addresses 0x00000000 through 0x0fffffff, to virtual addresses 0xf0000000 through 0xffffffff respectively. You should now see why JOS can only use the first 256MB of physical memory.
For now, we map only the first 4MB of physical memory, using the hand-written, statically initialized page directory and page table in kern/entrypgdir.c.
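Once kern/entry.S sets the CR0_PG flag, memory references are treated as virtual addresses and translated to physical addresses by the paging hardware.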
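Virtual addresses [0xf0000000, 0xf0400000) => physical addresses [0x00000000, 0x00400000)
The same page table also maps virtual addresses [0x00000000, 0x00400000) onto themselves, which is why the kernel keeps running at its low address immediately after paging is turned on.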
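!> Any reference to an address outside the mapped region raises a hardware exception which, since we haven't set up interrupt handling yet, makes QEMU dump the machine state and exit.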
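The high mapping above is just a fixed offset, which can be sketched in a few lines of C. This is only an illustration of the arithmetic, not code from JOS: `KERNBASE` is assumed to be 0xf0000000 as in the range shown above, and `ENTRY_MAP_SIZE`, `entry_mapped` and `entry_va2pa` are hypothetical names introduced for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE       0xF0000000u  /* start of the high kernel window (assumed) */
#define ENTRY_MAP_SIZE 0x00400000u  /* 4MB; hypothetical name for this sketch */

/* Returns 1 if va lies in the high 4MB window that is mapped at boot. */
int entry_mapped(uint32_t va)
{
    return va >= KERNBASE && va - KERNBASE < ENTRY_MAP_SIZE;
}

/* Translate a mapped high virtual address to its physical address. */
uint32_t entry_va2pa(uint32_t va)
{
    assert(entry_mapped(va));   /* anything else would fault on real hardware */
    return va - KERNBASE;
}
```

For example, `entry_va2pa(0xf0100000)` gives 0x00100000, which matches the VMA and LMA of .text in the objdump output, while 0xf0400000 is already outside the window.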
Exe7
2. Formatted Printing to the Console
Exe8
3. The Stack
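In the final exercises of this lab, we explore in more detail how the C language uses the stack on the x86, and in the process write a useful new kernel monitor function that prints a backtrace of the stack: a list of the saved instruction pointer (IP) values from the nested call instructions that led to the current point of execution.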
Exe 9.
The x86 stack pointer (esp register) points to the lowest location on the stack that is currently in use.
The stack grows downward, from high addresses toward low addresses.
The ebp (base pointer) register, in contrast, is associated with the stack primarily by software convention.
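To make the ebp convention concrete, here is a small sketch that follows a chain of saved frame pointers through a simulated stack. It is not the JOS backtrace code: the stack is modeled as an array of 32-bit words, "ebp" values are array indices rather than real addresses, and `walk_frames` is a hypothetical name. The only convention it relies on is that the word at ebp holds the caller's ebp and the word just above it holds the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a chain of saved frame pointers in a simulated stack.
 * stack[] models memory as 32-bit words; "ebp" values are word indices,
 * and index 0 stands in for the terminating ebp == 0.
 * Layout per frame: stack[ebp] = caller's ebp, stack[ebp + 1] = return eip.
 * Stores up to max return addresses in eips[] and returns how many it found.
 */
size_t walk_frames(const uint32_t *stack, uint32_t ebp,
                   uint32_t *eips, size_t max)
{
    size_t n = 0;

    assert(stack != NULL && eips != NULL);
    while (ebp != 0 && n < max) {
        eips[n++] = stack[ebp + 1];  /* return address sits one word above */
        ebp = stack[ebp];            /* follow the link to the caller's frame */
    }
    return n;
}
```

The loop stops when it reaches a saved ebp of zero, which is the reason a kernel would clear ebp before entering C code.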
Other Questions
Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?
console.c exports cputchar.
printf.c calls cputchar from its putch function.
Explain the following from console.c:
|
|
The complete function is as follows:
|
|
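From how \r and \n are handled, we can infer that:
- CRT_COLS is the maximum number of characters per line
- crt_pos is the current cursor position
- CRT_SIZE is, presumably, the maximum number of characters the buffer can hold
After the copy, the cursor moves back by one full line, so this fragment scrolls the screen contents up by one line.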
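The scrolling behaviour described above can be tried out on an ordinary array. The sketch below is a stand-alone model, not the console.c source: it assumes a 25x80 text screen, one 16-bit cell per character with the attribute in the high byte, and `scroll_if_full` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CRT_ROWS 25                      /* assumed screen height */
#define CRT_COLS 80                      /* assumed screen width */
#define CRT_SIZE (CRT_ROWS * CRT_COLS)

/* Scroll the text buffer up by one line if the cursor has run past the end.
 * Returns the (possibly adjusted) cursor position. */
unsigned scroll_if_full(uint16_t *buf, unsigned pos)
{
    assert(buf != NULL);
    if (pos >= CRT_SIZE) {
        /* move rows 1..N-1 up onto rows 0..N-2 */
        memmove(buf, buf + CRT_COLS,
                (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
        /* blank the last row: attribute 0x07 in the high byte, space below */
        for (unsigned i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
            buf[i] = 0x0700 | ' ';
        pos -= CRT_COLS;                 /* cursor moves back one full line */
    }
    return pos;
}
```

Calling it with `pos == CRT_SIZE` drops the top row, blanks the bottom row, and returns the start of the last line; any smaller position is returned unchanged.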
For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC's calling convention on the x86. Trace the execution of the following code step-by-step:
|
|
In the call to cprintf(), to what does fmt point? To what does ap point?
fmt: the address of the format string
ap: the address of the first variadic argument
List (in order of execution) each call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf list the values of its two arguments.
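void cons_putc(int) // output a character to the console
vcprintf()
=> cons_putc(120) // 'x'
=> cons_putc(32) // ' '
=> va_arg(*ap, int) (before the call ap points to x, after it to y)
=> cons_putc() once for each digit of the value of x
=> cons_putc(44) // ','
=> cons_putc(32) // ' '
=> cons_putc(121) // 'y'
=> cons_putc(32) // ' '
=> va_arg(*ap, unsigned int) (before the call ap points to y, after it to z)
The remaining calls follow the same pattern and are omitted. After the last va_arg, ap points just past the final argument.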
Run the following code.
unsigned int i = 0x00646c72;
cprintf("H%x Wo%s", 57616, &i);
What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise. Here's an ASCII table that maps bytes to characters.
The output depends on that fact that the x86 is little-endian. If the x86 were instead big-endian what would you set i to in order to yield the same output? Would you need to change 57616 to a different value? Here's a description of little- and big-endian and a more whimsical description.
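%x takes an unsigned int and prints it in hexadecimal, so 57616 is printed as e110 (57616 = 0xe110).
Little-endian means the least significant byte is stored at the lowest address. The word 0x00646c72 splits into the 4 bytes 0x00 0x64 0x6c 0x72 (NUL d l r), which are stored in memory in the order 0x72 0x6c 0x64 0x00.
%s therefore prints rld and stops at the NUL byte, so the final output is He110 World.
On a big-endian machine, i would have to be 0x726c6400 to give the same output. 57616 would not need to change, because %x prints a value rather than its in-memory bytes.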
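The byte-order argument can be checked on any host with a short sketch. It lays the word out by hand in little-endian or big-endian order instead of relying on the host's own byte order, and uses the standard snprintf in place of cprintf. The helper names (`store_le32`, `store_be32`, `format_hello`, `same_output`) are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store v as bytes in little-endian order, followed by a NUL terminator. */
void store_le32(uint32_t v, unsigned char out[5])
{
    for (int k = 0; k < 4; k++)
        out[k] = (unsigned char)(v >> (8 * k));        /* low byte first */
    out[4] = '\0';
}

/* Store v as bytes in big-endian order, followed by a NUL terminator. */
void store_be32(uint32_t v, unsigned char out[5])
{
    for (int k = 0; k < 4; k++)
        out[k] = (unsigned char)(v >> (8 * (3 - k)));  /* high byte first */
    out[4] = '\0';
}

/* Format the message the way cprintf("H%x Wo%s", 57616, &i) would,
 * given the in-memory bytes of i. Returns the snprintf result. */
int format_hello(char *dst, size_t n, const unsigned char *bytes)
{
    assert(dst != NULL && bytes != NULL);
    return snprintf(dst, n, "H%x Wo%s", 57616u, (const char *)bytes);
}

/* Returns 1 if both byte layouts produce the same printed message. */
int same_output(const unsigned char *a, const unsigned char *b)
{
    char sa[32], sb[32];

    format_hello(sa, sizeof sa, a);
    format_hello(sb, sizeof sb, b);
    return strcmp(sa, sb) == 0;
}
```

Storing 0x00646c72 little-endian and 0x726c6400 big-endian yields the same bytes in memory, so both print the same message, whereas 0x00646c72 stored big-endian starts with a NUL byte and %s prints nothing.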
In the following code, what is going to be printed after y=? (note: the answer is not a specific value.) Why does this happen?
cprintf("x=%d y=%d", 3);
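Because of how variadic arguments are fetched, the second %d reads the 4 bytes on the stack just above the first variadic argument (3), so whatever value happens to be stored there is printed after y=.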
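Let's say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change cprintf or its interface so that it would still be possible to pass it a variable number of arguments?
If GCC pushed arguments from left to right, then:
- Option 1: the format string would have to be scanned from right to left, with the format specifiers written in reverse order
- Option 2: add a parameter to cprintf that gives the number of arguments, so that it can locate them on the stack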
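The idea behind option 2, an explicit count that tells the callee how many variadic arguments to fetch, can be illustrated with standard stdarg. This sketch runs under the normal calling convention and does not model the reversed push order itself; `sum_counted` is a hypothetical function used only for illustration.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments. The explicit count tells the callee how many
 * variadic arguments there are, instead of inferring it from a format string. */
int sum_counted(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int k = 0; k < count; k++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

With a count available, the callee knows how far the argument area extends regardless of where the format string ends up relative to the other arguments.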
Exe7. Observing the effect of paging
Use QEMU and GDB to trace into the JOS kernel and stop at the movl %eax, %cr0. Examine memory at 0x00100000 and at 0xf0100000. Now, single step over that instruction using the stepi GDB command. Again, examine memory at 0x00100000 and at 0xf0100000. Make sure you understand what just happened.
What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren't in place? Comment out the movl %eax, %cr0 in kern/entry.S, trace into it, and see if you were right.
Look at obj/kern/kernel.asm:
movl %eax, %cr0
f0100025: 0f 22 c0 mov %eax,%cr0
So before the mapping is enabled, this instruction sits at address 0x00100025. Using gdb:
(gdb) b *0x00100025
Breakpoint 1 at 0x100025
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x100025: mov %eax,%cr0
Breakpoint 1, 0x00100025 in ?? ()
(gdb) x/8x 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x2000b812 0x220f0011 0xc0200fd8
(gdb) x/8x 0xf0100000
0xf0100000 <_start+4026531828>: 0x00000000 0x00000000 0x00000000 0x00000000
0xf0100010 <entry+4>: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) si
=> 0x100028: mov $0xf010002f,%eax
0x00100028 in ?? ()
(gdb) x/8x 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x2000b812 0x220f0011 0xc0200fd8
(gdb) x/8x 0xf0100000
0xf0100000 <_start+4026531828>: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0xf0100010 <entry+4>: 0x34000004 0x2000b812 0x220f0011 0xc0200fd8
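As we can see, the movl %eax, %cr0 instruction turns on paging: afterwards, virtual address 0xf0100000 shows the same contents as physical address 0x00100000.
After commenting out this instruction in kern/entry.S, run make clean && make to rebuild, then trace again: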
+ symbol-file obj/kern/kernel
(gdb) b *0x00100025
Breakpoint 1 at 0x100025
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x100025: mov $0xf010002c,%eax
Breakpoint 1, 0x00100025 in ?? ()
(gdb) si
=> 0x10002a: jmp *%eax
0x0010002a in ?? ()
(gdb) si
=> 0xf010002c <relocated>: add %al,(%eax)
relocated () at kern/entry.S:74
74 movl $0x0,%ebp # nuke frame pointer
(gdb) si
Remote connection closed
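The error message is: qemu: fatal: Trying to execute code outside RAM or ROM at 0xf010002c
Without paging, the jmp *%eax lands at the high address 0xf010002c, where there is no memory, so the first instruction to fail is the one at relocated.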
Exe8. Printing octal numbers
We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment.
in lib/printfmt.c:
Look at how an unsigned integer (%u) is printed:
|
|
So printing octal (%o) should be:
|
|
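The reason %o can be handled the same way as %u is that only the base changes. The sketch below shows a generic base conversion in stand-alone C. It is not the code from lib/printfmt.c, and `format_uint` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Write the digits of num in the given base (2..16) into buf as a
 * NUL-terminated string. Returns the number of digits written. */
size_t format_uint(unsigned long long num, unsigned base,
                   char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[64];                     /* enough for 64 binary digits */
    size_t n = 0;

    assert(buf != NULL && size > 0 && base >= 2 && base <= 16);
    do {                              /* least significant digit first */
        tmp[n++] = digits[num % base];
        num /= base;
    } while (num != 0);

    size_t len = 0;
    while (n > 0 && len + 1 < size)   /* reverse into the caller's buffer */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}
```

With base 10 this gives the %u digits, with base 16 the %x digits, and with base 8 the octal digits that %o needs, for example 57616 becomes 160420.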
Exe9. Stack size
Exercise 9. Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?
in kern/entry.S
relocated:
# Clear the frame pointer register (EBP)
# so that once we get into debugging C code,
# stack backtraces will be terminated properly.
movl $0x0,%ebp # nuke frame pointer
# Set the stack pointer
movl $(bootstacktop),%esp
in obj/kern/kernel.asm
# Clear the frame pointer register (EBP)
# so that once we get into debugging C code,
# stack backtraces will be terminated properly.
movl $0x0,%ebp # nuke frame pointer
f010002f: bd 00 00 00 00 mov $0x0,%ebp
# Set the stack pointer
movl $(bootstacktop),%esp
f0100034: bc 00 00 11 f0 mov $0xf0110000,%esp
So the stack starts at 0xf0110000 (bootstacktop, the initial value of esp) and grows downward from there.
in kern/entry.S
.data
###################################################################
# boot stack
###################################################################
.p2align PGSHIFT # force page alignment
.globl bootstack
bootstack:
.space KSTKSIZE
.globl bootstacktop
bootstacktop:
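So the kernel reserves KSTKSIZE bytes for its stack with the .space directive in the .data section, and the stack pointer is initialized to the high end of that area (bootstacktop).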
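Given the top of the stack, the bottom of the reserved area follows by subtraction. The sketch below works this out under an assumption that is not shown in the listings here: that KSTKSIZE is 8 pages of 4096 bytes (check inc/memlayout.h for the actual value). `stack_bottom` and `esp_in_stack` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE   4096u          /* assumed x86 page size */
#define KSTKSIZE (8 * PGSIZE)   /* assumed value; check inc/memlayout.h */

/* Lowest address of the reserved stack area, given its top. */
uint32_t stack_bottom(uint32_t stacktop)
{
    assert(stacktop >= KSTKSIZE);
    return stacktop - KSTKSIZE;
}

/* Returns 1 if esp lies within the reserved area. The top itself counts,
 * since esp == stacktop means the stack is still empty. */
int esp_in_stack(uint32_t stacktop, uint32_t esp)
{
    return esp >= stack_bottom(stacktop) && esp <= stacktop;
}
```

Under that assumption, a top of 0xf0110000 puts bootstack at 0xf0108000, so the stack would occupy the 32KB just below bootstacktop.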
Exe10. back trace
Exercise 10. To become familiar with the C calling conventions on the x86, find the address of the test_backtrace function in obj/kern/kernel.asm, set a breakpoint there, and examine what happens each time it gets called after the kernel starts. How many 32-bit words does each recursive nesting level of test_backtrace push on the stack, and what are those words?
Note that, for this exercise to work properly, you should be using the patched version of QEMU available on the tools page or on Athena. Otherwise, you'll have to manually translate all breakpoint and memory addresses to linear addresses.
in obj/kern/kernel.asm
// Test the stack backtrace function (lab 1 only)
void
test_backtrace(int x)
{
f0100040: 55 push %ebp
f0100041: 89 e5 mov %esp,%ebp
f0100043: 56 push %esi
f0100044: 53 push %ebx
f0100045: e8 72 01 00 00 call f01001bc <__x86.get_pc_thunk.bx>
f010004a: 81 c3 be 12 01 00 add $0x112be,%ebx
f0100050: 8b 75 08 mov 0x8(%ebp),%esi
cprintf("entering test_backtrace %d\n", x);
f0100053: 83 ec 08 sub $0x8,%esp
f0100056: 56 push %esi
f0100057: 8d 83 f8 06 ff ff lea -0xf908(%ebx),%eax
f010005d: 50 push %eax
f010005e: e8 e6 09 00 00 call f0100a49 <cprintf>
if (x > 0)
f0100063: 83 c4 10 add $0x10,%esp
f0100066: 85 f6 test %esi,%esi
f0100068: 7f 2b jg f0100095 <test_backtrace+0x55>
test_backtrace(x-1);
else
mon_backtrace(0, 0, 0);
f010006a: 83 ec 04 sub $0x4,%esp
f010006d: 6a 00 push $0x0
f010006f: 6a 00 push $0x0
f0100071: 6a 00 push $0x0
f0100073: e8 0b 08 00 00 call f0100883 <mon_backtrace>
f0100078: 83 c4 10 add $0x10,%esp
cprintf("leaving test_backtrace %d\n", x);
f010007b: 83 ec 08 sub $0x8,%esp
f010007e: 56 push %esi
f010007f: 8d 83 14 07 ff ff lea -0xf8ec(%ebx),%eax
f0100085: 50 push %eax
f0100086: e8 be 09 00 00 call f0100a49 <cprintf>
}
f010008b: 83 c4 10 add $0x10,%esp
f010008e: 8d 65 f8 lea -0x8(%ebp),%esp
f0100091: 5b pop %ebx
f0100092: 5e pop %esi
f0100093: 5d pop %ebp
f0100094: c3 ret
test_backtrace(x-1);
f0100095: 83 ec 0c sub $0xc,%esp
f0100098: 8d 46 ff lea -0x1(%esi),%eax
f010009b: 50 push %eax
f010009c: e8 9f ff ff ff call f0100040 <test_backtrace>
f01000a1: 83 c4 10 add $0x10,%esp
f01000a4: eb d5 jmp f010007b <test_backtrace+0x3b>
using gdb
|
|
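We can see that the ebp values of two successive calls differ by 0x20, which is 32 bytes = 8 × 4 bytes, so each recursive nesting level of test_backtrace pushes eight 32-bit words. From the listing above, these are: the return address, the saved ebp, esi and ebx, three words of padding (sub $0xc,%esp), and the argument x-1 for the next call.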
Exe11.
in
|
|
output of gdb:
|
|
Exe12. Debug with STAB
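Print the function name, file name and line number for each eip in mon_backtrace(): add the line-number lookup code to debuginfo_eip() in kern/kdebug.c, then extend mon_backtrace to print the file name, function name and line number.
When the kernel is compiled, we can see the flags -Wall -Wno-format -Wno-unused -Werror -gstabs -m32. The -gstabs option adds debugging information to the executable.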
STAB (Symbol TABle)
The common stabs, stabn and stabd directives are defined as follows:
.stabs "string",type,0,desc,value
.stabn type,0,desc,value
.stabd type,0,desc
where string has the format
"name[:symbol_descriptor]
[type_number[=type_descriptor ...]]"
See inc/stab.h for the definition of the stab structure and the common stab types:
|
|
A simple example:
#include<stdio.h>
int main()
{
printf("hello world\n");
return 0;
}
int foo(int a) {
printf("function foo\n");
return 0;
}
Run the command:
gcc -Wno-format -Wno-unused -Werror -gstabs -m32 -S -o hello.s hello.c
Some of the stabs in hello.s:
.stabs "hello.c",100,0,2,.Ltext0
.text
.Ltext0:
...
.stabs "main:F(0,1)",36,0,0,main
main:
.stabn 68,0,3,.LM0-.LFBB1
.stabs "foo:F(0,1)",36,0,0,foo
.stabs "a:p(0,1)",160,0,0,8
foo:
.stabn 68,0,8,.LM4-.LFBB2
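The first .stabs describes the source file: the file name is hello.c, the type is N_SO (100), the trailing desc=2 indicates a C source file, and .Ltext0 is the start address of the file's code.
The .stabs for main describes the function main: it gives the function name, the symbol descriptor F (a global function), and the return type int. Its type is N_FUN (36), and the trailing main is the function's start address. The .stabs for foo is analogous.
The .stabn after main carries line-number information: 68 is the type N_SLINE, the following 0 is the other field and can be ignored, desc=3 is the line number of main in the source file, and value is the address of that source line, here given relative to the start of the function (.LM0-.LFBB1). The .stabn after foo is analogous. For more details, see the stabs documentation.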
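To show how N_SLINE entries turn an address into a line number, here is a simplified lookup. It is only a sketch of the idea: it uses a linear scan over a reduced, hypothetical `struct stab_entry` rather than the real stab structure from inc/stab.h, and `line_for_addr` is a made-up name. It picks the N_SLINE entry with the largest value that does not exceed the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_FUN   0x24   /* 36: function symbol */
#define N_SLINE 0x44   /* 68: text-segment line number */

/* Reduced stab entry for this sketch (hypothetical layout). */
struct stab_entry {
    uint8_t  type;     /* N_FUN, N_SLINE, ... */
    uint16_t desc;     /* line number for N_SLINE entries */
    uint32_t value;    /* address covered by the entry */
};

/* Return the source line whose N_SLINE entry covers addr, or -1 if none.
 * Chooses the N_SLINE entry with the greatest value <= addr. */
int line_for_addr(const struct stab_entry *tab, size_t n, uint32_t addr)
{
    int line = -1;
    uint32_t best = 0;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].type != N_SLINE || tab[i].value > addr)
            continue;                       /* wrong kind, or starts too late */
        if (line < 0 || tab[i].value >= best) {
            best = tab[i].value;
            line = tab[i].desc;
        }
    }
    return line;
}
```

A real implementation would first narrow the search to the file and function containing the address and would typically use a binary search. The linear scan keeps the matching rule easy to see.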
Add the following code at the commented location near the end of the debuginfo_eip function in kdebug.c:
|
|
Modify the backtrace function in monitor.c:
|
|
Result:
|
|
Reference:
https://github.com/shishujuan/mit6.828-2017/blob/master/docs/lab1-exercize.md