Linux boot: entering the C environment (arch/x86/kernel/head_64.S)

The kernel's basic layout of the virtual address space:

@see Documentation/x86/x86_64/mm.txt

ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
init_level4_pgt(272)->level3_ident_pgt(0)->level2_ident_pgt(512*2M=1G)

ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
init_level4_pgt(511)->level3_kernel_pgt(510)->level2_kernel_pgt(256*2M=512M)

@see arch/x86/include/asm/page_64_types.h

#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
#define __PHYSICAL_START        CONFIG_PHYSICAL_START // 0x1000000 = 16M

#define __START_KERNEL_map  _AC(0xffffffff80000000, UL)
#define __START_KERNEL      (__START_KERNEL_map + __PHYSICAL_START) // 0xffffffff81000000

// Kernel image size is limited to 512 MB
#define KERNEL_IMAGE_SIZE   (512 * 1024 * 1024)
#define KERNEL_IMAGE_START  _AC(0xffffffff80000000, UL)

pagetable

startup_64

The page tables hard-coded in head_64.S map the 1 GB starting at 0xffff880000000000 (level2_ident_pgt) and the 512 MB from 0xffffffff80000000 to 0xffffffffa0000000 (level2_kernel_pgt). The hierarchy is:
init_level4_pgt -> level3_ident_pgt -> level2_ident_pgt
init_level4_pgt -> level3_kernel_pgt -> level2_kernel_pgt
Other tables defined alongside them: level2_fixmap_pgt, level1_fixmap_pgt, level2_spare_pgt
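At build time vmlinux is assumed to be loaded at physical address 16M (CONFIG_PHYSICAL_START); if it is loaded anywhere else, the page tables have to be fixed up.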
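Because the hard-coded tables map only the first 1 GB, a vmlinux loaded above 1 GB needs an extra mapping. First we have to know which region to map, which just means computing its pud_index. level2_spare_pgt is then hooked into level3_ident_pgt at that index, and the 2 MB page at the start of the load address is mapped into level2_spare_pgt at its pmd_index. (So that is what the "spare" in level2_spare_pgt means.)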
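Next, at ident_complete, level2_kernel_pgt is fixed up. Only its first 256 entries are used and the last 256 are empty, so the fixup tests the lowest bit (bit 0, the Present bit) of each 8-byte entry: if it is 1, the entry is adjusted by the load offset; otherwise it is skipped.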
Once level2_kernel_pgt is done, phys_base and trampoline_level4_pgt are fixed up as well. I am not yet sure what the trampoline is for.

secondary_startup_64

PAE was already enabled in startup_32; here PGE is enabled too, and then init_level4_pgt is loaded into CR3. From now on the new page tables are in use, and the G bit in the level2_kernel_pgt and level2_ident_pgt entries takes effect.
The entry flags are 0x1e3 = 0001 1110 0011b: P, RW, A, D, PS and G (bit 8) are set, so G = 1.

Some notes on PGE

The processor invalidates the TLB whenever CR3 is loaded either explicitly or implicitly. After the TLB is invalidated, subsequent address references can consume many clock cycles until their translations are cached as new entries in the TLB. Invalidation of TLB entries for frequently-used or critical pages can be avoided by specifying the translations for those pages as global. TLB entries for global pages are not invalidated as a result of a CR3 load. Global pages are invalidated using the INVLPG instruction.
Global-page extensions are controlled by setting and clearing the PGE bit in CR4 (bit 7). When CR4.PGE is set to 1, global-page extensions are enabled. When CR4.PGE is cleared to 0, global-page extensions are disabled. When CR4.PGE=1, setting the global (G) bit in the translation-table entry marks the page as global.
The INVLPG instruction ignores the G bit and can be used to invalidate individual global-page entries in the TLB. To invalidate all entries, including global-page entries, disable global-page extensions (CR4.PGE=0).
Next, System Call (SCE) and No-Execute (NXE, if the CPU supports it) are enabled in the EFER MSR.

About SCE

System-Call Extension (SCE) Bit. Setting this bit to 1 enables the SYSCALL and SYSRET instructions. Application software can use these instructions for low-latency system calls and returns in a non-segmented (flat) address space.

About NXE: bit 63 is 0 in every entry of the page tables above, so code anywhere in the mapped memory is executable.

No Execute (NX) Bit. Bit 63. This bit is present in the translation-table entries defined for PAE paging.
This bit controls the ability to execute code from all physical pages mapped by the table entry. For example, a page-map level-4 NX bit controls the ability to execute code from all 128M (512 × 512 × 512) physical pages it maps through the lower-level translation tables. When the NX bit is cleared to 0, code can be executed from the mapped physical pages. When the NX bit is set to 1, code cannot be executed from the mapped physical pages.
Nothing so far has used the stack, but a stack must be set up all the same:
movq stack_start(%rip), %rsp

ENTRY(stack_start)
    .quad  init_thread_union+THREAD_SIZE-8
    .word  0
So %rsp points at the top of the stack in init_thread_union (init_thread_union + THREAD_SIZE - 8). More on this later.

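Next, a new GDT is loaded. The GDT in use until now was loaded in startup_32; the source comment does not make it very clear why it has to be replaced here.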
The new GDT is defined in arch/x86/kernel/cpu/common.c#0138, and evidently there is one per CPU.

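With the GDT ready, ds, ss and es are reset. In 64-bit mode these three segment registers are essentially unused (their bases are treated as 0). fs and gs do still matter, but I don't understand them well yet, so I'll skip them.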

Finally, lret is used once again to jump to the C entry point x86_64_start_kernel(char *real_mode_data), loading the correct CS along the way (we have just loaded a new GDT).

OK, at this point we have entered the C environment. From here on the kernel code is C, which is much easier to read.