
Add support for qemu virtual machines using KVM #423

Open
IkerGalardi wants to merge 4 commits into seL4:main from IkerGalardi:kvm-support

IkerGalardi commented Feb 26, 2026

This PR adds support for running Microkit-based operating systems on QEMU with KVM enabled.

The main issue with the QEMU platform with KVM enabled is that it drops the kernel into EL1 instead of the expected EL2. This PR therefore builds a custom kernel for this platform with hypervisor support disabled.

The loader seems to work fine, but jumping to the kernel causes an instruction abort exception. Here is the log:

LDR|ERROR: loader trapped exception: Synchronous (Current Exception level with SP_ELx)
    esr_el1: 0x0000000086000004
    ec: 0x00000021 (Instruction Abort taken without a change in Exception level)
    il: 0x0000000000000001
    iss: 0x0000000000000004
    far: 0xffffff8040000000
    reg: 0x00000000: 0x0000000040041000
    reg: 0x00000001: 0x0000000000211274
    reg: 0x00000002: 0x0000000000000000
    reg: 0x00000003: 0x0000000000000000
    reg: 0x00000004: 0xffffff8040000000
    reg: 0x00000005: 0xffffffffffffffff
    reg: 0x00000006: 0x0000000000c5183d
    reg: 0x00000007: 0x0000000000001124
    reg: 0x00000008: 0x0000001485100510
    reg: 0x00000009: 0x0000000000000000
    reg: 0x0000000a: 0x0000000000007830
    reg: 0x0000000b: 0x000000000000000d
    reg: 0x0000000c: 0x0000000000000030
    reg: 0x0000000d: 0x0000000000000030
    reg: 0x0000000e: 0xffffff8040000000
    reg: 0x0000000f: 0x0000000000000030
    reg: 0x00000010: 0x0000000000000030
    reg: 0x00000011: 0x0000000070003298
    reg: 0x00000012: 0x0000000000000000
    reg: 0x00000013: 0x0000000070004000
    reg: 0x00000014: 0x00000000700032f0
    reg: 0x00000015: 0x0000000070003550
    reg: 0x00000016: 0x0000000000000100
    reg: 0x00000017: 0x000000007000c008
    reg: 0x00000018: 0x0000000070003510
    reg: 0x00000019: 0x00000000700034f8
    reg: 0x0000001a: 0x000000007000c008
    reg: 0x0000001b: 0x0000000070005f70
    reg: 0x0000001c: 0x0000000000000000
    reg: 0x0000001d: 0x0000000000000000
    reg: 0x0000001e: 0x0000000000000000
    reg: 0x0000001f: 0x0000000000000000
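The esr_el1 value in the log above can be split into its architectural fields to see exactly which abort occurred. The following is a minimal host-side sketch based on the ESR_ELx layout in the Arm Architecture Reference Manual; decode_esr and translation_fault_level are illustrative names, not part of the Microkit loader. Decoding 0x86000004 gives EC 0x21 (instruction abort at the current EL) and an IFSC of 0b000100, which is a translation fault at lookup level 0, i.e. the very first step of the table walk for the faulting address failed.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of ESR_ELx (AArch64) that matter when decoding aborts. */
typedef struct {
    uint32_t ec;  /* bits [31:26]: exception class */
    uint32_t il;  /* bit  [25]:    1 = 32-bit instruction */
    uint32_t iss; /* bits [24:0]:  instruction-specific syndrome */
} esr_fields;

static esr_fields decode_esr(uint64_t esr)
{
    esr_fields f;
    f.ec = (uint32_t)((esr >> 26) & 0x3f);
    f.il = (uint32_t)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1ffffff);
    return f;
}

/*
 * For instruction and data aborts, ISS[5:0] is the IFSC/DFSC. Translation
 * faults are encoded as 0b0001LL, where LL is the lookup level (0-3).
 * Returns that level, or -1 for any other fault status code.
 */
static int translation_fault_level(uint32_t fsc)
{
    assert(fsc <= 0x3f); /* the fault status code is a 6-bit field */
    if ((fsc & 0x3c) == 0x04) {
        return (int)(fsc & 0x3);
    }
    return -1;
}
```

Applied to the monitor's later fsr=0x92000006, the same decoder yields EC 0x24 (data abort from a lower EL) and a level 2 translation fault, which matches the MON|ERROR lines further down the thread.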

Ivan-Velickovic (Contributor) commented Mar 23, 2026

@IkerGalardi you should include the full loader logs; it's not clear to me that the loader is working properly. Most likely there is an issue with the initial virtual address space set up by the loader.

I would also imagine that the smc calls in the loader need to be replaced with hvc calls when using KVM, as I had to do something similar in the past.

IkerGalardi (Author) commented Mar 23, 2026

Here are the full logs:

LDR|INFO: disabling MMU (if it was enabled)
LDR|INFO: PSCI version is 1.1
LDR|INFO: altloader for seL4 starting
LDR|INFO: flags:
LDR|INFO: kernel:      entry:   0xffffff8040000000
LDR|INFO: root server: physmem: 0x0000000040241000 -- 0x00000000406cc000
LDR|INFO:              virtmem: 0x0000000000200000 -- 0x000000000068b000
LDR|INFO:              entry  : 0x0000000000211274
LDR|INFO: region: 0x00000000   addr: 0x0000000040000000   size: 0x0000000000241000   offset: 0x0000000000000000   type: 0x0000000000000001
LDR|INFO: region: 0x00000001   addr: 0x0000000040241000   size: 0x00000000000086e8   offset: 0x0000000000241000   type: 0x0000000000000001
LDR|INFO: region: 0x00000002   addr: 0x000000004024a6e8   size: 0x0000000000015428   offset: 0x00000000002496e8   type: 0x0000000000000001
LDR|INFO: region: 0x00000003   addr: 0x0000000040260b10   size: 0x00000000000100b0   offset: 0x000000000025eb10   type: 0x0000000000000001
LDR|INFO: region: 0x00000004   addr: 0x0000000040271000   size: 0x000000000005b3cc   offset: 0x000000000026ebc0   type: 0x0000000000000001
LDR|INFO: region: 0x00000005   addr: 0x00000000402cd000   size: 0x00000000003ff000   offset: 0x00000000002c9f8c   type: 0x0000000000000001
LDR|INFO: copying region 0x00000000
LDR|INFO: copying region 0x00000001
LDR|INFO: copying region 0x00000002
LDR|INFO: copying region 0x00000003
LDR|INFO: copying region 0x00000004
LDR|INFO: copying region 0x00000005
LDR|INFO|CPU0: active CPUs to start: 0x00000001
LDR|INFO|CPU0: enabling MMU
LDR|INFO|CPU0: CurrentEL=EL1
LDR|INFO|CPU0: enabling MMU
LDR|INFO|CPU0: jumping to kernel

LDR|ERROR: loader trapped exception: Synchronous (Current Exception level with SP_ELx)
    esr_el1: 0x0000000086000004
    ec: 0x00000021 (Instruction Abort taken without a change in Exception level)
    il: 0x0000000000000001
    iss: 0x0000000000000004
    far: 0xffffff8040000000
    reg: 0x00000000: 0x0000000040041000
    reg: 0x00000001: 0x0000000000211274
    reg: 0x00000002: 0x0000000000000000
    reg: 0x00000003: 0x0000000000000000
    reg: 0x00000004: 0xffffff8040000000
    reg: 0x00000005: 0xffffffffffffffff
    reg: 0x00000006: 0x0000000000c5183d
    reg: 0x00000007: 0x0000000000001124
    reg: 0x00000008: 0x0000001485100510
    reg: 0x00000009: 0x0000000000000000
    reg: 0x0000000a: 0x0000000000007830
    reg: 0x0000000b: 0x000000000000000d
    reg: 0x0000000c: 0x0000000000000030
    reg: 0x0000000d: 0x0000000000000030
    reg: 0x0000000e: 0xffffff8040000000
    reg: 0x0000000f: 0x0000000000000030
    reg: 0x00000010: 0x0000000000000030
    reg: 0x00000011: 0x0000000070003298
    reg: 0x00000012: 0x0000000000000000
    reg: 0x00000013: 0x0000000070004000
    reg: 0x00000014: 0x00000000700032f0
    reg: 0x00000015: 0x0000000070003550
    reg: 0x00000016: 0x0000000000000100
    reg: 0x00000017: 0x000000007000c008
    reg: 0x00000018: 0x0000000070003510
    reg: 0x00000019: 0x00000000700034f8
    reg: 0x0000001a: 0x000000007000c008
    reg: 0x0000001b: 0x0000000070005f70
    reg: 0x0000001c: 0x0000000000000000
    reg: 0x0000001d: 0x0000000000000000
    reg: 0x0000001e: 0x0000000000000000
    reg: 0x0000001f: 0x0000000000000000

The issue could be the page tables, but still, the kernel entry being at 0xffffff8040000000 feels kinda strange.

As for the smc instruction, the SMC Calling Convention states that hypervisors trap and emulate those smc calls, so it should be fine™.
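For context on the SMC Calling Convention point: a function ID encodes the call type in its top bits, and the loader log's "PSCI version is 1.1" line is consistent with the author's expectation, since whatever conduit the loader used, its PSCI_VERSION call (function ID 0x84000000) returned a valid version under KVM. The sketch below decodes both values, assuming the SMCCC function ID layout (bit 31 fast call, bit 30 SMC64, bits [29:24] owning entity, bits [15:0] function number) and the PSCI_VERSION return format (major in bits [31:16], minor in bits [15:0]). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SMCCC function identifier fields. */
typedef struct {
    unsigned fast_call; /* bit 31: 1 = fast call */
    unsigned smc64;     /* bit 30: 1 = SMC64/HVC64 calling convention */
    unsigned owner;     /* bits [29:24]: owning entity, 4 = standard secure service (PSCI) */
    unsigned number;    /* bits [15:0]: function number within that owner */
} smccc_fid;

static smccc_fid decode_fid(uint32_t fid)
{
    smccc_fid f;
    f.fast_call = (fid >> 31) & 1u;
    f.smc64 = (fid >> 30) & 1u;
    f.owner = (fid >> 24) & 0x3fu;
    f.number = fid & 0xffffu;
    return f;
}

/* PSCI_VERSION returns the major version in bits [31:16] and the minor in [15:0]. */
static void psci_version_split(uint32_t ret, unsigned *major, unsigned *minor)
{
    assert(major != NULL && minor != NULL);
    *major = ret >> 16;
    *minor = ret & 0xffffu;
}
```

The function ID itself is the same whether it is issued with smc or hvc; only the conduit instruction differs, which is why the smc/hvc question above is about who handles the trap rather than about the call encoding.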

Indanz commented Mar 23, 2026

> The issue could be the page tables, but still, the kernel entry being at 0xffffff8040000000 feels kinda strange.

On non-HYP configurations, the kernel uses the upper address range for its own mappings, with the page tables configured by TTBR1_EL1, and it uses TTBR0_EL1 for the user-space tables. For HYP, when running in EL2, there is no corresponding TTBR1_EL2, only TTBR0_EL2, and that one is not shared with user space the way TTBR0_EL1 and TTBR1_EL1 are.
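A minimal sketch of how the EL1&0 translation regime picks TTBR0_EL1 or TTBR1_EL1 from the top bits of a virtual address, assuming a 4 KiB granule, top-byte-ignore disabled, and T0SZ equal to T1SZ. The names select_ttbr and ttbr_select are illustrative. With 48-bit VAs (tsz = 16), the kernel entry 0xffffff8040000000 falls in the TTBR1 range, while the root server entry point 0x211274 falls in the TTBR0 range. Under EL2 without VHE there is only TTBR0_EL2, so an upper-range address like the kernel entry has no table base to walk from, which is presumably why a hypervisor-mode kernel is linked at a lower address.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { WALK_TTBR0, WALK_TTBR1, WALK_FAULT } ttbr_select;

/*
 * Decide which EL1&0 table base register covers `va`, assuming
 * T0SZ == T1SZ == tsz, a 4 KiB granule and top-byte-ignore disabled.
 * The top `tsz` bits must be all zeros (TTBR0) or all ones (TTBR1);
 * anything else is outside both ranges and faults.
 */
static ttbr_select select_ttbr(uint64_t va, unsigned tsz)
{
    assert(tsz >= 16 && tsz <= 39); /* range assumed here: 48-bit down to 25-bit VA spaces */
    uint64_t top = va >> (64 - tsz);
    uint64_t all_ones = (1ULL << tsz) - 1;
    if (top == 0) {
        return WALK_TTBR0;
    }
    if (top == all_ones) {
        return WALK_TTBR1;
    }
    return WALK_FAULT;
}
```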

IkerGalardi (Author)

Makes sense.

I've seen that the page tables for the upper VA range are set in the boot_lvlX_upper variables. These are populated by the Microkit tool (I assume so, having seen references to them in the aarch64_setup_pagetables function in loader.rs).

The lvl0 descriptors are all set to 0, so it makes sense that there is an instruction abort.

(gdb) p /x boot_lvl0_upper 
$5 = {0x0 <repeats 512 times>}
(gdb) p /x boot_lvl1_upper 
$6 = {0x0, 0x7000a003, 0x0 <repeats 510 times>}
(gdb) p /x boot_lvl2_upper 
$7 = {0x40000711, 0x40200711, 0x40400711, 0x40600711, 0x40800711, 0x40a00711, 0x40c00711, 0x40e00711, 0x41000711, 0x41200711, 0x41400711, 
  0x41600711, 0x41800711, 0x41a00711, 0x41c00711, 0x41e00711, 0x42000711, 0x42200711, 0x42400711, 0x42600711, 0x42800711, 0x42a00711, 
  0x42c00711, 0x42e00711, 0x43000711, 0x43200711, 0x43400711, 0x43600711, 0x43800711, 0x43a00711, 0x43c00711, 0x43e00711, 0x44000711, 
  0x44200711, 0x44400711, 0x44600711, 0x44800711, 0x44a00711, 0x44c00711, 0x44e00711, 0x45000711, 0x45200711, 0x45400711, 0x45600711, 
  0x45800711, 0x45a00711, 0x45c00711, 0x45e00711, 0x46000711, 0x46200711, 0x46400711, 0x46600711, 0x46800711, 0x46a00711, 0x46c00711, 
  0x46e00711, 0x47000711, 0x47200711, 0x47400711, 0x47600711, 0x47800711, 0x47a00711, 0x47c00711, 0x47e00711, 0x48000711, 0x48200711, 
  0x48400711, 0x48600711, 0x48800711, 0x48a00711, 0x48c00711, 0x48e00711, 0x49000711, 0x49200711, 0x49400711, 0x49600711, 0x49800711, 
  0x49a00711, 0x49c00711, 0x49e00711, 0x4a000711, 0x4a200711, 0x4a400711, 0x4a600711, 0x4a800711, 0x4aa00711, 0x4ac00711, 0x4ae00711, 
  0x4b000711, 0x4b200711, 0x4b400711, 0x4b600711, 0x4b800711, 0x4ba00711, 0x4bc00711, 0x4be00711, 0x4c000711, 0x4c200711, 0x4c400711, 
  0x4c600711, 0x4c800711, 0x4ca00711, 0x4cc00711, 0x4ce00711, 0x4d000711, 0x4d200711, 0x4d400711, 0x4d600711, 0x4d800711, 0x4da00711, 
  0x4dc00711, 0x4de00711, 0x4e000711, 0x4e200711, 0x4e400711, 0x4e600711, 0x4e800711, 0x4ea00711, 0x4ec00711, 0x4ee00711, 0x4f000711, 
  0x4f200711, 0x4f400711, 0x4f600711, 0x4f800711, 0x4fa00711, 0x4fc00711, 0x4fe00711, 0x50000711, 0x50200711, 0x50400711, 0x50600711, 
  0x50800711, 0x50a00711, 0x50c00711, 0x50e00711, 0x51000711, 0x51200711, 0x51400711, 0x51600711, 0x51800711, 0x51a00711, 0x51c00711, 
  0x51e00711, 0x52000711, 0x52200711, 0x52400711, 0x52600711, 0x52800711, 0x52a00711, 0x52c00711, 0x52e00711, 0x53000711, 0x53200711, 
  0x53400711, 0x53600711, 0x53800711, 0x53a00711, 0x53c00711, 0x53e00711, 0x54000711, 0x54200711, 0x54400711, 0x54600711, 0x54800711, 
  0x54a00711, 0x54c00711, 0x54e00711, 0x55000711, 0x55200711, 0x55400711, 0x55600711, 0x55800711, 0x55a00711, 0x55c00711, 0x55e00711, 
  0x56000711, 0x56200711, 0x56400711, 0x56600711, 0x56800711, 0x56a00711, 0x56c00711, 0x56e00711, 0x57000711, 0x57200711, 0x57400711, 
  0x57600711, 0x57800711, 0x57a00711, 0x57c00711, 0x57e00711, 0x58000711, 0x58200711, 0x58400711, 0x58600711, 0x58800711, 0x58a00711, 
  0x58c00711, 0x58e00711...}
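The dump can be read with the VMSAv8-64 descriptor format for a 4 KiB granule: bits [1:0] = 0b11 is a table descriptor at levels 0 to 2, 0b01 is a block descriptor at levels 1 and 2, and bit 0 clear means invalid. The following minimal sketch, with illustrative helper names, shows that boot_lvl1_upper[1] = 0x7000a003 points to a next-level table at 0x7000a000, and that each boot_lvl2_upper entry such as 0x40000711 maps a 2 MiB block (0x40000000 here) with the access flag set and AttrIndx 4, whose memory type depends on how the loader programs MAIR_EL1. The all-zero boot_lvl0_upper entries are invalid.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { DESC_INVALID, DESC_BLOCK, DESC_TABLE, DESC_PAGE, DESC_RESERVED } desc_kind;

/* Classify a VMSAv8-64 descriptor (4 KiB granule) found at lookup `level`. */
static desc_kind classify_desc(uint64_t desc, unsigned level)
{
    assert(level <= 3);
    if ((desc & 1u) == 0) {
        return DESC_INVALID;
    }
    if (desc & 2u) {
        return level == 3 ? DESC_PAGE : DESC_TABLE;
    }
    /* bits[1:0] == 0b01: a block at levels 1 and 2, treated as invalid elsewhere */
    return (level == 1 || level == 2) ? DESC_BLOCK : DESC_RESERVED;
}

/* Output address: block base for level 1/2 blocks, otherwise bits [47:12]. */
static uint64_t desc_output_addr(uint64_t desc, unsigned level)
{
    assert(level <= 3);
    if (classify_desc(desc, level) == DESC_BLOCK) {
        unsigned shift = (level == 1) ? 30 : 21; /* 1 GiB or 2 MiB blocks */
        uint64_t oa_mask = ((1ULL << 48) - 1) & ~((1ULL << shift) - 1);
        return desc & oa_mask;
    }
    return desc & 0x0000fffffffff000ULL;
}

static unsigned desc_access_flag(uint64_t desc) { return (unsigned)((desc >> 10) & 1u); }
static unsigned desc_attr_index(uint64_t desc) { return (unsigned)((desc >> 2) & 7u); }
```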

The kernel entry point is at 0xffffff8040000000, so to resolve the first-level translation I set entry 0x1ff of the base table with set variable boot_lvl0_upper[0x1ff] = 0x7000b003, and the kernel now boots fully.
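Why entry 0x1ff is the right one becomes clear once the virtual address is split into per-level table indices. Assuming a 4 KiB granule and four lookup levels (48-bit VA), each level consumes 9 bits, starting at bit 39 for level 0. The minimal sketch below (table_index is a hypothetical helper) shows that 0xffffff8040000000 selects entry 0x1ff at level 0, entry 1 at level 1 (where the dump shows 0x7000a003) and entry 0 at level 2 (0x40000711, i.e. physical 0x40000000), so the level-0 entry was the only missing link in the walk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into the translation table at `level` (0-3) for `va`, assuming a
 * 4 KiB granule and a 48-bit VA space: 9 index bits per level, with the
 * level 3 index starting at bit 12.
 */
static unsigned table_index(uint64_t va, unsigned level)
{
    assert(level <= 3);
    unsigned shift = 12 + 9 * (3 - level); /* 39, 30, 21, 12 for levels 0-3 */
    return (unsigned)((va >> shift) & 0x1ffu);
}
```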

My test setup fails with a data fault on the serial transmit virtualizer, but this might be unrelated to the tool.

MON|ERROR: received message 0x00000006  badge: 0x0000000000000002  tcb cap: 0x000000000000000b
MON|ERROR: faulting PD: serial_virt_tx
Registers: 
pc : 0x0000000000204ce0
sp: 0x00007ffffffffd40
spsr : 0x0000000020000040
x0 : 0x0000000000000000
x1 : 0x0000000000000000
x2 : 0x0000000000000000
x3 : 0xffffffffffffffff
x4 : 0x0000000000201fc4
x5 : 0x000000000000001b
x6 : 0x0000000000000000
x7 : 0x0000000000000000
x8 : 0x0000000000000000
x16 : 0x0000000000000000
x17 : 0x0000000000000000
x18 : 0x0000000000000000
x29 : 0x00007ffffffffd50
x30 : 0x0000000000204e7c
x9 : 0x0000000000000000
x10 : 0x0000000000000000
x11 : 0x0000000000000000
x12 : 0x0000000000000000
x13 : 0x0000000000000000
x14 : 0x0000000000000000
x15 : 0x0000000000000000
x19 : 0x00007ffffffffe50
x20 : 0x0000000000000000
x21 : 0x0000000000207000
x22 : 0x000000000020b2a0
x23 : 0x0000000000000000
x24 : 0x0000000000000000
x25 : 0x0000000000000000
x26 : 0x0000000000000000
x27 : 0x0000000000000000
x28 : 0x0000000000000000
tpidr_el0 : 0x0000000000000000
tpidrro_el0 : 0x0000000000000000
MON|ERROR: VMFault: ip=0x0000000000204ce0  fault_addr=0x0000000000000000  fsr=0x0000000092000006  (data fault)
MON|ERROR:   ec: 0x00000024  Data Abort from a lower Exception level   il: 1   iss: 0x00000006
MON|ERROR:   dfsc = translation fault, level 2 (0x00000006)

I'll try to fix the table generator in loader.rs, but I'm not very familiar with Rust, so it might take me a while 😅.

midnightveil (Collaborator)

IkerGalardi (Author)

The EL1 page table generation works on my webserver example, but it still faults in the serial transmit virtualizer.

Once I figure out what's going on there, I'll mark this PR ready for review.

IkerGalardi marked this pull request as ready for review June 9, 2026 09:33
This board differs from the standard qemu_virt_aarch64 board in that
the entry point is in EL1 instead of EL2. This is necessary due to the
lack of nested virtualization support in the KVM subsystem.

Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
Previously the defaults were applied unconditionally, meaning that if a
board tried to specialize a kernel build parameter that the default
configuration also set, it would get overwritten by the default
configuration. This patch applies the board specialization on top of the
default configuration instead of the other way around.

Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
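The ordering fix in this commit is the usual "defaults first, board overrides last" merge. A minimal C sketch of that precedence, assuming a flat list of string key/value options; the option names KernelArmHypervisorSupport and KernelDebugBuild are used purely as illustrations, and the real build_sdk.py data structures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPTIONS 16

typedef struct {
    const char *key;
    const char *value;
} option;

typedef struct {
    option items[MAX_OPTIONS];
    size_t len;
} option_set;

/* Insert `key`, or overwrite its value if it is already present. */
static void option_set_put(option_set *set, const char *key, const char *value)
{
    for (size_t i = 0; i < set->len; i++) {
        if (strcmp(set->items[i].key, key) == 0) {
            set->items[i].value = value;
            return;
        }
    }
    assert(set->len < MAX_OPTIONS);
    set->items[set->len].key = key;
    set->items[set->len].value = value;
    set->len++;
}

/* Start from the defaults, then apply the board options so the board wins. */
static option_set merge_config(const option *defaults, size_t n_defaults,
                               const option *board, size_t n_board)
{
    option_set out = { .len = 0 };
    for (size_t i = 0; i < n_defaults; i++) {
        option_set_put(&out, defaults[i].key, defaults[i].value);
    }
    for (size_t i = 0; i < n_board; i++) {
        option_set_put(&out, board[i].key, board[i].value);
    }
    return out;
}

/* Look up a merged option; returns NULL if it was never set. */
static const char *option_set_get(const option_set *set, const char *key)
{
    for (size_t i = 0; i < set->len; i++) {
        if (strcmp(set->items[i].key, key) == 0) {
            return set->items[i].value;
        }
    }
    return NULL;
}
```

Reversing the two loops in merge_config reproduces the old behaviour described in the commit message, where a default silently overwrites the board's choice.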
IkerGalardi (Author)

I've tested this branch on the webserver and everything is working as expected.

I've found an issue with virtio, but I assume it's related to sDDF rather than Microkit and the KVM platform.

Review thread on loader/src/aarch64/init.c (outdated)

for (uint32_t i = 0; i <= it_lines_number; i++) {
*((volatile uint32_t *)(GICD_BASE + 0x80 + (i * 4))) = 0xFFFFFFFF;
if (is_set(CONFIG_ARM_HYPERVISOR_SUPPORT)) {

Collaborator

Rather than this being done based on whether or not seL4 has hyp support, it should be done based on the current exception level, as that is what really matters.

Collaborator

Also, for the sake of git blame: I'd rather do an early return than move all of this into the if statement.
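A minimal sketch of the early-return shape being asked for, written so it runs on a host: the distributor is modelled as a plain array and the exception level is passed in rather than read from CurrentEL (bits [3:2] of that register hold the EL). The names configure_gicd_groups and current_el_from_reg are hypothetical, and the sketch covers only the group registers, not the GICC_PMR write discussed below. The 0x80 offset and 0xFFFFFFFF value mirror the loop in the diff above (GICD_IGROUPRn, one bit per interrupt, ITLinesNumber + 1 registers).

```c
#include <assert.h>
#include <stdint.h>

#define GICD_IGROUPR_OFFSET 0x80u /* GICD_IGROUPRn: one bit per interrupt */

/*
 * Hypothetical host-side model of the loader's distributor setup.
 * `current_el` would come from CurrentEL[3:2] on real hardware, and
 * `gicd` stands in for the memory-mapped distributor at GICD_BASE.
 */
static void configure_gicd_groups(unsigned current_el, uint32_t it_lines_number,
                                  volatile uint32_t *gicd)
{
    /* Early return: per the commit message, EL1 cannot access the group registers. */
    if (current_el < 2) {
        return;
    }
    assert(it_lines_number <= 31); /* GICD_TYPER.ITLinesNumber is a 5-bit field */
    /* ITLinesNumber + 1 registers, 32 interrupts each: put everything in Group 1. */
    for (uint32_t i = 0; i <= it_lines_number; i++) {
        gicd[GICD_IGROUPR_OFFSET / 4 + i] = 0xFFFFFFFFu;
    }
}

/* On AArch64 the raw value would come from `mrs x0, CurrentEL`; the EL is bits [3:2]. */
static unsigned current_el_from_reg(uint64_t currentel_reg)
{
    return (unsigned)((currentel_reg >> 2) & 3u);
}
```

Keeping the loop body unchanged below a single guard is what makes git blame stay useful: only the new if/return lines show up as changed.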

IkerGalardi (Author), Jun 15, 2026

Done!

I currently don't have a project to test it with, but I'd guess that configuring the interrupt priority first doesn't break anything on systems running in EL2.

IkerGalardi (Author)

Just tested it and it works.

Collaborator

Hi: I'll look further tomorrow, but you'll want to check the ARM GIC manual for what needs to be done when, as in, when you can write to certain registers. I do recall there are restrictions, but not what they are or what they apply to.

Collaborator

Yeah, seL4 does already set the interrupt priority to 0x80.

The reason this code did that is that we wanted to configure it while in EL3 (secure mode), which allows us to actually configure the GICC_PMR.

It will "work", but the write is also ignored: if we're in EL1, the write won't do anything, and seL4 would do it later anyway.

So this should be after the if-check anyway.

IkerGalardi (Author)

If I understand correctly, that code needs to run only if the loader is running in EL2, right? So the configure_gicv2 function is kind of useless in the KVM case.

I'll move the priority setting to the end of the function (as it was before), but I feel that this check could be done at compile time.

Collaborator

It's more that on the ZCU** platforms, where we start in EL3, and on the QEMU platform, where nothing runs before us, we are the ones who have to configure it. Most other platforms have U-Boot configure it. When not in EL2, we don't configure it and hope it works.

The reason this shouldn't be a compile-time check is that if you want to run an EL1 seL4 image on the ZCU**, you still need to do this setup before we drop to EL1. Note that configure_gicv2 is the absolute first thing we do.

Review thread on tool/microkit/src/loader.rs (outdated)
midnightveil self-assigned this Jun 12, 2026
IkerGalardi force-pushed the kvm-support branch 4 times, most recently from a619d3d to 4ca50bd, June 15, 2026 09:32
EL1 software cannot access the interrupt group registers on the GIC distributor,
so the function returns early if the loader is running in EL1.

Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
Review thread on build_sdk.py
},
),
BoardInfo(
name="qemu_virt_aarch64_kvm",

Collaborator

Ok, now that all the code changes are done: this should probably be called "_el1" or otherwise "non_hyp"; it isn't really KVM-specific, more just an apparent limitation of KVM that it can't emulate EL2.

You should also add this board to manual.md.


4 participants