Writing C++17 for 16-bit x86

Why would you do that to yourself?

A few weeks ago Gynvael Coldwind announced a contest (I’m sorry, the link is in Polish) related to his excellent OS dev streams (again, in Polish, but if you do understand it, definitely consider watching them). The task was simple: make a BIOS-bootable diskette image with the prettiest graphical effect, all in 16-bit real mode, in text mode, with a binary size limit of 512 bytes.

It’s as simple as providing the right target to clang, right?

Aesthetics isn’t exactly my thing (as one could conclude from perusing this blog), but I decided to try and see if I could write an entry in C++, using newer standards freely. As it turns out, both gcc and clang claim to be capable of generating 16-bit code with the -m16 switch.

Let’s give it a try:

void foo()
{
	// text mode text buffer begins at 0xB8000
	char* textBuffer = reinterpret_cast<char*>(0xB8000);
	textBuffer[0] = '+';
}

Compiled with the following (actually, I tried g++ and clang++ with -Os and -Oz and chose the smallest binary every time):

clang++ -Wl,--oformat=binary -nostdlib -fomit-frame-pointer \
 -fno-builtin -nostartfiles -nodefaultlibs -Wl,-e,0x7c00 \
 -Wl,-Tbss,0x7c00 -Wl,-Tdata,0x7c00 -Wl,-Ttext,0x7c00 \
 -Oz -std=c++1z -m16 main.cpp -o kq.bin

Produces binary:

00000000  67C60500800B002B  mov byte [dword 0xb8000],0x2b
00000008  66C3              o32 ret

The 8-byte mov instruction looks really suspicious (and is terrible for the contest, taking 1/64th of the available space). When tested with BOCHS, it simply doesn’t work — at least not while still booting. Moreover, it doesn’t look at all like the Segment:Offset addressing the 16-bit code should be full of. A quick look into the documentation solves this particular mystery quite easily, though.

The generated code and the ABI remains 32-bit but the assembler emits instructions appropriate for a CPU running in 16-bit mode, with address-size and operand-size prefixes to enable 32-bit addressing and operations.


The -m16 option is the same as -m32, except for that it outputs the .code16gcc assembly directive at the beginning of the assembly output so that the binary can run in 16-bit mode.
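For readers unfamiliar with the Segment:Offset addressing mentioned above, here is a minimal sketch of the arithmetic. The helper names (linear_address, to_segment_offset, SegOff) are made up for illustration and are not part of the contest entry. It shows why the linear address 0xB8000 is normally reached as segment 0xB800 with a 16-bit offset instead of a 32-bit displacement.

```cpp
#include <cassert>
#include <cstdint>

// Real-mode address translation: linear = segment * 16 + offset.
// This sketch ignores wrap-around above 1 MiB.
constexpr std::uint32_t linear_address(std::uint16_t segment, std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

// Hypothetical helper: split a linear address below 1 MiB into the
// pair whose offset is smaller than 16.
struct SegOff {
    std::uint16_t segment;
    std::uint16_t offset;
};

constexpr SegOff to_segment_offset(std::uint32_t linear) {
    return SegOff{static_cast<std::uint16_t>(linear >> 4),
                  static_cast<std::uint16_t>(linear & 0xF)};
}

// The text buffer and the boot sector load address used in this post.
static_assert(linear_address(0xB800, 0x0000) == 0xB8000, "text mode buffer");
static_assert(linear_address(0x07C0, 0x0000) == 0x7C00, "boot sector load address");
```

Many different pairs map to the same linear address, so the split above is only one possible choice.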

Working around the compilers

Okay, so it turns out it’s not possible (or I don’t know the magic switches) to have either of those compilers generate truly 16-bit code without a medium-to-major time investment of writing the backend myself. I don’t claim to know nearly enough about this topic to discern the cause of this peculiar behaviour, but for my purposes, knowing that another way had to be found was sufficient. What about the asm blocks?

Let’s check:

void foo()
{
	asm("mov 0B800h, %ax;"
		"mov %ax, %es;");
}


00000000  A100B8            mov ax,[0xb800]
00000003  8EC0              mov es,ax
00000005  66C3              o32 ret

Unfortunately, this could hardly be called a C++ solution, not when it’s a thinly veiled assembly implementation and any normal pointer data access is impossible, because it would generate 32-bit instructions. What is more, attempting to use different segment registers would require rewriting the code or applying an ugly macro (or, possibly in C++20, using something akin to string mixins from the D language). I went with a macro.

Creating the building blocks — output

It’s far from ideal, but it was a workable start:

void foo()
{
	SegmentedAddress<0xB800, SegmentRegister::gs> video_buffer;
}


00000000  6657              push edi
00000002  B800B8            mov ax,0xb800
00000005  89C0              mov ax,ax
00000007  8EE8              mov gs,ax
00000009  B83412            mov ax,0x1234
0000000C  B91000            mov cx,0x10
0000000F  89CF              mov di,cx
00000011  658905            mov [gs:di],ax
00000014  665F              pop edi
00000016  66C3              o32 ret
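The real class emits inline assembly, which cannot run outside of real mode. As a rough, hosted stand-in, the sketch below models the same idea on top of a plain byte array. The array (fake_memory) and the function bodies are assumptions made for illustration; only the names SegmentedAddress, SegmentRegister, raw_write and raw_read come from the post. The segment register parameter is kept for shape but has no effect in this simulation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a 1 MiB array standing in for real-mode memory.
inline std::uint8_t* fake_memory() {
    static std::uint8_t mem[1u << 20] = {};
    return mem;
}

enum class SegmentRegister { ds, es, fs, gs };

template <std::uint16_t Segment, SegmentRegister Reg>
struct SegmentedAddress {
    // Linear address of `offset` within this segment, wrapped to 20 bits.
    static constexpr std::size_t linear(std::size_t offset) {
        return ((static_cast<std::size_t>(Segment) << 4) + offset) & 0xFFFFFu;
    }

    // Store the low `Bytes` bytes of `value`, least significant byte first.
    template <std::size_t Bytes>
    static void raw_write(std::uint32_t value, std::uint16_t offset) {
        static_assert(Bytes >= 1 && Bytes <= 4, "unsupported width");
        for (std::size_t i = 0; i < Bytes; ++i)
            fake_memory()[linear(offset + i)] =
                static_cast<std::uint8_t>(value >> (8 * i));
    }

    // Load `Bytes` bytes and reassemble them in little-endian order.
    template <std::size_t Bytes>
    static std::uint32_t raw_read(std::uint16_t offset) {
        static_assert(Bytes >= 1 && Bytes <= 4, "unsupported width");
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < Bytes; ++i)
            value |= static_cast<std::uint32_t>(fake_memory()[linear(offset + i)])
                     << (8 * i);
        return value;
    }
};
```

Writing the 16-bit value 0x1234 at offset 0x10 of segment 0xB800, as the listing above does, lands at linear addresses 0xB8010 and 0xB8011 in this model.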

After special-casing the text mode video buffer a readable hello world can be created:

void foo()
{
	VideoBuffer buf;
	buf.writeLine("Hello, World!", 20, VideoBuffer::Colour::Red);
}

Picture 1. Hello, World!
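The VideoBuffer class itself is not shown, so the following is only a guess at what such a helper has to do, written against an ordinary array so that it runs anywhere. In 80x25 colour text mode each cell takes two bytes: the character, then an attribute byte whose low four bits are the foreground colour. The function name write_line, the TextBuffer alias and the reading of the second argument as a starting cell index are all assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kColumns = 80;
constexpr std::size_t kRows = 25;

// First few of the standard text mode foreground colours.
enum class Colour : std::uint8_t { Black = 0, Blue = 1, Green = 2, Cyan = 3, Red = 4 };

// Assumption: an in-memory copy of the buffer that lives at 0xB8000.
using TextBuffer = std::array<std::uint8_t, kColumns * kRows * 2>;

// Writes `text` starting at cell index `cell` and stops at the end of the
// screen. Returns the number of cells written.
inline std::size_t write_line(TextBuffer& buf, const char* text,
                              std::size_t cell, Colour fg) {
    std::size_t written = 0;
    while (text[written] != '\0' && cell + written < kColumns * kRows) {
        buf[(cell + written) * 2] = static_cast<std::uint8_t>(text[written]);
        buf[(cell + written) * 2 + 1] = static_cast<std::uint8_t>(fg);  // black background
        ++written;
    }
    return written;
}
```

With a starting cell of 20, the 'H' ends up at byte 40 of the buffer and its red attribute at byte 41.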

Creating the building blocks — keyboard input

With help in the form of Ralf Brown’s Interrupt List, creating an abstraction over keyboard input was simple:

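struct Keyboard
{
	// int $22 is INT 16h, the BIOS keyboard service.
	static inline bool keyAvailable() noexcept {
		u8 ret = 1;
		asm volatile(
			"movb $1, %%ah; \n\t"	// AH=1: check for a keystroke, result in ZF
			"int $22; \n\t"
			"jnz 1f; \n\t"
			"movb $0, %0; \n\t"
			"1: \n\t"
			: "+q" (ret)	// read-write, so the initial value of ret reaches the asm
			:
			: "ax"
		);
		return !ret;
	}

	static inline u8 getKey() noexcept {
		u8 k;
		asm volatile(
			"xor %%ax, %%ax; \n\t"	// AH=0: wait for a keystroke
			"int $22; \n\t"
			"movb %%ah, %0; \n\t"	// AH holds the scan code
			: "=q" (k)
			:
			: "ax"
		);
		return k;
	}
};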

Creating the building blocks — random memory access

Since, as is written above, using standard C/C++ pointers was not possible, I created an abstraction over the SegmentedAddress class templates.

template<u16 Addr, u8 Size, typename Elem = kq::sized_type<Size>, u16 Elements = 1>
struct MemoryEntity{
	constexpr static u16 addr = Addr;
	constexpr static u8 size = Size;
	constexpr static u16 elements = Elements;
	using type = Elem;
	using storage_type = kq::sized_type<Size>;

	// Store element n, converted to the raw storage type.
	template<typename T>
	static inline void set(T&& val, u16 n) noexcept {
		data.raw_write<Size>(nasty_cast<storage_type>(val), Addr + n * Size);
	}

	// Load element n and convert it back to the element type.
	static inline type get(u16 n) noexcept {
		return nasty_cast<type>(data.raw_read<Size>(Addr + n * Size));
	}
};

It worked, and it worked well — the class compiled down to nothing and the resultant abstraction was fairly readable (but it could be better):

constexpr static auto Blocks = MemoryEntity<0x200, 16, Point2D, 256>{};
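To make the idea concrete without needing a 16-bit target, here is a simplified hosted sketch of a typed array pinned at a fixed offset. It is not the code from the entry: the Size parameter is dropped in favour of sizeof, nasty_cast is replaced with std::memcpy, the data segment is an ordinary array (fake_segment), and the layout of Point2D is a guess.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Assumption: a guess at the element type, two byte-sized coordinates.
struct Point2D {
    std::uint8_t x;
    std::uint8_t y;
};

// Assumption: a 64 KiB array standing in for the data segment.
inline std::uint8_t* fake_segment() {
    static std::uint8_t mem[1u << 16] = {};
    return mem;
}

template <std::uint16_t Addr, typename Elem, std::uint16_t Elements = 1>
struct MemoryEntity {
    static_assert(std::is_trivially_copyable<Elem>::value,
                  "elements are copied byte by byte");
    static_assert(Addr + sizeof(Elem) * Elements <= 65536,
                  "the whole array must fit in one segment");

    // Copy `val` into slot n of the array that starts at offset Addr.
    static void set(const Elem& val, std::uint16_t n) {
        assert(n < Elements);
        std::memcpy(fake_segment() + Addr + n * sizeof(Elem), &val, sizeof(Elem));
    }

    // Copy slot n back out into a value of the element type.
    static Elem get(std::uint16_t n) {
        assert(n < Elements);
        Elem out;
        std::memcpy(&out, fake_segment() + Addr + n * sizeof(Elem), sizeof(Elem));
        return out;
    }
};

using Blocks = MemoryEntity<0x200, Point2D, 256>;
```

Because everything is a static member of a stateless type, Blocks::set(Point2D{3, 7}, 5) carries no per-object storage, which is in line with the observation that the class compiled down to nothing.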

The Result

The source and the binary are available here; the mechanics of the snake game are trivial and I’ll skip them. With clang++ and -Oz the resultant binary was exactly 512 bytes, so it met the contest criteria, if only just, although I had to forgo adding the proper boot signature bytes at the end.
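For context on the boot signature: a conventional boot sector ends with the bytes 0x55 and 0xAA at offsets 510 and 511, which leaves 510 bytes for code and data. How strictly the signature is enforced depends on the BIOS or emulator. The sketch below, with made-up helper names, shows the check and why a 512-byte payload leaves no room for it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSectorSize = 512;
constexpr std::size_t kPayloadLimit = kSectorSize - 2;  // room left for code and data

using Sector = std::array<std::uint8_t, kSectorSize>;

// True when the last two bytes hold the conventional 0x55, 0xAA marker.
inline bool has_boot_signature(const Sector& sector) {
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

// Hypothetical helper: adds the marker only if the payload does not
// already occupy the last two bytes. Returns false when there is no room.
inline bool stamp_boot_signature(Sector& sector, std::size_t payload_size) {
    if (payload_size > kPayloadLimit) {
        return false;
    }
    sector[510] = 0x55;
    sector[511] = 0xAA;
    return true;
}
```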

Conclusions / lessons learned

  • First of all, it shows that I started this project without any kind of plan. Basic components are tacked onto one another and do not complement each other. Even for a toy project, it is jarring.
  • Secondly, when they say readability is important, they are right. Using similarly-sized integers for data values and offsets is, simply put, dumb, especially when I could have trivially boxed the offset in its own type.
  • ‘Zero-cost abstractions’ is, I believe, a term coined by Bjarne Stroustrup. While doing this project I inspected the resultant binary after each compilation and I can say with 100% certainty that the abstractions I used cost me nothing in terms of binary size or speed, since my results were identical to those I hand-crafted in the C language.
  • Optimizers in modern compilers are truly great. I did not go out of my way to help them, yet they performed their job admirably.
  • Using modern compilers to target old and semi-forgotten platforms wasn’t the best choice. I had to work around the compiler and I still ended up with a bulky binary because the compiler used 32-bit versions of instructions.
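The second point above, boxing the offset in its own type, could look something like the sketch below. The Offset type, the word array and this raw_write signature are illustrative assumptions, not code from the entry. The explicit constructor is what stops a data value from being passed where an offset is expected.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// A 16-bit offset that cannot be confused with a 16-bit data value.
struct Offset {
    std::uint16_t value;
    constexpr explicit Offset(std::uint16_t v) : value(v) {}
};

constexpr Offset operator+(Offset a, Offset b) {
    return Offset{static_cast<std::uint16_t>(a.value + b.value)};
}

// A plain integer no longer converts to an Offset silently, and the wrapper
// takes no extra space on mainstream ABIs.
static_assert(!std::is_convertible<std::uint16_t, Offset>::value,
              "offsets must be spelled out");
static_assert(sizeof(Offset) == sizeof(std::uint16_t), "no size overhead");

// Assumption: a word array standing in for one 64 KiB segment.
inline std::uint16_t* fake_words() {
    static std::uint16_t words[32768] = {};
    return words;
}

// Value first, offset second; swapping the arguments no longer compiles.
inline void raw_write(std::uint16_t value, Offset where) {
    fake_words()[where.value / 2] = value;
}
```

A call such as raw_write(0x1234, Offset{0x10}) reads unambiguously, while raw_write(0x10, 0x1234) is rejected by the compiler.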

Side notes

Since the contest was about creating a pretty graphical effect, my work didn’t win. In fact, it was the ugliest one, although it may have had a chance to win in the “most interesting” category. At least in my opinion.

On the other hand, the quality of works sent by others was just jaw-dropping, even for someone passably familiar with the scene. You can see them all + sources here. The two best entries can be seen in action here and here. The reader is reminded that these works have not left the text mode, nor have they gone over the decreed 512 bytes.

Gynvael will be testing the waters with an English stream about “a CTF challenge or two, probably exploitation or reverse engineering” on July 15th at 19:00 UTC+2; more info is on his blog. Given the quality of his Polish streams, I urge you to check it out.

6 thoughts on “Writing C++17 for 16-bit x86”

  1. I’ve heard there’s an ongoing port of gcc6 to djgpp – maybe that’s what you should be aiming for!

    1. I didn’t plan to write more, but with this… I could definitely follow up with something better. Thanks for letting me know.

    1. If I saw even a hint of those I’d disable them. I think that the compiler felt safe not including them at all (at least in -Os) since the whole program consisted of one TU.
