Brainfuck Compiler

What is it?

Brainfuck is a very simple programming language with only eight instructions, created by Urban Müller in 1993.

I've created a simple Brainf*ck compiler for the ENTERPRISE 128 computer, it compiles the Brainf*ck program to Z80 machine code.

You can try it easily on lgb's ENTERPISE emulator written in JavaScript.

You can download the compiler from here (the extension of the file is .com, but it is not an MS-DOS file!).

Download sample Brainf*ck programs (not my works). The zip contains the following files:

99botles.bf
digit9.bf
factor.bf
hello.bf
life.bf
numwarp.bf
primes.bf
squares.bf

How to use the program

Available commands are:
L - Load the Brainf*ck source file
C - Compile the source to 4000H address
R - Run the compiled code
H - Prints the available commands
Q - Quit the program

Limitations:

The maximum size of the source code is 12KiB (it is loaded to 0x1000 - 0x3FFF).
The compiled code must fit in 16KiB (0x4000 - 0x7FFF).
Maximum 255 nested loops are allowed.


How does it work?

To understand the following parts, I assume you know Brainfuck language and Z80 assembly.

It is very easy to match the eight Brainf*ck instructions to the equivalent Z80 code:

>
INC  HL
<
DEC  HL
+
INC  (HL)
-
DEC  (HL)
.
CALL outchar
,
CALL inchar
[
LD   A,(HL)
OR A
JP Z, matching ] address
]
JP   matching [ address

where HL register stores the data pointer.

You can see, it is very easy to compile the first 6 commands, but to find the matching addresses for the loops is a little bit more sophisticated.


When the compiler finds a "[" command:

        push    hl          ; save the address to the stack
        ld      a,7eh       ; 07eh = opcode of "ld a,(hl)"
        ld      (hl),a
        inc     hl
        ld      a,0b7h      ; 0b7h = opcode of "or a"
        ld      (hl),a
        inc     hl
        ld      a,0cah      ; 0cah = opcode of "jp z,"
        ld      (hl),a
        inc     hl          ; skip the next two bytes
        inc     hl          ; it will be filled when matching "]" found
        ld      a,(brcount) ; A = amount of "["
        inc     a           ; increment it
        or      a           ; is it 0? (=256?)
        jp      z,toomuch   ; jump if yes -> too much nested loops
        ld      (brcount),a ; save A

When the compiler finds a "]" command:

        ld      a,0c3h                  ; 0c3h = opcode of "jp,"
        ld      (hl),a
        inc     hl
brcount equ     $+1
        ld      a,0                     ; A = amount of "["
        dec     a                       ; decrement it
        ld      (brcount),a             ; save A
        jp      m,unmatched_close_loop  ; if A is negativ then jump to no mathcing "["
        pop     bc                      ; get matching address from stack (remember, we pushed it when "[" found)
        ld      (hl),c
        inc     hl
        ld      (hl),b
        inc     hl
        inc     bc
        inc     bc
        inc     bc
        ld      a,l
        ld      (bc),a
        ld      a,h
        inc     bc
        ld      (bc),a

Optimalization

Let's see the source code of a simple "Hello World!" program:

++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.

You don't need sharp eyes to realize, some optimizing should be done: instead of compiling the first ten '+' commands to ten 'inc (hl)', we could done it better.

Brainf*ck codeCompiled toByte / Cycle
>
INC  HL
1 / 6
>>
INC  HL
INC HL
2 / 12
>>>
INC  HL
INC HL
INC HL
3 / 18
>>>> or more
LD   BC,nnnn   ; nnnn = number of ">" commands
ADD HL,BC
4 / 21 (save n-4 bytes and 6n-21 cycles)
+
INC  (HL)
1 / 11
++
INC  (HL)
INC (HL)
2 / 22
+++ or more
LD   A,(HL)
ADD A,nn
LD (HL),A
4 / 21
<
DEC  HL
1 / 6
<<
DEC  HL
DEC HL
2 / 12
<<<
DEC  HL
DEC HL
DEC HL
3 / 18
<<<< or more
LD   BC,-nnnn  ; nnnn = number of "<" commands
ADD HL,BC
4 / 21
-
DEC  (HL)
1 / 11
--
DEC  (HL)
DEC (HL)
2 / 22
--- or more
LD   A,(HL)
ADD A,-nn
LD (HL),A
4 / 21