Search

mercredi 8 février 2017

Self-interpreter: introduction (1)

Context

Let's create a BrainFuck interpreter in BrainFuck. The goal of this post is to understand how we can do that, and then next ones will be about implementation.
BrainFuck execution needs
  • Instructions (+ - . , < > [ ])
  • Instruction pointer
  • Input stream
  • Output stream
  • Memory
  • Memory pointer

Instructions and input stream

Here, as it's an interpreter, the instructions will come from the global input stream. Then, we need to make a distinction between code to interpret and input used by interpreted code.
This can be done using a special character that will be forbidden in the code part.
For this project, let's use # as our separator.

Memory

The memory needs to be a subset of global execution memory. But let's have it as big as possible: we'll use some space at the beginning of the memory array for instructions to execute (fixed size), and some variables / swap memory for our interpreter.
Another issue with the memory is that it's an array. As we saw in previous posts, we can work with arrays, including array item random access, as long as values aren't null within the array. Or, one can have null values in its memory, this would prevent our code from reading the array.

We can figure this out by at least 2 different ways:
  • We can have an offset (+1) on each memory cell. Then zeros will be ones, ones will be twos, ...
    • Pros: Quite easy to implement (our operations to read and write in memory array should just handle it)
    • Cons: now, 255 is a forbidden value
  • We can have 2 memory values for one array cell. Like one as value, one as is_null flag, or even just [1, value]: browsing array will be done by blocks of 2 cells, whatever the value is
    • Pros: all values are now allowed
    • Cons:
      • a bit more complex to implement
      • uses twice more space than memory
Anyway: the final choice doesn't really matter, but the second option would be a good example of arrays allowing '0' values, and it's also better to avoid forbidden memory values.

Instruction pointer

The core of our program will be
  1. Read code to interpret
  2. While there is a current instruction
    1. execute it
    2. move to next instruction (except for '[' and ']', where it's not exactly the next instruction)
Then, we need to loop on instruction fetches. let's loop on instruction pointer value, and then instruction array will have a one-based index (to avoid null pointer breaking the loop).

Finally, to avoid complexity, [ and ] will be handled in a particular way
  • an inactive_flag will indicate if code can be run
  • direction_flag will indicate if instructions should be read from left to right or right to left
  • if inactive_flag is not null
    • Execution of '['
      • increase inactive_flag
      • if inactive_flag is now null, reset direction_flag
    • Execution of ']'
      • decrease inactive_flag
    • Execution of anything else
      • Do nothing
  • if inactive_flag is null
    • Execution of '['
      • If current memory value is not null: do nothing
      • If null:
        • increase inactive_flag
    • Execution of ']'
      • If current memory value is null: do nothing
      • If not null
        • Set direction_flag to 1
        • Decrease inactive_flag
This way, the interpreter will read all instructions one by one. The inactive_flag will be used as a counter to know if we reached the right number of [ and ] to start execution back to normal.

Global memory map and execution flow

Memory map:
         0 0 0 instructions 0 instruction_pointer some_free_space memory_pointer 0 0 0 0 0 0 memory

Execution flow
  1. Read instructions, i.e. read only instructions, and stop when # separator is read
  2. Fetch instruction given by instruction pointer
  3. Execute instruction
    • + and - : use memory_pointer to reach current cell in memory array and update value
    • < and >: update memory_pointer (+/- 1)
    • , : read in input stream the next char and store in memory cell pointed by memory_pointer
    • . : write current memory value to output stream
    • [ : read current memory value
      • if 0 : increase instruction pointer until it meets as many '[' than ']'
      • if not: do nothing
    • ] : read current memory value
      • if 0: do nothing
      • if not: decrease instruction pointer until it meets as many '[' than ']'


Aucun commentaire:

Enregistrer un commentaire