Principles of Compiler Design

Principles of Compiler Design - The Brainf*ck Compiler Clifford Wolf - www.clifford.at http://www.clifford.at/papers/2004/compiler/

Clifford Wolf, December 22, 2004

http://www.clifford.at/papers/2004/compiler/ – p. 1/56

Introduction

Introduction

■ My presentation at 20C3 about CPU design featuring a Brainf*ck CPU was a big success.
■ My original plan for 21C3 was to build a Brainf*ck CPU with tubes.
■ But: The only thing more dangerous than a hardware guy with a code patch is a programmer with a soldering iron.
■ So this is a presentation about compiler design featuring a Brainf*ck Compiler.

Overview (1/2)

In this presentation I will discuss:
■ A little introduction to Brainf*ck
■ Components of a compiler, overview
■ Designing and implementing lexers
■ Designing and implementing parsers
■ Designing and implementing code generators
■ Tools (flex, bison, iburg, etc.)

Overview (2/2)

■ Overview of more complex code generators
  ◆ Abstract syntax trees
  ◆ Intermediate representations
  ◆ Basic block analysis
  ◆ Backpatching
  ◆ Dynamic programming
  ◆ Optimizations
■ Design and implementation of the Brainf*ck Compiler
■ Implementation of and code generation for stack machines
■ Design and implementation of the SPL Project
■ Design and implementation of LL(regex) parsers

Aim

After this presentation, the audience ..
■ .. should have a rough idea of how compilers work.
■ .. should be able to implement parsers for complex configuration files.
■ .. should be able to implement code generators for stack machines.
■ .. should have a rough idea of code generation for register machines.

Brainf*ck

Overview

■ Brainf*ck is a very simple Turing-complete programming language.
■ It has only 8 instructions and no instruction parameters.
■ Each instruction is represented by one character: < > + - . , [ ]
■ All other characters in the input are ignored.
■ A Brainf*ck program has an implicit byte pointer which is free to move around within an array of 30000 bytes, initially all set to zero. The pointer itself is initialized to point to the beginning of this array.

Some languages are designed to solve a problem. Others are designed to prove a point.

Instructions

Instr.  C equivalent      Meaning
>       ++p;              Increment the pointer.
<       --p;              Decrement the pointer.
+       ++*p;             Increment the byte at the pointer.
-       --*p;             Decrement the byte at the pointer.
.       putchar(*p);      Output the byte at the pointer.
,       *p = getchar();   Input a byte and store it in the byte at the pointer.
[       while (*p) {      Jump forward past the matching ] if the byte at the pointer is zero.
]       }                 Jump backward to the matching [ unless the byte at the pointer is zero.
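The instruction table translates almost directly into an interpreter. The following is a minimal sketch in C, not the bfrun program from the talk: the names bf_run and TAPE_SIZE, the string-based input and output, and the convention that "," stores 0 at end of input are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define TAPE_SIZE 30000  /* array size given on the Overview slide */

/* Run a Brainf*ck program (hypothetical helper, not the author's bfrun).
 * Input bytes are taken from `in`; output is collected in `out`, which is
 * always NUL-terminated and silently truncated at out_cap - 1 bytes.
 * Returns 0 on success, -1 if a matching bracket cannot be found or the
 * pointer would leave the array. */
int bf_run(const char *prog, const char *in, char *out, size_t out_cap)
{
    unsigned char tape[TAPE_SIZE];
    size_t p = 0, out_len = 0, pc, n = strlen(prog);

    memset(tape, 0, sizeof tape);
    if (out_cap > 0) out[0] = '\0';

    for (pc = 0; pc < n; pc++) {
        int depth = 1;
        switch (prog[pc]) {
        case '>': if (++p >= TAPE_SIZE) return -1; break;
        case '<': if (p == 0) return -1; --p; break;
        case '+': ++tape[p]; break;
        case '-': --tape[p]; break;
        case '.':
            if (out_len + 1 < out_cap) {
                out[out_len++] = (char)tape[p];
                out[out_len] = '\0';
            }
            break;
        case ',':
            /* Assumption: end of input is reported as a zero byte. */
            tape[p] = *in ? (unsigned char)*in++ : 0;
            break;
        case '[':
            if (tape[p] != 0) break;
            while (depth > 0) {            /* skip forward to the matching ] */
                if (++pc >= n) return -1;
                if (prog[pc] == '[') depth++;
                else if (prog[pc] == ']') depth--;
            }
            break;
        case ']':
            if (tape[p] == 0) break;
            while (depth > 0) {            /* scan back to the matching [ */
                if (pc == 0) return -1;
                --pc;
                if (prog[pc] == ']') depth++;
                else if (prog[pc] == '[') depth--;
            }
            break;
        default:
            break;                         /* all other characters are ignored */
        }
    }
    return 0;
}
```

For example, calling bf_run with the program "++++++++[>++++++++<-]>+." computes 8 * 8 + 1 = 65 in the second cell and outputs the single byte "A". Scanning for the matching bracket on every jump is simple but slow; precomputing the jump targets is the usual improvement.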

Implementing "while"

■ Implementing a while statement is easy, because the Brainf*ck [ .. ] construct is a while loop.
■ So while (x) { ... } becomes [ ... ], with the pointer set to x before the [ and again before the ].

Implementing "x=y"

■ Implementing assignment (copy) instructions is a bit more complex.
■ The straightforward way of doing that, a loop that moves y to x one step at a time, resets y to zero:
    [ + ]
■ So, a temporary variable t is needed. A first loop moves y to t; a second loop, which sets the pointer to t, x, y and t in turn, moves t back into both x and y:
    [ + ]
    [ + + ]
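The two copy strategies can be mirrored in C, with each Brainf*ck loop written as the while (*p) loop it stands for. This is a sketch of the idiom, assuming one byte per variable; the function names bf_move and bf_copy are made up for illustration and the exact Brainf*ck text of the slide is not reproduced.

```c
/* Destructive "copy": the single loop moves y into x and leaves y at zero. */
void bf_move(unsigned char *x, unsigned char *y)
{
    *x = 0;                       /* clear the destination first */
    while (*y) { --*y; ++*x; }    /* one loop: decrement y, increment x */
}

/* Non-destructive copy using a temporary cell t. */
void bf_copy(unsigned char *x, unsigned char *y, unsigned char *t)
{
    *t = 0;
    *x = 0;
    while (*y) { --*y; ++*t; }          /* first loop: move y to t */
    while (*t) { --*t; ++*x; ++*y; }    /* second loop: move t to x and y */
}
```

After bf_copy the temporary is zero again, so the same cell can be reused by the next statement. The cost is proportional to the value being copied, which is one reason generated Brainf*ck code runs slowly.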

Implementing "if"

■ The if statement is like a while loop, but it should run its block only once. Again, a temporary variable is needed to implement if (x) { ... }:
    [ + ]
    [ [ + ] ]
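The same loop-by-loop translation shows why the block runs at most once. The sketch below follows the shape of the code emitted by the if_stmt rule shown on the "Simple Code Generators" slide (clear the temporary, move x to it, then loop on the temporary while moving the value back); the function name bf_if_runs and the run counter standing in for the block are assumptions for illustration.

```c
/* Mirror of the "if" idiom. Returns how often the block was executed,
 * which is 1 if x was non-zero and 0 otherwise. x is restored, t ends at 0. */
int bf_if_runs(unsigned char *x, unsigned char *t)
{
    int runs = 0;

    *t = 0;                           /* [-] on the temporary */
    while (*x) { --*x; ++*t; }        /* move x to t */
    while (*t) {                      /* entered only if x was non-zero */
        while (*t) { --*t; ++*x; }    /* move the value back: t becomes 0 */
        runs++;                       /* the block of the if goes here */
    }                                 /* t is 0 now, so the loop ends */
    return runs;
}
```

Because the inner loop zeroes the temporary before the block runs, the outer loop condition fails after the first pass even if the block changes x.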

Functions

■ Brainf*ck has no construct for functions.
■ The compiler has support for macros which are always inlined.
■ The generated code may become huge if macros are used intensively.
■ So recursions must be implemented using explicit stacks.

Lexer and Parser

Lexer

■ The lexer reads the compiler input and transforms it into lexical tokens.
■ E.g. the lexer reads the input "while" and returns the numerical constant TOKEN_WHILE.
■ Tokens may have additional attributes. E.g. the textual input "123" may be transformed to the token TOKEN_NUMBER with the integer value 123 attached to it.
■ The lexer is usually implemented as a function which is called by the parser.
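A hand-written lexer of this kind can be sketched in a few lines of C. TOKEN_WHILE and TOKEN_NUMBER come from the slide; the other token names, the struct lexer type and the function next_token are assumptions made for this example.

```c
#include <ctype.h>
#include <string.h>

enum token { TOKEN_EOF, TOKEN_WHILE, TOKEN_NAME, TOKEN_NUMBER, TOKEN_CHAR };

struct lexer {
    const char *pos;   /* next unread input character */
    int numval;        /* attribute of TOKEN_NUMBER */
    char chr;          /* attribute of TOKEN_CHAR */
};

/* Return the next token; a parser would call this once per token. */
enum token next_token(struct lexer *lx)
{
    while (isspace((unsigned char)*lx->pos))
        lx->pos++;                                /* skip white space */
    if (*lx->pos == '\0')
        return TOKEN_EOF;

    if (isdigit((unsigned char)*lx->pos)) {       /* "123" -> value 123 */
        lx->numval = 0;
        while (isdigit((unsigned char)*lx->pos))
            lx->numval = lx->numval * 10 + (*lx->pos++ - '0');
        return TOKEN_NUMBER;
    }

    if (isalpha((unsigned char)*lx->pos) || *lx->pos == '_') {
        const char *start = lx->pos;
        while (isalnum((unsigned char)*lx->pos) || *lx->pos == '_')
            lx->pos++;
        if (lx->pos - start == 5 && strncmp(start, "while", 5) == 0)
            return TOKEN_WHILE;                   /* keyword */
        return TOKEN_NAME;                        /* any other identifier */
    }

    lx->chr = *lx->pos++;                         /* single-character token */
    return TOKEN_CHAR;
}
```

Called repeatedly on the input "while (x1) 123", this sketch yields TOKEN_WHILE, TOKEN_CHAR for "(", TOKEN_NAME, TOKEN_CHAR for ")", TOKEN_NUMBER with numval 123, and finally TOKEN_EOF.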

Parser

■ The parser consumes the lexical tokens (terminal symbols) and reduces sequences of terminal and non-terminal symbols to non-terminal symbols.
■ The parser creates the so-called parse tree.
■ The parse tree never exists as such as an in-memory data structure.
■ Instead, the parse tree just defines the order in which the so-called reduction functions are called.
■ It is possible to create tree-like memory structures in these reduction functions which look like the parse tree. Such a structure is called an "Abstract Syntax Tree".

BNF

BNF (Backus-Naur Form) is a way of writing down parser definitions. A BNF for parsing a simple assign statement (like “x = y + z * 3”) could look like (yacc style syntax):

    assign:     NAME '=' expression;
    primary:    NAME | NUMBER | '(' expression ')';
    product:    primary | product '*' primary | product '/' primary;
    sum:        product | sum '+' product | sum '-' product;
    expression: sum;

Reduce Functions

■ Whenever a sequence of symbols is reduced to a non-terminal symbol, a reduce function is called. E.g.:

    %union { int numval; }
    %type <numval> sum product

    %%

    sum: product
       | sum '+' product   { $$ = $1 + $3; }
       | sum '-' product   { $$ = $1 - $3; };

■ The attributes of the symbols on the right side of the reduction can be accessed using $1 .. $n. The attributes of the resulting symbol can be accessed with $$.

Algorithms

■ A huge number of different parser algorithms exist.
■ The two most important algorithms are LL(N) and LALR(N).
■ Other algorithms are LL(k), LL(regex), GLR and ad-hoc parsing.
■ Most hand-written parsers are LL(1) parsers.
■ Most parser generators create LALR(1) parsers.
■ A detailed discussion of various parser algorithms can be found in “The Dragonbook” (see references on last slide).
■ The design and implementation of LL(1) parsers is also discussed in the section about LL(regex) parsers.

Conflicts

■ Sometimes a parser grammar is ambiguous.
■ In these cases, the parser has to choose one possible interpretation of the input.
■ LALR parsers distinguish between reduce-reduce and shift-reduce conflicts.
■ Reduce-reduce conflicts should be avoided when writing the BNF.
■ Shift-reduce conflicts are resolved by shifting, unless precedence declarations in the grammar say otherwise.

Code Generators

Overview

■ Writing the code generator is the most complex part of a compiler project.
■ Usually the code generation is split into several stages, such as:
  ◆ Creating an abstract syntax tree
  ◆ Creating an intermediate code
  ◆ Creating the output code
■ A code generator which creates assembler code is usually much easier to write than a code generator creating binaries.

Simple Code Generators

■ Simple code generators may generate code directly in the parser.
■ This is possible if no anonymous variables exist (BFC) or the target machine is a stack machine (SPL).

Example:

    if_stmt:
        TK_IF TK_ARGS_BEGIN TK_STRING TK_ARGS_END stmt
        {
            $$ = xprintf(0, 0, "%s{", debug_info());
            $$ = xprintf($$, $5, "(#tmp_if)<#tmp_if>[-]"
                                 "<%s>[-<#tmp_if>+]"
                                 "<#tmp_if>[[-<%s>+]\n", $3, $3);
            $$ = xprintf($$, 0, "]}");
        }

Tools

Overview

■ There are tools for writing compilers.
■ Most of these tools cover the lexer/parser step only.
■ Most of these tools generate C code from a declarative language.
■ Use those tools but understand what they are doing!

Flex / Lex

■ Flex (the fast lexical analyzer generator) is the free successor of Lex.
■ The lex input file (*.l) is a list of regular expressions and actions.
■ The “actions” are C code which is executed when the lexer finds a match for the regular expression in the input.
■ Most actions simply return the token to the parser.
■ It is possible to skip patterns (e.g. white space) by providing an empty action.

Yacc / Bison

■ Bison is the GNU successor of Yacc (Yet Another Compiler Compiler).
■ Bison is a parser generator.
■ The bison input (*.y) is a BNF with reduce functions.
■ The generated parser is an LALR(1) parser.
■ Bison can also generate GLR parsers.

Burg / iBurg

■ iBurg is the successor of Burg.
■ iBurg is a “Code Generator Generator”.
■ The code generator generated by iBurg implements the “dynamic programming” algorithm.
■ It is a bit like a parser for an abstract syntax tree with an extremely ambiguous BNF.
■ The reductions have cost values applied and an iBurg code generator chooses the cheapest fit.

PCCTS

■ PCCTS is the “Purdue Compiler-Compiler Tool Set”.
■ PCCTS is a parser generator for LL(k) parsers in C++.
■ The PCCTS toolkit was written by Terence J. Parr of the MageLang Institute.
■ His current project is ANTLR 2, a complete redesign of PCCTS, written in Java, that generates Java or C++.
■ PCCTS is now maintained by Tom Moog, Polhode, Inc.

Complex Code Generators

Overview

■ Unfortunately it’s not possible to cover code generation in depth in this presentation.
■ However, I will try to give a rough overview of the topic and explain the most important terms.

Abstract syntax trees

■ With some languages it is hard to create intermediate code directly from the parser.
■ In compilers for such languages, the parser creates an abstract syntax tree.
■ The intermediate code generation can then be done in different phases which may process the abstract syntax tree bottom-up and top-down.

Intermediate representations

■ Most compilers create intermediate code from the input and generate output code from this intermediate code.
■ Usually the intermediate code is some kind of three-address code assembler language.
■ The GCC intermediate language is called RTL and is a wild mix of imperative and functional programming.
■ Intermediate representations which are easily converted to trees (such as functional approaches) are better for dynamic programming, but are usually not optimal for ad-hoc code generators.

Basic block analysis

■ A straight-line code sequence that is entered only at its first instruction and left only at its last (roughly: the code from one jump or jump target to the next) is called a “basic block”.
■ Optimizations within basic blocks are an entirely different class of optimization than those which can be applied to a larger code block.
■ Many compilers create intermediate language trees for each basic block and then create the code for it using dynamic programming.

Backpatching

■ It is often necessary to create jump instructions without knowing the jump target address yet.
■ This problem is solved by outputting a dummy target address and fixing it later.
■ This procedure is called backpatching.
■ The Brainf*ck compiler doesn’t need backpatching because Brainf*ck doesn’t have jump instructions and addresses.
■ However, the Brainf*ck runtime bundled with the compiler uses backpatching to optimize the runtime speed.
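One way a Brainf*ck runtime can apply backpatching is to compute the jump target of every bracket once, before execution. The sketch below illustrates the technique only; whether the bundled runtime works exactly like this is an assumption, and the names bf_link_brackets and MAX_NESTING are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NESTING 256   /* arbitrary limit chosen for this sketch */

/* For every '[' and ']' in prog, store the index of its partner in target[].
 * target must have room for strlen(prog) entries.
 * Returns 0 on success, -1 on unbalanced brackets or too deep nesting. */
int bf_link_brackets(const char *prog, size_t *target)
{
    size_t open[MAX_NESTING];   /* '[' positions whose target is still unknown */
    size_t depth = 0, i, n = strlen(prog);

    for (i = 0; i < n; i++) {
        if (prog[i] == '[') {
            if (depth == MAX_NESTING) return -1;
            target[i] = (size_t)-1;     /* dummy target, fixed later */
            open[depth++] = i;
        } else if (prog[i] == ']') {
            size_t j;
            if (depth == 0) return -1;  /* ] without a matching [ */
            j = open[--depth];
            target[i] = j;              /* backward jump: target already known */
            target[j] = i;              /* backpatch the forward jump */
        }
    }
    return depth == 0 ? 0 : -1;         /* leftover [ means unbalanced input */
}
```

With this table an interpreter can jump in constant time instead of scanning for the matching bracket on every loop iteration. The stack of open positions plays the role of the list of unresolved jumps kept by a conventional code generator.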

Dynamic programming

■ In code generation, dynamic programming is used as an algorithm for generating assembler code from intermediate language trees.
■ Code generators such as those created by Burg and iBurg implement the dynamic programming algorithm.
■ Dynamic programming uses two different phases.
■ In the first phase, the tree is labeled to find the cheapest matches in the rule set (bottom-up).
■ In the second phase, the code for the cheapest solution is generated (top-down).
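The two phases can be illustrated with a deliberately tiny rule set. The sketch below is not iBurg output: the node types, the four rules, their unit costs and the instruction names LOADC, LOAD, ADD and ADDI are all assumptions chosen so that one tree shape has two competing matches.

```c
#include <stddef.h>
#include <stdio.h>

enum op   { OP_CONST, OP_VAR, OP_ADD };
enum rule { R_LOAD_CONST, R_LOAD_VAR, R_ADD_REG_REG, R_ADD_REG_IMM };

struct node {
    enum op op;
    struct node *left, *right;  /* NULL for leaves */
    int cost;                   /* cheapest cost to get this subtree into a register */
    enum rule rule;             /* rule that achieves that cost */
};

/* Phase 1, bottom-up: label every node with its cheapest rule. */
void label(struct node *n)
{
    switch (n->op) {
    case OP_CONST: n->cost = 1; n->rule = R_LOAD_CONST; break;
    case OP_VAR:   n->cost = 1; n->rule = R_LOAD_VAR;   break;
    case OP_ADD:
        label(n->left);
        label(n->right);
        /* reg + reg: both operands in registers, plus one ADD */
        n->cost = n->left->cost + n->right->cost + 1;
        n->rule = R_ADD_REG_REG;
        /* reg + immediate: the constant is folded into the instruction */
        if (n->right->op == OP_CONST && n->left->cost + 1 < n->cost) {
            n->cost = n->left->cost + 1;
            n->rule = R_ADD_REG_IMM;
        }
        break;
    }
}

/* Phase 2, top-down: follow the chosen rules from the root and emit code.
 * Returns the number of instructions written. */
int emit(const struct node *n, FILE *out)
{
    int count;
    switch (n->rule) {
    case R_ADD_REG_REG:
        count = emit(n->left, out);
        count += emit(n->right, out);
        fputs("ADD\n", out);
        return count + 1;
    case R_ADD_REG_IMM:
        count = emit(n->left, out);
        fputs("ADDI\n", out);
        return count + 1;
    case R_LOAD_CONST: fputs("LOADC\n", out); return 1;
    case R_LOAD_VAR:   fputs("LOAD\n", out);  return 1;
    }
    return 0;
}
```

For the tree x + 3 the labeling phase prefers the add-immediate rule (cost 2) over loading the constant separately (cost 3); for 3 + x only the register-register rule matches. Real rule sets are much larger, which is why the matcher is generated from a declarative description.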

Optimizations

■ Most optimizing compilers perform different optimizations in different compilation phases.
■ So most compilers don’t have a separate “the optimizer” code path.
■ Some important optimizations are:
  ◆ Global register allocation
  ◆ Loop detection and unrolling
  ◆ Common subexpression elimination
  ◆ Peephole optimizations
■ The Brainf*ck compiler does not optimize.
■ The SPL compiler has a simple peephole optimizer.

The BF Compiler

Overview

■ The project is split into an assembler and a compiler.
■ The assembler handles variable names and manages the pointer position.
■ The compiler reads BFC input files and creates assembler code.
■ The assembler has an ad-hoc lexer and parser.
■ The compiler has a flex-generated lexer and a bison-generated parser.
■ The compiler generates the assembler code directly from the parser reduce functions.

Assembler

■ The operators [ + and - are unmodified.
■ The ] operator sets the pointer back to the position where it was at [.
■ A named variable can be defined with (x).
■ The pointer can be set to a named variable with <x>.
■ A name space is defined with { ... }.
■ A block in single quotes is passed through unmodified.
■ Larger spaces can be defined with (x.42).
■ An alias for another variable can be defined with (x:y).

Compiler

■ Variables are declared with var x;.
■ C-like statements for =, +=, -=, if and while are available.
■ Macros can be defined with macro x() { ... }.
■ All variables are passed using call-by-reference.
■ The compiler can’t evaluate complex expressions.
■ Higher functions (such as comparisons and multiplication) are implemented using built-in functions.

Running

■ The compiler and the assembler are both filter programs.
■ So compilation is done by:
    $ ./bfc < hanoi.bfc | ./bfa > hanoi.bf
    Code: 53884 bytes, Data: 275 bytes.
■ The bfrun executable is a simple Brainf*ck interpreter:
    $ ./bfrun hanoi.bf

Implementation

Code review of the assembler.
.. and the compiler.
.. and the built-ins library.
.. and the hanoi example.

Stack Machines

Overview

■ Stack machines are a computer architecture, like register machines or accumulator machines.
■ Every instruction pops its arguments from the stack and pushes the result back on the stack.
■ Special instructions push the content of a variable on the stack or pop a value from the stack and write it back to a variable.
■ Stack machines are great for virtual machines in scripting languages because code generation is very easy.
■ However, stack machines are less efficient than register machines and are harder to implement in hardware.

Example

    x = 5 * ( 3 + y );

    PUSHC "5"
    PUSHC "3"
    PUSH  "y"
    IADD
    IMUL
    POP   "x"
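The instruction sequence above can be executed by a very small virtual machine. The sketch below is not the SPL virtual machine: it uses integer operands and variable indices instead of the string operands shown on the slide, and the names sm_run, struct insn and STACK_MAX are assumptions for illustration.

```c
#include <stddef.h>

#define STACK_MAX 32

enum opcode { PUSHC, PUSH, IADD, IMUL, POP };

struct insn {
    enum opcode op;
    int arg;   /* PUSHC: constant; PUSH and POP: index into vars; else unused */
};

/* Execute n instructions. Returns 0 on success, -1 on stack overflow
 * or underflow. */
int sm_run(const struct insn *code, size_t n, int *vars)
{
    int stack[STACK_MAX];
    size_t sp = 0, i;   /* sp = number of values currently on the stack */

    for (i = 0; i < n; i++) {
        switch (code[i].op) {
        case PUSHC:
            if (sp == STACK_MAX) return -1;
            stack[sp++] = code[i].arg;
            break;
        case PUSH:
            if (sp == STACK_MAX) return -1;
            stack[sp++] = vars[code[i].arg];
            break;
        case IADD:                       /* pop two values, push their sum */
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case IMUL:                       /* pop two values, push their product */
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case POP:                        /* pop one value into a variable */
            if (sp < 1) return -1;
            vars[code[i].arg] = stack[--sp];
            break;
        }
    }
    return 0;
}
```

With x stored at index 0 and y = 4 at index 1, running the six instructions of the example leaves 5 * (3 + 4) = 35 in x. The instruction order is simply the post-order walk of the expression tree, which is why a parser can emit such code directly from its reduce functions.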

The SPL Project

Overview

■ SPL is an embeddable scripting language with C-like syntax.
■ It has support for arrays, hashes, objects, Perl regular expressions, etc.
■ The entire state of the virtual machine can be dumped at any time and execution of the program resumed later.
■ In SPL there is a clear separation of compiler, assembler, optimizer and virtual machine.
■ It’s possible to run pre-compiled binaries, program directly in the VM assembly, use multithreading, step-debug programs, etc.
■ SPL is a very small project, so it is a good example for implementing high-level language compilers for stack machines.

WebSPL

■ WebSPL is a framework for web application development.
■ It creates a state over the stateless HTTP protocol using the dump/restore features of SPL.
■ I.e. it is possible to print out an updated HTML page and then call a function which “waits” for the user to do something and then returns.
■ WebSPL is still missing some bindings for various SQL implementations, XML and XSLT bindings, the WSF (WebSPL Forms) library and some other stuff.
■ Right now I’m looking for people who want to participate in the project.

Example

object Friend {
    var id;
    <...>
    method winmain(sid) {
        title = name;
        .sid = sid;
        while (1) {
            template = "show";
            bother_user();
            if ( defined cgi.param.edit ) {
                template = "edit";
                bother_user();
                name = cgi.param.new_name;
                phone = cgi.param.new_phone;
                email = cgi.param.new_email;
                addr = cgi.param.new_addr;
                title = name;
            }
            if ( defined cgi.param.delfriend ) {
                delete friends.[id].links.[cgi.param.delfriend];
                delete friends.[cgi.param.delfriend].links.[id];
            }
            if ( defined cgi.param.delete ) {
                delete friends.[id];
                foreach f (friends)
                    delete friends.[f].links.[id];
                &windows.[winid].finish();
            }
        }
    }
}

LL(regex) parsers

Overview

■ LL parsers (recursive descent parsers) are straightforward implementations of a BNF.
■ Usually parsers read lexemes (tokens) from a lexer.
■ An LL(N) parser has access to N lookahead symbols to decide which rule should be applied.
■ In practice, most LL(N) parsers are LL(1) parsers.
■ LL(regex) parsers are LL parsers with no lexer but a regex engine.
■ LL(regex) parsers are very easy to implement in Perl.

Left recursions

■ Often a BNF contains left recursion:

    <...>
    product: primary | product '*' primary | product '/' primary;
    <...>

■ Left recursions cause LL parsers to run into an endless recursion.
■ There are algorithms for converting left recursions to right recursions without affecting the organization of the parse tree.
■ But the resulting BNF is much more complex than the original one.
■ LALR parser generators (e.g. bison) are not affected: they handle left recursion directly.
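In a hand-written LL(1) parser the usual way around left recursion is to replace it with a loop, which also keeps the operators left-associative. The following C sketch evaluates the sum/product/primary grammar from the BNF slide for numbers and parentheses only. It is not a port of llregex.pl; the function name ll_eval, the missing error reporting and the choice to let division by zero yield 0 are assumptions made to keep the example short.

```c
#include <ctype.h>

static const char *p;   /* current input position; one lookahead character */

static int sum(void);

static void skip_space(void)
{
    while (isspace((unsigned char)*p))
        p++;
}

/* primary: NUMBER | '(' sum ')' */
static int primary(void)
{
    int v = 0;
    skip_space();
    if (*p == '(') {
        p++;
        v = sum();
        skip_space();
        if (*p == ')')
            p++;
        return v;
    }
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

/* product: primary { ('*' | '/') primary }   (loop instead of left recursion) */
static int product(void)
{
    int v = primary();
    for (;;) {
        skip_space();
        if (*p == '*') {
            p++;
            v *= primary();
        } else if (*p == '/') {
            int d;
            p++;
            d = primary();
            v = d ? v / d : 0;   /* arbitrary choice for this sketch */
        } else {
            return v;
        }
    }
}

/* sum: product { ('+' | '-') product } */
static int sum(void)
{
    int v = product();
    for (;;) {
        skip_space();
        if (*p == '+') { p++; v += product(); }
        else if (*p == '-') { p++; v -= product(); }
        else return v;
    }
}

/* Evaluate an expression such as "2 + 3 * 4". */
int ll_eval(const char *s)
{
    p = s;
    return sum();
}
```

Each grammar rule becomes one function and the single character at p serves as the lookahead that selects the alternative. Because the loops fold operands from left to right, "10 - 3 - 2" is evaluated as (10 - 3) - 2.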

Example

Code review of llregex.pl.

http://www.clifford.at/papers/2004/compiler/llregex.pl

URLs and References

URLs and References

■ My Brainf*ck Projects: http://www.clifford.at/bfcpu/
■ The SPL Project: http://www.clifford.at/spl/
■ Clifford Wolf: http://www.clifford.at/
■ “The Dragonbook”: Compilers: Principles, Techniques, and Tools, by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Addison-Wesley, 1986. ISBN 0-201-10088-6
■ LINBIT Information Technologies: http://www.linbit.com/