Write your own JVM compiler, OSCON Java 2011, Ian Dees @undees



slide-1
SLIDE 1

Write your own JVM compiler

OSCON Java 2011 Ian Dees @undees

Hi, I’m Ian. I’m here to show that there’s never been a better time than now to write your own compiler.

slide-2
SLIDE 2

Time Abstraction compilers

the three paths

Why would someone want to do this? That depends on the direction you’re coming from.

slide-3
SLIDE 3

Time Abstraction compilers e-

xor eax, eax

If, like me, you came from a hardware background, compilers are the next logical step up the ladder of abstraction from programming.

slide-4
SLIDE 4

Time Abstraction compilers grammars logic λx.x e-

xor eax, eax

If you come from a computer science background, compilers and parsers are practical tools to get things done (such as reading crufty ad hoc data formats at work).

slide-5
SLIDE 5

Time Abstraction compilers grammars logic λx.x e-

xor eax, eax

If you’re following the self-made path, compilers are one of many computer science topics you may come to naturally on your travels.

slide-6
SLIDE 6

where we’re headed

By the end of this talk, we’ll have seen the basics of how to use JRuby to create a compiler for a fictional programming language.

slide-7
SLIDE 7

Thnad

a fictional programming language

Since all the good letters for programming languages are taken (C, D, K, R, etc.), let’s use a fictional letter. “Thnad,” a letter that comes to us from Dr. Seuss, will do nicely.

slide-8
SLIDE 8

function factorial(n) {
  if(eq(n, 1)) {
    1
  } else {
    times(n, factorial(minus(n, 1)))
  }
}

print(factorial(4))

Here’s a simple Thnad program. As you can see, it has only integers, functions, and conditionals. No variables, type definitions, or anything else. But this bare minimum will keep us plenty busy for the next half hour.
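
To check what this program should compute, here's a rough transliteration into plain Ruby. It assumes, based only on their names, that eq, times, and minus mean equality, multiplication, and subtraction; the talk doesn't define them at this point, so treat those three helpers as guesses.

```ruby
# Assumed meanings of Thnad's built-in functions (guessed from their names).
def eq(a, b)
  a == b
end

def times(a, b)
  a * b
end

def minus(a, b)
  a - b
end

# Same shape as the Thnad function: one conditional, one recursive call.
def factorial(n)
  if eq(n, 1)
    1
  else
    times(n, factorial(minus(n, 1)))
  end
end

puts factorial(4)
```

Under those assumptions the script prints 24, which is the value a working Thnad compiler should produce for the program on the slide.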

slide-9
SLIDE 9

before we see the code

hand-waving

Before we jump into the code, let’s look at the general view of where we’re headed.

slide-10
SLIDE 10

minus(n, 1)

:funcall :args ‘minus’ :name :number :name ‘n’ ’1’ :arg :arg

{ :funcall => { :name => 'minus' },
  :args    => [ { :arg => { :name   => 'n' } },
                { :arg => { :number => '1' } } ] }

We need to take a piece of program text like “minus(n, 1)” and break it down into something like a sentence’s parts of speech: this is a function call, this is a parameter, and so on. The code in the middle is how we might represent this breakdown in Ruby, but really we should be thinking about it graphically, like the tree at the bottom.
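
If the tree is hard to picture from the flattened slide, here's a small sketch in plain Ruby (no Parslet involved) that walks a nested hash of this general shape and prints one node per line, indented by depth. The helper name tree_lines is made up for illustration, and writing :funcall and :args as sibling keys is my assumption about the intended structure.

```ruby
# Hypothetical helper: turn nested hashes and arrays into indented lines.
def tree_lines(node, depth = 0)
  indent = '  ' * depth
  case node
  when Hash
    # Each key becomes a node; its value is drawn one level deeper.
    node.flat_map do |key, child|
      ["#{indent}#{key.inspect}"] + tree_lines(child, depth + 1)
    end
  when Array
    # Array elements are siblings, so they stay at the current depth.
    node.flat_map { |child| tree_lines(child, depth) }
  else
    ["#{indent}#{node.inspect}"]
  end
end

# Assumed shape of the parse result for "minus(n, 1)".
tree = {
  :funcall => { :name => 'minus' },
  :args    => [
    { :arg => { :name   => 'n' } },
    { :arg => { :number => '1' } }
  ]
}

puts tree_lines(tree)
```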

slide-11
SLIDE 11

minus(n, 1)

Funcall @args ‘minus’ @name @name ‘n’ Usage Number @value 1

Funcall.new 'minus', Usage.new('n'), Number.new(1)

The representation on the previous slide used plain arrays, strings, and so on. It will be helpful to transform these into custom Ruby objects that know how to emit JVM bytecode. Here’s the kind of thing we’ll be aiming for. Notice that the tree looks really similar to the one we just saw; the only differences are the names in the nodes.
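
As a rough idea of what such node classes could look like, here's a sketch. Only Number appears later in the talk (as a Struct with a value field); the Usage and Funcall definitions below, including their field names and the splat in the constructor, are assumptions chosen so that the call on the slide works as written. The ThnadSketch module name is made up to keep these apart from the real project.

```ruby
module ThnadSketch
  # A literal integer, mirroring the Number struct shown later in the talk.
  Number = Struct.new(:value)

  # A use of a parameter by name (field name is an assumption).
  Usage = Struct.new(:name)

  # A function call: a name plus any number of argument nodes.
  class Funcall
    attr_reader :name, :args

    def initialize(name, *args)
      @name = name
      @args = args
    end

    # Value equality, so two identical trees compare equal in tests.
    def ==(other)
      other.is_a?(Funcall) && name == other.name && args == other.args
    end
  end
end

call = ThnadSketch::Funcall.new('minus',
                                ThnadSketch::Usage.new('n'),
                                ThnadSketch::Number.new(1))

puts call.name
p call.args
```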

slide-12
SLIDE 12
  • 1. parse
  • 2. transform
  • 3. emit

Each stage of our compiler will transform one of our program representations—original program text, generic Ruby objects, custom Ruby objects, JVM bytecode—into the next.

slide-13
SLIDE 13
  • 1. parse

First, let’s look at parsing the original program text into generic Ruby objects.

slide-14
SLIDE 14

describe Thnad::Parser do
  before do
    @parser = Thnad::Parser.new
  end

  it 'reads a number' do
    expected = { :number => '42' }
    @parser.number.parse('42').must_equal expected
  end
end

Here’s a unit test written with minitest, a unit testing framework that ships with Ruby 1.9 (and therefore JRuby). We want the text “42” to be parsed into the Ruby hash shown here.

slide-15
SLIDE 15

tool #1: Parslet

http://kschiess.github.com/parslet

How are we going to parse our programming language into that internal representation? By using Parslet, a parsing tool written in Ruby. For the curious, Parslet is a PEG (Parsing Expression Grammar) parser.

slide-16
SLIDE 16

require 'parslet'

module Thnad
  class Parser < Parslet::Parser
    # gotta see it all
    rule(:space)  { match('\s').repeat(1) }
    rule(:space?) { space.maybe }

    # numeric values
    rule(:number) { match('[0-9]').repeat(1).as(:number) >> space? }
  end
end

Here’s a set of Parslet rules that will get our unit test to pass. Notice that the rules read a little bit like a mathematical notation: “a number is a sequence of one or more digits, possibly followed by trailing whitespace.”
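
To see what those rules are doing, here's the same "one or more digits, then optional whitespace" idea written by hand with StringScanner from Ruby's standard library. This is not how Parslet is implemented, just a stand-in for comparison; the parse_number name and the choice to return nil on failure are mine.

```ruby
require 'strscan'

# Hand-rolled stand-in for the :number rule. Returns a hash shaped like
# the one in the unit test, or nil if the whole input does not match.
def parse_number(text)
  scanner = StringScanner.new(text)

  digits = scanner.scan(/[0-9]+/)   # match('[0-9]').repeat(1)
  return nil unless digits

  scanner.scan(/\s+/)               # space? (trailing whitespace is optional)
  return nil unless scanner.eos?    # reject leftover input

  { :number => digits }
end

p parse_number('42')
p parse_number('42  ')
p parse_number('x42')
```

The Parslet version earns its keep once there are dozens of rules that refer to each other; for a single token, the hand-written version is about the same size.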

slide-17
SLIDE 17
  • 2. transform

The next step is to transform the Ruby arrays and hashes into more sophisticated objects that will (eventually) be able to generate bytecode.

slide-18
SLIDE 18

input    = { :number => '42' }
expected = Thnad::Number.new(42)

@transform.apply(input).must_equal expected

Here’s a unit test for that behavior. We start with the result of the previous step (a Ruby hash) and expect a custom Number object.

slide-19
SLIDE 19

tool #2: Parslet

(again)

How are we going to transform the data? We’ll use Parslet again.

slide-20
SLIDE 20

module Thnad
  class Transform < Parslet::Transform
    rule(:number => simple(:value)) { Number.new(value.to_i) }
  end
end

This rule takes any simple Ruby hash with a :number key and transforms it into a Number object.
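
For a feel of what a rule like this amounts to, here's a plain-Ruby sketch that walks a tree and replaces every hash of the form { :number => "..." } with a Number node. It is a simplification under stated assumptions: it only accepts String values and knows a single rule, and the TransformSketch module is a made-up name, not part of Parslet or Thnad.

```ruby
module TransformSketch
  Number = Struct.new(:value)

  # Rebuild the tree, swapping { :number => <string> } hashes for Number
  # nodes and leaving everything else untouched.
  def self.apply(tree)
    case tree
    when Array
      tree.map { |child| apply(child) }
    when Hash
      if tree.keys == [:number] && tree[:number].is_a?(String)
        Number.new(tree[:number].to_i)
      else
        tree.each_with_object({}) do |(key, child), out|
          out[key] = apply(child)
        end
      end
    else
      tree
    end
  end
end

p TransformSketch.apply({ :number => '42' })
p TransformSketch.apply({ :arg => { :number => '1' } })
```

The second call shows why the walk is recursive: the number hash is nested inside an :arg hash, and it still gets replaced.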

slide-21
SLIDE 21

require 'parslet'

module Thnad
  class Number < Struct.new(:value)
    # more to come...
  end
end

Of course, we’ll need to define the Number class. Once we’ve done so, the unit tests will pass.

slide-22
SLIDE 22
  • 3. emit

Finally, we need to emit JVM bytecode.

slide-23
SLIDE 23

bytecode

ldc 42

Here’s the bytecode we want to generate when our compiler encounters a Number object with 42 as the value. It just pushes the value right onto the stack.

slide-24
SLIDE 24

describe 'Emit' do
  before do
    @builder = mock
    @context = Hash.new
  end

  it 'emits a number' do
    input = Thnad::Number.new 42
    @builder.expects(:ldc).with(42)
    input.eval @context, @builder
  end
end

And here’s the unit test that specifies this behavior. We’re using mock objects to say, in effect, “Imagine a Ruby object that can generate bytecode; our compiler should call it like this.”

slide-25
SLIDE 25

require 'parslet'

module Thnad
  class Number < Struct.new(:value)
    def eval(context, builder)
      builder.ldc value
    end
  end
end

Our Number class will now need an “eval” method that takes a context (useful for carrying around information such as parameter names) and a bytecode builder (which we’ll get to on the next slide).
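
You can try this without BiteScript or a mocking library by handing eval a builder that just records what it was asked to do. The RecordingBuilder class below is a hypothetical stand-in I made up for this sketch; the only thing it shares with a real bytecode builder is that it responds to ldc.

```ruby
module EmitSketch
  # Same shape as the Number class on the slide.
  class Number < Struct.new(:value)
    def eval(context, builder)
      builder.ldc value
    end
  end

  # Hypothetical stand-in for a bytecode builder: it only records calls.
  class RecordingBuilder
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    def ldc(value)
      @instructions << [:ldc, value]
    end
  end
end

builder = EmitSketch::RecordingBuilder.new
context = {}  # unused by Number, but part of the eval signature

EmitSketch::Number.new(42).eval(context, builder)
p builder.instructions
```

Because the node only talks to the builder through method calls, the same eval method works unchanged against a mock, this recorder, or a real builder.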

slide-26
SLIDE 26

tool #3: BiteScript

https://github.com/headius/bitescript

A moment ago, we saw that we’ll need a Ruby object that knows how to emit JVM bytecode. The BiteScript library for JRuby will give us just such an object.

slide-27
SLIDE 27

iload 0
ldc 1
invokestatic example, 'minus', int, int, int
ireturn

This, for example, is a chunk of BiteScript that writes a Java .class file containing a function call—in this case, the equivalent of the “minus(n, 1)” Thnad code we saw earlier.

slide-28
SLIDE 28

$ javap -c example
Compiled from "example.bs"
public class example extends java.lang.Object{
public static int doSomething(int);
  Code:
   0: iload_0
   1: iconst_1
   2: invokestatic #11; //Method minus:(II)I
   5: ireturn

If you run that through the BiteScript compiler and then use the standard Java tools to view the resulting bytecode, you can see it’s essentially the same program.
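
If stack-based bytecode is new to you, it may help to trace those four instructions by hand. Here's a toy interpreter in plain Ruby that models just enough to do so: a list of locals (the method's arguments) and an operand stack. It is neither the JVM nor BiteScript; the run method, the instruction encoding, and the definition of minus as integer subtraction are all assumptions made for this sketch.

```ruby
# Toy model of the four instructions. Locals hold the method's arguments;
# the operand stack holds intermediate values.
def run(instructions, locals, functions)
  stack = []
  instructions.each do |op, *operands|
    case op
    when :iload         # push local variable number operands[0]
      stack.push(locals.fetch(operands[0]))
    when :ldc           # push a constant
      stack.push(operands[0])
    when :invokestatic  # pop the arguments, call, push the result
      name, arity = operands
      args = stack.pop(arity)
      stack.push(functions.fetch(name).call(*args))
    when :ireturn       # hand back the int on top of the stack
      return stack.pop
    else
      raise ArgumentError, "unknown instruction: #{op}"
    end
  end
  nil
end

program = [
  [:iload, 0],
  [:ldc, 1],
  [:invokestatic, 'minus', 2],
  [:ireturn]
]

# Assumed meaning of minus: ordinary integer subtraction.
functions = { 'minus' => ->(a, b) { a - b } }

p run(program, [5], functions)
```

With 5 as the only argument, the stack goes from [5] to [5, 1] to [4], and the call returns 4.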

slide-29
SLIDE 29

enough hand-waving

let’s see the code!

Armed with this background information, we can now jump into the code.

slide-30
SLIDE 30

a copy of our home game

https://github.com/undees/thnad

At this point in the talk, I fired up TextMate and navigated through the project, choosing the direction based on whim and audience questions. You can follow along at home by looking at the various commits made to this project on GitHub.

slide-31
SLIDE 31

more on BiteScript

http://www.slideshare.net/CharlesNutter/redev-2010-jvm-bytecode-for-dummies

There are two resources I found helpful during this project. The first was Charles Nutter’s primer on Java bytecode, which was the only JVM reference material necessary to write this compiler.

slide-32
SLIDE 32

see also

http://createyourproglang.com

Marc-André Cournoyer

The second helpful resource was Marc-André Cournoyer’s e-book How to Create Your Own Freaking Awesome Programming Language. It’s geared more toward interpreters than compilers, but was a great way to check my progress after I’d completed the major stages of this project.

slide-33
SLIDE 33

hopes

My hope for us as a group today is that, after our time together, you feel it might be worth giving one of these tools a shot, either for a fun side project or to solve some protocol parsing need you may encounter one day in your work.