Author Archive

The Prototype Pattern and Extensible Compilers

Saturday, November 1st, 2008

During some of the early design work on the Mention programming language, I recognized that a fixed parser would be unable to deal with a great number of interesting language problems. Things like embedded SQL and other DSL language, or perhaps even a unit notation.

All of these things could be added into a syntax, I believe, with a carefully constructed extensible parser. Each extension would become a libraries (modules) adding or replacing function points in the existing parser later reusing some of them as part of the extended syntax. (The actual parser would be based on Parsec and other similar combinator parsers. OMeta is also related to this parser architecture.)

The problem with adding new syntaxes to a language isn’t just the parser implementation however. Most of the really tricky problems actually come from the rest of the tool chain. Things like the compiler, debugger, profilers, editor, and a bunch of other potential tools. It had become clear to me that I wasn’t just working on an extensible parser, but rather an extensible language, and that meant an extensible compiler architecture.

Basically I thinking about a completely extensible intermediate language, which would power a set of semi-extensible libraries for implementing the bulk of the tool-chain. These libraries will take the leg-work out of processing the source code, and will allow the tools to manipulate or interact with the code in its intermediate form (this includes editors and analytical tools). The only problem I saw was that I had absolutely no idea how to construct such an intermediate language, little own how to write tools and libraries to interact with it.

Anyway, recently I run into the latest (at the time) Steve Yegge post, with a somewhat inspiring discussion about the Prototype Pattern (The Universal Design Pattern). After musing about pattern (again) for a couple of days, it finally hit me, I had the architecture all wrong. This isn’t just some extensible compiler or parser…this was the Self programming language…redesigned, implemented and tailored for the task of compiler writing. The AST is simply a specialized variation of the Prototype pattern, or more succinctly, a compiler is just a constrained dialect of Self, designed for compiler tasks.

So this my proposal is this, use the prototype pattern for Mentions AST and build a framework of libraries and tools designed to aid in the maintenance and development of an AST and the compilers, parsers, debugger, et cetera, which depend on it.

There are already a few pitfalls that I can imagine coming out of this system. For one, a good namespace or typing system will probably be required to prevent name conflicts. That is, each extension or collection of extensions would add only private collections of attributes, while a smaller group of shared public attribute interfaces would be used to facilitate cooperation between extensions. And then there’s still the issue of expected behavior, which may be complicated by the potential for different combinations of compiler extensions to inadvertently interact in unexpected ways.

In any case, I believe that this will be one of the key features of the core Mention programming language and it may even make some of the more complicated features like type systems easier to implement and experiment with later on. At the very least, you can expect to see some follow up material on this when I start to get further into the implementation of the Mention.

– Lorenz

Read Transparency

Wednesday, October 22nd, 2008

Over the past few months I’ve been putting together a collection of requirements, of sorts, for a new class of dynamic functional programming languages, a language that could provide powerful dynamic features on par with Smalltalk or Self (and JavaScript etc.), without losing some important features like laziness or strong static typing.

One of the requirements I keep stumbling across is the need for referentially transparent functions that are capable of interacting directly with mutable data structures. Yeah, the kind that, by definition, are not referentially transparent.

At first I considered using compiler attributes to mark a piece of code that interacts with side effects vs. code that doesn’t. Something like the IO/ST monads from Haskell but general enough any function can interact with mutable variables, not just ones returning a monadic value. Unfortunately any code that interacts with mutable variables may behave radically differently when not constrained by the full monadic infrastructure or some equivalent like strictness (Scheme and ML).

I believe the only useful solution is to define a new property of the code–read transparency–which allows us to recognize code that is referentially transparent on immutable data while also referentially transparent towards unchanging mutable variables.

A function is read transparent if it does not cause new side effects (either by IO, mutable state or otherwise), and the function produces equivalent result between any two calls provided the state of all arguments is equivalent (or the arguments are both immutable and the same values).

The above definition depends on the equivalence property of values, which I define along the lines of “any two data structures are equivalent if they are functionally the same value, or they only contain mutable variables containing equivalent state.” Given a good equivalence for all primitives this allows us to easily check for all functions which could safely be called in our new semi-functional programs.

The original motivating example for this concept was to allow functions like show in Haskell to work with any arbitrary data structure, mutable or not. This would greatly simplify programs like interpreters, which could use many of the same functions as pure code, without having to use somewhat cumbersome combinators like the lifting functions and unboxing functions (readIORef, etc.).

An important thing to remember about this property is that, while many previously referentially transparent functions would now only be read transparent, when used only with pure functional data, they are in fact promoted automatically to fully referentially transparent functions. That is, they remain effectively referentially transparent if you don’t care about mutable state, or to put it another way–the functions become more flexible, not less so. Only when the implementation of a function relies on language features that prevent mutable data from being used would the more restrictive referential transparency be required (like lazily evaluated results, incorporating the read operation).

We don’t actually have to stop there either. Just because a piece of code isn’t read transparent, that doesn’t mean that it suffers from all variations of side-effects. There are still yet more shades of grey which we can use to limit the damage caused by bugs and side-effects in our software. A property like write transparency for code that reads and writes to mutable data structures but performs no IO could be equally useful, just as the ST monad in Haskell, which cannot perform IO is useful for working with mutable state.

Unfortunately while our little definition is a rather interesting take on the mutable state issue, there are some rather serious caveats we still have to consider. The biggest problem we face here is that, while a read transparent function can read from mutable variables without any reprimands, it cannot use the unread mutable variable in the result, unless it uses it, still boxed. That is, if we returned a result lazily, the result might be different depending on when the result was used, definitely not something we want, and the primary reason that mutable state in Haskell is limited to the ST monad et cetera.

More concretely, we can return new mutable variables (since the result would always be equivalent), use the results from reading a mutable variable so long as we read it strictly before the function returns, and finally we can return the mutable variable verbatim as part of the result (something Haskell can do also). If any language implemented mutable variables unboxed however, they would not be usable in last context however, and you would always have to read from them strictly, leading to somewhat less flexibility in the language.

One last thing I’d like to add. The above definition is strikingly similar to something that might be rigorously defined and explored, but for some reason I haven’t done so in this post. Well, for all intents and purposes, I wouldn’t know where to start, at least not yet anyway. If and when I apply this property in an actual language implementation however, you can be sure I’ll have a rigorous writeup of it and it will likely be part of the languages spec too.

– Lorenz

Haskell and Game Development

Sunday, October 19th, 2008

Like many gamedev enthusiasts, I’ve only finished a handful of computer games and, perhaps not surprisingly, most of those were never released even on my homepage either. Usually the point of game development at an enthusiast or hobbyist level is not to create gaming masterpieces anyway, but rather to simply learn about the process and have fun working the code (or graphics, design, et cetera), and maybe even develop a few reasonably simple games.

Over the years I’ve spend quite a bit of time learning the gamedev concepts and recently I was able to apply some of them in a project, intended to test Haskell’s gamedev capabilities for hobbyist or small team game development (related post). Specifically I was interested to see how the difference between Haskell and Object-Oriented languages changed the design and implementation of the game code and engine, hopefully lending itself to the eventual development of languages tailored towards game development.

I set out to develop one of my favorite arcade remakes–yet another variation of Geometry Wars. The premise would be to keep the game as simple as possible, only the basic gameplay would be present with a couple of levels of progression. Enough to test all of the simple elements of gameplay ready for a more substantial project. As you may have guessed from the absence of the obligatory screenshot it didn’t work out as well as I expected.

Luckily I did pick up a couple of things while working on the game, and better yet I may even be able to rewrite the code and find that screenshot yet. It all comes down to a couple of quirks in Haskell’s syntax which didn’t blend well with my usual game architecture. Obviously the architecture would need to be modified to suite the idiomatic style of Haskell, but I was reasonably certain (and still are) that the architecture is the simplest implementation for both Object-Oriented languages as well as functional languages, based almost solely on the requirements of the high level game data structures.

The Haskell code for these structures was essentially based directly on records. The records originally contained only functional values, but I quickly realized how crazy that was, which lead me to the current design which uses records full of IORefs, at least for the mutable slots. (If its not immediately obvious why the IORefs are needed, it might help to know that game objects quite often need to reference each other simply because the operations are related between the two…bullets from the player ship for example.)

Finally, after working with this approach for a while it eventually sunk in that I could probably write the same code in Python in not only half the time, but also quicker for the same number of overall bugs. Not because the bugs are easier to catch in Python…they really, really aren’t. But because Python allows me to test many more variations of the game. At least more variations than Haskell seems to, all of which makes me wonder if I haven’t quite tapped into the idiomatic code still.

So I’m not quite out of ideas yet. I believe I can tweak my accessors to reduce the namespace damage caused by them (at the expense of some verbosity, however…why the hell hasn’t this been fixed yet anyway). And there’s even some clever usage of typeclasses which could simplify the use of common algorithms between game objects. If all goes to plan I’ll have some code to post next time…

– Lorenz

Yet More Work On The Blog

Saturday, October 4th, 2008

Theres been quite a lot of work going into the blog under the surface recently. The most important being an update to the current version of…well everything. The most obvious ramification of this process have been the change in theme, back to the default bundled theme (not counting the header color which is modifiable in the settings page of the them). I intend to keep this version which is considerably faster (than K2) and I think still more than usable, while I start developing a completely new theme for the blog (no time frame however, as per usual).

The other change is not particularly obvious, but basically the site is now under the Git DVCS. This will make it significantly simpler to manage any updates of the site while concurrently maintaining the code I write for the plugins and visual style. Basically the site is maintained through a collections of branches which are rebased into the deployed branch which is finally published to the server. The files on the server are then reset to the latest version of the deployed branch. The side affect of this being, I can update any single piece using its corresponding feature branch and simply rebase the dependent branches to the last patch on that branch, while any changes after the branch or merge point are simply re-applied afterwards (by Git, not me). Update…fix conflicts…push…done!

So I promised more actual posts I believe, I am getting to it, albeit slowly.

For some background into the lack of actual language related work I’ve produced lately it helps to understand, arguably the most important, project that I’m working on right now. The Mention Programming Language is a new language that I’ve had in the works since the very beginning of this blog. At one point I even had some pages dedicated to the language. Don’t bother looking, there are no posts about the language yet… I decided very early on to develop the language in the dark, at least until I was sure that it would lead to an actual release of some sort (I’ve worked on quite a few dead language designs over the years).

So Mention is a functional programming language with strong influences from Haskell, and Smalltalk. Yup, you heard right, the design of the language is a dynamic one, which despite the Haskell based syntax will provide a myriad of dynamic features like delegation and accessors more, much more reminiscent of Smalltalk than Haskell. There are a couple of other minor influences that creep in there every so often, due to other languages I’ve worked with or investigated (the design), but those are the main ones.

There’s only one problem, however. I don’t understand the math (lambda calculus and type theory) needed to implement the language yet. I’m getting there but it is taking a little while. The good news is that this wont stop me from posting for much longer. Some of the posts might be about specific things relating the the theory, but the real problem was my lack of a mathematical background. Thanks to my undergrad math text that problem is much less of an issue now and it the Mention project should be getting underway properly very soon, expect some posts on the non-implementation issues like the syntax high level semantics.

On another issue might now be obvious why I haven’t finished the Scheme parsing library and the corresponding series about its design and implementation… I’m not working in Scheme anymore. There is still plenty I could say about the design of Scheme and Lisp dialects, but I’ll leave that for later, but what I would like to add in post in the near future is the relationship between laziness + monads and strictness + mutable-state. For this comparison I’ll be using Scheme and Haskell (don’t worry…the code will only be for examples in both cases). What won’t be happening is any more posts or work on the Scheme parser itself.

– Lorenz

New URL, Update your bookmarks…

Tuesday, September 16th, 2008

After sitting on the domain name associated with my online handle (Krysole), I’ve finally decided it was time to move my blog to an URL using its own namesake.

What does this mean?

From now on the proper URL for A Lexical mistake is simply alexicalmistake.com

The configuration for the site seems to properly redirect pretty much everything to the new URL (including permalinks… I’ll keep the old URL pointing to the site for this very reason).

On another issue, the blog very much isn’t dead, it just seems that way because I’ve been somewhat preoccupied lately. The main reason for this is my move to Linux, or more specifically Ubuntu. I’ve also been transitioning most of my mac dependencies to their web equivalents (Google reader, etc).

The point is I should be back to posting biweekly again (or close too), very soon now.

– Lorenz