Tuesday, December 31, 2013

Book Review: Keeping Up With the Quants

When all you have is a six-step formula, even a philosopher running naked through the streets of Athens, genitals flapping furiously in the breeze, can be shoehorned to fit into your pet paradigm.

Don't. Buy. This. Book.

Credit is, however, due to the authors for turning what should have been nothing more than a lengthy blog post into a book with all of 200-odd double-spaced pages. It does help when you can liberally quote blog posts and other more-or-less unrelated text snippets as well, not to mention repeat your six-step mantra at every opportunity you get. No credit to you if you were unlucky enough to have bought the book rather than borrowing it from the library like I did.

While on the subject of inducements to buy, here's a pro-tip for aspiring authors: liberal application of maalish, in the form of quotes for your book from popular figures (Larry Summers excluded; the page containing his words of wisdom alone is enough grounds for taking the book to the shredder and torturing it slowly, while giving it false hopes by periodically pretending to put it back on the bookshelf) can persuade these quoters (don't bother; it's not a real word) to provide some more quotes to adorn the back of the dust jacket.

One might argue that the book is targeted at senior executives who don't know quantitative analytics, but if a senior executive reads gems like "Since data itself does not tell us anything, we need to analyze it in order to decipher its meaning and relationships" and a light bulb goes off in his brain, it's time to short his company's stock, folks.

The book does have some good things, stuff like when to use which statistical technique, references to other material, and so on, but as I mentioned earlier, nothing that can't be said in a crisp blog post.

Oh, and providing free advertising for NCSU's MSA program? You've got to be kidding me.

Friday, December 20, 2013

Building a Lisp Interpreter from Scratch -- Part 13: Updates

(This is Part 13 of a series of posts on pLisp)

Quite a few of the posts in this series are woefully out of sync with the current code base, so I thought I'd do a post on the most significant developments.

Memory System

It turns out that we can have our cake and eat it too: native pointers/memory via malloc/free that still allow us to tag them with object types. The magic incantation that makes this possible is posix_memalign(), which allows us to create aligned memory chunks; if we request memory aligned to a specified boundary, we get pointers with the corresponding lower bits zeroed out (alignment to 2^n bytes leaves the n lowest bits zero). Voila, we now have a place to tag our object types.

Using native pointers this way means that we don't need to manage the heap ourselves. No more mucking around with free lists, start and end segments, and so on. Life just became a whole lot simpler.
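To make the trick concrete, here is a minimal sketch of tagging an aligned pointer. This is not pLisp's actual code: the names (alloc_tagged, tag_of, address_of, free_tagged), the four-bit tag width, and the tag values are all assumptions made for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: 4 tag bits, hence 16-byte aligned allocations. */
#define TAG_BITS  4
#define TAG_MASK  ((uintptr_t)((1u << TAG_BITS) - 1))
#define ALIGNMENT ((size_t)1 << TAG_BITS)

typedef uintptr_t tagged_ptr;

/* Hypothetical tag values, not the ones pLisp uses. */
enum { TAG_CONS = 1, TAG_INTEGER = 2 };

/* Allocate `size` bytes and store `tag` in the zeroed low bits.
   Returns 0 if the allocation fails. */
tagged_ptr alloc_tagged(size_t size, unsigned tag)
{
    void *p = NULL;
    assert(tag <= TAG_MASK);
    if (posix_memalign(&p, ALIGNMENT, size) != 0)
        return 0;
    return (uintptr_t)p | tag;
}

/* Recover the tag from the low bits. */
unsigned tag_of(tagged_ptr obj)
{
    return (unsigned)(obj & TAG_MASK);
}

/* Lop off the tag bits to get the native pointer back. */
void *address_of(tagged_ptr obj)
{
    return (void *)(obj & ~TAG_MASK);
}

/* Memory from posix_memalign() is released with plain free(). */
void free_tagged(tagged_ptr obj)
{
    free(address_of(obj));
}
```

The alignment has to be a power of two and a multiple of sizeof(void *), which 16 satisfies on both 32- and 64-bit platforms.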

Object System

OBJECT_PTRs are still unsigned ints, but they're no longer indexes into the heap that are decorated with tag values; they are uintptr_t values (derived from posix_memalign(); see above) to which the tag has been appended. The same logic for marshalling and unmarshalling objects still applies, of course, the only catch being that the number we get after lopping off the tag bits is not an index into a heap managed by ourselves, but is a native pointer.

The other change in the object system has to do with integers/floats and with continuation objects. Integers and floats are also created using posix_memalign(), with the corresponding tag appended to the pointers. This is somewhat inefficient, since new space is allocated for each occurrence of a number, but it doesn't seem to affect performance that much. Coming to continuation objects, we create a one-word space and plonk the relevant stack object there. The earlier hackery is thus mercifully done away with.
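As a rough illustration of what boxing an integer this way might look like (again a sketch, not the real pLisp source), the snippet below assumes a 16-byte alignment, a made-up INTEGER_TAG value, and hypothetical helper names. It also defines OBJECT_PTR as uintptr_t purely for the sake of the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define INTEGER_TAG ((uintptr_t)2)    /* hypothetical tag value */
#define TAG_MASK    ((uintptr_t)0xF)  /* assumes 16-byte alignment */

typedef uintptr_t OBJECT_PTR;         /* assumption for this sketch */

/* Box an int: every call allocates fresh space, even for equal values.
   Returns 0 if the allocation fails. */
OBJECT_PTR make_integer(int value)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, sizeof(int)) != 0)
        return 0;
    *(int *)p = value;
    return (uintptr_t)p | INTEGER_TAG;
}

int is_integer(OBJECT_PTR obj)
{
    return (obj & TAG_MASK) == INTEGER_TAG;
}

/* Strip the tag and read the boxed value. */
int integer_value(OBJECT_PTR obj)
{
    assert(is_integer(obj));
    return *(int *)(obj & ~TAG_MASK);
}

void free_integer(OBJECT_PTR obj)
{
    free((void *)(obj & ~TAG_MASK));
}
```

Calling make_integer(7) twice yields two distinct objects holding the same value, which is exactly the inefficiency mentioned above.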

Garbage Collection

The main change here is that garbage collection no longer happens only after each top-level expression is evaluated. While this approach is simpler, it breaks down when a large computation is in progress and there is no memory available because GC hasn't happened yet. The solution is to trigger GC after every, say, 1000 instructions of the virtual machine. The trick is to protect the VM's ISA elements (reg_accumulator, reg_current_env, and their brethren) from being garbage collected -- this is done by keeping them pinned to the grey set.
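The periodic trigger can be sketched as follows. To keep it short, this swaps in a toy mark-and-sweep over a flat list of childless objects rather than the grey-set scheme described above; the only thing it shares with pLisp is the idea of counting instructions and always treating the registers as roots. The vm struct, vm_alloc, vm_tick, and vm_live_count are hypothetical names; reg_accumulator and reg_current_env are borrowed from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define GC_INTERVAL 1000   /* instructions between collections */

typedef struct object {
    struct object *next;   /* intrusive list of every allocation */
    bool marked;
} object;

typedef struct {
    object *reg_accumulator;   /* VM registers double as GC roots */
    object *reg_current_env;
    object *all_objects;
    unsigned long since_gc;    /* instructions since the last GC */
    unsigned long gc_runs;
} vm;

object *vm_alloc(vm *m)
{
    object *o = calloc(1, sizeof *o);
    if (!o)
        abort();
    o->next = m->all_objects;
    m->all_objects = o;
    return o;
}

static void gc(vm *m)
{
    /* Mark: the registers are always considered reachable. */
    if (m->reg_accumulator) m->reg_accumulator->marked = true;
    if (m->reg_current_env) m->reg_current_env->marked = true;

    /* Sweep: free everything unmarked, clear marks on survivors. */
    object **link = &m->all_objects;
    while (*link) {
        object *o = *link;
        if (o->marked) {
            o->marked = false;
            link = &o->next;
        } else {
            *link = o->next;
            free(o);
        }
    }
    m->gc_runs++;
}

/* Call once per executed VM instruction. */
void vm_tick(vm *m)
{
    if (++m->since_gc >= GC_INTERVAL) {
        gc(m);
        m->since_gc = 0;
    }
}

size_t vm_live_count(const vm *m)
{
    size_t n = 0;
    for (const object *o = m->all_objects; o; o = o->next)
        n++;
    return n;
}
```

A real collector would also trace the objects reachable from the registers; the sketch skips that because its objects have no children.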

Serialization

Because we no longer manage the heap ourselves, using TPL for serialization is not feasible any more: there is no single pointer to serialize. We therefore serialize and deserialize the relevant OBJECT_PTRs ourselves. The format chosen is JSON, and we use the cJSON library for this (Update: have replaced this with a homegrown JSON parser, on account of the need to handle large JSON arrays in a performant way). One positive impact of this change is that the image file is now practically tiny (about 600K) compared to earlier, when the entire heap -- whether used fully or not -- was dumped to hard disk. Oh, we also now 'serialize' the loaded foreign libraries. Update: serialization and deserialization are now possible at the level of individual objects too, through the SAVE-OBJECT and LOAD-OBJECT special forms.
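The core difficulty is that native pointers mean nothing once the process exits, so each reference has to be written out as something stable, such as an index. The sketch below shows that idea on a toy linked list of ints. The JSON layout (an array of [value, next_index] pairs, with -1 standing for nil) is invented for this example and is not pLisp's image format; cell and serialize_list are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct cell {
    int value;
    struct cell *next;
} cell;

/* Write the list into buf as a JSON array of [value, next_index] pairs.
   Pointers are replaced by positions in the array; -1 stands for nil.
   Returns the number of characters written, or -1 if buf is too small. */
int serialize_list(const cell *head, char *buf, size_t cap)
{
    size_t used = 0;
    int index = 0;

    int n = snprintf(buf, cap, "[");
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (const cell *c = head; c; c = c->next, index++) {
        int next_index = c->next ? index + 1 : -1;
        n = snprintf(buf + used, cap - used, "%s[%d,%d]",
                     index ? "," : "", c->value, next_index);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, cap - used, "]");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

Deserialization would run the mapping in reverse: allocate one cell per array entry, then patch each next pointer from its stored index.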

The UI

The UI is a bit more user-friendly now; there is parens matching, some rudimentary indentation (nothing fancy, but better than nothing), and using the workspace efficiently got a bit easier. Not sure whether I mentioned this in one of the earlier posts, but there is now a debugger window that helpfully shows the in-progress frames and their respective local variables. We're not in Smalltalkville yet, but getting there.

P.S. There is now also a special form called 'profile' which takes as argument an expression and prints the profiling information: for each sub-expression called by this expression, the number of times the sub-expression was called, how much wall- and CPU time the sub-expression took, and the number of words allocated during the evaluation of the sub-expression. Screenshots below.



Tuesday, December 10, 2013

December 9, 2013

Ivory tower jargon (actual quote from a scientific article):
In the everyday exercise of controlling their locomotion, humans rely on their optic flow of the perceived environment to achieve collision-free navigation
Translation for us mere mortals:
While walking, people use their eyes to avoid bumping into things.
Gah.