Swaroopch.com

BG's Clojure Course Day 1

2013-08-23

Continuing my journey of learning Clojure, I am attending BG's Clojure Course. Today was Day 1. ~30 people in one room and BG as teacher for the weekend. Of course, I was looking forward to it. Most people in the audience had a background in some mixture of C/C++, Java/Android, Python and Ruby.

BG Clojure Workshop

Below are my rough notes from the day:

Day 1

BG introduced Clojure as "invented in 2007 by a wicked-cool, guitar-playing, computer scientist cum Zen master", referring to Rich Hickey.

Emphasized that it is an abstraction-oriented language.

First known production usage of Clojure was in 2009 for a message bus infrastructure for a veterinary hospital in Canada. In 2010, BG's company launched paisa.com in Clojure. BG's company's new product, HelpShift, is also built on Clojure.

Three tenets of Clojure:

Simplicity

Simplicity is objective, easy is subjective

Frameworks (forced structure) vs. Libraries (opt-in functionality)

Power

Practical

Leverage - designed to be a hosted language - using the underlying platform doesn't require a wrapper or different API

Platforms today are JVM, V8, CLR, Python, LLVM, etc.

Focus

Focus on my problem (readability of code), not the language

A language as powerful as Lisp is timeless. Lisp is based on Lambda Calculus, the ultimate abstraction (BG made me realize why the website lambda-the-ultimate.org is so named).

BG used indexOfAny from Apache Commons Lang as a great example to explain the concept of complecting, then showed how you would do it in Clojure: just a few simple lines that were more generic and more useful.
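The Clojure code from the session is not reproduced in these notes. A minimal sketch of the idea, modeled on the well-known version in Stuart Halloway's book Programming Clojure (the helper names indexed, index-filter and index-of-any are illustrative, not necessarily what BG showed):

```clojure
;; Pair every element with its index: (indexed "ab") => ([0 \a] [1 \b])
(defn indexed [coll]
  (map-indexed vector coll))

;; Lazily yield the indices of all elements that satisfy pred.
(defn index-filter [pred coll]
  (for [[idx elt] (indexed coll) :when (pred elt)]
    idx))

;; indexOfAny is then just "the first matching index" (nil if there is none).
(defn index-of-any [pred coll]
  (first (index-filter pred coll)))

(index-of-any #{\b \y} "zzabyycdxx") ; => 3
(index-of-any even? [1 3 4 7])       ; => 2
(index-of-any #{\q} "zzabyycdxx")    ; => nil
```

Because the predicate and the collection are both arguments, the same few lines work on strings, vectors or any other seqable, which is roughly what "more generic" means here.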

Simple introduction to Lisp syntax.

REPL is a puzzle / surprise for the Java guys in the audience.

REPL is to language what terminal is to your operating system.

Namespaces are reified, unlike Java packages: a namespace exists as an object at runtime, whereas a package is only used for loading classes and cannot be modified at runtime. So in Clojure you can remove a symbol from a namespace at runtime (using ns-unmap); you cannot do that in Java.

Introduction to reader - converts literal forms to core Lisp syntax to hand over to compiler.

Introduction to symbols, keywords and namespaces.

Namespaces are a map data structure underneath.
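A small sketch of what "reified" and "a map underneath" look like at the REPL. The namespace name demo.scratch and the symbol answer are made up for illustration:

```clojure
;; Create a scratch namespace at runtime and add a var to it.
(create-ns 'demo.scratch)
(intern 'demo.scratch 'answer 42)

(ns-resolve 'demo.scratch 'answer)          ; => #'demo.scratch/answer
(map? (ns-map 'demo.scratch))               ; => true (symbol -> var/class mappings)
(contains? (ns-map 'demo.scratch) 'answer)  ; => true

;; Remove the symbol again while the program is running.
(ns-unmap 'demo.scratch 'answer)
(ns-resolve 'demo.scratch 'answer)          ; => nil
```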

Core data structures / "collections" are list, vector, map, set. All are immutable.

type: properties

list: singly linked, insert at front

vector: indexed, insert at rear

map: key/value

set: key

Maps are among the most important data structures because they are pervasive.

Collections are persistent, immutable, abstraction-oriented.

Persistent means we get immutability with performance; it has nothing to do with Hibernate or saving to a database. Phil Bagwell described the underlying structure, the HAMT (Hash Array Mapped Trie), in his paper "Ideal Hash Trees". Rich Hickey adapted it into persistent, bit-partitioned tries with a branching factor of 32, which keeps the trees very shallow and gives the performance.

Why 32, not 64? The reason given was cache-friendliness: a 32-slot node is small enough to be fetched in very few CPU cache lines, while still keeping the tree shallow.
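To make this concrete, here is a small sketch. The first half shows persistence as seen by the user; the second half illustrates bit partitioning only, and index-path is a hypothetical helper, not Clojure's internal implementation:

```clojure
;; Persistence: conj returns a new vector and leaves the old one untouched
;; (internally the two share most of their structure).
(def v1 [1 2 3])
(def v2 (conj v1 4))
v1 ; => [1 2 3]
v2 ; => [1 2 3 4]

;; Bit partitioning: split an index into 5-bit chunks, one chunk per
;; tree level, because 2^5 = 32 children per node.
(defn index-path [i levels]
  (for [shift (range (* 5 (dec levels)) -1 -5)]
    (bit-and (bit-shift-right i shift) 0x1f)))

(index-path 1000 2) ; => (31 8), because 1000 = 31*32 + 8
(index-path 1000 3) ; => (0 31 8)
```

With 32-way branching, six levels are enough to address 32^6 = 2^30 (about a billion) elements, which is why lookups stay close to constant time in practice.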

Introduction to conj, the magnificent.
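A quick sketch of why conj earns that title: one function, and each collection type adds the element wherever it is cheapest for that type.

```clojure
(conj '(1 2 3) 0)     ; => (0 1 2 3)    list: added at the front
(conj [1 2 3] 4)      ; => [1 2 3 4]    vector: added at the rear
(conj {:a 1} [:b 2])  ; => {:a 1, :b 2} map: takes a [key value] pair
(conj #{1 2} 2 3)     ; a set equal to #{1 2 3}; the duplicate 2 is ignored
```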

Introduction to apply function - the difference between vec and vector.
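A short sketch of the difference: vector takes its elements as separate arguments, vec takes a single collection, and apply bridges the two by spreading a collection into arguments.

```clojure
(vector 1 2 3)           ; => [1 2 3]    elements as separate arguments
(vec '(1 2 3))           ; => [1 2 3]    one collection argument
(vector '(1 2 3))        ; => [(1 2 3)]  a one-element vector holding the list
(apply vector '(1 2 3))  ; => [1 2 3]    same as (vector 1 2 3)
(apply + 1 2 [3 4])      ; => 10         leading arguments may precede the collection
```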

into is used to convert sequences back into collections, the opposite of seq.

A seq returned by seq is guaranteed to have a first and a rest; seq never returns an empty sequence, which is why (seq []) returns nil.
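A small sketch of seq and into working in opposite directions. The function describe is a hypothetical helper showing the common (if (seq coll) ...) idiom that follows from seq returning nil:

```clojure
(seq [1 2 3])              ; => (1 2 3)
(seq [])                   ; => nil
(seq {})                   ; => nil

(into [] (seq [1 2 3]))    ; => [1 2 3]
(into {} [[:a 1] [:b 2]])  ; => {:a 1, :b 2}
(into #{} '(1 2 2 3))      ; a set equal to #{1 2 3}

;; Because nil is falsey, (seq coll) doubles as a "not empty" test.
(defn describe [coll]
  (if (seq coll) :non-empty :empty))

(describe [])   ; => :empty
(describe [0])  ; => :non-empty
```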

Learned about update-in which is related to assoc-in and realized I was abusing the latter to do the former in my own code.
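A sketch of the difference, using a made-up user map: assoc-in sets a new value at a path, while update-in applies a function to the value already there.

```clojure
(def user {:name "Asha" :stats {:visits 9}})

;; assoc-in: you supply the new value.
(assoc-in user [:stats :visits] 10)
;; => {:name "Asha", :stats {:visits 10}}

;; update-in: you supply a function of the old value.
(update-in user [:stats :visits] inc)
;; => {:name "Asha", :stats {:visits 10}}

;; The "abuse" pattern: reading the old value by hand just to write it back.
(assoc-in user [:stats :visits] (inc (get-in user [:stats :visits])))
;; => same result, but the path is repeated and easy to get out of sync
```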

Introduction to some and every?, and to the trailing ? as a naming convention for predicates ("test" functions that return a boolean).
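A short sketch of both. Note that some has no trailing ? because it returns the first truthy result of the predicate (or nil) rather than a strict boolean:

```clojure
(every? even? [2 4 6])      ; => true
(every? even? [2 3 6])      ; => false

(some even? [1 3 4])        ; => true  (the result of (even? 4))
(some even? [1 3 5])        ; => nil

;; A set used as the predicate returns the matching element itself.
(some #{:b :c} [:a :b :c])  ; => :b
```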

Always prefer the fn form over the #(+ %1 %2) lambda form. For example, #([%1 %2]) fails because it expands to (fn [x y] ([x y])), which tries to call the vector as a function with no arguments; use either #(vector %1 %2) or (fn [x y] [x y]) (in this case, though, simply use vector).
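A sketch of the pitfall and the working alternatives:

```clojure
;; Three equivalent ways to pair up elements.
(map #(vector %1 %2) [1 2] [:a :b])   ; => ([1 :a] [2 :b])
(map (fn [x y] [x y]) [1 2] [:a :b])  ; => ([1 :a] [2 :b])
(map vector [1 2] [:a :b])            ; => ([1 :a] [2 :b])

;; #([%1 %2]) reads as (fn [a b] ([a b])): the freshly built vector is
;; called as a function with zero arguments, which throws at call time.
(try
  (#([%1 %2]) 1 2)
  (catch clojure.lang.ArityException e
    :arity-exception))
;; => :arity-exception
```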

Got people to write and think functionally, which was challenging and exciting because it made people think. For example: find the 100th Fibonacci number using iterate.

My naive solution was this:
[clojure]
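;; One step of the iteration: [a b] -> [b (a + b)].
;; +' promotes to BigInt instead of overflowing.
(defn fib
  [[x y]]
  [y (+' x y)])

;; 1-based, starting from 0: (nth-fib 1) is 0, (nth-fib 2) is 1, ...
(defn nth-fib
  [num]
  (first (first (take 1 (drop (- num 1) (iterate fib [0 1]))))))

(nth-fib 100)
;; => 218922995834555169026N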
[/clojure]

which was quite close to the example code that BG showed, after getting people to rack their brains, so that was nice. One improvement was he used nth instead of my combination of take and drop.
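BG's exact code is not recorded in these notes, so the following is only a sketch of what the nth-based variant might look like (the name fib-step is an assumption for illustration):

```clojure
;; One step of the iteration: [a b] -> [b (a + b)], promoting to BigInt as needed.
(defn fib-step [[x y]]
  [y (+' x y)])

;; nth replaces the take/drop combination; n is 1-based as in the version above.
(defn nth-fib [n]
  (first (nth (iterate fib-step [0 1]) (dec n))))

(nth-fib 1)    ; => 0
(nth-fib 10)   ; => 34
(nth-fib 100)  ; => 218922995834555169026N
```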

I got reminded of the Tutorial on Good Lisp Programming Style where Peter Norvig says that most algorithms are a combination of the following:

Searching

Sorting

Filtering

Mapping

Combining

Counting

Introduction to Java interop. Can't believe how well Clojure is designed for hosted-platform interoperability, esp. the ., (.method object) and doto forms.
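A few interop forms, sketched against standard JDK classes only (the var name names is made up):

```clojure
;; Instance method calls: the sugared form and the raw dot form.
(.toUpperCase "clojure")   ; => "CLOJURE"
(. "clojure" length)       ; => 7

;; Static method and static field.
(Math/max 3 7)             ; => 7
Integer/MAX_VALUE          ; => 2147483647

;; doto calls several methods on the same object and returns that object,
;; which suits mutable Java builders and collections.
(def names
  (doto (java.util.ArrayList.)
    (.add "ada")
    (.add "grace")))

(vec names)                ; => ["ada" "grace"]
```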

Introduced more details of functions such as multiple arities and variadic functions.
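A sketch of both features, with made-up function names:

```clojure
;; Multiple arities: one function, several parameter lists.
;; Shorter arities usually delegate to longer ones with defaults.
(defn greet
  ([] (greet "world"))
  ([name] (greet "Hello" name))
  ([greeting name] (str greeting ", " name "!")))

(greet)                 ; => "Hello, world!"
(greet "BG")            ; => "Hello, BG!"
(greet "Namaste" "BG")  ; => "Namaste, BG!"

;; Variadic: & collects the remaining arguments into a sequence.
(defn log-line [level & parts]
  (str "[" level "] " (apply str (interpose " " parts))))

(log-line "INFO" "server" "started")  ; => "[INFO] server started"
(log-line "WARN")                     ; => "[WARN] "
```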

Most people seemed enthralled by the idea of functional programming and succinct code in Clojure, but are still grappling with it, which was fine because it was just the first day of immersion and a lot of conceptual ground was covered by BG.

Looking forward to Day 2 where BG said we'll look into protocols, macros, etc.

Clojure Lessons Learned So Far

As an aside, I think the lessons I have learned from Clojure in the past year could be summarized as:

Model the data (core data structures - map, vector, set) + lots of functions to manipulate it : Simpler + Less code - compare JSON parsing in Clojure vs. Java (GSON) vs. Hide data with method wrappers inside opaque objects. I learned this first from Perl but this lesson was forgotten after C++ OOP, Python classes, Ruby on Rails, Java OOP, etc. over the years, and now re-learned through Clojure.

Lisp syntax means no difference between built-in vs. user-defined functionality vs. Code becomes ugly if non-built-in function, hence the need for monkey-patching, e.g. things.get(things.size() - 1) in Java == things[-1] in Python == (last things) in Clojure, and in this case last could be either built-in or user-defined

Separation between state, time, value, identity : Core philosophy of Clojure explained in Rich Hickey's talk "Are We There Yet?" : Account for time (and hence avoid concurrency issues) : an identity corresponds to a series of immutable values, one at each point in time, each called a state. Observers pick and use a state. => Values and Identities are persistent data structures. Timelines and perception implemented using Agents, STM, MVCC, etc. This point still has to sink in, but something I should remind myself from time to time to fully grok it.

Use lots of generic functions. Code that I used to write as is_ssl = p.startswith("/a") or p.startswith("/b") becomes is_ssl = any(p.startswith("/" + k) for k in ("a", "b")) and the latter code is much easier to modify later, esp. adding new items to that list.
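The same idea in Clojure might look like the sketch below; ssl-path? and its arguments are hypothetical names for illustration:

```clojure
;; True when path starts with "/" followed by any of the given prefixes.
;; some returns the first truthy result or nil, so boolean normalizes it.
(defn ssl-path? [path prefixes]
  (boolean (some #(.startsWith ^String path (str "/" %)) prefixes)))

(ssl-path? "/a/login" ["a" "b"])  ; => true
(ssl-path? "/c/home" ["a" "b"])   ; => false
```

Adding a new prefix is then a change to the data (the vector), not to the logic.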

Actually, I think that first point needs more elaboration:

Dear lazytwitter: what is a good name for "data that can be serialized to JSON"? jsonable? We need a name for this!

-- @nedbat

"JSONable": interchange, jsonic, _asdict, serializable, json-serializable, structured, organized, external, plain-old-data. others? votes?

-- @nedbat

From https://news.ycombinator.com/item?id=3917695 :

> I'm not so sure I agree that simple hashes are the best choice for internal data representations

I've been contributing to the Clojure community lately. My experience working with hash-maps as the primary data structure has been entirely liberating.

At my startup, we've got an app with Sinatra services, a Rails API, a Node.js web frontend, and Backbone client code. JSON gets passed between them. Being forced to encode keys as strings is a mild annoyance that Clojure's reader syntax avoids, but the real issue is that I've got raw JSON, Javascript domain objects (Backbone.Model), Ruby models (ActiveRecord), and Ruby hashes (hashie/mash/similar). Each has their own idiosyncrasies and interfaces. Of all of them, the raw JSON is most pleasurable to work with. CoffeeScript & Underscore.js roughly approximate 10% of the awesomeness that is Clojure's core data structures, including maps, sets, vectors, and lazy-seqs.

ActiveRecord, for example, makes it super easy to tangle a bunch of objects up. If we had a big bag of functions, they could operate on in-memory hashes, or they could operate on database rows, or they could operate on the result of an API call. It would be so much simpler to reuse code between our main Rails API and our Sinatra service. And we could one-for-one translate functions for non-Ruby services. Instead of requiring a crazy tangled mess of polymorphism and mutable state.

> Every non-trivial program is going to have to define abstract datatypes

Absolutely true. However, Clojure has taught me that you really ought to only define a very small number of those. It's been said that it's much better to have 100 functions which operate on 1 data structure, than to have 10 functions that operate on 10. Clojure's get-in function for example: (get-in some-hash [:some :key :path]) is glorious compared to Ruby's somehash[:some][:key][:path] because you don't need to go monkey patch in a getpath method. And even if you did monkey patch that in, it won't work for the someobject.some.key.path case, unless you got fancy with object.send and yet another monkey patch.

Look at some of the substantial pieces of Clojure code out there. They may only define a small handful of data structures, but most of those are even defined with defrecord, which produces a hash-like object, which all those 100s of functions work on. The rest are tiny primitives that compose in powerful and interesting ways.

> I'm not sure how embedding and dispatching on a type tag in a hash is any better than using the more explicit support for dynamic dispatch you find in typical OO languages

Because you may want differing dispatch and single-dispatch inheritance doesn't let you change your mind as easily. Those dynamic dispatches in Ruby/Python whatever are simply hash lookups anyway. You'll get the same performance either way. Look at the output of the ClojureScript compiler for example. Most code paths dispatch on :op, but you could just as easily dispatch on some bit of metadata, maybe the [:meta :dynamic] key path to have a function that runs differently on static vars than dynamic ones. People are also working on advanced predicate dispatch systems.

> The real problem with most OO is that it mashes a lot of interdependent, mutable state together.

That's a real problem. But it's not the real one :-)

Lists, vectors, maps and sets seem sufficient to model most data, so why would you want software constructs that are not easily JSON-able or EDN-able? Especially these days, when a single system mixes several languages and several databases.
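A small sketch of what "EDN-able" buys you, using a made-up order map: plain data round-trips through text unchanged, and generic functions such as get-in work on it directly.

```clojure
(require 'clojure.edn)

;; Plain data: a map holding a vector of maps and a set.
(def order {:id 42
            :items [{:sku "A1" :qty 2}]
            :tags #{:gift}})

;; Serialize to EDN text and read it back.
(def as-text (pr-str order))
(= order (clojure.edn/read-string as-text))  ; => true

;; Generic access by key path, with an optional default.
(get-in order [:items 0 :sku])               ; => "A1"
(get-in order [:customer :name] "unknown")   ; => "unknown"
```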

Lastly, Greenspun's Tenth Rule of Programming, loosely paraphrased:

"Every sufficiently complex application/language/tool will either have to use Lisp or reinvent it the hard way."

Weak things must boast of being new, like so many German philosophies. But strong things can boast of being old.

-- @GKCDaily

Obviously Chesterton was talking about software; scholars are divided as to whether he was talking about lisp or Debian Stable.

-- @technomancy

Twitter Comments

@swaroopch Great post, Swaroop! Thanks for the kind words.

-- @ghoseb

@swaroopch Brilliant post. Thanks. I was trying to decide between Go and Clojure. Chose Go for now. Will learn Clojure some day soon.

-- @P7h

@swaroopch Also, which language [excluding Lisp] do u think is more elegant n wonderful? Python, Scala, Clojure or any other?

-- @P7h

Thanks to @swaroopch's Clojure post, reread Smashing Magazine's interview with Doug Crockford on "How I work". http://t.co/ETDkivAExu 1/2

-- @P7h

Comments

Mohit says:

What are the benefits of using Clojure over other dynamically typed languages (Python, Ruby or PHP) in production environments?

Mayank says:

Nice share :)

swaroop says:

@Mohit I don't think I'm qualified to answer that question, and @ghoseb can do a much better job, but I'll give it a try: immutability, connecting to the production runtime and investigating what is happening via nREPL, and a functional style that encourages good concurrent code (and hence good parallelism) are all advantages in production environments.
