2015-03-01

Creating a programming language is apparently all the rage these days, and it’s got me thinking about what I would really like to see in one. I’m starting to suspect the things I want are either impossible or mutually incompatible, so I’d better write them down and let smarter people tell me why I can’t have everything and also a pony.

I’m strongly influenced by my love of Python, my aversion to C and C++, my fascination with Rust, and the bits of Haskell I understand. I very recently read an overview of Nim, which is part of what got my juices flowing. Also I have a lot of fond memories of what Perl 6 could have been, so, fair warning.

This is a brain dump, not a linear narrative, so some of this might be mutually referential or causally reversed or even complete nonsense. Please pardon the dust.

Core goals

Safety. The wrong thing should always be harder, or even impossible. I don’t think this will be low-enough level that you’ll be allocating your own memory or otherwise susceptible to memory errors, but the semipredicate problem is a good example of an API that makes it easier to do the wrong thing than the right thing. Languages should help me avoid mistakes.

Familiarity. Learning the language should not require a complete upheaval of your mental model of the universe. (Sorry, Haskell.) Average code should be moderately comprehensible to anyone with any programming experience at all, and ideally to a lot of people with none at all. And in a different sense, programs by different authors should look at least moderately similar, so you don’t feel you have to learn the author’s style just to make sense of their code. (Compare: Perl.)

Convenience. We like to say “expressiveness” or “elegance” or “ease of use” but honestly I just want to splort out some code with as few arbitrary hurdles as possible. Note that this is somewhat at odds with familiarity, since extreme brevity makes code more obtuse.

Rigor. I really don’t like C-derived static typing, but I love being able to statically reason about code without having to run it. That’s what humans do all the time, but we have relatively primitive automated tools for doing the same. So I would like to have lots of that, without getting stuck in the existing box of “let’s do static types that look a lot like Java, well that’s the best that we can do, I guess we’re done here”.

Locality. A stronger and better-defined constraint than “readability”. I want to maximize how much I can know about runtime context for a given line of code, while minimizing how much stuff I have to read (or, worse, hunt down). Brevity is part of this: an idea that seems simple to humans should ideally take less code to express. Namespacing is another part: if I see a function name I don’t recognize, I want to know where it came from as easily as possible.

Dynamism. It’s useful and I like it. When untamed, it’s at odds with locality, which is why this is listed lower.

Portability. I don’t care so much about running on VMS. I do care about being easily shipped as a single unit to Windows machines; being immune to library upgrades elsewhere on the system; being able to run on mobile platforms and JavaScript VMs; being embeddable as a scripting language; and compiling to the Z-Machine. I don’t want portability to be the lone reason that entire huge problem spaces are rendered completely inaccessible.

Speed. This is last, but I would still like to be fast enough, whatever that means.

(I’m sure someone will at this point object that the language is not the implementation. Except it totally is, and besides, neither one exists yet. Shush.)

Gripes with existing languages

A slight diversion. It’s helpful to remember what I’m trying to fix, or what problems I’m trying to avoid.

General

In C-family languages, type names are a completely different “kind of thing” that exist outside the language proper, vanish at runtime, and have different syntax from actual code that happens to reuse most of the same symbols for slightly different purposes.

In C++-family languages, generic types are an exponential nightmare of madness to express. One of the nice things about Python (or even Haskell!) is that “a list” is always the same kind of thing, whereas C++ or Rust need to specialize because the underlying storage might look different.

Interfaces and multiple inheritance are usually implemented in a way that means all methods share a single global namespace. (Counterexamples: Rust traits, Haskell typeclasses)

Unicode is hard. Time is hard. Serialization is hard. Everyone is bad at all of them.

Python

The descriptor protocol is neat, but I very frequently find myself wanting to know the name I’m “about to be assigned to”. For function decorators you already have this in the form of __name__, but for anything else you don’t.

Magic methods work differently from other methods, in that they only work when assigned to the class and not when assigned to an instance. It turns out there’s not actually a good reason for this.

There’s lots of nitpicking that could be done regarding Python… but a lot of it either results from the rules of its dynamic runtime, or would be something to obviously not replicate. So, sorry, this will not be the Python gripefest people sometimes ask me for.

Nim

Namespacing is very sloppy. Importing a module dumps the entirety of its contents into your namespace. Method calls are just syntactic sugar: a.b() is exactly the same as b(a), so methods are also in the globalish namespace. Seems to rely extremely heavily on overloading.

As an example I saw some demo code that did this:
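Paraphrasing from memory, it went something like:

    # paraphrased, not the original demo code
    var json = %{"action": %"login", "attempts": %3}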

What the hell is % doing here? Is that built-in syntax or what?

No, it turns out that’s a custom operator from the standard json module, which polymorphically converts various values into JSON nodes. That’s cute, I guess, but if the variable here hadn’t been named “json” then I would’ve had absolutely no idea where to even look. The short file this came from has eight imports at the top; I would’ve had to check all of them, after figuring out that %{ isn’t language syntax.

Also the “pragma” syntax is {.foo.}, which looks really bizarre. Currently a whole lot of features are implemented with pragmas so this seems to show up a lot, and often in the middle of a definition.

It looks like creating new structures requires doing e.g. newSeq[type](size) or initSet[type](), which seems cumbersome. I assume this is because types can’t have methods, and even if they did have methods they’d live in the global namespace anyway. So the only kinds of construction you get are literal struct building, or writing a creatively-named global constructor. And note that these are both in the standard library; I don’t know why one is “new” and the other is “init”.

Rust

Lifetimes are fantastically useful but suddenly become very awkward when dealing with recursive data structures.

It turns out I don’t actually really care about pointers versus not so anything in the language that tries to make me care (like, you can’t have a trait value, only a pointer to one!) is mostly just annoying.

Discovering you want to add a mutable element (like a filehandle) to a structure means you have to slap muts all over the place, including for any transitive container, which by the way will probably cause all kinds of borrow errors. RefCell to the rescue, at least.

I seem to have a knack for trying to write things in Rust that rely on intertangled references and mutability: first a game, then a UI library…

Syntax

I don’t really know what it’ll look like, but I need to get this out of the way so I can write example code. I’m pretty fond of Python’s syntax for being relatively low on noise. (See: locality.) I’ll be writing examples in something vaguely Python-like, but don’t take that to mean I’m 100% set on anything in particular just because I wrote it down here.

That said, there are a couple fundamentals I’m pretty attached to:

Indentation for blocks

If your language has braces, then you are indenting for the sake of humans (because humans are good at noticing alignment and edges), and bracing for the sake of computers. That’s double the effort for something as mundane as where a block ends. If they get out of sync, then your code naturally breaks in a way that’s very difficult for you to detect.

Also, braces mean that you waste an entire line at the end of every block just for a single closing brace. If you use Allman style, you also spend a line at the start of every block. That means a lot of vertical space lost, which means less code on my screen at once. See: locality.

This also eliminates all possible arguments about brace style, removes the need for braceless blocks, and frees up a valuable pair of bracketing characters for other potential uses.

Tabs are a syntax error

If blocks are delimited by invisible characters then they should really only be delimited by one kind of invisible character.

I have a few other vague ideas, but they’re really more about features than syntax, so they’ll come up below.

Compilation and speed

There is a thing that bothers me sometimes about hot Python code. I’m sure it bothers everyone eventually.

Some 99% of Python code I write could be ported directly to C, line by line, and compiled. It’s just math and function calls on known concrete types. Sprinkle some type names around, pretend there’s a stdlib worth mentioning, throw it at GCC, and bam! My program is a hundred times faster.

But I use Python for that other 1% of shenanigans, which let me define restricted DSLs, which let me operate on generic data easily, which let me patch third-party libraries, which let me control the environment when I’m running tests.

I value those things, certainly. But it still bothers me, when it happens to matter, that my options are two extremes: to use a very stodgy (but fast) language or to use a very slow (but expressive) language.

The current approach is to throw JITs at existing languages, and that is a really cool area of research that I don’t understand at all. Here, though, I have a different opportunity: I’m designing a language from scratch, and I have an interest in making it amenable to static analysis. Surely I can put that effort towards analysis that’s helpful for a compiler.

Some of this will require knowing enough to unbox values and skip dynamic dispatch. But I’m also interested in recognizing common patterns and skipping the usual overhead of language features, when possible. More wild speculation on this later.

I don’t expect this to be appropriate for kernels, or AAA game engines, or any other cases where people deeply care about things like whether there’s a vtable. I would like concrete operations on concrete types to have minimal overhead. Essentially, the more the dev is willing to tell us about their code, the more we should be able to reward them by speeding it up.

On the other hand, if I ship something you could call a compiler, I also want to ship a more traditional interpreter. Compilers are things that throw away as much information about your program as possible. They are, by their very nature, actively hostile towards development. It’d be really nice if you didn’t need to actually compile until you were ready to deploy.

Type system

This is so complex that I actually have to use another level of subheadings. I apologize in advance for how incoherent this might be.

Inference

Type inference is great. It is not great enough.

Rust is the static language I’ve most recently bumbled across, and it has left a couple of distinct impressions on me.

First: I have to explicitly declare types for every argument to every function. But rather a lot of the time, I don’t actually care about the types of some of the arguments — all I do is pass them along to another function.

In a language like Python this is a far bigger problem, since wrapper functions and delegation and other forms of proxying are extremely common. What on Earth is the type of *args?

My knee-jerk reaction here is to say that argument types are also inferred, but that leaves us with very few actual concrete type annotations. Which sounds great, but then, where do the types actually come from?

I’m hesitant to follow this train of thought, because it seems like unexplored territory, and surely there’s a good reason for that. But here I go anyway.

Consider this code:
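    # a made-up function; only the method calls matter
    def frobnicate(a, b, c):
        if a.startswith(b):
            c.append(a)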

What are the types of those arguments?

If you know Python, you probably know the answer already: a is a string, b is a string (or a tuple of strings), and c is a list. Also, c is probably a list of strings.

No types appear anywhere in that code, but any Python dev knows exactly what can be passed to it. We know .startswith is the name of a string method, we know what it operates on and we know that .append is likewise the name of a list method.

The function might have been intended for methods of those names on other types, true. But it doesn’t really matter, because we can just get a little more vague and say definitively that this code will fail if you pass an a that doesn’t have a startswith method or a c that doesn’t have an append method.

That’s already a fairly decent assertion that will weed out most glaring type errors. Numbers obviously won’t work. None won’t work. Files, bools, and other built-in types won’t work. There’s nothing else in the language or standard library that has either of those method names.

We know all of this without running the code. There’s no reason we couldn’t check it statically. We’d be fooled by anything with a __getattr__, sure, but the vast majority of types don’t support that. And this is just stock Python, not even anything proposed for Sylph. If you actually did provide some type annotations, we’d know much more.

It seems I’m proposing something along the lines of statically inferred duck typing.

The big problem here is what happens if you make a typo, or call a method that you didn’t mean to be possible. If you let the compiler infer the argument types, it can’t tell you when you make a mistake.

On the other hand, it could still complain somewhere, as long as you actually call the function. It would notice that list doesn’t actually have a method called apend, or whatever.

Rather than explicitly annotate every argument throughout your entire codebase, you’d have an engine that would default to telling you about conflicts between values and how you treat them. I think I could get behind that.

Signatures

But getting back to proxy functions. *args, **kwargs is a fabulous thing, with a couple downsides.

It’s two things, not one thing.

Because it’s not one thing, you can’t write a literal of it.

These are not massive inconveniences or anything, but if we’re going down the road of static analysis, it would sure be nice to fix them. So let’s say there’s a “signature” type representing all the things you can stuff into a function.

The signature type is really a family of generic types — some functions might accept any number of integers, some functions might accept exactly one string. If you do something like this:
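    # hypothetical: f is some existing function we're wrapping
    def wrapper(extra, *args, **kwargs):
        # ... do something with extra ...
        return f(*args, **kwargs)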

Then you know the signature type of wrapper is the same as the signature type of f, but with the type of extra in front. You can statically enforce that wrapper only receives arguments that f will understand.

I guess that’s not very mindblowing. But when I first had this idea, I assumed that the signature type of a function would be available at runtime, as a property of the function. I only now realize that this means:

Types as values

In Python, list is both a type and a value. You can instantiate it to create a new list object, and the type of that object will be list. You can also inspect list, access properties on it, put it in a dict somewhere, and whatever else you might want to do with a value.

This isn’t too revolutionary. The same idea exists in Perl (and Ruby)… sort of.

But it doesn’t exist in, say, C++. Types (and namespaces and, to some extent, functions) are thrown away (yes, rtti, vtables, shut up) at compile time. They exist as program structure, as scaffolding, as the stage on which your code will play someday in the future — they are not part of the orchestra.

Not only that, but types use a completely different syntax than any of your “real” code. They reuse some of the same symbols to mean vaguely similar things, but char * has very little to do with *foo. (This is a big part of why C++ is a nightmare to parse: type descriptions and value expressions can appear in many of the same places syntactically, but mean radically different things. Nothing in the grammar really distinguishes char * text = "hi" from x * y = z. (Oh hey I bet that’s why C originally required struct in front of all user-defined type names huh.))

What the hell was I even talking about here. Oh, right.

Blah blah this is all really heading towards: the syntax for generics in C-family languages is fucking terrible.

I am sorry to be beating up on Rust here, I love you Rust, you are a very good try, but you’re the static language I’ve used the most recently so you get the brunt of this.

CONSIDER this Rust code that a reasonable human being might try to write:
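    // a hypothetical stand-in: generic addition with no bounds on T
    fn sum<T>(a: T, b: T) -> T {
        a + b
    }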

If you try to compile that, you’ll get an error like this:
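    error: binary operation `+` cannot be applied to type `T`

(Give or take the exact wording, which shifts between compiler versions.)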

What you actually have to write is something like this:
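    // assuming the stand-in above: bound T by the stdlib Add trait
    use std::ops::Add;

    fn sum<T: Add>(a: T, b: T) -> T::Output {
        a + b
    }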

Now it typechecks correctly.

This seems a little ridiculous. The compiler already knew that T had to be a type that supports addition — it just told me that. So why am I spelling it out?

I got access to T::Output this way, but that’s still something the compiler knew. The only way to support addition is to implement the stdlib Add trait, and the only possible result type is whatever the implementation says it is.

The real answer is that Rust requires full types written out for all arguments and return values. And this isn’t really a huge deal. You’re right. I know.

If you check out some Real Live Actual Rust Code from the standard library, for the implementation of a standard hash map:
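    // paraphrased; the exact bounds drift between stdlib versions
    impl<K, V, S> HashMap<K, V, S>
        where K: Eq + Hash, S: HashState
    {
        pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
            where K: Borrow<Q>, Q: Hash + Eq
        {
            // ...
        }
    }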

Now I’m getting a little sadder. This code is 90% types.

It used to be worse: the keys of a hashmap were required to implement four traits (I think?), and I went off to write a generic trie that was backed by hashmaps, and there were just angle brackets out the wazoo.

This still isn’t a huge deal, I know. But it gets me thinking.

One thing I think about is how in Python, everything is already generic. I can write a function that operates on a list without doing anything in particular to its elements, and it will just work, on any list. It doesn’t even have to be a list; it can be any sequence.

(Speaking of, that’s one of the downsides to static annotations, especially in a dynamic language: people fuck them up. Way too many times I’ve seen someone ask about how to do type checking in Python so they can enforce that someone passes in a list, even though there’s no reason the argument couldn’t be a tuple or dict or any other kind of iterable thing.)

This is kind of meandering a lot, oops. The ultimate idea was that there should be ordinary expression syntax for composing types, and those resulting types should be runtime values. So if you want a list of strings you can say:
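    List<str>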

Or whatever. (I’d rather not use angle brackets actually but let’s pretend for now.) And that will be a value, a type. You can put it in a variable and use it anywhere you could put a type, or a value. Like typedefs on steroids.

I suppose a plain list would be List<Value>, then, and there would be a handful of slotted types in there that you would be replacing when you applied angle brackets. Which is really a lot like just having default arguments to functions. Hm hm hm.

One catch here is that for actually generic code, you end up with expressions like List<T>, where T is meant as generic and thus is not actually a known identifier. I suppose declaring these types is exactly what the angle-bracket annotation does in C++ and Rust! I’ll need something a little more clever to recognize when this is happening and do something useful and appropriate.

Shape types

Something I’m dimly aware Closure Compiler (for annotating JavaScript) has: types based on the contents of dicts. So you can declare types like “a dict that has a foo key”, or “a dict that has an x key with a numeric value and a y key with a numeric value and no other keys”.
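In made-up notation, that last one might be spelled:

    # hypothetical shape type: exactly these keys, with these value types
    Point = Dict{x: Int, y: Int}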

JavaScript objects literally are dicts, so you don’t have much choice here. But this seems like a nice thing to have in general. Plenty of dicts we use are not truly arbitrary — consider dicts of HTTP headers, where we reasonably expect some set of fundamental headers to exist and can’t usefully do anything when they’re missing.

It’s also common enough to start out using a dict for some common bag of data and only realize much later that you really should’ve made it a class. Being able to slap a type on there would at least document what you intend to have, and give you an inkling of an upgrade path.

This knits well with signature types, too — you can see how the type of ..., foo=4, **kwargs might involve saying foo must be a number but other keys are allowed as well.

I haven’t thought too much about this. Just throwing it out there.

Class definitions

I don’t want classes.

Wait, wait, no, come back. I don’t want a thing I call a class. I don’t want a class keyword. I think it has way too much baggage. People expect Java conventions and complain when they’re missing. “Why is there no private? Encapsulation!!” People (myself included) feel uncomfortable when the class keyword is used for things that are not actually classes.

Let’s solve this problem and just not call them classes. Call them types, because that’s what they are.

Let me abruptly jump rails in a way I promise will make sense in a moment. I love metaclasses. The metaclass protocol in Python is super nifty and, in Python 3 especially, you can do some fascinating things with it. The stdlib enum module is implemented with metaclasses:
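    from enum import Enum

    class Color(Enum):
        red = 1
        green = 2
        blue = 3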

Magic! I love magic. I love constrained magic, anyway. More on magic later.

I’m writing a roguelike with an experimental entity-component system, and one of the things I do is define classes that implement exactly one of my interfaces:
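    # roughly; each class fills exactly one interface (IActor here is illustrative)
    class GenericAI(IActor):
        def act(self, entity):
            ...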

I also do wacky things so that calling GenericAI() doesn’t actually create an object, but is used as a special initializer thing when defining entity types. That’s neat. I can do neat things that use normal Python syntax but co-opt the semantics.

Something bothers me a little here though. When I do class Color(Enum):, the thing I’m making is not actually a class, but an enum. The superclass is called “Enum”, yes, but that’s really just there to attach the metaclass, because the metaclass syntax is a little foreign and clumsy and we want to insulate people from it.

It’s also a little silly that the keyword is class, but the thing I’m making is not actually a “class” (there is no value in Python called “class”, because it’s a keyword!) — it’s an instance of type, the base metaclass.

So what if we got rid of the class keyword entirely… and you just used the metaclass?

This would be pretty gnarly to parse so it probably wants to have a keyword in front:
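    # made-up syntax, of course
    new Enum Color:
        red = 1
        green = 2
        blue = 3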

That’s kind of wordy and even a bit C-like. Function and class statements in Python are really just assignment, so maybe we want to shuffle this around a bit:
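    # still made up
    Color = new Enum:
        red = 1
        green = 2
        blue = 3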

Now this is getting kind of interesting. Just by changing the syntax, it’s obvious that metaclasses can be used for any kind of declaration where you want to receive a scope as an argument. Consider how we might use this to replace @property:
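    # `property` gets the little def block as a scope to work with
    Rectangle = new Value:
        width = 0
        height = 0

        area = new property:
            def get(self):
                return self.width * self.height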

Or even make anonymous objects:
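    # an ad-hoc bag of attributes, no declared type needed
    config = new Value:
        host = "localhost"
        port = 8080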

You may notice I keep calling the base type Value instead of Object, which is in line with avoiding the name “class”.

Downside: types would no longer actually know their own names, unless the entire x = new y: syntax were parsed as a single unit and “x” were told to y somehow. That seems like a hack. On the other hand, with a little thought, maybe it could solve the problem where descriptors in general don’t know their own names.

But there are some cool upsides. For example, this solves a whole lot of the anonymous function problems in Python, and removes a lot of the need for subclassing as an API. Say you have a UI library that wants to register event handlers. Instead of subclassing and defining on_foo methods, you could just do this:
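    # hypothetical UI library names, but you get the idea
    handlers = new Value:
        def on_click(self, event):
            ...

        def on_key_press(self, event):
            ...

    window.register(handlers)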

Done and done. And without a class keyword glaring at you, you don’t have to feel dirty about doing it!

You could also define C types like this:
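    # hypothetical CStruct metatype that lays fields out C-style
    timeval = new CStruct:
        tv_sec = c_long
        tv_usec = c_long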

Wow! You could even define methods that just defer to C functions taking that struct type as their first argument, and have a little FFI with minimal cruft or effort.

The more I think about this the more I like it. It’s kinda like exposing prototypical inheritance (which Python has!) in a more deliberate way? Depends on the actual semantics of new when used with a type that’s not a metatype, I suppose.

Classes

Right, right, that was about classes.

The smart people around me seem generally agreed that inheritance is often not a good way to solve problems. It’s brittle in the face of upstream API changes, it tends to lead to having a god object somewhere up the hierarchy, and it gets really really hairy when you need to mix multiple behaviors together.

But it’s sooo convenient.

What I would love to do is figure out the major problems we use inheritance to solve, explore alternative solutions to those problems, and then make them easier than inheritance.

The most obvious alternative is proxying/delegation: wrap up an existing object, add your extensions, and transparently proxy anything else to the original object with __getattr__. Honestly this could probably replace most uses of inheritance, with the minor downsides that it’s cumbersome and it doesn’t work on magic methods and it adds overhead. But hey, this is a new language with a magical compiler that will fix all of that, right?
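For reference, the pattern is nothing more than this in today’s Python:

    class LoggingFile:
        # wraps any file-like object; names are illustrative
        def __init__(self, wrapped):
            self._wrapped = wrapped

        def write(self, data):
            print("writing {} bytes".format(len(data)))
            return self._wrapped.write(data)

        def __getattr__(self, name):
            # everything we didn't override passes through to the original
            return getattr(self._wrapped, name)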

The other downside of proxying is that you can’t actually interfere with the inner workings of the original type. If the problem is that some of the type’s internals are actually wrong (or otherwise unsuitable to your purposes), you don’t have much choice but to inherit. (Or do you…? I feel there should be something else here.)

An extension of the same idea is composition, where multiple disparate components are stitched together into a whole. In my roguelike, there are multiple behavioral roles: “acts like a container”, “can perform actions”, etc. An entity can have one implementation of whatever set of roles it supports, and it has no other state — everything lives in the implementations. It makes the code a lot easier to reuse and provides plenty of namespacing, but it took a lot of effort to get going. This sort of approach, of populating “slots” in a composed object, could really stand to have some language support.

I recall reading a blog post recently that contained the following alarming snippet:
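    # paraphrasing from memory; the gist was:
    person = get_person(user_id)
    person.email_service = EmailService()
    person.send_welcome_email()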

Ha ha hold on. So you have a type called Person which presumably represents a person. You have a method on it that sends email, meaning your type now depends on an entire email and networking subsystem somewhere. And to make that method work you mutate your person to tack that email subsystem on before you call the method.

Jesus christ.

But this is kind of an awkward problem, come to think of it. You could stick the method on the email subsystem, instead, but it has no particular reason to know anything about a Person or what Person thinks should be in an email. Also you probably want other types to send email differently, right?

So where do those functions go, if not on those types? Should you really have email-sending code (which, presumably, depends on template rendering and god knows what else) alongside your otherwise simple Person definition? What if that same code is loaded in a codebase that doesn’t actually have an email subsystem, and now your static types don’t exist? If you put it somewhere else, how do you ensure that it gets loaded?

Somewhere a LISP weenie is now smirking and saying something about multiple dispatch. Well, okay, sure, but you still have the same problem: where does the implementation actually live?

This is actually a problem I’ve run into a bit with Inform, the interactive fiction language. Text adventure games tend to have a lot of interactions in them, which frequently produces questions like: if rubbing the lamp with the cloth summons a genie, where does that code go? Is it a property of the lamp? Of the cloth? Of the very act of rubbing?

Traits

The problem above (though not the question of where the code lives) would be solved in Rust with a trait. Traits are like interfaces, but less terrible. They require that an implementor define some set of methods, and may have “default” methods as well that can be overridden (or not). Each trait your type implements goes in a separate implementation block. The Rust By Example book has a good, erm, example.
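Adapted rather than quoted, it goes something like:

    struct Dog { name: &'static str }

    trait Animal {
        fn name(&self) -> &'static str;
        fn noise(&self) -> &'static str;

        // traits may provide default methods, overridable or not
        fn talk(&self) {
            println!("{} says {}", self.name(), self.noise());
        }
    }

    // methods specific to Dog get their own block
    impl Dog {
        fn wag_tail(&self) {
            println!("{} wags its tail", self.name);
        }
    }

    // the Animal methods live in a separate block
    impl Animal for Dog {
        fn name(&self) -> &'static str { self.name }
        fn noise(&self) -> &'static str { "woof" }
    }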

Each trait gets its own block, and any methods specific to the type get their own block as well.

This is very different from Java-like languages, with one massive advantage: method names are actually namespaced.

Compare to this Java strawman:
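    // Animal is defined off in some other file, naturally
    class Dog implements Animal {
        private String name;

        public String name() { return this.name; }
        public String noise() { return "woof"; }
        public void talk() { System.out.println(name() + " says " + noise()); }
        public void wagTail() { }
        public void fetch() { }
    }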

Pop quiz: which of those methods are part of the Animal interface?

Erm, oops. You could make some vague guesses, but you can’t actually know without going and looking at the Animal source code. (See: locality.)

And yet, it gets worse! What if you want to implement multiple interfaces, but two of them require methods with the same name? What if you have an existing class, and you want to add support for an interface to it, but you already have a method with the same name as one in the interface?

Effectively, all interface method names are global. They share a single giant namespace, just like C function names. This is the problem I have with Go’s implicit interfaces, too: you might happen to implement an interface by accident, just because you have methods of the right names. Does implementing length mean you’re a container type, or does it mean you’re modeling snakes?

Rust treats trait method names as belonging to the trait, eliminating this problem. I dig it.

Except…

In Rust, you can just call dog.wag_tail(). Rust knows, statically, exactly what traits a given type has implemented. So it can tell that Dog only has a single method anywhere that’s called wag_tail, and calls that one. If it’s not obvious to the programmer where the method is coming from, the language is still static, so tooling could figure it out.

In practice method collisions tend to be uncommon, so trait methods are fully scoped, but end up just as convenient as a flat namespace.

Sylph is not (fully) statically typed. In the general (unannotated) case, the Rust approach won’t work.

I’m not quite sure how to fix this. I’ve had a couple ideas swirling around, but I don’t know if they’re good enough.

First, a minor diversion: it’s sometimes asked why len in Python is a function, rather than a method. The answer is that of course it is a method, called __len__. The real answer is that Python pointedly and deliberately does not reserve any method or attribute names (save for __foo__), leaving your classes pristine. I imagine this is, at least in part, to avoid collision problems as described above.

(Strangely, no one ever asks why getattr and setattr are functions rather than methods, even though they work exactly the same way, merely deferring to dunder methods. Semantically, the actual work of getattr is done by the default implementation everyone inherits, object.__getattribute__!)

Here’s how I might fix this minor oddity, and implement len, in Sylph:
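    # all of this syntax is made up on the spot
    Container = new Value:
        # implement one method from the Iterable trait, by its qualified name
        def Iterable.len(self):
            return self.num_items

    # expose the trait method as a local function
    from Iterable import len

    c = Container(...)
    c:len()     # lexical call syntax; just sugar for len(c)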

This is the most syntax I’ve made up at once and I feel very conspicuous but let’s see what I’ve done here.

First, you can implement a method for a trait (here, Iterable) directly in the class body, by just using the trait method’s fully-qualified name. Now Container implicitly does Iterable, like Go, but namespaced. If you implement part of a trait but not all of it, your code won’t compile.

Next, you can import names from traits, exposing the underlying method as a local function. (I have put zero thought into modules or importing yet, so I don’t know what this might actually look like.) If this is how it goes in practice, most likely this particular import would be in the prelude anyway.

Then you create a new Container value. At this point you could call len(c) to get its length, but we’re trying to avoid that. So instead you use lexical call syntax, which is merely sugar for doing exactly that. foo:bar(x, y, z) is exactly the same as bar(foo, x, y, z). When you use the colon, the method name is not looked up on the object — it’s taken from local scope.

Does this make everyone happy? Is it even useful? I don’t know. Seems interesting though.

Classes as traits

I’ve heard some advice from Java land: don’t use classes as the types of arguments. Instead, create an interface matching what your class does, and use the interface as the type. Then you can provide an alternative implementation without changing your API.

Well, screw that. How about this: anywhere you could name a trait, you can name a class instead.

If you write a type and promise to implement some existing class A as a trait, then:

You must implement all of A’s methods.

You do not inherit any of A’s method implementations; no defaults.

You must implement all the traits A implements.

Now you can fake out absolutely any other type, no matter how statically annotated the code is. (Though you’d have to provide an alternative that meets whatever annotated requirements are on A.)

Downside: any code assuming it’s going to receive an A might actually receive something else, so there’d still need to be at least one level of indirection on any attribute calls, and you’re not ever gonna get C++-level method dispatch speed. Maybe that’s okay. Or maybe the compiler can notice when there aren’t actually any fakes for A and skip the indirection, or maybe it can fall back to something interpreter-like when it gets something that’s not actually an A?

Another interesting question: how do you fake out built-in scalar types? In Python, for example, there’s nothing you can do to pass a “fake” string to .startswith(...), because there’s no way to emulate a string. You can subclass the string types, but all the built-in operations look at the underlying string value, so they just will not work on anything that isn’t a string.

I suppose when even Python doesn’t let you get away with patching something, I shouldn’t be trying to go out of my way to allow it.

Value types, mutability, type representations

Change is hard. I don’t know why we do it so often.

Probably types should be immutable by default (which is why I called the root type Value above). This produces two immediate obvious problems in my mind:

It’s a little weird if you have a custom constructor. Your type would look mutable in __init__, but nowhere else.

Sometimes someone else’s code might produce immutable values that, for whatever reason, I direly need to hit with a hammer.

I don’t know offhand how to solve these. Maybe they don’t need solving. Python already has namedtuple, after all, and I can’t recall direly needing to mutate those. But if everything were immutable by default… hmm.

(Note also that “mutable” is, itself, a slightly fuzzy concept. A type may be immutable in practice, but want to have indexes or caches internal to itself that need writing to. C++ has a mutable keyword as an escape hatch for this, and Rust likewise has the RefCell type.)

I suppose mutable types would want to inherit from something called Mutable, then?

I’m not sure these questions even quite make sense without knowing what type definitions look like. Are there explicitly listed attributes? I kinda want to say yes. Or, rather, I might have to say yes. If static annotations are to exist at all, you have to have somewhere to list the attributes a type has, so you can say what their types are supposed to be.

If you had that, you could avoid the dict for every object, too.

Speaking of.

Something that’s actually very interesting about Perl 5 is the way objects work. All they are is a namespace of functions tied to a reference to some data. Usually the data is a hash, so you can store names and values, but it doesn’t have to be. Nothing is stopping you from using an array as the underlying data store. The URI module actually uses a single string (the URI itself) as its data, so there’s no extra storage required at all!

Very little code ever took advantage of this quirk, but it’s a fascinating feature, and it vaguely reminds me of Python’s __slots__, which turns attribute storage into (roughly) a tuple.

I don’t know where I’m going with this. I like the idea of detaching behavior from the shape of the underlying state. (You can do that in Rust, too! Traits can be attached to integers and pointers and all kinds of things.)

Behavior detached from shape. Hm… that makes me think of…

Extending behavior

New languages often love to show off that they have all kinds of neat methods on core types, like 3.times or 10.spawn-threads which I swear I saw in the Perl 6 docs somewhere.

Those are great and all. The downside is that they put a bigger burden on the core implementation, and sometimes very convenient methods might (ahem) have dependencies you wouldn’t expect from the simplest types in the language.

So it would be pretty slick if you could extend types in a controlled way, lexically. Ruby has a thing for this, called “refinements” — it’s basically monkeypatching, except the changes aren’t transitive across calls. If you patch a type within your function or module, you can call whatever of the new methods you want, but if you call into other code, they only see the original type.

But if we’re gonna be all about compositional types, maybe we could just use one of those instead. Define whatever extra (or replaced) behavior you want in a proxy type, and (handwave, handwave) automatically wrap values of the underlying type in the proxy.

This is particularly suitable because you can’t usefully override the internals of a type with refinements anyway — as soon as you call any original method on the type, your refinements vanish. Wrapping is much closer to what you’re doing than monkeypatching.

If this were made a natural enough part of the language, it might even be possible to allow attaching new “outside” state to other, immutable objects.

Consider decorators, which often want to attach some sort of extra information onto a function. In Python, you’d just stick an attribute on the function… and hope that no other code wants to use the same attribute name, of course.

Imagine if you could use a proxy type instead. I’m pulling this syntax out of my ass:
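    # wild guess at syntax: a proxy that bolts one extra attribute onto a function
    LabeledFunction = new Proxy(Function):
        label = str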

I don’t really know how you’d apply that, or what the semantics of preserving the state would be (obviously you wouldn’t want to completely lose the label as soon as it fell out of scope), or really much of any of the important details.

But this seems like a much more structured way to keep the convenience of Python’s “you can assign to any attribute” in a more static way. And it feels, at least, like it would knit well with the idea of first-class support for componentized types — what I’ve defined above is effectively a component of a function, just one that I want to attach from the “outside”.

You could instead use the :foo syntax with an imported regular function and function overloading. But, well, I just don’t like function overloading. It’s nice for some cases, but all the interesting problems I think of involve having foreign types register their own new behavior, and that’s kind of ugly with function overloading — you’re injecting a new function into another module. It makes me frown. I am frowning right now.

Anyway, another example. Think of, say, SQLAlchemy in Python, where you can have a “metadata” object describing the schema of your database. All that stuff is fixed at compile (import) time. But the most convenient way to actually do anything with the metadata at runtime is to assign a database connection to a property of it. What if you could, with minimal effort, just define a wrapper that attached the database connection to the existing behavior?

I guess I’m kinda describing dependency injection now, but I would really like to be able to handwave it away with some language facilities.

This seems possibly related to the lexically-scoped method call operator, foo:len.

State

Consider, if you will, a file object.

Files have a clear set of states. They can, at the very least, be open or closed. In Python, files start out open, and can transition to closed by calling the .close() method. A closed file cannot be reopened.

Virtually every method of a file makes sense when the file is open, but not when the file is closed.

This isn’t terribly uncommon to see in mutable types. In more complex cases, you might even have initialization that takes multiple steps, during which calling methods is (or should be) illegal. Or you might have a type that can take one of two similar forms, and some methods may only make sense on one form or the other.

It would be super duper if we could make static assertions about this, right? My class has possible states X Y Z, these methods require state X, this method transitions it from state Y to Z.
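Wishful syntax, sticking with the file example:

    # entirely wishful annotations
    File = new Value:
        states = {Open, Closed}

        def read(self) requires Open:
            ...

        def close(self) transitions Open to Closed:
            ...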

This is already a thing, called typestate, but it doesn’t exist in very many languages at all (which perhaps is a bad sign). Do I dare dream of trying it out? Could I just emulate it with composition somehow?

Variant types

Well. Obviously.

Open ones? Not sure.

Actually… this reminds me of a curiosity I noticed. Say you have some family of related types, and you want to perform an operation on them that’s similar, but slightly different.

If you implement this operation as a method, you can factor out the similar bit as a separate method:
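    # names are illustrative
    class Base:
        def do_work(self):
            # ... the similar part ...
            self._different_part()
            # ... more of the similar part ...

    class Widget(Base):
        def _different_part(self):
            ...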

Now if anyone else wants to make a new subtype, they can just implement _different_part.

But if you implement the operation as a function…

    def do_work(obj):
        ...
