2014-01-03

This exceptionally ambitious and innovative effort (to be open sourced eventually) is led by Joe Duffy, who previously brought us Parallel Extensions to .NET, comprising TPL (Task Parallel Library) and PLINQ (Parallel LINQ, Parallel Language Integrated Query), starting in 2008 with his then roughly 20-person team and shipping as a final product in 2010.

After years of extensive research and exploratory development work, Joe and his enlarged team are ready to undertake the much bigger, in fact ultimate, language challenge described below. While this effort definitely has much greater implications for Microsoft (especially in the long term), the immediate speculations about it may be exaggerations so far.

Joe Duffy in C# for Systems Programming [Dec 27, 2013] (mirrored here differently),
as “an architect and developer on a research operating system at Microsoft”:

In the months to come, I will start sharing more details. My goal is to eventually open source this thing, but before we can do that we need to button up a few aspects of the language and, more importantly, move to the Roslyn code-base so the C# relationship is more elegant. Hopefully in 2014.

Update: The language I describe below is a research effort, nothing more, nothing less. Think of me as an MSR [MS Research] guy publishing a paper, it’s just on my blog instead of appearing in PLDI proceedings [Programming Language Design and Implementation, the most important yearly conference of the ACM SIGPLAN [Special Interest Group on Programming LANguages]]. I’m simply not talented enough to get such papers accepted.

I do expect to write more in the months ahead, but all in the spirit of opening up collaboration with the community, not because of any “deeper meaning” or “clues” that something might be afoot. Too much speculation!

[A kind of “best in class” speculation from Microsoft's Midori: The M# connection [by Mary Jo Foley on ZDNet, Dec 29, 2013]:

… I heard from two of my contacts that Midori — Microsoft’s non-Windows-based operating-system project — moved into the Unified Operating System group under Executive Vice President Terry Myerson. (Before that, it was an incubation project, without a potential commercialization home inside the company.) …

Midori: What we’ve gleaned so far

A skunkworks team inside Microsoft has been working on Midori since at least 2008 (which is the first time I blogged about the Midori codename and effort). The Midori team can trace its early roots to “Singularity,” the Microsoft Research-developed microkernel-based operating system written as managed code.

Midori originally was championed by Microsoft Chief Technology Officer Eric Rudder. The Midori team consisted of a number of all-star Microsoft veterans (including Duffy), plus some additional high-flying developers from outside the company.

Early Midori design documents indicated that the Midori OS would be built with distributed concurrency and cloud computing in mind. Microsoft also supposedly planned to try to provide some kind of compatibility path between Windows and Midori (which isn’t based on Windows at all). The early design documents also indicated that Microsoft Research’s “Bartok” ahead-of-time compiler work influenced the team.

Duffy made a couple of public presentations and published papers in the ensuing years that indicated he and his colleagues were working on some kind of extensions to Microsoft’s C# language. …

Many Microsoft watchers, including yours truly, wondered if Midori would ever exit its incubation phase. But with one-time champion Rudder moving in November to a new advanced strategy post — plus the move of the Midori team into Myerson’s OS group — something seems to be afoot.

While Midori was in incubation, the Microsoft Research team working on the “Drawbridge” library OS managed to support Midori as a host implementation, alongside a number of other Microsoft operating-system platforms. (A library OS is a form of virtualization that seeks to replace the need for a virtual machine to run software across disparate platforms.)

One of my contacts said Myerson’s OS group is going to be determining which parts of Midori have a place in Microsoft’s future operating-systems plans. Based on Duffy’s post, it sounds like the M# piece of Midori will evolve throughout 2014, but it’s not clear when and if it ultimately will be open-sourced.]

[<my own inclusions>About: I am an architect and developer on an operating system incubation project at Microsoft.

I lead the team responsible for the developer platform. This includes responsibility for our programming language, core framework, async and parallel models, and overall developer tools and experience. I strive to write lots of code in all of these areas. We are focused on reinventing how large scale systems software is written, with a focus on reliability, security, scalability, and, above all else, correctness-by-construction.

Although I love coding, I also manage a group of very talented architects and developers.

Prior to this project, I worked in the areas of parallel computing, virtual machines, and managed runtimes, and have over 15 years of professional software experience. I’ve been granted 45 patents, with another 33 pending, and thoroughly enjoy writing books and speaking. ]

[According to his July 3, 2004 post, he had just joined the CLR team at that time, with a primary focus on concurrency in .NET and WinFX. At the time of publication of his book “Concurrent Programming on Windows” in November 2008, his author’s page said:
Joe Duffy is a Program Manager on the Common Language Runtime (CLR) Team at Microsoft, where he works on concurrency and parallel programming models. Prior to joining the team, he was an independent consultant, a CTO for a startup ISV, and an Architect and Software Developer at Massachusetts-based EMC Corporation. Joe has worked professionally with native Windows (COM and Win32), Java, and the .NET Framework, and holds research interests in parallel computing, transactions, language design, and virtual machine design and implementation. He lives in Washington with his soon-to-be wife, cat, and two crazy ferrets.
This is not entirely accurate: in Final manuscript for Concurrent Programming on Windows has been submitted [June 23, 2008] he describes his Microsoft career path as: “At the outset, I was on the CLR Team hacking on software transactional memory and PLINQ as an evening activity. Then I transitioned to doing it full time. Then I joined the Parallel Computing team as the dev for PLINQ. Then I kicked off the whole Parallel Extensions effort (which is 20 members and growing strong), became the lead architect, and here I am today.”

.Net Parallel Extensions [TPL and PLINQ] PM – Ed Essey [AsafShX YouTube channel, April 20, 2008]

Asaf Shelly, a Microsoft MVP, interviewing Ed Essey about parallel computing. Ed is the Product Manager at Microsoft for the .Net Parallel Extensions group, i.e., a member of Joe Duffy’s team, which at that time numbered about 20 people.

Intel Software Conference 10 Steve Teixeira [Bruno Boucard YouTube channel, April 16, 2010]

During the Intel Software Conference 2010 in Barcelona, I interviewed Steve Teixeira, Product Manager of the Parallel Computing Platform, Developer Division, Microsoft Corporation. In this second part, Steve gives a good description of the parallel libraries in Visual Studio 2010. Finally, we talk about the Axum language (still an incubation project), which I like a lot. <end of my own inclusions>]

End of Update

My team has been designing and implementing a set of “systems programming” extensions to C# over the past 4 years. …
[so the team holds the following understandings, which are supposed to drive its work on such an extension to C# up to the mid-2014 checkpoint]

Lifetime understanding.

C++ has RAII [Resource Acquisition Is Initialization], deterministic destruction, and efficient allocation of objects. C# and Java both coax developers into relying too heavily on the GC heap, and offer only “loose” support for deterministic destruction via IDisposable. Part of what my team does is regularly convert C# programs to this new language, and it’s not uncommon for us to encounter 30-50% of time spent in GC. For servers, this kills throughput; for clients, it degrades the experience, by injecting latency into the interaction. We’ve stolen a page from C++ — in areas like rvalue references, move semantics, destruction, references / borrowing — and yet retained the necessary elements of safety, and merged them with ideas from functional languages. This allows us to aggressively stack allocate objects, deterministically destruct, and more.
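
[Duffy shows no syntax for these features, so the following is only a rough sketch of the status quo he is contrasting against: in today’s C#, deterministic cleanup is opt-in via IDisposable/using while the object still lives on the GC heap, and plain structs are stack allocated but get no destructor at all. All names below are mine, for illustration:]

```csharp
using System;
using System.IO;

class LifetimeSketch
{
    struct Point { public int X, Y; }

    static void Main()
    {
        // "Loose" deterministic destruction in today's C#: Dispose runs
        // at the end of the block, but the reader object itself is GC
        // heap allocated and reclaimed later, non-deterministically.
        using (var reader = new StringReader("hello"))
        {
            Console.WriteLine(reader.ReadLine());
        }

        // A struct is stack allocated but has no destructor; the
        // extensions described above aim to combine stack allocation
        // with C++-style destruction, moves, and safe borrowing.
        var p = new Point { X = 1, Y = 2 };
        Console.WriteLine(p.X + p.Y);
    }
}
```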

Side-effects understanding.

This is the evolution of what we published in OOPSLA 2012, giving you elements of C++ const (but again with safety), along with first class immutability and isolation.

[Uniqueness and Reference Immutability for Safe Parallelism (ACM, MSR Tech Report[PDF])

Abstract: A key challenge for concurrent programming is that side-effects (memory operations) in one thread can affect the behavior of another thread. In this paper, we present a type system to restrict the updates to memory to prevent these unintended side-effects. We provide a novel combination of immutable and unique (isolated) types that ensures safe parallelism (race freedom and deterministic execution). The type system includes support for polymorphism over type qualifiers, and can easily create cycles of immutable objects. Key to the system’s flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking. Our type system models a prototype extension to C# that is in active use by a Microsoft team. We describe their experiences building large systems with this extension. We prove the soundness of the type system by an embedding into a program logic.]
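
[The paper’s qualifiers (writable, readable, immutable, isolated) are not part of shipped C#, so a runnable analogy has to lean on conventions such as IReadOnlyList<T>. The sketch below, my illustration rather than the prototype’s syntax, shows why such conventions prove much less than the paper’s type system:]

```csharp
using System;
using System.Collections.Generic;

class SideEffectsSketch
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // Today's C#: a read-only *view*. The callee cannot mutate
        // through this reference, but other aliases still can, so the
        // type system proves nothing about races.
        IReadOnlyList<int> view = list;
        Console.WriteLine(Sum(view));

        list.Add(4); // still mutable via the original alias

        // The OOPSLA 2012 qualifiers (immutable, isolated, readable,
        // writable) instead constrain *all* aliases, which is what
        // makes race freedom provable in the type system.
    }

    static int Sum(IReadOnlyList<int> xs)
    {
        int total = 0;
        foreach (var x in xs) total += x;
        return total;
    }
}
```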

Async programming at scale.

The community has been ’round and ’round on this one, namely whether to use continuation-passing or lightweight blocking coroutines. This includes C# but also pretty much every other language on the planet. The key innovation here is a composable type-system that is agnostic to the execution model, and can map efficiently to either one. It would be arrogant to claim we’ve got the one right way to expose this stuff, but having experience with many other approaches, I love where we landed.
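
[He names the two camps without code; in today’s C# terms (the research language’s actual surface is not shown), the contrast looks roughly like this:]

```csharp
using System;
using System.Threading.Tasks;

class AsyncSketch
{
    // Continuation-passing style: the compiler rewrites the method
    // into a state machine; the thread is released while awaiting.
    static async Task<string> FetchAsync()
    {
        await Task.Delay(100);   // yields, does not block
        return "done";
    }

    // Lightweight-blocking style: the same logic written as if it
    // blocked; runtimes with cheap coroutines make this affordable.
    static string FetchBlocking()
    {
        Task.Delay(100).Wait();  // ties up a thread
        return "done";
    }

    static void Main()
    {
        Console.WriteLine(FetchAsync().Result);
        Console.WriteLine(FetchBlocking());
    }
}
```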

Type-safe systems programming.

It’s commonly claimed that with type-safety comes an inherent loss of performance. It is true that bounds checking is non-negotiable, and that we prefer overflow checking by default. It’s surprising what a good optimizing compiler can do here, versus JIT compiling. (And one only needs to casually audit some recent security bulletins to see why these features have merit.) Other areas include allowing you to do more without allocating. Like having lambda-based APIs that can be called with zero allocations (rather than the usual two: one for the delegate, one for the display [the compiler-generated closure object]). And being able to easily carve out sub-arrays and sub-strings without allocating.
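
[The post doesn’t say how their compiler gets to zero allocations; one well-known C# technique with the same effect is a struct “functor” passed through a generic type parameter, and ArraySegment<T> is the existing non-copying sub-array type. A sketch under those assumptions:]

```csharp
using System;

class ZeroAllocSketch
{
    interface IFunc { int Invoke(int x); }

    // A struct implementing the "functor" interface, passed as a
    // generic parameter: no delegate and no closure object are
    // allocated, and the Invoke call can be devirtualized.
    struct AddOne : IFunc
    {
        public int Invoke(int x) { return x + 1; }
    }

    static int Apply<TFunc>(TFunc f, int x) where TFunc : struct, IFunc
    {
        return f.Invoke(x);
    }

    static void Main()
    {
        Console.WriteLine(Apply(new AddOne(), 41)); // 42

        // A sub-array without copying: ArraySegment<T> is a struct
        // (array reference + offset + length) over the original array.
        var data = new[] { 1, 2, 3, 4, 5 };
        var slice = new ArraySegment<int>(data, 1, 3);
        Console.WriteLine(slice.Count); // 3
    }
}
```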

Modern error model.

This is another one that the community disagrees about. We have picked what I believe to be the sweet spot: contracts everywhere (preconditions, postconditions, invariants, assertions, etc), fail-fast as the default policy, exceptions for the rare dynamic failure (parsing, I/O, etc), and typed exceptions only when you absolutely need rich exceptions. All integrated into the type system in a 1st class way, so that you get all the proper subtyping behavior necessary to make it safe and sound.
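
[The research language’s contract syntax isn’t shown; .NET’s existing Code Contracts API (System.Diagnostics.Contracts) together with Environment.FailFast gives a rough, runnable approximation of the pieces he lists:]

```csharp
using System;
using System.Diagnostics.Contracts;

class ErrorModelSketch
{
    // Precondition and postcondition via today's Code Contracts; in
    // the described language, contracts are first-class in the type
    // system and a violation fails fast rather than throwing.
    static int Divide(int numerator, int denominator)
    {
        Contract.Requires(numerator >= 0 && denominator > 0);
        Contract.Ensures(Contract.Result<int>() >= 0);
        return numerator / denominator;
    }

    static void Main()
    {
        Console.WriteLine(Divide(10, 3)); // 3

        // Fail-fast for unrecoverable bugs: terminate immediately,
        // with no unwinding through catch blocks.
        bool invariantHolds = true;
        if (!invariantHolds)
            Environment.FailFast("invariant violated");
    }
}
```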

Modern frameworks.

This is a catch-all bucket that covers things like async LINQ, improved enumerator support that competes with C++ iterators in performance and doesn’t demand double-interface dispatch to extract elements, etc. To be entirely honest, this is the area where we have the biggest list of “designed but not yet implemented” features, spanning things like void-as-a-1st-class-type, non-null types, traits, 1st class effect typing, and more. I expect us to have a handful in our mid-2014 checkpoint, but not very many.
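
[“Double-interface dispatch” presumably refers to the per-element MoveNext/Current interface calls paid when enumerating through IEnumerable<T>; today’s List<T> already demonstrates the cheaper pattern foreach can bind to:]

```csharp
using System;
using System.Collections.Generic;

class EnumeratorSketch
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // Through the interface: each element costs interface calls
        // to MoveNext and Current, and the enumerator is boxed.
        IEnumerable<int> viaInterface = list;
        int sum1 = 0;
        foreach (var x in viaInterface) sum1 += x;

        // Direct: foreach binds to List<int>.Enumerator, a struct,
        // so the calls are non-virtual and nothing is heap allocated.
        int sum2 = 0;
        foreach (var x in list) sum2 += x;

        Console.WriteLine(sum1 + " " + sum2); // 6 6
    }
}
```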

[while the rationale about such an extension to C# is the following]

1. “Why a new language?” …

In the upper-left, you’ve got garbage collected languages that place a premium on developer productivity. Over the past few years, JavaScript performance has improved dramatically, thanks to Google leading the way and showing what is possible. Recently, folks have done the same with PHP. It’s clear that there’s a whole family of dynamically typed languages that are now giving languages like C# and Java a run for their money. The choice is now less about performance, and more about whether you want a static type system. …

In the lower-right, you’ve got pedal-to-the-metal performance. Let’s be honest, most programmers wouldn’t place C# and Java in the same quadrant, and I agree. I’ve seen many people run away from garbage collection back to C++, with a sour taste permeating their mouths. (To be fair, this is only partly due to garbage collection itself; it’s largely due to poor design patterns, frameworks, and a lost opportunity to do better in the language.) Java is closer than C# thanks to the excellent work in HotSpot-like VMs which employ code pitching and stack allocation. But still, most hard-core systems programmers still choose C++ over C# and Java because of the performance advantages. Despite C++11 inching closer to languages like C# and Java in the areas of productivity and safety, it’s an explicit non-goal to add guaranteed type-safety to C++. You encounter the unsafety far less these days, but I am a firm believer that, as with pregnancy, “you can’t be half-safe.” Its presence means you must always plan for the worst case, and use tools to recover safety after-the-fact, rather than having it in the type system.

Our top-level goal was to explore whether you really have to choose between these quadrants. In other words, is there a sweet spot somewhere in the top-right? After multiple years of work, including applying this to an enormous codebase, I believe the answer is “Yes!”

The result should be seen as more of a set of extensions to C# — with minimal breaking changes — than a completely new language.

2. “Why base it on C#?”

Type-safety is a non-negotiable aspect of our desired language, and C# represents a pretty darn good “modern type-safe C++” canvas on which to begin painting. It is closer to what we want than, say, Java, particularly because of the presence of modern features like lambdas and delegates. There are other candidate languages in this space, too, these days, most notably D, Rust, and Go. But when we began, these languages had either not surfaced yet, or had not yet invested significantly in our intended areas of focus. And hey, my team works at Microsoft, where there is ample C# talent and community just an arm’s length away, particularly in our customer-base. I am eager to collaborate with experts in these other language communities, of course, and have already shared ideas with some key people. The good news is that our lineage stems from similar origins in C, C++, Haskell, and deep type-systems work in the areas of regions, linearity, and the like.

3. “Why not base it on C++?”

As we’ve progressed, I do have to admit that I often wonder whether we should have started with C++, and worked backwards to carve out a “safe subset” of the language. We often find ourselves “tossing C# and C++ in a blender to see what comes out,” and I will admit at times C# has held us back. Particularly when you start thinking about RAII, deterministic destruction, references, etc. Generics versus templates is a blog post of subtleties in its own right. I do expect to take our learnings and explore this avenue at some point, largely for two reasons: (1) it will ease portability for a larger number of developers (there’s a lot more C++ on Earth than C#), and (2) I dream of standardizing the ideas, so that the OSS community also does not need to make the difficult “safe/productive vs. performant” decision. But for the initial project goals, I am happy to have begun with C#, not the least reason for which is the rich .NET frameworks that we could use as a blueprint (noting that they needed to change pretty heavily to satisfy our goals).
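
[He leaves the generics-versus-templates subtleties to a future post, but the core difference is easy to demonstrate in plain C# (names below are mine, for illustration): a C# generic is checked once against its declared constraints, whereas a C++ template is re-instantiated per type and accepts anything with the right members:]

```csharp
using System;

class GenericsSketch
{
    interface IMeasurable { int Length { get; } }

    class Rope : IMeasurable
    {
        public int Length { get { return 42; } }
    }

    // Without the IMeasurable constraint, value.Length would not
    // compile: C# checks the generic once, against its constraints,
    // rather than per instantiation as C++ templates do.
    static int Measure<T>(T value) where T : IMeasurable
    {
        return value.Length;
    }

    static void Main()
    {
        Console.WriteLine(Measure(new Rope())); // 42
    }
}
```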

“a few glimpses into this work over the years”:

InfoQ interview about safe concurrency [April 11, 2013]

I mentioned a few months back that my team had collaborated with MSR to publish a paper to OOPSLA about some novel aspects of our programming language (see here and here).

I was excited when Jonathan over at InfoQ asked to interview me about this work. We had a fun back and forth, and I hope the result helps to clarify some of the design goals and decisions we made along the way.

You can check it out here: Uniqueness and Reference Immutability for Safe Parallelism.

Imperative + Functional == [Dec 8, 2012]

I mentioned recently that a paper from my team appeared at OOPSLA in October:

Uniqueness and Reference Immutability for Safe Parallelism (ACM, MSR Tech Report[PDF])

It’s refreshing that we were able to release it. Our project only occasionally gets a public shout-out, usually when something leaks by accident. But this time it was intentional.

I began the language work described about 5 years ago, and it’s taken several turns of the crank to get to a good point. (Hint: several more than even what you see in the paper.) Given the novel proof work in collaboration with our intern, folks in MSR, and a visiting professor expert in the area, however, it seemed like a good checkpoint that would be sufficiently interesting to release to the public. Perhaps some day Microsoft’s development community will get to try it out in earnest.

There seems to have been some confusion over the goals of this work. I wanted to take a moment to clear the air.

…

This first goal is proving to be my fondest aspect of the language. The ability to have “pockets of imperative mutability,” familiar to programmers with C, C++, C#, and Java backgrounds, connected by a “functional tissue,” is not only clarifying, but works quite well in practice for building large and complex concurrent systems. It turns out many systems follow this model. Concurrent Haskell shares this high-level architecture, as does Erlang. Well-written C# systems do the same, though the language doesn’t (yet) help you to get it right.
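
[In today’s C# this architecture is only a discipline; as he says, the language doesn’t yet help you get it right. A minimal sketch of the shape, using ordinary TPL tasks: each task mutates only its own private state and communicates through the “functional tissue” of returned values:]

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class PocketsSketch
{
    static void Main()
    {
        // "Functional tissue": tasks share nothing and communicate
        // only by returning values.
        var sums = Task.WhenAll(
            Enumerable.Range(0, 4).Select(i => Task.Run(() =>
            {
                // "Pocket of imperative mutability": ordinary loops
                // and local mutation, invisible to the other tasks.
                int sum = 0;
                for (int n = i * 1000; n < (i + 1) * 1000; n++)
                    sum += n;
                return sum;
            }))).Result;

        Console.WriteLine(sums.Sum());
    }
}
```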

Of course, as called out by the second goal, immutability and controlled side-effects are tremendously useful features on their own. Novel optimizations abound.

And it helps programmers declare and verify their intent. As mentioned in the paper, we have found/prevented many significant bugs this way. …

The effort grew out of my work on Software Transactional Memory in 2004, then Parallel Extensions (TPL and PLINQ), and then my book, a few years later. I had grown frustrated that our programming languages didn’t help us write correct concurrent code. Instead, these systems simply keep offering more and more unsafe building blocks and synchronization primitives. Although I admit to contributing to the mess, it continues to this day. How many flavors of tasks and blocking queues does the world need? I was also dismayed by the oft-cited “functional programming cures everything” mantra, which clearly isn’t true: most languages, Haskell aside, still offer mutability. And few of them track said mutability in a way that is visible to the type system (Haskell, again, being the exception). This means that races are still omnipresent, and thus concurrent programs are expensive and error prone to write and maintain.

joeduffy December 28, 2013 at 7:42 pm

Sorry if my explanation was unclear on this.

Basically, a goal was that any C# compiles in this new language, and then there are a bunch of new features that are opt-in.

This entailed some sacrifices in the area of defaults — and is something we constantly revisit.

What I meant by needing to change the frameworks is that, in order to really take advantage of the language, the frameworks need to be designed a bit differently. The performance problems we see in .NET are due as much to the frameworks and their allocation-heavy designs as to the language itself (e.g., it’s a minor thing, but see String.Split; layers of APIs get built atop something with O(N) allocations). In principle, I suppose you could do without this step, but you’d be leaving a lot on the table.
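
[His String.Split example is easy to make concrete: splitting allocates the result array plus one string per field, while a plain scan allocates nothing. A small sketch:]

```csharp
using System;

class SplitSketch
{
    static void Main()
    {
        var line = "a,b,c,d";

        // String.Split allocates the result array plus one string
        // per field: O(N) allocations for N fields.
        int count1 = line.Split(',').Length;

        // Counting fields by scanning allocates nothing at all.
        int count2 = 1, pos = 0;
        while ((pos = line.IndexOf(',', pos)) >= 0) { count2++; pos++; }

        Console.WriteLine(count1 + " " + count2); // 4 4
    }
}
```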

It’s still really unclear where we will land here once all is said and done. I like that we’ve left a few doors open for ourselves.

Joe Duffy’s credentials:

A huge Category Archives: Technology, with posts going back to 2004

His Professional .NET Framework 2.0 (Programmer to Programmer) [April 10, 2006] book, which according to his post “used primarily a breadth-oriented approach,” while the next one (see below) was planned to “cover a smaller set of topics, albeit very depth-oriented.”

His “Concurrent Programming on Windows” [November, 2008] book for which he:
- collected “what do you want to see?” [Oct 21, 2006] input
- gave a Book update [July 30, 2007] indicating that the reasons “it has taken so long are numerous, but the primary reason is that the content is quite deep and detail-oriented—more than I expected at the start—and I’ve wanted to take the time to get it just right rather than cut corners,” as well as that “some of the abstractions I’ve built while writing the book will likely become part of a future release of the .NET Framework”
- announced Final manuscript for Concurrent Programming on Windows has been submitted [June 23, 2008], where he describes his Microsoft career path (quoted in full above)
- “started down the long road of writing a” 2nd edition of Concurrent Programming on Windows [Sept 28, 2009], although as of Jan 3, 2014 nothing has come of it

A new book: Notation and Thought [Nov 11, 2008] was published as a preliminary edition, downloadable freely. It traces the lineage of the imperative, functional, logic, declarative, and domain-specific family trees through the most influential languages–those that have deeply impacted the way that programmers think and write–and provides insight into the motivation behind them, their major influences, and the important features that each language contributed. The book is still in preparation.

A huge Category Archives: Books, going back to 2004, showing an enormous list of readings for his professional self-education

Some classics he is writing about in his blog:
- Butler Lampson’s “Hints for Computer System Design” [June 8, 2007]
- Dijkstra: My recollection of operating system design [Nov 12, 2006]
- Paul Graham on great hackers [Dec 24, 2004]
- John McCarthy, Scheme papers etc. in Linkopedia [Oct 12, 2004]

Some programming language related stuff:
- Announcing the Axum programming language [May 8, 2009], as “the parallel computing team just shipped an early release of Axum (fka Maestro), an actor based programming language with message passing and strong isolation,” also noting that he had “recently shifted my focus to a new project with the aim of applying these ideas very broadly for a whole new platform.” Note that the “new project” link points to Christopher Brumme’s (cbrumme’s) Sept 2006 post about moving to “an incubation team about a year ago, exploring evolution and revolution in operating systems … a fascinating area that includes devices, concurrency, scheduling, security, distribution, application model, programming model and even some aspects of user interaction (where I am totally out of my depth) … and, as you might expect with my background, our effort also includes managed programming”.
(The “Framework Design Guidelines, Second Edition, Sept 23, 2008” lists Christopher Brumme as an “annotator,” described as: “… joined Microsoft in 1997, when the Common Language Runtime (CLR) team was being formed. Since then, he has contributed to the execution engine portions of the codebase and more broadly to the design. He is currently focused on concurrency issues in managed code. Prior to joining the CLR team, Chris was an architect at Borland and Oracle”. There is no later public information about him, at Microsoft or elsewhere.)
- Longing for higher-kinded C# [Nov 4, 2008]
- Haskell, STM, and love [April 3, 2005] where he noted “I love Haskell. So much that I’m now writing a compiler for it. In my “spare” time, of course. (Which means just a couple hours a week since my book is priority #1 at the moment.)”

Talks (typically recorded as well):

UWTV [University of Washington TV] talk: Microsoft’s Parallel Computing Platform, Applied Research in a Product Setting [Oct 31, 2008]: “It was recorded and will eventually air on UWTV, but has also been posted online: …The goal of Microsoft’s Parallel Computing Platform (PCP) team … This talk examines PCP’s current progress, explicitly relating it to specific research of the past and present, in addition to surveying future efforts and possible research opportunities. … If you’re not aware of the work we’re doing in Visual Studio 2010 — both in .NET 4.0 and C++ — this talk gives a pretty good overview of all of it.”

A tour of the new Parallel Extensions CTP on Channel9 [June 5, 2008]

PDC’08: Concurrent, Multi-core Programming on Windows and .NET [May 29, 2008]: “PDC’08 is officially on for October 27-30th this year: http://microsoftpdc.com/. My team will certainly have some really fun stuff to show off, and just glancing at the preliminary list of teaser sessions, it’s going to be a blast. … Concurrent, Multi-core Programming on Windows and .NET… Joe Duffy leads development for Microsoft’s Parallel Extensions to .NET technology, a set of library and runtime technologies for concurrent and parallel computing. He founded the project in 2006 with Parallel Language Integrated Query (aka PLINQ), an innovative declarative parallel query analysis and execution engine. Prior to Parallel Extensions, Joe worked on transactional memory, library and VM support for concurrency in the Common Language Runtime (CLR) team, and has written 3 functional language compilers (Scheme, Common LISP, and Haskell). He has written two books, including Concurrent Programming on Windows (Addison-Wesley, 2008), and in his spare time reads and writes (code and text), plays guitar, and studies music theory.”

Channel9 on TPL [Feb 19, 2008]: “to discuss the Task Parallel Library component of Parallel Extensions.”

New Channel9 vid: Programming in the Age of Concurrency – PFX [Oct 13, 2008]: “Parallel Extensions for the .NET Framework (PFX), a managed programming model for data parallelism, task parallelism, scheduling, and coordination on parallel hardware.”

Upcoming speaking at JAOO’06: Concurrency and the composition of frameworks [June 29, 2006]: “… at JAOO’06 in Denmark this October … Concurrency and the composition of frameworks … This talk presents an overview of the problem, identifies some key challenges, and proposes some direction for enabling our software to both take advantage of concurrency and to avoid inhibiting it.”

PDC Talk: Writing a Compiler in One Hour [Oct 4, 2005]

PDC: Programming with Concurrency [July 17, 2005]: “my concurrency talk for PDC …talk’s focus is on the hows and whys of concurrency with a good mix of the realities of the Windows platform thrown in.”

PDC 2k5: talk [July 14, 2005]

Articles (MSDN or elsewhere):

ParallelFX MSDN mag articles [Sept 15, 2007]

MSDN Magazine: 9 Reusable Parallel Data Structures and Algorithms [April 12, 2007]

New app responsiveness article in Dr. Dobb’s [Sept 22, 2006]: “… Using concurrency to enhance user experiences … Here are some good follow-up references: …“

Hello PLINQ [Sept 13, 2006]: “… Microsoft’s PLinq to Speed Program Execution (eWeek)… MS eyes multicore technology (InfoWorld) …”

MSDN magazine: Concurrency for scalability [Aug 1, 2006]

.NET Rocks interview and new deadlocks article [March 8, 2006]: “… .NET Rocks Interview on … guess what … Concurrency … No More Hangs: Advanced Techniques to Avoid and Detect Deadlocks in the latest issue of MSDN Magazine. It’s now available online. …”

MSDN magazine: Transactions for memory [Dec 13, 2005]

Filed under: system languages Tagged: Axum, Axum language, C#, concurrency, concurrent systems, extensions to C#, functional languages, imperative languages, Joe Duffy, large and complex concurrent systems, operating systems, Parallel Extensions to .NET, parallelism, performance, PLINQ, productivity, programming language, research, safe concurrency, safe parallelism, TPL
