2014-10-13

Coincidentally, three different people asked me in the last month to write about the new technologies they should know to become more eligible for a job in a startup. All of them have been C/C++ programmers in big, established companies for about a decade now. Some of them have had only glimpses of modern technologies.

I have tried a little bit (with moderate success) to work in all layers of programming with most of the popular modern technologies, by writing little-more-than-trivial programs (long before I heard of the fancy title "full stack developer"). So here I am writing a "technology catchup" post, hoping that it may be useful for people who want to know what has happened in technology in the last decade or so.

Disclaimer 1: The opinions expressed here are entirely my own and are biased. You should work with the individual technologies to know their true merits.

Disclaimer 2: Instead of learning everything, I recommend that people pick whatever they feel connected to. I, for example, could not feel connected to Node.js even after toying with it for a while, but fell in love with Go. Tastes differ and nothing is inferior. So give everything a good try and pick your choice. Also remember what Richard Feynman said, "There is a difference between knowing the name of something and knowing something". So learn deeply.

Disclaimer 3: From whatever I have observed, getting hired in a startup is more about being in the right circles of connection than about being a technology expert. A surprisingly large number of startups start with a familiar technology rather than the right technology, and then change their technology once the company is established.

Disclaimer 4: This is not a complete list of things one should know. These are just things that I have come across and experimented with at least a little. There are a lot more interesting things that I will have missed. If you think something should have been in the list, please comment :-)

With those disclaimers out of the way, let us cut to the chase.

Version Control Systems
The most prominent change in the open source arena in the last decade or so is the invention of Git. It is a version control system initially designed for keeping the Linux kernel sources, and it has since become the de facto VCS for most modern companies and projects.

GitHub is a website that allows people to host their open source projects. Often startups recruit people based on their GitHub profile. Even big companies like Microsoft, Google, Facebook, Twitter, Dropbox etc. have their own GitHub accounts. I personally have received more job queries through my GitHub projects than via my LinkedIn profile in the last year.

Bitbucket is another site that allows people to host code, and it offers private repos as well. A lot of the startups that I know of use it along with the Jira project management software. Jira is, in some sense, your equivalent of MS Project.

I have observed that most of the startups founded by people who come from banking or finance companies use Subversion. Git is the choice for people from tech companies though. Mercurial is another open source, distributed VCS, which has lost a lot of the limelight to Git in recent times. Fossil is another VCS, from the author of SQLite, Dr. Richard Hipp. If you can learn only one VCS for now, start with Git.

Programming Languages & Frameworks
JavaScript has evolved to be a leading programming language of the last decade. It is even referred to as the x86 of the web. From its humble beginnings as a client-side scripting language to validate whether the user has typed a number or text, it has grown into a behemoth and has even entered server-side programming through Node.js. For incorporating the Model-View-Controller pattern, JavaScript has gained the AngularJS framework. JS is a dynamically typed language, and several languages compile down to it: CoffeeScript offers a cleaner syntax, and TypeScript brings in some of the goodness of statically typed languages.

Python is another dynamically typed, interpreted programming language. Personally, I felt that it is a lot more tasteful than JavaScript. It is easy on the eyes too. It helps in rapid application development and is available by default in almost all Linux distros and on Mac machines. Django is a web framework built on Python to make it easy to develop web applications. In addition to being used in a lot of startups, Python is used even in big companies like Google and Dropbox. There are variants of the Python runtime, so that you can run it on the JVM using Jython or on the .NET CLR using IronPython. I have personally found this language to be lacking in performance though, which I elaborate on in a subsequent section.

Ruby is an old programming language that shot to fame in recent years through the popular web application framework Ruby on Rails, often called just Rails. I learnt a lot of engineering philosophies, such as DRY (Don't Repeat Yourself) and CoC (Convention over Configuration), while learning RoR.

All of the above languages and frameworks use a package manager, such as npm, Bower, pip or gems, to install libraries easily.

Go is my personal favorite among the new languages to learn. I see Go becoming as vital and prominent a programming language as C, C++ or Java in the next decade. It is developed at Google for creating large scale systems. It is a statically typed language with automatic memory management that generates native machine code and makes it easy to write concurrent code.

Go has been my default language for any programming task for the last year or so. It is amazingly fast even though (just because?) it is still in the 1.X series. In my day job we did a prototype in both Go and Python, and for a highly concurrent workflow on the same hardware, Go beat Python by a wide margin (20 seconds vs 5 minutes). I won't be surprised if a lot of Python and Ruby code gets converted to Go in its next round of rewrites. Personally, I have found the quality of Go libraries to be much higher than that of Ruby or Node.js libraries as well, probably because not everyone has adopted this language yet. However, this could be just my personal biased opinion.
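To give a feel for what "writing concurrent code easily" means, here is a minimal sketch. It is not the prototype mentioned above; the function name squareAll and the toy workload are made up purely for illustration. Each input is processed in its own goroutine and a sync.WaitGroup waits for all of them to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll squares every input in its own goroutine and returns the
// results in input order. The name and the workload are hypothetical.
func squareAll(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			out[i] = n * n
		}(i, n)
	}
	wg.Wait() // block until every goroutine has called Done
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // prints [1 4 9 16]
}
```

The go keyword is all it takes to run a function concurrently; the rest is ordinary sequential code.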

If you would like to get fancy with functional programming, you can learn Scala (on top of the JVM), F# (on top of .NET), Haskell, Erlang, etc. The last two are very old, btw, but are in use even today. Most recently, WhatsApp was known to use Erlang. D is also seen in the news, mostly thanks to Facebook. Dart is another language from Google, but it is yet to receive any wide deployment afaik, even with Google's massive marketing machinery behind it. It has been compared to VBScript and is, as of now, Chrome-only. Dart has received criticism from Mozilla, WebKit (the rendering engine that powers Safari, and Chrome earlier) and Microsoft's IE team as well. Dart is done by Lars Bak et al. (the people who gave us V8, Chrome's JavaScript engine).

Rust is another programming language, aimed at high-performance concurrent systems. But I have not played around with it, as it does not have a stable API yet and has not reached 1.0. Julia is another programming language, aimed at high-performance numerical computing with built-in support for parallel and distributed computation. I have heard a lot of praise for it, but it still remains an exotic language afaik. R is another language, which I have seen in a lot of corporate demos where the presenters wanted to show statistics and charts. Learning it may be useful even if you are not a programmer but work with numbers (like a project manager).

There is a Swift programming language from Apple to write iOS apps. I have not tried Swift yet, but from my experience of using Objective C, it cannot be worse.

Bootstrap is a nice front-end web framework from Twitter, which provides various GUI elements that you can incorporate into your application to rapidly prototype beautiful applications that remain fluid even when viewed on mobile.

jQuery is a popular JavaScript library that is ubiquitous. Cascading Style Sheets (CSS for short) is a style sheet language that helps configure the style of web page UI elements. CSS has matured to the extent of supporting animations too. You should ideally spend a few weeks learning HTML5 and CSS.

Text Editors

Sublime Text is what the cool kids use as their editor these days. I have found the tutorial on Tuts+ to be extraordinarily good at explaining Sublime. It can be evaluated for free (as in beer) but it is not open source.

Atom is a text editor from GitHub built using Node.js and Chromium. I did not find a Linux binary and so did not bother to investigate it. But I have heard that it suits JavaScript programmers better than others, as the editor can be extended in JavaScript itself.

Brackets is another editor that I have heard good things about. Lime is an editor developed in Go that aims to be an open source replacement for Sublime Text.

Personally, after trying various text editors, I have always come back to vim. There have been a few good plugins for vim in recent times. Vundle and Pathogen are nice plugin managers that ease the installation of plugins. YouCompleteMe is a nice plugin for auto-completion. spf13-vim is a nice distro of vim in which various plugins and colorschemes come pre-packaged.

Distributed Computing

In modern computing, most programs are driven by a Service Oriented Architecture (SOA for short). Web services are the preferred way of communication among servers as well. While we are talking about services, please read this nice piece by Steve Yegge.

memcached is a distributed (across multiple machines) caching system which can be used in front of your database. It was initially developed by Brad Fitzpatrick while he was the head of LiveJournal; he is now (2014) a member of the Go team at Google. While at Google, he started groupcache which, as the project page says, is a replacement for memcached in many cases.
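The idea of putting a cache "in front of your database" is often called the cache-aside pattern: look in the cache first, and only on a miss go to the database and remember the answer. Here is a rough sketch of that flow. It does not use a real memcached client; the Cache type is an in-memory stand-in, and Store, lookup and dbCalls are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical in-memory stand-in for a memcached client.
type Cache struct {
	mu   sync.Mutex
	data map[string]string
}

func NewCache() *Cache { return &Cache{data: make(map[string]string)} }

func (c *Cache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

// Store pairs a cache with a slow backing lookup (the "database").
type Store struct {
	cache   *Cache
	lookup  func(key string) string // pretend this is an expensive query
	dbCalls int                     // counts how often the database was hit
}

// Get returns the cached value if present; otherwise it queries the
// database and stores the result in the cache for next time.
func (s *Store) Get(key string) string {
	if v, ok := s.cache.Get(key); ok {
		return v // cache hit: the database is not touched
	}
	s.dbCalls++
	v := s.lookup(key)
	s.cache.Set(key, v)
	return v
}

func main() {
	s := &Store{
		cache:  NewCache(),
		lookup: func(k string) string { return "row-for-" + k },
	}
	first := s.Get("user:1")  // miss: goes to the database
	second := s.Get("user:1") // hit: served from the cache
	fmt.Println(first, second, "database calls:", s.dbCalls)
}
```

A real deployment also has to worry about expiry and invalidation when the underlying row changes, which this sketch ignores.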

The Google File System (GFS) paper is a seminal paper on how Google created a filesystem to suit their large data processing needs. There is a database built on top of this filesystem, named BigTable, which powered Google's infrastructure. Apache Hadoop is an open source implementation of these concepts, which was originally started at Yahoo and is now a top-level Apache project. HDFS is the Hadoop equivalent of GFS. Hive and Pig are technologies to query and analyze data in Hadoop.

As with the evolution of any software, GFS has evolved into the Colossus filesystem and BigTable has evolved into the Spanner distributed database. I recommend that you read these papers even if you are not going to do any distributed computing development.

Cassandra is another distributed database, which was started at Facebook but is now used in many companies such as Netflix and Twitter. I have used Cassandra more than any other distributed project and actually like it a lot. It uses a SQL-like query language called CQL (Cassandra Query Language). It is modelled after the Dynamo paper from Amazon. I am quite tempted to write an alternative to it in Go, just to get the experience of writing a large scale distributed system instead of merely using one as a client, but I have not come across a good dataset or use case with which I can test it.

MongoDB is a document-oriented database, which I tried using for a pet project of mine. I don't remember exactly, but there were some problems with respect to Unicode handling. The project was done before Go reached 1.0, so the problem could have been at either end.

Most of the new age databases are called NoSQL databases, but what that really means is that the database skips a lot of functions (such as datatype validation, stored procedures, etc.) and tries to grow by scaling out (adding more machines) instead of scaling up (buying a bigger machine).
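Scaling out usually means splitting the data across machines by key. A minimal sketch of that idea, assuming plain hash-modulo partitioning and made-up node names, could look like the following. Real systems such as Cassandra use consistent hashing instead, so that adding a node does not reshuffle most of the keys.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// nodeFor maps a key to one of the given nodes by hashing the key.
// This is simple modulo partitioning, shown only for illustration.
func nodeFor(key string, nodes []string) string {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return nodes[h.Sum32()%uint32(len(nodes))]
}

func main() {
	// Hypothetical node names; every client must use the same list and order.
	nodes := []string{"node-a", "node-b", "node-c"}
	for _, key := range []string{"user:1", "user:2", "user:3"} {
		fmt.Printf("%s -> %s\n", key, nodeFor(key, nodes))
	}
}
```

Because the mapping depends only on the key and the node list, any client can find the right machine without asking a central coordinator.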

Cloud
OpenStack is a suite of open source projects that help you create a private cloud. DeltaCloud is a project which was initially started by Red Hat, and is now a top-level Apache project, as a way to provide a single API layer that works across any cloud in the backend. This project is written in Ruby. I was initially interested in participating in its development, until I got introduced to Go and went off on a different tangent.

Starting a software company is very easy in today's world. The public clouds are becoming cheaper every day and their capacity can be provisioned instantly.

Amazon Web Services provides an umbrella of various public cloud offerings. I have used Amazon EC2, which is a way to create a Linux (or Windows) VM that runs in Amazon's datacenters. The machines come in various sizes. Amazon S3 is a cloud offering that provides you a way to store data in buckets. This is used heavily by Dropbox for storing all your data. There are various other services too. In some of our prototyping, we found the performance of Amazon EC2 to be mostly consistent, even in the free tier.

Google is not lagging behind with their cloud offerings either. When Google Reader was shut down, I used Google App Engine to deploy an alternative FOSS product and I was blown away by the simplicity of creating applications on top of it. Google Compute Engine is the way to get VMs running on the Google Cloud. As with Amazon, there are plenty of other services too.

There are plenty of other players like Microsoft Azure, Heroku etc. but I do not have any experience with their offerings. While we are talking about the cloud, you should probably read about orchestration and know about at least ZooKeeper.

In-Process Databases

These are databases which you can embed into your application, without needing a dedicated server. They run in your process space.

SQLite is arguably the most widely deployed database in the world, and it competes with fopen to become the default way to store data for your desktop applications (if you are still writing them ;) ). A new branch is also in the works that uses the latest rage in storage data structures, a log-structured merge tree.

LevelDB is a database written by the eminent Googlers (and trendsetters of technology in the last decade or so) Jeff Dean and Sanjay Ghemawat, who gave us MapReduce, GFS etc. It has been forked by Facebook into RocksDB as well.

Kyoto Cabinet and LMDB are other projects in this space.

Linux Filesystems

Since we have covered GFS, HDFS, etc. earlier, we will look at other popular filesystems here.

btrfs is a copy-on-write filesystem for Linux. It is intended to be the de facto Linux filesystem in the future, possibly making the ext series obsolete in the longer run.

XFS is a filesystem that came to Linux from SGI. This is my personal favorite and I have been using it on all my Linux machines. In addition to good performance, it offers robustness and comes with a load of features that are useful to me, like online defragmentation.

We also have ZFS, the big daddy of filesystems, on Linux.

Ceph is another interesting distributed filesystem; its client works in kernel space and has been part of the Linux kernel sources for a long time now. GlusterFS is another distributed filesystem, which works in userspace. Both of these filesystems focus on scaling out instead of scaling up.

Conclusion

Pick any of these technologies that you like and start writing a toy application with it, maybe something as simple as a ToDo application, and learn through all the stages. This approach has helped me. It may help you too.

I have written this post on a ThinkPad T430 running openSUSE Factory and GNOME Shell with a bunch of KDE tools. I like this machine. However, in the past few months I have realized that, in today's world, if you are a developer, it is best to run Linux on your server and a Mac as your laptop.
