2015-12-03



Check out this recent chart from Google Trends, which shows interest in PostgreSQL largely holding steady over the past ten years, even as MySQL faces fairly significant declines. It’s a good jumping-off point for a discussion about whether it’s time to switch to PostgreSQL for your database-management needs.

PostgreSQL, MySQL, and MariaDB are the three “main” open-source relational databases available today (there are four if you count FireBird, but for brevity I’m excluding it). If you don’t know MariaDB, it’s a drop-in replacement for MySQL, forked not long after Oracle bought Sun (which had previously bought and owned MySQL). MariaDB is more open than Oracle’s MySQL, but technically maintains very high compatibility.

There are other forks out there. Drizzle’s roadmap doesn’t go past 2012, so it seems dead, whereas WebScaleSQL is an ongoing collaboration by engineers from Alibaba, Facebook, Google, LinkedIn, and Twitter. The latter aims at large-scale users with a fork of MySQL 5.6 with a GPL 2 license. With commits in the last couple of weeks, this project is clearly alive and kicking.

For years, MySQL had a reputation of being faster than PostgreSQL, but much of that was due to the MyISAM database engine, which didn’t support transactions. On the flip side of things, PostgreSQL had a reputation for being slower but more reliable. In recent versions of both platforms, speed has been less of a problem for PostgreSQL, while MySQL now defaults to the InnoDB engine, which does handle transactions.

Perception Bias

These assumptions have long biased developers in selecting a database. But as the databases continue to improve, the choice gets a little more complicated. MySQL is less standardized that PostgreSQL, and it lacks schemas. (A schema is similar to a namespace, so you can have two tables called MyTable in the same PostgreSQL database; they might be named database1.schema1.Mytable and database1.schema2.MyTable. In MySQL, without schemas, you’d probably use two databases.)

For a simple web application that needs a relational database, MySQL is probably going to be your main choice. It’s mostly SQL 99 compliant, is easy to install (if not already installed by your hosting company) and there are plenty of good (and also free) client tools. I particularly like Heidi  SQL, which works well for both MySQL and PostgreSQL and Microsoft SQL Server. Generally, there are more client tools available for MySQL, which reached 20 years old in 2015.

PostgreSQL is actually based on Ingres, a database from the 1970’s. Ironically, PostgreSQL’s sheer number of advanced features harms it in the comparison with MySQL; sometimes simpler is better. Here are some other things that MySQL lacks:

Indexing

All relational databases let you add indexes to speed up searches, but PostgreSQL’s indexes go further. It’s in good company with both Oracle and Firebird in having expression indexes. These let you apply a function (say lowercase or some kind of hash) rather than just indexing on one or more columns. Partial indexes only apply to rows which match a ‘where’ clause. This keeps the index smaller.

BRIN (Block Range Index) breaks the data from very large tables into ranges, then stores those pages with extra data showing the minimum and maximum values. This avoids creating and maintaining a B-Tree index, which is what non-BRIN indexes use. It’s slower than a B-Tree index but takes up less space. You can read more about it on Michael Otacoo’s post.

For text searches there are GiST and GIN indexes. The GiST (Generalized Search Tree) lets you create custom indexes. You provide the customization to index your content and the GiST infrastructure code manages the index pages layout on disk, the search algorithms to search and delete from indexes, plus complex transactional details for page-level locking for high concurrency and write-ahead logging for crash recovery.

Generalized Inverted Indexes (aka GIN) offer an alternative to GiST by storing words, but it’s slower to index than GiST. According to the PostgreSQL documentation, GIN index lookups are three times faster than GiST.

Table Inheritance

PostgreSQL is described as object relational. There’s always been something of an Impedance Mismatch between the worlds of relational databases and object oriented programming. They have different paradigms and this is why Object Relation Mapping (ORM) exists. PostgreSQL tries to bridge the two worlds. For instance, table inheritance lets you define a base table then create a new table derived from it, just like in object oriented programming. Tables can inherit from multiple tables with identical names and types of columns merged into one. As you’d expect, you can select, update or delete but not insert into an inherited table. More details are listed in the PostgreSQL documentation.

JSON and NoSQL

Although MySQL has JSON support, PostgreSQL goes further with both JSON  and jsonb document support. This lets you do NoSQL-type operations, making PostgreSQL extra useful.

The previous versions (9.3, 9.4) have been adding to this documentation, increasing its usefulness. The jsonb format, which appeared in the 9.4 release, stores JSON data in a binary form and enables indexing of data in a JSON document. You can read more about JSON and jsonb.

Conclusion

PostgreSQL is a pretty complicated beast, but is very reliable and has features typically only found in Oracle and Informix. You can see these and other comparisons on Wikipedia.

Whether you need these functions, or prefer to use MySQL/MariaDB etc. depends on your application and its future database requirements. But it’s certainly worth checking out what’s possible. To get the best performance out of PostgreSQL, you’ll need to tune the config files.

The post Choosing PostgreSQL Over MySQL, MariaDB appeared first on Dice Insights.

Show more