2014-08-20

My name is 13 characters long. There are lots of DeWolfes, lots of Shawns, and a few Shawn DeWolfes, so my 13-character name isn’t unique. Even my nine-digit social insurance number only goes so far. In my country, Canada, it identifies me specifically, but any other nation with a nine-digit numbering system will likely have someone with the same number as mine.

What if one number could address every person alive, be shorter than a name and shorter than a social insurance number? You can’t do it with Base-10 numbers, but with Base-36, it’s a breeze.

We use base ten so much that we take for granted what each added digit buys us: ten times the range. With two digits we can go from 0 to 99. Hexadecimal takes it further: with two hexadecimal digits, we can get to 255, counting from 00 to FF.

Hexadecimal gets past the ten-digit mark without having to invent new numerals. It does this by using A, B, C, D, E and F as the 11th through 16th digits. Base-36 takes that one step further and uses all of the conventionally available characters we are familiar with. Base-36 uses the numerals 0 to 9 for the first ten digits. Digits 11 through 36 are represented by the alphabet from A to Z. We know the order of the numerals from 0 to 9, and we know the alphabet, so we can anticipate the progression.

By using base-36, massively bigger numbers can be referenced with an economy of size. While a two-digit decimal number gets you to 99, ZZ, a two-digit base-36 expression, gets you to 1,295. ZZZZZZZ (or Z,ZZZ,ZZZ with separators) is the base-36 equivalent of 78,364,164,095.

With a seven-digit base-36 number, you can give every person alive their own unique reference, with room left over for a large share of everyone who has ever lived.

When you get to eight digits, you could have the Internet of Things covered. Eight base-36 digits count to more than 2.8 trillion (2,821,109,907,455 to be exact).
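
The figures above can be checked with a few lines of code. The following is a minimal Python sketch, not the PHP or MySQL route discussed later; the helper names to_base36 and from_base36 are made up for illustration.

```python
import string

# The 36 digits in order: 0-9 followed by A-Z.
DIGITS = string.digits + string.ascii_uppercase


def to_base36(n: int) -> str:
    """Encode a non-negative integer as an uppercase base-36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def from_base36(s: str) -> int:
    """Decode a base-36 string; Python's int() accepts bases 2 through 36."""
    return int(s, 36)


print(from_base36("ZZ"))        # 1295
print(from_base36("ZZZZZZZ"))   # 78364164095
print(from_base36("ZZZZZZZZ"))  # 2821109907455
print(to_base36(78364164095))   # ZZZZZZZ
```

Each extra digit multiplies the range by 36, which is why seven characters reach 78 billion and eight reach 2.8 trillion.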

Base-36 is a good practical ceiling to work with in lieu of base ten or hexadecimal sequences. PHP and MySQL both have functions that can convert numbers to and from base-36. The functionality is there, and it allows for more compact storage of data.

From the human perspective, it has been said that many people can remember a list of seven things, plus or minus two. Many can recall important phone numbers. And just as most people can recall a seven-digit phone number, it can be argued that they can retain a seven-character string that represents something much bigger: instead of a one-in-ten-million phone number, a seven-character base-36 figure represents one of 78 billion references.

Why Are Big Numbers Important?

As shown above, big numbers can be handy for addressing large masses of data. Facebook stores its posts with ID numbers that are spiralling upwards.

A post I just pulled has the ID number 902352183124757. That is fifteen digits: 902 trillion. If they’re at 902 trillion, and a guy like me throws in a crazy number of posts per day, and tens of millions do as I do, that post odometer is going to roll over soon.

Were the post IDs formatted as 10 base-36 digits, the database would have more leg room (about 3.6 quadrillion references available: 3,656,158,440,062,976 to be exact). If Facebook got to this point through exponential growth and that exponential growth is leveling out, then 2+ quadrillion more posts should give that database the room it needs to reference new posts without going to a googolplex value.
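
The digit counts in this comparison can be worked out directly. Here is a small Python sketch under the assumption that IDs are plain counters starting at zero; digits_needed is a hypothetical helper, not something Facebook or MySQL provides.

```python
def digits_needed(count: int, base: int) -> int:
    """Smallest number of digits such that base**digits covers `count` distinct values."""
    digits = 1
    while base ** digits < count:
        digits += 1
    return digits


post_id = 902352183124757  # the post ID quoted above

# IDs 0..post_id inclusive means post_id + 1 distinct values.
print(digits_needed(post_id + 1, 10))  # 15
print(digits_needed(post_id + 1, 36))  # 10

# Total references available in ten base-36 digits.
print(36 ** 10)  # 3656158440062976
```

Nine base-36 digits top out at roughly 101 trillion, which is why that ID already needs the tenth digit.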

Aren’t Base-36 Numbers Processing Intensive?

Yes and no. Inside a database, integers are the most economical way to store data. Base-36 numerals are treated as strings, and strings are more expensive to store.

Likewise, auto-increment in MySQL will only increment integers. You can format strings to be consistent. For example, all 10 characters could be used, with zeroes to the left of the number, so that 0000000008 would be eight while 00000000ZZ would be 1295. When sorted alphabetically, the progression would look like a numerical progression. While auto-incrementation is built into MySQL and most other relational databases, it’s not the only game in town. You can create new auto-generated base-36 numbers by attaching a trigger to a table (which we'll discuss momentarily) to introduce new, orderly values when new records are inserted.
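
The claim that zero-padded strings sort like numbers is easy to test. The sketch below is in Python rather than MySQL, and to_padded_base36 is a hypothetical helper; it assumes uppercase letters and an ordering where digits come before letters, as in plain ASCII.

```python
import string

DIGITS = string.digits + string.ascii_uppercase


def to_padded_base36(n: int, width: int = 10) -> str:
    """Encode n in base-36 and left-pad with zeros to a fixed width."""
    if n < 0:
        raise ValueError("n must be non-negative")
    out = ""
    while n:
        n, rem = divmod(n, 36)
        out = DIGITS[rem] + out
    if len(out) > width:
        raise OverflowError("value does not fit in the requested width")
    return out.rjust(width, "0")


print(to_padded_base36(8))     # 0000000008
print(to_padded_base36(1295))  # 00000000ZZ

# Sorting the padded strings gives the same order as sorting the numbers.
values = [1295, 8, 46656, 36, 35, 0]
as_strings = sorted(to_padded_base36(v) for v in values)
as_numbers = [to_padded_base36(v) for v in sorted(values)]
print(as_strings == as_numbers)  # True
```

Without the padding the trick fails: "ZZ" would sort after "100" even though 1295 is smaller than 1296.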

Where Base-36 Can Be Used

The goal of base-36 is compaction and relevance. Instead of 10 digits to reference the people on Earth, seven characters will address them all. Instead of 16 digits to address all of Facebook’s status updates ever, 10 characters can be used. When it comes to relevance, part of the sequence can be an incrementing value while the rest is set aside to declare additional qualities of what is being referenced.

Base-36 can be used to reference these sorts of items:

People. A seven-digit base-36 number can reference 78 billion people, so any user’s reference can be converted into seven characters.

Country Codes. Country codes are already two-character representations. There are 193 recognized countries (by writing that, I just know some country is going to split into two by the time I get to the close bracket). The ISO-3166 standard includes a list of two-letter country codes. Two alphabetical characters can reference 676 distinct countries, so two characters are plenty. Using the ISO-3166 standard leaves over 400 references unused, but it still provides a common and recognizable reference.

Cities. China, with its more than one billion people, has over 1020 cities. Those communities could be referenced inside of two base-36 digits, which cover 1,296 values. Many countries will have fewer than 1000 communities. If community references need to get really particular, three digits can cover 46,656 communities inside one country.

Devices. The Internet of Things is coming, I’m sure. I have three devices with their own wireless needs. Some tech-friendly people could have many more connected devices. If this were 36 devices per person, then one digit could cover all of those devices. Two digits to reference devices and things cover 1,296 possibilities.

Amalgamated Serial Numbers

These strings can be combined into a unique identifier by amalgamating the characters in an orderly sequence. In the following example, you can make a reference to people, their location and their devices. The whole string can be unique while elements therein repeat.

For example: US001200GHK4 could actually mean:

US - country code

001 - Manhattan

200GHK4 - A person’s unique code.

Maybe their devices get added to the identification process. Let’s say the laptop is their primary device. When their cellphone is added, it becomes the second device associated with the user: US001200GHK42. The “2” stands for that second device.

If that is how base-36 were worked into an identification scheme, the length of the string would indicate what it references.

Two digits long = country using the ISO-3166 standard codes

Five digits long = community in a country

Twelve digits long = a person as they reside in a country

Thirteen digits long = a reference to an IP addressable device owned by a user in a particular community and country.
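
The length-based layout above can be sketched as a small parser. This is a Python illustration under the field widths given in this section (2 + 3 + 7 + 1 characters); Reference and parse_reference are hypothetical names, not part of any existing library.

```python
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    country: str
    community: Optional[str]
    person: Optional[str]
    device: Optional[str]


def parse_reference(ref: str) -> Reference:
    """Split an amalgamated reference into its parts based on its length."""
    if len(ref) not in (2, 5, 12, 13):
        raise ValueError(f"unexpected reference length: {len(ref)}")
    return Reference(
        country=ref[:2],
        community=ref[2:5] if len(ref) >= 5 else None,
        person=ref[5:12] if len(ref) >= 12 else None,
        device=ref[12:] if len(ref) == 13 else None,
    )


print(parse_reference("US"))
# Reference(country='US', community=None, person=None, device=None)
print(parse_reference("US001200GHK42"))
# Reference(country='US', community='001', person='200GHK4', device='2')
```

Because every field has a fixed width, no separators are needed; the position of a character says what it means.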

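With 13 digits, a MySQL search for “US%” will return every record in the US. “US001%” will return all of the people in Manhattan. “US001%1” will reveal the primary / preferred device used by all of those Manhattan residents (if shorter, person-only strings share the same column, the pattern “US001_______1”, with seven single-character wildcards, keeps the match to 13-character device references). With a logic like that, communication can be routed to a preferred chunk of a network.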
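
To see how these LIKE patterns behave, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; both treat % as "any run of characters" and _ as "exactly one character". The table name refs and all of the sample rows are invented for the example.

```python
import sqlite3

# Hypothetical references: 13-character device rows plus one 12-character person row.
ROWS = [
    "US001200GHK41",  # Manhattan, person 200GHK4, device 1
    "US001200GHK42",  # same person, device 2
    "US0010000B711",  # Manhattan, person 0000B71, device 1
    "US0010000B71",   # person-only reference that happens to end in 1
    "US0020000C3D1",  # a different US community
    "CA0450009ZZ11",  # a Canadian reference
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (ref TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO refs (ref) VALUES (?)", [(r,) for r in ROWS])


def find(pattern: str) -> list[str]:
    """Return every reference matching a LIKE pattern."""
    cur = conn.execute("SELECT ref FROM refs WHERE ref LIKE ? ORDER BY ref", (pattern,))
    return [row[0] for row in cur]


print(len(find("US%")))      # 5: everything in the US
print(len(find("US001%")))   # 4: everything in community 001
print(len(find("US001%1")))  # 3: also catches the 12-character person row

# Seven underscores pin the person field, so only 13-character rows match.
print(find("US001" + "_" * 7 + "1"))  # ['US0010000B711', 'US001200GHK41']
```

The difference between the last two queries is the reason to prefer fixed-width wildcards when strings of different lengths live in the same column.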

Of course, there are a lot of what-ifs:

What if they change cities? The third to fifth characters change.

What if they hop to another country? The first five digits change to reflect new digs.

What if they own more than 36 devices? If that happens, then the last two digits can represent their device instead of only the last one. A fourteen-digit ID would say, “this dude has a lot of gadgets.”

Storage in a Database

The main purpose of creating these large numbers and storing them as base-36 references is to practice a sort of economy. These need to be sequential like index keys, but you do not have to do any particular math with them.

In MySQL, base-36 strings stored as VARCHAR behave much like integers when it comes to ordering, provided they are left-padded to a fixed width and use consistent letter case. The strings can be compared through aggregate functions like MAX() and MIN() to get the highest and lowest values, respectively.

You can also fetch a base-36 string by sorting in descending order to get the highest number first. Unlike integers, base-36 strings can be filtered with LIKE statements should the strings be a combination of amalgamated series and incrementing values.

Using Values in MySQL

In MySQL, the CONV() function converts numbers between any two bases from base-2 through base-36. To get a base-36 number’s base-10 equivalent, use CONV('ZA', 36, 10), which returns 1270. To go the other way, from base 10 to base-36, use CONV('1270', 10, 36), which returns 'ZA'. You can nest these functions to create something that increments: CONV(CONV('ZA', 36, 10) + 1, 10, 36) will output 'ZB'.
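
For readers without a MySQL prompt handy, the same round trip can be imitated in Python. The conv function below is a simplified analogue written for this example; it handles only non-negative whole numbers and does not reproduce every behaviour of MySQL's CONV().

```python
import string

DIGITS = string.digits + string.ascii_uppercase


def conv(value: str, from_base: int, to_base: int) -> str:
    """Convert a non-negative number, given as a string, between bases 2 through 36."""
    if not (2 <= from_base <= 36 and 2 <= to_base <= 36):
        raise ValueError("bases must be between 2 and 36")
    n = int(value, from_base)
    if n < 0:
        raise ValueError("negative values are not handled in this sketch")
    if n == 0:
        return "0"
    out = ""
    while n:
        n, rem = divmod(n, to_base)
        out = DIGITS[rem] + out
    return out


print(conv("ZA", 36, 10))    # 1270
print(conv("1270", 10, 36))  # ZA

# The nested increment: decode, add one, encode again.
print(conv(str(int(conv("ZA", 36, 10)) + 1), 10, 36))  # ZB
```

The nested call mirrors the shape of the SQL expression: the inner conversion produces a base-10 value, one is added, and the outer conversion turns it back into base-36.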

Incrementing Base-36 Keys in MySQL

This can be put into a custom procedure, and that procedure can be triggered when new records are inserted into a database table. In the example below, the trigger is added to the base_example table so that a base-36 key is created whenever a new record is added.

Figure 1. A triggered procedure to create an incrementing value.

In this example, there are two assumptions added to the mix. First, the VARCHAR field is to be 12 characters long. Second, the values in the VARCHAR field are left-padded with zeros so that all of the output looks consistent and can be sorted in a predictable fashion.
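
The key-generation step that such a trigger performs can be sketched outside the database. The Python below is an illustration of the logic only, not the MySQL trigger itself; next_key is a hypothetical helper, and starting an empty table at 1 is an assumption borrowed from how auto-increment usually behaves.

```python
import string
from typing import Optional

DIGITS = string.digits + string.ascii_uppercase


def next_key(current_max: Optional[str], width: int = 12) -> str:
    """Return the base-36 key that follows current_max, left-padded to `width`."""
    # Assumption: an empty table (current_max is None) starts counting at 1.
    n = 1 if current_max is None else int(current_max, 36) + 1
    out = ""
    while n:
        n, rem = divmod(n, 36)
        out = DIGITS[rem] + out
    if len(out) > width:
        raise OverflowError("the key space for this width is exhausted")
    return out.zfill(width)


print(next_key(None))            # 000000000001
print(next_key("00000000000Z"))  # 000000000010
print(next_key("0000000000ZZ"))  # 000000000100
```

In a database the current maximum would come from something like MAX() on the key column, and the width check guards against silently rolling over once all 12 characters read Z.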

Math With Base-36

Base-36 is cool, but most languages still reference things in base 10 and binary. PHP can do base conversions, however, and it is clever enough to understand that the letters A through Z cover the 11th through 36th digits.

With a simple function (which we'll see momentarily), a formula written in base-36 numerals can be converted, calculated, and returned as a base-36 value. It does this by pulling the 0-9A-Z tokens from the formula, converting them to base 10, carrying out the calculation, and then converting the output back to base-36.

Doing Base-36 Formulas in PHP

There is a limit to how complex the math can be, but I wrote an example function b36math() that converts a base-36 formula into a base-36 result.

Figure 2. The b36math conversion function for evaluating formulas written with base-36 numbers.
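
As a rough analogue of the approach described in this section, here is a Python sketch rather than the PHP function: it swaps each base-36 token for its decimal value, evaluates the arithmetic, and converts the result back. It borrows the b36math name for convenience, supports only +, -, * and whole-number division, and should be read as one possible interpretation under those assumptions.

```python
import ast
import operator
import re

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Supported operators; "/" is treated as whole-number (floor) division.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,
}


def to_base36(n: int) -> str:
    """Encode an integer (negative allowed) as a base-36 string."""
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = ""
    while True:
        n, rem = divmod(n, 36)
        out = DIGITS[rem] + out
        if n == 0:
            return sign + out


def _evaluate(node: ast.AST) -> int:
    """Evaluate a parsed expression made only of integers and supported operators."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def b36math(formula: str) -> str:
    """Evaluate a formula written in base-36 numerals and return a base-36 result."""
    # Replace every run of 0-9A-Z characters with its base-10 value.
    decimal = re.sub(r"[0-9A-Za-z]+", lambda m: str(int(m.group(0), 36)), formula)
    tree = ast.parse(decimal, mode="eval")
    return to_base36(_evaluate(tree.body))


print(b36math("ZA + 1"))         # ZB
print(b36math("Z * 2"))          # 1Y
print(b36math("(10 + 10) * 2"))  # 40
```

Note that "10" here is base-36 for thirty-six, so the last line computes (36 + 36) * 2 = 144, which is 40 in base-36. Parsing the expression instead of handing it to eval() keeps arbitrary code from being run.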

Conclusion

Our world is data hungry. That data has to be well referenced. In the crunch to access bigger bodies of data, using references stored as base-36 numbers is a way to store bigger numbers in less space.

There is a foot race over which commodity is the most valuable: processing speed, bandwidth, or storage. When one is in generous supply, you can spend it compensating for the others. If you have a lot of available cycles for processing, you can store data in a cumbersome format and use processing to make it usable.

While there is a cap on how many digits an integer can hold, VARCHAR fields can hold at least 255 characters, and TEXT fields hold far more. Very large base-36 numbers can be stored to reference individual elements in very large bodies of data.


