2016-06-03

A wise man (or woman – the quote is unattributed) once said that assumption is the mother of all cock-ups.

This is especially true in the wonderful world of databases.

The term NoSQL covers databases as different from each other as they are from the traditional Relational Database Management Systems (RDBMS).

The assumption implicit in that last sentence is that Relational Databases are broadly the same.

The problems with this assumption begin to manifest themselves when a team is assembled to write a new application running on an Oracle RDBMS.

Non-Oracle developers may have been used to treating databases as merely a persistence layer. Their previous applications may well have been written to be Database Agnostic.

This is a term which is likely to cause consternation among Oracle Developers, or at least, Oracle Developers who have ever tried to implement and support a Database Agnostic application running on Oracle. They may well think of this approach as the “Big Skip” anti-pattern where the database is treated as a dumping ground for any old rubbish the application feels like storing.

As a consequence, they will strongly favour the application being “Front-End Agnostic”. In other words, they will lean toward the Thick Database Paradigm as a template for application architecture.

With all of this Agnosticism about it’s amazing how religious things can get as the relative merits of these opposing views are debated.

These diametrically opposing views on the optimum architecture for a database centric application all stem from that one assumption about Relational Databases.

To make things even more interesting, both sides in this debate share this assumption.

The fact of the matter is that Oracle is very different from other RDBMSs. Oracle Developers need to appreciate this so that they can accept that the Database Agnostic Architecture is a legitimate choice for some RDBMSs and is not simply the result of non-Oracle Developers not knowing anything about databases.

The other point to note is that Oracle is very different from other RDBMS – OK, it’s technically the same point, but it’s such an important one, it’s worth mentioning twice.

Non-Oracle Developers need to understand this so that they can accept that the Thick Database Paradigm is a legitimate choice for the Oracle RDBMS and not simply the result of technological parochialism on the part of Oracle Developers.

Whatever kind of developer you are, you’re probably wondering just what I’m banging on about right now and where this is going.

Well, the purpose of this post is to take several steps back from the normal starting point for the debate over the optimal application architecture for a Database Centric Application on Oracle and set out :

Why Relational Databases are different from each other

Why the Thick Database Approach can be particularly suited to Oracle

Under what circumstances this may not be the case

Hopefully, by the end I’ll have demonstrated to any non-Oracle Developers reading this that the Thick Database Paradigm is at least worth considering when developing this type of application when Oracle is the RDBMS.

I will also have reminded any Oracle Developers that Oracle is a bit different to other RDBMS and that this needs to be pointed out to their non-Oracle colleagues when the subject of application architecture is being discussed.

I will attempt to keep the discussion at a reasonably high-level, but there is the odd coding example.

Where I’ve included code, I’ve used the standard Oracle demo tables from the HR application.

There are several good articles that do dive into the technical nitty-gritty of the Thick Database Paradigm on Oracle and I have included links to some of them at the end of this post.

I can already hear some sniggering when the term Thick Database gets used. Yes, you there in the “Web Developers Do It Online” t-shirt.

In some ways it would be better to think of this as the Intelligent Database Paradigm, if only to cater to those with a more basic sense of humour.

Assumptions

Before I go too much further, I should really be clear about the assumptions all of this is based on.

Application Requirements

To keep things simple, I’m going to assume that our theoretical application implements some form On-Line Transaction Processing (OLTP) functionality.

Of course, I’m going to assume that Oracle is the chosen database platform (or at least, the one you’re stuck with).

Most importantly, I’m going to assume that the fundamental non-functional requirements of the application are :

Accuracy

Performance

Security

Maintainability

Terminology

On a not entirely unrelated topic, I should also mention some terms, when used in the context of the Oracle RDBMS, have a slightly different meanings to that you might expect…

database – normally a term used to describe the database objects in an application – in Oracle we’d call this a schema. This is because database objects in Oracle must be owned by a database user or schema.

stored procedure – it’s common practice in PL/SQL to collect procedures and functions into Packages – so you’ll often hear the term Packaged Procedures, Packages, or Stored Program Units to cover this

database object – this is simply any discrete object held in the database – tables, views, packages etc

transaction – by default, Oracle implements the default ANSI SQL behaviour that a transaction consists of one or more SQL statements. A transaction is normally terminated explicitly by the issuing of a COMMIT or a ROLLBACK command.

The HR Schema

This is normally pre-installed with every Oracle database, although your DBA may have removed it as part of the installation.

If you want to follow along and it’s not installed, you can find the build script for it in :

Note that the script requires you to provide the SYS password for the database.

I’ve created copies of two of the tables from this application, the EMPLOYEES and DEPARTMENTS tables, for use in the examples below.

Database Agnostic and Thick Database – definitions

To keep things simple, we can explain each in the context of the Model-View-Controller(MVC) design pattern.

In MVC, the application components are divided into three categories :

The View – the GUI

The Controller – where all of the application logic exists. This layer sits in the middle between the view and the…

Model – the persistence layer – traditionally a Relational Database implementing a Physical Data Model

The Database Agnostic Approach is to treat the Model simply as a persistence layer. Implementation of Referential Integrity in the database is minimal and the implementation of any business logic is done entirely in the Controller layer, little or none of which impinges upon the RDBMS in the Model.

The main idea behind this approach is that it is trivial to migrate the application from one RDBMS to another.

The Thick Database Paradigm takes a very different approach.

The Referential Integrity is rigorously applied in the RDBMS, and the Data Model is done in some approximation of Third Normal Form.

The Controller layer is in fact implemented as two physical layers.

The code outside of the database – the Data Access Layer (DAL) accesses the model by means of a Transactional API (XAPI) which is held in Stored Procedures inside the RDBMS engine itself.

We’re going to explore the advantages of this approach in the context of the Oracle RDBMS.

There we are then, something for everyone to object to.

The thing is, both of these approaches have their place. The trick is to know the circumstances under which one is more appropriate.

It may help at this point then, if we can return to the question of…

Why RDBMSs are different from each other

Maybe that heading should read “Are RDBMSs different from each other ?” Superficially at least, they do seem to have a fair bit in common.

To start with, they all implement the relational model to some degree. This means that data is arranged in tables and that (in the main) it is possible to define relationships between these tables.

For circumstances where a Business Transaction may require multiple DML statements, the RDBMS will enable the creation of Stored Procedures to enable such transactions to be done in a single call to the database.

The most obvious similarity is, of course, that any retrieval of or amendment to data stored in the database is ultimately done by means of a Structured Query Language (SQL) statement.

A fundamental characteristic of SQL is that it is a Declarative Language. You use it to tell the database what data you want to access. It is the Database Engine that then has to figure out how to do this.

Whilst the implementation of SQL is (more-or-less) standard across RDBMSs, the underlying Database Engines behave very differently.

One example of the differences between Database Engines can be seen when you need to execute a query that contains many table joins.

If you were running such a query on MSSQL, it may well be more efficient to do this in multiple steps. This would be done by writing a query to populate a temporary table with a result set and then joining from that table to return the final results.

This contrasts with Oracle, where the optimal approach is usually to do this with a single SQL statement.

For the moment, I’ll assume that the above has been sufficient to persuade you that Relational Databases are in fact different from each other in a fairly fundamental way.

Feel free to put this to the test yourself. Go ahead, I’ll wait….

OK. We’re all agreed on that then.

The question you’re now asking is this –

If RDBMSs are different from each other is the Database Agnostic approach the best architecture for all of them ?

The next thing we need to understand is….

Why Oracle is Special

“Because it’s so expensive !” may well be your first thought. Remember that we’re assuming that Oracle is the RDBMS platform that you have chosen ( or been lumbered with) for your application. This being the case, we come back to the question of why the Thick Database Paradigm is worthy of consideration for your application architecture.

Returning to our list of non-functional application requirements, can you guess which of the application components is likely to have the biggest impact on performance of an Oracle Database Application ? Clue : It’s also the most expensive thing to change after go-live as it’s the card at the bottom of the house of cards that is your Application….

The Physical Data Model

This aspect of the Thick Database Paradigm is often overlooked. However, it is by far the most important aspect in maximizing the success of the implementation of this architectural approach.

Oh, that’s your sceptical face, isn’t it. You’re really not entirely sure about this. You’re probably not alone, even some Oracle Developers will be giving me that same look about now. I hope this next bit is convincing because my flame-proof underpants are currently in the wash.

OK, as I said a little while ago ( and I think you pretty much agreed at the time), any interaction with stored in an RDBMS will ultimately require the execution of an SQL statement by the Database Engine.

The particular bit of the Oracle Kernel that works out the how is probably called KX$ something. Friends however, tend to refer to it as the Cost Based Optimizer (CBO).

The CBO is pretty sophisticated. The more information you can provide Oracle about your data model the better the execution plans the CBO generates.

The upshot is that the better the data model, the faster that statements against it will run.

For example, the CBO understands RI constraints and can account for them in it’s execution plans as I will now demonstrate…

I’ve copied the EMPLOYEES and DEPARTMENTS tables, including data, from the standard Oracle demo – the HR Application.

The DEPARTMENTS table looks like this :

…and the EMPLOYEES like this :

Note that whilst DEPARTMENT_ID is listed in both tables I’ve not implemented any RI constraints at this point.

Now consider the following query

If we ask the CBO for an execution plan for this query…

… it will come back with something like this :

If we now add a constraint to ensure that a DEPARTMENT_ID in the EMPLOYEES table must already exist in the DEPARTMENTS table…

…and then get the execution plan…

…we can see that the CBO is smart enough to know that the RI constraint eliminates the need to read the DEPARTMENTS table at all for this query.

A sensible data model has some other key benefits.

For example…

…results in …

Simply by typing a one-line statement to add this constraint, we’ve prevented the possibility of orphaned records being added to our application.

Better still, this rule will be enforced however the data is added – no just records added via the application.

About now, non-Oracle developers may well be making the point that this logic needs to be implemented in the application code anyway. By adding it to the data model, aren’t we effectively coding the same functionality twice ?

Well, as we can see from the example above, the code required to create an RI constraint is minimal. Also, once it’s created, it exists in Oracle, there is no need to explicitly invoke it every time you need to use it.

Additionally, if you fully adopt the Thick Database approach, you don’t necessarily have to write code to re-implement rules enforced by constraints.

One other point that may well come up is the fact that most of the world now uses some variation of the Agile Development Methodology. Producing a complete data model in Sprint 1 is going to be a bit of a tall order for an application of even moderate complexity.

This is true. However, by implementing the Data Access Layer (DAL) pattern and separating the application code from the underlying data model, it’s possible to create stubs in place of parts of the data model that haven’t been developed. This does make it possible to fit Data Modelling into the structure required by these methodologies.

The key point here is that, even if this is the only bit of the thick db paradigm you implement your app will be far more maintainable.

The tuning tools at your disposal will be far more effective and useful if your application is based on a well defined, relational data model.

Whilst we’re on the subject of Application Code, it’s probably worth asking….

What are Stored Procedures Good For ?

In order to understand this, we need to look at the concept of a Database Transaction.

The ANSI Standard for SQL mandates that a transaction consists of one or more DML statements. In general terms, if the transaction is committed then all of the changes made by each statement in the transaction is saved. Otherwise, none of them are.

By default, many RDBMSs implement a transaction as a single SQL statement. In Oracle, the default behaviour conforms to the ANSI Standard.

In circumstances where a Business Transaction requires multiple DML statements, things can get a bit tricky without a Stored Procedure.

The application needs to issue multiple individual statements and commit each one in turn.

If a second or subsequent statement fails for any reason then you find that your data is left in an inconsistent state.

Stored Procedures solve this problem by bundling these statements up into a single transaction.

We’ll have a look at a specific example of this approach in Oracle using…

PL/SQL

The typical approach taken by vendors to implement Stored Procedures in an RDBMS involves providing some extensions to SQL to make it Turing Complete.

These extensions ( variable declaration, conditional statements, looping) are normally fairly minimal.

Oracle took a rather different approach with PL/SQL.

They took the ADA programming language and provided it with SQL extensions.

From the start then, PL/SQL was rather more fully featured than your average Stored Procedure language.

In the almost 30 years of it’s existence, PL/SQL has been further integrated within the RDBMS engine. Also, the addition of thousands of Oracle supplied libraries (packages) have extended it’s functionality to the point where it can be used for tasks as diverse as inter-session communication, backup and recovery, and sending e-mail.

Being a fully-fledged 3GL embedded into the heart of the database engine, PL/SQL is the fastest language for processing data in Oracle.

This is partly due to the fact that the code is co-located with the data, so network latency and bandwidth are not really an issue.

Yes, and you thought the idea of co-locating code and data was invented when those whizzy NoSQL databases came along, didn’t you ?

PL/SQL allows the developer to take a set-based approach to working with data. You can pretty much drop a SQL DML statement straight into a PL/SQL program without (necessarily) having to build it as a string first.

Furthermore, remember that transactions can encompass multiple database changes. By implementing these in PL/SQL, the entire transaction can be completed with a single database call, something that is not necessarily the case when the Controller code is outside of the database.

Implementing Business Transactions in PL/SQL is commonly done using the Transactional API (XAPI) pattern.

There is one particular aspect of ADA which has become central to the way that PL/SQL applications are written and that is the Package.

Rather than having lots of standalone procedures and functions, it is common practice to group these “stored procedures” into PL/SQL packages.

This approach has several advantages.

Grouping related functionality into packages reduces the number of individual programs you need to keep track of.

PL/SQL packages are stored in Oracle’s Database Catalogue ( the Data Dictionary) as two distinct objects – a Package Header or Specification – essentially the signature of all of the functions and procedures in the package ( package members) – and a Package Body – the actual code.

The Package Header is the object that is called to invoke a member procedure.

Provided you are not changing the signature of a public package member, you can amend the code in the package body without having to re-compile the header.

This means that you can make changes to the transactional code “under the hood” without necessarily requiring any re-coding in the caller to a packaged procedure.

Right, it’s time for an example.

Say we want to change the Manager of a Department. In this case, the current IT Department Manager – Alexander Hunold has decided that he’s far too busy to attend all of those planning meetings. I mean he took this job so he didn’t have to speak to anybody. You can tell he’s not really a people person, I mean just look at that T-shirt.

Diana Lorentz on the other hand, whilst also having the required technical background has a much better way with people.

So, in order to change the manager in the IT Department from Alexander to Diana we need to :

Update the record in the DEPARTMENTS table with the ID of the new manager

Update the EMPLOYEES records for members of that department so that they now report to the new manager

Update the EMPLOYEES record for the new manager so that she now reports to the Department’s overall boss

Among other things, we’ll need to know which DEPARTMENT_ID we need to make these changes for. This would normally be selected from a drop-down list in the Application’s UI, with the name of the Department being displayed to the user but the ID being passed to our procedure.

Whilst the list of Departments is static/reference data and may well be cached on the mid-tier of our application to save repeated database calls, we’ll still need a means of getting this data out of the database initially.

Therefore, we may well have a package that contains two members :

a function to return the department information

a procedure to assign the new manager

Such a package will probably look something like this. First the Package Header…

… and now the body…

Using the Oracle CLI, SQL*Plus to act as the caller, we can see how the function works :

Now we need to call the procedure to change the manager. In order to keep things simple, I’ve cheated a bit here and not included the code to lookup the EMPLOYEE_IDs of Alexander (103) and Diana ( 107).

So, using SQL*Plus once again :

NOTE – it is also possible (and often preferred) to pass parameters by reference when calling PL/SQL. So, the following code would work equally well ( and possibly be a bit more readable) :

We can now see that both of the DML changes have been applied :

The fact that Packages are stored in the Data Dictionary means that Oracle automatically keeps track of the dependencies that they have on other database objects.

This makes impact analysis much easier. For example, if we were going to make a change to the DEPARTMENTS table, we could see what other database objects might be impacted by running the following query on the Data Dictionary :

One more significant benefit of using PL/SQL is that any parameters passed into a stored procedure – whether part of a package or standalone – are automatically bound.

Bind variables are advantageous for two reasons.

Firstly, use of them enables Oracle to re-execute frequently invoked statements from memory, without having to re-validate them each time. This is known as a soft parse. This offers significant performance benefits.

The second, and perhaps more important advantage is that bind variables tend not to be susceptible to SQL Injection strings.

Effectively, calling a PL/SQL stored program unit is the equivalent of making a Prepared Statement call.

Whilst this automatic binding does not render PL/SQL completely immune from SQL Injection, it does greatly reduce the attack surface for this kind of exploit.

In-Memory Processing

In-Memory processing is big at the moment. It’s one of those things like Big Data in that there is lots of enthusiasm around something which, to be frank, has already been happening for many years.

Oracle has some rather sophisticated memory management out of the box.

As already mentioned, SQL and PL/SQL code that is frequently executed, together with the meta-data required to parse it, is cached in memory.

The same is true for frequently used data blocks. In other words, if you have data that is frequently accessed, Oracle will look to store this in memory, thus reducing the amount of physical I/O it needs to do.

This has nothing to do with Oracle’s newfangled “In-memory” option. It’s a core part of the product.

Generally speaking, the more application code you add to the RDBMS, the more efficiently Oracle will work.

Benefits of the Thick Database Paradigm

When measured against the non-functional requirements for our application, the Thick Database approach ticks all of the boxes.

Accuracy

Referential Integrity in the Data Model means that we can prevent incorrect data from being stored.

The flexibility of PL/SQL and it’s close coupling with SQL means that we can easily implement business rules to ensure system accuracy.

By implementing a XAPI layer in PL/SQL, we ensure that there is a single point of entry into the application. Because business transactions always execute the same code, we can ensure that the results of those transactions are repeatable, and accurate.

Performance

As we have seen, a well-defined Data Model allows the CBO to choose the optimum execution plan for each query.

The use of bind variables ensures that frequently executed statements are cached in memory.

The fact that most of the processing happens inside the database engine means that network latency is minimized as a performance overhead.

By it’s very nature, any application that manipulates and stores data will increase the amount of data it handles over time.

This increase in data volumes will start to affect performance.

Oracle is designed and optimized to handle data stored in relational structures. Having a properly defined data model will enable you to maximise the effectiveness of the tuning tools at your disposal.

Maintainability

Having your application code in a single location ( i.e. the PL/SQL XAPI layer) means that code is not replicated across multiple application layers.

As PL/SQL is tightly coupled with SQL, it also means that you tend to need fewer lines of code to implement application functionality.

Having the application code in the database means that dependency tracking comes “for free” by means of the Data Dictionary.

This is especially handy when doing Impact Analysis on any application changes that may be required down the line.

Security

PL/SQL parameters are bound auto-magically. Unless you’re being careless with some dynamic SQL inside of the PL/SQL code itself, these parameters are pretty much immune to SQL Injection.

Still feeling sceptical after reading that ? Good. Whilst I have provided some evidence to support these assertions, it’s not what you’d call incontrovertible.

But I’m getting ahead of myself. Before summarising, I did say that there may be some circumstances where this approach may not be suitable…

When the Thick Database Paradigm may not be appropriate

By it’s very nature the Thick Database approach on Oracle RDBMS puts an Application smack in the middle of an Oracle “walled garden”.

If you ever want to migrate to another RDBMS, the task is unlikely to be straight forward.

Yes, PostgresSQL is similar in nature to PL/SQL. As I’ve never attempted a migration from Oracle to Postgres, I can’t comment on whether this lessens the effort required.

So, if you’re in a situation where you know that your application will need to move to another RDBMS in the short term, the pain of sub-optimal performance on Oracle may be worth the gain when you come to do the migration.

A word of warning here – I have personal experience of applications that we’re only supposed to be on Oracle for six months after Go-live…and we’re still in Production several years later.

Alternatively, you may be a software vendor who needs to support your application across multiple database platforms.

The benefit of having a single code base for all supported platforms may outweigh the overhead of the additional effort required to address the issues that will almost certainly arise when running a Database Agnostic application on an Oracle RDBMS.

If you do find yourself in this situation then you may consider recommending a database other than Oracle to your clients.

It is worth pointing out however, that in either case, a well-designed physical data model where Referential Integrity is enforced by means of constraints will provide substantial mitigation to some of the performance issues you may encounter.

This is certainly not going to help with an application using the Entity-Attribute-Value (EAV) model.

I would suggest that if EAV is absolutely essential to your solution then a Relational Database almost certainly isn’t.

Summary and Further Reading

If you’ve made it this far, I hope that you have at least been persuaded that the Thick Database Paradigm is not a completely bonkers way of writing an application against an Oracle database.

That’s not to say that you’re sold on the idea by any means. As I’ve said already, what I’ve attempted to do here is provide some illustrations as to why this approach is preferred among Oracle Developers. It’s not cast-iron proof that this is the case with your specific application.

What you’ll probably want to do now is read up a bit more on this approach following which, you may well want to do some testing to see if all of these claims stack up.

So, if you want some proper, in-depth technical discussions on the Thick Database Paradigm, these links may be of some use :

Bryn Llewellyn, Oracle Distinguished Product Manager for PL/SQL, has written an in-depth White Paper on this subject – Why Use PL/SQL ?

Dr Paul Dorsey’s presentation is worth a read

There’s an excellent article challenging the conventional wisdom about Oracle Stored Procedures from Morten Braten

If and when you do come to do some testing, it’s important to remember that the benefits of the Thick Database approach – certainly in performance terms – become more apparent the greater the volume of data and transactions the application needs to handle.

Running performance tests against the tiny HR application that I’ve used here is probably not going to tell you too much.

Filed under: Oracle, PL/SQL, SQL Tagged: CBO, database agnostice, dbms_xplan, Ref cursors, Referential Integrity, thick database paradigm, USER_DEPENDENCIES

Mikesmithers.wordpress.com

What’s Special About Oracle ? Relational Databases and the Thick Database Paradigm