So, the World Cup is in full swing.
Now that the lesser teams have fallen by the wayside (England), we can get on with enjoying a feast of footie.
As well as a glut of goals, the current tournament has given us a salutary reminder of the importance of diet for elite athletes.
After predictably (and brilliantly) destroying England single-handedly, Luis Suarez found himself a bit peckish and nipped out for an Italian. Now the whole world seems to be commenting on his eating habits.
Like Luis, you may find yourself thinking that you’ve bitten off more than you can chew when confronted by DBMS_DATAPUMP.
The documentation does offer some help…to an extent. However, the whole thing can seem a bit fiddly, especially if you’re used to the more traditional command-line interface for Datapump.
What follows is a tour through DBMS_DATAPUMP based on my own (sometimes painful) experience, broken down into bite-sized chunks.
Much of the functionality to filter object types and even data is common to both Exports and Imports.
So, the approach I’ve taken is to cover the Export first, with a view to finally producing a Full Database export.
I’ve then used the Import process against this to demonstrate some of the package’s filtering capabilities.
So, what’s on the menu today?
Privileges required to run a DBMS_DATAPUMP job from your current schema and for the whole database
Running a consistent export
Running datapump jobs in the background
Monitoring running jobs
Importing from one schema to another
Specifying the types of objects to include in Exports and Imports
Specifying subsets of data
DDL only Jobs
How to Kill a Datapump Job
The full code examples have all been written and tested on Oracle XE 11gR2.
I’ve tried to maximise the use of in-line hard-coded values and minimise the number of variables in an attempt to make the code easier to follow.
Also, in these examples I’ve made use of the default DATA_PUMP_DIR directory object, but you can use any directory object to which you have the appropriate privileges.
For dessert, there are a couple of other DBMS_DATAPUMP features that I have found useful, which are specific to Enterprise Edition (in one case, with the Partitioning Option) :
Including specific table partitions
Parallel processing
So, a fair bit to get through then. I hope you have an appetite…
Privileges required for using DBMS_DATAPUMP
Obviously, the first thing you need is execute privileges on DBMS_DATAPUMP. By default, this is granted to PUBLIC, although the more security conscious DBAs will have instituted more rigorous controls over just who gets access to this package.
The other thing you’ll need is READ and WRITE privileges on a database Directory Object.
DATA_PUMP_DIR
The DATA_PUMP_DIR directory object is created by default (but any old Directory Object will do)…
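If you want to see where it points to, a quick query against ALL_DIRECTORIES (or DBA_DIRECTORIES if you have access) should tell you :

    select directory_path
    from all_directories
    where directory_name = 'DATA_PUMP_DIR';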
The HR user, for example, would need to be granted…
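…something along these lines, run by a suitably privileged user :

    grant read, write on directory data_pump_dir to hr;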
That’s pretty much it. With just these privileges (along with those for creating/altering the relevant object types), you can run DBMS_DATAPUMP to export/import objects and data in your current schema.
Once again, assuming we’re connected as HR :
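What follows is just a sketch – the job name and file names are simply ones I’ve picked for illustration :

    declare
        l_handle number;
        l_job_state varchar2(30);
    begin
        -- create the "container" for the job definition
        l_handle := dbms_datapump.open
        (
            operation => 'EXPORT',
            job_mode => 'SCHEMA',
            job_name => 'HR_EXPORT'
        );
        -- specify the dump file to write to...
        dbms_datapump.add_file
        (
            handle => l_handle,
            filename => 'hr_export.dmp',
            directory => 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_dump_file
        );
        -- ...and a logfile
        dbms_datapump.add_file
        (
            handle => l_handle,
            filename => 'hr_export.log',
            directory => 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_log_file
        );
        -- kick off the job...
        dbms_datapump.start_job( handle => l_handle);
        -- ...and hang around until it finishes
        dbms_datapump.wait_for_job( handle => l_handle, job_state => l_job_state);
        dbms_output.put_line('Export finished with a state of '||l_job_state);
    end;
    /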
It’s probably worth noting that, as a rule, DBMS_DATAPUMP jobs contain four main components.
First, DBMS_DATAPUMP.OPEN creates a kind of container to hold the definition of the job.
Next, you specify any particular requirements for the job. This must always include reference to a dump file.
Then, use DBMS_DATAPUMP.START_JOB to kick-off the job you’ve defined.
Finally, you can either wait for the job to finish (as in this case), or leave it to run in the background. We’ll come onto background execution in a bit.
Roles required for Full Export/Import
If you want the God-Like powers to export/import the entire database then you will need the roles :
DATAPUMP_EXP_FULL_DATABASE
DATAPUMP_IMP_FULL_DATABASE
These roles are granted to the DBA role and the SYS user by default.
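So, for anyone else to run a full export/import, a DBA would need to do something like this (the grantee here is just a placeholder) :

    grant datapump_exp_full_database to some_user;
    grant datapump_imp_full_database to some_user;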
Incidentally, the fact that roles play quite a significant part in DBMS_DATAPUMP privileges means that, if you do intend to wrap this functionality in a package, it would probably be wise to create it with Invoker Rights – e.g.
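…a specification declared with AUTHID CURRENT_USER. The package name here is purely for illustration :

    create or replace package datapump_utils
        authid current_user
    as
        procedure schema_export( i_directory in varchar2, i_dump_file in varchar2);
    end datapump_utils;
    /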
Now that’s all sorted…
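…we can get on with the Full Export itself. Once again, this is a sketch, with job and file names I’ve simply made up :

    declare
        l_handle number;
        l_job_state varchar2(30);
    begin
        -- this time the job_mode is FULL rather than SCHEMA
        l_handle := dbms_datapump.open
        (
            operation => 'EXPORT',
            job_mode => 'FULL',
            job_name => 'FULL_EXPORT'
        );
        dbms_datapump.add_file
        (
            handle => l_handle,
            filename => 'full_export.dmp',
            directory => 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_dump_file
        );
        dbms_datapump.add_file
        (
            handle => l_handle,
            filename => 'full_export.log',
            directory => 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_log_file
        );
        dbms_datapump.start_job( handle => l_handle);
        dbms_datapump.wait_for_job( handle => l_handle, job_state => l_job_state);
    end;
    /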
Pretty much the same as with the schema export we’ve already done. However, this time, we’ve specified the job_mode as FULL rather than SCHEMA.
With this script saved as full_exp_simple.sql…
Depending on the size of your database (and the capacity of the server you’re running on), this may take a while.
When it’s all finished, you can wander over to the DATA_PUMP_DIR on the OS and you’ll see the fruit of your labours…
If we look at the logfile, we can see it starts like this :
… and ends with …
We now have an export. If, like me, you’ve just run this from a database where you know that there are no other active sessions, then you should have a nice consistent export. Unfortunately, this isn’t always the case…
Making your Export Read Consistent…and other tweaks
To address this issue of consistency, we’ll need to change our script a bit. Whilst we’re at it, it would also be nice to have some record in the logfile of when the job started so we can work out how long it actually ran for.
Finally, we’d like to be able to run the export in the background so we don’t have to hang around waiting for it.
Consistent = Y – the DATAPUMP equivalent
Those of you old enough to remember the original export utility will recall that you could ensure that data in an export was referentially consistent by the simple expedient of specifying consistent = Y in the exp command.
The equivalent in DBMS_DATAPUMP is to specify a value for FLASHBACK_SCN.
NOTE – in order for this to work, your database must be running in ARCHIVELOG mode. This is especially relevant if you’re playing along on XE, which runs in NOARCHIVELOG by default.
To check the current status of archiving on the database :
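The LOG_MODE column of V$DATABASE holds what we’re after :

    select log_mode
    from v$database;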
If the query comes back NOARCHIVELOG then you need to enable archiving.
To do this, you need to connect as SYSDBA, shutdown and then mount the database, before starting archiving and then re-opening the database.
These steps can be achieved as follows once connected to the database as sysdba :
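In a SQL*Plus session connected as SYSDBA :

    shutdown immediate
    startup mount
    alter database archivelog;
    alter database open;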
NOTE – these steps are intended for when you’re messing around in your own database (e.g. XE on a laptop, as I am here). If you’re in a controlled environment, then you need to get your friendly neighbourhood DBA to do the honours.
Assuming the database is now in archivelog mode, the next thing we need to do is find an SCN that we can pass to DBMS_DATAPUMP for it to use as a reference point for what we mean by consistent.
The SCN – System Change Number – is incremented every time a transaction completes in the database.
A quick demonstration is probably in order….
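For example (this assumes you can see V$DATABASE and have a table to play with – here I’m using HR and the EMPLOYEES table) :

    select current_scn from v$database;

    -- do something transactional (the update itself doesn't matter)...
    update employees
    set phone_number = phone_number
    where employee_id = 100;

    commit;

    -- ...and check again
    select current_scn from v$database;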
The fact that the SCN increments by more than one after each transaction completes in this session is explained by transactions being run by the Oracle background processes on my database.
Anyway, if we do want to find the current SCN, or even the SCN for a given date and time, we can simply use the SQL TIMESTAMP_TO_SCN function :
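For example, to get the SCN as of ten minutes ago :

    select timestamp_to_scn( systimestamp - interval '10' minute)
    from dual;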
In order to pass this information to DBMS_DATAPUMP, we need to use the SET_PARAMETER procedure.
This should do the trick :
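…assuming l_handle is the handle returned by DBMS_DATAPUMP.OPEN and l_scn is a variable holding the SCN we’ve just looked up :

    dbms_datapump.set_parameter
    (
        handle => l_handle,
        name => 'FLASHBACK_SCN',
        value => l_scn
    );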
Adding a message to the logfile
Next on our to-do list to improve our export is a message in the logfile recording when the export job started.
Oh look, there’s a package member called LOG_ENTRY. I wonder if that will help …
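It does. Something along these lines (l_handle being the job handle, as before) :

    dbms_datapump.log_entry
    (
        handle => l_handle,
        message => 'Job starting at '||to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS')
    );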
Incidentally, we could include the SCN we’re specifying in the message as well (although it would need to be captured in a variable before use in both the SET_PARAMETER call above and the LOG_ENTRY call).
One point to note – any entry in the logfile that results from a call to this procedure always starts with “;;;”.
Running the job in the background
Just in case you don’t fancy the idea of hanging around waiting for the job to finish, you can replace the call to WAIT_FOR_JOB with this…
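…a simple call to DETACH :

    -- detach from the job and leave it running in the background
    dbms_datapump.detach( handle => l_handle);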
The New, Improved Datapump Export Script
If we now apply all of these changes to our original script, it should look something like ….
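…this sketch – the same shape as before, with the FLASHBACK_SCN parameter, the LOG_ENTRY message and a DETACH at the end. Note that the session running this needs SELECT access to V$DATABASE :

    declare
        l_handle number;
        l_scn number;
    begin
        l_handle := dbms_datapump.open
        (
            operation => 'EXPORT',
            job_mode => 'FULL',
            job_name => 'FULL_EXPORT'
        );
        dbms_datapump.add_file
        (
            handle => l_handle,
            filename => 'full_export.dmp',
            directory => 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_dump_file
        );
        dbms_datapump.add_file
        (
            handle => l_handle,
            filename => 'full_export.log',
            directory => 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_log_file
        );
        -- make the export consistent as at the current SCN
        select current_scn
        into l_scn
        from v$database;

        dbms_datapump.set_parameter
        (
            handle => l_handle,
            name => 'FLASHBACK_SCN',
            value => l_scn
        );
        -- start the job...
        dbms_datapump.start_job( handle => l_handle);
        -- ...record when (and as at which SCN) it started...
        dbms_datapump.log_entry
        (
            handle => l_handle,
            message => 'Job starting at '||to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS')
                ||' as at SCN '||l_scn
        );
        -- ...then leave it to run in the background
        dbms_datapump.detach( handle => l_handle);
    end;
    /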
Monitoring the job
The USER_DATAPUMP_JOBS view (there is also a DBA_ version available) contains details of any currently defined datapump jobs.
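For example :

    select job_name, operation, job_mode, state
    from user_datapump_jobs;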
Additionally, during the export, DATAPUMP will create a temporary table with the same name as the job.
In this table, you can see which object (if any) it’s currently working on…
There is an example of how to get “real-time” updates in the Oracle Documentation.
Unfortunately, it relies on DBMS_OUTPUT so the messages don’t get echoed to the screen until after the job is completed.
The simplest way to find out what’s happening right now is to check the logfile.
On Linux, for example, you could simply do this :
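…assuming the logfile is the full_export.log we specified earlier :

    # from the directory that DATA_PUMP_DIR points to
    tail -f full_export.log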
If you haven’t got command line access to the database server, or simply prefer to keep everything in the database, then you could just create an external table based on the logfile. After all, you already have the required privileges on the directory …
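A minimal sketch of such an external table, pointing at the full_export.log from earlier (the table name is just one I’ve made up, and you may need to fiddle with the access parameters on your version) :

    create table datapump_log_xt
    (
        line varchar2(4000)
    )
    organization external
    (
        type oracle_loader
        default directory data_pump_dir
        access parameters
        (
            records delimited by newline
            nobadfile
            nologfile
            fields terminated by x'0A'
            ( line char(4000))
        )
        location ('full_export.log')
    )
    reject limit unlimited;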
If we want to check progress, we can simply “tail” the file like this :
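Strictly speaking, this selects the whole file rather than just the end of it, but it does the job for a logfile of this size :

    select line
    from datapump_log_xt;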
So far, we’ve looked exclusively at Exporting everything either from a given schema or an entire database.
Datapump also allows you to filter the data. Generally speaking, these filters apply whether you are importing or exporting.
Importing data from one schema to another
One of the many reasons that you may want to take a DataPump export may be to refresh a development environment.
It’s possible that the schema you’re refreshing on the Development database has a different name from the one on your Production environment. Let’s say, for the sake of argument, that we have a HR_DEV user in our development environment…
Fortunately, importing the HR objects in the export file into the HR_DEV user is fairly simple :
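The key ingredient is a call to METADATA_REMAP when defining the import job – l_handle here is the handle returned by DBMS_DATAPUMP.OPEN for an IMPORT operation :

    -- anything belonging to HR in the dump file lands in HR_DEV instead
    dbms_datapump.metadata_remap
    (
        handle => l_handle,
        name => 'REMAP_SCHEMA',
        old_value => 'HR',
        value => 'HR_DEV'
    );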
Specifying schemas and object types
At this point, let’s say that we only want to import the HR schema. The rest of the objects in the export file aren’t really relevant to us.
Furthermore, let’s assume we only want to import Tables and Sequences as we’ll re-create all of our PL/SQL stored program units, views etc from source code files.
Filtering by schema
Allow me to introduce the METADATA_FILTER procedure :
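Added to the import job definition (before START_JOB), something like this restricts the job to the HR schema :

    dbms_datapump.metadata_filter
    (
        handle => l_handle,
        name => 'SCHEMA_EXPR',
        value => q'[= 'HR']'
    );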
Once again, not as painful as it might appear. However, it’s here that we begin to see the benefits of quote delimiters (the q'[...]' syntax).
Filtering by object types
Here’s where things get a bit more interesting.
Remember the export logfile. At the beginning, there were a number of entries like :
These paths are the basis for how datapump determines what to export.
Fortunately they are stored in some publicly available views :
DATABASE_EXPORT_OBJECTS
SCHEMA_EXPORT_OBJECTS
TABLE_EXPORT_OBJECTS
We’re doing a SCHEMA import so we can check to see that the relevant object_path is available to us by :
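…running something like :

    select object_path, comments
    from schema_export_objects
    where object_path like '%TABLE%'
    or object_path like '%SEQUENCE%'
    order by object_path;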
The Path Map looks to be a flattened hierarchy (possibly an XML representation). The point here is that, by specifying a node in this hierarchy, you can persuade DBMS_DATAPUMP to do your bidding.
As both TABLE and SEQUENCE are nodes in the object_path, we should be able to use that here…
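…in another METADATA_FILTER call, this time against the object paths rather than the schema name :

    dbms_datapump.metadata_filter
    (
        handle => l_handle,
        name => 'INCLUDE_PATH_EXPR',
        value => q'[IN ('TABLE', 'SEQUENCE')]'
    );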
This will give us everything at and below the TABLE and SEQUENCE nodes. Therefore, you will also get the INDEXES for the tables… as well as the TRIGGERS.
Remember, we don’t want any PL/SQL program units, so we need to filter these out. Fortunately, calls to the METADATA_FILTER procedure seem to be additive. As TRIGGER appears below TABLE in the hierarchy, we can filter triggers out with an additional call to the procedure :
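Something like :

    dbms_datapump.metadata_filter
    (
        handle => l_handle,
        name => 'EXCLUDE_PATH_EXPR',
        value => q'[= 'TRIGGER']'
    );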
The finished HR Import Script
Here it is then, a script to Import Tables and Sequences from HR to HR_DEV :
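The sketch below pulls together the calls we’ve just been through – the dump file is the full_export.dmp from earlier and the job and logfile names are, as ever, just examples :

    declare
        l_handle number;
        l_job_state varchar2(30);
    begin
        l_handle := dbms_datapump.open
        (
            operation => 'IMPORT',
            job_mode => 'SCHEMA',
            job_name => 'IMP_HR_DEV'
        );

        dbms_datapump.add_file
        (
            handle => l_handle,
            filename => 'full_export.dmp',
            directory => 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_dump_file
        );

        dbms_datapump.add_file
        (
            handle => l_handle,
            filename => 'imp_hr_dev.log',
            directory => 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_log_file
        );

        -- only the HR schema from the dump file...
        dbms_datapump.metadata_filter
        (
            handle => l_handle,
            name => 'SCHEMA_EXPR',
            value => q'[= 'HR']'
        );

        -- ...only tables and sequences (and their dependent objects)...
        dbms_datapump.metadata_filter
        (
            handle => l_handle,
            name => 'INCLUDE_PATH_EXPR',
            value => q'[IN ('TABLE', 'SEQUENCE')]'
        );

        -- ...but no triggers...
        dbms_datapump.metadata_filter
        (
            handle => l_handle,
            name => 'EXCLUDE_PATH_EXPR',
            value => q'[= 'TRIGGER']'
        );

        -- ...and everything lands in HR_DEV rather than HR
        dbms_datapump.metadata_remap
        (
            handle => l_handle,
            name => 'REMAP_SCHEMA',
            old_value => 'HR',
            value => 'HR_DEV'
        );

        dbms_datapump.start_job( handle => l_handle);
        dbms_datapump.wait_for_job( handle => l_handle, job_state => l_job_state);
    end;
    /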
Now to give it a whirl (saved as hr_dev_imp.sql)…
Looking at the logfile, all appears well…
If we now connect as HR_DEV and check the objects we have in our schema :
Sequences are special
There is one thing to be aware of with importing sequences.
Most database objects (procedures, packages, triggers, views, etc.) can be overwritten by using CREATE OR REPLACE.
As with tables, you cannot do this with sequences.
Unlike tables, however, DBMS_DATAPUMP has no option (such as TABLE_EXISTS_ACTION) for re-creating sequences that already exist.
This means that, if we were to refresh the HR_DEV schema again using the same script, we’d be likely to run into a bit of a problem, as you can see from this log file entry :
With this in mind, it’s probably a good idea to drop any sequences prior to refreshing with an import…
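…something like this, run as HR_DEV, clears the decks (use with the obvious amount of care) :

    begin
        for r_seq in
        (
            select sequence_name
            from user_sequences
        )
        loop
            execute immediate 'drop sequence '||r_seq.sequence_name;
        end loop;
    end;
    /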
If we run this we’ll get…
Specifying a subset of data
Sometimes you don’t want to export/import everything. You might just want a few tables with a subset of data.
In our HR_DEV environment we want to focus on the EMPLOYEES table. We don’t want all of the rows – let’s just have one department…
Specifying which tables to use
Once again, METADATA_FILTER can be used here :
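…this time with a NAME_LIST filter applied to the TABLE object type path (the last argument) :

    dbms_datapump.metadata_filter( l_handle, 'NAME_LIST', q'['EMPLOYEES']', 'TABLE');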
NAME_LIST tells Datapump to look for object names rather than object type paths.
Specifying the object type path as TABLE means that Datapump will only look for a table called EMPLOYEES and not any other type of object with the same name.
Getting a subset of data
Here, we need to use the DATA_FILTER procedure. Unlike its METADATA counterpart, you need to provide a syntactically correct predicate for it to work…
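For the EMPLOYEES table, restricting the rows to a single department might look like this :

    dbms_datapump.data_filter
    (
        handle => l_handle,
        name => 'SUBQUERY',
        value => q'[where department_id = 20]',
        table_name => 'EMPLOYEES'
    );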
The Data Sub-set Import Script
Before running this, I’ve taken the precaution of dropping all of the objects from the HR_DEV schema as I don’t want to run into any pesky constraint errors…
Notice that I’ve also specifically excluded the REF_CONSTRAINTS and TRIGGERS from the import.
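In condensed sketch form, the whole thing looks something like this – it’s worth double-checking the REF_CONSTRAINT path name against SCHEMA_EXPORT_OBJECTS on your own version :

    declare
        l_handle number;
        l_job_state varchar2(30);
    begin
        l_handle := dbms_datapump.open( operation => 'IMPORT', job_mode => 'SCHEMA',
            job_name => 'IMP_HR_DEV');

        dbms_datapump.add_file( l_handle, 'full_export.dmp', 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_dump_file);
        dbms_datapump.add_file( l_handle, 'imp_hr_dev.log', 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_log_file);

        -- HR objects only, remapped into HR_DEV
        dbms_datapump.metadata_filter( l_handle, 'SCHEMA_EXPR', q'[= 'HR']');
        dbms_datapump.metadata_remap( l_handle, 'REMAP_SCHEMA', 'HR', 'HR_DEV');

        -- just the EMPLOYEES table...
        dbms_datapump.metadata_filter( l_handle, 'INCLUDE_PATH_EXPR', q'[= 'TABLE']');
        dbms_datapump.metadata_filter( l_handle, 'NAME_LIST', q'['EMPLOYEES']', 'TABLE');

        -- ...without its foreign keys or triggers...
        dbms_datapump.metadata_filter( l_handle, 'EXCLUDE_PATH_EXPR',
            q'[IN ('REF_CONSTRAINT', 'TRIGGER')]');

        -- ...and only one department's worth of rows
        dbms_datapump.data_filter( l_handle, 'SUBQUERY', q'[where department_id = 20]',
            'EMPLOYEES');

        dbms_datapump.start_job( l_handle);
        dbms_datapump.wait_for_job( l_handle, l_job_state);
    end;
    /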
Run this and we get the following output in the logfile :
We can see that only the EMPLOYEES table and its associated indexes have been imported :
…and no Foreign Key constraints :
and only those EMPLOYEES in DEPARTMENT_ID 20…
DDL_ONLY Datapump operations
Once again the DATA_FILTER procedure comes in here. However, this time, the call is a bit different :
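…specifying INCLUDE_ROWS with a value of zero :

    dbms_datapump.data_filter
    (
        handle => l_handle,
        name => 'INCLUDE_ROWS',
        value => 0
    );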
Here, the value is effectively a boolean: 0 means false, i.e. don’t include rows. The default is 1 – do include rows.
Incidentally, this time I’ve dropped the user HR_DEV altogether before importing as it will be re-created by the Import if it does not already exist.
This next script will import all of the DDL from HR to HR_DEV, but will not import any data…
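In sketch form, it’s essentially the schema import from earlier with the INCLUDE_ROWS filter added :

    declare
        l_handle number;
        l_job_state varchar2(30);
    begin
        l_handle := dbms_datapump.open( operation => 'IMPORT', job_mode => 'SCHEMA',
            job_name => 'IMP_HR_DEV_DDL');

        dbms_datapump.add_file( l_handle, 'full_export.dmp', 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_dump_file);
        dbms_datapump.add_file( l_handle, 'imp_hr_dev_ddl.log', 'DATA_PUMP_DIR',
            filetype => dbms_datapump.ku$_file_type_log_file);

        dbms_datapump.metadata_filter( l_handle, 'SCHEMA_EXPR', q'[= 'HR']');
        dbms_datapump.metadata_remap( l_handle, 'REMAP_SCHEMA', 'HR', 'HR_DEV');

        -- DDL only - no rows
        dbms_datapump.data_filter( l_handle, 'INCLUDE_ROWS', 0);

        dbms_datapump.start_job( l_handle);
        dbms_datapump.wait_for_job( l_handle, l_job_state);
    end;
    /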
After running this we find that the HR_DEV user has been created. However, you will need to connect using the password of the HR user included in the export.
We can see now that all of the HR objects have been imported into HR_DEV :
…and just to prove that we’ve imported no data at all…
Killing a runaway job
There will be times when things don’t quite work as expected. This is especially true if you’re playing around with Datapump parameter settings.
Just occasionally when testing the code in this post I’ve come across some variation on the theme of …
Yep, I’ve fixed my runtime error, re-executed the script and got this insistent little message.
The first thing to do when you get this (especially if you’re running in the same session in which you started the original job) is to log out and log back in again.
The second thing to do is to check to see if the job is still hanging about…
If so, then you can attach to the job and stop it.
If I’m connected as the job owner and I’m only running the one job, the following script should do the trick…
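…something like this sketch, which picks the job name up from USER_DATAPUMP_JOBS :

    declare
        l_handle number;
        l_job_name user_datapump_jobs.job_name%type;
    begin
        -- assumes there is exactly one job belonging to the current user
        select job_name
        into l_job_name
        from user_datapump_jobs;

        -- attach to the job...
        l_handle := dbms_datapump.attach( job_name => l_job_name);

        -- ...and stop it
        dbms_datapump.stop_job( handle => l_handle, immediate => 1);
    end;
    /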
This should work almost every time. On the off-chance it doesn’t, and the job does not appear in the USER_DATAPUMP_JOBS view anymore, then it’s possible that the temporary table created as part of the job has not been tidied up.
Remember, the temporary table is created in the job owner’s schema with the same name as the datapump job itself.
So, if I were to hit this problem whilst running the job IMP_HR_DEV, I could check :
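…for a table named after the job :

    select table_name
    from user_tables
    where table_name = 'IMP_HR_DEV';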
If the table does still exist and there is no entry for the job in the USER_DATAPUMP_JOBS view, we can simply drop it :
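For the IMP_HR_DEV example :

    drop table imp_hr_dev;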
This should finally persuade datapump that the job is not in fact running.
A couple of Enterprise Edition Features
For those of you on a richer diet, there are a couple of further morsels which may well be worthy of consideration.
If you want to speed up your Datapump job by making use of the CPU cores available you could try :
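…a call to SET_PARALLEL once the job has been defined. The degree here is just an example – something in line with the number of CPU cores you have to spare :

    -- allow up to four worker processes for the job
    dbms_datapump.set_parallel( handle => l_handle, degree => 4);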
If you’ve really pushed the boat out and have the partitioning option, you can tell datapump to take only specific partitions for tables.
For example, say you have a number of tables that are partitioned by date ranges.
The tables may be partitioned by quarter-year, with the partitions following a consistent naming convention (e.g. Q12014, Q22014 etc).
If we only want to export/import the latest partition for each of these tables (say Q22014), we can do something like this :
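…a DATA_FILTER call using PARTITION_EXPR. As written, this applies to every table in the job; add a table_name if you only want it for one of them :

    dbms_datapump.data_filter
    (
        handle => l_handle,
        name => 'PARTITION_EXPR',
        value => q'[= 'Q22014']'
    );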
Despite having gorged ourselves on this feast of DBMS_DATAPUMP delights, it’s fair to say that I’ve by no means covered everything.
Doubtless you will want to do things with this API that I’ve not covered here. Indeed, you’ll probably also find better ways of implementing some of the same functionality.
In the meantime, I’m off for a digestif.
Bon Appétit.