Tuning Java Database Performance: Understanding the Role
of the Driver
Isolating performance issues in Java database applications is not a
straightforward affair. Few developers know that the
ability to fine-tune the JDBC driver can be just as
influential as the database itself or the SQL queries that
call it. Get the basics of JDBC optimization.
by Robert Brunner
o
one likes to wait, especially your customers. If you can't
keep them satisfied, your competitors will. This leads many
developers to spend countless hours trying to optimize the
performance of their applications. Fortunately for Java
developers, many tools and techniques have been developed
that facilitate this effort, from graphical integrated
development environments with interactive profiling and
debugging capabilities to language features, such as
multi-threading, that simplify interactive programming.
When using a database management system from Java,
however, the landscape isn't as clearly marked. How do you
know if performance is being limited by your application,
the database, or your JDBC driver?
Simply put, when using a database, your application must
use a JDBC driver in addition to the database system—all
communicating over the network. This means that multiple
elements besides your code affect your application's
performance. These include:
Often overlooked in this sea of potential pitfalls, yet
vitally important, is the JDBC driver itself—because, as you
will see, all JDBC drivers are not created equal.
JDBC Drivers—Simple! Or Are They?
The basic tenants of using JDBC to communicate with a
database are rather simple: register the driver, create a
connection, execute a SQL statement, and process the
results. This simplicity often leads programmers to think
they know JDBC, when they have in reality barely scratched
the surface. Any car can get you between point A and point
B, but we all know that not all cars are the same—certain
cars make the trip faster (and more enjoyably).
Of course, despite their similarity in functionality,
automobile manufacturers distinguish their products by
utilizing unique features that improve security, safety, or
performance. The same is true for JDBC drivers, where
different vendors bring different backgrounds and
experiences to the table when interpreting the JDBC
specification. Unlike the situation for automobiles,
however, the JDBC specification provides a great deal of
leeway to driver vendors. In fact, large parts of the
specification are optional, leading to the creation of many
JDBC drivers that accomplish the basics at the expense of
degrees of performance and stability.
Take, For Example, Database Transactions
One of the most important areas where this is evident is
with database transactions. By default, JDBC applications
utilize autocommit mode, in which every SQL statement is
executed in its own transaction. While easier on the vendor
writing the JDBC driver, this severely impacts performance,
due to the implicit database overheads in creating and
completing a transaction. In a multi-user environment, such
as an e-commerce site, an additional complication arises due
to the visibility of a query.
A given transaction, and therefore a given JDBC
application, is always aware of any changes that it makes to
the underlying data. But what about modifications made by
other transactions? Tackling this issue involves balancing
concurrency with reliability and properly utilizing
different levels of database locking.
Dirty Reads, Nonrepeatable Reads, and Phantom Reads
Controlling data consistently through database locking
generally means deciding whether to allow dirty reads,
nonrepeatable reads, or phantom reads. A dirty read
occurs when the results in one transaction are modified
by another uncommitted transaction, which is later
discarded. The data in the first transaction is invalid,
or "dirty." A nonrepeatable read occurs when one
transaction repeatedly retrieves data, while a different
transaction alters the underlying data. This causes
different, or nonrepeatable, results to be read by the
first transaction. A phantom read is similar to a
nonrepeatable read, except that the number of rows is
changed either by insertion or deletion, so that the
first transaction contains "phantom" data.
Fortunately, the JDBC specification allows developers
to decide how transactions should interact by passing
parameters to the appropriate constructor during the
creation of Connection and ResultSet objects.
Transactions also can now utilize savepoints, which
reduce transaction overhead by providing checkpoints for
saving or ignoring modifications during a long-running
transaction. The specification also provides for batch
processing and updateable ResultSet objects, which also
affect performance and transaction behavior. Of course,
how a JDBC driver implements these features—if it even
does—greatly affects the performance of your
application.
Handling Network Communication
Another area where the performance of JDBC drivers
differs is in handling network communication. Primarily
this arises when the JDBC driver transfers the results
of a query from the database server to the Java
application. Depending on the needs of the application,
a developer might only want a few rows transferred, such
as in a search engine, or all the rows transferred, such
as in a billing service. The JDBC specification allows a
developer to specify the order and number of rows that
are fetched by the driver via the setFetchDirection and
setFetchSize methods in the Statement interface. It is
up to the driver vendor, however, in how, or even if, to
implement these features.
Choosing a Vendor—Easier Than You'd Think
Using the car analogy, a car's performance can be
improved by a knowledgeable mechanic. In fact, the first
step taken by a good mechanic is to obtain as much
diagnostic information as possible. The same is true
when using a JDBC driver. The beauty of Java
specifications, like JDBC, is that different vendors
provide competing implementations, which can and
generally do work in different ways.
By now, most developers understand that Type 4 JDBC
drivers generally offer the highest performance. But how
do you choose a JDBC driver? Many developers
(incorrectly) assume that a database vendor provides the
optimal JDBC driver; after all, they wrote the database
so they should be able to write the best JDBC driver for
their database. This is not the correct line of
reasoning because a JDBC driver is in reality a
middleware product, not a database product.
A Diagnostic Tool Helps Too
One of the most recognized anecdotes in the computer
industry is the 80-20 rule. When it comes to the
performance of JDBC applications, this rule states that
80 percent of performance bottlenecks are due to either
the actual JDBC application or how the application
interacts with the JDBC driver, while 20 percent is due
directly to the JDBC driver. Telling the difference can
be difficult, even for the most advanced JDBC developer.
What you need is the right diagnostic information, in
order to know whether your performance is hindered by
your SQL statements or locking convention.
Many driver vendors also provide a variety of
diagnostic tools that allows a developer to "spy" on a
JDBC driver, even when it is used by other users. In
this way they can see what is truly going on "under the
hood," and identify bottlenecks, whether they are in the
JDBC code or the driver.