ESRI White Paper – ‘An Overview of Distributing Data with Geodatabases’

by Nate on July 18, 2007

This past week, ESRI released several new white papers. The one that I’m most interested in is “An Overview of Distributing Data with Geodatabases”. This is the white paper that was promised during the “Managing Distributed Data with Geodatabase Replication” session at the ESRI 2007 UC.

The first several pages are full of the now obligatory replication/synchronization diagrams which, while maybe helpful to some, I’ve seen at least a hundred times now. The paper then, however, goes into the guts of the different techniques: geodatabase replication, DBMS replication, and simple copying and loading of data. Here’s a quick summary of each technique:

  • Geodatabase replication copies data from a source (parent) geodatabase to a destination (child) geodatabase and creates the infrastructure for tracking and synchronizing changes. Geodatabase replication is fairly powerful; it lets you define parameters that restrict which data are replicated. At 9.2, geodatabase replication can only occur between SDE geodatabases, but at 9.3 SDE geodatabases will be able to replicate one-way to a file or personal geodatabase. There are different types of replicas, including check-out/check-in, one-way, and two-way.

Developers can work with replicas through the GeoDatabaseDistributed ArcObjects library components. If you just need coarse-grained access to the geodatabase, you can use the GeodataServer object model. If you need more fine-grained access, however, you will want to use some of the older ArcObjects object models, which give you more capabilities but restrict you to working with geodatabases that live on your Local Area Network.

Geodatabase synchronization can be automated using the Synchronize Changes geoprocessing tool: write a Python script that runs the tool, call that script from a .bat file (or another wrapper script), and schedule it using Windows scheduler.
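As a rough illustration of that kind of script, here is a minimal sketch. It assumes the arcpy module that ships with later ArcGIS releases (at 9.2 the geoprocessor was created through arcgisscripting instead), and the conflict settings shown are just one possible choice, not a recommendation from the white paper. The tool is passed in as a parameter so the argument handling can be tried out without ArcGIS installed.

```python
import sys


def build_sync_args(parent_gdb, replica_name, child_gdb,
                    direction="FROM_GEODATABASE1_TO_2"):
    """Return the positional arguments for the Synchronize Changes tool.

    The order follows the tool's parameters: geodatabase 1, replica name,
    geodatabase 2, direction, conflict policy, conflict definition,
    reconcile option. The last three values are assumptions for this sketch.
    """
    return [
        parent_gdb,
        replica_name,
        child_gdb,
        direction,
        "IN_FAVOR_OF_GDB1",   # assumed: parent wins on conflict
        "BY_OBJECT",          # assumed: conflicts detected per row
        "DO_NOT_RECONCILE",   # assumed: no reconcile with the parent version
    ]


def synchronize(sync_tool, parent_gdb, replica_name, child_gdb):
    """Call the supplied geoprocessing tool and return the arguments used."""
    args = build_sync_args(parent_gdb, replica_name, child_gdb)
    sync_tool(*args)
    return args


# Only runs when three command-line arguments are supplied, e.g. from a .bat file.
if __name__ == "__main__" and len(sys.argv) == 4:
    import arcpy  # requires an ArcGIS installation
    synchronize(arcpy.SynchronizeChanges_management, *sys.argv[1:4])
```

A .bat file containing a line such as "python sync_replica.py parent.sde MyReplica child.sde" (hypothetical script, replica, and connection file names) could then be added as a scheduled task.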

In cases where large amounts of data are stored in a geodatabase and need to be moved to a remote office and synchronized, it may be best to detach the database from the parent, send it to the child (mail it, etc.), and then reattach it. You can then use the ‘register with existing data’ tool to create the replica.

  • DBMS replication is good for replicating an entire geodatabase to a read-only geodatabase in a connected system. That makes it a good option for load balancing and failover. The major limitation is the inability to choose what gets replicated.

To set up and use DBMS replication, you must first understand how a geodatabase interacts with its underlying database.

  • Data copying and loading is the “low-tech” way of setting up replication between multiple geodatabases. There are multiple geoprocessing tools available to help automate and schedule this process. With this method, however, there is no good way to verify data integrity during the replication. You can write your own code to avoid loading redundant data, but this is a lot of work compared to just using one of the other replication techniques.
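To give a sense of what “your own code” might involve, here is a minimal sketch of one way to skip rows that already exist in the target before loading. It is not from the white paper: the function names are hypothetical, rows are represented as plain dictionaries rather than geodatabase cursors, and comparing hashed field values is just one assumed approach.

```python
import hashlib


def row_fingerprint(row, fields):
    """Hash the chosen field values so identical rows get the same fingerprint."""
    joined = "\x1f".join(str(row[f]) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def rows_to_load(source_rows, target_rows, fields):
    """Return source rows whose fingerprint is not already in the target.

    Duplicates within the source itself are also dropped.
    """
    seen = {row_fingerprint(r, fields) for r in target_rows}
    fresh = []
    for row in source_rows:
        fingerprint = row_fingerprint(row, fields)
        if fingerprint not in seen:
            seen.add(fingerprint)
            fresh.append(row)
    return fresh
```

Even a sketch like this says nothing about edits or deletes made on either side, which is part of why the other techniques are less work.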

This white paper didn’t go as in-depth as I was hoping for. It does, however, tie the concepts together, and it contains links to other helpful resources. We’ve been using one-way replication successfully for a while now, and are looking to set up several two-way replicas with some remote offices shortly. I’m sure you’ll hear back from me about our experiences.

Here are a couple of other resources to help you learn more about distributed geodatabases. The technical brief is especially useful:
