Welcome back to the second installment of my blog series on Deconstructing the Document Repository. In the first installment, I tried to give you a foundation in the fundamentals of the repository. This second installment is intended to give you a better perspective on the operation of the repository by deconstructing a real-world example: the content upload component of the ALUI Knowledge Directory.
Rather than deconstruct the actual functionality of the KD, I'm going to focus more on the upload component's 'bridge' role as it relates to the KD. By the end of this post you should have a good idea of what the DR is used for, and exactly how it is used in this capacity.
Content Upload
The content upload component of the ALUI portal product allows us to directly upload files into the Knowledge Directory. If you are unfamiliar with the KD and its basic operation, I'll sum it up in one sentence: a directory of links to documents and other web sites that is searchable and organizable based on a structure similar to your file system. If you're still in the dark, check out the Administrator Guide to the portal for details.
The upload component itself is nothing more than a way to provide document upload and download capability directly in the KD. Most of the time, the KD is used to index other sources of content, such as other web sites or document stores (intranet sites, content management systems, etc). In the case of content upload, the goal is to allow users to directly upload files to the KD, apply security, and then directly download them, without a back end repository (other than the DR).
Knowledge Directory Cards
The KD operates using a 'card' indexing system, sort of like a library card catalog (or, at least, that's how it was explained to me). The directory is merely a folder system of index cards, organized according to a taxonomy. The cards themselves are pointers to an HTTP based content source. Requests for a document are redirected through the portal to the underlying provider (a content source, in portal parlance), so the data comes from that provider rather than from the directory itself.
How does the Document Repository fit in?
How indeed? The problems with integrating the document repository into this system are manifold:
- First, the document repository has no central directory of documents like a traditional content management system. In other words, it doesn't know how many documents it contains, or where those documents are.
- Second, even if the DR knew where those documents were located, it does not know the content type or names of the documents.
Thus the creation of the content upload component of the portal, which serves to alleviate both of these problems. The content upload service is really just another web service that wraps the document repository and provides an HTTP link for downloading documents, as well as a way to translate between the document in the repository and the type of document the user is downloading.
How does it work?
The content upload service is actually spread across multiple pieces of the portal:
- The card creation and document upload interface (the part of the KD where you actually choose the document you want to submit)
- The web service that allows for single URL download of a document from the repository
- The PTUpload application folder in the document repository
The actual process works something like this:
- User navigates to knowledge directory, chooses to upload a document directly
- User is redirected to the Directory (Dir) activity space
- Dir activity space posts multi-part form data (the uploaded document) to the upload data source, via a gatewayed URL
- The document is read into the upload component and redirected to a store in the document repository
- Upon upload completion, the DR responds with a document ID and a 302 redirect back to the portal (see part 1 for more info on document IDs)
- The document ID is forwarded back to the portal via the querystring in the redirect URL
- The Dir activity space uses the ID in the URL, along with the document content type and name, to create a card in the knowledge directory
- Subsequent requests for the document via the card are responded to by the upload component, which grabs the document from the repository, sets the content-type in the response and sends it back to the user (again via a gatewayed URL)
All of the information that relates to the individual document (where it is in the repository, what its original name is, and what its content type is) is stored in the Knowledge Directory card for the document. That means the upload component really is just a middle man, with no underlying database or knowledge store. Cool, huh?
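To make that last step a little more concrete, here is a minimal sketch of how a download handler might turn the values stored on the card into response headers. The class and method names are hypothetical, and the real upload component presumably works against the servlet API; a plain map keeps the sketch runnable without it. The Content-Disposition header is my assumption as well, since all I can say for certain is that the content type gets set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the upload component's actual code.
class CardDownloadSketch {

    // Builds response headers for a download from the values kept on the KD card.
    static Map<String, String> responseHeaders(String contentType, String fileName) {
        Map<String, String> headers = new LinkedHashMap<>();
        // The stored content type lets the browser recognize the file.
        headers.put("Content-Type", contentType);
        // Assumption: the original file name is offered as the download name.
        // Quotes and backslashes are replaced so the header value stays well formed.
        String safeName = fileName.replace("\\", "_").replace("\"", "_");
        headers.put("Content-Disposition", "attachment; filename=\"" + safeName + "\"");
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> headers = responseHeaders("application/pdf", "report.pdf");
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Notice that nothing here needs a database lookup: everything the handler needs arrives with the request, which is exactly why the upload component can get away with being a middle man.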
You can find most of the source code for this entire process in the UI source code distributed for portal developers. The functions that create KD cards are located in com.plumtree.portalpages.browsing.directory.DirModel. Specifically, check out the SubmitCardWithPropertyBag function. Source code for the document submission user interface can be found, among other places, in the com.plumtree.portalpages.browsing.directory.documentsubmitsimple package.
KD Card Properties
As I mentioned above, the actual information about the document is stored in various properties of a knowledge directory card. If you click the Details link for an uploaded document, you can view these properties with the insight I've provided below:
- Open Document URL - URL the knowledge directory will use to open the document
- Document Upload Repository Server - the application name in the document repository (this is always ptupload)
- Document Upload DocID - the file ID, including directory, in the document repository
- URL (Customized Document Property) - an encoded value which specifies the following:
  - Document ID in the repository (including folder ID)
  - Content Type of uploaded document (this is important so that the user's browser correctly recognizes downloaded documents)
  - Original file name of the document
The URL property can be decoded with the simple function below, which relies on the class com.plumtree.portalupload.common.Utilities (found in documentupload.jar, inside the ptupload.war file installed with the upload component):
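// Requires java.util.StringTokenizer and com.plumtree.portalupload.common.Utilities.
// Returns { id, contentType, fileName }, or null if the value cannot be parsed.
public static final String[] parseDocumentProperty(String propertyValue) {
    // Expected layout: download/<encoded id>/<type>/<subtype>/<encoded file name>
    if (propertyValue == null || !propertyValue.startsWith("download")) return null;
    StringTokenizer tokenizer = new StringTokenizer(propertyValue, "/");
    tokenizer.nextToken(); // skip the "download" prefix
    try
    {
        String id = Utilities.simpleDecode(tokenizer.nextToken());
        // The content type contains a "/", so it spans two tokens
        String contentType = tokenizer.nextToken() + "/" + tokenizer.nextToken();
        String fileName = Utilities.simpleDecode(tokenizer.nextToken());
        return new String[] { id, contentType, fileName };
    }
    catch (Exception e)
    {
        // Too few tokens, or a value that could not be decoded
        return null;
    }
}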
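To see the layout the function above expects, here is a self-contained sketch that runs the same parsing logic against a made-up property value. Utilities.simpleDecode lives in documentupload.jar and I'm not reproducing it here, so the sketch swaps in URL decoding as a stand-in. That substitution, the class name, and the sample value are all assumptions for illustration; the real encoding may well differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.StringTokenizer;

class ParseDocumentPropertyDemo {

    // Stand-in for Utilities.simpleDecode (assumption: the real encoding may differ).
    static String decode(String value) throws UnsupportedEncodingException {
        return URLDecoder.decode(value, "UTF-8");
    }

    // Same token layout as the original function:
    // download/<encoded id>/<type>/<subtype>/<encoded file name>
    static String[] parseDocumentProperty(String propertyValue) {
        if (propertyValue == null || !propertyValue.startsWith("download")) return null;
        StringTokenizer tokenizer = new StringTokenizer(propertyValue, "/");
        tokenizer.nextToken(); // skip the "download" prefix
        try {
            String id = decode(tokenizer.nextToken());
            String contentType = tokenizer.nextToken() + "/" + tokenizer.nextToken();
            String fileName = decode(tokenizer.nextToken());
            return new String[] { id, contentType, fileName };
        } catch (Exception e) {
            return null; // too few tokens, or a value that could not be decoded
        }
    }

    public static void main(String[] args) {
        // Made-up sample: %2F is an encoded "/" inside the ID, %20 an encoded space.
        String[] parts =
            parseDocumentProperty("download/12%2F34/application/pdf/My%20Report.pdf");
        System.out.println("id:           " + parts[0]);
        System.out.println("content type: " + parts[1]);
        System.out.println("file name:    " + parts[2]);
    }
}
```

The sample also hints at why the ID and file name have to be encoded at all: the property uses "/" as its separator, so a raw "/" in a folder-qualified ID would throw off the token count.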
Document Repository Configuration
The document repository itself is configured in much the same way as discussed in part 1 of this series. That is to say, there is an application defined for the upload service called PTUpload, which in turn contains an Active/Archived setup. The content upload service configuration contains the requisite dr.xml under settings/config.
PTUpload's Dirty Little Secret
There is one problem with this whole setup, which I have alluded to with some of my comments on repository functionality. That is, there is no way for the repository to know when a document is deleted from its client application. The application itself has to tell it. It may be possible that I've missed something with the weekly housekeeping agent, but it has been my experience that documents which are uploaded directly via the content upload service are not removed from the repository when they are deleted. That being the case, there are two possible solutions to this problem:
- An application could be developed which would search the repository for all uploaded document IDs, then compare them to the current file/folder structure in the PTUpload directory of the repository.
- A Model/View override for the Directory space could be developed which would signal the repository that a card has been deleted.
As far as I know, neither has been developed at the time of this writing, but neither would take an undue amount of effort if this were determined to be a problem.
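For what it's worth, the first option boils down to a set difference. Here is a minimal sketch, assuming you have already collected two sets of IDs by whatever means your installation allows: the document IDs referenced by KD cards, and the document IDs actually present under the PTUpload folder. How you gather those sets is not shown, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class OrphanFinderSketch {

    // Returns the IDs stored in the repository that no KD card refers to any more.
    static Set<String> findOrphans(Set<String> repositoryIds, Set<String> cardIds) {
        Set<String> orphans = new TreeSet<>(repositoryIds); // sorted copy; inputs stay untouched
        orphans.removeAll(cardIds);
        return orphans;
    }

    public static void main(String[] args) {
        // Made-up IDs for illustration.
        Set<String> inRepository = new HashSet<>(Arrays.asList("1/10", "1/11", "2/20"));
        Set<String> onCards = new HashSet<>(Arrays.asList("1/10", "2/20"));
        System.out.println(findOrphans(inRepository, onCards)); // prints [1/11]
    }
}
```

Anything that comes back is only a candidate for cleanup. I'd log the list and review it before removing a single file from the repository.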
Summary
Hopefully, this has been another enlightening installment on my blog. Comments or questions welcome; as always I make no guarantees as to the accuracy of this information in the past, present or future (but I'm pretty sure it's right). Next time, the mysteriously named third portion of this series: Part 3 - Utilizing the Repository.