hross: September 2007 Archives

Collaboration Clustering Explained

Of all the products shipped in the ALUI line, other than the portal itself, the most likely candidate for load balancing/failover/clustering is the Collaboration server. Here's a brief explanation of how Collaboration clustering works, and what you can do to watch it in action.

The clustering mechanism itself is built on JGroups, an open source group communication library whose central class is JChannel. This abstracts most of the mechanism away from the product code and leaves the implementation details to someone else. That said, if you want to know how it works, take a look at the cluster.xml file under $PT_HOME/ptcollab/4.2/settings/config (you can turn clustering on and off in config.xml).

By default, collab uses a multicast UDP approach to clustering. When something happens on one collaboration instance, it broadcasts this event to a multicast UDP address and continues on its merry way. It uses a custom JChannel implementation to ensure reliable delivery, find out which nodes in the cluster are alive, and order messages. If you are curious to know more, check out the lan-multicast-cluster element in cluster.xml. You can find some interesting documentation on this here.
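To make "reliable delivery" and "order messages" a bit more concrete, here is a minimal Java sketch of one common way a layer on top of UDP can do it: each sender numbers its messages, and the receiver holds back anything that arrives early until the gap is filled. This is not the JGroups code and not Collaboration's; the class and method names (OrderedInbox, receive, firstMissing) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical per-sender inbox: delivers messages in sequence order. */
final class OrderedInbox {
    private long nextExpected = 1;  // next sequence number to deliver
    private final TreeMap<Long, String> heldBack = new TreeMap<>();  // early arrivals

    /**
     * Accepts a message that may be out of order or duplicated.
     * Returns the messages that became deliverable, in sequence order.
     */
    public List<String> receive(long seq, String payload) {
        List<String> deliverable = new ArrayList<>();
        if (seq < nextExpected || heldBack.containsKey(seq)) {
            return deliverable;  // duplicate: drop it
        }
        heldBack.put(seq, payload);
        // Drain every message that is now contiguous with what was delivered.
        while (heldBack.containsKey(nextExpected)) {
            deliverable.add(heldBack.remove(nextExpected));
            nextExpected++;
        }
        return deliverable;
    }

    /** Sequence number a receiver would ask to have resent, or -1 if nothing is missing. */
    public long firstMissing() {
        return heldBack.isEmpty() ? -1 : nextExpected;
    }

    public static void main(String[] args) {
        OrderedInbox inbox = new OrderedInbox();
        System.out.println(inbox.receive(2, "project modified"));  // held back, nothing delivered
        System.out.println(inbox.firstMissing());                  // the gap is at sequence 1
        System.out.println(inbox.receive(1, "cluster ping"));      // both messages released
    }
}
```

The point of the sketch is only that the sender can fire and forget, while the receiving side does the bookkeeping needed to notice gaps and restore order.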

Some networks, however, don't support multicast UDP as a transport mechanism. After all, it can be chatty, and it is often blocked at the router level. For those cases, JGroups can run the same protocol over unicast TCP or UDP. You can configure this in cluster.xml as well (although you will have to specify which hosts to send to). Take a look at the comments in that file to find out more.

Finally, there are going to be times when you are going to want to know what's going on with the cluster. Luckily, I recently stumbled across a built-in utility while digging around in the collaboration code. It can be run from a command line, and only requires that you extract the javagroups.jar from the collaboration.war before running it (thanks, Kenan, for the script):

SNOOP_HOME=$PT_HOME/ptcollab/4.2
export SNOOP_HOME
java -cp "$SNOOP_HOME/lib/java/collab-core.jar:$SNOOP_HOME/webapp/temp/WEB-INF/lib/javagroups.jar" \
    com.plumtree.core.cluster.tool.ClusterSnoop "$SNOOP_HOME/settings/config/cluster.xml"

Here is some sample output from the utility:

Message Source = localhost:51044
Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]
Message Source = localhost:51044
Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]
Message Source = localhost:51044
Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.GadgetCacheInvalidateClusterMessage[projectID=00000,functionalArea=1]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.ProjectModifiedClusterMessage[projectID=00000]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.PreferenceInvalidateClusterMessage[preference=G_0_tcic]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.GadgetCacheInvalidateClusterMessage[projectID=00000,functionalArea=16]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.GadgetCacheInvalidateClusterMessage[projectID=00000,functionalArea=63]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.GadgetCacheInvalidateClusterMessage[projectID=00000,functionalArea=16]
Message Source = localhost:51044
Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]
Message Source = localhost:51026
Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]
Message Source = localhost:51044

Cool, huh?
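If you leave the snoop utility running for a while, the output gets long quickly. As a rough sketch, a small Java helper could tally message types from text in the format shown above. The SnoopTally class is hypothetical (it is not part of Collaboration), and it assumes every message line starts with "Cluster Message = " followed by a class name and an optional bracketed detail, which is all the sample shows.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that counts cluster message types in ClusterSnoop output. */
final class SnoopTally {
    private static final String PREFIX = "Cluster Message = ";

    /** Returns a map of simple message class name to number of occurrences. */
    public static Map<String, Integer> countByType(String snoopOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String rawLine : snoopOutput.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (!line.startsWith(PREFIX)) {
                continue;  // skip "Message Source" lines and blank lines
            }
            String message = line.substring(PREFIX.length());
            int bracket = message.indexOf('[');
            String className = bracket >= 0 ? message.substring(0, bracket) : message;
            // Keep only the part after the last dot of the package name.
            String simpleName = className.substring(className.lastIndexOf('.') + 1);
            counts.merge(simpleName, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample =
            "Message Source = localhost:51044\n"
          + "Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]\n"
          + "Message Source = localhost:51026\n"
          + "Cluster Message = com.plumtree.core.cluster.message.ProjectModifiedClusterMessage[projectID=00000]\n"
          + "Message Source = localhost:51026\n"
          + "Cluster Message = com.plumtree.core.cluster.message.PingClusterMessage[cluster ping]\n";
        System.out.println(countByType(sample));
    }
}
```

A tally like this makes it easier to separate the steady stream of pings from the cache invalidation messages you probably care about.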

Monitoring Notification Server

You know those pesky collaboration notifications you're always trying to keep track of? Who's to know if they're really getting sent off to the people you think they are? Unfortunately, the notification logs themselves aren't much help, since they only log errors, not successes. What that means is you have to wait for the notification queue to fill up on the Collaboration server, or for an error to occur, before you see that something has gone wrong.

But what if we could send out our own notifications without using the UI, so they could be controlled and monitored via a script? What indeed...

import com.plumtree.collaboration.messaging.notification.messages.NotificationUtil;
import java.util.ArrayList;

public class TestNotifications {
    public static void main(String[] args) {

        if (args.length == 0) {
            System.out.println("usage: TestNotifications {userIds..}");
            return;
        }

        System.out.println("Initializing notification utility...");
        NotificationUtil nUtil = NotificationUtil.getInstance();

        // parse each argument as a portal user id and add it to the recipients list
        System.out.println("Creating new recipient list...");
        ArrayList<Integer> userIds = new ArrayList<Integer>();
        for (int i = 0; i < args.length; i++) {
            try {
                userIds.add(Integer.valueOf(args[i]));
                System.out.println("Added userid: " + args[i]);
            } catch (NumberFormatException ex) {
                System.out.println("Could not parse userid: " + args[i]);
            }
        }

        System.out.println("Sending error notification email...");
        nUtil.sendEmailErrorNotification("This is a test notification.", userIds);
        System.out.println("Mail send completed.");

        try {
            Thread.sleep(10000); // sleep long enough to send the message (10s)
        } catch (InterruptedException iex) {
            // restore the interrupt flag; we exit immediately below anyway
            Thread.currentThread().interrupt();
        } finally {
            // we need to kill all threads, since notification is event driven
            System.exit(0);
        }
    }
}

The code above (it's written in Java) lets you send notifications from the command line. You simply need to include the JARs in $PT_HOME/ptnotification/4.x/lib/java in your classpath (probably not all of them, but I haven't bothered to figure out which ones). The notification itself will be an error message with the text "This is a test notification." in the body.

Here is the script I used to launch it (DOS users will need to do some conversion).

Here is the jar file with source.

You can set up a simple notification monitor by creating a new user in the portal and changing the email address property to the email account or distribution list of your choice. Next, record the user ID and pass it on the command line to this utility. Have fun!

(Generic disclaimer: I make no guarantees that this code will work properly, etc., etc.)

What's in a Card?

We fear what we do not understand. As a consequence, one of the things that has scared me in the past is how the Knowledge Directory (KD) and search work (cards, indexing and metadata, oh my!). However, an emphasis on preserving KD cards and ensuring crawlers operate properly at a major client means that my fear has been greatly reduced. As has become a theme with this blog, the reader should benefit as a result of my suffering.

Knowledge Directory Cards

The KD has a very simple concept at its heart: a centrally located list of information about documents and their locations. The documents themselves are elsewhere: on a file system, in a content management system, on a web site. They are accessible via the crawler concept, which basically allows for a categorization of HTTP-accessible document links and corresponding information about those documents.

The folder structure you see in the KD user interface is a bit misleading, since it's really just a flat set of document metadata. The folders are just for human understanding and classification. That's why duplicate documents will show a (1), (2), etc, no matter where they are in the KD (you can "fix" this with a simple portal customization, but that's another post).

What's in a Card?

So what is all this stuff that's in a card, and how can you check it out? The easiest way, in my opinion, is to query the portal database. All of the data for KD cards is in the following tables in the portal database: PTCARDS, PTCARDPROPERTIES, PTINTERNALCARDINFO. If it's not there, it's not in the KD. How about a bit more detail?

PTCARDS - This is all basic information: the name of the document, its description and instructions to the portal on what URL opens it.

Brief Sidebar: Why doesn't the portal build URLs dynamically? Well, for starters there's your browser: you want someone downloading a file to get the relevant txt, mpg or jpg at the end of a file name so they know what to do with it. The second reason is that you may or may not want to point the user's browser directly at the document location (rather than gatewaying the request through the portal). Remember, there are some content sources you'll want to allow to use their own security mechanisms to prevent unauthorized access, and there are other sources, like web sites, that are better off being accessed directly, both from a traffic load standpoint and from a usability perspective (when I click a link to Google, I want my browser's URL to say http://www.google.com).

PTCARDPROPERTIES - This table contains all of that juicy metadata. When you view card properties in the portal, this table pretty much sums it up. It also has meta information that tells a crawler or search indexing agent what to do with the document (e.g. should it be searchable?). You'll notice there is a weird XML format to the data in this table. That's because the data is stored using the somewhat controversial PropertyBag structure.

You can find some good examples of the card submission internals by going back to our old friend the UI source. The key thing to look for in the case of card properties is the com.plumtree.server.PT_CARD_SETTINGS class, which contains constants for the internal crawler metadata. It should also give you a good idea of what I mean by property bag.

PTINTERNALCARDINFO - The fun table. This table tells the card refresh agent about what to do with the card. Most of these settings can be changed/manipulated in the crawler options screen. A few relevant fields are listed below, with some explanations:

  • CRAWLERID - Object ID of the crawler this card belongs to.
  • DATASOURCE - Object ID of the data source this card belongs to.
  • REFRESHDATE, LASTREFRESHED, REFRESHRATEUNITS - Properties that determine and store the last time the document properties were refreshed in the KD, as well as the next time these properties should be refreshed.
  • EXPIRATIONDATE - Controls the expiration date of the document. If this date/time passes, the document is deleted. This property is set in the crawler configuration under Document Settings | Document Expiration.
  • MISSINGDELETEUNITS - Set in the crawler configuration under Document Settings | Broken Links. If set, this property determines how long after the refresh date a card is deleted when the crawler can no longer find its document (NULL means it won't be deleted).
  • LOCATIONA_CRC, LOCATIONB_CRC - CRC values calculated from the document location. I'm not quite sure what these are actually used for.
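If you'd rather pull these fields from code than from a SQL console, a JDBC sketch might look like the following. It uses only column names listed above and assumes nothing else about the schema; the CardInfoQuery class is hypothetical, the JDBC URL and credentials passed on the command line are placeholders, and you need your database vendor's JDBC driver on the classpath. Treat it as read-only poking around.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical read-only helper: lists refresh settings for one crawler's cards. */
final class CardInfoQuery {
    // Only columns described in this post are referenced.
    public static final String SQL =
        "SELECT CRAWLERID, DATASOURCE, REFRESHDATE, LASTREFRESHED, "
      + "EXPIRATIONDATE, MISSINGDELETEUNITS "
      + "FROM PTINTERNALCARDINFO WHERE CRAWLERID = ?";

    /** Prints one line per card that belongs to the given crawler. */
    public static void printCards(Connection conn, int crawlerId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setInt(1, crawlerId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // getObject avoids guessing at the exact column types
                    System.out.println(
                        "dataSource=" + rs.getObject("DATASOURCE")
                        + " lastRefreshed=" + rs.getObject("LASTREFRESHED")
                        + " expires=" + rs.getObject("EXPIRATIONDATE")
                        + " missingDeleteUnits=" + rs.getObject("MISSINGDELETEUNITS"));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 4) {
            System.out.println("usage: CardInfoQuery {jdbcUrl} {user} {password} {crawlerId}");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            printCards(conn, Integer.parseInt(args[3]));
        }
    }
}
```

Using a parameterized query keeps the crawler ID out of the SQL string, which is a good habit even for throwaway diagnostics.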

Update (great info from reader danyadsmith -- thanks for the info and the kudos):

The PTCARDSTATUS table holds link and property refresh settings and contains the following columns:

  • OBJECTID (int, not null)
  • STATUS (int, not null)
  • INDEXLASTUPDATED (datetime, null)
  • LASTMODIFIED (datetime, null)

The STATUS field is the most useful of the four. By modifying the integer value, you can delete, refresh, or re-crawl cards into the directory. The available values are:

0 - Do Nothing
1 - Refresh Properties
2 - Not Used/Disregard
3 - Delete
4 - Recrawl and Refresh

Any changes to this table take effect the next time the Search Update and Doc Refresh jobs run.
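If you script against PTCARDSTATUS, it helps to give those magic numbers names. Here is a small Java sketch; the CardStatus enum and its constant names are invented for readability, and the codes are just the ones listed above.

```java
/** Hypothetical names for the PTCARDSTATUS.STATUS codes listed above. */
enum CardStatus {
    DO_NOTHING(0),
    REFRESH_PROPERTIES(1),
    NOT_USED(2),
    DELETE(3),
    RECRAWL_AND_REFRESH(4);

    private final int code;

    CardStatus(int code) {
        this.code = code;
    }

    /** The integer stored in the STATUS column. */
    public int code() {
        return code;
    }

    /** Looks up the status for a STATUS column value; rejects unknown codes. */
    public static CardStatus fromCode(int code) {
        for (CardStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown card status code: " + code);
    }

    public static void main(String[] args) {
        // Decode a value read from the STATUS column.
        System.out.println(fromCode(4));
        // The value you would write to request a property refresh.
        System.out.println(REFRESH_PROPERTIES.code());
    }
}
```

Rejecting unknown codes up front means a typo in a script fails loudly instead of writing a value the jobs may not understand.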

As always, I make no guarantees as to the past, present, or future accuracy of this information and I encourage you to do your homework before doing anything related to the portal database. Official sources would probably tell you not to touch it. I might be inclined to disagree, but that's mostly because I like causing headaches for our support guys.

About this Archive

This page is an archive of recent entries written by hross in September 2007.

hross: August 2007 is the previous archive.

hross: October 2007 is the next archive.
