EC2/CloudWatch Gaming Results


As I mentioned in my previous post, I wanted to capture some real-world info on hosting a game server in the cloud. The results were a rousing success. We had 5 or 6 people connected at various times, played some Deathmatch and Capture the Flag, and everyone had a ping of 40 ms or less the entire time. I didn’t notice any latency whatsoever, and no one complained about packet loss or lag at any point.

Cost

I haven’t broken down the numbers yet, but all told I ran an EC2 instance and hosted a game on it for 2 hours, with an Elastic IP attached for ease of use. The whole session cost me less than $0.50, which I’d say is a pretty good deal.
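If you want to break the numbers down yourself, the arithmetic is just instance-hours times the hourly rate plus outbound data times the per-GB transfer rate. Here is a minimal Java sketch of that calculation. The rates passed in main are placeholders I made up for illustration, not Amazon's actual prices, and the sketch ignores any other charges (storage, inbound transfer, etc.), so plug in the current figures from the EC2 pricing page.

```java
public class SessionCost {

    /**
     * Estimates the cost of one hosting session: instance time plus outbound data transfer.
     * All rates are caller-supplied assumptions, not published Amazon prices.
     */
    public static double estimate(double hours, double instanceRatePerHour,
            double gigabytesOut, double transferRatePerGb) {
        return hours * instanceRatePerHour + gigabytesOut * transferRatePerGb;
    }

    public static void main(String[] args) {
        // Placeholder inputs: a 2 hour session, a made-up hourly rate,
        // a guessed amount of outbound traffic, and a made-up per-GB rate
        double cost = estimate(2.0, 0.10, 0.5, 0.17);
        System.out.printf("Estimated session cost: $%.2f%n", cost);
    }
}
```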

Usage

Below are the usage stats for network I/O and CPU usage. I gathered these using my simple Java application and created these no-frills charts in Microsoft Excel (all told, this took about 5 minutes to put together):


Figure 1 – Network I/O over a 2 hour F.E.A.R. game.



Figure 2 – CPU Usage over a 2 hour F.E.A.R. game.

Conclusion

This is a short and imperfect analysis, but overall I’d say the “small” EC2 instance could easily have handled a 16-person game, in terms of both load and network traffic, and it would have cost me a dollar or so to host for 2 hours. That seems like great bang for your buck if you’re looking to crank up a quick game and then move on to something else.

I have recently been working on a utility for porting ALUI databases from a production environment to a development environment. Fabien Sanglier started this effort, and I hope to have some code to contribute to his ALUI toolbox project very soon.

In the meantime, however, I have been banging my head against the pain that is migrating Publish and Preview target URLs in Publisher. These URLs are stored in a BLOB in the Publisher database and are actually serialized Java objects, which makes them extremely difficult to update (especially when you don’t have access to the original Publisher source code).
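If you haven’t dealt with serialized objects in a database before, the round trip is: read the BLOB into a byte array, turn it back into objects with ObjectInputStream, modify them, and write them out again with ObjectOutputStream. Here is a minimal, self-contained Java sketch of that cycle using a plain HashMap. The key and URLs are made up; the real Publisher BLOBs contain Publisher’s own classes (such as RdbiPublishingTarget), which is why those classes have to be on the classpath before you can deserialize them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

public class BlobRoundTrip {

    /** Serializes an object graph into the byte[] form that would be stored in a BLOB. */
    public static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.close(); // flushes any buffered data into bos
        return bos.toByteArray();
    }

    /** Rebuilds the object graph; every class in it must be on the classpath. */
    public static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            return ois.readObject();
        } finally {
            ois.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up setting standing in for a Publisher publishing target
        Map<String, String> settings = new HashMap<String, String>();
        settings.put("PUBLISH_TARGET", "ftp://old-host/published");
        byte[] blob = toBytes(settings);

        // Read it back, change it, and write it out again
        @SuppressWarnings("unchecked")
        Map<String, String> loaded = (Map<String, String>) fromBytes(blob);
        loaded.put("PUBLISH_TARGET", "ftp://new-host/published");
        byte[] updated = toBytes(loaded);

        System.out.println(fromBytes(updated));
    }
}
```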

My original plan was to wrap all of this stuff into one “uber-utility” and then blog about it. Recently, though, I saw this post on the Oracle WebCenter Interaction discussion forums (http://forums.oracle.com/forums/thread.jspa?threadID=900736&tstart=0), and it convinced me to post the code for migrating publishing targets now, for the sake of the community’s collective sanity.

Here is a link to a jar file which will update Publisher publish targets. If you crack the jar file with a zip editor, you will be able to update the configuration.properties file in the root directory to suit your needs.

I took the liberty of including the Publisher classes in my own jar, which makes it simpler to run from the command line. To run it, you only need to download the correct JDBC driver for your database:

Oracle JDBC Driver

SQL Server JDBC Driver

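Next, execute it from the command line with the driver on your classpath, like so (this example uses the Oracle driver on Windows; on Linux or Unix, separate classpath entries with a colon instead of a semicolon):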

java -cp updatepublishtargets.jar;ojdbc14.jar net.hross.content.UpdatePublishTargets

Note that the utility runs in debug mode by default, so nothing in your Publisher database will change until you set debug to false in the configuration. This is also probably a good time to let you know that I provide no warranties of any kind with this code.

In order to build and run the source, you will need the content.jar and dom4j.jar found in the WEB-INF/lib directory of your ptcs.war. Here is the relevant source code, in case you are looking to build your own version of the utility (source is also in the jar):

package net.hross.content;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import net.hross.utility.Configuration;

import com.plumtree.content.data.AttributeKey;
import com.plumtree.content.data.impl.RdbiPublishingTarget;

public class UpdatePublishTargets {

    public static void main(String[] args) {
        Connection connection = Configuration.getConnection();

        if (null == connection) {
            System.out.println("Unable to connect to database. Exiting.");
            return;
        }

        int directoryId = Integer.parseInt(Configuration
                .getString(Configuration.CONFIG_DIRECTORY_ID));
        boolean debug = Boolean.parseBoolean(Configuration
                .getString(Configuration.CONFIG_DEBUG_MODE));
        String newPublishTarget = Configuration
                .getString(Configuration.CONFIG_PUBLISH_TARGET);

        System.out.println("Updating publish targets for directory ID: "
                + directoryId);

        System.out.println();
        if (debug) {
            System.out.println("** DEBUG MODE ON ** Nothing will be updated.");
        } else {
            System.out.println("** DEBUG MODE OFF ** This is happening for real.");
        }
        System.out.println();

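        updatePublishTarget(connection, directoryId, newPublishTarget, debug);

        // release the database connection
        try {
            connection.close();
        } catch (SQLException ex) {
            ex.printStackTrace();
        }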
    }

    /**
     * Update the publishing target for a specified directory ID (-1 for all
     * items).
     * 
     * @param connection
     *            Publisher database connection.
     * @param directoryId
     *            Directory ID to update. -1 for all items.
     * @param newPublishTarget
     *            - New publishing target.
     * @param debug
     *            - if true, no replace will be made, data will just be output.
     */
    public static void updatePublishTarget(Connection connection,
            int directoryId, String newPublishTarget, boolean debug) {

        // create a statement to query the directory id
        try {

            // create prepared statement for directory query
            PreparedStatement psDirectory = null;
            if (directoryId > 0) {
                psDirectory = connection
                        .prepareStatement("SELECT * FROM PCSDIRECTORY WHERE ITEMTYPE=0 AND DIRECTORYID=?");
                psDirectory.setInt(1, directoryId);
            } else {
                psDirectory = connection
                        .prepareStatement("SELECT * FROM PCSDIRECTORY WHERE ITEMTYPE=0");
            }
            ResultSet rs = psDirectory.executeQuery();

            // loop through any rows we need to check
            while (rs.next()) {

                // get basic info about the object
                String itemName = rs.getString("ITEMNAME");
                int size = rs.getInt("DATASIZE");
                
                // reset directory ID in case it was generic
                directoryId = rs.getInt("DIRECTORYID");

                // get binary input stream
                InputStream input = rs.getBinaryStream("DATABYTES");

                // if there's actually some settings, let's check them
                if ((null != input) && (0 != size)) {

                    // generic catch statement for problems with this item
                    try {
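                        // read the whole BLOB; a single read() call may return fewer bytes
                        byte[] buffer = new byte[size];
                        new java.io.DataInputStream(input).readFully(buffer);

                        // load the hash map from the database (deserialize returns null on failure)
                        Object data = deserialize(buffer);
                        if (!(data instanceof Map)) {
                            System.out.println("Skipping " + directoryId + " - "
                                    + itemName + ": settings could not be read.");
                            input.close();
                            continue;
                        }
                        Map map = (Map) data;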

                        // loop through the keys in the hash map
                        Iterator keys = map.keySet().iterator();
                        while (keys.hasNext()) {
                            Object key = keys.next();

                            // this should probably always be true
                            if (key.getClass().equals(AttributeKey.class)) {
                                AttributeKey akey = (AttributeKey) key;

                                // if we found a publishing target...
                                if (akey.getKeyString().equals(
                                        "PUBLISHING_TARGET")) {
                                    System.out.println();
                                    System.out.println("--------------------");
                                    System.out
                                            .println("Updating publishing target for:");
                                    System.out.println(directoryId + " - "
                                            + itemName);

                                    // get the publishing target info
                                    RdbiPublishingTarget val = (RdbiPublishingTarget) map
                                            .get(key);
                                    String publishTarget = val
                                            .getPublishDetail()
                                            .getTargetLocation();
                                    String publishBrowser = val
                                            .getPublishDetail()
                                            .getBrowserLocation();
                                    String previewTarget = val
                                            .getPreviewDetail()
                                            .getTargetLocation();
                                    String previewBrowser = val
                                            .getPreviewDetail()
                                            .getBrowserLocation();
                                    String ftpUser = val.getPublishDetail()
                                            .getUsername();
                                    String ftpPassword = val.getPublishDetail()
                                            .getPassword();

                                    System.out
                                            .println("Publish browser location: "
                                                    + publishBrowser);
                                    System.out.println("Preview target: "
                                            + previewTarget);
                                    System.out
                                            .println("Preview browser location: "
                                                    + previewBrowser);
                                    System.out.println("FTP user: " + ftpUser);
                                    System.out.println("FTP password: "
                                            + ftpPassword);
                                    System.out.println("Old publish target: "
                                            + publishTarget);
                                    System.out.println("New publish target: "
                                            + newPublishTarget);

                                    // if we are doing this for real, update
                                    // values
                                    if (!debug) {
                                        val.setTargetValues(newPublishTarget,
                                                publishBrowser, previewTarget,
                                                previewBrowser, ftpUser,
                                                ftpPassword);

                                        map.put(key, val);

                                        // update the directory
                                        serializeToDirectory(connection,
                                                directoryId, map);
                                        System.out.println("Update successful.");
                                    }
                                    System.out.println("--------------------");
                                    System.out.println();
                                }
                            }
                        }

                        // clean up
                        input.close();
                    } catch (IOException ex) {
                        System.out.println("Something bad happened.");
                        ex.printStackTrace();
                    }
                } // if null
            } // while next rs

            // release JDBC resources
            rs.close();
            psDirectory.close();
        } catch (SQLException ex) {
            System.out.println("Something bad happened.");
            ex.printStackTrace();
        }

        System.out.println("Procedure successfully completed.");
    }

    private static Object deserialize(byte bytes[]) {
        try {
            ByteArrayInputStream byteStream = new ByteArrayInputStream(bytes);
            ObjectInputStream objectStream = new ObjectInputStream(byteStream);
            return objectStream.readObject();
        } catch (Exception ex) {
            return null;
        }
    }

    private static void serializeToDirectory(Connection conn, int directoryId,
            Object obj) throws IOException, SQLException {
        byte bytes[] = getBytes(obj);
        ByteArrayInputStream byteStream = new ByteArrayInputStream(bytes);

        PreparedStatement ps = conn
                .prepareStatement("UPDATE PCSDIRECTORY SET DATASIZE=?, DATABYTES=? WHERE DIRECTORYID=?");
        try {
            ps.setInt(1, bytes.length);
            ps.setBinaryStream(2, byteStream, bytes.length);
            ps.setInt(3, directoryId);
            ps.execute();
        } finally {
            ps.close();
        }

        // JDBC allows commit() to throw when auto-commit is enabled
        if (!conn.getAutoCommit()) {
            conn.commit();
        }
    }

    public static byte[] getBytes(Object obj) throws java.io.IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.flush();
        oos.close();
        bos.close();
        byte[] data = bos.toByteArray();
        return data;
    }
}
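The listing relies on net.hross.utility.Configuration, which isn’t shown here. If you are building your own version, here is a hypothetical stand-in that reads configuration.properties from the root of the jar via the classpath. The key names are my own invention and may not match the file in the jar, the class is instance-based rather than static to make it easier to test, and it leaves out the JDBC connection setup that the real getConnection() handles.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/**
 * Hypothetical stand-in for net.hross.utility.Configuration.
 * Key names are illustrative only; the JDBC connection setup is omitted.
 */
public class ConfigSketch {

    public static final String CONFIG_DIRECTORY_ID = "directoryId";
    public static final String CONFIG_DEBUG_MODE = "debug";
    public static final String CONFIG_PUBLISH_TARGET = "publishTarget";

    private final Properties props = new Properties();

    public ConfigSketch(Reader source) throws IOException {
        props.load(source);
    }

    /** Loads configuration.properties from the root of the classpath (e.g. the jar root). */
    public static ConfigSketch fromClasspath() throws IOException {
        InputStream in = ConfigSketch.class.getResourceAsStream("/configuration.properties");
        if (in == null) {
            throw new IOException("configuration.properties not found on the classpath");
        }
        try {
            // .properties files are traditionally ISO-8859-1 encoded
            return new ConfigSketch(new InputStreamReader(in, "ISO-8859-1"));
        } finally {
            in.close();
        }
    }

    /** Returns the value for a key, or null if it is not set. */
    public String getString(String key) {
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        ConfigSketch config = new ConfigSketch(new StringReader(
                "directoryId=-1\ndebug=true\npublishTarget=ftp://example-host/published\n"));
        int directoryId = Integer.parseInt(config.getString(CONFIG_DIRECTORY_ID));
        boolean debug = Boolean.parseBoolean(config.getString(CONFIG_DEBUG_MODE));
        System.out.println("directory " + directoryId + ", debug " + debug);
    }
}
```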

It is rare that I am on the bleeding edge of technology. Normally, I don’t think it’s worth the time and effort necessary to learn something brand new unless it has been at least somewhat widely adopted and accepted by the community at large.

Oddly enough, my blog post about running a game server on EC2 turned out to be perfectly timed, as Amazon launched its new CloudWatch, Auto Scaling and Elastic Load Balancing services on Sunday. And since, as I discussed earlier, I have been looking at ways to monitor the usage of my EC2 game server, I somehow find myself on the bleeding edge of the cloud.

Why CloudWatch?

As I discussed in my previous post, setting up monitoring on an EC2 instance wasn’t that hard to do. However, it did come with some drawbacks:

  • Maintenance – Although it can be fun to install new software and learn its ins and outs, actually upgrading, maintaining, and patching that software and watching it for security risks is a major pain in the rear end. CloudWatch solves this problem by providing a simple service for retrieving performance data, with no maintenance or special setup required.
  • Granularity – As I discovered with Munin, there are limits to how frequently you can store performance data, not to mention the storage requirements for vast quantities of it. Again, CloudWatch hides this from us.
  • Performance – Last but certainly not least, monitoring something usually incurs a performance hit. In my previous article I was sampling data on the same host I was tracking statistics from. The very act of collecting performance data could cause that data to be skewed. Since CloudWatch abstracts this away from individual instances, this is no longer a problem.

Getting Started With CloudWatch

There are quite a few resources available to get you started with CloudWatch. I recommend taking a look at the JavaScript scratch pad and the various other developer libraries already available (more on this later).

If you really want to get down to the nitty gritty, you should start with the CloudWatch command line interface (CLI). Here are some simple steps to get you started:

  1. Download the EC2 API Tools first (you’ll need them to set up monitoring). Check out the Getting Started Guide for instructions on extracting the tools and setting up the proper environment variables.
  2. Download the CloudWatch API Tools. Check out the included readme for details on environment variable setup.
  3. Start up an EC2 instance like you normally would (see my previous post).
  4. Enable monitoring on your running instance using the EC2 API Tools command: ec2-monitor-instances <instanceId>.
  5. Take a look at the CloudWatch Getting Started Guide for details on the available monitoring parameters, etc.
  6. Run the CloudWatch command mon-get-stats to get some statistics from your running instance (mon-get-stats --help should give you some examples).

Here are a few things to keep in mind when running the command line utility:

  • I normally output data to a CSV file so I can create fancy graphs in Excel. Here is an example command (Windows, entered on a single line) that delimits stats by comma and outputs to a CSV file:
    mon-get-stats CPUUtilization --start-time 2009-05-19T21:00:00 --end-time 2009-05-19T22:00:00 --period 60 --statistics Average --namespace AWS/EC2 --delimiter "," --dimensions "InstanceId=i-2bb5cc42" > stats.csv
  • Timestamps – As per the forums, input timestamps are in ISO-8601 format with a default time zone of UTC (Eastern Daylight Time + 4 hours, or Eastern Standard Time + 5 hours). Output timestamps are always in UTC and cannot be changed (so start thinking in Greenwich Mean Time).
  • Virtually as soon as monitoring is enabled, statistics are retrieved from your instances. Data is available up to a per-minute frequency and is stored for two weeks.
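Since both input and output timestamps are in UTC, converting to and from local time by hand gets old quickly. Here is a minimal Java sketch that converts a local wall-clock time into the format used in the example command above, and back again. It uses java.time (Java 8 or later, so much newer than the tools discussed here) and assumes timestamps without fractional seconds, optionally ending in Z; the class and method names are my own.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class CloudWatchTimes {

    // Same layout as the timestamps in the example command, e.g. 2009-05-19T21:00:00
    private static final DateTimeFormatter ISO =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    /** Converts a local wall-clock time in the given time zone to UTC. */
    public static String localToUtc(String local, String zoneId) {
        return LocalDateTime.parse(local, ISO)
                .atZone(ZoneId.of(zoneId))
                .withZoneSameInstant(ZoneOffset.UTC)
                .format(ISO);
    }

    /** Converts a UTC timestamp (optionally ending in Z) to local wall-clock time. */
    public static String utcToLocal(String utc, String zoneId) {
        String trimmed = utc.endsWith("Z") ? utc.substring(0, utc.length() - 1) : utc;
        return LocalDateTime.parse(trimmed, ISO)
                .atZone(ZoneOffset.UTC)
                .withZoneSameInstant(ZoneId.of(zoneId))
                .format(ISO);
    }

    public static void main(String[] args) {
        // In May, US Eastern time is on daylight saving time (UTC-4)
        System.out.println(localToUtc("2009-05-19T17:00:00", "America/New_York"));
        System.out.println(utcToLocal("2009-05-19T21:00:00Z", "America/New_York"));
    }
}
```

Using a named zone such as America/New_York rather than a fixed offset lets the library handle the daylight saving switch for you.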

Writing a Simple Java Monitoring Utility

As much fun as I was having trying to parse and decipher various command line inputs, I was somewhat disappointed in the output. For one thing, there was the time formatting problem. For another, only one statistic (CPU utilization, network I/O, etc.) could be retrieved at a time.

I am not one to do more work than I need to, so instead of setting off to invent an uber-utility for aggregating data, I simply downloaded the Java library for CloudWatch and hacked up some of the sample code until I had a very basic utility for downloading and aggregating the data I wanted. I present it below in case someone finds it useful:

import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;

import com.amazonaws.cloudwatch.AmazonCloudWatch;
import com.amazonaws.cloudwatch.AmazonCloudWatchClient;
import com.amazonaws.cloudwatch.AmazonCloudWatchException;
import com.amazonaws.cloudwatch.model.Datapoint;
import com.amazonaws.cloudwatch.model.GetMetricStatisticsRequest;
import com.amazonaws.cloudwatch.model.GetMetricStatisticsResponse;
import com.amazonaws.cloudwatch.model.GetMetricStatisticsResult;

public class GrabStats {

    public static void main(String[] args) {
        
        String fileName = "C:\\stats.csv";

        String startTime = "2009-05-19T20:00:00";
        String endTime = "2009-05-20T00:00:00";
        
        String[] statList = { "CPUUtilization","NetworkIn","NetworkOut" }; //(%, bytes, bytes)
        
        HashMap<String, HashMap<String, Double>> map = new HashMap<String, HashMap<String, Double>>();
        
        // grab stats for each stat value
        for (int i = 0; i < statList.length; i++) {
            HashMap<String, Double> stats = getStatistics(startTime, endTime, statList[i]);
            map.put(statList[i], stats);
        }
        
        // write to disk
        try {
            FileWriter fw = new FileWriter(fileName);
            
            // write the header
            fw.write("Date");
            for (int i = 0; i < statList.length; i++) {
                fw.write(",");
                fw.write(statList[i]);
            }
            fw.write("\n");
            
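            // iterate the timestamps of our first statistic in sorted order
            // (HashMap keys come back in no particular order)
            Iterator<String> dateIterator = new java.util.TreeSet<String>(
                    map.get(statList[0]).keySet()).iterator();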

            while(dateIterator.hasNext()) {
                String date = dateIterator.next();
                fw.write(date);
                
                // get values for each stat at this date
                for (int i = 0; i < statList.length; i++) {
                    Double value = map.get(statList[i]).get(date);
                    fw.write(",");
                    // leave the cell empty if this statistic has no datapoint at this time
                    if (value != null) {
                        fw.write(value.toString());
                    }
                }
                
                fw.write("\n");
            }
            
            fw.close();
        } catch (IOException ex) {
            // error storing data
            System.out.println("Error writing file: " + fileName + " (" + ex.getMessage() + ")");
        }

    }

    // define the cloudwatch service (should be a singleton)
    private static final String _accessKeyId = "<insert key here>";
    private static final String _secretAccessKey = "<insert access key here>";
    private static AmazonCloudWatch _service = new AmazonCloudWatchClient(
            _accessKeyId, _secretAccessKey);

    public static HashMap<String, Double> getStatistics(String startTime,
            String endTime, String statName) {
        HashMap<String, Double> map = new HashMap<String, Double>();

        // build the request with some defaults
        GetMetricStatisticsRequest request = new GetMetricStatisticsRequest();
        ArrayList<String> stats = new ArrayList<String>();
        stats.add("Average");
        request.setStartTime(startTime);
        request.setEndTime(endTime);
        request.setPeriod(60); // statistics every minute
        request.setMeasureName(statName);
        request.setNamespace("AWS/EC2");
        request.setStatistics(stats);

        try {

            GetMetricStatisticsResponse response = _service
                    .getMetricStatistics(request);

            if (response.isSetGetMetricStatisticsResult()) {
                GetMetricStatisticsResult getMetricStatisticsResult = response
                        .getGetMetricStatisticsResult();
                java.util.List<Datapoint> datapointsList = getMetricStatisticsResult
                        .getDatapoints();
                for (Datapoint datapoints : datapointsList) {
                    map.put(datapoints.getTimestamp(), datapoints.getAverage());
                }
            }

        } catch (AmazonCloudWatchException ex) {

            System.out.println("Caught Exception: " + ex.getMessage());
            System.out.println("Response Status Code: " + ex.getStatusCode());
            System.out.println("Error Code: " + ex.getErrorCode());
            System.out.println("Error Type: " + ex.getErrorType());
            System.out.println("Request ID: " + ex.getRequestId());
            System.out.print("XML: " + ex.getXML());
        }

        return map;
    }

}
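getStatistics returns one map per statistic, keyed by timestamp, and NetworkIn/NetworkOut come back in bytes (see the comment next to statList). To compare those numbers with a connection speed, it helps to convert them to kilobits per second. Here is a small Java sketch of that conversion, assuming each value is the number of bytes transferred during one 60-second period; check the CloudWatch documentation for exactly how the statistic you request is aggregated. The sample values in main are made up.

```java
import java.util.Map;
import java.util.TreeMap;

public class NetworkRates {

    /** Converts bytes transferred in one period into kilobits per second (1 kbit = 1000 bits). */
    public static double toKbps(double bytes, int periodSeconds) {
        return bytes * 8.0 / periodSeconds / 1000.0;
    }

    /** Highest rate in kbit/s across one statistic's timestamp-to-bytes map; missing values are skipped. */
    public static double peakKbps(Map<String, Double> bytesByTimestamp, int periodSeconds) {
        double peak = 0.0;
        for (Double bytes : bytesByTimestamp.values()) {
            if (bytes != null) {
                peak = Math.max(peak, toKbps(bytes, periodSeconds));
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Made-up sample values in the same shape getStatistics returns
        Map<String, Double> networkOut = new TreeMap<String, Double>();
        networkOut.put("2009-05-19T21:00:00Z", 750000.0);
        networkOut.put("2009-05-19T21:01:00Z", 1500000.0);
        System.out.println("Peak NetworkOut: " + peakKbps(networkOut, 60) + " kbit/s");
    }
}
```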

Conclusion

The CloudWatch tools and utilities are nothing less than I’d expect from Amazon. Everything worked as expected, the documentation was well put together and there were no real surprises with the API. Overall, I am very satisfied with the finished product of my meager efforts.

There are, of course, a few shortcomings:

  1. It would be nice to have more statistics available (memory usage being the main one I’m thinking of). Having the ability to define and collect your own statistics via an API would be even better. Since the API already has a flexible way of defining statistic and type, I have to assume this is coming.
  2. Output visualization is certainly lacking. It would be great to see someone hack a Google Chart generator into the JavaScript scratch pad (given my lack of copious amounts of free time, this person won’t be me).
  3. Adding some statistic collection and enablement to ElasticFox would certainly make things easier to set up and administer.

I have to assume these drawbacks will be addressed in future updates, as they have been in the past. I am willing to accept them as the price to pay for being on the bleeding edge of the cloud.

I recently came across a project where I had a need to display the results of a large SQL query in an HTML table using Java. Of course, I wanted to paginate it, style it, use AJAX to update it, and avoid the need for bulky toolkits or large frameworks. Oh yeah, I was also on an extremely tight deadline (read: proper coding and design principles were not used).

I looked into a couple of options:

  1. Display Tag – I used this for another project, but it uses session variables to paginate and doesn’t lend itself well to AJAX updates.
  2. GWT – One of the guys over at Function1 used it recently and it looked pretty slick. Unfortunately, it seemed like a lot of overhead and a styling headache for simple “display a table” functionality.
  3. DWR and jQuery – As it turns out, I found a great series of blog posts (part 1, part 2) over at Spartan Java that pretty much laid out a solution to my problem.

As you can see from the posts on Spartan Java, creating the framework to display SQL results using DWR and jQuery was simple, fast, and fairly straightforward. Because the poster makes some assumptions about your knowledge of DWR and jQuery, I would suggest combining the above with the getting started guide for DWR and the jQuery tutorials, if you are unfamiliar with either.

If you know basic DHTML and Java, the learning curve should be no problem.
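For readers new to the approach, the server side boils down to a plain Java method that returns one page of rows, which DWR then exposes to JavaScript. The sketch below is hypothetical (the class and method names are mine, not from the Spartan Java posts) and pages an in-memory list to stay self-contained. In a real application you would usually push the paging into the SQL query itself, so only one page of rows ever leaves the database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical paging helper of the kind a DWR-exposed class might use.
 * Pages are 1-based; out-of-range page numbers are clamped.
 */
public class PageHelper {

    /** Number of pages needed to show totalRows rows (at least 1, so an empty table still has a page). */
    public static int pageCount(int totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return Math.max(1, (totalRows + pageSize - 1) / pageSize);
    }

    /** Returns a copy of the rows on the requested page. */
    public static <T> List<T> page(List<T> rows, int pageNumber, int pageSize) {
        int pages = pageCount(rows.size(), pageSize);
        int current = Math.min(Math.max(pageNumber, 1), pages);
        int from = (current - 1) * pageSize;
        int to = Math.min(from + pageSize, rows.size());
        if (from >= to) {
            return Collections.emptyList();
        }
        return new ArrayList<T>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        for (int i = 1; i <= 25; i++) {
            rows.add("row " + i);
        }
        System.out.println(pageCount(rows.size(), 10) + " pages");
        System.out.println(page(rows, 3, 10));
    }
}
```

Clamping the page number means a stale “next page” click from the browser returns the last page instead of an error.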

DWR and the Portal

As usual, the code I was writing was eventually destined to show up in the portal. Because jQuery is just a simple JavaScript library, it works in the portal without issue. Unfortunately, DWR, like many other AJAX frameworks, has problems when it is gatewayed.

After combing through the dustier nooks of the documentation, googling profusely, and downloading the source code, I discovered the secret to making DWR work. Since some of you may be thinking “wow, DWR looks like something I want to use in my next portlet”, I thought I’d elaborate:

  1. Add an anchor tag to whatever portlet you will eventually display in the portal. This anchor tag should have an id (let’s call it “gatewaybase”) and it should reference the base path of DWR in your application (this will almost always be /dwr/). So, for any portlet I want to use DWR in, I would always have the following (this is in a JSP; you might need to change <%=path%> to another base path):
    <script src="dwr/interface/ClassNameExample.js"></script>
    <script src="dwr/engine.js"></script>
    <a id="gatewaybase" target="<%=path%>/dwr/" href="<%=path%>/dwr/"></a>
  2. The trick to getting DWR working is intercepting its initialization JavaScript. As it turns out, DWR unofficially supports this, but does not document it. Assuming you’re using jQuery, adding this JavaScript to either an external .js file or directly in the page should do the trick:
    jQuery(document).ready(function() {
        if (typeof(PTPortalPage)!="undefined") {
            //TODO: this check won't work if JS in gateway
            dwr.engine._urlRewriteHandler = doInterceptUrl;
        } else if (document.getElementById("gatewaybase") != null) {
            dwr.engine._urlRewriteHandler = doInterceptUrl;
        }
    });
    
    function doInterceptUrl(data) {
        // this function intercepts http requests from DWR
        // and gateways them using an anchor on the main page
        //TODO: is there a better way? AJAX request for base?
        var rooturl = document.getElementById("gatewaybase").href;
        var nongateroot = document.getElementById("gatewaybase").target;
    
        data = data.replace(nongateroot, rooturl);
        return data; 
    };

This intercepts every request DWR makes from your page and rewrites its URL using the anchor tag you added in step 1: the ungatewayed base path (the anchor’s target attribute) is replaced with the gatewayed one (its href, which the portal rewrites when it gateways the page).

Also note that you may need to modify your web.xml with the following init-param for DWR:

<servlet>
  <servlet-name>dwr-invoker</servlet-name>
  <display-name>DWR Servlet</display-name>
  <servlet-class>org.directwebremoting.servlet.DwrServlet</servlet-class>
  <init-param>
     <param-name>debug</param-name>
     <param-value>true</param-value>
  </init-param>
  <!-- added for gateway compatibility -->
  <init-param>
     <param-name>crossDomainSessionSecurity</param-name>
     <param-value>false</param-value>
  </init-param>
</servlet>

I’m sure there are probably a million ways to build a better mousetrap when displaying tables with Java, but using the above technologies was quick, easy and rewarding.

Yes, it’s true that I haven’t posted in quite a while. My bad. Hopefully you enjoy this little tidbit, even though it’s my first non-ALUI post on this blog…

Game Night

Recently, after a long workday, some co-workers and friends of mine started discussing a “game night”. All of us have jobs and lives outside of work and are no longer college students, but all of us remember the glory days of Counter-Strike, Quake and the like.

Of course, none of us has anything more than a decently performing laptop, and all of us have an aversion to spending money. And so it was that we happened upon a game called F.E.A.R. Combat. Perhaps it’s a bit long in the tooth, and perhaps it is behind the times, but it sure is fun, and it sure is FREE.

The point of that long-winded story is that every other Wednesday has become game night, or more specifically, F.E.A.R. night. And since we are all computer geeks, and we all work in the web technology world in some way or another, someone brought up the idea of running a F.E.A.R. instance on Amazon’s EC2.

Recently, I had a bit of time on my hands and an urge to try it out, and thus this post was born…

Starting and Connecting to an EC2 Instance

First, I signed up for Amazon EC2 (actually, I had already signed up when I wrote this blog post). The invaluable Getting Started Guide contained all the basics I needed to sign up, start instances, make images, etc.

Next, I made sure to download the ElasticFox plugin for Firefox. This makes managing and running EC2 instances much easier. If you want to get started quickly, here is a great Getting Started Guide for the plugin.

After installing and setting up ElasticFox, I was ready to start up a base image. I chose to run an Ubuntu image, since its package management and documentation are readily available. This site has a few base AMIs, which I used to get started. I simply searched for the AMI ID I wanted and followed the ElasticFox instructions on starting an instance.

One thing I had to keep in mind was that I wanted to allow the proper TCP/UDP access so that people could connect to my server. In this case, I allowed the following ports:

Application   Protocol   Port
SSH           TCP        22
HTTP          TCP        80
F.E.A.R.      TCP/UDP    27888
TeamSpeak     UDP        8767


The other thing I did, in order to keep things simple for future connections, was associate a static IP with my running instance (these are called Elastic IPs in EC2 parlance). The procedure is mind-numbingly simple in ElasticFox, so I’ll refer you to the Getting Started Guide if you need more information on how to do it.

At first, I had some issues actually connecting to my instance using SSH (ElasticFox will auto-launch an SSH client). The problem turned out to be that I was using PuTTY for SSH, and it does not recognize the private key format used by EC2. Doh. Fortunately, you can convert your keys using PuTTYgen. Amazon was nice enough to dedicate an appendix in their Getting Started Guide to this exact problem.

Problem solved.

My next steps were the steps you’d take to install and configure any server so that it could host F.E.A.R. Combat, a TeamSpeak server (an in-game voice communication server), and a Munin monitoring instance (so I could get some stats to see how well EC2 performed in a real world scenario).

Preparing for the Installation

After everything was running, I wanted to make sure I had the prerequisites to run F.E.A.R. and install any optional components. As it turned out, my Ubuntu instance was fairly locked down. In order to download and install what I needed, I had to update my sources list to include the universe and multiverse repositories.

Once this was done, I updated the list of installable applications via:

apt-get update

And installed some C++ compatibility libraries for the dedicated server via:

apt-get install libstdc++5

At this point I was all set to install the base components of my server.

Installing and Configuring F.E.A.R. Combat

My first step was to download the F.E.A.R. dedicated Linux server here.

Since I already had the prerequisites installed (see above), all I had to do was extract the archive to disk, modify the included start.sh to my liking (I used a custom configuration via the –optionsfile argument, used nohup so the server would keep running after I closed my SSH session, etc.), and start the server.

Installing and Configuring TeamSpeak

TeamSpeak is an in-game voice communication server. Since my game night buddies are mostly remote, I figured it would be nice to provide some voice communication for trash talk and strategy.

I logged in as root and ran:

apt-get install teamspeak-server

A teamspeak user was added, the server started and I was ready to rock and roll. As for configuring the server… it seemed to work okay, so I didn’t bother =). However, you can find some instructions for configuration here.

Installing and Configuring Munin

Munin is a monitoring tool that allows you to capture CPU, memory, process data, and all kinds of other stats in 5-minute increments. It can be used to monitor many systems with many kinds of statistics, but that is outside the scope of this post. For now, let’s just say I wanted a simple way to capture statistics for my instance.

The installation also turned out to be very simple. It involved using apt-get to install Apache and Munin. Rather than regale you with the details, I’ll just point you to this simple tutorial.

Note: I did have some issues getting Munin to work at first, but once I made sure my local node was listening on the loopback adapter only, it seemed to work. See Section 1.3 (Configuring the Node) of the tutorial for details.
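If Munin's graphs stay empty, it helps to ask the node directly. munin-node speaks a simple line-based protocol on TCP port 4949 by default. It greets you with a "# munin node at <hostname>" line, and "fetch <plugin>" returns "field.value N" lines terminated by a line containing a single dot. This is a minimal Python sketch of that exchange; the plugin name load is just an example of a stock plugin, so use one your node actually lists.

```python
import socket


def munin_fetch(plugin: str, host: str = "127.0.0.1", port: int = 4949,
                timeout: float = 5.0) -> dict:
    """Ask a munin-node for one plugin's current values.

    Returns a dict mapping field name to value string, e.g. {"load": "0.42"}.
    """
    values = {}
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", errors="replace",
                           newline="\n") as reader:
            reader.readline()  # greeting: "# munin node at <hostname>"
            sock.sendall(f"fetch {plugin}\n".encode("utf-8"))
            for line in reader:
                line = line.rstrip("\r\n")
                if line == ".":           # a lone dot ends the response
                    break
                if line.startswith("#"):  # e.g. an unknown-service notice
                    continue
                key, _, value = line.partition(" ")
                if key.endswith(".value"):
                    values[key[: -len(".value")]] = value
            sock.sendall(b"quit\n")
    return values
```

Because it connects to 127.0.0.1 by default, this doubles as a check that the node is answering on the loopback adapter, which is the configuration that fixed things for me.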

Creating and Registering an AMI

At this point I had everything I needed to run a game server. I tested client connections to my TeamSpeak host, the Apache server hosting Munin, and the F.E.A.R. server itself, and everything worked great.

The only problem was that if I ever shut down the running instance, all of my work would be gone and I would have to re-install everything the next time I wanted to host a game. Thus, I needed to create an AMI from my configured instance.

The procedure for this was relatively simple, and well documented in the Getting Started Guide here. However, there are a few things you might want to know before you dive in:

  • You’ll need at least a basic working knowledge of Amazon S3, since you’ll need it to store your finished AMI. I suggest grabbing S3Fox and using it to create an Amazon S3 bucket. This process is fairly simple, but still a minor annoyance.
  • The base image I used did not have the EC2 API tools installed on it, which meant that I could not register my EC2 instance without installing them. I did this by running:
apt-get install ec2-api-tools

After that, all I needed to do was set my JAVA_HOME environment variable and follow the rest of the Getting Started Guide.

Final Thoughts

Security:

As you probably noticed, the configuration on my AMI is hardly secure. I ran things as root, didn’t bother changing passwords or restricting IPs, etc. I offer no excuses, save my own laziness.

However, the nice thing about an AMI is that an instance of it is only going to be used on game night for a few hours, so I’m hardly worried about being hacked. Any time there is a problem, all I have to do is terminate the instance and boot up another one from the AMI. Since nothing is persistent and there are no credentials on the box, this works great.

Imagine if I had set up a dedicated server for this. I’d have to worry about all kinds of hardening due to the longevity of the configuration. Yuck.

Going Further:

Of course, as always, there are some things I could have done that would have taken this post further:

  • Capturing “real” usage stats and anecdotal performance data (is this a feasible, reliable, and cost effective solution?). This will probably follow in a future blog post (after the next “game night”).
  • Writing a wrapper for the AMI so that it can be started and stopped on-demand via the web. Someone could definitely write a dedicated hosting web site if they could figure out all the possible licensing restrictions.

Otherwise… that’s about it. After reading this, you should be in a position to create your own AMIs using EC2. The overall experience for me was rather pleasant, though there are some things I think Amazon could have done to simplify the process.

You know, it's funny how some things can seem extremely complicated, and then when you crack them open they turn out to be fairly easy to understand. Remember the mystery behind how a G.I. Joe stayed together, until you broke one and found it was simply a rubber band holding his guts together? Turns out search server is much like that: a terribly complicated-seeming C program that, fundamentally, is held together by a rubber band.

What is a Search Node?

From my previous posts, you've probably inferred that search nodes are the fundamental building blocks of ALUI's search capability. In fact, search nodes are actually the *only* building blocks of the search capability. Everything you need to set up a clustered or non-clustered search environment is contained in one simple install, a few directories and an executable.

Stripped down, this seemingly complicated system amounts to the following:

  1. An executable running somewhere listening for requests
  2. An open TCP port that receives text-based search queries
  3. Two directories that contain everything search needs to operate: a cluster directory and a node directory.

Here's a more complicated picture of what I just listed:

Figure 1 - Search Node Architecture

Search Requests

Let's start with the executable. When you start it up using the command line (from the bin directory in a *nix environment, or via a service on Windows), it uses environment variables to find its various configuration files, starts up a process, opens a TCP socket on whatever port you tell it to, and sits around waiting for stuff to happen.

The "stuff that happens" turns out to also be fairly simple. Search server doesn't actually know anything about portals, documents or anything else for that matter. It sits around and waits for one of two things:

  • An index request (put some information into the search index so it can be searched for later)
  • A search request (search for something in the current index)

These two things are specified in a custom text-based language over a TCP port. What I mean is that you, Joe Six-pack, could open up a telnet session to your search server port and type a query (index or search) freehand, were you so inclined. You would type something like the following:

( FIELDALIAS ptsearch,[2]PT1,[2]PT1_en,[0.1]PT2,[0.1]PT2_en,[0.1]PT50 ) (((ptsearch:a) TAG phraseQ OR (ptsearch:a*) TAG nearQ) AND ((subtype:"PTCARD")[0])) AND ((((@type:"PTPORTAL")[0]) OR ((@type:"PTCONTENTTEMPLATE")[0])) AND (((ptacl:"u2") OR (ptacl:"51"))[0]) AND (((ptfacl:"u2") OR (ptfacl:"51"))[0]))
METRIC logtf [1] RESULTS 10 PRINT FIELDS parentids,ptacl,ptfacl,PT51,PT56,@type,subtype,ancestors,PT58,PT7,PT53,abstracttype,PT1,PT1_en,PT2,PT2_en,PT3,PT4,PT5,PT6,PT8,collab_properties,collab_project_url,collab_project_name,collab_icon_alttext_index,collab_acl,publisheduser,portletid TERMS 10000 results[1-10] KWIC 15

Obviously, this kind of a query isn't very pretty or intuitive, but the point is you could type it via telnet and search server would spit out an XML formatted response to your query. You can see these types of queries in your search node logs if you set your logging levels high enough. Lucky for you, the search API takes care of all of this heavy lifting and converts those XML results into the pretty HTML you see when you perform a search in the portal.
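Since the protocol is plain text over TCP, anything that can open a socket can stand in for telnet. Here is a minimal Python sketch with two loud assumptions that I have not verified against search server. First, the node accepts one CRLF-terminated query line, which is what telnet sends. Second, the reply is finished when the node closes the connection or goes quiet for a few seconds, since I'm not sure how search server delimits its XML response.

```python
import socket


def send_search_query(host: str, port: int, query: str,
                      idle_timeout: float = 5.0) -> str:
    """Send one raw query to a search node and return its raw reply text.

    Assumptions (unverified): the node reads a single CRLF-terminated line,
    as telnet would send it, and the reply is complete once the node closes
    the connection or sends nothing for idle_timeout seconds.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=idle_timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # node went quiet; treat the reply as finished
            if not data:
                break  # node closed the connection
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Pass your own node's host name and search port, paste a query from your search logs as the query string, and compare the XML that comes back with what the portal renders.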

Building a Search Index

"Okay Ross," you're probably thinking, "I can run search queries over telnet to see what's in my search index. That's all well and good, but how does all that junk get in the index in the first place?"

How indeed. As I mentioned above, that junk gets in there via an index request, which is much like a search request (it runs over a TCP port and follows a specific query language) but lets whoever or whatever put information into search instead of extracting it.

If you look closely at your Publisher content.properties file, Collaboration config.xml file, or even at the portal database (PTSERVERCONFIG table), you will see an "Indexing Search Port" and "Indexing Search Host" specified. These values tell each product (Portal, Publisher, Collaboration) where to submit new document data (e.g., when someone publishes something, uploads something to a project, or a crawler runs). That data is submitted over the same TCP port to the same type of node that handles queries.

How an Indexing Request Works

Here's a brief explanation followed by a couple of pictures:

  1. An index request is submitted to a search node. Since that search node may be part of a multi-node cluster, the request goes straight to the cluster file system (remember, all nodes share this directory).
  2. The request is assigned a transaction ID and added to a queue on the cluster (you can see this in the form of the requests folder in the cluster folder of your search node).
  3. Every search node in the cluster independently maintains its own transaction ID, which corresponds to the last index request it processed. These nodes continually poll the shared requests folder. If they find a transaction that has a higher ID than the one they maintain, they pull the information for that transaction and add it to their local search index. They then update their local transaction ID to match the transaction they just processed.
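To make the polling model concrete, here is a toy Python simulation of steps 1 to 3. It uses a temporary directory as the shared cluster file system. The file layout (one zero-padded "<transaction ID>.req" file per request under requests/) and the class names are hypothetical, not ALUI's actual on-disk format. The sketch also skips the locking a real cluster needs when two requests arrive at once.

```python
from pathlib import Path


class ClusterFS:
    """Toy stand-in for the shared cluster directory (hypothetical layout)."""

    def __init__(self, root: Path):
        self.requests = Path(root) / "requests"
        self.requests.mkdir(parents=True, exist_ok=True)

    def submit(self, payload: str) -> int:
        """Steps 1-2: assign the next transaction ID and queue the request.

        A real cluster must lock the queue here; this toy does not.
        """
        ids = [int(p.stem) for p in self.requests.glob("*.req")]
        txid = max(ids, default=0) + 1
        (self.requests / f"{txid:08d}.req").write_text(payload, encoding="utf-8")
        return txid


class SearchNode:
    """Step 3: a node that tracks the last transaction it applied."""

    def __init__(self, name: str, cluster: ClusterFS):
        self.name = name
        self.cluster = cluster
        self.last_txid = 0
        self.index = []  # stand-in for the node's local search index

    def poll(self) -> int:
        """Apply every queued transaction newer than last_txid, oldest first."""
        applied = 0
        for path in sorted(self.cluster.requests.glob("*.req")):
            txid = int(path.stem)
            if txid > self.last_txid:
                self.index.append(path.read_text(encoding="utf-8"))
                self.last_txid = txid
                applied += 1
        return applied
```

The key property is that nodes never talk to each other. Two nodes that poll at different times still converge on the same index, because each one simply replays whatever transactions it hasn't seen yet.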

You can actually see this process in real time by amping up your search logs and watching the transaction IDs increment when you upload a Collab document, create an admin object, etc. Here are a few PowerPoint diagrams I created of this process:

Figure 2 - Adding an index request to the cluster's transaction queue.

Figure 3 - Updating a local search index from the transaction queue.

Conclusion

As far as node operation goes, that should clear up most of the mystery. At this point, you should understand most of the hows and whys of search operation. The last piece of this puzzle is the "checkpoint" feature, which I'll review in the final exciting chapter of this blog series.

Back in the Saddle (Again)

As you may have noticed (or not noticed, if you read this site from your aggregator), my blog has undergone some much-needed renovation. As a blog reader myself, I appreciate a dearth of "State of the Union" posts in the blogs I read. Nonetheless, there have been quite a few changes, so here's an infrequent update...

  • I use Movable Type to publish, and despite its complexity, I love the customizability and performance. I found a great style from the CMR Movable Type Styles Blog, and have customized it to my liking. Finally, no more blah "Minimalist Red" scheme.
  • Dev2Dev is finally gone for good, which means some of my blog images were 404'ing. Thank you, helpful readers, for pointing this out. I finally went back and reset them to point to my own blog.
  • Bill Benac recommended we Plumtree bloggers start linking to each other, so you'll notice a blogroll on the right side of the page. If you're blogging and you want to be listed here, give me a shout.
  • The blog slogan has now officially become "Tech + Caffeine = blog", since I'm tired of dealing with these product name changes. As are other people, apparently.
  • That Jeep in the picture is mine, featured in some mind-blowingly isolated terrain right outside the Badlands.

So what's in store for the future? Well, I still have some posts to write in my Search Series, I have a long backlog of posts regarding various technical minutiae, and then there is your input... Thoughts?

Let's take a quick timeout from Search for a more basic post...

I don't have a "Cool Tools" section of my blog, like some other notable ALUI bloggers, but I do know of a few "cool tools" that have helped me do my job. One of my favorites is a fancy diff utility called WinMerge.

(go download it now if you haven't already)

One of the primary things I use it for is validating product upgrades. If you're as lazy and/or paranoid as I am, you have probably paused during an ALUI upgrade when you saw the step "re-import the PTE". As most of us know, re-importing a PTE is a mixed bag: it comes with a lot of dependencies and can frequently wipe out customizations to web services, portlets, etc. Worse yet, you never quite know what's happening when you import.

What if we could analyze a PTE and figure out what changes were made so that we could either:

    • make the changes ourselves
    • not bother re-importing
    • at least know what changes were going to be made to our existing data?

Turns out this is rather simple (and, obviously, involves WinMerge).

Let's use a relevant example to demonstrate: a Publisher upgrade from 6.4 to 6.5. This is a minor version upgrade, so you would think there would be relatively few changes to the PTEs. Nonetheless, the install guide tells me to re-import, re-import, re-import.

Yuck.

Instead, I'll take an alternate approach. First, I run the Publisher 6.5 upgrade installer as I normally would. However, once I get to the re-import step, I navigate to the ptcs/6.4/serverpackages directory of my previous Publisher install and grab the publisher.pte file therein. Next, I grab the same PTE file from my ptcs/6.5/serverpackages directory.

Now I have both default install PTEs. Any differences between them will be the changes due to the 6.4 to 6.5 upgrade. Since these PTEs are really just XML files with fairly obvious naming conventions, I simply open them up side by side in WinMerge and compare the differences...

[Image: WinMerge comparison of the 6.4 and 6.5 publisher.pte files]

As it turns out, the only changes to the Publisher package in 6.5 are some /jspell URLs that have been added to the gateway settings for some web services. Since I can read the new URLs in WinMerge, I can copy the gateway URLs and add them manually. Now I no longer need to import the PTE.
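If WinMerge isn't handy, or you want to script this check as part of an upgrade runbook, Python's standard difflib module gives the same answer in unified-diff form. This is a minimal sketch. It assumes the PTE files are line-oriented XML, as they appear to be in WinMerge, and the paths in the usage note simply mirror the directories mentioned above, so adjust them to your own install.

```python
import difflib
from pathlib import Path


def diff_pte(old_path: str, new_path: str) -> list:
    """Return a unified diff between two PTE (XML) export files."""
    old = Path(old_path).read_text(encoding="utf-8", errors="replace")
    new = Path(new_path).read_text(encoding="utf-8", errors="replace")
    return list(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True),
        fromfile=old_path, tofile=new_path))


def changed_lines(diff: list) -> list:
    """Keep only added (+) and removed (-) lines from a unified diff."""
    # The first two lines of a non-empty unified diff are the ---/+++ headers.
    return [line for line in diff[2:] if line[:1] in ("+", "-")]
```

For example, print the result of changed_lines(diff_pte("ptcs/6.4/serverpackages/publisher.pte", "ptcs/6.5/serverpackages/publisher.pte")). An empty list means the two default PTEs are identical and there is nothing to re-import.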

... and even if there were more changes and I had to re-import, I would be well informed of what they were before running the import.

Okay. We now return you to your regularly scheduled programming.

Here we are, back again for another installment in my new blog "mini-series" about search. When I first started researching these posts (er... presentation, actually) the mini-series might have been more aptly titled "Lost" (not to be confused with ABC's hit series, except for the mass confusion and never-ending storyline).

Last time I promised some hard-hitting dirt on Search Administration, and as always, I deliver on my blog promises. Okay, maybe hard-hitting is a bit of a stretch... let's talk about Search Administration. Most of you are probably familiar with the Search Cluster Manager and Search Service Manager in the Administrative Utilities drop-down, but what are they and how do they work?

Let's start tackling this with a diagram:

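[Diagram: search administration, with the Search Service Manager on the left and the Search Cluster Manager on the right]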

This diagram represents the end-all be-all of the search administration process. There are two parts:

  1. Portal communication with a search node directly. This is the Search Service Manager (left side of the diagram). It is basically the portal asking a contact node about the health and topology of the search server, and the node replying with this information. This contact node is extremely important, since it tells the portal front end which search nodes to query and how. The query is performed over the same port as any other search request, using the same mechanisms, and will show up in your search logs if you have them at a high enough verbosity.
  2. Portal communication with the search topology indirectly. This is done via the Search Cluster Manager (right side of the diagram). I have heard much rumor and hearsay regarding the Search Cluster Manager, so let me clear up any misconceptions you might have with a properly bolded and formatted statement:

The Search Cluster Manager is a Java web application that reads and writes files on the Cluster File System.

What this really means is that the Search Cluster Manager is totally unnecessary. All administration can be done with the cadmin tool (in your search server's bin directory) or via direct changes to specific initialization files (this is what the Search Cluster Manager does, anyway). So basically, the diagram above actually looks like this:

[Diagram: search administration through cadmin and direct file edits, without the Search Cluster Manager]

Wrap Up

So that's it. Basically, the takeaways here are:

  1. Search Cluster Manager is simply a prettied up version of the command line utility and does not need to run for search to function in the portal.
  2. Search Service Manager controls the contact node and determines search topology for the portal front end.

Pretty simple, eh? Next up... some more interesting details on node operation.

Once again, I'm back from the dead. I admit it, I haven't been that busy lately, just had a hard time motivating myself to get through this search series. Perhaps more coffee will do the trick...

Breaking Down a Search Collection

Last time I listed the various functions of search and reposted my first search slide. It was fairly simple, just an abstract "Search Collection" diagram. This time let's break that diagram down a bit more:

[Diagram: the search API and a collection of search nodes]

What we see above is a less abstract view of the same diagram. Instead of one giant "Search" lump, we actually have an API, which makes the communication decisions, and a collection of search nodes. These nodes are just processes running somewhere, listening on a specific port. More about them later.

Partitions

That was pretty simple, right? Let's throw in one more wrinkle before moving on to the complicated bits: Partitions. A partition is simply a grouping of search data into a set of nodes. Applying that concept to the above diagram, a partitioning of our search collection might look something like:

[Diagram: the search collection divided into partitions]

In other words, some of the data indexed by search (search results) will reside in Partition 1 on Node 1, and some of the data will reside in Partition 2 on Nodes 1 and 2. If we draw out the partitions in a more abstract manner, they look like this:

[Diagram: two partitions drawn as separate bins of data]

As you can see, there are two separate "bins" of data. When new information is indexed, it goes into one of these two bins. It is important to note that the two partitions do not share any data, so when you search for something, the results from Partition 1 and Partition 2 must be aggregated together. Duplicate data will, however, exist on Nodes 1 and 2 in Partition 2 (see above).
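Here is a toy Python model of that behavior. Each document lands in exactly one partition. Every node serving that partition keeps a copy. A search asks one replica per partition and merges the results. The hash-based placement rule and the node names (nodeA, nodeB, nodeC) are assumptions made for illustration only; this post doesn't cover how ALUI actually decides which partition a document goes into.

```python
import zlib


class PartitionedCollection:
    """Toy model: each document lives in exactly one partition, and every
    node (replica) serving that partition holds a full copy of it."""

    def __init__(self, topology: dict):
        # topology maps partition ID -> names of the nodes serving it
        self.topology = topology
        self.nodes = {n: {} for names in topology.values() for n in names}

    def partition_for(self, doc_id: str) -> int:
        # Assumed placement rule: a stable hash of the ID picks the partition.
        ids = sorted(self.topology)
        return ids[zlib.crc32(doc_id.encode("utf-8")) % len(ids)]

    def index(self, doc_id: str, text: str) -> int:
        partition = self.partition_for(doc_id)
        for node in self.topology[partition]:  # copy to every replica
            self.nodes[node][doc_id] = text
        return partition

    def search(self, term: str) -> list:
        hits = set()
        for names in self.topology.values():
            # One replica per partition is enough; results are then merged
            # because no document appears in more than one partition.
            replica = self.nodes[names[0]]
            hits.update(d for d, text in replica.items() if term in text)
        return sorted(hits)
```

For example, PartitionedCollection({1: ["nodeA"], 2: ["nodeB", "nodeC"]}) gives one node for the first partition and two replicas for the second. Skipping either partition in search() would silently drop roughly half of the results.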

Search Coordination

With all this data moving about, being partitioned, searched, etc, you may be wondering how all of the search nodes communicate with one another. How do they know which partition they belong to, which node they are and what data has already been indexed?

The answer, it turns out, is extremely simple. They all must share at least one common set of files and directories, which I'll call the "Cluster File System". There is no special port-to-port communication, magic pixie dust, or any other way for search nodes to talk to each other. The cluster file system contains configuration information about the entire search topology, as well as a common queue/locking mechanism for incoming search indexing requests (more detail later). In other words, our previous diagram now looks like this:

[Diagram: search nodes sharing the cluster file system]

And that's really all there is to it. I've just covered all of the concepts you'll need for a basic understanding of search.

Wrap Up

Alright, we've covered the basics, but as you know, I'm never fully satisfied with the basics. Hopefully you now have a foundational understanding of search operation and are ready to stick with me for the under-the-covers part. Most of the information I've provided to this point is covered in the docs, just (in my opinion) not very well. Next time, look for some more detailed information on how search administration works and on under-the-covers node operation.