Adventures in Search - Part 2 - Search Architecture

| | Comments (0) | TrackBacks (0)

Once again, I'm back from the dead. I admit it, I haven't been that busy lately, just had a hard time motivating myself to get through this search series. Perhaps more coffee will do the trick...

Breaking Down a Search Collection

Last time I listed the various functions of search and reposted my first search slide. It was fairly simple, just an abstract "Search Collection" diagram. This time let's break that diagram down a bit more:

what_is_search_3

What we see above is a less abstract view of the same diagram. Instead of one giant "Search" lump, we actually have an API, which makes the communication decisions, and a collection of search nodes. These nodes are just processes running somewhere, listening on a specific port. More about them later.

Partitions

That was pretty simple, right? Let's throw in one more wrinkle before moving on to the complicated bits: Partitions. A partition is simply a grouping of search data into a set of nodes. Applying that concept to the above diagram, a partitioning of our search collection might look something like:

what_is_search_partitions

In other words, some of the data indexed by search (search results) will reside in Partition 1 on Node 1, and some of the data will reside in Partition 2 on Nodes 1 and 2. If we draw out the partitions in a more abstract manner, they look like this:

what_is_search_partitions_abstract

As you can see, there are two separate "bins" of data. When new information is indexed it goes into one of these two bins. It is important to note that neither partition contains duplicate data, so when you search for something the results from Partition 1 and Partition 2 must be aggregated together. Duplicate data will, however, exist on Nodes 1 and 2 in Partition 2 (see above).

Search Coordination

With all this data moving about, being partitioned, searched, etc, you may be wondering how all of the search nodes communicate with one another. How do they know which partition they belong to, which node they are and what data has already been indexed?

The answer, it turns out, is extremely simple. They all must share at least one common set of files and directories, which I'll call the "Cluster File System". There is no special port-to-port communication, magic pixie dust, or any other way for search nodes to talk to each other. The cluster file system contains configuration information about the entire search topology, as well as a common queue/locking mechanism for incoming search indexing requests (more detail later). In other words, our previous diagram now looks like this:

what_is_search_cluster_file_system

And that's really all there is to it. I've just covered all of the concepts you'll need for a basic understanding of search.

Wrap Up

Alright, well we've covered the basics, but as you know, I'm never fully satisfied with the basics. Hopefully you now have a base understanding of search operation and are ready to stick with me for the under-the-covers part. Most of the information I've provided to this point is covered in the docs, just (in my opinion) not very well. Next time look for some more detailed information on how search administration works and under the covers node operation.

0 TrackBacks

Listed below are links to blogs that reference this entry: Adventures in Search - Part 2 - Search Architecture.

TrackBack URL for this entry: http://hross.net/mt/mt-tb.cgi/35

Leave a comment

About this Entry

This page contains a single entry by hross published on October 15, 2008 5:05 PM.

Adventures In Search - Part 1 - What is Search? was the previous entry in this blog.

Adventures in Search - Part 3 - Search Administration is the next entry in this blog.

Blogroll


Integryst

Function1

Fabien Sanglier

Bill Benac

Jordan Rose

Chris Bucchere

Robert Herrera

Nanek Blog Aggregator

Spartan Java




if you'd like to be listed here.




I don't blog about non-tech issues here, but you can check my Google Reader Shared Items if you want to know what I'm currently interested in.

Categories