@geodist through Ultrasphinx

February 24th, 2008

For most of our projects over at Syndeomedia, we have been using Andrew Aksyonoff’s Sphinx search engine for search and Evan Weaver’s Ultrasphinx Ruby on Rails plugin as an API to the search daemon. Sphinx is a highly efficient, stand-alone full-text search engine that handles large volumes of data gracefully. Ultrasphinx boasts a configurator that automatically generates a Sphinx configuration file from specifications in the models themselves. It also builds the actual result objects, with extras, of course.

Sphinx and Ultrasphinx served our purposes perfectly until we needed to make search results more relevant by sorting them by the distance between the objects they represent and a varying location. The search engine itself already handles @geodist computations (and sorting), but there seems to be no way to use them through Ultrasphinx. It is relatively easy to sort all search results (which would later be paginated) with a MySQL query, and even more convenient with the :origin specifier from the GeoKit plugin. However, both approaches carry considerable overhead, especially once the application deals with more than a thousand rows.
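For reference, @geodist is the distance between the search's anchor point and each document's latitude/longitude attributes, with the coordinates given in radians. Below is a minimal Ruby sketch of such a spherical distance using the haversine formula. The method name and the 6,371 km mean Earth radius are assumptions for illustration, not necessarily the exact formula or constant Sphinx uses internally.

```ruby
# Assumed mean Earth radius in meters (illustrative; Sphinx may use another value).
EARTH_RADIUS_M = 6_371_000.0

# Great-circle distance via the haversine formula.
# All four coordinates must be in radians, as Sphinx expects for anchors.
def haversine_distance(lat1, lng1, lat2, lng2, radius = EARTH_RADIUS_M)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  2 * radius * Math.asin(Math.sqrt(a))
end

# From the equator to the north pole along one meridian: a quarter circumference.
puts haversine_distance(0.0, 0.0, Math::PI / 2, 0.0).round # prints 10007543
```

Computing and sorting on this value for every row in the application or in MySQL is exactly the overhead described above; the point of the patch below is to let Sphinx do it instead.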

Now, both projects are relatively young, and I can imagine that the Ruby on Rails plugin always has to hurry to keep up with changes in the search engine. I familiarized myself a bit more with Sphinx and Ultrasphinx (and Pat Allan’s Riddle plugin, which Ultrasphinx uses), and made a few changes.

Patch anchor_support_added_to_search:

Index: lib/ultrasphinx/search/internals.rb
===================================================================
--- lib/ultrasphinx/search/internals.rb (revision 1701)
+++ lib/ultrasphinx/search/internals.rb (working copy)
@@ -18,6 +18,7 @@
           @offset = opts['per_page'] * (opts['page'] - 1)
           @limit = opts['per_page']
           @max_matches = [@offset + @limit, MAX_MATCHES].min
+          @anchor = opts['anchor']
         end
           
         # Sorting
@@ -329,4 +330,4 @@
     
     end
   end
-end
\ No newline at end of file
+end
Index: lib/ultrasphinx/search.rb
===================================================================
--- lib/ultrasphinx/search.rb (revision 1701)
+++ lib/ultrasphinx/search.rb (working copy)
@@ -103,7 +103,8 @@
       :weights => {},
       :class_names => [],
       :filters => {},
-      :facets => []
+      :facets => [],
+      :anchor => {}
     })
     
     cattr_accessor :excerpting_options

Simple. Riddle already supports anchors; the patch merely allows the anchor to be set on the Riddle client instance that Ultrasphinx::Search creates.
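The hunk's context (@offset, @limit and @max_matches assigned inside an indented block) suggests that internals.rb sets these values by evaluating a block against the Riddle client, for example with instance_eval. Assuming that, the following Ruby sketch shows why a single `@anchor = opts['anchor']` line is enough: the instance variable lands on the client object itself. StubClient and build_request are hypothetical stand-ins for Riddle::Client and the Ultrasphinx method, not their real code.

```ruby
# Hypothetical stand-in for Riddle::Client, reduced to the attributes used here.
class StubClient
  attr_reader :offset, :limit, :anchor

  def initialize
    @anchor = {} # no anchor unless one is supplied
  end
end

# Assumed shape of the option handling in the patched internals.rb:
# options arrive with string keys and are copied onto the client.
def build_request(opts)
  client = StubClient.new
  client.instance_eval do
    # Inside this block, self is the client, so @-variables are the client's.
    # The block still sees the local variable opts from the enclosing method.
    @offset = opts['per_page'] * (opts['page'] - 1)
    @limit  = opts['per_page']
    @anchor = opts['anchor'] # the one-line addition from the patch
  end
  client
end

anchor = {
  :latitude_attribute   => 'lat', :latitude   => 0.7,
  :longtitude_attribute => 'lng', :longtitude => -1.3
}
request = build_request('per_page' => 20, 'page' => 3, 'anchor' => anchor)
puts request.offset                      # prints 40
puts request.anchor[:latitude_attribute] # prints lat
```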

If the anchor is set for the search session and the "extended" sort_mode is used, @geodist becomes available in the sort_by declaration. Below is an example of a search that sorts matches by @geodist first and by name second:
@search = Ultrasphinx::Search.new(:query => "east", :class_names => ["branch"], :per_page => 20, :page => 3,
  :sort_mode => "extended", :sort_by => "@geodist asc, name asc",
  :anchor => {:latitude_attribute => "lat", :longtitude_attribute => "lng", :latitude => degrees_to_radians(user.location.lat), :longtitude => degrees_to_radians(user.location.lng)})
@search.run(false)

Yes, that is “longtitude”.
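The example calls degrees_to_radians, which the post does not define; it is needed because Sphinx expects radians while stored locations are usually in degrees. Here is a minimal Ruby sketch of such a helper, plus a hypothetical geo_anchor method that builds the :anchor hash with the same (misspelled) keys. The last part uses made-up rows to illustrate, in memory, the ordering that "@geodist asc, name asc" requests: distance first, with name breaking ties. In the real search, Sphinx does this ordering on the server.

```ruby
# Hypothetical helper assumed by the example search above.
def degrees_to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Hypothetical builder for the :anchor option; key names follow
# Riddle's spelling at the time ("longtitude").
def geo_anchor(lat_deg, lng_deg, lat_attr = 'lat', lng_attr = 'lng')
  {
    :latitude_attribute   => lat_attr,
    :longtitude_attribute => lng_attr,
    :latitude             => degrees_to_radians(lat_deg),
    :longtitude           => degrees_to_radians(lng_deg)
  }
end

anchor = geo_anchor(40.0, -74.0)
puts anchor[:latitude] # about 0.698 radians

# Made-up rows with precomputed distances, sorted the way
# "@geodist asc, name asc" would order them.
rows = [
  { :name => 'west',  :geodist => 1200.0 },
  { :name => 'east',  :geodist => 800.0 },
  { :name => 'north', :geodist => 1200.0 }
]
sorted = rows.sort_by { |r| [r[:geodist], r[:name]] }
p sorted.map { |r| r[:name] } # prints ["east", "north", "west"]
```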

:-) I hope this saves people some time.

5 responses

  1. Jeremy Seitz comments:

    Hi, thanks for the great post. I happened to be working on the exact same problem at the same time, with much frustration. Your post over on the Ultrasphinx board pointed me in the right direction. I wish I had found this blog post sooner :(

    One of the things that threw me off too was the misspelling of longitude in the Riddle client - I submitted a patch and that change is now in UltraSphinx too.

    The other annoying thing was degrees/radians - Sphinx expects radians, geokit and others expect degrees. The conversion can be done on the fly in UltraSphinx using :function_sql in your models.

    Curious to know: have you been able to limit the search radius for results? I’d like to filter on a range for @geodist (like 0-10km or whatever), but I can’t figure out how yet.

  2. kristina comments:

    Hi, Jeremy! Sorry for the really late response. I had just set up WordPress on the new web host then, and I did not realize until just now that I had failed to configure comment moderation.

    > One of the things that threw me off too was the misspelling of longitude in the Riddle client - I submitted a patch and that change is now in UltraSphinx too.

    Great work. :-) Yeah, I actually read about that in the Ultrasphinx forum.

    > The other annoying thing was degrees/radians - Sphinx expects radians, geokit and others expect degrees. The conversion can be done on the fly in UltraSphinx using :function_sql in your models.

    True, true, it has to be done with :function_sql. Sphinx may have kept radians as its default angle unit to keep the computation simple, and I would actually like it to stay that way. Maybe the conversion belongs on the Ultrasphinx side? In Riddle::Client#set_anchor (Riddle, again, being what Ultrasphinx uses), we actually see:

    # Set the geo-anchor point - with the names of the attributes that contain
    # the latitude and longtitude, and the reference position.
    #
    # Example:
    #   client.set_anchor('lat', -37.767899, 'lon', 145.002451)
    #
    def set_anchor(lat_attr, lat, long_attr, long)
      @anchor = {
        :latitude_attribute   => lat_attr,
        :latitude             => lat,
        :longtitude_attribute => long_attr,
        :longtitude           => long
      }
    end
    

    The example call seems to imply that Riddle’s author would use degrees if he had the choice (although Sphinx would still interpret the value in the generated query as radians). Personally, I would still prefer the degrees-versus-radians option to live in Ultrasphinx itself rather than in any lower layer.

    (Actually, I think I’ll write the Riddle author a note about that example.)

    > Curious to know: have you been able to limit the search radius for results? I’d like to filter on a range for @geodist (like 0-10km or whatever), but I can’t figure out how yet.

    Not yet. Have you tried the Fauna forums again? Last time I checked, there were new developments (some even involving geodist). I’ve been really busy over the past few weeks.

    I looked around; good musical compositions. :-) I actually downloaded all of them, partly out of curiosity too.

  3. kristina comments:

    Oops, my bad. That was an outdated Riddle version, the one bundled with the Ultrasphinx version I have now. The example seems to be fixed in the current version already.

  4. Anatoliy comments:

    Please help me solve my problem:

    I have the following UltraSphinx configuration:
    ——————————————————————————
    config/ultrasphinx/development.conf:
    ——————————————————————————

    sql_query = \
    SELECT user.id as user_id, \
    user.created_at as created_at, \
    profile.first_name as first_name, \
    profile.last_name as last_name, \
    profile.city_name as city_name, \
    profile.interests as interests, \
    country.id as country_id, \
    space.name as space_name, \
    profile_language.language_id as language_id \
    FROM users as user INNER JOIN my_prof as profile on profile.user_id=user.id \
    LEFT OUTER JOIN my_country as country on profile.country_id=country.id \
    LEFT OUTER JOIN my_space as space on space.owner_id=user.id \
    LEFT OUTER JOIN my_languages as profile_language on profile_language.profile_id=profile.id \
    WHERE user.id >= $start AND user.id ‘kjhui’).run” get a hash with the following keys (column names) and values (query results).
    For example:
    :user_id => 1, :created_at => '12.05.08', :first_name => 'kjhui', :last_name => 'test', :city_name => 'Chech', :interests => 'football', :country_id => 12, :space_name => 'street', :language_id => 45

    How can I do it?

    Help me please.

  5. Anatoliy comments:

    sorry…

