ppolv’s blog

February 29, 2008

esolr, an erlang text search client library for Apache Solr

Filed under: erlang. Posted by ppolv @ 4:42 am

From the Apache Solr website:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.

Nice, a full text search engine easily accessible from anywhere. Just HTTP, no special binding required.

I’ve just hacked together esolr, a simple, almost untested and featureless erlang client for Solr ;-). Well, there weren’t that many operations to implement, really. The basic and useful ones are:

  • Add/Update documents esolr:add/1
  • Delete documents esolr:delete/1
  • Search esolr:search/2

There are also functions to commit the index (making all changes since the last commit available for searching) and to optimize it (a time-consuming operation, see the Solr documentation). Besides issuing commit and optimize operations explicitly, the library can also perform them periodically at user-defined intervals. Commits can additionally be configured to take place automatically after each add or delete operation (mainly useful for development, not for production code).
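For the curious, here is roughly what these operations look like on the wire. Solr's update handler accepts small XML messages (`<add>`, `<delete>`, `<commit/>`, `<optimize/>`) posted over HTTP. The module below is only a sketch that builds those messages as strings. The module name, the function names and the `{id, Id}` argument of `delete_xml/1` are made up for illustration, and this is not necessarily how esolr builds its requests internally.

```erlang
-module(solr_xml_sketch).
-export([add_xml/1, delete_xml/1, commit_xml/0, optimize_xml/0]).

%% Build an <add> message from a list of {doc, Fields} tuples,
%% the same shape that is passed to esolr:add/1 in this post.
add_xml(Docs) ->
    lists:flatten(["<add>", [doc_xml(Fields) || {doc, Fields} <- Docs], "</add>"]).

doc_xml(Fields) ->
    ["<doc>", [field_xml(Name, Value) || {Name, Value} <- Fields], "</doc>"].

field_xml(Name, Value) ->
    ["<field name=\"", escape(to_list(Name)), "\">",
     escape(to_list(Value)), "</field>"].

%% Delete a single document by id. The {id, Id} shape is hypothetical,
%% chosen for this sketch only.
delete_xml({id, Id}) ->
    lists:flatten(["<delete><id>", escape(to_list(Id)), "</id></delete>"]).

commit_xml() -> "<commit/>".

optimize_xml() -> "<optimize/>".

%% Accept atoms, binaries and strings as names/values.
%% Binaries are treated as Latin-1 bytes, to keep the sketch short.
to_list(V) when is_atom(V) -> atom_to_list(V);
to_list(V) when is_binary(V) -> binary_to_list(V);
to_list(V) when is_list(V) -> V.

%% Minimal XML escaping of text and attribute values.
escape(Str) -> lists:flatmap(fun escape_char/1, Str).

escape_char($<) -> "&lt;";
escape_char($>) -> "&gt;";
escape_char($&) -> "&amp;";
escape_char($") -> "&quot;";
escape_char(C) -> [C].
```

With the sample setup, these strings would be POSTed to the update URL of the running Solr instance (http://localhost:8983/solr/update in the example configuration).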

Quick start:

  1. Install Solr 1.2
  2. Run it with the sample configuration provided (/example$ java -jar start.jar)
  3. Make sure it is running correctly: open a browser at http://localhost:8983/solr/admin/
  4. Get esolr from the trapexit forum
  5. Look at the html API documentation
  6. Compile the sources (RFC4627.erl, from http://www.lshift.net/blog/2007/02/17/json-and-json-rpc-for-erlang is included)
  7. Start the esolr library with esolr:start_link()
  8. Play around

To compile, open an erlang console in the directory where the .erl files reside, and type:

28> c(rfc4627).
{ok,rfc4627}
29> c(esolr).
{ok,esolr}

Then start the esolr process, using the default configuration:

30> esolr:start_link().
{ok,}

Add some documents. Here we are adding two documents, one in each call to esolr:add/1. The id and name fields are defined in the sample Solr schema; id is... you know, the ID for the document.

31> esolr:add([{doc,[{id,"a"},{name,<<"Look me mom!, I'm searching now">>}]}]).
ok
32> esolr:add([{doc,[{id,"b"},{name,<<"Yes, searching from the erlang console">>}]}]).
ok
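If the data lives in some other shape, a tiny helper can turn it into the `{doc, Fields}` form used above. This is just a sketch: the module and function names are invented here, and passing several documents in a single list to esolr:add/1 is an assumption based on the list-of-docs shape of its argument, so check the API docs before relying on it.

```erlang
-module(docs_sketch).
-export([to_docs/1]).

%% Turn a list of {Id, Name} pairs into the {doc, Fields} tuples
%% used with esolr:add/1 in this post.
to_docs(Pairs) ->
    [{doc, [{id, Id}, {name, Name}]} || {Id, Name} <- Pairs].
```

The result could then be handed over in one go, for example `esolr:add(docs_sketch:to_docs([{"c", <<"Third one">>}, {"d", <<"Fourth one">>}]))`, assuming the multi-document call works as guessed.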

Commit the changes.

33> esolr:commit().
ok
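Instead of committing by hand, commits can also happen on a timer. esolr has its own support for periodic commits (see the API docs for how to configure it), so the following is not the library's mechanism, only a minimal sketch of the idea: a process that calls a given fun every N milliseconds until it is told to stop. The module and function names are hypothetical.

```erlang
-module(periodic_sketch).
-export([start/2, stop/1]).

%% Spawn a linked process that calls Fun() every IntervalMs milliseconds.
start(IntervalMs, Fun)
  when is_integer(IntervalMs), IntervalMs > 0, is_function(Fun, 0) ->
    spawn_link(fun() -> loop(IntervalMs, Fun) end).

%% Ask the periodic process to terminate.
stop(Pid) ->
    Pid ! stop,
    ok.

loop(IntervalMs, Fun) ->
    receive
        stop -> ok
    after IntervalMs ->
        %% No stop message arrived within the interval: run the action
        %% and wait again.
        Fun(),
        loop(IntervalMs, Fun)
    end.
```

For example, `periodic_sketch:start(60000, fun() -> esolr:commit() end)` would commit roughly once a minute. Since the interval restarts after each call, the time spent inside the fun adds to the period.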

Search. We search for the word “search”, and specify that we want all the normal fields plus the document score for the query, that we want the results in ascending order by id, and that we want the matches highlighted for us.

34> esolr:search("search",[{fields,"*,score"},{sort,[{id,asc}]},{highlight,"name"}]).
{ok,[{"numFound",2},{"start",0},{"maxScore",0.880075}],
    [{doc,[{"id",<<"a">>},
           {"sku",<<"a">>},
           {"name",<<"Look me mom!, I'm searching now">>},
           {"popularity",0},
           {"timestamp",<<"2008-02-28T23:42:15.642Z">>},
           {"score",0.628625}]},
     {doc,[{"id",<<"b">>},
           {"sku",<<"b">>},
           {"name",<<"Yes, searching from the erlang console">>},
           {"popularity",0},
           {"timestamp",<<"2008-02-28T23:43:26.997Z">>},
           {"score",0.880075}]}],
    [{"highlighting",
      {obj,[{"a",
             {obj,[{"name",
                    [<<"Look me mom!, I'm <em>searching</em> now">>]}]}},
            {"b",
             {obj,[{"name",
                    [<<"Yes, <em>searching</em> from the erlang "...>>]}]}}]}}]}
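The result is plain erlang terms, so picking it apart needs nothing more than proplists. Below is a small sketch that works on the shape shown in the output above: the list of `{doc, Fields}` tuples, and the trailing list holding the `"highlighting"` object. The module and function names are made up for this example, and the shape is taken from this one output only, so other queries or options may return something different.

```erlang
-module(search_result_sketch).
-export([ids_with_scores/1, snippets/2]).

%% Docs is the list of {doc, Fields} tuples from the search result.
%% Returns a list of {Id, Score} pairs, in result order.
ids_with_scores(Docs) ->
    [{proplists:get_value("id", Fields), proplists:get_value("score", Fields)}
     || {doc, Fields} <- Docs].

%% Extra is the last element of the result tuple; DocId is a string
%% such as "a". Returns the [{FieldName, [Snippet]}] list for that
%% document, or [] when there is nothing highlighted for it.
snippets(DocId, Extra) ->
    case proplists:get_value("highlighting", Extra) of
        {obj, ById} ->
            case proplists:get_value(DocId, ById) of
                {obj, ByField} -> ByField;
                undefined -> []
            end;
        undefined ->
            []
    end.
```

Matching the result as `{ok, _Info, Docs, Extra} = esolr:search(...)` gives the two arguments these functions expect.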

Read the API docs to find all the functions/options implemented so far.

Have fun!
