ppolv’s blog

February 29, 2008

esolr, an erlang text search client library for Apache Solr

Filed under: erlang. Posted by ppolv @ 4:42 am

From the Apache Solr website:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.

Nice, a full text search engine easily accessible from anywhere. Just HTTP, no special binding required.
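To make the "just HTTP" point concrete, here is a minimal sketch of how a search URL for the sample Solr instance could be put together. It is not part of esolr: the module name solr_url_sketch and the function select_url/2 are made up for illustration, the base URL assumes the sample configuration on localhost:8983, and uri_string:compose_query/1 assumes Erlang/OTP 21 or later.

```erlang
-module(solr_url_sketch).
-export([select_url/2]).

%% Assumption: the select handler of the sample Solr setup.
-define(SELECT_URL, "http://localhost:8983/solr/select").

%% Build a search URL from a query string and a list of extra
%% {Name, Value} request parameters, all given as plain strings.
%% The response format is fixed to JSON (wt=json).
select_url(Query, Params) ->
    QueryString = uri_string:compose_query(
                    [{"q", Query}, {"wt", "json"} | Params]),
    ?SELECT_URL ++ "?" ++ QueryString.
```

For example, solr_url_sketch:select_url("search", [{"fl", "*,score"}, {"hl", "true"}, {"hl.fl", "name"}]) yields a URL that can be opened in a browser, or fetched from Erlang with httpc:request/1 once inets has been started.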

I’ve just hacked esolr, a simple, almost untested and featureless erlang client for Solr ;-). Well, there weren’t that many operations to implement, really. The basic and useful ones are:

  • Add/Update documents: esolr:add/1
  • Delete documents: esolr:delete/1
  • Search: esolr:search/2

Also, there are functions to commit changes to the index (making all changes since the last commit available for searching) and to optimize the index (a time-consuming operation; see the Solr documentation). Besides issuing commit and optimize operations explicitly, the library can also perform them periodically at user-defined intervals. Commits can additionally be configured to take place automatically after each add or delete operation (mainly useful for development, not for production code).
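For reference, Solr's update handler accepts small XML messages for these operations. The following is a minimal sketch of building them, not esolr's actual code: the module name solr_xml_sketch and all of its functions are hypothetical, and it assumes documents shaped like the argument of esolr:add/1 in this post (atom field names, string or binary values).

```erlang
-module(solr_xml_sketch).
-export([add/1, delete_id/1, commit/0, optimize/0]).

%% XML body that adds (or updates) a list of {doc, Fields} documents.
add(Docs) ->
    lists:flatten(["<add>", [doc(Fields) || {doc, Fields} <- Docs], "</add>"]).

%% XML body that deletes the document with the given id.
delete_id(Id) ->
    lists:flatten(["<delete><id>", escape(to_list(Id)), "</id></delete>"]).

%% XML bodies for committing and optimizing the index.
commit() -> "<commit/>".
optimize() -> "<optimize/>".

doc(Fields) ->
    ["<doc>", [field(Name, Value) || {Name, Value} <- Fields], "</doc>"].

field(Name, Value) ->
    ["<field name=\"", atom_to_list(Name), "\">",
     escape(to_list(Value)), "</field>"].

%% Accept both strings and UTF-8 binaries as values.
to_list(Value) -> unicode:characters_to_list(Value).

%% Escape the characters that are special inside XML text.
escape(Text) -> lists:flatmap(fun escape_char/1, Text).

escape_char($<) -> "&lt;";
escape_char($>) -> "&gt;";
escape_char($&) -> "&amp;";
escape_char(C)  -> [C].
```

Each of these strings would be sent as the body of an HTTP POST to the update handler (/solr/update in the sample configuration).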

Quick start:

  1. Install Solr 1.2
  2. Run it with the sample configuration provided (/example$ java -jar start.jar)
  3. Make sure it is running correctly: open a browser at http://localhost:8983/solr/admin/
  4. Get esolr from the trapexit forum
  5. Look at the html API documentation
  6. Compile the sources (rfc4627.erl, from http://www.lshift.net/blog/2007/02/17/json-and-json-rpc-for-erlang, is included)
  7. Start the esolr library: esolr:start_link()
  8. Play around

To compile, open an erlang console in the directory where the .erl files reside, and type:

28> c(rfc4627).
{ok,rfc4627}
29> c(esolr).
{ok,esolr}

Then start the esolr process, using the default configuration:

30> esolr:start_link().
{ok,}

Add some documents. Here we are adding two documents, one in each call to esolr:add/1. The id and name fields are defined in the sample Solr schema; id is... you know, the ID of the document.

31> esolr:add([{doc,[{id,"a"},{name,<<"Look me mom!, I'm searching now">>}]}]).
ok
32> esolr:add([{doc,[{id,"b"},{name,<<"Yes, searching from the erlang console">>}]}]).
ok

Commit the changes.

33> esolr:commit().
ok

Search. We search for the word “search”, and specify that we want all the normal fields plus the document score for the query, that we want the results in ascending order by id, and that we want the matches highlighted for us.

34> esolr:search("search",[{fields,"*,score"},{sort,[{id,asc}]},{highlight,"name"}]).
{ok,[{"numFound",2},{"start",0},{"maxScore",0.880075}],
    [{doc,[{"id",<<"a">>},
           {"sku",<<"a">>},
           {"name",<<"Look me mom!, I'm searching now">>},
           {"popularity",0},
           {"timestamp",<<"2008-02-28T23:42:15.642Z">>},
           {"score",0.628625}]},
     {doc,[{"id",<<"b">>},
           {"sku",<<"b">>},
           {"name",<<"Yes, searching from the erlang console">>},
           {"popularity",0},
           {"timestamp",<<"2008-02-28T23:43:26.997Z">>},
           {"score",0.880075}]}],
    [{"highlighting",
      {obj,[{"a",
             {obj,[{"name",
                    [<<"Look me mom!, I'm <em>searching</em> now">>]}]}},
            {"b",
             {obj,[{"name",
                    [<<"Yes, <em>searching</em> from the erlang "...>>]}]}}]}}]}
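The result above is a 4-tuple: ok, a list with general information about the result, the list of matching documents, and a list of extras such as the highlighting. A minimal sketch of picking values out of it with the standard proplists module follows; the module esolr_result_sketch and its functions are hypothetical helpers, and they assume the result has exactly the shape printed above.

```erlang
-module(esolr_result_sketch).
-export([num_found/1, names/1, highlights/2]).

%% Total number of matches, taken from the info list.
num_found(Info) ->
    proplists:get_value("numFound", Info).

%% The "name" field of every {doc, Fields} entry, in result order.
names(Docs) ->
    [proplists:get_value("name", Fields) || {doc, Fields} <- Docs].

%% Highlighted snippets for one document id, as a list of
%% {FieldName, Snippets} pairs; [] if there are none.
highlights(Id, Extra) ->
    case proplists:get_value("highlighting", Extra) of
        {obj, ById} ->
            case proplists:get_value(Id, ById) of
                {obj, ByField} -> ByField;
                undefined      -> []
            end;
        undefined ->
            []
    end.
```

Usage, after binding the result with {ok, Info, Docs, Extra} = esolr:search(...): call esolr_result_sketch:num_found(Info), esolr_result_sketch:names(Docs), or esolr_result_sketch:highlights("a", Extra).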

Read the API docs to find all the functions/options implemented so far.

Have fun!


8 Comments »

  1. Very nice work! Excellent idea.

    Comment by Bruce Kissinger — February 29, 2008 @ 2:57 pm

  2. That is a great idea. I just picked up Erlang two days ago and I am very excited about the concurrency. Do you think running Solr in embedded mode is better? Do we have to use thrift, or is there a better way to integrate java apps with erlang?

    Thanks
    Bharani

    Comment by Bharani — May 20, 2008 @ 3:10 pm

  3. Hello,
    I don’t think that embedding solr directly into your application would give you any advantage. Embedding relational/object databases can be useful, both for performance reasons and for ease of use (mnesia, sqlite, zodb, ...). But for a full-text search engine, the network latency and the marshalling/unmarshalling cost won’t be a factor IMHO, as the cost of searching is generally higher. The solr people don’t recommend embedding solr either:
    “The simplest, safest, way to use Solr is via Solr’s standard HTTP interfaces. Embedding Solr is less flexible, harder to support, not as well tested, and should be reserved for special circumstances.” (http://wiki.apache.org/solr/EmbeddedSolr)
    As for erlang-java integration, you should take a look at jinterface, http://www.erlang.org/doc/apps/jinterface/index.html, and http://www.theserverside.com/tt/articles/article.tss?l=IntegratingJavaandErlang
    I don’t have any experience using it.
    Thrift seems to be gaining lots of popularity now, after the facebook announcement. There has been an interesting discussion on the erlang mailing list about that: http://www.erlang.org/pipermail/erlang-questions/2008-May/035036.html

    Comment by ppolv — May 22, 2008 @ 3:12 pm

  4. hi, this is so cool!
    i’ve tested esolr for a couple of hours.
    and i think i found a bug. the Option setting for URL didn’t work for me.
    “search_url” and “select_url” should have the same name in the source code, i think.
    and i recommend you the name “select_url” as i did :)

    and thanks for nice work!

    M.W.Park

    Comment by manywaypark — June 30, 2008 @ 2:19 pm

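  5. The joy(?) of bug reporting: esolr bug report…

    While tinkering with esolr in order to use it, I found a bug and reported it. With the default settings given in the description it seems to work fine, but the two keys select_url and search_url, when setting the Option and when reading…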

    Trackback by 개발, 검색, 함수 — June 30, 2008 @ 2:31 pm

  6. Hi Park, glad you like the idea :-)

    Thanks for the report; indeed, the documentation is wrong.
    I’m afraid I’m not using this library currently, but I’ll try to fix the documentation and do a little refactoring when I manage to get more free time ;-).

    Did you find the library useful? It was mostly a personal, proof-of-concept hack, but it worked fine for me, so I made it publicly available.

    Comment by ppolv — July 2, 2008 @ 2:55 am


