Adding search to a Django site in a snap

16 August 2008

django, python, search


Search is a feature that is -- or at least, should be -- present on most sites containing dynamic or large content.

There are a few projects around to tackle that. Here's a non-exhaustive list: djangosearch, django-search (with a dash), django-sphinx.

Those search engines are great, but they seem like overkill if you just need a simple search feature for your CMS or blog.

To deal with that, I've come up with a generic and simple trick. All you need is copy/paste the following snippet anywhere in your project:

import re

from django.db.models import Q

def normalize_query(query_string,
                    findterms=re.compile(r'"([^"]+)"|(\S+)').findall,
                    normspace=re.compile(r'\s{2,}').sub):
    ''' Splits the query string in invidual keywords, getting rid of unecessary spaces
        and grouping quoted words together.
        Example:

        >>> normalize_query('  some random  words "with   quotes  " and   spaces')
        ['some', 'random', 'words', 'with quotes', 'and', 'spaces']

    '''
    return [normspace(' ', (t[0] or t[1]).strip()) for t in findterms(query_string)]

def get_query(query_string, search_fields):
    ''' Returns a query, that is a combination of Q objects. That combination
        aims to search keywords within a model by testing the given search fields.

    '''
    query = None # Query to search for every search term        
    terms = normalize_query(query_string)
    for term in terms:
        or_query = None # Query to search for a given term in each field
        for field_name in search_fields:
            q = Q(**{"%s__icontains" % field_name: term})
            if or_query is None:
                or_query = q
            else:
                or_query = or_query | q
        if query is None:
            query = or_query
        else:
            query = query & or_query
    return query

What the above does is generate a django.db.models.Q object (see doc) to search through your model, based on the query string and on the model's fields that you want to search. Importantly, it also analyses the query string by splitting out the key words and allowing words to be grouped by quotes. For example, out of the following query string...

'  some random  words "with   quotes  " and   spaces'

...the words 'some', 'random', 'words', 'with quotes', 'and', 'spaces' would actually be searched. It performs an AND search with all the given words, but you could easily customise it to do different kinds of search.

Then, your search view would become as simple as:

def search(request):
    query_string = ''
    found_entries = None
    if ('q' in request.GET) and request.GET['q'].strip():
        query_string = request.GET['q']

        entry_query = get_query(query_string, ['title', 'body',])

        found_entries = Entry.objects.filter(entry_query).order_by('-pub_date')

    return render_to_response('search/search_results.html',
                          { 'query_string': query_string, 'found_entries': found_entries },
                          context_instance=RequestContext(request))

And that's it! I use this on a site that has about 10,000 news items and it works pretty fast...

Now you have no excuse not to add a search box to your site! ;)