mirror of
https://github.com/django/django.git
synced 2025-10-23 21:59:11 +00:00
Refs #3254 -- Added full text search to contrib.postgres.
Adds a reasonably feature-complete implementation of full text search using the built-in PostgreSQL engine. It uses public APIs from Expression and Lookup. With thanks to Tim Graham, Simon Charette, Josh Smeaton, Mikey Ariel and many others for their advice and review. Particular thanks also go to the supporters of the contrib.postgres Kickstarter.
@@ -37,4 +37,5 @@ release. Some fields require higher versions.
functions
lookups
operations
search
validators
191
docs/ref/contrib/postgres/search.txt
Normal file
@@ -0,0 +1,191 @@
================
Full text search
================

.. versionadded:: 1.10

The database functions in the ``django.contrib.postgres.search`` module ease
the use of PostgreSQL's `full text search engine
<http://www.postgresql.org/docs/current/static/textsearch.html>`_.

For the examples in this document, we'll use the models defined in
:doc:`/topics/db/queries`.

.. seealso::

    For a high-level overview of searching, see the :doc:`topic documentation
    </topics/db/search>`.

.. currentmodule:: django.contrib.postgres.search

The ``search`` lookup
=====================

.. fieldlookup:: search

The simplest way to use full text search is to search a single term against a
single column in the database. For example::

    >>> Entry.objects.filter(body_text__search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

This creates a ``to_tsvector`` in the database from the ``body_text`` field
and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
default database search configuration. The results are obtained by matching the
query and the vector.

To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
:setting:`INSTALLED_APPS`.
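
As a minimal sketch of that requirement, a project's settings module might
contain something like the following. The other entries are only assumptions
about a typical project and are not required by the ``search`` lookup:

```python
# settings.py (excerpt). The first two entries are illustrative assumptions.
INSTALLED_APPS = [
    'django.contrib.contenttypes',
    'django.contrib.auth',
    # Needed so that the ``search`` lookup is available on text fields.
    'django.contrib.postgres',
]
```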

``SearchVector``
================

.. class:: SearchVector(\*expressions, config=None, weight=None)

Searching against a single field is great but rather limiting. The ``Entry``
instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
To query against both fields, use a ``SearchVector``::

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', 'blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

The arguments to ``SearchVector`` can be any
:class:`~django.db.models.Expression` or the name of a field. Multiple
arguments will be concatenated together using a space so that the search
document includes them all.

``SearchVector`` objects can be combined together, allowing you to reuse them.
For example::

    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text') + SearchVector('blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

See :ref:`postgresql-fts-search-configuration` and
:ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
and ``weight`` parameters.

``SearchQuery``
===============

.. class:: SearchQuery(value, config=None)

``SearchQuery`` translates the terms the user provides into a search query
object that the database compares to a search vector. By default, all the words
the user provides are passed through the stemming algorithms, and the database
then looks for matches for all of the resulting terms.

``SearchQuery`` terms can be combined logically to provide more flexibility::

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery('potato') & SearchQuery('ireland')  # potato AND ireland
    >>> SearchQuery('potato') | SearchQuery('penguin')  # potato OR penguin
    >>> ~SearchQuery('sausage')  # NOT sausage

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.
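
When the terms come from a list rather than being written out by hand, the
same ``&`` and ``|`` operators can be applied in a loop. The following is a
minimal sketch, not part of Django: ``combine_queries`` is a hypothetical
helper that relies only on those two operators, so it imports nothing from
Django itself and accepts any objects that support them:

```python
import operator
from functools import reduce


def combine_queries(queries, match_all=True):
    """Fold a non-empty sequence of query objects into a single one.

    Hypothetical helper: uses ``&`` when ``match_all`` is true and ``|``
    otherwise, mirroring the AND / OR combinations shown above.
    """
    queries = list(queries)
    if not queries:
        raise ValueError('at least one query is required')
    op = operator.and_ if match_all else operator.or_
    return reduce(op, queries)
```

Under these assumptions,
``combine_queries(SearchQuery(term) for term in ['potato', 'ireland'])`` would
build the same expression as ``SearchQuery('potato') & SearchQuery('ireland')``.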

``SearchRank``
==============

.. class:: SearchRank(vector, query, weights=None)

So far, we've just returned the results for which any match between the vector
and the query is possible. You will often want to order the results by
relevancy. PostgreSQL provides a ranking function which takes into
account how often the query terms appear in the document, how close together
the terms are in the document, and how important the part of the document is
where they occur. The better the match, the higher the value of the rank. To
order by relevancy::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See :ref:`postgresql-fts-weighting-queries` for an explanation of the
``weights`` parameter.

.. _postgresql-fts-search-configuration:

Changing the search configuration
=================================

You can specify the ``config`` parameter of a :class:`SearchVector` and
:class:`SearchQuery` to use a different search configuration. This allows using
different language parsers and dictionaries as defined by the database::

    >>> from django.contrib.postgres.search import SearchQuery, SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config='french'),
    ... ).filter(search=SearchQuery('œuf', config='french'))
    [<Entry: Pain perdu>]

The value of ``config`` could also be stored in another column::

    >>> from django.db.models import F
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config=F('blog__language')),
    ... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
    [<Entry: Pain perdu>]

.. _postgresql-fts-weighting-queries:

Weighting queries
=================

Not every field has the same relevance in a query, so you can set the weights
of various vectors before you combine them::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('-rank')

The weight should be one of the following letters: D, C, B, A. By default,
these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
respectively. If you wish to weight them differently, pass a list of four
floats to :class:`SearchRank` as ``weights`` in the same order as above::

    >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
    >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')
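
Because the list is positional, it is easy to mix up which float belongs to
which letter. The sketch below only illustrates the D, C, B, A ordering
described above; ``label_weights`` and the two constants are hypothetical
names and are not part of Django:

```python
# Letters in the order the ``weights`` list is read: D, C, B, A.
WEIGHT_LABELS = ('D', 'C', 'B', 'A')
# Default values stated in the text above, in the same order.
DEFAULT_WEIGHTS = (0.1, 0.2, 0.4, 1.0)


def label_weights(weights=DEFAULT_WEIGHTS):
    """Return a {letter: weight} mapping for a four-item weights sequence."""
    weights = list(weights)
    if len(weights) != len(WEIGHT_LABELS):
        raise ValueError('expected four weights, in D, C, B, A order')
    return dict(zip(WEIGHT_LABELS, weights))


print(label_weights())
print(label_weights([0.2, 0.4, 0.6, 0.8]))
```

With the custom list from the example, a vector created with ``weight='A'``
maps to ``0.8`` and one created with ``weight='B'`` maps to ``0.6``.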

Performance
===========

Special database configuration isn't necessary to use any of these functions.
However, if you're searching more than a few hundred records, you're likely to
run into performance problems. Full text search is a more intensive process
than comparing the size of an integer, for example.

In the event that all the fields you're querying on are contained within one
particular model, you can create a functional index which matches the search
vector you wish to use. For example:

.. code-block:: sql

    CREATE INDEX body_text_search ON blog_entry USING gin (to_tsvector('english', body_text));

PostgreSQL only accepts the two-argument form of ``to_tsvector`` in an index
expression, so the configuration (``'english'`` here, as an example) must be
named explicitly. Queries whose search expression matches the indexed
expression, including the configuration, can then use this index. In many
cases this will be sufficient.

``SearchVectorField``
---------------------

.. class:: SearchVectorField

If this approach becomes too slow, you can add a ``SearchVectorField`` to your
model. You'll need to keep it populated, for example with triggers as
described in the `PostgreSQL documentation`_. You can then query the field as
if it were an annotated ``SearchVector``::

    >>> Entry.objects.update(search_vector=SearchVector('body_text'))
    >>> Entry.objects.filter(search_vector='cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

.. _PostgreSQL documentation: http://www.postgresql.org/docs/current/static/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS