mirror of
				https://github.com/django/django.git
				synced 2025-10-25 22:56:12 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			247 lines
		
	
	
		
			9.1 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			247 lines
		
	
	
		
			9.1 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| ================
 | |
| Full text search
 | |
| ================
 | |
| 
 | |
| .. versionadded:: 1.10
 | |
| 
 | |
| The database functions in the ``django.contrib.postgres.search`` module ease
 | |
| the use of PostgreSQL's `full text search engine
 | |
| <http://www.postgresql.org/docs/current/static/textsearch.html>`_.
 | |
| 
 | |
| For the examples in this document, we'll use the models defined in
 | |
| :doc:`/topics/db/queries`.
 | |
| 
 | |
| .. seealso::
 | |
| 
 | |
|     For a high-level overview of searching, see the :doc:`topic documentation
 | |
|     </topics/db/search>`.
 | |
| 
 | |
| .. currentmodule:: django.contrib.postgres.search
 | |
| 
 | |
| The ``search`` lookup
 | |
| =====================
 | |
| 
 | |
| .. fieldlookup:: search
 | |
| 
 | |
| The simplest way to use full text search is to search a single term against a
 | |
| single column in the database. For example::
 | |
| 
 | |
|     >>> Entry.objects.filter(body_text__search='Cheese')
 | |
|     [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
 | |
| 
 | |
| This creates a ``to_tsvector`` in the database from the ``body_text`` field
 | |
| and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
 | |
| default database search configuration. The results are obtained by matching the
 | |
| query and the vector.
 | |
| 
 | |
| To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
 | |
| :setting:`INSTALLED_APPS`.
 | |
| 
 | |
| ``SearchVector``
 | |
| ================
 | |
| 
 | |
| .. class:: SearchVector(\*expressions, config=None, weight=None)
 | |
| 
 | |
| Searching against a single field is great but rather limiting. The ``Entry``
 | |
| instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
 | |
| To query against both fields, use a ``SearchVector``::
 | |
| 
 | |
|     >>> from django.contrib.postgres.search import SearchVector
 | |
|     >>> Entry.objects.annotate(
 | |
|     ...     search=SearchVector('body_text', 'blog__tagline'),
 | |
|     ... ).filter(search='Cheese')
 | |
|     [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
 | |
| 
 | |
| The arguments to ``SearchVector`` can be any
 | |
| :class:`~django.db.models.Expression` or the name of a field. Multiple
 | |
| arguments will be concatenated together using a space so that the search
 | |
| document includes them all.
 | |
| 
 | |
| ``SearchVector`` objects can be combined together, allowing you to reuse them.
 | |
| For example::
 | |
| 
 | |
|     >>> Entry.objects.annotate(
 | |
|     ...     search=SearchVector('body_text') + SearchVector('blog__tagline'),
 | |
|     ... ).filter(search='Cheese')
 | |
|     [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
 | |
| 
 | |
| See :ref:`postgresql-fts-search-configuration` and
 | |
| :ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
 | |
| and ``weight`` parameters.
 | |
| 
 | |
| ``SearchQuery``
 | |
| ===============
 | |
| 
 | |
| .. class:: SearchQuery(value, config=None)
 | |
| 
 | |
| ``SearchQuery`` translates the terms the user provides into a search query
 | |
| object that the database compares to a search vector. By default, all the words
 | |
| the user provides are passed through the stemming algorithms, and then it
 | |
| looks for matches for all of the resulting terms.
 | |
| 
 | |
| ``SearchQuery`` terms can be combined logically to provide more flexibility::
 | |
| 
 | |
|     >>> from django.contrib.postgres.search import SearchQuery
 | |
|     >>> SearchQuery('potato') & SearchQuery('ireland')  # potato AND ireland
 | |
|     >>> SearchQuery('potato') | SearchQuery('penguin')  # potato OR penguin
 | |
|     >>> ~SearchQuery('sausage')  # NOT sausage
 | |
| 
 | |
| See :ref:`postgresql-fts-search-configuration` for an explanation of the
 | |
| ``config`` parameter.
 | |
| 
 | |
| ``SearchRank``
 | |
| ==============
 | |
| 
 | |
| .. class:: SearchRank(vector, query, weights=None)
 | |
| 
 | |
| So far, we've just returned the results for which any match between the vector
 | |
| and the query are possible. It's likely you may wish to order the results by
 | |
| some sort of relevancy. PostgreSQL provides a ranking function which takes into
 | |
| account how often the query terms appear in the document, how close together
 | |
| the terms are in the document, and how important the part of the document is
 | |
| where they occur. The better the match, the higher the value of the rank. To
 | |
| order by relevancy::
 | |
| 
 | |
|     >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
 | |
|     >>> vector = SearchVector('body_text')
 | |
|     >>> query = SearchQuery('cheese')
 | |
|     >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
 | |
|     [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
 | |
| 
 | |
| See :ref:`postgresql-fts-weighting-queries` for an explanation of the
 | |
| ``weights`` parameter.
 | |
| 
 | |
| .. _postgresql-fts-search-configuration:
 | |
| 
 | |
| Changing the search configuration
 | |
| =================================
 | |
| 
 | |
| You can specify the ``config`` attribute to a :class:`SearchVector` and
 | |
| :class:`SearchQuery` to use a different search configuration. This allows using
 | |
| a different language parsers and dictionaries as defined by the database::
 | |
| 
 | |
|     >>> from django.contrib.postgres.search import SearchQuery, SearchVector
 | |
|     >>> Entry.objects.annotate(
 | |
|     ...     search=SearchVector('body_text', config='french'),
 | |
|     ... ).filter(search=SearchQuery('œuf', config='french'))
 | |
|     [<Entry: Pain perdu>]
 | |
| 
 | |
| The value of ``config`` could also be stored in another column::
 | |
| 
 | |
|     >>> from djanog.db.models import F
 | |
|     >>> Entry.objects.annotate(
 | |
|     ...     search=SearchVector('body_text', config=F('blog__language')),
 | |
|     ... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
 | |
|     [<Entry: Pain perdu>]
 | |
| 
 | |
| .. _postgresql-fts-weighting-queries:
 | |
| 
 | |
| Weighting queries
 | |
| =================
 | |
| 
 | |
| Every field may not have the same relevance in a query, so you can set weights
 | |
| of various vectors before you combine them::
 | |
| 
 | |
|     >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
 | |
|     >>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
 | |
|     >>> query = SearchQuery('cheese')
 | |
|     >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('rank')
 | |
| 
 | |
| The weight should be one of the following letters: D, C, B, A. By default,
 | |
| these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
 | |
| respectively. If you wish to weight them differently, pass a list of four
 | |
| floats to :class:`SearchRank` as ``weights`` in the same order above::
 | |
| 
 | |
|     >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
 | |
|     >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')
 | |
| 
 | |
| Performance
 | |
| ===========
 | |
| 
 | |
| Special database configuration isn't necessary to use any of these functions,
 | |
| however, if you're searching more than a few hundred records, you're likely to
 | |
| run into performance problems. Full text search is a more intensive process
 | |
| than comparing the size of an integer, for example.
 | |
| 
 | |
| In the event that all the fields you're querying on are contained within one
 | |
| particular model, you can create a functional index which matches the search
 | |
| vector you wish to use. For example:
 | |
| 
 | |
| .. code-block:: sql
 | |
| 
 | |
|     CREATE INDEX body_text_search ON blog_entry (to_tsvector(body_text));
 | |
| 
 | |
| This index will then be used by subsequent queries. In many cases this will be
 | |
| sufficient.
 | |
| 
 | |
| ``SearchVectorField``
 | |
| ---------------------
 | |
| 
 | |
| .. class:: SearchVectorField
 | |
| 
 | |
| If this approach becomes too slow, you can add a ``SearchVectorField`` to your
 | |
| model. You'll need to keep it populated with triggers, for example, as
 | |
| described in the `PostgreSQL documentation`_. You can then query the field as
 | |
| if it were an annotated ``SearchVector``::
 | |
| 
 | |
|     >>> Entry.objects.update(search_vector=SearchVector('body_text'))
 | |
|     >>> Entry.objects.filter(search_vector='cheese')
 | |
|     [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
 | |
| 
 | |
| .. _PostgreSQL documentation: http://www.postgresql.org/docs/current/static/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS
 | |
| 
 | |
| Trigram similarity
 | |
| ==================
 | |
| 
 | |
| Another approach to searching is trigram similarity. A trigram is a group of
 | |
| three consecutive characters. In addition to the :lookup:`trigram_similar`
 | |
| lookup, you can use a couple of other expressions.
 | |
| 
 | |
| To use them, you need to activate the `pg_trgm extension
 | |
| <http://www.postgresql.org/docs/current/interactive/pgtrgm.html>`_ on
 | |
| PostgreSQL. You can install it using the
 | |
| :class:`~django.contrib.postgres.operations.TrigramExtension` migration
 | |
| operation.
 | |
| 
 | |
| ``TrigramSimilarity``
 | |
| ---------------------
 | |
| 
 | |
| .. class:: TrigramSimilarity(expression, string, **extra)
 | |
| 
 | |
| .. versionadded:: 1.10
 | |
| 
 | |
| Accepts a field name or expression, and a string or expression. Returns the
 | |
| trigram similarity between the two arguments.
 | |
| 
 | |
| Usage example::
 | |
| 
 | |
|     >>> from django.contrib.postgres.search import TrigramSimilarity
 | |
|     >>> Author.objects.create(name='Katy Stevens')
 | |
|     >>> Author.objects.create(name='Stephen Keats')
 | |
|     >>> test = 'Katie Stephens'
 | |
|     >>> Author.objects.annotate(
 | |
|     ...     similarity=TrigramSimilarity('name', test),
 | |
|     ... ).filter(similarity__gt=0.3).order_by('-similarity')
 | |
|     [<Author: Katy Stephens>, <Author: Stephen Keats>]
 | |
| 
 | |
| ``TrigramDistance``
 | |
| -------------------
 | |
| 
 | |
| .. class:: TrigramDistance(expression, string, **extra)
 | |
| 
 | |
| .. versionadded:: 1.10
 | |
| 
 | |
| Accepts a field name or expression, and a string or expression. Returns the
 | |
| trigram distance between the two arguments.
 | |
| 
 | |
| Usage example::
 | |
| 
 | |
|     >>> from django.contrib.postgres.search import TrigramDistance
 | |
|     >>> Author.objects.create(name='Katy Stevens')
 | |
|     >>> Author.objects.create(name='Stephen Keats')
 | |
|     >>> test = 'Katie Stephens'
 | |
|     >>> Author.objects.annotate(
 | |
|     ...     distance=TrigramDistance('name', test),
 | |
|     ... ).filter(distance__lte=0.7).order_by('distance')
 | |
|     [<Author: Katy Stephens>, <Author: Stephen Keats>]
 |