mirror of
				https://github.com/django/django.git
				synced 2025-10-26 07:06:08 +00:00 
			
		
		
		
	
		
			
				
	
	
		
		
	
	
	
	
	
================
Full text search
================

The database functions in the ``django.contrib.postgres.search`` module ease
the use of PostgreSQL's `full text search engine
<https://www.postgresql.org/docs/current/textsearch.html>`_.

For the examples in this document, we'll use the models defined in
:doc:`/topics/db/queries`.

.. seealso::

    For a high-level overview of searching, see the :doc:`topic documentation
    </topics/db/search>`.

.. currentmodule:: django.contrib.postgres.search

The ``search`` lookup
=====================

.. fieldlookup:: search

A common way to use full text search is to search a single term against a
single column in the database. For example:

.. code-block:: pycon

    >>> Entry.objects.filter(body_text__search="Cheese")
    <QuerySet [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]>

This creates a ``to_tsvector`` in the database from the ``body_text`` field
and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
default database search configuration. The results are the rows whose vector
matches the query.

To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
:setting:`INSTALLED_APPS`.

``SearchVector``
================

.. class:: SearchVector(*expressions, config=None, weight=None)

Searching against a single field is great but rather limiting. The ``Entry``
instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
To query against both fields, use a ``SearchVector``:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector("body_text", "blog__tagline"),
    ... ).filter(search="Cheese")
    <QuerySet [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]>

The arguments to ``SearchVector`` can be any
:class:`~django.db.models.Expression` or the name of a field. Multiple
arguments will be concatenated together using a space so that the search
document includes them all.

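The concatenation described above can be modelled in plain Python. This is only
a rough sketch of the idea, not Django's implementation: the helper name
``build_search_document`` and the sample values are hypothetical, and it
assumes that a ``NULL`` value contributes an empty string instead of
discarding the whole document.

```python
def build_search_document(*values):
    """Join field values with single spaces, treating None as an empty string."""
    return " ".join("" if value is None else str(value) for value in values)


# Hypothetical field values for one Entry row and its related Blog.
body_text = "Grilled cheese on sourdough toast."
tagline = None  # assumed: a NULL tagline must not wipe out the document

document = build_search_document(body_text, tagline)
print(repr(document))
```

Every argument contributes to a single document, which is why a match in
either ``body_text`` or ``blog__tagline`` is enough for an entry to be found.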

``SearchVector`` objects can be combined, allowing you to reuse them. For
example:

.. code-block:: pycon

    >>> Entry.objects.annotate(
    ...     search=SearchVector("body_text") + SearchVector("blog__tagline"),
    ... ).filter(search="Cheese")
    <QuerySet [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]>

See :ref:`postgresql-fts-search-configuration` and
:ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
and ``weight`` parameters.

``SearchQuery``
===============

.. class:: SearchQuery(value, config=None, search_type='plain')

``SearchQuery`` translates the terms the user provides into a search query
object that the database compares to a search vector. By default, all the words
the user provides are passed through the stemming algorithms, and then the
database looks for matches for all of the resulting terms.

If ``search_type`` is ``'plain'``, which is the default, the terms are treated
as separate keywords. If ``search_type`` is ``'phrase'``, the terms are treated
as a single phrase. If ``search_type`` is ``'raw'``, then you can provide a
formatted search query with terms and operators. If ``search_type`` is
``'websearch'``, then you can provide a formatted search query, similar to the
one used by web search engines. ``'websearch'`` requires PostgreSQL ≥ 11. Read
PostgreSQL's `Full Text Search docs`_ to learn about differences and syntax.
Examples:

.. _Full Text Search docs: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery, Lexeme
    >>> SearchQuery("red tomato")  # two keywords
    >>> SearchQuery("tomato red")  # same results as above
    >>> SearchQuery("red tomato", search_type="phrase")  # a phrase
    >>> SearchQuery("tomato red", search_type="phrase")  # a different phrase
    >>> SearchQuery("'tomato' & ('red' | 'green')", search_type="raw")  # boolean operators
    >>> SearchQuery(
    ...     "'tomato' ('red' OR 'green')", search_type="websearch"
    ... )  # websearch operators
    >>> SearchQuery(Lexeme("tomato") & (Lexeme("red") | Lexeme("green")))  # Lexeme objects

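Each ``search_type`` corresponds to one of PostgreSQL's query-parsing
functions. The following standalone sketch shows that correspondence. The
dictionary, the helper ``tsquery_sql`` and the exact SQL fragments are
illustrative assumptions and are not copied from Django's source; only the
four PostgreSQL function names are real.

```python
# Assumed mapping from ``search_type`` to the PostgreSQL parsing function.
SEARCH_TYPE_FUNCTIONS = {
    "plain": "plainto_tsquery",
    "phrase": "phraseto_tsquery",
    "raw": "to_tsquery",
    "websearch": "websearch_to_tsquery",
}


def tsquery_sql(search_type="plain", config=None):
    """Return a SQL fragment with %s placeholders for the bound parameters."""
    try:
        function = SEARCH_TYPE_FUNCTIONS[search_type]
    except KeyError:
        raise ValueError(f"Unknown search_type argument {search_type!r}.") from None
    if config is None:
        return f"{function}(%s)"
    # PostgreSQL takes the configuration first and the query text second.
    return f"{function}(%s::regconfig, %s)"


print(tsquery_sql())
print(tsquery_sql("phrase", config="french"))
```

Keeping the search text as a bound parameter, rather than interpolating it
into the SQL string, is what makes it safe to pass user input to the
``'plain'``, ``'phrase'`` and ``'websearch'`` types.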

``SearchQuery`` terms can be combined logically to provide more flexibility:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery("meat") & SearchQuery("cheese")  # AND
    >>> SearchQuery("meat") | SearchQuery("cheese")  # OR
    >>> ~SearchQuery("meat")  # NOT

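The logical combination above relies on Python operator overloading. The toy
class below is a hypothetical stand-in, not Django's ``SearchQuery``. It
sketches how ``&``, ``|`` and ``~`` could be rendered with PostgreSQL's
``tsquery`` operators ``&&``, ``||`` and ``!!``, and the exact SQL Django
emits may differ.

```python
class Query:
    """Toy stand-in for a combinable query; not Django's SearchQuery."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        # tsquery && tsquery: both sides must match.
        return Query(f"({self.sql} && {other.sql})")

    def __or__(self, other):
        # tsquery || tsquery: either side may match.
        return Query(f"({self.sql} || {other.sql})")

    def __invert__(self):
        # !! tsquery: negate the query.
        return Query(f"!!({self.sql})")


meat = Query("plainto_tsquery('meat')")
cheese = Query("plainto_tsquery('cheese')")
print((meat & ~cheese).sql)
print((meat | cheese).sql)
```

Because ``~`` binds more tightly than ``&`` and ``|`` in Python, ``meat &
~cheese`` negates only the second query.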

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

.. versionchanged:: 6.0

    :class:`Lexeme` objects were added.

``SearchRank``
==============

.. class:: SearchRank(vector, query, weights=None, normalization=None, cover_density=False)

So far, we've returned every result for which the vector matches the query.
You will often want to order the results by relevancy. PostgreSQL provides a
ranking function which takes into account how often the query terms appear in
the document, how close together the terms are in the document, and how
important the part of the document is where they occur. The better the match,
the higher the value of the rank. To order by relevancy:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector("body_text")
    >>> query = SearchQuery("cheese")
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by("-rank")
    <QuerySet [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]>

See :ref:`postgresql-fts-weighting-queries` for an explanation of the
``weights`` parameter.

Set the ``cover_density`` parameter to ``True`` to enable the cover density
ranking, which means that the proximity of matching query terms is taken into
account.

Provide an integer to the ``normalization`` parameter to control rank
normalization. This integer is a bit mask, so you can combine multiple
behaviors:

.. code-block:: pycon

    >>> from django.db.models import Value
    >>> Entry.objects.annotate(
    ...     rank=SearchRank(
    ...         vector,
    ...         query,
    ...         normalization=Value(2).bitor(Value(4)),
    ...     )
    ... )

The PostgreSQL documentation has more details about `different rank
normalization options`_.

.. _different rank normalization options: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING

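To make the bit mask concrete, here is a small standalone sketch that decodes
a ``normalization`` value. The flag descriptions are summarized from
PostgreSQL's ranking documentation. The names ``NORMALIZATION_FLAGS`` and
``describe_normalization`` are hypothetical helpers and are not part of
Django.

```python
# Flag meanings summarized from PostgreSQL's ranking documentation.
NORMALIZATION_FLAGS = {
    1: "divide the rank by 1 + the logarithm of the document length",
    2: "divide the rank by the document length",
    4: "divide the rank by the mean harmonic distance between extents",
    8: "divide the rank by the number of unique words in the document",
    16: "divide the rank by 1 + the logarithm of the number of unique words",
    32: "divide the rank by itself + 1",
}


def describe_normalization(mask):
    """List the behaviors enabled by a normalization bit mask."""
    if mask == 0:
        return ["ignore the document length (the default)"]
    return [text for flag, text in NORMALIZATION_FLAGS.items() if mask & flag]


mask = 2 | 4  # the value that Value(2).bitor(Value(4)) evaluates to in SQL
print(mask)
for line in describe_normalization(mask):
    print(line)
```

According to the PostgreSQL documentation, flag ``4`` only has an effect with
cover density ranking, so it is most useful together with
``cover_density=True``.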

``SearchHeadline``
==================

.. class:: SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)

Accepts a single text field or an expression, a query, a config, and a set of
options. Returns highlighted search results.

Set the ``start_sel`` and ``stop_sel`` parameters to the string values to be
used to wrap highlighted query terms in the document. PostgreSQL's defaults are
``<b>`` and ``</b>``.

Provide integer values to the ``max_words`` and ``min_words`` parameters to
determine the longest and shortest headlines. PostgreSQL's defaults are 35 and
15.

Provide an integer value to the ``short_word`` parameter to discard words of
this length or less in each headline. PostgreSQL's default is 3.

Set the ``highlight_all`` parameter to ``True`` to use the whole document in
place of a fragment and ignore the ``max_words``, ``min_words``, and
``short_word`` parameters. That's disabled by default in PostgreSQL.

Provide a non-zero integer value to the ``max_fragments`` parameter to set the
maximum number of fragments to display. That's disabled by default in
PostgreSQL.

Set the ``fragment_delimiter`` string parameter to configure the delimiter
between fragments. PostgreSQL's default is ``" ... "``.

The PostgreSQL documentation has more details on `highlighting search
results`_.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
    >>> query = SearchQuery("red tomato")
    >>> entry = Entry.objects.annotate(
    ...     headline=SearchHeadline(
    ...         "body_text",
    ...         query,
    ...         start_sel="<span>",
    ...         stop_sel="</span>",
    ...     ),
    ... ).get()
    >>> print(entry.headline)
    Sandwich with <span>tomato</span> and <span>red</span> cheese.

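The effect of ``start_sel`` and ``stop_sel`` can be imitated in plain Python.
The helper ``toy_headline`` below is a hypothetical, deliberately simplified
model. It matches whole words literally and case-insensitively, and it ignores
stemming, stop words, fragments and every other option that PostgreSQL's
headline function supports.

```python
import re


def toy_headline(text, terms, start_sel="<b>", stop_sel="</b>"):
    """Wrap whole-word, case-insensitive matches of ``terms`` in markers."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: f"{start_sel}{match.group(0)}{stop_sel}", text)


print(
    toy_headline(
        "Sandwich with tomato and red cheese.",
        ["red", "tomato"],
        start_sel="<span>",
        stop_sel="</span>",
    )
)
```

The terms are highlighted wherever they occur in the document, regardless of
the order in which they appear in the query.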

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

.. _highlighting search results: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-HEADLINE

.. _postgresql-fts-search-configuration:

Changing the search configuration
=================================

You can pass the ``config`` argument to a :class:`SearchVector` and a
:class:`SearchQuery` to use a different search configuration. This allows using
different language parsers and dictionaries as defined by the database:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery, SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector("body_text", config="french"),
    ... ).filter(search=SearchQuery("œuf", config="french"))
    <QuerySet [<Entry: Pain perdu>]>

The value of ``config`` can also be stored in another column:

.. code-block:: pycon

    >>> from django.db.models import F
    >>> Entry.objects.annotate(
    ...     search=SearchVector("body_text", config=F("blog__language")),
    ... ).filter(search=SearchQuery("œuf", config=F("blog__language")))
    <QuerySet [<Entry: Pain perdu>]>

.. _postgresql-fts-weighting-queries:

Weighting queries
=================

Not every field has the same relevance in a query, so you can set the weights
of the individual vectors before you combine them:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector("body_text", weight="A") + SearchVector(
    ...     "blog__tagline", weight="B"
    ... )
    >>> query = SearchQuery("cheese")
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by(
    ...     "rank"
    ... )

The weight should be one of the following letters: D, C, B, A. By default,
these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
respectively. If you wish to weight them differently, pass a list of four
floats to :class:`SearchRank` as ``weights`` in the same order as above:

.. code-block:: pycon

    >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
    >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by("-rank")

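The ordering of the ``weights`` list is easy to get backwards, because it runs
from the least important letter to the most important one. This small sketch
pairs each letter with its value. The names ``DEFAULT_WEIGHTS`` and
``weights_by_letter`` are hypothetical helpers introduced only for
illustration.

```python
DEFAULT_WEIGHTS = [0.1, 0.2, 0.4, 1.0]  # order: D, C, B, A


def weights_by_letter(weights=None):
    """Map each weight letter to its numeric value, given a D, C, B, A list."""
    values = DEFAULT_WEIGHTS if weights is None else list(weights)
    if len(values) != 4:
        raise ValueError("Expected exactly four weights, ordered D, C, B, A.")
    return dict(zip("DCBA", values))


print(weights_by_letter())
print(weights_by_letter([0.2, 0.4, 0.6, 0.8]))
```

With ``weights=[0.2, 0.4, 0.6, 0.8]``, a vector weighted ``"A"`` therefore
counts as ``0.8`` and one weighted ``"B"`` as ``0.6``.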

``Lexeme``
==========

.. versionadded:: 6.0

.. class:: Lexeme(value, output_field=None, *, invert=False, prefix=False, weight=None)

``Lexeme`` objects allow search operators to be safely used with strings from
an untrusted source. The content of each lexeme is escaped so that any
operators that may exist in the string itself will not be interpreted.

You can combine lexemes with other lexemes using the ``&`` and ``|`` operators
and also negate them with the ``~`` operator. For example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery, SearchVector, Lexeme
    >>> vector = SearchVector("body_text", "blog__tagline")
    >>> Entry.objects.annotate(search=vector).filter(
    ...     search=SearchQuery(Lexeme("fruit") & Lexeme("dessert"))
    ... )
    <QuerySet [<Entry: Apple Crumble Recipes>, <Entry: Banana Split Recipes>]>

.. code-block:: pycon

    >>> Entry.objects.annotate(search=vector).filter(
    ...     search=SearchQuery(Lexeme("fruit") & Lexeme("dessert") & ~Lexeme("banana"))
    ... )
    <QuerySet [<Entry: Apple Crumble Recipes>]>

``Lexeme`` objects also support term weighting and prefixes:

.. code-block:: pycon

    >>> Entry.objects.annotate(search=vector).filter(
    ...     search=SearchQuery(Lexeme("Pizza") | Lexeme("Cheese"))
    ... )
    <QuerySet [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]>
    >>> Entry.objects.annotate(search=vector).filter(
    ...     search=SearchQuery(Lexeme("Pizza") | Lexeme("Cheese", weight="A"))
    ... )
    <QuerySet [<Entry: Pizza recipes>]>
    >>> Entry.objects.annotate(search=vector).filter(
    ...     search=SearchQuery(Lexeme("za", prefix=True))
    ... )
    <QuerySet []>

In the second query, ``Lexeme("Cheese", weight="A")`` only matches terms that
carry the weight ``A`` in the vector. The vector used here is unweighted, so
only the pizza entry is returned. The last query returns nothing because a
prefix match compares the beginning of each word, so ``za`` does not match
``pizza``.

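For orientation, the sketch below renders a single term in PostgreSQL's
``tsquery`` text syntax, where ``:*`` marks a prefix match and a trailing
letter restricts the match to a weight. The function ``render_lexeme`` is a
hypothetical illustration and not Django's escaping code. It assumes the
PostgreSQL quoting rule that embedded quotes and backslashes are doubled
inside a quoted lexeme.

```python
def render_lexeme(value, prefix=False, weight=None, invert=False):
    """Render one quoted tsquery term, e.g. 'za':* or 'Cheese':A."""
    # Double embedded backslashes and quotes so they stay inside the literal.
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    term = f"'{escaped}'"
    suffix = ("*" if prefix else "") + (weight or "")
    if suffix:
        term += f":{suffix}"
    return f"!{term}" if invert else term


print(render_lexeme("fruit"))
print(render_lexeme("za", prefix=True))
print(render_lexeme("Cheese", weight="A"))
print(render_lexeme("banana", invert=True))
# Operators hidden inside untrusted input stay part of one quoted term.
print(render_lexeme("red' | 'green"))
```

The last call shows the point of escaping: the ``|`` in the input cannot act
as an OR operator because it never leaves the quoted term.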

Performance
===========

Special database configuration isn't necessary to use any of these functions.
However, if you're searching more than a few hundred records, you're likely to
run into performance problems. Full text search is a more intensive process
than comparing the size of an integer, for example.

If all the fields you're querying are contained within one particular model,
you can create a functional
:class:`GIN <django.contrib.postgres.indexes.GinIndex>` or
:class:`GiST <django.contrib.postgres.indexes.GistIndex>` index which matches
the search vector you wish to use. For example::

    GinIndex(
        SearchVector("body_text", "headline", config="english"),
        name="search_vector_idx",
    )

The PostgreSQL docs have details on `creating indexes for full text search
<https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.

``SearchVectorField``
---------------------

.. class:: SearchVectorField

If this approach becomes too slow, you can add a ``SearchVectorField`` to your
model. You'll need to keep it populated, for example with triggers as
described in the `PostgreSQL documentation`_. You can then query the field as
if it were an annotated ``SearchVector``:

.. code-block:: pycon

    >>> Entry.objects.update(search_vector=SearchVector("body_text"))
    >>> Entry.objects.filter(search_vector="cheese")
    <QuerySet [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]>

.. _PostgreSQL documentation: https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS

Trigram similarity
==================

Another approach to searching is trigram similarity. A trigram is a group of
three consecutive characters. In addition to the :lookup:`trigram_similar`,
:lookup:`trigram_word_similar`, and :lookup:`trigram_strict_word_similar`
lookups, you can use a couple of other expressions.

To use them, you need to activate the `pg_trgm extension
<https://www.postgresql.org/docs/current/pgtrgm.html>`_ on PostgreSQL. You can
install it using the
:class:`~django.contrib.postgres.operations.TrigramExtension` migration
operation.

``TrigramSimilarity``
---------------------

.. class:: TrigramSimilarity(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram similarity between the two arguments.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import TrigramSimilarity
    >>> Author.objects.create(name="Katy Stevens")
    >>> Author.objects.create(name="Stephen Keats")
    >>> test = "Katie Stephens"
    >>> Author.objects.annotate(
    ...     similarity=TrigramSimilarity("name", test),
    ... ).filter(
    ...     similarity__gt=0.3
    ... ).order_by("-similarity")
    <QuerySet [<Author: Katy Stevens>, <Author: Stephen Keats>]>

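To build intuition for what the similarity score measures, here is a
pure-Python approximation of how ``pg_trgm`` extracts trigrams and compares
two strings. It is a sketch of the documented behavior (lowercasing, padding
each word with two leading spaces and one trailing space, then dividing the
shared trigrams by all distinct trigrams) and not the extension's code. The
function names are hypothetical, and edge cases such as non-ASCII text may
differ.

```python
import re


def trigrams(text):
    """Return the set of trigrams pg_trgm would (approximately) extract."""
    result = set()
    # Words are runs of letters and digits; everything else separates them.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        result.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return result


def similarity(first, second):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    a, b = trigrams(first), trigrams(second)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


print(sorted(trigrams("cat")))
print(round(similarity("word", "two words"), 4))
```

Because the score depends on the trigrams of both strings, adding unrelated
words to either side lowers the similarity even when one string is contained
in the other.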

``TrigramWordSimilarity``
-------------------------

.. class:: TrigramWordSimilarity(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram word similarity between the two arguments.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import TrigramWordSimilarity
    >>> Author.objects.create(name="Katy Stevens")
    >>> Author.objects.create(name="Stephen Keats")
    >>> test = "Kat"
    >>> Author.objects.annotate(
    ...     similarity=TrigramWordSimilarity(test, "name"),
    ... ).filter(
    ...     similarity__gt=0.3
    ... ).order_by("-similarity")
    <QuerySet [<Author: Katy Stevens>]>

``TrigramStrictWordSimilarity``
-------------------------------

.. class:: TrigramStrictWordSimilarity(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram strict word similarity between the two arguments. Similar to
:class:`TrigramWordSimilarity() <TrigramWordSimilarity>`, except that it forces
extent boundaries to match word boundaries.

``TrigramDistance``
-------------------

.. class:: TrigramDistance(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram distance between the two arguments.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import TrigramDistance
    >>> Author.objects.create(name="Katy Stevens")
    >>> Author.objects.create(name="Stephen Keats")
    >>> test = "Katie Stephens"
    >>> Author.objects.annotate(
    ...     distance=TrigramDistance("name", test),
    ... ).filter(
    ...     distance__lte=0.7
    ... ).order_by("distance")
    <QuerySet [<Author: Katy Stevens>, <Author: Stephen Keats>]>

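In ``pg_trgm``, the trigram distance is one minus the trigram similarity, so a
distance filter is the mirror image of a similarity filter. The sketch below
illustrates that relationship. The helper name and the sample scores are
hypothetical and were not produced by a database.

```python
def distance_from_similarity(similarity):
    """Trigram distance is one minus the trigram similarity."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    return 1.0 - similarity


# Hypothetical similarity scores, chosen only to illustrate the relationship.
scores = {"close match": 0.75, "weak match": 0.25, "no overlap": 0.0}
for label, score in scores.items():
    distance = distance_from_similarity(score)
    kept = distance <= 0.7  # mirrors the distance__lte=0.7 filter above
    print(f"{label}: similarity={score:.2f} distance={distance:.2f} kept={kept}")
```

Smaller distances mean closer matches, which is why the distance examples
order by ``"distance"`` ascending while the similarity examples order by
``"-similarity"``.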

``TrigramWordDistance``
-----------------------

.. class:: TrigramWordDistance(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram word distance between the two arguments.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import TrigramWordDistance
    >>> Author.objects.create(name="Katy Stevens")
    >>> Author.objects.create(name="Stephen Keats")
    >>> test = "Kat"
    >>> Author.objects.annotate(
    ...     distance=TrigramWordDistance(test, "name"),
    ... ).filter(
    ...     distance__lte=0.7
    ... ).order_by("distance")
    <QuerySet [<Author: Katy Stevens>]>

``TrigramStrictWordDistance``
-----------------------------

.. class:: TrigramStrictWordDistance(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram strict word distance between the two arguments.