mirror of
https://github.com/django/django.git
synced 2025-01-18 22:33:44 +00:00
Created a 'DB optimization' topic, with cross-refs to relevant sections.

Also fixed #10291, which was related, and cleaned up some inconsistent doc labels.

git-svn-id: http://code.djangoproject.com/svn/django/trunk@12229 bcc190cf-cafb-0310-a4f2-bffc1f526a37
parent
19fad16414
commit
2e9518bb39
@ -3,6 +3,8 @@
|
||||
FAQ: Databases and models
|
||||
=========================
|
||||
|
||||
.. _faq-see-raw-sql-queries:
|
||||
|
||||
How can I see the raw SQL queries Django is running?
|
||||
----------------------------------------------------
|
||||
|
||||
|
@ -71,7 +71,8 @@ The model layer
|
||||
* **Other:**
|
||||
:ref:`Supported databases <ref-databases>` |
|
||||
:ref:`Legacy databases <howto-legacy-databases>` |
|
||||
:ref:`Providing initial data <howto-initial-data>`
|
||||
:ref:`Providing initial data <howto-initial-data>` |
|
||||
:ref:`Optimize database access <topics-db-optimization>`
|
||||
|
||||
The template layer
|
||||
==================
|
||||
|
@ -66,6 +66,18 @@ You can evaluate a ``QuerySet`` in the following ways:
|
||||
iterating over a ``QuerySet`` will take advantage of your database to
|
||||
load data and instantiate objects only as you need them.
|
||||
|
||||
* **bool().** Testing a ``QuerySet`` in a boolean context, such as using
|
||||
``bool()``, ``or``, ``and`` or an ``if`` statement, will cause the query
|
||||
to be executed. If there is at least one result, the ``QuerySet`` is
|
||||
``True``, otherwise ``False``. For example::
|
||||
|
||||
    if Entry.objects.filter(headline="Test"):
        print("There is at least one Entry with the headline Test")
|
||||
|
||||
Note: *Don't* use this if all you want to do is determine if at least one
|
||||
result exists, and don't need the actual objects. It's more efficient to
|
||||
use ``exists()`` (see below).
|
||||
|
||||
.. _pickling QuerySets:
|
||||
|
||||
Pickling QuerySets
|
||||
@ -302,7 +314,7 @@ a model which defines a default ordering, or when using
|
||||
ordering was undefined prior to calling ``reverse()``, and will remain
|
||||
undefined afterward).
|
||||
|
||||
.. _querysets-distinct:
|
||||
.. _queryset-distinct:
|
||||
|
||||
``distinct()``
|
||||
~~~~~~~~~~~~~~
|
||||
@ -336,6 +348,8 @@ query spans multiple tables, it's possible to get duplicate results when a
|
||||
``values()`` call.
|
||||
|
||||
|
||||
.. _queryset-values:
|
||||
|
||||
``values(*fields)``
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@ -616,7 +630,7 @@ call, since they are conflicting options.
|
||||
Both the ``depth`` argument and the ability to specify field names in the call
|
||||
to ``select_related()`` are new in Django version 1.0.
|
||||
|
||||
.. _extra:
|
||||
.. _queryset-extra:
|
||||
|
||||
``extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
@ -1062,17 +1076,18 @@ Example::
|
||||
|
||||
If you pass ``in_bulk()`` an empty list, you'll get an empty dictionary.
|
||||
|
||||
.. _queryset-iterator:
|
||||
|
||||
``iterator()``
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
Evaluates the ``QuerySet`` (by performing the query) and returns an
|
||||
`iterator`_ over the results. A ``QuerySet`` typically reads all of
|
||||
its results and instantiates all of the corresponding objects the
|
||||
first time you access it; ``iterator()`` will instead read results and
|
||||
instantiate objects in discrete chunks, yielding them one at a
|
||||
time. For a ``QuerySet`` which returns a large number of objects, this
|
||||
often results in better performance and a significant reduction in
|
||||
memory use.
|
||||
`iterator`_ over the results. A ``QuerySet`` typically caches its
|
||||
results internally so that repeated evaluations do not result in
|
||||
additional queries; ``iterator()`` will instead read results directly,
|
||||
without doing any caching at the ``QuerySet`` level. For a
|
||||
``QuerySet`` which returns a large number of objects, this often
|
||||
results in better performance and a significant reduction in memory.
|
||||
|
||||
Note that using ``iterator()`` on a ``QuerySet`` which has already
|
||||
been evaluated will force it to evaluate again, repeating the query.
|
||||
|
@ -353,7 +353,7 @@ without any harmful effects, since that is already playing a role in the
|
||||
query.
|
||||
|
||||
This behavior is the same as that noted in the queryset documentation for
|
||||
:ref:`distinct() <querysets-distinct>` and the general rule is the same:
|
||||
:ref:`distinct() <queryset-distinct>` and the general rule is the same:
|
||||
normally you won't want extra columns playing a part in the result, so clear
|
||||
out the ordering, or at least make sure it's restricted only to those fields
|
||||
you also select in a ``values()`` call.
|
||||
|
@ -17,3 +17,4 @@ model maps to a single database table.
|
||||
sql
|
||||
transactions
|
||||
multi-db
|
||||
optimization
|
||||
|
263
docs/topics/db/optimization.txt
Normal file
@ -0,0 +1,263 @@
|
||||
.. _topics-db-optimization:
|
||||
|
||||
============================
|
||||
Database access optimization
|
||||
============================
|
||||
|
||||
Django's database layer provides various ways to help developers get the most
out of their databases. This document gathers together links to the relevant
documentation, and adds various tips, organized under a number of headings that
outline the steps to take when attempting to optimize your database usage.
|
||||
|
||||
Profile first
|
||||
=============
|
||||
|
||||
As general programming practice, this goes without saying. Find out :ref:`what
|
||||
queries you are doing and what they are costing you
|
||||
<faq-see-raw-sql-queries>`. You may also want to use an external project like
|
||||
'django-debug-toolbar', or a tool that monitors your database directly.
|
||||
|
||||
Remember that you may be optimizing for speed or memory or both, depending on
|
||||
your requirements. Sometimes optimizing for one will be detrimental to the
|
||||
other, but sometimes they will help each other. Also, work that is done by the
|
||||
database process might not have the same cost (to you) as the same amount of
|
||||
work done in your Python process. It is up to you to decide what your
|
||||
priorities are, where the balance must lie, and profile all of these as required
|
||||
since this will depend on your application and server.
|
||||
|
||||
With everything that follows, remember to profile after every change to ensure
|
||||
that the change is a benefit, and a big enough benefit given the decrease in
|
||||
readability of your code. **All** of the suggestions below come with the caveat
|
||||
that in your circumstances the general principle might not apply, or might even
|
||||
be reversed.
|
||||
|
||||
Use standard DB optimization techniques
|
||||
=======================================
|
||||
|
||||
...including:
|
||||
|
||||
* Indexes. This is the number one priority, *after* you have determined from
  profiling what indexes should be added. Use
  :attr:`django.db.models.Field.db_index` to add these from Django.
|
||||
|
||||
* Appropriate use of field types.
|
||||
|
||||
We will assume you have done the obvious things above. The rest of this document
|
||||
focuses on how to use Django in such a way that you are not doing unnecessary
|
||||
work. This document also does not address other optimization techniques that
|
||||
apply to all expensive operations, such as :ref:`general purpose caching
|
||||
<topics-cache>`.
|
||||
|
||||
Understand QuerySets
|
||||
====================
|
||||
|
||||
Understanding :ref:`QuerySets <ref-models-querysets>` is vital to getting good
|
||||
performance with simple code. In particular:
|
||||
|
||||
Understand QuerySet evaluation
|
||||
------------------------------
|
||||
|
||||
To avoid performance problems, it is important to understand:
|
||||
|
||||
* that :ref:`QuerySets are lazy <querysets-are-lazy>`.
|
||||
|
||||
* when :ref:`they are evaluated <when-querysets-are-evaluated>`.
|
||||
|
||||
* how :ref:`the data is held in memory <caching-and-querysets>`.
|
||||
|
||||
Understand cached attributes
|
||||
----------------------------
|
||||
|
||||
As well as caching of the whole ``QuerySet``, there is caching of the result of
|
||||
attributes on ORM objects. In general, attributes that are not callable will be
|
||||
cached. For example, assuming the :ref:`example weblog models
|
||||
<queryset-model-example>`:
|
||||
|
||||
>>> entry = Entry.objects.get(id=1)
|
||||
>>> entry.blog # Blog object is retrieved at this point
|
||||
>>> entry.blog # cached version, no DB access
|
||||
|
||||
But in general, callable attributes cause DB lookups every time::
|
||||
|
||||
    >>> entry = Entry.objects.get(id=1)
    >>> entry.authors.all() # query performed
    >>> entry.authors.all() # query performed again
|
||||
|
||||
Be careful when reading template code - the template system does not allow use
|
||||
of parentheses, but will call callables automatically, hiding the above
|
||||
distinction.
|
||||
|
||||
Be careful with your own custom properties - it is up to you to implement
|
||||
caching.
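One way to cache a custom property is ``functools.cached_property`` from the Python standard library (Python 3.8 and later). The sketch below is plain Python rather than Django code: the ``Report`` class and its ``_fake_query`` helper are hypothetical stand-ins for a model whose property would otherwise hit the database on every access.

```python
from functools import cached_property


class Report:
    """Hypothetical stand-in for a model with an expensive custom property."""

    def __init__(self):
        self.query_count = 0  # counts simulated database hits

    def _fake_query(self):
        # Assumption: stands in for a real database lookup.
        self.query_count += 1
        return ["row-1", "row-2"]

    @property
    def rows_uncached(self):
        # Runs the lookup on every access.
        return self._fake_query()

    @cached_property
    def rows_cached(self):
        # Runs the lookup once; the result is then stored on the instance.
        return self._fake_query()


report = Report()
report.rows_uncached
report.rows_uncached
uncached_hits = report.query_count  # lookups triggered by the plain property

report.rows_cached
report.rows_cached
cached_hits = report.query_count - uncached_hits  # lookups triggered by the cached one
```

The plain property repeats the lookup on each access, while the cached property pays for it only once per instance.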
|
||||
|
||||
Use the ``with`` template tag
|
||||
-----------------------------
|
||||
|
||||
To make use of the caching behaviour of ``QuerySet``, you may need to use the
|
||||
:ttag:`with` template tag.
|
||||
|
||||
Use ``iterator()``
|
||||
------------------
|
||||
|
||||
When you have a lot of objects, the caching behaviour of the ``QuerySet`` can
|
||||
cause a large amount of memory to be used. In this case,
|
||||
:ref:`QuerySet.iterator() <queryset-iterator>` may help.
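The memory trade-off can be illustrated outside Django with the standard library's ``sqlite3`` module. This is only an analogy, under the assumption that ``fetchall()`` behaves like a fully cached result set and that iterating the cursor behaves like streaming results; the ``entry`` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO entry (headline) VALUES (?)",
    [("headline %d" % i,) for i in range(1000)],
)

# All rows are held in one list at once, like a filled result cache.
all_rows = conn.execute("SELECT id, headline FROM entry").fetchall()

# Rows are consumed one at a time and never stored together.
total_length = 0
for _id, headline in conn.execute("SELECT id, headline FROM entry"):
    total_length += len(headline)

conn.close()
```

The streaming loop cannot be replayed without running the query again, which mirrors the cost of giving up the cache.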
|
||||
|
||||
Do database work in the database rather than in Python
|
||||
======================================================
|
||||
|
||||
For instance:
|
||||
|
||||
* At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
  filtering in the database, so that you avoid loading data into your Python
  process only to throw much of it away.
|
||||
|
||||
* Use :ref:`F() object query expressions <query-expressions>` to do filtering
|
||||
against other fields within the same model.
|
||||
|
||||
* Use :ref:`annotate to do aggregation in the database <topics-db-aggregation>`.
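The difference between the two approaches can be sketched with plain SQL through the standard library's ``sqlite3`` module. This is not ORM code; the ``entry`` table and its ``rating`` column are hypothetical, and the ``WHERE`` and ``GROUP BY`` clauses play roughly the role that ``filter()`` and ``annotate()`` play in the points above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (blog_id, rating) VALUES (?, ?)",
    [(1, 5), (1, 3), (2, 4), (2, 1), (2, 5)],
)

# Wasteful: load every row, then filter and aggregate in Python.
rows = conn.execute("SELECT blog_id, rating FROM entry").fetchall()
python_counts = {}
for blog_id, rating in rows:
    if rating >= 4:
        python_counts[blog_id] = python_counts.get(blog_id, 0) + 1

# Better: let the database filter (WHERE) and aggregate (GROUP BY).
db_counts = dict(
    conn.execute(
        "SELECT blog_id, COUNT(*) FROM entry WHERE rating >= 4 GROUP BY blog_id"
    )
)
conn.close()
```

Both versions produce the same mapping, but the second transfers one row per blog instead of one row per entry.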
|
||||
|
||||
If these aren't enough to generate the SQL you need:
|
||||
|
||||
Use ``QuerySet.extra()``
|
||||
------------------------
|
||||
|
||||
A less portable but more powerful method is :ref:`QuerySet.extra()
|
||||
<queryset-extra>`, which allows some SQL to be explicitly added to the query.
|
||||
If that still isn't powerful enough:
|
||||
|
||||
Use raw SQL
|
||||
-----------
|
||||
|
||||
Write your own :ref:`custom SQL to retrieve data or populate models
|
||||
<topics-db-sql>`. Use ``django.db.connection.queries`` to find out what Django
|
||||
is writing for you and start from there.
|
||||
|
||||
Retrieve everything at once if you know you will need it
|
||||
========================================================
|
||||
|
||||
Hitting the database multiple times for different parts of a single 'set' of
data, all of which you will need, is in general less efficient than retrieving
it all in one query. This is particularly important if you have a query that is
executed in a loop, and could therefore end up doing many database queries when
only one was needed. So:
|
||||
|
||||
Use ``QuerySet.select_related()``
|
||||
---------------------------------
|
||||
|
||||
Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it:
|
||||
|
||||
* in view code,
|
||||
|
||||
* and in :ref:`managers and default managers <topics-db-managers>` where
|
||||
appropriate. Be aware when your manager is and is not used; sometimes this is
|
||||
tricky so don't make assumptions.
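The cost of a query in a loop, compared with a single joined query, can be sketched with the standard library's ``sqlite3`` module. This is plain SQL rather than ``select_related()`` itself; the ``blog`` and ``entry`` tables are hypothetical, and ``set_trace_callback`` is used only to record the statements that run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER, headline TEXT);
    INSERT INTO blog (id, name) VALUES (1, 'Beatles Blog'), (2, 'Cheddar Talk');
    INSERT INTO entry (blog_id, headline) VALUES (1, 'a'), (1, 'b'), (2, 'c');
    """
)

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement run

# One query for the entries, then one more per entry for its blog.
looped = []
entries = conn.execute("SELECT headline, blog_id FROM entry ORDER BY id").fetchall()
for headline, blog_id in entries:
    name = conn.execute(
        "SELECT name FROM blog WHERE id = ?", (blog_id,)
    ).fetchone()[0]
    looped.append((headline, name))
looped_queries = len(statements)

statements.clear()

# A single JOIN fetches the same data in one statement.
joined = conn.execute(
    "SELECT entry.headline, blog.name FROM entry "
    "JOIN blog ON blog.id = entry.blog_id ORDER BY entry.id"
).fetchall()
joined_queries = len(statements)
conn.close()
```

The looped version grows by one statement per entry, while the joined version stays at a single statement regardless of how many entries there are.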
|
||||
|
||||
Don't retrieve things you don't need
|
||||
====================================
|
||||
|
||||
Use ``QuerySet.values()`` and ``values_list()``
|
||||
-----------------------------------------------
|
||||
|
||||
When you just want a dict/list of values, and don't need ORM model objects, make
|
||||
appropriate usage of :ref:`QuerySet.values() <queryset-values>`.
|
||||
These can be useful for replacing model objects in template code - as long as
|
||||
the dicts you supply have the same attributes as those used in the template, you
|
||||
are fine.
|
||||
|
||||
Use ``QuerySet.defer()`` and ``only()``
|
||||
---------------------------------------
|
||||
|
||||
Use :ref:`defer() and only() <queryset-defer>` if there are database columns you
know that you won't need (or won't need in most cases) to avoid loading
them. Note that if you *do* access the deferred columns, the ORM will have to go
and get them in a separate query, making this a pessimization if you use it
inappropriately.
|
||||
|
||||
Use ``QuerySet.count()``
------------------------
|
||||
|
||||
...if you only want the count, rather than doing ``len(queryset)``.
|
||||
|
||||
Use ``QuerySet.exists()``
-------------------------
|
||||
|
||||
...if you only want to find out if at least one result exists, rather than ``if
|
||||
queryset``.
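What counting and existence checks save can be sketched in plain SQL with the standard library's ``sqlite3`` module. This is an illustration under the assumption that the database does the counting, not a description of the SQL Django generates; the ``entry`` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO entry (headline) VALUES (?)",
    [("Test",), ("Test",), ("Other",)],
)

# Counting in Python transfers every matching row first.
count_in_python = len(
    conn.execute("SELECT * FROM entry WHERE headline = ?", ("Test",)).fetchall()
)

# Counting in the database transfers a single number.
count_in_db = conn.execute(
    "SELECT COUNT(*) FROM entry WHERE headline = ?", ("Test",)
).fetchone()[0]

# An existence check can stop at the first matching row.
has_match = bool(
    conn.execute(
        "SELECT EXISTS(SELECT 1 FROM entry WHERE headline = ?)", ("Test",)
    ).fetchone()[0]
)
conn.close()
```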
|
||||
|
||||
But:
|
||||
|
||||
Don't overuse ``count()`` and ``exists()``
|
||||
------------------------------------------
|
||||
|
||||
If you are going to need other data from the QuerySet, just evaluate it.
|
||||
|
||||
For example, assuming an Email class that has a ``body`` attribute and a
|
||||
many-to-many relation to User, the following template code is optimal:
|
||||
|
||||
.. code-block:: html+django
|
||||
|
||||
   {% if display_inbox %}
     {% with user.emails.all as emails %}
       {% if emails %}
         <p>You have {{ emails|length }} email(s)</p>
         {% for email in emails %}
           <p>{{ email.body }}</p>
         {% endfor %}
       {% else %}
         <p>No messages today.</p>
       {% endif %}
     {% endwith %}
   {% endif %}
|
||||
|
||||
|
||||
It is optimal because:
|
||||
|
||||
1. Since QuerySets are lazy, this does no database queries if 'display_inbox' is False.
|
||||
|
||||
#. Use of ``with`` means that we store ``user.emails.all`` in a variable for
|
||||
later use, allowing its cache to be re-used.
|
||||
|
||||
#. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called,
   which causes the ``user.emails.all()`` query to be run on the database, and
   at least the first row to be turned into an ORM object. If there aren't
   any results, it will return False, otherwise True.
|
||||
|
||||
#. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
|
||||
out the rest of the cache without doing another query.
|
||||
|
||||
#. The ``for`` loop iterates over the already filled cache.
|
||||
|
||||
In total, this code does either one or zero database queries. The only
|
||||
deliberate optimization performed is the use of the ``with`` tag. Using
|
||||
``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
|
||||
additional queries.
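The reasoning above can be mimicked with a toy class in plain Python. It is not Django's implementation: ``LazyResult`` is hypothetical, it evaluates everything on first use rather than in chunks, and it uses ``__bool__``, the Python 3 name for ``__nonzero__``.

```python
class LazyResult:
    """Toy model of a lazy, caching result set (not Django's implementation)."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable standing in for the database query
        self._cache = None    # filled on first evaluation
        self.queries = 0      # number of times the "query" has run

    def _evaluate(self):
        if self._cache is None:
            self.queries += 1
            self._cache = list(self._fetch())
        return self._cache

    def __bool__(self):
        return bool(self._evaluate())

    def __len__(self):
        return len(self._evaluate())

    def __iter__(self):
        return iter(self._evaluate())


emails = LazyResult(lambda: ["hello", "meeting at 3"])
queries_before = emails.queries         # nothing has run yet

if emails:                              # like {% if emails %}
    count = len(emails)                 # like {{ emails|length }}
    bodies = [body for body in emails]  # like the for loop

queries_after = emails.queries
```

The truth test, the length and the loop all share one cached evaluation, which is why keeping a single object in a variable matters.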
|
||||
|
||||
Use ``QuerySet.update()`` and ``delete()``
|
||||
------------------------------------------
|
||||
|
||||
Rather than retrieve a load of objects, set some values, and save them
individually, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
<topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
<topics-db-queries-delete>` where possible.
|
||||
|
||||
Note, however, that these bulk update methods cannot call the ``save()`` or ``delete()``
|
||||
methods of individual instances, which means that any custom behaviour you have
|
||||
added for these methods will not be executed, including anything driven from the
|
||||
normal database object :ref:`signals <ref-signals>`.
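The difference between row-by-row saves and a bulk statement can be sketched in plain SQL with the standard library's ``sqlite3`` module. This is an illustration of the underlying idea, not of the SQL Django generates; the ``entry`` table and ``rating`` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO entry (rating) VALUES (?)", [(1,), (2,), (3,)])

# Row by row: one SELECT, then one UPDATE per row.
for entry_id, rating in conn.execute("SELECT id, rating FROM entry").fetchall():
    conn.execute(
        "UPDATE entry SET rating = ? WHERE id = ?", (rating + 1, entry_id)
    )

# Bulk: a single UPDATE statement changes every row.
cursor = conn.execute("UPDATE entry SET rating = rating + 1")
rows_changed = cursor.rowcount

ratings = [r for (r,) in conn.execute("SELECT rating FROM entry ORDER BY id")]
conn.close()
```

The bulk statement never loads the rows into Python, which is also why per-object hooks have no chance to run.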
|
||||
|
||||
Don't retrieve things you already have
|
||||
======================================
|
||||
|
||||
Use foreign key values directly
|
||||
-------------------------------
|
||||
|
||||
If you only need a foreign key value, use the foreign key value that is already on
|
||||
the object you've got, rather than getting the whole related object and taking
|
||||
its primary key. i.e. do::
|
||||
|
||||
    entry.blog_id
|
||||
|
||||
instead of::
|
||||
|
||||
    entry.blog.id
|
||||
|