============================
Database access optimization
============================

Django's database layer provides various ways to help developers get the most
out of their databases. This document gathers together links to the relevant
documentation, and adds various tips, organized under a number of headings that
outline the steps to take when attempting to optimize your database usage.

Profile first
=============

As general programming practice, this goes without saying. Find out :ref:`what
queries you are doing and what they are costing you
<faq-see-raw-sql-queries>`. You may also want to use an external project like
django-debug-toolbar_, or a tool that monitors your database directly.

Remember that you may be optimizing for speed or memory or both, depending on
your requirements. Sometimes optimizing for one will be detrimental to the
other, but sometimes they will help each other. Also, work that is done by the
database process might not have the same cost (to you) as the same amount of
work done in your Python process. It is up to you to decide what your
priorities are and where the balance must lie, and to profile all of these as
required, since this will depend on your application and server.

With everything that follows, remember to profile after every change to ensure
that the change is a benefit, and a big enough benefit given the decrease in
readability of your code. **All** of the suggestions below come with the caveat
that in your circumstances the general principle might not apply, or might even
be reversed.

.. _django-debug-toolbar: https://github.com/django-debug-toolbar/django-debug-toolbar/

Use standard DB optimization techniques
=======================================

...including:

* Indexes. This is a number one priority, *after* you have determined from
  profiling what indexes should be added. Use
  :attr:`django.db.models.Field.db_index` to add these from Django.

* Appropriate use of field types.

We will assume you have done the obvious things above. The rest of this document
focuses on how to use Django in such a way that you are not doing unnecessary
work. This document also does not address other optimization techniques that
apply to all expensive operations, such as :doc:`general purpose caching
</topics/cache>`.

Understand QuerySets
====================

Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good
performance with simple code. In particular:

Understand QuerySet evaluation
------------------------------

To avoid performance problems, it is important to understand:

* that :ref:`QuerySets are lazy <querysets-are-lazy>`.

* when :ref:`they are evaluated <when-querysets-are-evaluated>`.

* how :ref:`the data is held in memory <caching-and-querysets>`.

Understand cached attributes
----------------------------

As well as caching of the whole ``QuerySet``, there is caching of the result of
attributes on ORM objects. In general, attributes that are not callable will be
cached. For example, assuming the :ref:`example Weblog models
<queryset-model-example>`::

  >>> entry = Entry.objects.get(id=1)
  >>> entry.blog   # Blog object is retrieved at this point
  >>> entry.blog   # cached version, no DB access

But in general, callable attributes cause DB lookups every time::

  >>> entry = Entry.objects.get(id=1)
  >>> entry.authors.all()   # query performed
  >>> entry.authors.all()   # query performed again

Be careful when reading template code - the template system does not allow use
of parentheses, but will call callables automatically, hiding the above
distinction.

Be careful with your own custom properties - it is up to you to implement
caching.
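
One possible way to cache a custom property is the standard library's
``functools.cached_property``. The sketch below uses a hypothetical plain-Python
``Entry`` class (not a Django model), and ``_expensive_lookup`` is a made-up
stand-in for whatever query or computation the property would perform:

```python
from functools import cached_property


class Entry:
    """Hypothetical stand-in for a model with an expensive derived value."""

    def __init__(self, headline):
        self.headline = headline
        self.lookups = 0  # counts how often the expensive work actually runs

    def _expensive_lookup(self):
        # Placeholder for a database query or other slow computation.
        self.lookups += 1
        return len(self.headline.split())

    @cached_property
    def word_count(self):
        # Runs once per instance; the result is then stored on the instance.
        return self._expensive_lookup()


entry = Entry("Python 3.0 Released")
first = entry.word_count   # computed here
second = entry.word_count  # served from the per-instance cache
```

The cached value lives on the instance and is not invalidated automatically;
``del entry.word_count`` clears it so the next access recomputes it. Django
also provides ``django.utils.functional.cached_property``, which behaves
similarly.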

Use the ``with`` template tag
-----------------------------

To make use of the caching behavior of ``QuerySet``, you may need to use the
:ttag:`with` template tag.

Use ``iterator()``
------------------

When you have a lot of objects, the caching behavior of the ``QuerySet`` can
cause a large amount of memory to be used. In this case,
:meth:`~django.db.models.query.QuerySet.iterator()` may help.

Do database work in the database rather than in Python
======================================================

For instance:

* At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
  filtering in the database.

* Use :ref:`F() object query expressions <query-expressions>` to do filtering
  against other fields within the same model.

* Use :doc:`annotate to do aggregation in the database </topics/db/aggregation>`.
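
The principle behind the list above can be seen without the ORM. The following
is a minimal sketch using the standard library's ``sqlite3`` module rather than
Django; the ``entry`` table and its ``n_comments`` and ``n_pingbacks`` columns
are assumptions made for illustration. It compares filtering and summing in
Python with doing the same comparison between two columns, and the aggregation,
in SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "id INTEGER PRIMARY KEY, n_comments INTEGER, n_pingbacks INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (n_comments, n_pingbacks) VALUES (?, ?)",
    [(5, 2), (1, 4), (8, 8), (3, 0)],
)

# Python-side: every row is transferred, then filtered and summed in Python.
rows = conn.execute("SELECT n_comments, n_pingbacks FROM entry").fetchall()
python_total = sum(c for c, p in rows if c > p)

# Database-side: compare two columns of the same row and aggregate in SQL,
# so only a single value comes back.
(db_total,) = conn.execute(
    "SELECT COALESCE(SUM(n_comments), 0) FROM entry "
    "WHERE n_comments > n_pingbacks"
).fetchone()
```

Both approaches give the same total, but the second one returns one value
instead of the whole table.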

If these aren't enough to generate the SQL you need:

Use ``QuerySet.extra()``
------------------------

A less portable but more powerful method is
:meth:`~django.db.models.query.QuerySet.extra()`, which allows some SQL to be
explicitly added to the query. If that still isn't powerful enough:

Use raw SQL
-----------

Write your own :doc:`custom SQL to retrieve data or populate models
</topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django
is writing for you and start from there.

Retrieve individual objects using a unique, indexed column
==========================================================

There are two reasons to use a column with
:attr:`~django.db.models.Field.unique` or
:attr:`~django.db.models.Field.db_index` when using
:meth:`~django.db.models.query.QuerySet.get` to retrieve individual objects.
First, the query will be quicker because of the underlying database index.
Also, the query could run much slower if multiple objects match the lookup;
having a unique constraint on the column guarantees this will never happen.

So using the :ref:`example Weblog models <queryset-model-example>`::

  >>> entry = Entry.objects.get(id=10)

will be quicker than::

  >>> entry = Entry.objects.get(headline="News Item Title")

because ``id`` is indexed by the database and is guaranteed to be unique.

Doing the following is potentially quite slow::

  >>> entry = Entry.objects.get(headline__startswith="News")

First of all, ``headline`` is not indexed, which will make the underlying
database fetch slower.

Second, the lookup doesn't guarantee that only one object will be returned.
If the query matches more than one object, it will retrieve and transfer all of
them from the database. This penalty could be substantial if hundreds or
thousands of records are returned. The penalty will be compounded if the
database lives on a separate server, where network overhead and latency also
play a part.
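
To see the difference an index makes, you can ask the database for its query
plan. The following is a minimal sketch using the standard library's
``sqlite3`` module rather than Django; the ``entry`` table is an assumption
that loosely mirrors the example model, and ``plan()`` is a hypothetical
helper. The exact wording of the plan varies between SQLite versions, so treat
the output as indicative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO entry (headline) VALUES (?)",
    [(f"News Item {i}",) for i in range(1000)],
)


def plan(sql, params=()):
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " / ".join(row[-1] for row in rows)


# Lookup on the primary key: the plan is a keyed search.
by_id = plan("SELECT * FROM entry WHERE id = ?", (10,))

# Prefix match on a column with no index: the plan is a full table scan.
by_headline = plan("SELECT * FROM entry WHERE headline LIKE ?", ("News%",))

print(by_id)
print(by_headline)
```

The first plan should mention a ``SEARCH`` using the primary key, while the
second should report a ``SCAN`` of the whole table. Other databases expose
similar information through their own ``EXPLAIN`` commands.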

Retrieve everything at once if you know you will need it
========================================================

Hitting the database multiple times for different parts of a single 'set' of
data that you will need all parts of is, in general, less efficient than
retrieving it all in one query. This is particularly important if you have a
query that is executed in a loop, and could therefore end up doing many database
queries, when only one was needed. So:

Use ``QuerySet.select_related()`` and ``prefetch_related()``
------------------------------------------------------------

Understand :meth:`~django.db.models.query.QuerySet.select_related` and
:meth:`~django.db.models.query.QuerySet.prefetch_related` thoroughly, and use
them:

* in view code,

* and in :doc:`managers and default managers </topics/db/managers>` where
  appropriate. Be aware when your manager is and is not used; sometimes this is
  tricky so don't make assumptions.
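
The cost of a query executed in a loop can be illustrated outside the ORM. The
following is a minimal sketch using the standard library's ``sqlite3`` module
rather than Django; the ``blog`` and ``entry`` tables and the
``record_select`` callback are assumptions made for illustration. It counts
the ``SELECT`` statements needed to pair each entry with its blog name, first
with one lookup per entry and then with a single ``JOIN``, which is roughly
the kind of query ``select_related()`` produces for a foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        blog_id INTEGER REFERENCES blog (id),
        headline TEXT
    );
    INSERT INTO blog (id, name) VALUES (1, 'Beatles Blog');
    INSERT INTO blog (id, name) VALUES (2, 'Cheddar Talk');
    INSERT INTO entry (blog_id, headline) VALUES (1, 'First');
    INSERT INTO entry (blog_id, headline) VALUES (2, 'Second');
    INSERT INTO entry (blog_id, headline) VALUES (1, 'Third');
    """
)

selects = []  # SELECT statements seen by SQLite from here on


def record_select(statement):
    # Keep only SELECTs so that other statements are not counted.
    if statement.lstrip().upper().startswith("SELECT"):
        selects.append(statement)


conn.set_trace_callback(record_select)

# One query for the entries, then one more per entry for its blog.
looped = []
entries = conn.execute(
    "SELECT headline, blog_id FROM entry ORDER BY id"
).fetchall()
for headline, blog_id in entries:
    (name,) = conn.execute(
        "SELECT name FROM blog WHERE id = ?", (blog_id,)
    ).fetchone()
    looped.append((headline, name))
loop_queries = len(selects)

selects.clear()

# A single JOIN fetches the same data in one query.
joined = conn.execute(
    "SELECT entry.headline, blog.name FROM entry "
    "JOIN blog ON blog.id = entry.blog_id ORDER BY entry.id"
).fetchall()
join_queries = len(selects)
```

With three entries the loop issues four queries and the join issues one. The
gap grows with the number of rows, which is why the looped form is worth
looking for when profiling.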

Don't retrieve things you don't need
====================================

Use ``QuerySet.values()`` and ``values_list()``
-----------------------------------------------

When you just want a ``dict`` or ``list`` of values, and don't need ORM model
objects, make appropriate use of
:meth:`~django.db.models.query.QuerySet.values()` or
:meth:`~django.db.models.query.QuerySet.values_list()`.
These can be useful for replacing model objects in template code - as long as
the dicts you supply have the same attributes as those used in the template,
you are fine.

Use ``QuerySet.defer()`` and ``only()``
---------------------------------------

Use :meth:`~django.db.models.query.QuerySet.defer()` and
:meth:`~django.db.models.query.QuerySet.only()` if there are database columns
you know that you won't need (or won't need in most cases) to avoid loading
them. Note that if you *do* later access a deferred column, the ORM will have
to fetch it in a separate query, making this a pessimization if you use it
inappropriately.

Also, be aware that there is some (small extra) overhead incurred inside
Django when constructing a model with deferred fields. Don't be too aggressive
in deferring fields without profiling as the database has to read most of the
non-text, non-VARCHAR data from the disk for a single row in the results, even
if it ends up only using a few columns. The ``defer()`` and ``only()`` methods
are most useful when you can avoid loading a lot of text data or for fields
that might take a lot of processing to convert back to Python. As always,
profile first, then optimize.

Use ``QuerySet.count()``
------------------------

...if you only want the count, rather than doing ``len(queryset)``.

Use ``QuerySet.exists()``
-------------------------

...if you only want to find out if at least one result exists, rather than
``if queryset``.
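
The difference between counting in Python and counting in the database can be
sketched with the standard library's ``sqlite3`` module. This is an
illustration under assumptions, not Django code: the ``entry`` table is made
up, and the SQL that Django generates for ``count()`` and ``exists()`` may
differ in detail:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO entry (headline) VALUES (?)",
    [(f"Headline {i}",) for i in range(500)],
)

# Counting in Python transfers all 500 rows just to measure the list.
python_count = len(conn.execute("SELECT id, headline FROM entry").fetchall())

# Counting in the database returns a single integer.
(db_count,) = conn.execute("SELECT COUNT(*) FROM entry").fetchone()

# An existence check lets the database stop at the first matching row.
(has_match,) = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM entry WHERE headline LIKE 'Headline 4%')"
).fetchone()
```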

But:

Don't overuse ``count()`` and ``exists()``
------------------------------------------

If you are going to need other data from the QuerySet, just evaluate it.

For example, assuming an Email model that has a ``body`` attribute and a
many-to-many relation to User, the following template code is optimal:

.. code-block:: html+django

   {% if display_inbox %}
     {% with emails=user.emails.all %}
       {% if emails %}
         <p>You have {{ emails|length }} email(s)</p>
         {% for email in emails %}
           <p>{{ email.body }}</p>
         {% endfor %}
       {% else %}
         <p>No messages today.</p>
       {% endif %}
     {% endwith %}
   {% endif %}

It is optimal because:

1. Since QuerySets are lazy, this does no database queries if
   ``display_inbox`` is ``False``.

#. Use of :ttag:`with` means that we store ``user.emails.all`` in a variable
   for later use, allowing its cache to be re-used.

#. The line ``{% if emails %}`` causes ``QuerySet.__bool__()`` to be called,
   which causes the ``user.emails.all()`` query to be run on the database, and
   at least the first row to be turned into an ORM object. If there aren't
   any results, it will return ``False``, otherwise ``True``.

#. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
   out the rest of the cache without doing another query.

#. The :ttag:`for` loop iterates over the already filled cache.

In total, this code does either one or zero database queries. The only
deliberate optimization performed is the use of the :ttag:`with` tag. Using
``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
additional queries.

Use ``QuerySet.update()`` and ``delete()``
------------------------------------------

Rather than retrieve a load of objects, set some values, and save them
individually, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
<topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
<topics-db-queries-delete>` where possible.

Note, however, that these bulk update methods cannot call the ``save()`` or
``delete()`` methods of individual instances, which means that any custom
behavior you have added for these methods will not be executed, including
anything driven from the normal database object :doc:`signals </ref/signals>`.
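
The saving from a bulk UPDATE can be sketched at the SQL level. The following
uses the standard library's ``sqlite3`` module rather than Django, and the
``entry`` table with its ``rating`` column is an assumption made for
illustration. It first updates matching rows one at a time, then changes a
whole set of rows with a single statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (headline, rating) VALUES (?, ?)",
    [(f"Headline {i}", i % 5) for i in range(100)],
)

# Row at a time: one SELECT, then one UPDATE per matching row.
ids = [row[0] for row in conn.execute("SELECT id FROM entry WHERE rating = 0")]
for entry_id in ids:
    conn.execute("UPDATE entry SET rating = 1 WHERE id = ?", (entry_id,))
per_row_statements = 1 + len(ids)

# Bulk: a single UPDATE statement changes every matching row.
cursor = conn.execute("UPDATE entry SET rating = 2 WHERE rating = 1")
bulk_rows_changed = cursor.rowcount
conn.commit()
```

Here the row-at-a-time version needs 21 statements to change 20 rows, while
the bulk statement changes 40 rows in one go.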

Use foreign key values directly
-------------------------------

If you only need a foreign key value, use the foreign key value that is already on
the object you've got, rather than getting the whole related object and taking
its primary key. i.e. do::

   entry.blog_id

instead of::

   entry.blog.id

Insert in bulk
==============

When creating objects, where possible, use the
:meth:`~django.db.models.query.QuerySet.bulk_create()` method to reduce the
number of SQL queries. For example::

    Entry.objects.bulk_create([
        Entry(headline="Python 3.0 Released"),
        Entry(headline="Python 3.1 Planned")
    ])

...is preferable to::

    Entry.objects.create(headline="Python 3.0 Released")
    Entry.objects.create(headline="Python 3.1 Planned")

Note that there are a number of :meth:`caveats to this method
<django.db.models.query.QuerySet.bulk_create>`, so make sure it's appropriate
for your use case.

This also applies to :class:`ManyToManyFields
<django.db.models.ManyToManyField>`, so doing::

    my_band.members.add(me, my_friend)

...is preferable to::

    my_band.members.add(me)
    my_band.members.add(my_friend)

...where ``Bands`` and ``Artists`` have a many-to-many relationship.
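
The reduction in statements from inserting in bulk can be sketched at the SQL
level. The following uses the standard library's ``sqlite3`` module rather
than Django; the ``entry`` table and the ``record_insert`` callback are
assumptions made for illustration, and the SQL that ``bulk_create()`` actually
emits depends on the database backend:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")

inserts = []  # INSERT statements seen by SQLite from here on


def record_insert(statement):
    # Keep only INSERTs so that transaction bookkeeping is not counted.
    if statement.lstrip().upper().startswith("INSERT"):
        inserts.append(statement)


conn.set_trace_callback(record_insert)

headlines = ["Python 3.0 Released", "Python 3.1 Planned"]

# One INSERT statement per row.
for headline in headlines:
    conn.execute("INSERT INTO entry (headline) VALUES (?)", (headline,))
per_row_inserts = len(inserts)

inserts.clear()

# One multi-row INSERT statement for the whole batch.
placeholders = ", ".join("(?)" for _ in headlines)
conn.execute(f"INSERT INTO entry (headline) VALUES {placeholders}", headlines)
batch_inserts = len(inserts)
conn.commit()

(row_count,) = conn.execute("SELECT COUNT(*) FROM entry").fetchone()
```

Both approaches store the same rows; the batch form simply sends fewer
statements to the database.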