mirror of
				https://github.com/django/django.git
				synced 2025-10-31 09:41:08 +00:00 
			
		
		
		
	[1.1.X] Created a 'DB optimization' topic, with cross-refs to relevant sections.
Also fixed #10291, which was related, and cleaned up some inconsistent doc labels.

Backport of r12229 from trunk.

git-svn-id: http://code.djangoproject.com/svn/django/branches/releases/1.1.X@12230 bcc190cf-cafb-0310-a4f2-bffc1f526a37
		| @@ -3,6 +3,8 @@ | ||||
| FAQ: Databases and models | ||||
| ========================= | ||||
|  | ||||
| .. _faq-see-raw-sql-queries: | ||||
|  | ||||
| How can I see the raw SQL queries Django is running? | ||||
| ---------------------------------------------------- | ||||
|  | ||||
|   | ||||
| @@ -70,7 +70,8 @@ The model layer | ||||
|     * **Other:** | ||||
|       :ref:`Supported databases <ref-databases>` | | ||||
|       :ref:`Legacy databases <howto-legacy-databases>` | | ||||
|       :ref:`Providing initial data <howto-initial-data>` | ||||
|       :ref:`Providing initial data <howto-initial-data>` | | ||||
|       :ref:`Optimize database access <topics-db-optimization>` | ||||
|  | ||||
| The template layer | ||||
| ================== | ||||
|   | ||||
| @@ -66,6 +66,18 @@ You can evaluate a ``QuerySet`` in the following ways: | ||||
|       iterating over a ``QuerySet`` will take advantage of your database to | ||||
|       load data and instantiate objects only as you need them. | ||||
|  | ||||
|     * **bool().** Testing a ``QuerySet`` in a boolean context, such as using | ||||
|       ``bool()``, ``or``, ``and`` or an ``if`` statement, will cause the query | ||||
|       to be executed. If there is at least one result, the ``QuerySet`` is | ||||
|       ``True``, otherwise ``False``. For example:: | ||||
|  | ||||
|           if Entry.objects.filter(headline="Test"): | ||||
|              print "There is at least one Entry with the headline Test" | ||||
|  | ||||
|       Note: *Don't* use this if all you want to do is determine if at least one | ||||
|       result exists, and don't need the actual objects. It's more efficient to | ||||
|       use ``exists()`` (see below). | ||||
|  | ||||
| .. _pickling QuerySets: | ||||
|  | ||||
| Pickling QuerySets | ||||
| @@ -302,7 +314,7 @@ a model which defines a default ordering, or when using | ||||
| ordering was undefined prior to calling ``reverse()``, and will remain | ||||
| undefined afterward). | ||||
|  | ||||
| .. _querysets-distinct: | ||||
| .. _queryset-distinct: | ||||
|  | ||||
| ``distinct()`` | ||||
| ~~~~~~~~~~~~~~ | ||||
| @@ -336,6 +348,8 @@ query spans multiple tables, it's possible to get duplicate results when a | ||||
|     ``values()`` call. | ||||
|  | ||||
|  | ||||
| .. _queryset-values: | ||||
|  | ||||
| ``values(*fields)`` | ||||
| ~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| @@ -616,7 +630,7 @@ call, since they are conflicting options. | ||||
| Both the ``depth`` argument and the ability to specify field names in the call | ||||
| to ``select_related()`` are new in Django version 1.0. | ||||
|  | ||||
| .. _extra: | ||||
| .. _queryset-extra: | ||||
|  | ||||
| ``extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)`` | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| @@ -1043,17 +1057,18 @@ Example:: | ||||
|  | ||||
| If you pass ``in_bulk()`` an empty list, you'll get an empty dictionary. | ||||
|  | ||||
| .. _queryset-iterator: | ||||
|  | ||||
| ``iterator()`` | ||||
| ~~~~~~~~~~~~~~ | ||||
|  | ||||
| Evaluates the ``QuerySet`` (by performing the query) and returns an | ||||
| `iterator`_ over the results. A ``QuerySet`` typically reads all of | ||||
| its results and instantiates all of the corresponding objects the | ||||
| first time you access it; ``iterator()`` will instead read results and | ||||
| instantiate objects in discrete chunks, yielding them one at a | ||||
| time. For a ``QuerySet`` which returns a large number of objects, this | ||||
| often results in better performance and a significant reduction in | ||||
| memory use. | ||||
| `iterator`_ over the results. A ``QuerySet`` typically caches its | ||||
| results internally so that repeated evaluations do not result in | ||||
| additional queries; ``iterator()`` will instead read results directly, | ||||
| without doing any caching at the ``QuerySet`` level. For a | ||||
| ``QuerySet`` which returns a large number of objects, this often | ||||
| results in better performance and a significant reduction in memory use. | ||||
|  | ||||
| Note that using ``iterator()`` on a ``QuerySet`` which has already | ||||
| been evaluated will force it to evaluate again, repeating the query. | ||||
|   | ||||
| @@ -353,7 +353,7 @@ without any harmful effects, since that is already playing a role in the | ||||
| query. | ||||
|  | ||||
| This behavior is the same as that noted in the queryset documentation for | ||||
| :ref:`distinct() <querysets-distinct>` and the general rule is the same: | ||||
| :ref:`distinct() <queryset-distinct>` and the general rule is the same: | ||||
| normally you won't want extra columns playing a part in the result, so clear | ||||
| out the ordering, or at least make sure it's restricted only to those fields | ||||
| you also select in a ``values()`` call. | ||||
|   | ||||
| @@ -16,3 +16,4 @@ model maps to a single database table. | ||||
|    managers | ||||
|    sql | ||||
|    transactions | ||||
|    optimization | ||||
|   | ||||
							
								
								
									
docs/topics/db/optimization.txt (Normal file, 248 lines)
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,248 @@ | ||||
| .. _topics-db-optimization: | ||||
|  | ||||
| ============================ | ||||
| Database access optimization | ||||
| ============================ | ||||
|  | ||||
| Django's database layer provides various ways to help developers get the most | ||||
| out of their databases. This document gathers together links to the relevant | ||||
| documentation, and adds various tips, organized under a number of headings that | ||||
| outline the steps to take when attempting to optimize your database usage. | ||||
|  | ||||
| Profile first | ||||
| ============= | ||||
|  | ||||
| As general programming practice, this goes without saying. Find out :ref:`what | ||||
| queries you are doing and what they are costing you | ||||
| <faq-see-raw-sql-queries>`. You may also want to use an external project like | ||||
| 'django-debug-toolbar', or a tool that monitors your database directly. | ||||
|  | ||||
| Remember that you may be optimizing for speed or memory or both, depending on | ||||
| your requirements. Sometimes optimizing for one will be detrimental to the | ||||
| other, but sometimes they will help each other. Also, work that is done by the | ||||
| database process might not have the same cost (to you) as the same amount of | ||||
| work done in your Python process. It is up to you to decide what your | ||||
| priorities are, where the balance must lie, and profile all of these as required | ||||
| since this will depend on your application and server. | ||||
|  | ||||
| With everything that follows, remember to profile after every change to ensure | ||||
| that the change is a benefit, and a big enough benefit given the decrease in | ||||
| readability of your code. **All** of the suggestions below come with the caveat | ||||
| that in your circumstances the general principle might not apply, or might even | ||||
| be reversed. | ||||
|  | ||||
| Use standard DB optimization techniques | ||||
| ======================================= | ||||
|  | ||||
| ...including: | ||||
|  | ||||
| * Indexes. This is a number one priority, *after* you have determined from | ||||
|   profiling what indexes should be added. Use :attr:`django.db.models.Field.db_index` to add | ||||
|   these from Django. | ||||
|  | ||||
| * Appropriate use of field types. | ||||
|  | ||||
| We will assume you have done the obvious things above. The rest of this document | ||||
| focuses on how to use Django in such a way that you are not doing unnecessary | ||||
| work. This document also does not address other optimization techniques that | ||||
| apply to all expensive operations, such as :ref:`general purpose caching | ||||
| <topics-cache>`. | ||||
|  | ||||
| Understand QuerySets | ||||
| ==================== | ||||
|  | ||||
| Understanding :ref:`QuerySets <ref-models-querysets>` is vital to getting good | ||||
| performance with simple code. In particular: | ||||
|  | ||||
| Understand QuerySet evaluation | ||||
| ------------------------------ | ||||
|  | ||||
| To avoid performance problems, it is important to understand: | ||||
|  | ||||
| * that :ref:`QuerySets are lazy <querysets-are-lazy>`. | ||||
|  | ||||
| * when :ref:`they are evaluated <when-querysets-are-evaluated>`. | ||||
|  | ||||
| * how :ref:`the data is held in memory <caching-and-querysets>`. | ||||
|  | ||||
| Understand cached attributes | ||||
| ---------------------------- | ||||
|  | ||||
| As well as caching of the whole ``QuerySet``, there is caching of the result of | ||||
| attributes on ORM objects. In general, attributes that are not callable will be | ||||
| cached. For example, assuming the :ref:`example weblog models | ||||
| <queryset-model-example>`: | ||||
|  | ||||
|   >>> entry = Entry.objects.get(id=1) | ||||
|   >>> entry.blog   # Blog object is retrieved at this point | ||||
|   >>> entry.blog   # cached version, no DB access | ||||
|  | ||||
| But in general, callable attributes cause DB lookups every time:: | ||||
|  | ||||
|   >>> entry = Entry.objects.get(id=1) | ||||
|   >>> entry.authors.all()   # query performed | ||||
|   >>> entry.authors.all()   # query performed again | ||||
|  | ||||
| Be careful when reading template code - the template system does not allow use | ||||
| of parentheses, but will call callables automatically, hiding the above | ||||
| distinction. | ||||
|  | ||||
| Be careful with your own custom properties - it is up to you to implement | ||||
| caching. | ||||
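As a minimal sketch of what "implementing caching yourself" can look like, the following plain-Python class stores the result of an expensive property on the instance the first time it is read. The ``Entry`` class, ``word_count`` property and ``computations`` counter are hypothetical stand-ins for illustration, not part of Django.

```python
class Entry:
    """Toy stand-in for a model instance; not a real Django class."""

    def __init__(self, words):
        self._words = words
        self.computations = 0  # counts how often the expensive work runs

    def _compute_word_count(self):
        # Pretend this is an expensive database query.
        self.computations += 1
        return len(self._words)

    @property
    def word_count(self):
        # Compute once, then reuse the value stored on the instance.
        if not hasattr(self, "_word_count_cache"):
            self._word_count_cache = self._compute_word_count()
        return self._word_count_cache


entry = Entry(["optimize", "database", "access"])
first = entry.word_count   # expensive work happens here
second = entry.word_count  # served from the per-instance cache
```

The cache lives only as long as the instance does, so a fresh object fetched from the database starts with an empty cache.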
|  | ||||
| Use the ``with`` template tag | ||||
| ----------------------------- | ||||
|  | ||||
| To make use of the caching behaviour of ``QuerySet``, you may need to use the | ||||
| :ttag:`with` template tag. | ||||
|  | ||||
| Use ``iterator()`` | ||||
| ------------------ | ||||
|  | ||||
| When you have a lot of objects, the caching behaviour of the ``QuerySet`` can | ||||
| cause a large amount of memory to be used. In this case, | ||||
| :ref:`QuerySet.iterator() <queryset-iterator>` may help. | ||||
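The memory trade-off can be illustrated by analogy with Python's standard ``sqlite3`` module. This is only a sketch of the general idea (holding every row versus consuming rows one at a time), not a description of how Django implements ``iterator()``; the ``entry`` table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO entry (headline) VALUES (?)",
    [("headline %d" % i,) for i in range(1000)],
)

# Cached style: every row is materialized and kept in a list,
# which is convenient for reuse but costs memory.
cached_rows = conn.execute("SELECT id, headline FROM entry").fetchall()

# Streaming style: rows are consumed one at a time and not retained.
total_length = 0
rows_seen = 0
for row_id, headline in conn.execute("SELECT id, headline FROM entry"):
    total_length += len(headline)
    rows_seen += 1
```

The streaming loop cannot be replayed without running the query again, which mirrors the note that calling ``iterator()`` on an evaluated ``QuerySet`` repeats the query.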
|  | ||||
| Do database work in the database rather than in Python | ||||
| ====================================================== | ||||
|  | ||||
| For instance: | ||||
|  | ||||
| * At the most basic level, use :ref:`filter and exclude <queryset-api>` to do | ||||
|   filtering in the database, to avoid loading data into your Python process only | ||||
|   to throw much of it away. | ||||
|  | ||||
| * Use :ref:`F() object query expressions <query-expressions>` to do filtering | ||||
|   against other fields within the same model. | ||||
|  | ||||
| * Use :ref:`annotate to do aggregation in the database <topics-db-aggregation>`. | ||||
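The principle behind the points above can be sketched at the SQL level with the standard ``sqlite3`` module. The ``entry`` table, its ``rating`` column and the sample rows are assumptions made for this illustration; with the ORM, ``filter()`` and ``annotate()`` are what would generate the ``WHERE`` and ``GROUP BY`` clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (blog_id, rating) VALUES (?, ?)",
    [(1, 5), (1, 3), (2, 4), (2, 1), (2, 5)],
)

# Wasteful: load every row, then filter and aggregate in Python.
all_rows = conn.execute("SELECT blog_id, rating FROM entry").fetchall()
python_counts = {}
for blog_id, rating in all_rows:
    if rating >= 4:
        python_counts[blog_id] = python_counts.get(blog_id, 0) + 1

# Better: the database filters (WHERE) and aggregates (GROUP BY),
# so only the small result crosses into the Python process.
db_counts = dict(
    conn.execute(
        "SELECT blog_id, COUNT(*) FROM entry WHERE rating >= 4 GROUP BY blog_id"
    ).fetchall()
)
```

Both approaches give the same answer here; the difference is how many rows have to be transferred and turned into Python objects along the way.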
|  | ||||
| If these aren't enough to generate the SQL you need: | ||||
|  | ||||
| Use ``QuerySet.extra()`` | ||||
| ------------------------ | ||||
|  | ||||
| A less portable but more powerful method is :ref:`QuerySet.extra() | ||||
| <queryset-extra>`, which allows some SQL to be explicitly added to the query. | ||||
| If that still isn't powerful enough: | ||||
|  | ||||
| Use raw SQL | ||||
| ----------- | ||||
|  | ||||
| Write your own :ref:`custom SQL to retrieve data <topics-db-sql>`. Use | ||||
| ``django.db.connection.queries`` to find out what Django is writing for you and | ||||
| start from there. | ||||
|  | ||||
| Retrieve everything at once if you know you will need it | ||||
| ======================================================== | ||||
|  | ||||
| Hitting the database multiple times for different parts of a single 'set' of | ||||
| data, all parts of which you will need, is in general less efficient than | ||||
| retrieving it all in one query. This is particularly important if you have a | ||||
| query that is executed in a loop, and could therefore end up doing many database | ||||
| queries, when only one was needed. So: | ||||
|  | ||||
| Use ``QuerySet.select_related()`` | ||||
| --------------------------------- | ||||
|  | ||||
| Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it: | ||||
|  | ||||
| * in view code, | ||||
|  | ||||
| * and in :ref:`managers and default managers <topics-db-managers>` where | ||||
|   appropriate. Be aware when your manager is and is not used; sometimes this is | ||||
|   tricky so don't make assumptions. | ||||
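The cost of querying in a loop, compared with one joined query of the kind ``select_related()`` arranges, can be sketched with the standard ``sqlite3`` module. The tables, sample rows and the ``run()`` helper that records each statement are hypothetical and exist only to make the query counts visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER, headline TEXT);
    INSERT INTO blog (id, name) VALUES (1, 'Beatles Blog'), (2, 'Cheddar Talk');
    INSERT INTO entry (blog_id, headline) VALUES
        (1, 'First'), (1, 'Second'), (2, 'Third');
    """
)

executed = []  # every statement sent through run() is recorded here


def run(sql, params=()):
    """Execute a query and record it (hypothetical helper)."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# One query for the entries, then one more per entry for its blog.
looped = []
for headline, blog_id in run("SELECT headline, blog_id FROM entry ORDER BY id"):
    (blog_name,) = run("SELECT name FROM blog WHERE id = ?", (blog_id,))[0]
    looped.append((headline, blog_name))
looped_queries = len(executed)

# A single joined query returns the same data.
executed.clear()
joined = run(
    "SELECT entry.headline, blog.name FROM entry "
    "JOIN blog ON blog.id = entry.blog_id ORDER BY entry.id"
)
joined_queries = len(executed)
```

With three entries the loop issues four queries and the join issues one; the gap grows linearly with the number of rows.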
|  | ||||
| Don't retrieve things you don't need | ||||
| ==================================== | ||||
|  | ||||
| Use ``QuerySet.values()`` and ``values_list()`` | ||||
| ----------------------------------------------- | ||||
|  | ||||
| When you just want a dict/list of values, and don't need ORM model objects, make | ||||
| appropriate usage of :ref:`QuerySet.values() <queryset-values>`. | ||||
| These can be useful for replacing model objects in template code - as long as | ||||
| the dicts you supply have the same attributes as those used in the template, you | ||||
| are fine. | ||||
|  | ||||
| Use ``QuerySet.count()`` | ||||
| ------------------------ | ||||
|  | ||||
| ...if you only want the count, rather than doing ``len(queryset)``. | ||||
|  | ||||
| But: | ||||
|  | ||||
| Don't overuse ``count()`` | ||||
| ------------------------- | ||||
|  | ||||
| If you are going to need other data from the QuerySet, just evaluate it. | ||||
|  | ||||
| For example, assuming an Email class that has a ``body`` attribute and a | ||||
| many-to-many relation to User, the following template code is optimal: | ||||
|  | ||||
| .. code-block:: html+django | ||||
|  | ||||
|    {% if display_inbox %} | ||||
|      {% with user.emails.all as emails %} | ||||
|        {% if emails %} | ||||
|          <p>You have {{ emails|length }} email(s)</p> | ||||
|          {% for email in emails %} | ||||
|            <p>{{ email.body }}</p> | ||||
|          {% endfor %} | ||||
|        {% else %} | ||||
|          <p>No messages today.</p> | ||||
|        {% endif %} | ||||
|      {% endwith %} | ||||
|    {% endif %} | ||||
|  | ||||
|  | ||||
| It is optimal because: | ||||
|  | ||||
|  1. Since QuerySets are lazy, this does no database queries if | ||||
|     'display_inbox' is False. | ||||
|  | ||||
|  #. Use of ``with`` means that we store ``user.emails.all`` in a variable for | ||||
|     later use, allowing its cache to be re-used. | ||||
|  | ||||
|  #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called, | ||||
|     which causes the ``user.emails.all()`` query to be run on the database, and | ||||
|     at least the first row to be turned into an ORM object. If there aren't | ||||
|     any results, it will return False, otherwise True. | ||||
|  | ||||
|  #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling | ||||
|     out the rest of the cache without doing another query. | ||||
|  | ||||
|  #. The ``for`` loop iterates over the already filled cache. | ||||
|  | ||||
| In total, this code does either one or zero database queries. The only | ||||
| deliberate optimization performed is the use of the ``with`` tag. Using | ||||
| ``QuerySet.count()`` at any point would cause additional queries. | ||||
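The reasoning above can be mimicked with a toy class in plain Python. ``LazyResults`` is a hypothetical, simplified model and not Django's ``QuerySet``: it evaluates fully on first use rather than row by row, and it uses ``__bool__``, the Python 3 name for ``__nonzero__``.

```python
class LazyResults:
    """Toy model of a lazy, caching result set; not Django's QuerySet."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable standing in for the SQL query
        self._cache = None    # filled on first evaluation
        self.queries = 0      # how many times the "database" was hit

    def _evaluate(self):
        if self._cache is None:
            self.queries += 1
            self._cache = list(self._fetch())
        return self._cache

    def __bool__(self):       # plays the role of {% if emails %}
        return bool(self._evaluate())

    def __len__(self):        # plays the role of {{ emails|length }}
        return len(self._evaluate())

    def __iter__(self):       # plays the role of {% for email in emails %}
        return iter(self._evaluate())


emails = LazyResults(lambda: ["hello", "world"])
queries_before_use = emails.queries  # still lazy, nothing fetched yet

if emails:
    n = len(emails)
    bodies = [body for body in emails]
```

Because the same object is reused for the truth test, the length and the loop, the underlying fetch runs once. Creating a new ``LazyResults`` for each of the three steps would run it three times, which is the situation the ``with`` tag avoids.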
|  | ||||
| Use ``QuerySet.update()`` and ``delete()`` | ||||
| ------------------------------------------ | ||||
|  | ||||
| Rather than retrieve a load of objects, set some values, and save them | ||||
| individually, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update() | ||||
| <topics-db-queries-update>`. Similarly, do :ref:`bulk deletes | ||||
| <topics-db-queries-delete>` where possible. | ||||
|  | ||||
| Note, however, that these bulk update methods cannot call the ``save()`` or ``delete()`` | ||||
| methods of individual instances, which means that any custom behaviour you have | ||||
| added for these methods will not be executed, including anything driven from the | ||||
| normal database object :ref:`signals <ref-signals>`. | ||||
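At the SQL level, the difference between saving rows one by one and a bulk update looks roughly like the following sketch, written against the standard ``sqlite3`` module. The ``entry`` table and its ``comments_on`` column are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, comments_on INTEGER)")
conn.executemany("INSERT INTO entry (comments_on) VALUES (?)", [(1,)] * 100)

# Row by row: one SELECT, then one UPDATE statement per row.
statements = 1
for (entry_id,) in conn.execute("SELECT id FROM entry").fetchall():
    conn.execute("UPDATE entry SET comments_on = 0 WHERE id = ?", (entry_id,))
    statements += 1

# Restore the original values so the bulk version starts from the same state.
conn.execute("UPDATE entry SET comments_on = 1")

# Bulk: a single UPDATE statement changes every row.
cursor = conn.execute("UPDATE entry SET comments_on = 0")
bulk_rows = cursor.rowcount
remaining = conn.execute(
    "SELECT COUNT(*) FROM entry WHERE comments_on = 1"
).fetchone()[0]
```

The bulk statement never touches individual Python objects, which is exactly why per-instance ``save()`` logic has no chance to run.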
|  | ||||
| Don't retrieve things you already have | ||||
| ====================================== | ||||
|  | ||||
| Use foreign key values directly | ||||
| ------------------------------- | ||||
|  | ||||
| If you only need a foreign key value, use the foreign key value that is already on | ||||
| the object you've got, rather than getting the whole related object and taking | ||||
| its primary key. i.e. do:: | ||||
|  | ||||
|    entry.blog_id | ||||
|  | ||||
| instead of:: | ||||
|  | ||||
|    entry.blog.id | ||||
|  | ||||
| @@ -83,6 +83,6 @@ An easier option? | ||||
|  | ||||
| A final note: If all you want to do is a custom ``WHERE`` clause, you can just | ||||
| use the ``where``, ``tables`` and ``params`` arguments to the | ||||
| :ref:`extra clause <extra>` in the standard queryset API. | ||||
| :ref:`extra clause <queryset-extra>` in the standard queryset API. | ||||
|  | ||||
| .. _Python DB-API: http://www.python.org/dev/peps/pep-0249/ | ||||
|   | ||||