Django 1.8.2.dev20150513143415 documentation

The sitemap framework

Django comes with a high-level sitemap-generating framework that makes creating sitemap XML files easy.

Overview

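A sitemap is an XML file on your website that tells search-engine indexers how frequently your pages change and how “important” certain pages are in relation to other pages on your site. This information helps search engines index your site.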

The Django sitemap framework automates the creation of this XML file by letting you express this information in Python code.

It works much like Django’s syndication framework. To create a sitemap, just write a Sitemap class and point to it in your URLconf.

Installation

To install the sitemap app, follow these steps:

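  1. Add 'django.contrib.sitemaps' to your INSTALLED_APPS setting.
  2. Make sure your TEMPLATES setting contains a DjangoTemplates backend whose APP_DIRS option is set to True. It’s in there by default, so you’ll only need to change this if you’ve changed that setting.
  3. Make sure you’ve installed the sites framework.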

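(Note: The sitemap application doesn’t install any database tables. The only reason it needs to go into INSTALLED_APPS is so that the Loader() template loader can find the default templates.)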

Initialization

views.sitemap(request, sitemaps, section=None, template_name='sitemap.xml', content_type='application/xml')

To activate sitemap generation on your Django site, add this line to your URLconf:

from django.conf.urls import url
from django.contrib.sitemaps.views import sitemap

url(r'^sitemap\.xml$', sitemap, {'sitemaps': sitemaps},
    name='django.contrib.sitemaps.views.sitemap')

This tells Django to build a sitemap when a client accesses /sitemap.xml.

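The name of the sitemap file is not important, but the location is. Search engines will only index links in your sitemap for the current URL level and below. For instance, if sitemap.xml lives in your root directory, it may reference any URL in your site. However, if your sitemap lives at /content/sitemap.xml, it may only reference URLs that begin with /content/.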
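The location rule can be modelled as a simple prefix check. The sketch below is plain Python, not part of Django. The function name url_in_scope is hypothetical, and it assumes both arguments are absolute paths without a domain.

```python
import posixpath


def url_in_scope(sitemap_path, url_path):
    """Return True if a sitemap served at sitemap_path may list url_path.

    Hypothetical helper: models the rule that a sitemap may only reference
    URLs at its own directory level or below.
    """
    # Directory containing the sitemap, normalized to end with a slash.
    directory = posixpath.dirname(sitemap_path)
    if not directory.endswith("/"):
        directory += "/"
    return url_path.startswith(directory)


# A root-level sitemap may reference anything on the site.
print(url_in_scope("/sitemap.xml", "/blog/2015/hello/"))
# A sitemap under /content/ may only reference /content/... URLs.
print(url_in_scope("/content/sitemap.xml", "/content/page/"))
print(url_in_scope("/content/sitemap.xml", "/blog/2015/hello/"))
```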

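The sitemap view takes an extra, required argument: {'sitemaps': sitemaps}. sitemaps should be a dictionary that maps a short section label (e.g., blog or news) to its Sitemap class (e.g., BlogSitemap or NewsSitemap). It may also map to an instance of a Sitemap class (e.g., BlogSitemap(some_var)).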
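Because the dictionary may hold either classes or instances, any code that consumes it must turn each value into an instance before use. Django’s own sitemap views do a similar callable() check. The plain-Python sketch below shows that normalization. The placeholder classes and the resolve_sections name are hypothetical stand-ins, not Django code.

```python
class BlogSitemap:
    """Hypothetical stand-in for a Sitemap subclass that takes an argument."""

    def __init__(self, some_var=None):
        self.some_var = some_var


class NewsSitemap:
    """Hypothetical stand-in for a Sitemap subclass."""


def resolve_sections(sitemaps):
    """Return {label: instance}, instantiating any value that is a class."""
    resolved = {}
    for label, site in sitemaps.items():
        # Classes are callable; instances of these stand-ins are not.
        resolved[label] = site() if callable(site) else site
    return resolved


sitemaps = {
    "blog": BlogSitemap("some_var"),  # an instance
    "news": NewsSitemap,              # a class
}
sections = resolve_sections(sitemaps)
```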

Sitemap classes

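A Sitemap class is a simple Python class that represents a “section” of entries in your sitemap. For example, one Sitemap class could represent all the entries of your Weblog, while another could represent all of the events in your events calendar.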

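In the simplest case, all these sections get lumped together into one sitemap.xml. It’s also possible to use the framework to generate a sitemap index that references individual sitemap files, one per section. (See Creating a sitemap index below.)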

Sitemap classes must subclass django.contrib.sitemaps.Sitemap. They can live anywhere in your codebase.

A simple example

Let’s assume you have a blog system with an Entry model, and you want your sitemap to include all the links to your individual blog entries. Here’s how your sitemap class might look:

from django.contrib.sitemaps import Sitemap
from blog.models import Entry

class BlogSitemap(Sitemap):
    changefreq = "never"
    priority = 0.5

    def items(self):
        return Entry.objects.filter(is_draft=False)

    def lastmod(self, obj):
        return obj.pub_date

Note:

  • changefreq and priority are class attributes corresponding to <changefreq> and <priority> elements, respectively. They can also be made callable as methods, as lastmod is in this example.
  • items() is simply a method that returns a list of objects. The objects returned will get passed to any callable methods corresponding to a sitemap property (location, lastmod, changefreq, and priority).
  • lastmod should return a Python datetime object.
  • There is no location method in this example, but you can provide one to specify the URL for your object. By default, location() calls get_absolute_url() on each object and returns the result.
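The notes above describe two lookup rules. First, each property may be a plain attribute or a method that receives the item. Second, location falls back to get_absolute_url(). The following plain-Python model of those rules is illustrative only. The resolve and item_location helpers and the Entry and BlogSitemap stand-ins are hypothetical, and nothing here imports Django.

```python
from datetime import datetime


class Entry:
    """Hypothetical stand-in for the blog Entry model."""

    def __init__(self, slug, pub_date):
        self.slug = slug
        self.pub_date = pub_date

    def get_absolute_url(self):
        return "/blog/%s/" % self.slug


class BlogSitemap:
    changefreq = "never"   # plain attribute: same value for every item
    priority = 0.5

    def items(self):
        return [Entry("hello", datetime(2015, 5, 13))]

    def lastmod(self, obj):  # method: called once per item
        return obj.pub_date


def resolve(sitemap, name, obj):
    """Return a sitemap property for one item, calling it if it is a method."""
    value = getattr(sitemap, name, None)
    if callable(value):
        return value(obj)
    return value


def item_location(sitemap, obj):
    """Use the sitemap's location if defined, else obj.get_absolute_url()."""
    if hasattr(sitemap, "location"):
        return resolve(sitemap, "location", obj)
    return obj.get_absolute_url()


sitemap = BlogSitemap()
for item in sitemap.items():
    print(item_location(sitemap, item), resolve(sitemap, "lastmod", item),
          resolve(sitemap, "changefreq", item), resolve(sitemap, "priority", item))
```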

Sitemap 类参考

class Sitemap

A Sitemap class can define the following methods/attributes:

items

Required. A method that returns a list of objects. The framework doesn’t care what type of objects they are; all that matters is that these objects get passed to the location(), lastmod(), changefreq() and priority() methods.

location

Optional. Either a method or attribute.

If it’s a method, it should return the absolute path for a given object as returned by items().

If it’s an attribute, its value should be a string representing an absolute path to use for every object returned by items().

In both cases, “absolute path” means a URL that doesn’t include the protocol or domain. Examples:

  • Good: '/foo/bar/'
  • Bad: 'example.com/foo/bar/'
  • Bad: 'http://example.com/foo/bar/'
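The three examples can be checked mechanically. This sketch uses only the standard library’s urllib.parse. is_absolute_path is a hypothetical helper, not something Django provides.

```python
from urllib.parse import urlsplit


def is_absolute_path(value):
    """Return True for a path like '/foo/bar/' with no protocol or domain."""
    parts = urlsplit(value)
    # Reject anything with a scheme ('http:') or a network location
    # ('//example.com'), and require a leading slash.
    return not parts.scheme and not parts.netloc and value.startswith("/")
```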

If location isn’t provided, the framework will call the get_absolute_url() method on each object as returned by items().

To specify a protocol other than 'http', use protocol.

lastmod

Optional. Either a method or attribute.

If it’s a method, it should take one argument – an object as returned by items() – and return that object’s last-modified date/time, as a Python datetime.datetime object.

If it’s an attribute, its value should be a Python datetime.datetime object representing the last-modified date/time for every object returned by items().

New in Django 1.7.

If all items in a sitemap have a lastmod, the sitemap generated by views.sitemap() will have a Last-Modified header equal to the latest lastmod. You can activate the ConditionalGetMiddleware to make Django respond appropriately to requests with an If-Modified-Since header which will prevent sending the sitemap if it hasn’t changed.
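The header rule is: only when every item supplies a lastmod, use the most recent one. A standard-library sketch of that rule follows. The helper name is hypothetical. It treats naive datetimes as UTC, which is an assumption of this sketch and not a statement about Django’s time-zone handling.

```python
import calendar
from datetime import datetime
from email.utils import formatdate


def last_modified_header(lastmods):
    """Return a Last-Modified value, or None if any item lacks a lastmod."""
    if not lastmods or any(value is None for value in lastmods):
        return None
    latest = max(lastmods)
    # Assumption: naive datetimes are interpreted as UTC.
    return formatdate(calendar.timegm(latest.utctimetuple()), usegmt=True)


print(last_modified_header([datetime(2015, 5, 1), datetime(2015, 5, 13, 14, 34, 15)]))
print(last_modified_header([datetime(2015, 5, 1), None]))  # header omitted
```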

changefreq

Optional. Either a method or attribute.

If it’s a method, it should take one argument – an object as returned by items() – and return that object’s change frequency, as a Python string.

If it’s an attribute, its value should be a string representing the change frequency of every object returned by items().

Possible values for changefreq, whether you use a method or attribute, are:

  • 'always'
  • 'hourly'
  • 'daily'
  • 'weekly'
  • 'monthly'
  • 'yearly'
  • 'never'
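If you compute changefreq in a method, a small guard can catch typos before they reach the XML. The check_changefreq helper below is hypothetical and is not part of Django. It simply tests membership in the seven values listed above.

```python
VALID_CHANGEFREQS = frozenset(
    ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"]
)


def check_changefreq(value):
    """Return value unchanged, or raise ValueError if it is not a listed value."""
    if value not in VALID_CHANGEFREQS:
        raise ValueError("invalid changefreq: %r" % (value,))
    return value
```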
priority

Optional. Either a method or attribute.

If it’s a method, it should take one argument – an object as returned by items() – and return that object’s priority, as either a string or float.

If it’s an attribute, its value should be either a string or float representing the priority of every object returned by items().

Example values for priority: 0.4, 1.0. The default priority of a page is 0.5. See the sitemaps.org documentation for more.
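Because priority may be a string or a float, it can help to normalize it to a number. The sitemaps.org protocol limits valid values to the range 0.0 to 1.0. The sketch below uses a hypothetical helper. None stands for an omitted priority, which the protocol treats as 0.5.

```python
def effective_priority(value=None):
    """Return the priority as a float in [0.0, 1.0].

    Accepts a string or a float. None models an omitted priority,
    which the protocol treats as the default 0.5.
    """
    if value is None:
        return 0.5
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0, got %r" % (value,))
    return number
```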

protocol

Optional.

This attribute defines the protocol ('http' or 'https') of the URLs in the sitemap. If it isn’t set, the protocol with which the sitemap was requested is used. If the sitemap is built outside the context of a request, the default is 'http'.
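The precedence rules can be modelled in plain Python. The model also shows why location must be a domain-less path: the framework prepends the protocol and domain itself. The function names are hypothetical, and example.com is a placeholder domain.

```python
def choose_protocol(sitemap_protocol=None, request_scheme=None):
    """Pick the URL scheme following the precedence described above."""
    if sitemap_protocol:      # an explicit protocol attribute wins
        return sitemap_protocol
    if request_scheme:        # otherwise reuse the request's scheme
        return request_scheme
    return "http"             # built outside a request: default to 'http'


def full_url(protocol, domain, path):
    """Join a scheme, a domain and an absolute path into a full URL."""
    return "%s://%s%s" % (protocol, domain, path)


print(full_url(choose_protocol("https", "http"), "example.com", "/foo/bar/"))
```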

limit

Optional.

This attribute defines the maximum number of URLs included on each page of the sitemap. Its value should not exceed the default value of 50000, which is the upper limit allowed in the Sitemaps protocol.
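With limit set, each page of a section holds at most limit URLs, so a larger section is split across several pages. The minimal slicing model below uses a hypothetical paginate() helper rather than Django’s own Paginator.

```python
import math


def paginate(items, limit=50000, page=1):
    """Return (page_items, num_pages) for a 1-based page number."""
    num_pages = max(1, math.ceil(len(items) / limit))
    if not 1 <= page <= num_pages:
        raise ValueError("page %d out of range 1..%d" % (page, num_pages))
    start = (page - 1) * limit
    return items[start:start + limit], num_pages


# 120,001 URLs with the default limit need three pages.
page_items, num_pages = paginate(list(range(120001)), page=3)
```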

i18n
New in Django 1.8.

Optional.

A boolean attribute that defines if the URLs of this sitemap should be generated using all of your LANGUAGES. The default is False.
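With i18n enabled, each item is listed once per language in LANGUAGES. The sketch below models that expansion in plain Python. It assumes language-prefixed URLs of the form /<code>/..., as i18n_patterns() would typically produce. The LANGUAGES value and the expand_i18n name are hypothetical examples.

```python
# Hypothetical settings value.
LANGUAGES = [("en", "English"), ("fr", "French")]


def expand_i18n(paths, languages=LANGUAGES):
    """Return one language-prefixed path per (language, item path) pair.

    Assumption: URLs are prefixed with the language code, e.g. /fr/about/.
    """
    return ["/%s%s" % (code, path) for code, _name in languages for path in paths]


print(expand_i18n(["/", "/about/"]))
```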

Shortcuts

The sitemap framework provides a couple of convenience classes for common cases:

class FlatPageSitemap

Deprecated since version 1.8: Use django.contrib.flatpages.sitemaps.FlatPageSitemap instead.

The django.contrib.sitemaps.FlatPageSitemap class looks at all publicly visible flatpages defined for the current SITE_ID (see the sites documentation) and creates an entry in the sitemap. These entries include only the location attribute – not lastmod, changefreq or priority.

class GenericSitemap

The django.contrib.sitemaps.GenericSitemap class allows you to create a sitemap by passing it a dictionary which has to contain at least a queryset entry. This queryset will be used to generate the items of the sitemap. It may also have a date_field entry that specifies a date field for objects retrieved from the queryset. This will be used for the lastmod attribute in the generated sitemap. You may also pass priority and changefreq keyword arguments to the GenericSitemap constructor to specify these attributes for all URLs.
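The plain-Python model below makes the roles of the dictionary entries and keyword arguments concrete. SimpleGenericSitemap and the Entry stand-in are hypothetical, and a plain list stands in for the queryset. This is not Django’s implementation.

```python
from datetime import date


class Entry:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, slug, pub_date):
        self.slug = slug
        self.pub_date = pub_date

    def get_absolute_url(self):
        return "/blog/%s/" % self.slug


class SimpleGenericSitemap:
    """Plain-Python model of the behaviour described for GenericSitemap."""

    def __init__(self, info_dict, priority=None, changefreq=None):
        self.queryset = info_dict["queryset"]          # required entry
        self.date_field = info_dict.get("date_field")  # optional entry
        self.priority = priority                       # same for every URL
        self.changefreq = changefreq

    def items(self):
        return list(self.queryset)

    def lastmod(self, item):
        # Read the configured date field from each object, if one was given.
        if self.date_field is not None:
            return getattr(item, self.date_field)
        return None


info_dict = {
    "queryset": [Entry("hello", date(2015, 5, 13))],
    "date_field": "pub_date",
}
blog = SimpleGenericSitemap(info_dict, priority=0.6)
```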

Example

Here’s an example of a URLconf using GenericSitemap:

from django.conf.urls import url
from django.contrib.sitemaps import GenericSitemap
from django.contrib.sitemaps.views import sitemap
from blog.models import Entry

info_dict = {
    'queryset': Entry.objects.all(),
    'date_field': 'pub_date',
}

urlpatterns = [
    # some generic view using info_dict
    # ...

    # the sitemap
    url(r'^sitemap\.xml$', sitemap,
        {'sitemaps': {'blog': GenericSitemap(info_dict, priority=0.6)}},
        name='django.contrib.sitemaps.views.sitemap'),
]

Sitemap for static views

Often you want the search engine crawlers to index views which are neither object detail pages nor flatpages. The solution is to explicitly list URL names for these views in items and call reverse() in the location method of the sitemap. For example:

# sitemaps.py
from django.contrib import sitemaps
from django.core.urlresolvers import reverse

class StaticViewSitemap(sitemaps.Sitemap):
    priority = 0.5
    changefreq = 'daily'

    def items(self):
        return ['main', 'about', 'license']

    def location(self, item):
        return reverse(item)

# urls.py
from django.conf.urls import url
from django.contrib.sitemaps.views import sitemap

from .sitemaps import StaticViewSitemap
from . import views

sitemaps = {
    'static': StaticViewSitemap,
}

urlpatterns = [
    url(r'^$', views.main, name='main'),
    url(r'^about/$', views.about, name='about'),
    url(r'^license/$', views.license, name='license'),
    # ...
    url(r'^sitemap\.xml$', sitemap, {'sitemaps': sitemaps},
        name='django.contrib.sitemaps.views.sitemap')
]

Creating a sitemap index

views.index(request, sitemaps, template_name='sitemap_index.xml', content_type='application/xml', sitemap_url_name='django.contrib.sitemaps.views.sitemap')

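The sitemap framework also has the ability to create a sitemap index that references individual sitemap files, one per section defined in your sitemaps dictionary. The only differences in usage are:

  • You use two views in your URLconf: django.contrib.sitemaps.views.index() and django.contrib.sitemaps.views.sitemap().
  • The django.contrib.sitemaps.views.sitemap() view should take a section keyword argument.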

Here’s what the relevant URLconf lines would look like for the example above:

from django.conf.urls import url
from django.contrib.sitemaps import views

urlpatterns = [
    url(r'^sitemap\.xml$', views.index, {'sitemaps': sitemaps}),
    url(r'^sitemap-(?P<section>.+)\.xml$', views.sitemap, {'sitemaps': sitemaps}),
]

This will automatically generate a sitemap.xml file that references one sitemap file per section, for example sitemap-flatpages.xml and sitemap-blog.xml for sections labelled flatpages and blog. The Sitemap classes and the sitemaps dict don’t change at all.

You should create an index file if one of your sitemaps has more than 50,000 URLs. In this case, Django will automatically paginate the sitemap, and the index will reflect that.
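The URLs listed in the index follow from the section names and each section’s page count. The sketch below assumes the sitemap-<section>.xml pattern from the URLconf above and a ?p=N query parameter for extra pages. The index_urls helper and the example.com domain are hypothetical. Django’s index view reverses the sitemap URL instead of formatting strings.

```python
import math


def index_urls(base, sections, limit=50000):
    """List the sitemap URLs an index would reference.

    base:     scheme and domain, e.g. 'http://example.com' (placeholder)
    sections: {label: number_of_urls_in_that_section}
    """
    urls = []
    for label, count in sections.items():
        first = "%s/sitemap-%s.xml" % (base, label)
        urls.append(first)
        # Assumption: extra pages of a large section are addressed as ?p=N.
        num_pages = max(1, math.ceil(count / limit))
        for page in range(2, num_pages + 1):
            urls.append("%s?p=%d" % (first, page))
    return urls


print(index_urls("http://example.com", {"flatpages": 10, "blog": 120001}))
```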

If you’re not using the vanilla sitemap view – for example, if it’s wrapped with a caching decorator – you must name your sitemap view and pass sitemap_url_name to the index view:

from django.conf.urls import url
from django.contrib.sitemaps import views as sitemaps_views
from django.views.decorators.cache import cache_page

urlpatterns = [
    url(r'^sitemap\.xml$',
        cache_page(86400)(sitemaps_views.index),
        {'sitemaps': sitemaps, 'sitemap_url_name': 'sitemaps'}),
    url(r'^sitemap-(?P<section>.+)\.xml$',
        cache_page(86400)(sitemaps_views.sitemap),
        {'sitemaps': sitemaps}, name='sitemaps'),
]

Template customization

If you wish to use a different template for each sitemap or sitemap index available on your site, you may specify it by passing a template_name parameter to the sitemap and index views via the URLconf:

from django.conf.urls import url
from django.contrib.sitemaps import views

urlpatterns = [
    url(r'^custom-sitemap\.xml$', views.index, {
        'sitemaps': sitemaps,
        'template_name': 'custom_sitemap.html'
    }),
    url(r'^custom-sitemap-(?P<section>.+)\.xml$', views.sitemap, {
        'sitemaps': sitemaps,
        'template_name': 'custom_sitemap.html'
    }),
]

These views return TemplateResponse instances which allow you to easily customize the response data before rendering. For more details, see the TemplateResponse documentation.

Context variables

When customizing the templates for the index() and sitemap() views, you can rely on the following context variables.

Index

The variable sitemaps is a list of absolute URLs to each of the sitemaps.

Sitemap

The variable urlset is a list of URLs that should appear in the sitemap. Each URL exposes attributes as defined in the Sitemap class:

  • changefreq
  • item
  • lastmod
  • location
  • priority
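The plain-Python sketch below shows how urlset entries map onto XML. It builds the same kind of document with the standard library’s xml.etree.ElementTree instead of a Django template. Each url is modelled as a dict keyed by the attribute names listed above. The render_urlset name and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_urlset(urlset):
    """Build <urlset> XML from dicts shaped like the context's url entries."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urlset:
        node = ET.SubElement(root, "url")
        ET.SubElement(node, "loc").text = url["location"]
        # Optional elements are written only when a value is present.
        if url.get("lastmod"):
            ET.SubElement(node, "lastmod").text = url["lastmod"].strftime("%Y-%m-%d")
        for name in ("changefreq", "priority"):
            if url.get(name) is not None:
                ET.SubElement(node, name).text = str(url[name])
    return ET.tostring(root, encoding="unicode")


print(render_urlset([{
    "location": "http://example.com/blog/hello/",
    "lastmod": datetime(2015, 5, 13),
    "changefreq": "never",
    "priority": 0.5,
}]))
```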

The item attribute has been added for each URL to allow more flexible customization of the templates, such as Google News sitemaps. Assuming the Sitemap’s items() returns a list of items with publication_date and tags fields, something like this would generate a Google News compatible sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset
  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
{% spaceless %}
{% for url in urlset %}
  <url>
    <loc>{{ url.location }}</loc>
    {% if url.lastmod %}<lastmod>{{ url.lastmod|date:"Y-m-d" }}</lastmod>{% endif %}
    {% if url.changefreq %}<changefreq>{{ url.changefreq }}</changefreq>{% endif %}
    {% if url.priority %}<priority>{{ url.priority }}</priority>{% endif %}
    <news:news>
      {% if url.item.publication_date %}<news:publication_date>{{ url.item.publication_date|date:"Y-m-d" }}</news:publication_date>{% endif %}
      {% if url.item.tags %}<news:keywords>{{ url.item.tags }}</news:keywords>{% endif %}
    </news:news>
  </url>
{% endfor %}
{% endspaceless %}
</urlset>

Pinging Google

You may want to “ping” Google when your sitemap changes, to let it know to reindex your site. The sitemaps framework provides a function to do just that: django.contrib.sitemaps.ping_google().

ping_google()

ping_google() takes an optional argument, sitemap_url, which should be the absolute path to your site’s sitemap (e.g., '/sitemap.xml'). If this argument isn’t provided, ping_google() will attempt to figure out your sitemap by performing a reverse lookup in your URLconf.
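After the sitemap’s absolute path is known, it is combined with the current site’s domain and sent to Google as a query parameter. The sketch below only builds that request URL with the standard library and sends nothing. build_ping_url is hypothetical, and example.com is a placeholder. The endpoint constant and the http:// scheme for the sitemap URL are assumptions; check them against the Django version in use.

```python
from urllib.parse import urlencode

# Assumption: the ping endpoint used by Django 1.8; override if needed.
PING_URL = "http://www.google.com/webmasters/tools/ping"


def build_ping_url(domain, sitemap_url="/sitemap.xml", ping_url=PING_URL):
    """Return the GET URL that would notify Google (no request is sent)."""
    # Assumption: the sitemap itself is served over plain http.
    full_sitemap = "http://%s%s" % (domain, sitemap_url)
    return "%s?%s" % (ping_url, urlencode({"sitemap": full_sitemap}))


print(build_ping_url("example.com"))
```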

ping_google() raises the exception django.contrib.sitemaps.SitemapNotFound if it cannot determine your sitemap URL.

Register with Google first!

The ping_google() command only works if you have registered your site with Google Webmaster Tools.

One useful way to call ping_google() is from a model’s save() method:

from django.contrib.sitemaps import ping_google
from django.db import models

class Entry(models.Model):
    # ...
    def save(self, *args, **kwargs):
        super(Entry, self).save(*args, **kwargs)
        try:
            ping_google()
        except Exception:
            # Catch Exception broadly because we could get a variety
            # of HTTP-related exceptions.
            pass

A more efficient solution, however, would be to call ping_google() from a cron script, or some other scheduled task. The function makes an HTTP request to Google’s servers, so you may not want to introduce that network overhead each time you call save().

Pinging Google via manage.py

django-admin ping_google

Once the sitemaps application is added to your project, you may also ping Google using the ping_google management command:

python manage.py ping_google [/sitemap.xml]