Writing a field subclass
When planning your Field
class your new field is most similar to. Can you subclass an existing Django field and save yourself some work? If not, you should subclass the Field
class, from which everything is descended.
Initializing your new field is a matter of separating out any arguments that are specific to your case from the common arguments and passing the latter to the __init__()
method of Field
(or your parent class).
In our example, we’ll call our field HandField
. (It’s a good idea to call your Field
subclass <Something>Field
, so it’s easily identifiable as a Field
subclass.) It doesn’t behave like any existing field, so we’ll subclass directly from Field
:
from django.db import models
class HandField(models.Field):
description = "A hand of cards (bridge style)"
def __init__(self, *args, **kwargs):
kwargs['max_length'] = 104
super().__init__(*args, **kwargs)
Our HandField
accepts most of the standard field options (see the list below), but we ensure it has a fixed length, since it only needs to hold 52 card values plus their suits; 104 characters in total.
Note
Many of Django’s model fields accept options that they don’t do anything with. For example, you can pass both editable
and auto_now
to a django.db.models.DateField
and it will simply ignore theeditable
parameter (auto_now
being set implies editable=False
). No error is raised in this case.
This behavior simplifies the field classes, because they don’t need to check for options that aren’t necessary. They just pass all the options to the parent class and then don’t use them later on. It’s up to you whether you want your fields to be more strict about the options they select, or to use the simpler, more permissive behavior of the current fields.
The Field.__init__()
method takes the following parameters:
verbose_name
name
primary_key
max_length
unique
blank
null
db_index
-
rel
: Used for related fields (likeForeignKey
). For advanced use only. default
editable
-
serialize
: IfFalse
, the field will not be serialized when the model is passed to Django’s serializers. Defaults toTrue
. unique_for_date
unique_for_month
unique_for_year
choices
help_text
db_column
-
db_tablespace
: Only for index creation, if the backend supports tablespaces. You can usually ignore this option. -
auto_created
:True
if the field was automatically created, as for theOneToOneField
used by model inheritance. For advanced use only.
All of the options without an explanation in the above list have the same meaning they do for normal Django fields. See the field documentation for examples and details.
Field deconstruction?
The counterpoint to writing your __init__()
method is writing the deconstruct()
method. It’s used during model migrations to tell Django how to take an instance of your new field and reduce it to a serialized form - in particular, what arguments to pass to __init__()
to re-create it.
If you haven’t added any extra options on top of the field you inherited from, then there’s no need to write a new deconstruct()
method. If, however, you’re changing the arguments passed in __init__()
(like we are inHandField
), you’ll need to supplement the values being passed.
The contract of deconstruct()
is simple; it returns a tuple of four items: the field’s attribute name, the full import path of the field class, the positional arguments (as a list), and the keyword arguments (as a dict). Note this is different from the deconstruct()
method for custom classes which returns a tuple of three things.
As a custom field author, you don’t need to care about the first two values; the base Field
class has all the code to work out the field’s attribute name and import path. You do, however, have to care about the positional and keyword arguments, as these are likely the things you are changing.
For example, in our HandField
class we’re always forcibly setting max_length in __init__()
. The deconstruct()
method on the base Field
class will see this and try to return it in the keyword arguments; thus, we can drop it from the keyword arguments for readability:
from django.db import models
class HandField(models.Field):
def __init__(self, *args, **kwargs):
kwargs['max_length'] = 104
super().__init__(*args, **kwargs)
def deconstruct(self):
name, path, args, kwargs = super().deconstruct()
del kwargs["max_length"]
return name, path, args, kwargs
If you add a new keyword argument, you need to write code in deconstruct()
that puts its value into kwargs
yourself. You should also omit the value from kwargs
when it isn’t necessary to reconstruct the state of the field, such as when the default value is being used:
from django.db import models
class CommaSepField(models.Field):
"Implements comma-separated storage of lists"
def __init__(self, separator=",", *args, **kwargs):
self.separator = separator
super().__init__(*args, **kwargs)
def deconstruct(self):
name, path, args, kwargs = super().deconstruct()
# Only include kwarg if it's not the default
if self.separator != ",":
kwargs['separator'] = self.separator
return name, path, args, kwargs
More complex examples are beyond the scope of this document, but remember - for any configuration of your Field instance, deconstruct()
must return arguments that you can pass to __init__
to reconstruct that state.
Pay extra attention if you set new default values for arguments in the Field
superclass; you want to make sure they’re always included, rather than disappearing if they take on the old default value.
In addition, try to avoid returning values as positional arguments; where possible, return values as keyword arguments for maximum future compatibility. Of course, if you change the names of things more often than their position in the constructor’s argument list, you might prefer positional, but bear in mind that people will be reconstructing your field from the serialized version for quite a while (possibly years), depending how long your migrations live for.
You can see the results of deconstruction by looking in migrations that include the field, and you can test deconstruction in unit tests by just deconstructing and reconstructing the field:
name, path, args, kwargs = my_field_instance.deconstruct()
new_instance = MyField(*args, **kwargs)
self.assertEqual(my_field_instance.some_attribute, new_instance.some_attribute)
Changing a custom field’s base class
You can’t change the base class of a custom field because Django won’t detect the change and make a migration for it. For example, if you start with:
class CustomCharField(models.CharField):
...
and then decide that you want to use TextField
instead, you can’t change the subclass like this:
class CustomCharField(models.TextField):
...
Instead, you must create a new custom field class and update your models to reference it:
class CustomCharField(models.CharField):
...
class CustomTextField(models.TextField):
...
As discussed in removing fields, you must retain the original CustomCharField
class as long as you have migrations that reference it.
Documenting your custom field
As always, you should document your field type, so users will know what it is. In addition to providing a docstring for it, which is useful for developers, you can also allow users of the admin app to see a short description of the field type via the django.contrib.admindocs application. To do this simply provide descriptive text in a description
class attribute of your custom field. In the above example, the description displayed by the admindocs
application for a HandField
will be ‘A hand of cards (bridge style)’.
In the django.contrib.admindocs
display, the field description is interpolated with field.__dict__
which allows the description to incorporate arguments of the field. For example, the description for CharField
is:
description = _("String (up to %(max_length)s)")
Useful methods
Once you’ve created your Field
subclass, you might consider overriding a few standard methods, depending on your field’s behavior. The list of methods below is in approximately decreasing order of importance, so start from the top.
Custom database types
Say you’ve created a PostgreSQL custom type called mytype
. You can subclass Field
and implement the db_type()
method, like so:
from django.db import models
class MytypeField(models.Field):
def db_type(self, connection):
return 'mytype'
Once you have MytypeField
, you can use it in any model, just like any other Field
type:
class Person(models.Model):
name = models.CharField(max_length=80)
something_else = MytypeField()
If you aim to build a database-agnostic application, you should account for differences in database column types. For example, the date/time column type in PostgreSQL is called timestamp
, while the same column in MySQL is calleddatetime
. The simplest way to handle this in a db_type()
method is to check the connection.settings_dict['ENGINE']
attribute.
For example:
if connection.settings_dict['ENGINE'] == 'django.db.backends.mysql':
return 'datetime'
else:
return 'timestamp'
The db_type()
and rel_db_type()
methods are called by Django when the framework constructs the CREATE TABLE
statements for your application – that is, when you first create your tables. The methods are also called when constructing a WHERE
clause that includes the model field – that is, when you retrieve data using QuerySet methods like get()
,filter()
, and exclude()
and have the model field as an argument. They are not called at any other time, so it can afford to execute slightly complex code, such as the connection.settings_dict
check in the above example.
Some database column types accept parameters, such as CHAR(25)
, where the parameter 25
represents the maximum column length. In cases like these, it’s more flexible if the parameter is specified in the model rather than being hard-coded in the db_type()
method. For example, it wouldn’t make much sense to have a CharMaxlength25Field
, shown here:
# This is a silly example of hard-coded parameters.
class CharMaxlength25Field(models.Field):
def db_type(self, connection):
return 'char(25)'
# In the model:
class MyModel(models.Model):
# ...
my_field = CharMaxlength25Field()
The better way of doing this would be to make the parameter specifiable at run time – i.e., when the class is instantiated. To do that, just implement Field.__init__()
, like so:
# This is a much more flexible example.
class BetterCharField(models.Field):
def __init__(self, max_length, *args, **kwargs):
self.max_length = max_length
super().__init__(*args, **kwargs)
def db_type(self, connection):
return 'char(%s)' % self.max_length
# In the model:
class MyModel(models.Model):
# ...
my_field = BetterCharField(25)
Finally, if your column requires truly complex SQL setup, return None
from db_type()
. This will cause Django’s SQL creation code to skip over this field. You are then responsible for creating the column in the right table in some other way, of course, but this gives you a way to tell Django to get out of the way.
The rel_db_type()
method is called by fields such as ForeignKey
and OneToOneField
that point to another field to determine their database column data types. For example, if you have an UnsignedAutoField
, you also need the foreign keys that point to that field to use the same data type:
# MySQL unsigned integer (range 0 to 4294967295).
class UnsignedAutoField(models.AutoField):
def db_type(self, connection):
return 'integer UNSIGNED AUTO_INCREMENT'
def rel_db_type(self, connection):
return 'integer UNSIGNED'
Converting values to Python objects?
If your custom Field
class deals with data structures that are more complex than strings, dates, integers, or floats, then you may need to override from_db_value()
and to_python()
.
If present for the field subclass, from_db_value()
will be called in all circumstances when the data is loaded from the database, including in aggregates and values()
calls.
to_python()
is called by deserialization and during the clean()
method used from forms.
As a general rule, to_python()
should deal gracefully with any of the following arguments:
- An instance of the correct type (e.g.,
Hand
in our ongoing example). - A string
-
None
(if the field allowsnull=True
)
In our HandField
class, we’re storing the data as a VARCHAR field in the database, so we need to be able to process strings and None
in the from_db_value()
. In to_python()
, we need to also handle Hand
instances:
import re
from django.core.exceptions import ValidationError
from django.db import models
from django.utils.translation import gettext_lazy as _
def parse_hand(hand_string):
"""Takes a string of cards and splits into a full hand."""
p1 = re.compile('.{26}')
p2 = re.compile('..')
args = [p2.findall(x) for x in p1.findall(hand_string)]
if len(args) != 4:
raise ValidationError(_("Invalid input for a Hand instance"))
return Hand(*args)
class HandField(models.Field):
# ...
def from_db_value(self, value, expression, connection):
if value is None:
return value
return parse_hand(value)
def to_python(self, value):
if isinstance(value, Hand):
return value
if value is None:
return value
return parse_hand(value)
Notice that we always return a Hand
instance from these methods. That’s the Python object type we want to store in the model’s attribute.
For to_python()
, if anything goes wrong during value conversion, you should raise a ValidationError
exception.
Converting Python objects to query values?
Since using a database requires conversion in both ways, if you override to_python()
you also have to override get_prep_value()
to convert Python objects back to query values.
For example:
class HandField(models.Field):
# ...
def get_prep_value(self, value):
return ''.join([''.join(l) for l in (value.north,
value.east, value.south, value.west)])
Warning
If your custom field uses the CHAR
, VARCHAR
or TEXT
types for MySQL, you must make sure that get_prep_value()
always returns a string type. MySQL performs flexible and unexpected matching when a query is performed on these types and the provided value is an integer, which can cause queries to include unexpected objects in their results. This problem cannot occur if you always return a string type from get_prep_value()
.
Converting query values to database values?
Some data types (for example, dates) need to be in a specific format before they can be used by a database backend.get_db_prep_value()
is the method where those conversions should be made. The specific connection that will be used for the query is passed as the connection
parameter. This allows you to use backend-specific conversion logic if it is required.
For example, Django uses the following method for its BinaryField
:
def get_db_prep_value(self, value, connection, prepared=False):
value = super().get_db_prep_value(value, connection, prepared)
if value is not None:
return connection.Database.Binary(value)
return value
In case your custom field needs a special conversion when being saved that is not the same as the conversion used for normal query parameters, you can override get_db_prep_save()
.
Preprocessing values before saving?
If you want to preprocess the value just before saving, you can use pre_save()
. For example, Django’s DateTimeField
uses this method to set the attribute correctly in the case of auto_now
or auto_now_add
.
If you do override this method, you must return the value of the attribute at the end. You should also update the model’s attribute if you make any changes to the value so that code holding references to the model will always see the correct value.
Specifying the form field for a model field?
To customize the form field used by ModelForm
, you can override formfield()
.
The form field class can be specified via the form_class
and choices_form_class
arguments; the latter is used if the field has choices specified, the former otherwise. If these arguments are not provided, CharField
or TypedChoiceField
will be used.
All of the kwargs
dictionary is passed directly to the form field’s __init__()
method. Normally, all you need to do is set up a good default for the form_class
(and maybe choices_form_class
) argument and then delegate further handling to the parent class. This might require you to write a custom form field (and even a form widget). See the forms documentation for information about this.
Continuing our ongoing example, we can write the formfield()
method as:
class HandField(models.Field):
# ...
def formfield(self, **kwargs):
# This is a fairly standard way to set up some defaults
# while letting the caller override them.
defaults = {'form_class': MyFormField}
defaults.update(kwargs)
return super().formfield(**defaults)
This assumes we’ve imported a MyFormField
field class (which has its own default widget). This document doesn’t cover the details of writing custom form fields.
Emulating built-in field types?
If you have created a db_type()
method, you don’t need to worry about get_internal_type()
– it won’t be used much. Sometimes, though, your database storage is similar in type to some other field, so you can use that other field’s logic to create the right column.
For example:
class HandField(models.Field):
# ...
def get_internal_type(self):
return 'CharField'
No matter which database backend we are using, this will mean that migrate
and other SQL commands create the right column type for storing a string.
If get_internal_type()
returns a string that is not known to Django for the database backend you are using – that is, it doesn’t appear in django.db.backends.<db_name>.base.DatabaseWrapper.data_types
– the string will still be used by the serializer, but the default db_type()
method will return None
. See the documentation of db_type()
for reasons why this might be useful. Putting a descriptive string in as the type of the field for the serializer is a useful idea if you’re ever going to be using the serializer output in some other place, outside of Django.
Converting field data for serialization?
To customize how the values are serialized by a serializer, you can override value_to_string()
. Using value_from_object()
is the best way to get the field’s value prior to serialization. For example, since HandField
uses strings for its data storage anyway, we can reuse some existing conversion code:
class HandField(models.Field):
# ...
def value_to_string(self, obj):
value = self.value_from_object(obj)
return self.get_prep_value(value)
Some general advice?
Writing a custom field can be a tricky process, particularly if you’re doing complex conversions between your Python types and your database and serialization formats. Here are a couple of tips to make things go more smoothly:
- Look at the existing Django fields (in
django/db/models/fields/__init__.py
) for inspiration. Try to find a field that’s similar to what you want and extend it a little bit, instead of creating an entirely new field from scratch. - Put a
__str__()
method on the class you’re wrapping up as a field. There are a lot of places where the default behavior of the field code is to callstr()
on the value. (In our examples in this document,value
would be aHand
instance, not aHandField
). So if your__str__()
method automatically converts to the string form of your Python object, you can save yourself a lot of work.
Writing a FileField
subclass?
In addition to the above methods, fields that deal with files have a few other special requirements which must be taken into account. The majority of the mechanics provided by FileField
, such as controlling database storage and retrieval, can remain unchanged, leaving subclasses to deal with the challenge of supporting a particular type of file.
Django provides a File
class, which is used as a proxy to the file’s contents and operations. This can be subclassed to customize how the file is accessed, and what methods are available. It lives at django.db.models.fields.files
, and its default behavior is explained in the file documentation.
Once a subclass of File
is created, the new FileField
subclass must be told to use it. To do so, simply assign the new File
subclass to the special attr_class
attribute of the FileField
subclass.
A few suggestions?
In addition to the above details, there are a few guidelines which can greatly improve the efficiency and readability of the field’s code.
- The source for Django’s own
ImageField
(indjango/db/models/fields/files.py
) is a great example of how to subclassFileField
to support a particular type of file, as it incorporates all of the techniques described above. - Cache file attributes wherever possible. Since files may be stored in remote storage systems, retrieving them may cost extra time, or even money, that isn’t always necessary. Once a file is retrieved to obtain some data about its content, cache as much of that data as possible to reduce the number of times the file must be retrieved on subsequent calls for that information.