dumps() function
- rapidjson.dumps(obj, *, skipkeys=False, ensure_ascii=True, write_mode=WM_COMPACT, indent=4, default=None, sort_keys=False, number_mode=None, datetime_mode=None, uuid_mode=None, bytes_mode=BM_UTF8, iterable_mode=IM_ANY_ITERABLE, mapping_mode=MM_ANY_MAPPING, allow_nan=True)
Encode the given Python obj instance into a JSON string.
- Parameters
obj – the value to be serialized
skipkeys (bool) – whether invalid dict keys will be skipped
ensure_ascii (bool) – whether the output should contain only ASCII characters
write_mode (int) – enable particular pretty print behaviors
indent – indentation width or string to produce pretty printed JSON
default (callable) – a function that gets called for objects that can’t otherwise be serialized
sort_keys (bool) – whether dictionary keys should be sorted alphabetically
number_mode (int) – enable particular behaviors in handling numbers
datetime_mode (int) – how datetime, time and date instances should be handled
uuid_mode (int) – how UUID instances should be handled
bytes_mode (int) – how bytes instances should be handled
iterable_mode (int) – how iterable values should be handled
mapping_mode (int) – how mapping values should be handled
allow_nan (bool) – compatibility flag equivalent to number_mode=NM_NAN
- Returns
A Python str instance.
skipkeys
If skipkeys is true (default: False), then dict keys that are not strings will be skipped instead of raising a TypeError:

>>> dumps({(0,): 'a tuple', True: 'a true value'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keys must be strings
>>> dumps({(0,): 'a tuple', True: 'a true value'}, skipkeys=True)
'{}'

Note
skipkeys is a backward compatible alias of the new MM_SKIP_NON_STRING_KEYS mapping mode.
ensure_ascii
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is:
>>> dumps('The symbol for the Euro currency is €')
'"The symbol for the Euro currency is \\u20AC"'
>>> dumps('The symbol for the Euro currency is €',
...       ensure_ascii=False)
'"The symbol for the Euro currency is €"'
write_mode
The write_mode controls how python-rapidjson emits JSON: by default it is WM_COMPACT, which produces the most compact JSON representation:

>>> dumps([1, 2, {'three': 3, 'four': 4}])
'[1,2,{"three":3,"four":4}]'

With WM_PRETTY it will use RapidJSON's PrettyWriter, with a default indent (see below) of four spaces:

>>> print(dumps([1, 2, {'three': 3, 'four': 4}], write_mode=WM_PRETTY))
[
    1,
    2,
    {
        "three": 3,
        "four": 4
    }
]

With WM_SINGLE_LINE_ARRAY arrays will be kept on a single line:

>>> print(dumps([1, 2, 'three', [4, 5]], write_mode=WM_SINGLE_LINE_ARRAY))
[1, 2, "three", [4, 5]]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}], write_mode=WM_SINGLE_LINE_ARRAY))
[1, 2, {
        "three": 3,
        "four": 4
    }]
indent
The indent parameter may be either a non-negative integer or a string: in the former case it specifies a number of spaces, while in the latter the string may contain zero or more ASCII whitespace characters (space, tab \t, newline \n and carriage-return \r), all identical (that is, "\n\t" is not accepted).

The integer, or the length of the string, determines how many spaces (or copies of the character composing the string) will be used to indent nested structures when the write_mode above is not WM_COMPACT, and it defaults to 4. Specifying a value different from None automatically sets write_mode to WM_PRETTY, unless write_mode is given explicitly.

By setting indent to 0 each array item (when write_mode is not WM_SINGLE_LINE_ARRAY) and each dictionary value will be followed by a newline. A positive integer means that each level will be indented by that many spaces:

>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=0))
[
1,
2,
{
"three": 3,
"four": 4
}
]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=2))
[
  1,
  2,
  {
    "three": 3,
    "four": 4
  }
]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=""))
[
1,
2,
{
"three": 3,
"four": 4
}
]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=" "))
[
 1,
 2,
 {
  "three": 3,
  "four": 4
 }
]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent="\t"))
[
	1,
	2,
	{
		"three": 3,
		"four": 4
	}
]
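
The rule for acceptable indent values described above can be summarised as a small predicate. This is only an illustrative sketch: is_valid_indent is a hypothetical helper, not part of python-rapidjson, and the library's own validation may differ in details such as how booleans are treated.

```python
ASCII_WHITESPACE = " \t\n\r"


def is_valid_indent(indent):
    """Return True if `indent` follows the rule described above.

    Hypothetical helper written for illustration only.
    """
    if isinstance(indent, bool):
        # Assumption: booleans are not meaningful indentation widths.
        return False
    if isinstance(indent, int):
        # Zero is accepted: it means "newlines but no indentation".
        return indent >= 0
    if isinstance(indent, str):
        # Zero or more copies of one single ASCII whitespace character.
        same_char = len(set(indent)) <= 1
        return same_char and all(ch in ASCII_WHITESPACE for ch in indent)
    return False


print(is_valid_indent(2), is_valid_indent("\t\t"), is_valid_indent("\n\t"))
```

Running such a check before calling dumps() may be handy when the indentation comes from user configuration.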
default
The default argument may be used to specify a custom serializer for objects that are not otherwise handled. If specified, it should be a function that gets called for such objects and either returns a JSON encodable version of the object itself or raises a TypeError:

>>> class Point(object):
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> point = Point(1,2)
>>> dumps(point)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: <__main__.Point object at …> is not JSON serializable
>>> def point_jsonifier(obj):
...     if isinstance(obj, Point):
...         return {'x': obj.x, 'y': obj.y}
...     else:
...         raise TypeError('%r is not JSON serializable' % (obj,))
...
>>> dumps(point, default=point_jsonifier)
'{"x":1,"y":2}'
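
The same contract (return something encodable, or raise TypeError) can be covered by one generic callable rather than one per class. The sketch below handles any dataclass instance; dataclass_default is a hypothetical name, and the standard library json module is used here only as a stand-in so that the snippet runs without python-rapidjson installed. Passing the same callable as default to rapidjson.dumps() is expected to behave the same way, going by the description above, but that is not verified here.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class Point:
    x: int
    y: int


def dataclass_default(obj):
    """Hypothetical default callable: encode dataclass instances as dicts."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)
    # Anything else is still unsupported: signal it the conventional way.
    raise TypeError('%r is not JSON serializable' % (obj,))


# Stand-in encoder: the stdlib json module accepts the same kind of callable.
encoded = json.dumps(Point(1, 2), default=dataclass_default)
print(encoded)
```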
sort_keys
When sort_keys is true (default: False), the JSON representation of Python dictionaries is sorted by key:

>>> data = {'a': 'A', 'c': 'C', 'i': 'I', 'd': 'D'}
>>> dumps(data, sort_keys=True)
'{"a":"A","c":"C","d":"D","i":"I"}'

Note
sort_keys is a backward compatible alias of the new MM_SORT_KEYS mapping mode.

>>> dumps(data, mapping_mode=MM_SORT_KEYS)
'{"a":"A","c":"C","d":"D","i":"I"}'

The default setting, on modern snakes (that is, on Python >= 3.7), preserves the original dictionary insertion order:

>>> dumps(data)
'{"a":"A","c":"C","i":"I","d":"D"}'
number_mode
The number_mode argument selects different behaviors in handling numeric values.
By default non-numbers (nan, inf, -inf) will be serialized as their JavaScript equivalents (NaN, Infinity, -Infinity), because NM_NAN is on by default (NB: this is not compliant with the JSON standard):

>>> nan = float('nan')
>>> inf = float('inf')
>>> dumps([nan, inf])
'[NaN,Infinity]'
>>> dumps([nan, inf], number_mode=NM_NAN)
'[NaN,Infinity]'
By explicitly setting number_mode, or by using the compatibility option allow_nan, you can avoid that and obtain a ValueError exception instead:

>>> dumps([nan, inf], number_mode=NM_NATIVE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Out of range float values are not JSON compliant
>>> dumps([nan, inf], allow_nan=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Out of range float values are not JSON compliant
Likewise Decimal instances cause a TypeError exception:

>>> from decimal import Decimal
>>> pi = Decimal('3.1415926535897932384626433832795028841971')
>>> dumps(pi)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Decimal(…) is not JSON serializable

while with NM_DECIMAL they will be serialized as their textual representation, like any other float value:

>>> dumps(pi, number_mode=NM_DECIMAL)
'3.1415926535897932384626433832795028841971'
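
To see why serializing the textual representation matters, compare it with what a conversion to float would keep. The snippet below uses only the standard library and does not call python-rapidjson; it simply illustrates the precision that would be lost if a Decimal were turned into a float before encoding.

```python
from decimal import Decimal

pi = Decimal('3.1415926535897932384626433832795028841971')

# The textual representation keeps every digit of the Decimal value.
as_text = str(pi)

# A binary double keeps only about 15-17 significant decimal digits.
as_float = float(pi)

print(as_text)
print(repr(as_float))
```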
Yet another possible flag affects how numeric values are passed to the underlying RapidJSON library: by default they are serialized to their string representation by the module itself, so they are virtually of unlimited precision:

>>> dumps(123456789012345678901234567890)
'123456789012345678901234567890'

With NM_NATIVE their binary values will be passed directly instead: this is somewhat faster, but it is subject to the underlying C library long long and double limits:

>>> dumps(123456789012345678901234567890, number_mode=NM_NATIVE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
These flags can be combined together:

>>> fast_and_precise = NM_NATIVE | NM_DECIMAL | NM_NAN
>>> dumps([-1, nan, pi], number_mode=fast_and_precise)
'[-1,NaN,3.1415926535897932384626433832795028841971]'
datetime_mode
By default date, datetime and time instances are not serializable:

>>> from datetime import datetime
>>> right_now = datetime(2016, 8, 28, 13, 14, 52, 277256)
>>> date = right_now.date()
>>> time = right_now.time()
>>> dumps({'date': date, 'time': time, 'timestamp': right_now})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: datetime(…) is not JSON serializable

When datetime_mode is set to DM_ISO8601 those values are serialized using the common ISO 8601 format:

>>> dumps(['date', date, 'time', time, 'timestamp', right_now],
...       datetime_mode=DM_ISO8601)
'["date","2016-08-28","time","13:14:52.277256","timestamp","2016-08-28T13:14:52.277256"]'
The right_now value is a naïve datetime (because it does not carry the timezone information) and is normally assumed to be in the local timezone, whatever your system thinks it is. When you know instead that your values, despite being naïve, are actually in the UTC timezone, you can use the DM_NAIVE_IS_UTC flag to inform RapidJSON about that:

>>> mode = DM_ISO8601 | DM_NAIVE_IS_UTC
>>> dumps(['time', time, 'timestamp', right_now], datetime_mode=mode)
'["time","13:14:52.277256+00:00","timestamp","2016-08-28T13:14:52.277256+00:00"]'
A variant is DM_SHIFT_TO_UTC, which shifts all datetime values to the UTC timezone before serializing them:

>>> from datetime import timedelta, timezone
>>> here = timezone(timedelta(hours=2))
>>> now = datetime(2016, 8, 28, 20, 31, 11, 84418, here)
>>> dumps(now, datetime_mode=DM_ISO8601)
'"2016-08-28T20:31:11.084418+02:00"'
>>> mode = DM_ISO8601 | DM_SHIFT_TO_UTC
>>> dumps(now, datetime_mode=mode)
'"2016-08-28T18:31:11.084418+00:00"'

With DM_IGNORE_TZ the timezone, if present, is simply omitted:

>>> mode = DM_ISO8601 | DM_IGNORE_TZ
>>> dumps(now, datetime_mode=mode)
'"2016-08-28T20:31:11.084418"'
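
The three ISO 8601 strings shown above for the now value can be reproduced, and read back, with the standard library alone. The following sketch does not call python-rapidjson: it only shows which datetime operations correspond to the plain, shifted-to-UTC and timezone-less renderings, and that such a string can be parsed again with datetime.fromisoformat() (available from Python 3.7).

```python
from datetime import datetime, timedelta, timezone

here = timezone(timedelta(hours=2))
now = datetime(2016, 8, 28, 20, 31, 11, 84418, here)

# Same text as the DM_ISO8601 example above.
plain = now.isoformat()

# Same instant expressed in UTC, as in the DM_SHIFT_TO_UTC example.
shifted = now.astimezone(timezone.utc).isoformat()

# Timezone dropped without changing the wall-clock time, as with DM_IGNORE_TZ.
without_tz = now.replace(tzinfo=None).isoformat()

# Parsing the ISO 8601 text gives back an equal, timezone-aware datetime.
parsed = datetime.fromisoformat(plain)

print(plain, shifted, without_tz, sep='\n')
```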
Another one-way only alternative format is Unix time: with DM_UNIX_TIME, date, datetime and time instances are serialized as a number of seconds, respectively since the epoch for the first two kinds and since midnight for the latter:

>>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC
>>> dumps([now, now.date(), now.time()], datetime_mode=mode)
'[1472409071.084418,1472342400.0,73871.084418]'
>>> unixtime = float(dumps(now, datetime_mode=mode))
>>> datetime.fromtimestamp(unixtime, here) == now
True
Combining it with DM_ONLY_SECONDS will produce integer values instead, dropping microseconds:

>>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC | DM_ONLY_SECONDS
>>> dumps([now, now.date(), now.time()], datetime_mode=mode)
'[1472409071,1472342400,73871]'

It can be combined with DM_SHIFT_TO_UTC to obtain the timestamp of the corresponding UTC time:

>>> mode = DM_UNIX_TIME | DM_SHIFT_TO_UTC
>>> dumps(now, datetime_mode=mode)
'1472409071.084418'
As above, when you know that your values are in the UTC timezone, you can use the DM_NAIVE_IS_UTC flag to get the right result:

>>> a_long_time_ago = datetime(1968, 3, 18, 9, 10, 0, 0)
>>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC
>>> dumps([a_long_time_ago, a_long_time_ago.date(), a_long_time_ago.time()],
...       datetime_mode=mode)
'[-56472600.0,-56505600.0,33000.0]'
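
The three kinds of numbers produced by DM_UNIX_TIME for the now value above can be derived by hand with the standard library, which may help when checking what a consumer will receive. This is a sketch of the arithmetic only, not of how python-rapidjson computes the values internally; the variable names are illustrative.

```python
import calendar
from datetime import datetime, timedelta, timezone

here = timezone(timedelta(hours=2))
now = datetime(2016, 8, 28, 20, 31, 11, 84418, here)

# Aware datetime: seconds elapsed since the epoch.
stamp = now.timestamp()

# Date: midnight of that day, interpreted as UTC (as with DM_NAIVE_IS_UTC).
day_stamp = calendar.timegm(now.date().timetuple())

# Time: seconds elapsed since midnight, microseconds as the fractional part.
t = now.time()
since_midnight = t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6

print(int(stamp), day_stamp, int(since_midnight))
```

The integer parts printed here correspond to the values obtained with DM_ONLY_SECONDS in the example above.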
uuid_mode
Likewise, to handle UUID instances there are two modes that can be specified with the uuid_mode argument, both of which use a string representation of their values:

>>> from uuid import uuid4
>>> random_uuid = uuid4()
>>> dumps(random_uuid)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: UUID(…) is not JSON serializable
>>> dumps(random_uuid, uuid_mode=UM_CANONICAL)
'"be576345-65b5-4fc2-92c5-94e2f82e38fd"'
>>> dumps(random_uuid, uuid_mode=UM_HEX)
'"be57634565b54fc292c594e2f82e38fd"'
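
The two string forms shown above match what the standard uuid module calls the canonical form and the hex form. As a quick illustration that uses only the standard library (and the sample value from the example above), both forms identify the same UUID and can be turned back into a UUID instance:

```python
from uuid import UUID

sample = UUID('be576345-65b5-4fc2-92c5-94e2f82e38fd')

# Hyphenated form, as in the UM_CANONICAL output above.
canonical = str(sample)

# 32 hexadecimal digits without hyphens, as in the UM_HEX output above.
compact = sample.hex

# The UUID constructor accepts either spelling.
assert UUID(canonical) == UUID(compact) == sample

print(canonical)
print(compact)
```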
bytes_mode
By default all bytes instances are assumed to be UTF-8 encoded strings, and acted on accordingly:

>>> ascii_string = 'ciao'
>>> bytes_string = b'cio\xc3\xa8'
>>> unicode_string = 'cioè'
>>> dumps([ascii_string, bytes_string, unicode_string])
'["ciao","cio\\u00E8","cio\\u00E8"]'

Sometimes you may prefer a different approach, explicitly disabling that behavior using the BM_NONE mode:

>>> dumps([ascii_string, bytes_string, unicode_string],
...       bytes_mode=BM_NONE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: b'cio\xc3\xa8' is not JSON serializable
>>> my_bytes_handler = lambda b: b.decode('UTF-8').upper()
>>> dumps([ascii_string, bytes_string, unicode_string],
...       bytes_mode=BM_NONE, default=my_bytes_handler)
'["ciao","CIO\\u00C8","cio\\u00E8"]'
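
A typical reason for taking over the handling of bytes is that the data is binary rather than UTF-8 text. One common convention, sketched below, is to emit it as Base64. bytes_as_base64 is a hypothetical handler, and the standard library json module (which never encodes bytes by itself) is used as a stand-in so the snippet runs without python-rapidjson; with rapidjson.dumps() the handler would presumably be passed together with bytes_mode=BM_NONE, as in the example above.

```python
import base64
import json


def bytes_as_base64(obj):
    """Hypothetical default callable: encode binary data as Base64 text."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode('ascii')
    raise TypeError('%r is not JSON serializable' % (obj,))


# b'\xff\x00' is not valid UTF-8, so it could not be treated as a string.
encoded = json.dumps(['ciao', b'\xff\x00'], default=bytes_as_base64)
print(encoded)

# The receiver reverses the convention explicitly.
original = base64.b64decode(json.loads(encoded)[1])
```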
iterable_mode
By default a value that implements the iterable protocol gets encoded as a JSON array:

>>> from time import localtime, struct_time
>>> lt = localtime()
>>> dumps(lt)
'[2020,11,28,19,55,40,5,333,0]'
>>> class MyList(list):
...     pass
...
>>> ml = MyList((1,2,3))
>>> dumps(ml)
'[1,2,3]'
When that’s not appropriate, for example because you want to encode them in a different way, you may set iterable_mode to IM_ONLY_LISTS:

>>> dumps(lt, iterable_mode=IM_ONLY_LISTS)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: <time.struct_time …> is not JSON serializable
>>> dumps(ml, iterable_mode=IM_ONLY_LISTS)
Traceback (most recent call last):
  ...
TypeError: [1, 2, 3] is not JSON serializable

and thus you can use the default argument:

>>> def ts_or_ml(obj):
...     if isinstance(obj, struct_time):
...         return {'__class__': 'time.struct_time', '__init__': list(obj)}
...     elif isinstance(obj, MyList):
...         return [i*2 for i in obj]
...     else:
...         # wrap obj in a tuple: it may itself be a tuple in this mode
...         raise ValueError('%r is not JSON serializable' % (obj,))
...
>>> dumps(lt, iterable_mode=IM_ONLY_LISTS, default=ts_or_ml)
'{"__class__":"time.struct_time","__init__":[2020,11,28,19,55,40,5,333,0]}'
>>> dumps(ml, iterable_mode=IM_ONLY_LISTS, default=ts_or_ml)
'[2,4,6]'
Obviously, in such case the value returned by the default callable must not be or contain a tuple:

>>> def bad_timestruct(obj):
...     if isinstance(obj, struct_time):
...         return {'__class__': 'time.struct_time', '__init__': tuple(obj)}
...     else:
...         raise ValueError('%r is not JSON serializable' % (obj,))
...
>>> dumps(lt, iterable_mode=IM_ONLY_LISTS, default=bad_timestruct)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: (…) is not JSON serializable
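
The tagged representation used above ('__class__' plus '__init__') is only useful if the reader of the JSON rebuilds the original object. A minimal sketch of that decoding side follows; revive is a hypothetical function, and the standard library json.loads() with its object_hook parameter is used as a stand-in so the snippet runs without python-rapidjson.

```python
import json
from time import struct_time


def revive(mapping):
    """Hypothetical object hook: rebuild values tagged with '__class__'."""
    if mapping.get('__class__') == 'time.struct_time':
        return struct_time(mapping['__init__'])
    # Untagged objects are returned unchanged, as plain dicts.
    return mapping


text = '{"__class__":"time.struct_time","__init__":[2020,11,28,19,55,40,5,333,0]}'
restored = json.loads(text, object_hook=revive)

print(restored.tm_year, restored.tm_mon, restored.tm_mday)
```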
mapping_mode
By default a value that implements the mapping protocol gets encoded as a JSON object:

>>> from collections import Counter
>>> d = {"a":1,"b":2,"c":3}
>>> c = Counter(d)
>>> dumps([c, d])
'[{"a":1,"b":2,"c":3},{"a":1,"b":2,"c":3}]'
When that’s not appropriate, for example because you want to encode them in a different way, you may set mapping_mode to MM_ONLY_DICTS:

>>> dumps(d, mapping_mode=MM_ONLY_DICTS)
'{"a":1,"b":2,"c":3}'
>>> dumps(c, mapping_mode=MM_ONLY_DICTS)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Counter(…) is not JSON serializable

and thus you can use the default argument:

>>> def counter(obj):
...     if isinstance(obj, Counter):
...         return {'__class__': 'collections.Counter', '__init__': dict(obj)}
...     else:
...         raise ValueError('%r is not JSON serializable' % (obj,))
...
>>> dumps(c, mapping_mode=MM_ONLY_DICTS, default=counter)
'{"__class__":"collections.Counter","__init__":{"a":1,"b":2,"c":3}}'
Obviously, in such case the value returned by the default callable must not be or contain mappings other than plain dicts:

>>> from collections import OrderedDict
>>> def bad_counter(obj):
...     if isinstance(obj, Counter):
...         return {'__class__': 'collections.Counter', '__init__': OrderedDict(obj)}
...     else:
...         raise ValueError('%r is not JSON serializable' % (obj,))
...
>>> dumps(c, mapping_mode=MM_ONLY_DICTS, default=bad_counter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: OrderedDict([('a', 1), ('b', 2), ('c', 3)]) is not JSON serializable
Normally, dumping a dictionary containing non-string keys raises a TypeError exception:

>>> dumps({-1: 'minus-one'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keys must be strings
With mapping_mode set to MM_COERCE_KEYS_TO_STRINGS, such keys will be converted to their string representation:

>>> dumps({-1: 'minus-one', True: "good", False: "bad", None: "ugly"},
...       mapping_mode=MM_COERCE_KEYS_TO_STRINGS)
'{"-1":"minus-one","True":"good","False":"bad","None":"ugly"}'
Alternatively, by providing a default function you can have finer control over how they should be encoded. For example the following mimics the default behaviour of the standard library json module:

>>> def mimic_stdlib_json(obj):
...     if isinstance(obj, dict):
...         result = {}
...         for key in obj:
...             if isinstance(key, str):
...                 # string keys are already valid: keep them unchanged
...                 result[key] = obj[key]
...             elif key is True:
...                 result['true'] = obj[key]
...             elif key is False:
...                 result['false'] = obj[key]
...             elif key is None:
...                 result['null'] = obj[key]
...             elif isinstance(key, (int, float)):
...                 result[str(key)] = obj[key]
...             else:
...                 raise TypeError('keys must be str, int, float, bool or None')
...         return result
...     else:
...         raise ValueError('%r is not JSON serializable' % (obj,))
...
>>> dumps({True: 'good', False: 'bad', None: 'ugly'},
...       default=mimic_stdlib_json)
'{"true":"good","false":"bad","null":"ugly"}'
Warning
This can lead to an infinite recursion error if the default function returns a dictionary that still contains non-string keys:

>>> dumps({True: 'vero', False: 'falso'},
...       default=lambda map: map)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded
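
A simple way to stay clear of that recursion is to make sure that whatever the default function returns has string keys only. The sketch below shows one such conversion, followed by the standard library behaviour that the mimic_stdlib_json example imitates. stringify_keys is a hypothetical helper, not part of python-rapidjson, and it deliberately converts a single level: nested mappings would be handed to the default function in their turn.

```python
import json


def stringify_keys(mapping):
    """Hypothetical helper: return a plain dict whose keys are all strings."""
    return {
        key if isinstance(key, str) else str(key): value
        for key, value in mapping.items()
    }


safe = stringify_keys({True: 'vero', False: 'falso', -1: 'minus-one'})

# No non-string key survives, so the result cannot trigger default again
# for the same reason.
assert all(isinstance(key, str) for key in safe)
print(safe)

# For comparison, the stdlib json module coerces such keys on its own,
# using the JSON spellings of the constants.
stdlib_text = json.dumps({True: 'good', False: 'bad', None: 'ugly'})
print(stdlib_text)
```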