dumps() function

rapidjson.dumps(obj, *, skipkeys=False, ensure_ascii=True, write_mode=WM_COMPACT, indent=4, default=None, sort_keys=False, number_mode=None, datetime_mode=None, uuid_mode=None, bytes_mode=BM_UTF8, iterable_mode=IM_ANY_ITERABLE, mapping_mode=MM_ANY_MAPPING, allow_nan=True)

Encode given Python obj instance into a JSON string.

Parameters:

obj – the value to be serialized
skipkeys (bool) – whether invalid dict keys will be skipped
ensure_ascii (bool) – whether the output should contain only ASCII characters
write_mode (int) – enable particular pretty print behaviors
indent – indentation width or string to produce pretty printed JSON
default (callable) – a function that gets called for objects that can’t otherwise be serialized
sort_keys (bool) – whether dictionary keys should be sorted alphabetically
number_mode (int) – enable particular behaviors in handling numbers
datetime_mode (int) – how should datetime, time and date instances be handled
uuid_mode (int) – how should UUID instances be handled
bytes_mode (int) – how should bytes instances be handled
iterable_mode (int) – how should iterable values be handled
mapping_mode (int) – how should mapping values be handled
allow_nan (bool) – compatibility flag equivalent to number_mode=NM_NAN

Returns:

A Python str instance.

skipkeys

If skipkeys is true (default: False), then dict keys that are not strings will be skipped instead of raising a TypeError. Unlike the standard library json module, this also applies to int, float, bool and None keys:

>>> dumps({(0,): 'empty tuple', True: 'a true value'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keys must be strings
>>> dumps({(0,): 'empty tuple', True: 'a true value'},
...       skipkeys=True)
'{}'

Note

skipkeys is a backward compatible alias of the new MM_SKIP_NON_STRING_KEYS mapping mode.
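For illustration only, the filtering performed by this mode can be approximated in plain Python before serialization. The helper name skip_non_string_keys is hypothetical and not part of python-rapidjson, and the recursion into nested dicts and lists is an assumption about how the mode treats nested containers:

```python
def skip_non_string_keys(value):
    """Hypothetical helper: drop every dict key that is not a str, recursively."""
    if isinstance(value, dict):
        return {k: skip_non_string_keys(v)
                for k, v in value.items() if isinstance(k, str)}
    if isinstance(value, list):
        return [skip_non_string_keys(item) for item in value]
    return value


data = {(0,): 'empty tuple', True: 'a true value', 'nested': {1: 'one', 'two': 2}}
print(skip_non_string_keys(data))  # {'nested': {'two': 2}}
```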

ensure_ascii

If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is:

>>> dumps('The symbol for the Euro currency is €')
'"The symbol for the Euro currency is \\u20AC"'
>>> dumps('The symbol for the Euro currency is €',
...       ensure_ascii=False)
'"The symbol for the Euro currency is €"'
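The escaping itself follows the JSON rule of writing each UTF-16 code unit as \uXXXX, so characters outside the Basic Multilingual Plane become a surrogate pair. The sketch below reproduces only that part, using uppercase hex digits to match the output above (the standard library json module uses lowercase); the helper name is hypothetical, and quotes, backslashes and control characters are deliberately not handled:

```python
def escape_non_ascii(text):
    """Hypothetical helper: escape every non-ASCII character as \\uXXXX.

    Characters beyond U+FFFF become a UTF-16 surrogate pair. Quotes,
    backslashes and control characters are NOT handled here.
    """
    out = []
    for char in text:
        if ord(char) < 128:
            out.append(char)
        else:
            # Encode as UTF-16 big-endian and emit one escape per 16-bit code unit.
            units = char.encode('utf-16-be')
            for i in range(0, len(units), 2):
                out.append('\\u%04X' % int.from_bytes(units[i:i + 2], 'big'))
    return ''.join(out)


print(escape_non_ascii('The symbol for the Euro currency is €'))
# The symbol for the Euro currency is \u20AC
```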

write_mode

The write_mode parameter controls how python-rapidjson emits JSON: by default it is WM_COMPACT, which produces the most compact JSON representation:

>>> dumps([1, 2, {'three': 3, 'four': 4}])
'[1,2,{"three":3,"four":4}]'

With WM_PRETTY it will use RapidJSON's PrettyWriter, with a default indent (see below) of four spaces:

>>> print(dumps([1, 2, {'three': 3, 'four': 4}],
...       write_mode=WM_PRETTY))
[
    1,
    2,
    {
        "three": 3,
        "four": 4
    }
]

With WM_SINGLE_LINE_ARRAY, arrays will be kept on a single line:

>>> print(dumps([1, 2, 'three', [4, 5]],
...       write_mode=WM_SINGLE_LINE_ARRAY))
[1, 2, "three", [4, 5]]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}],
...       write_mode=WM_SINGLE_LINE_ARRAY))
[1, 2, {
        "three": 3,
        "four": 4
    }]
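As a point of comparison only (this uses the standard library json module, not python-rapidjson), the stdlib output closest to WM_COMPACT is obtained by passing separators without spaces; there is no stdlib counterpart of WM_SINGLE_LINE_ARRAY:

```python
import json

data = [1, 2, {'three': 3, 'four': 4}]

# The stdlib default separators are (', ', ': '); dropping the spaces
# yields the same compact text shown above for WM_COMPACT.
compact = json.dumps(data, separators=(',', ':'))
print(compact)  # [1,2,{"three":3,"four":4}]
```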

indent

The indent parameter may be either a non-negative integer or a string: in the former case it specifies a number of spaces, while in the latter the string may contain zero or more ASCII whitespace characters (space, tab \t, newline \n and carriage return \r), all of them equal (that is, "\n\t" is not accepted).

The integer, or the length of the string, determines how many spaces (or how many copies of the string's character) are used to indent each nesting level when write_mode is not WM_COMPACT; it defaults to 4. Specifying a value other than None automatically sets write_mode to WM_PRETTY, unless write_mode is given explicitly.

By setting indent to 0, each array item (when write_mode is not WM_SINGLE_LINE_ARRAY) and each dictionary entry will be followed by a newline, with no indentation. A positive integer means that each level will be indented by that many spaces:

>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=0))
[
1,
2,
{
"three": 3,
"four": 4
}
]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=2))
[
  1,
  2,
  {
    "three": 3,
    "four": 4
  }
]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent=""))
[
1,
2,
{
"three": 3,
"four": 4
}
]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}], indent="  "))
[
  1,
  2,
  {
    "three": 3,
    "four": 4
  }
]
>>> print(dumps([1, 2, {'three': 3, 'four': 4}],
...       indent="\t").replace('\t', '→ '))
[
→ 1,
→ 2,
→ {
→ → "three": 3,
→ → "four": 4
→ }
]
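For comparison, the standard library json module accepts the same kinds of indent values (an integer or a whitespace string), and for this particular input its layout coincides with the examples above. This is shown only as a cross-check; other inputs or write modes may well differ:

```python
import json

data = [1, 2, {'three': 3, 'four': 4}]

# When indent is set, stdlib json uses ',' as item separator (no trailing space)
# and ': ' between keys and values.
two_spaces = json.dumps(data, indent=2)
tabbed = json.dumps(data, indent='\t')
no_indent = json.dumps(data, indent=0)
print(two_spaces)
```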

default

The default argument may be used to specify a custom serializer for objects that are not otherwise handled. If specified, it should be a function that gets called for such objects and either returns a JSON-encodable version of the object or raises an exception, such as TypeError:

>>> class Point(object):
...   def __init__(self, x, y):
...     self.x = x
...     self.y = y
...
>>> point = Point(1,2)
>>> dumps(point)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: <__main__.Point object at …> is not JSON serializable
>>> def point_jsonifier(obj):
...   if isinstance(obj, Point):
...     return {'x': obj.x, 'y': obj.y}
...   else:
...     raise ValueError('%r is not JSON serializable' % (obj,))
...
>>> dumps(point, default=point_jsonifier)
'{"x":1,"y":2}'
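When several custom types need handling, one possible pattern (a sketch, not something python-rapidjson requires) is to build the default callable with functools.singledispatch, registering one converter per type and raising TypeError for everything else. Point is redefined so the block is self-contained, and it is exercised with the standard library json.dumps, whose default argument follows the same calling convention:

```python
import json
from functools import singledispatch


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


@singledispatch
def to_json(obj):
    # Fallback for any type without a registered converter.
    raise TypeError('%r is not JSON serializable' % (obj,))


@to_json.register
def _(obj: Point):
    return {'x': obj.x, 'y': obj.y}


@to_json.register
def _(obj: complex):
    # JSON has no complex numbers: split into real and imaginary parts.
    return {'re': obj.real, 'im': obj.imag}


print(json.dumps([Point(1, 2), 1 + 2j], default=to_json))
# [{"x": 1, "y": 2}, {"re": 1.0, "im": 2.0}]
```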

sort_keys

When sort_keys is true (default: False), the JSON representation of Python dictionaries is sorted by key:

>>> data = {'a': 'A', 'c': 'C', 'i': 'I', 'd': 'D'}
>>> dumps(data, sort_keys=True)
'{"a":"A","c":"C","d":"D","i":"I"}'

Note

sort_keys is a backward compatible alias of the new MM_SORT_KEYS mapping mode.

>>> dumps(data, mapping_mode=MM_SORT_KEYS)
'{"a":"A","c":"C","d":"D","i":"I"}'

The default setting, on modern snakes (that is, on Python >= 3.7), preserves original dictionary insertion order:

>>> dumps(data)
'{"a":"A","c":"C","i":"I","d":"D"}'

number_mode

The number_mode argument selects different behaviors in handling numeric values.

By default non-numbers (nan, inf, -inf) will be serialized as their JavaScript equivalents (NaN, Infinity, -Infinity), because NM_NAN is on by default (NB: this is not compliant with the JSON standard):

>>> nan = float('nan')
>>> inf = float('inf')
>>> dumps([nan, inf])
'[NaN,Infinity]'
>>> dumps([nan, inf], number_mode=NM_NAN)
'[NaN,Infinity]'

By explicitly setting number_mode, or by using the compatibility option allow_nan, you can avoid that and get a ValueError exception instead:

>>> dumps([nan, inf], number_mode=NM_NATIVE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Out of range float values are not JSON compliant
>>> dumps([nan, inf], allow_nan=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Out of range float values are not JSON compliant
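If you need standard-compliant output but would rather not fail, one option (an assumption about your needs, not a python-rapidjson feature) is to replace non-finite floats before serializing. The hypothetical helper below maps them to None, which is emitted as null:

```python
import math


def replace_non_finite(value, replacement=None):
    """Hypothetical helper: recursively swap nan/inf/-inf floats for `replacement`."""
    if isinstance(value, float) and not math.isfinite(value):
        return replacement
    if isinstance(value, dict):
        return {k: replace_non_finite(v, replacement) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_non_finite(v, replacement) for v in value]
    return value


cleaned = replace_non_finite([float('nan'), 1.5, {'limit': float('-inf')}])
print(cleaned)  # [None, 1.5, {'limit': None}]
```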

Likewise, by default Decimal instances cause a TypeError exception:

>>> from decimal import Decimal
>>> pi = Decimal('3.1415926535897932384626433832795028841971')
>>> dumps(pi)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Decimal(…) is not JSON serializable

while using NM_DECIMAL they will be serialized as their textual representation like any other float value:

>>> dumps(pi, number_mode=NM_DECIMAL)
'3.1415926535897932384626433832795028841971'
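To see why NM_DECIMAL matters, compare what survives a conversion to a binary double: the float keeps at most 17 significant digits, while str() of the Decimal preserves every digit shown in the output above:

```python
from decimal import Decimal

pi = Decimal('3.1415926535897932384626433832795028841971')

as_float = float(pi)   # nearest IEEE 754 double
print(repr(as_float))  # 3.141592653589793
print(str(pi))         # 3.1415926535897932384626433832795028841971
```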

Yet another possible flag affects how numeric values are passed to the underlying RapidJSON library: by default they are serialized to their string representation by the module itself, so they are virtually of unlimited precision:

>>> dumps(123456789012345678901234567890)
'123456789012345678901234567890'

With NM_NATIVE their binary values will be passed directly instead: this is somewhat faster, but it is subject to the limits of the underlying C long long and double types:

>>> dumps(123456789012345678901234567890, number_mode=NM_NATIVE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
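Before enabling NM_NATIVE on data you do not control, a conservative pre-check is whether each integer fits in a signed 64-bit C long long. This hypothetical helper encodes only that bound; whether larger unsigned 64-bit values are also accepted is not stated here, so treat it as an assumption:

```python
LLONG_MIN = -2**63
LLONG_MAX = 2**63 - 1


def fits_long_long(n):
    """Hypothetical helper: True if `n` is representable as a signed 64-bit integer."""
    return LLONG_MIN <= n <= LLONG_MAX


print(fits_long_long(2**63 - 1))                        # True
print(fits_long_long(123456789012345678901234567890))  # False
```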

These flags can be combined together:

>>> fast_and_precise = NM_NATIVE | NM_DECIMAL | NM_NAN
>>> dumps([-1, nan, pi], number_mode=fast_and_precise)
'[-1,NaN,3.1415926535897932384626433832795028841971]'

datetime_mode

By default, date, datetime and time instances are not serializable:

>>> from datetime import datetime
>>> right_now = datetime(2016, 8, 28, 13, 14, 52, 277256)
>>> date = right_now.date()
>>> time = right_now.time()
>>> dumps({'date': date, 'time': time, 'timestamp': right_now})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: datetime(…) is not JSON serializable

When datetime_mode is set to DM_ISO8601 those values are serialized using the common ISO 8601 format:

>>> dumps(['date', date, 'time', time, 'timestamp', right_now],
...       datetime_mode=DM_ISO8601)
'["date","2016-08-28","time","13:14:52.277256","timestamp","2016-08-28T13:14:52.277256"]'
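The same ISO 8601 strings come from the isoformat() methods of the datetime module. As a cross-check outside python-rapidjson, the sketch below passes such a converter as default to the standard library json.dumps (compact separators chosen to match the output above). Since datetime is a subclass of date, a single isinstance() check covers all three types:

```python
import json
from datetime import date, datetime, time


def iso8601(obj):
    # datetime is a subclass of date, so this also catches datetime instances.
    if isinstance(obj, (date, time)):
        return obj.isoformat()
    raise TypeError('%r is not JSON serializable' % (obj,))


right_now = datetime(2016, 8, 28, 13, 14, 52, 277256)
payload = ['date', right_now.date(), 'time', right_now.time(), 'timestamp', right_now]
encoded = json.dumps(payload, default=iso8601, separators=(',', ':'))
print(encoded)
```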

The right_now value is a naïve datetime (because it does not carry any timezone information) and is normally assumed to be in the local timezone, whatever your system thinks it is. When you instead know that your values, even though naïve, are actually in the UTC timezone, you can use the DM_NAIVE_IS_UTC flag to inform RapidJSON about that:

>>> mode = DM_ISO8601 | DM_NAIVE_IS_UTC
>>> dumps(['time', time, 'timestamp', right_now], datetime_mode=mode)
'["time","13:14:52.277256+00:00","timestamp","2016-08-28T13:14:52.277256+00:00"]'
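In plain Python terms, DM_NAIVE_IS_UTC corresponds to treating a naïve value as if it carried the UTC timezone, which the standard library expresses with replace(tzinfo=timezone.utc). This sketches the semantics only, not how python-rapidjson implements it:

```python
from datetime import datetime, timezone

right_now = datetime(2016, 8, 28, 13, 14, 52, 277256)

# Attach UTC to the naive values without shifting the wall-clock time.
as_utc = right_now.replace(tzinfo=timezone.utc)
time_utc = right_now.time().replace(tzinfo=timezone.utc)
print(as_utc.isoformat())    # 2016-08-28T13:14:52.277256+00:00
print(time_utc.isoformat())  # 13:14:52.277256+00:00
```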

A variant is DM_SHIFT_TO_UTC, which shifts all datetime values to the UTC timezone before serializing them:

>>> from datetime import timedelta, timezone
>>> here = timezone(timedelta(hours=2))
>>> now = datetime(2016, 8, 28, 20, 31, 11, 84418, here)
>>> dumps(now, datetime_mode=DM_ISO8601)
'"2016-08-28T20:31:11.084418+02:00"'
>>> mode = DM_ISO8601 | DM_SHIFT_TO_UTC
>>> dumps(now, datetime_mode=mode)
'"2016-08-28T18:31:11.084418+00:00"'

With DM_IGNORE_TZ the timezone, if present, is simply omitted:

>>> mode = DM_ISO8601 | DM_IGNORE_TZ
>>> dumps(now, datetime_mode=mode)
'"2016-08-28T20:31:11.084418"'
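The difference between the two flags mirrors two standard library operations on aware datetimes: astimezone(timezone.utc) keeps the same instant and changes the offset, while replace(tzinfo=None) keeps the wall-clock fields and discards the offset. A sketch reproducing the strings above:

```python
from datetime import datetime, timedelta, timezone

here = timezone(timedelta(hours=2))
now = datetime(2016, 8, 28, 20, 31, 11, 84418, here)

shifted = now.astimezone(timezone.utc)  # same instant, expressed in UTC
stripped = now.replace(tzinfo=None)     # same wall clock, offset dropped
print(shifted.isoformat())   # 2016-08-28T18:31:11.084418+00:00
print(stripped.isoformat())  # 2016-08-28T20:31:11.084418
```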

Another alternative format, which is one-way only, is Unix time: with DM_UNIX_TIME, date, datetime and time instances are serialized as a number of seconds, counted since the epoch for dates and datetimes, and since midnight for times:

>>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC
>>> dumps([now, now.date(), now.time()], datetime_mode=mode)
'[1472409071.084418,1472342400.0,73871.084418]'
>>> unixtime = float(dumps(now, datetime_mode=mode))
>>> datetime.fromtimestamp(unixtime, here) == now
True

Combining it with DM_ONLY_SECONDS produces integer values instead, dropping microseconds:

>>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC | DM_ONLY_SECONDS
>>> dumps([now, now.date(), now.time()], datetime_mode=mode)
'[1472409071,1472342400,73871]'

It can also be combined with DM_SHIFT_TO_UTC to obtain the timestamp of the corresponding UTC time:

>>> mode = DM_UNIX_TIME | DM_SHIFT_TO_UTC
>>> dumps(now, datetime_mode=mode)
'1472409071.084418'

As above, when you know that your values are in the UTC timezone, you can use the DM_NAIVE_IS_UTC flag to get the right result:

>>> a_long_time_ago = datetime(1968, 3, 18, 9, 10, 0, 0)
>>> mode = DM_UNIX_TIME | DM_NAIVE_IS_UTC
>>> dumps([a_long_time_ago, a_long_time_ago.date(), a_long_time_ago.time()],
...       datetime_mode=mode)
'[-56472600.0,-56505600.0,33000.0]'
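The numbers above can be reproduced with plain datetime arithmetic. The helper below is a hypothetical approximation of DM_UNIX_TIME | DM_NAIVE_IS_UTC, working in whole microseconds so that no float rounding creeps in; flooring negative values when only_seconds is set is a choice of this sketch, not documented python-rapidjson behavior:

```python
from datetime import date, datetime, time, timedelta, timezone

UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_time(value, only_seconds=False):
    """Hypothetical helper approximating DM_UNIX_TIME | DM_NAIVE_IS_UTC."""
    if isinstance(value, datetime):  # must precede the date check
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # naive means UTC
        delta = value - UTC_EPOCH
        micros = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
    elif isinstance(value, date):
        # Midnight UTC of that calendar day.
        micros = (value - UTC_EPOCH.date()).days * 86400 * 10**6
    elif isinstance(value, time):
        # Seconds since midnight; any tzinfo on the time is ignored here.
        seconds = (value.hour * 60 + value.minute) * 60 + value.second
        micros = seconds * 10**6 + value.microsecond
    else:
        raise TypeError('%r is not a date, datetime or time' % (value,))
    if only_seconds:
        return micros // 10**6  # floors, also for negative timestamps
    return micros / 10**6


here = timezone(timedelta(hours=2))
now = datetime(2016, 8, 28, 20, 31, 11, 84418, here)
print([unix_time(now), unix_time(now.date()), unix_time(now.time())])
# [1472409071.084418, 1472342400.0, 73871.084418]
```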

uuid_mode

Likewise, UUID instances can be handled by specifying one of two modes with the uuid_mode argument; both use the string representation of the value:

>>> from uuid import uuid4
>>> random_uuid = uuid4()
>>> dumps(random_uuid)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: UUID(…) is not JSON serializable
>>> dumps(random_uuid, uuid_mode=UM_CANONICAL) 
'"be576345-65b5-4fc2-92c5-94e2f82e38fd"'
>>> dumps(random_uuid, uuid_mode=UM_HEX) 
'"be57634565b54fc292c594e2f82e38fd"'
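The two strings correspond to the standard library's str() and .hex representations of a UUID. The sketch uses a fixed value so the result is reproducible (the uuid4() output above is random):

```python
from uuid import UUID

value = UUID('be576345-65b5-4fc2-92c5-94e2f82e38fd')

canonical = str(value)  # hyphenated 8-4-4-4-12 form, as with UM_CANONICAL
compact = value.hex     # 32 hex digits without hyphens, as with UM_HEX
print(canonical, compact)
```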

bytes_mode

By default all bytes instances are assumed to be UTF-8 encoded strings, and acted on accordingly:

>>> ascii_string = 'ciao'
>>> bytes_string = b'cio\xc3\xa8'
>>> unicode_string = 'cioè'
>>> dumps([ascii_string, bytes_string, unicode_string])
'["ciao","cio\\u00E8","cio\\u00E8"]'

Sometimes you may prefer a different approach, explicitly disabling that behavior using the BM_NONE mode:

>>> dumps([ascii_string, bytes_string, unicode_string],
...       bytes_mode=BM_NONE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: b'cio\xc3\xa8' is not JSON serializable
>>> my_bytes_handler = lambda b: b.decode('UTF-8').upper()
>>> dumps([ascii_string, bytes_string, unicode_string],
...       bytes_mode=BM_NONE, default=my_bytes_handler)
'["ciao","CIO\\u00C8","cio\\u00E8"]'
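Another common choice, shown here only as an example, is to encode arbitrary binary data as Base64 text from the default callable, so that bytes that are not valid UTF-8 can still be serialized (with python-rapidjson this presumes bytes_mode=BM_NONE, as above). The handler name is hypothetical, and the block uses the standard library json.dumps, which also routes bytes to default:

```python
import base64
import json


def bytes_to_base64(obj):
    # Only bytes-like values are handled; anything else is rejected.
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode('ascii')
    raise TypeError('%r is not JSON serializable' % (obj,))


print(json.dumps([b'cio\xc3\xa8', b'\xff\x00'], default=bytes_to_base64))
# ["Y2lvw6g=", "/wA="]
```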

iterable_mode

By default a value that implements the iterable protocol gets encoded as a JSON array:

>>> from time import localtime, struct_time
>>> lt = localtime()
>>> dumps(lt) 
'[2020,11,28,19,55,40,5,333,0]'
>>> class MyList(list):
...   pass
>>> ml = MyList((1,2,3))
>>> dumps(ml)
'[1,2,3]'

When that’s not appropriate, for example because you want to use a different way to encode them, you may set iterable_mode to IM_ONLY_LISTS:

>>> dumps(lt, iterable_mode=IM_ONLY_LISTS)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: <time.struct_time …> is not JSON serializable
>>> dumps(ml, iterable_mode=IM_ONLY_LISTS)
Traceback (most recent call last):
  ...
TypeError: [1, 2, 3] is not JSON serializable

and thus you can use the default argument:

>>> def ts_or_ml(obj):
...   if isinstance(obj, struct_time):
...     return {'__class__': 'time.struct_time', '__init__': list(obj)}
...   elif isinstance(obj, MyList):
...     return [i*2 for i in obj]
...   else:
...     raise ValueError('%r is not JSON serializable' % (obj,))
>>> dumps(lt, iterable_mode=IM_ONLY_LISTS, default=ts_or_ml) 
'{"__class__":"time.struct_time","__init__":[2020,11,28,19,55,40,5,333,0]}'
>>> dumps(ml, iterable_mode=IM_ONLY_LISTS, default=ts_or_ml)
'[2,4,6]'

Obviously, in such a case the value returned by the default callable must not be, or contain, a tuple:

>>> def bad_timestruct(obj):
...   if isinstance(obj, struct_time):
...     return {'__class__': 'time.struct_time', '__init__': tuple(obj)}
...   else:
...     raise ValueError('%r is not JSON serializable' % (obj,))
>>> dumps(lt, iterable_mode=IM_ONLY_LISTS, default=bad_timestruct)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: (…) is not JSON serializable
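A default callable meant for IM_ONLY_LISTS should therefore always build plain lists. A minimal sketch, with a hypothetical listify helper that turns tuples, sets and other non-list iterables (including list subclasses) into plain lists; sets are sorted for deterministic output, which assumes their items are mutually comparable. It is exercised with the standard library json.dumps, which likewise calls default for sets:

```python
import json
from collections.abc import Iterable


def listify(obj):
    """Hypothetical default for IM_ONLY_LISTS: turn other iterables into lists."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # deterministic order; items must be comparable
    if isinstance(obj, Iterable) and not isinstance(obj, (str, bytes, dict)):
        return list(obj)  # always a plain list, never a tuple
    raise TypeError('%r is not JSON serializable' % (obj,))


print(json.dumps({'tags': {'b', 'a'}}, default=listify))  # {"tags": ["a", "b"]}
```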

mapping_mode

By default a value that implements the mapping protocol gets encoded as a JSON object:

>>> from collections import Counter
>>> d = {"a":1,"b":2,"c":3}
>>> c = Counter(d)
>>> dumps([c, d])
'[{"a":1,"b":2,"c":3},{"a":1,"b":2,"c":3}]'

When that’s not appropriate, for example because you want to use a different way to encode them, you may set mapping_mode to MM_ONLY_DICTS:

>>> dumps(d, mapping_mode=MM_ONLY_DICTS)
'{"a":1,"b":2,"c":3}'
>>> dumps(c, mapping_mode=MM_ONLY_DICTS)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Counter(…) is not JSON serializable

and thus you can use the default argument:

>>> def counter(obj):
...   if isinstance(obj, Counter):
...     return {'__class__': 'collections.Counter', '__init__': dict(obj)}
...   else:
...     raise ValueError('%r is not JSON serializable' % (obj,))
>>> dumps(c, mapping_mode=MM_ONLY_DICTS, default=counter)
'{"__class__":"collections.Counter","__init__":{"a":1,"b":2,"c":3}}'

Obviously, in such a case the value returned by the default callable must not be, or contain, mappings other than plain dicts:

>>> from collections import OrderedDict
>>> def bad_counter(obj):
...   if isinstance(obj, Counter):
...     return {'__class__': 'collections.Counter', '__init__': OrderedDict(obj)}
...   else:
...     raise ValueError('%r is not JSON serializable' % (obj,))
>>> dumps(c, mapping_mode=MM_ONLY_DICTS, default=bad_counter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: OrderedDict([('a', 1), ('b', 2), ('c', 3)]) is not JSON serializable

Normally, dumping a dictionary containing non-string keys raises a TypeError exception:

>>> dumps({-1: 'minus-one'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keys must be strings

By setting mapping_mode to MM_COERCE_KEYS_TO_STRINGS, such keys will be converted to their string representation:

>>> dumps({-1: 'minus-one', True: "good", False: "bad", None: "ugly"},
...       mapping_mode=MM_COERCE_KEYS_TO_STRINGS)
'{"-1":"minus-one","True":"good","False":"bad","None":"ugly"}'
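The effect on the example above can be approximated in plain Python by converting each key with str() before serializing. The helper is hypothetical, and both the use of str() and the recursion into nested containers are assumptions inferred from the output shown, not taken from the python-rapidjson sources. Note that distinct keys may collide after conversion (for example 1 and '1'):

```python
import json


def coerce_keys_to_strings(value):
    """Hypothetical helper: recursively replace every dict key with str(key)."""
    if isinstance(value, dict):
        return {str(k): coerce_keys_to_strings(v) for k, v in value.items()}
    if isinstance(value, list):
        return [coerce_keys_to_strings(v) for v in value]
    return value


data = {-1: 'minus-one', True: 'good', False: 'bad', None: 'ugly'}
print(json.dumps(coerce_keys_to_strings(data), separators=(',', ':')))
# {"-1":"minus-one","True":"good","False":"bad","None":"ugly"}
```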

Alternatively, by providing a default function you can have finer control over how they should be encoded. For example, the following mimics the default behaviour of the standard library json module:

>>> def mimic_stdlib_json(obj):
...   if isinstance(obj, dict):
...     result = {}
...     for key in obj:
...       if key is True:
...         result['true'] = obj[key]
...       elif key is False:
...         result['false'] = obj[key]
...       elif key is None:
...         result['null'] = obj[key]
...       elif isinstance(key, str):
...         result[key] = obj[key]
...       elif isinstance(key, (int, float)):
...         result[str(key)] = obj[key]
...       else:
...         raise TypeError('keys must be str, int, float, bool or None')
...     return result
...   else:
...     raise ValueError('%r is not JSON serializable' % (obj,))
>>> dumps({True: 'good', False: 'bad', None: 'ugly'},
...       default=mimic_stdlib_json)
'{"true":"good","false":"bad","null":"ugly"}'

Warning

This can lead to infinite recursion if the default function returns a dictionary that still contains non-string keys:

>>> dumps({True: 'vero', False: 'falso'},
...       default=lambda map: map)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded
