loads() function

rapidjson.loads(string, *, object_hook=None, number_mode=None, datetime_mode=None, uuid_mode=None, parse_mode=None, allow_nan=True)

Decode the given JSON formatted value into Python object.

Parameters:
  • string – The JSON string to parse, either a Unicode str instance or a bytes or a bytearray instance containing an UTF-8 encoded value

  • object_hook (callable) – an optional function that will be called with the result of any object literal decoded (a dict) and should return the value to use instead of the dict

  • number_mode (int) – enable particular behaviors in handling numbers

  • datetime_mode (int) – how should datetime and date instances be handled

  • uuid_mode (int) – how should UUID instances be handled

  • parse_mode (int) – whether the parser should allow non-standard JSON extensions

  • allow_nan (bool) – compatibility flag equivalent to number_mode=NM_NAN

Returns:

An equivalent Python object.

Raises:
  • ValueError – if an invalid argument is given

  • JSONDecodeError – if string is not a valid JSON value

object_hook

object_hook may be used to inject a custom deserializer that can replace any dict instance found in the JSON structure with a derived object instance:

>>> class Point(object):
...   def __init__(self, x, y):
...     self.x = x
...     self.y = y
...   def __repr__(self):
...     return 'Point(%s, %s)' % (self.x, self.y)
...
>>> def point_dejsonifier(d):
...   if 'x' in d and 'y' in d:
...     return Point(d['x'], d['y'])
...   else:
...     return d
...
>>> loads('{"x":1,"y":2}', object_hook=point_dejsonifier)
Point(1, 2)

number_mode

The number_mode argument selects different behaviors in handling numeric values.

By default non-numbers (nan, inf, -inf) are recognized, because NM_NAN is on by default:

>>> loads('[NaN, Infinity]')
[nan, inf]
>>> loads('[NaN, Infinity]', number_mode=NM_NAN)
[nan, inf]

Explicitly setting number_mode or using the compatibility option allow_nan you can avoid that and obtain a ValueError exception instead:

>>> loads('[NaN, Infinity]', number_mode=NM_NATIVE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
rapidjson.JSONDecodeError: … Out of range float values are not JSON compliant
>>> loads('[NaN, Infinity]', allow_nan=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
rapidjson.JSONDecodeError: … Out of range float values are not JSON compliant

Normally all floating point literals present in the JSON structure will be loaded as Python float instances, with NM_DECIMAL they will be returned as Decimal instances instead:

>>> loads('1.2345')
1.2345
>>> loads('1.2345', number_mode=NM_DECIMAL)
Decimal('1.2345')

When you can be sure that all the numeric values are constrained within the architecture’s hardware limits you can get a sensible speed gain with the NM_NATIVE flag. While this is quite faster, integer literals that do not fit into the underlying C library long long limits will be converted (truncated) to double numbers:

>>> loads('123456789012345678901234567890')
123456789012345678901234567890
>>> loads('123456789012345678901234567890', number_mode=NM_NATIVE)
1.2345678901234566e+29

These flags can be combined together:

>>> loads('[-1, NaN, 3.1415926535897932384626433832795028841971]',
...       number_mode=NM_DECIMAL | NM_NAN)
[-1, Decimal('NaN'), Decimal('3.1415926535897932384626433832795028841971')]

with the exception of NM_NATIVE and NM_DECIMAL, that does not make sense since there’s little point in creating Decimal instances out of possibly truncated float literals:

datetime_mode

With datetime_mode you can enable recognition of string literals containing an ISO 8601 representation as either date, datetime or time instances:

>>> loads('"2016-01-02T01:02:03+01:00"')
'2016-01-02T01:02:03+01:00'
>>> loads('"2016-01-02T01:02:03+01:00"', datetime_mode=DM_ISO8601)
datetime.datetime(2016, 1, 2, 1, 2, 3, tzinfo=...delta(...3600)))
>>> loads('"2016-01-02T01:02:03-01:00"', datetime_mode=DM_ISO8601)
datetime.datetime(2016, 1, 2, 1, 2, 3, tzinfo=...delta(...82800)))
>>> loads('"2016-01-02"', datetime_mode=DM_ISO8601)
datetime.date(2016, 1, 2)
>>> loads('"01:02:03+01:00"', datetime_mode=DM_ISO8601)
datetime.time(1, 2, 3, tzinfo=...delta(...3600)))

It can be combined with DM_SHIFT_TO_UTC to always obtain values in the UTC timezone:

>>> mode = DM_ISO8601 | DM_SHIFT_TO_UTC
>>> loads('"2016-01-02T01:02:03+01:00"', datetime_mode=mode)
datetime.datetime(2016, 1, 2, 0, 2, 3, tzinfo=...utc)

Note

This option is somewhat limited when the value is a non-naïve time literal because negative values cannot be represented by the underlying Python type, so it cannot adapt such values reliably:

>>> mode = DM_ISO8601 | DM_SHIFT_TO_UTC
>>> loads('"00:01:02+00:00"', datetime_mode=mode)
datetime.time(0, 1, 2, tzinfo=...utc)
>>> loads('"00:01:02+01:00"', datetime_mode=mode)
Traceback (most recent call last):
  ...
ValueError: ... Time literal cannot be shifted to UTC: 00:01:02+01:00

If you combine it with DM_NAIVE_IS_UTC then all values without a timezone will be assumed to be relative to UTC:

>>> mode = DM_ISO8601 | DM_NAIVE_IS_UTC
>>> loads('"2016-01-02T01:02:03"', datetime_mode=mode)
datetime.datetime(2016, 1, 2, 1, 2, 3, tzinfo=...utc)
>>> loads('"2016-01-02T01:02:03+01:00"', datetime_mode=mode)
datetime.datetime(2016, 1, 2, 1, 2, 3, tzinfo=...delta(...3600)))
>>> loads('"01:02:03"', datetime_mode=mode)
datetime.time(1, 2, 3, tzinfo=...utc)

Yet another combination is with DM_IGNORE_TZ to ignore the timezone and obtain naïve values:

>>> mode = DM_ISO8601 | DM_IGNORE_TZ
>>> loads('"2016-01-02T01:02:03+01:00"', datetime_mode=mode)
datetime.datetime(2016, 1, 2, 1, 2, 3)
>>> loads('"01:02:03+01:00"', datetime_mode=mode)
datetime.time(1, 2, 3)

The DM_UNIX_TIME cannot be used here, because there isn’t a reasonable heuristic to disambiguate between plain numbers and timestamps:

>>> loads('[1,2,3]', datetime_mode=DM_UNIX_TIME)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid datetime_mode, can deserialize only from ISO8601

uuid_mode

With uuid_mode you can enable recognition of string literals containing two different representations of UUID values:

>>> loads('"aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"')
'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'
>>> loads('"aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"',
...       uuid_mode=UM_CANONICAL)
UUID('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa')
>>> loads('"aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"',
...       uuid_mode=UM_HEX)
UUID('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa')
>>> loads('"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"',
...       uuid_mode=UM_CANONICAL)
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> loads('"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"',
...       uuid_mode=UM_HEX)
UUID('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa')

parse_mode

With parse_mode you can tell the parser to be relaxed, allowing either C++/JavaScript like comments (PM_COMMENTS):

>>> loads('"foo" // one line of explanation')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
rapidjson.JSONDecodeError: Parse error at offset 6: The document root must not be followed by other values.
>>> loads('"bar" /* detailed explanation */')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
rapidjson.JSONDecodeError: Parse error at offset 6: The document root must not be followed by other values.
>>> loads('"foo" // one line of explanation', parse_mode=PM_COMMENTS)
'foo'
>>> loads('"bar" /* detailed explanation */', parse_mode=PM_COMMENTS)
'bar'

or trailing commas at the end of arrays and objects (PM_TRAILING_COMMAS):

>>> loads('[1,]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
rapidjson.JSONDecodeError: Parse error at offset 3: Invalid value.
>>> loads('[1,]', parse_mode=PM_TRAILING_COMMAS)
[1]
>>> loads('{"one": 1,}', parse_mode=PM_TRAILING_COMMAS)
{'one': 1}

or both:

>>> loads('[1, /* 2, */ 3,]')
Traceback (most recent call last):
  ...
rapidjson.JSONDecodeError: Parse error at offset 4: Invalid value.
>>> loads('[1, /* 2, */ 3,]', parse_mode=PM_COMMENTS | PM_TRAILING_COMMAS)
[1, 3]