Skip to content

Fixed #36134 -- Add support for ABSENT ON NULL clause for JSONArray. #19097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

john-parton
Copy link
Contributor

@john-parton john-parton commented Jan 23, 2025

Trac ticket number

https://code.djangoproject.com/ticket/36134

Branch description

On all backends except Sqlite and Postgres version 15 and earlier, there is an additional clause to the JSON_ARRAY SQL which controls whether SQL NULL values are mapped to JSON null values or whether they are omitted.

In sqlite and postgres 15 and earlier, the default behavior is to map them, so my initial feature for JsonArray just had that as the only behavior without any ability to customize on the richer backends.

This adds a parameter to JsonArray, defaulting to the old behavior.

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch 3 times, most recently from 5dcc39c to 5f3ffa9 Compare January 23, 2025 23:30
@john-parton
Copy link
Contributor Author

A few notes real quick:

I chose to use an enum rather than a simple boolean keyword argument to mirror the SQL

I put the enum on the JsonArray class, because I didn't think it had any use outside of JSON_ARRAY. I actually am seeing that's not true. It looks like JsonObject has a similar clause: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/JSON_OBJECT.html I could expand this pull request to add the functionality to JsonObject as well. I don't think it would be particularly difficult.

Docs were a pretty rough copy/paste/tweak job.

I added a new DB feature. I originally just did some basic checks inside the class, but it made it basically impossible to test cleanly.

@john-parton john-parton changed the title Fixed #36134 -- Add support for ABSENT ON NULL clause for JSONField. Fixed #36134 -- Add support for ABSENT ON NULL clause for JSONArray. Jan 23, 2025
@john-parton
Copy link
Contributor Author

It looks like the biggest issue with adding an on_null clause to JsonObject is that it already accepts variadic **kwargs, so in the case that you want to make a json object that has an on_null key, you wouldn't be able to do it.

The closest parallel I can think of in the existing database is with get_or_create where if you want to do a lookup on a field called defaults, you must specify defaults__exact

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch 2 times, most recently from 11b044e to af250d9 Compare January 23, 2025 23:47
@john-parton
Copy link
Contributor Author

john-parton commented Jan 23, 2025

As an aside, my version of JSONObject I made long ago didn't take **kwargs, but rather you were supposed to pass everything as arguments

So instead of

JSONObject(
    key="value"
)

You would do

JSONObject(
    Value("key"), F("value")
)

Which is obviously more verbose, but it let you create an object where one of the keys was actually dynamic from another column.

It would also resolve the ambiguity of providing an absent_on_null flag, because it didn't take any keyword arguments.

I'm not exactly sure how that would resolve the issue with current django, though.

Another thought would be to have an alternate constructor as a classmethod

JSONObject.absent_on_null(**kwargs)

Or having to manually set a prop.

obj = JSONObject(...)
obj.absent_on_null = True

But the prop is error-prone and requires you to create a temporary variable.

I guess another solution is to just make it so you can't build objects with absent_on_null as a key and passing anything other than a bool raise an error, and then just document it. "Sorry, why would you even do this?"

@charettes
Copy link
Member

charettes commented Jan 24, 2025

I'd focus this PR on JSONArray and see what can be done for JSONObject in a distinct ticket/PR?

A potential solution could be to require that fields are provided as a list[str, Resolvable] when nulls exclusion is requested so allowed signatures would be of the form

@overload
def __init__(self, fields: list[str, Resolvable], *, exclude_nulls: bool = False): ...

@overload
def __init__(self, **fields: Resolvable): ...

def __init__(self, *args, **kwargs):
    if len(args) > 1:
        raise TypeError
    elif args:
        fields = dict(args)
        exclude_nulls = kwargs.pop("exclude_nulls", False)
        if kwargs:
            raise TypeError
    else:
        fields = kwargs

It also seems like a less useful feature for JSONObject than it is for JSONArray or even JSONArrayAgg IMO.

@john-parton
Copy link
Contributor Author

I'd focus this PR on JSONArray and see what can be done for JSONObject in a distinct ticket/PR?

A potential solution could be to require that fields are provided as a list[str, Resolvable] when nulls exclusion is requested so allowed signatures would be of the form

@overload
def __init__(self, fields: list[str, Resolvable], *, exclude_nulls: bool = False): ...

@overload
def __init__(self, **fields: Resolvable): ...

def __init__(self, *args, **kwargs):
    if len(args) > 1:
        raise TypeError
    elif args:
        fields = dict(args)
        exclude_nulls = kwargs.pop("exclude_nulls", False)
        if kwargs:
            raise TypeError
    else:
        fields = kwargs

It also seems like a less useful feature for JSONObject than it is for JSONArray or even JSONArrayAgg IMO.

Ack, I wish the type annotations were just in django. That's so clear.

It's funny, I was hacking on pretty much exactly this solution , minus type annotations.


return self.as_sql(
compiler,
connection,
template=(
f"%(function)s(%(expressions)s {null_on_null} RETURNING {returning})"
f"%(function)s(%(expressions)s {on_null_clause} RETURNING {returning})"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a VERY minor quibble, but if you don't have any source expressions, the generate SQL will end up having a triple space in it.

If the spaces were moved to lines 33 and 35, and then removed from the template here, it would resolve that, at the expense of making it slightly awkward right here.

Is that even something worth discussing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I were hand-writing the SQL, I would write it like JSON_OBJECT(RETURNING JSONB), but the ORM will emit JSON_OBJECT( RETURNING JSONB)

@john-parton
Copy link
Contributor Author

I removed array from the DB feature and added a comment that it's supposed to indicate support for the clause in both array and object contexts.

@john-parton
Copy link
Contributor Author

john-parton commented Jan 24, 2025

I think that the ability to provide key-value pairs to JSONObject explicitly is a very esoteric thing. I only did it in my application because I just generally make the functions look as close to the SQL as possible, ergonomics be damned.

As an example esoteric usage, if you're trying to get the database to omit valid objects for the Elasticsearch DSL, you would almost certainly need to be able to build objects with non-static keys.

For example https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html

The key "user.id" refers to a field in elastic.

I think if you proposed this to me as a reasonable use of Django, I would think you're crazy, but that's one use off the top of my head.

I can create a feature issue if desired.

@john-parton
Copy link
Contributor Author

Looks like MySQL tests are failing. I think it's because it doesn't use the "as_native" definition where the code is.

I'll take a closer look.

@john-parton
Copy link
Contributor Author

Right, because MySQL doesn't support it. Duh. Toggled the value on the backend. Need to update the docs.

I am wondering if a workaround for the non-native backends would be worth it. I know a lot of bending over backwards is done for some functions and backends.

@john-parton
Copy link
Contributor Author

john-parton commented Jan 24, 2025

Alright, thought about it for a bit and decided the workaround wasn't as nasty as I thought it might be, at least for MySQL and Postgres 15.

I have absolutely no idea how to do it on SQLite. I believe it would require a subquery and an aggregate, or something really, really weird. Probably not safe to do it automatically.

If we go with this workaround version, I think we can just get rid of the supports_json_absent_on_null field completely, and just move it to be a check for SQLite.

@john-parton
Copy link
Contributor Author

Alright, last suite of changes:

  • Removed DB feature, because all the database backends except SQLite now have similar behavior
  • Reworked tests for If/Unless to check for SQLite, instead of DB feature
  • Updated docs to remove note that Postgres 15 doesn't work

@john-parton
Copy link
Contributor Author

john-parton commented Jan 24, 2025

Polishing:

  • There was some imports inside a method to avoid a circular import, but they were a 'shortcut' import from a parent module. Explicitly importing from where classes were defined removed the need for that.
  • There was a small duplication of a code in as_postgres and as_mysql to handle the case where there are no expressions. I moved it into the workaround method to keep it DRY, but it does result in some complicated super() invocations that might be difficult to reckon with. I believe there needs to be more test coverage for when expressions == [] and absent_on_null == True

@john-parton
Copy link
Contributor Author

Few extra tests added, tests reorganized a bit:

  • Test absent_on_null with an empty array
  • Test absent_on_null with two elements, one null and one non-null
  • Test absent_on_null with one element, both the case of a null element and non-null element

I think that's should be sufficient to exercise all of the branches in the workaround.

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch from c248d95 to 27d717d Compare January 24, 2025 17:32
@john-parton
Copy link
Contributor Author

Additional changes:

  • Restored the feature flag that an earlier version had. 3rd party backends will need to reach out if there's a backend that only has partial support for ABSENT ON NULL across the various functions.
  • Merged the two test classes again. I realized the class I created didn't have any setup, so I just squashed them back together
  • applied isort. I'll try to be more fastidious with this. I'm just used to my commit hooks and ruff
  • Squashed/merged

@john-parton
Copy link
Contributor Author

Note: I am still missing a release note. What version of Django would this be targeting? Should be able to make it into 5.2, I would expect.

@charettes
Copy link
Member

@john-parton 5.2 is feature frozen at this point so that would be 6.0.

@john-parton
Copy link
Contributor Author

Added a release note. Tweaked the docs to have a versionadded note.

I'm not the best at getting docs formatted exactly right, so some help there would be appreciated.

@john-parton
Copy link
Contributor Author

john-parton commented Jan 24, 2025

I should probably stop iterating, but I'm a bit obsessed at the moment, apparently.

The workaround that I had was pretty nasty in that mysql and postgres had to pass specific arguments to the method in order so that it could do the right concatenation operation.

I just pulled out the concatenation logic into its own function, and then there's actually no database-specific logic in the main workaround now.

I prepended an underscore to the function to indicate that it's private. If it's used incorrectly, it could cause pain. In particular, there's no attempt made to make sure that the expressions will actually resolve to arrays, and if you pass objects, the behavior is backend dependent.

Pulling the concatenation into its own function does let me write tests directly against that, which I have done.

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch 5 times, most recently from 2f94da9 to 340d4e6 Compare May 28, 2025 07:00
@john-parton
Copy link
Contributor Author

Made the requested changes, rebased, force-pushed.

I also updated the "Admonition" section of the docs to properly reflect that usage of absent_on_null=True is not supported on MySQL, MariaDB and PostgreSQL versions prior to 16. Let me know if any changes are required there.

It seems like a pretty bad hack to use json loads/dumps in the SQLite function. Bad code smell. Let me know if there's a more straightforward or better way of handling that.

I ran tests locally against SQLite and Postgres.

Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the adjustments @john-parton, glad you figured out the SQLite function registration interface.

Left some potential pointers in how the loads/dumps roundtrip could be avoided on SQLite.

Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for given the custom SQLite function a try. Kind of awesome how SQLite provides an interface for that right?

I left a suggestion for an additional test that ensures the null eliding is performed at the database level and not faked at the Python level when retrieving the tuples.

Otherwise the patch LGTM 🎉 Once all squashed and rebase use the past tense in the commit message (Add -> Added) and mark the patch as not needing improvements anymore.

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch from 713cc5c to 661c779 Compare May 30, 2025 03:31
@john-parton
Copy link
Contributor Author

john-parton commented May 30, 2025

Rebased, force-pushed. Summary of changes:

  • Changed commit message to be entirely in the past tense.
  • Replaced the technique of defining a json_array_absent_on_null SQLite function with a json_strip_nulls function which wraps the standard json_array function. Implemented only the case where NULL or a JSON Array is passed to the function.
  • Changed my annotation. I annotated by new tests with json_array. Changed it to arr to match existing tests.
  • Added a new test with nested arrays.

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch from 661c779 to da32cb9 Compare May 30, 2025 03:34
@john-parton
Copy link
Contributor Author

Does anyone know why tests failed? I'm unable to understand the error message.

Can we retrigger tests?

@charettes
Copy link
Member

charettes commented May 30, 2025

It's issues with the CI infra, force-pushing an amended commit should do assuming the problem was transient.

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch from da32cb9 to 5f8346a Compare May 31, 2025 23:38
@john-parton
Copy link
Contributor Author

Amended the commit and force-pushed. Checks are going now.

@john-parton
Copy link
Contributor Author

john-parton commented May 31, 2025

We're getting django.db.utils.OperationalError: user-defined function raised exception on Windows SQLite, but not Linux. That's interesting. Maybe it has to do with the internal encoding of json objects? Perhaps they are bytestrings on windows instead of normal strings?

Side-note: I really wish an exception here could provide a traceback or some more context. I don't have a Windows machine, so this makes it really difficult for me to continue development.

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch from 5f8346a to 735746e Compare June 1, 2025 00:16
@john-parton
Copy link
Contributor Author

I looked again, and there's not a dedicated Linux/SQLite entry in the test matrix. It was just SQLite failing due to a mistake on my part. I've fixed it, ran the test suite locally again to 100% be sure we're good to go, and pushed again.

Comment on lines 893 to 900
.. admonition:: MySQL and MariaDB

MySQL and MariaDB don't support ``True`` for the ``absent_on_null`` parameter.

.. admonition:: PostgreSQL

``True`` for the ``absent_on_null`` parameter requires PostgreSQL version 16
or later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. admonition:: MySQL and MariaDB
MySQL and MariaDB don't support ``True`` for the ``absent_on_null`` parameter.
.. admonition:: PostgreSQL
``True`` for the ``absent_on_null`` parameter requires PostgreSQL version 16
or later.
.. admonition:: MySQL, MariaDB and PostgreSQL
The ``absent_on_null`` parameter is ignored with MySQL, MariaDB and
PostgreSQL < 16 as they do not support it.
.. versionchanged:: 6.0
The ``absent_on_null`` parameter was added.

Then remove the .. versionadded:: 6.0 note
We also need a release note in the 6.0 notes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your documentation here doesn't actually reflect the behavior. It's not actually ignored, it raises a NotSupportedError. Is there a specific way I should phrase that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    MySQL, MariaDB and PostgreSQL < 16 do not support the ``absent_on_null``
    parameter.

perhaps 🤔

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch from 410e1db to 7063b40 Compare July 15, 2025 22:48
@john-parton
Copy link
Contributor Author

Please be patient while I un-break my fork. I foolishly decided to try to do something on my phone.

@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch 2 times, most recently from 3870596 to fbf02c0 Compare July 15, 2025 22:51
@john-parton john-parton force-pushed the feature/json-array-on-null-clause branch from fbf02c0 to 21b8807 Compare July 15, 2025 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants