Add internal URI handling API #19073
base: master
Some first remarks. Did not yet look at everything.
static zend_string *parse_url_uri_to_string(void *uri, uri_recomposition_mode_t recomposition_mode, bool exclude_fragment)
{
    ZEND_UNREACHABLE();
}
Might be better to simply NULL the pointer in the uri_handler_t struct instead.
I got the same comment from @DanielEScherzer in the original PR, and I told him that I would like to avoid making the handlers optional if possible, because this way the existence of the handlers doesn't have to be checked before they are used, which is advantageous for both maintainability and performance.
The parse_url based implementation is special because it's not directly exposed to userland: it's just an internal URI "backend" for BC, and these handlers aren't necessarily needed for now. We could of course expose the to_string handlers later for 3rd party extensions if we want to. Then the code should probably be changed to something else.
A function that triggers undefined behavior when called (this is what ZEND_UNREACHABLE implies for production builds) and not having a function (i.e. dereferencing a NULL pointer when trying to call the function) are functionally the same. In both cases the PHP binary will do something bad (ideally just crash).
Thus it seems to be preferable to clearly indicate that the handler is not available by using NULL rather than pretending there is a handler when calling it is unsafe.
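To make the trade-off concrete, here is a minimal standalone sketch (not the PR's code) of a handler table in which an unsupported operation is marked with NULL and callers go through one checked helper. All names (`demo_handler_t`, `demo_to_string`, the two handler tables) are hypothetical, and the real `uri_handler_t` has different members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a URI handler table. */
typedef struct demo_handler_t {
    const char *name;
    /* NULL means "this backend does not support recomposition". */
    const char *(*to_string)(const void *uri);
} demo_handler_t;

/* Trivial recomposition: in this sketch the "URI" is already a C string. */
static const char *demo_full_to_string(const void *uri)
{
    return (const char *) uri;
}

/* A backend that supports to_string. */
static const demo_handler_t demo_full_handler = { "full", demo_full_to_string };

/* A legacy-style backend: the missing handler is visible in the table
 * itself instead of being hidden behind a stub that must never run. */
static const demo_handler_t demo_legacy_handler = { "legacy", NULL };

/* Single checked entry point: returns NULL when the backend has no
 * to_string handler, so callers never dereference a NULL function pointer. */
static const char *demo_to_string(const demo_handler_t *handler, const void *uri)
{
    assert(handler != NULL);
    if (handler->to_string == NULL) {
        return NULL;
    }
    return handler->to_string(uri);
}
```

The check costs one branch per call, which is the overhead the author wants to avoid; the benefit is that a misuse becomes a defined failure rather than undefined behavior.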
OK, I see what you mean now. I can't comment on which method is preferable, but intentionally passing a NULL value instead of a handler function, while callers of the handlers never expect NULL, also seems wrong. Normally, static analyzers would emit an error in this case (in PHP for sure; I don't know about C), which is why I didn't even think about this solution.
TBH the code which uses ZEND_UNREACHABLE() is indeed unreachable if one uses the internal API: currently, no function is exposed that would make use of the relevant handlers.
static void *parse_url_clone_uri(void *uri)
{
    ZEND_UNREACHABLE();
}
Ditto
ext/uri/php_uri.c (outdated)
if (uri_handler_name == NULL) {
    return uri_handler_by_name("parse_url", sizeof("parse_url") - 1);
}
Defaulting to parse_url in a new API is probably not a good idea. Instead the “legacy” users should just pass "parse_url" explicitly.
Defaulting to parse_url here works because that is indeed the default wherever php_uri_get_handler() is called; the other "backends" can only be used if the config is explicitly passed (not null).
The other reason why I opted for this approach is that it would be inconvenient to create and free a new zend_string when the legacy implementation is needed, and I wanted to avoid adding a known string just for this purpose, or exposing the C string based uri_handler_by_name function instead.
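The two positions in this exchange can be sketched side by side. The following is a standalone illustration under assumptions, not the PR's implementation: `demo_backend_t`, the registry contents (other than the name "parse_url"), and both lookup functions are hypothetical, and plain C strings stand in for zend_string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified backend descriptor. */
typedef struct demo_backend_t {
    const char *name;
} demo_backend_t;

/* Hypothetical registry; only "parse_url" is a name taken from the thread. */
static const demo_backend_t demo_backends[] = {
    { "parse_url" },
    { "example_backend" },
};

/* Length-aware lookup, mirroring the (name, length) call shape quoted above. */
static const demo_backend_t *demo_backend_by_name(const char *name, size_t name_len)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(demo_backends) / sizeof(demo_backends[0]); i++) {
        if (strlen(demo_backends[i].name) == name_len
                && memcmp(demo_backends[i].name, name, name_len) == 0) {
            return &demo_backends[i];
        }
    }
    return NULL;
}

/* Author's variant: a NULL name silently selects the legacy backend. */
static const demo_backend_t *demo_get_backend_defaulting(const char *name)
{
    if (name == NULL) {
        return demo_backend_by_name("parse_url", sizeof("parse_url") - 1);
    }
    return demo_backend_by_name(name, strlen(name));
}

/* Reviewer's variant: legacy callers must name "parse_url" themselves. */
static const demo_backend_t *demo_get_backend_explicit(const char *name)
{
    if (name == NULL) {
        return NULL;
    }
    return demo_backend_by_name(name, strlen(name));
}
```

With the explicit variant, a forgotten name surfaces as a lookup failure instead of quietly routing a new caller to the legacy parser.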
I've looked at this again and I must say that I'm having trouble meaningfully reviewing this. It adds a large amount of code with unclear purpose and confusing (to me) naming.
PHPAPI zend_result php_uri_get_scheme(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
    return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_SCHEME, read_mode, zv);
}

PHPAPI zend_result php_uri_get_username(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
    return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_USERNAME, read_mode, zv);
}

PHPAPI zend_result php_uri_get_password(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
    return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_PASSWORD, read_mode, zv);
}

PHPAPI zend_result php_uri_get_host(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
    return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_HOST, read_mode, zv);
}

PHPAPI zend_result php_uri_get_port(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
    return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_PORT, read_mode, zv);
}

PHPAPI zend_result php_uri_get_path(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
    return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_PATH, read_mode, zv);
}

PHPAPI zend_result php_uri_get_query(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
    return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_QUERY, read_mode, zv);
}

PHPAPI zend_result php_uri_get_fragment(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
    return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_FRAGMENT, read_mode, zv);
}
The addition of these new helpers is not clear to me. It feels like just another layer of indirection by moving the enum into the function name. There's also already uri_property_handler_from_internal_uri(), why doesn't it work here?
These functions come from the time when the property was passed as a zend_string, so having separate methods used to make sense. You are right, these are not really needed anymore, so I'm fine with removing them.
Although the alternative code is considerably longer and a bit more difficult to use:
- zend_result result = php_uri_get_host(internal_uri, URI_COMPONENT_READ_RAW, &host_zv);
+ zend_result result = php_uri_property_handler_from_internal_uri(internal_uri, URI_PROPERTY_NAME_HOST)->read_func(internal_uri, URI_COMPONENT_READ_RAW, &host_zv);
Plus, it makes the handlers directly available for use, which I wanted to avoid for now (because write handlers are not always available).
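A middle ground between per-property wrappers and exposing the handler structs is one enum-keyed getter and setter that keep the handler table private. The sketch below is a standalone illustration under assumptions: every `demo_*` name is hypothetical, strings stand in for zval, and the layout of the real property handlers is not confirmed by this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical property enum; the PR uses URI_PROPERTY_NAME_* constants. */
typedef enum demo_property_name_t {
    DEMO_PROPERTY_SCHEME,
    DEMO_PROPERTY_HOST,
    DEMO_PROPERTY_COUNT
} demo_property_name_t;

typedef struct demo_uri_t {
    const char *scheme;
    const char *host;
} demo_uri_t;

typedef struct demo_property_handler_t {
    const char *(*read_func)(const demo_uri_t *uri);
    int (*write_func)(demo_uri_t *uri, const char *value); /* may be NULL */
} demo_property_handler_t;

static const char *demo_read_scheme(const demo_uri_t *uri) { return uri->scheme; }
static const char *demo_read_host(const demo_uri_t *uri) { return uri->host; }
static int demo_write_host(demo_uri_t *uri, const char *value) { uri->host = value; return 0; }

/* The table stays file-private, so callers cannot reach a NULL write_func. */
static const demo_property_handler_t demo_property_handlers[DEMO_PROPERTY_COUNT] = {
    [DEMO_PROPERTY_SCHEME] = { demo_read_scheme, NULL },
    [DEMO_PROPERTY_HOST]   = { demo_read_host, demo_write_host },
};

/* One generic getter replaces the per-property wrapper functions. */
static const char *demo_get_property(const demo_uri_t *uri, demo_property_name_t name)
{
    assert(name < DEMO_PROPERTY_COUNT);
    return demo_property_handlers[name].read_func(uri);
}

/* The setter reports a missing write handler as -1 instead of crashing. */
static int demo_set_property(demo_uri_t *uri, demo_property_name_t name, const char *value)
{
    assert(name < DEMO_PROPERTY_COUNT);
    if (demo_property_handlers[name].write_func == NULL) {
        return -1;
    }
    return demo_property_handlers[name].write_func(uri, value);
}
```

The call site stays about as short as with the dedicated wrappers, while the property is still selected by the enum rather than by the function name.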
No description provided.