set_parse_action handler receives seemingly erroneous loc argument #557

@bernd-wechner

Essentially, I am using set_parse_action() to determine the location of parse elements within the source string (if there's another way, I'm all ears). A parse action is defined in the documentation as having the following signature:

Each parse action fn is a callable method with 0-3 arguments, called as fn(s, loc, toks), fn(loc, toks), fn(toks), or just fn(), where:

    s = the original string being parsed
    loc = the location of the matching substring
    toks = a list of the matched tokens, packaged as a ParseResults object
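To make the call forms concrete, here is a minimal sketch using a plain Word expression on its own. The names record_full, record_toks and calls are made up for illustration, and this is not the grammar from the attached script.

```python
import pyparsing as pp

calls = []  # collects what each parse action was given

def record_full(s, loc, toks):
    # Three-argument form: source string, location, matched tokens.
    calls.append(("full", loc, toks[0], s[loc:loc + len(toks[0])]))

def record_toks(toks):
    # One-argument form: only the matched tokens are passed.
    calls.append(("toks", toks[0]))

word = pp.Word(pp.alphas)
# Several parse actions can be attached; they run in the order given.
word.set_parse_action(record_full, record_toks)

result = word.parse_string("hello world")
print(result.as_list())
print(calls)
```

Both actions return None, so the matched tokens are left unchanged.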

And so I write me one to test this:

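def test_value_location(s, loc, toks):
    # Parse action: print the arguments, then the value obtained three ways.
    pfx = "\t\t"
    print(f"{pfx}{s=}")
    print(f"{pfx}{loc=}")
    print(f"{pfx}{toks=}")
    value1 = ''.join(toks)               # from toks as a list
    value2 = toks.as_dict()['value']     # from toks as a dict (results name)
    value3 = s[loc:loc+len(value1)]      # from the source string at loc
    print(f"{pfx}{value1=}")
    print(f"{pfx}{value2=}")
    print(f"{pfx}{value3=}")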

and implement a parser. Only I find that the loc that is reported is inconsistent and, in one scenario, IMHO broken (either a bug, or surprisingly unintuitive and hard-to-comprehend behaviour that warrants clear documentation, which it may even have and I've not found).

The issue seems to depend on whether the value follows a whitespace-type element, be that Empty(), White() or '': in one case loc is correct and in the other it is out by 1!

To confirm this, I wrote a test script (attached as: pyparsing-bug.py.zip)

In synopsis, it defines two test lines to parse and 4 scenarios to test. The first line is of the form "setting = value", the second of the form "setting value", and the scenarios define the assignment operator as '=', Empty(), White() or ''. In each scenario, a parse action handler prints its arguments and three versions of the value parsed (that handler is above). The first value is from toks as a list, the second from toks as a dict, the third from the source string at loc.

I expect all the values to be identical in all successfully parsed scenarios, and I expect the first line to parse only in the first scenario and the second line in the remaining three scenarios. That is exactly what I find, except that the third value (from the source string at loc) demonstrates an inconsistency. The value at loc seems right in the empty scenarios and is out by one in the non-empty scenario.

Surely the whole point of loc is to point to the value parsed, regardless?
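A possible workaround, as a minimal sketch: assuming the only discrepancy is leading whitespace between loc and the value, the handler could advance past it before slicing. The helpers value_start and slice_value are hypothetical names, not part of pyparsing, and the two sample lines and loc values are the ones reported in this issue.

```python
def value_start(s: str, loc: int) -> int:
    """Return the index of the first non-whitespace character at or after loc."""
    while loc < len(s) and s[loc].isspace():
        loc += 1
    return loc

def slice_value(s: str, loc: int, value: str) -> str:
    """Slice len(value) characters from s, starting after any whitespace at loc."""
    start = value_start(s, loc)
    return s[start:start + len(value)]

line0 = "setting_1 = value-1 # comment 1\n"
line1 = "setting_2 value-2 # comment 2\n"

# loc values as reported by the parse action for each line
fixed0 = slice_value(line0, 11, "value-1")
fixed1 = slice_value(line1, 10, "value-2")
print(repr(fixed0), repr(fixed1))
```

This would not be appropriate for a grammar in which whitespace is significant or in which a value may itself begin with whitespace.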

The full output, in which you can see that value3 is wrong in Line 0, Scenario 0:

Line 0: setting_1 = value-1 # comment 1
	Scenario 0: '='
		s='setting_1 = value-1 # comment 1\n'
		loc=11
		toks=ParseResults(['value-1'], {'value': 'value-1'})
		value1='value-1'
		value2='value-1'
		value3=' value-'
		Result:
			['setting_1', '=', 'value-1', '#', ' comment 1']
			- name: 'setting_1'
			- trailing_comment: 			['#', ' comment 1']
			- value: 'value-1'

	Scenario 1: Empty
	Failed to parse: Expected {quoted string using single or double quotes | W:(-.0-9A-Za-z)}, found '='  (at char 10), (line:1, col:11)

	Scenario 2: <SP><TAB><CR><LF>
	Failed to parse: Expected {quoted string using single or double quotes | W:(-.0-9A-Za-z)}, found '='  (at char 10), (line:1, col:11)

	Scenario 3: ''
	Failed to parse: Expected {quoted string using single or double quotes | W:(-.0-9A-Za-z)}, found '='  (at char 10), (line:1, col:11)

Line 1: setting_2 value-2 # comment 2
	Scenario 0: '='
	Failed to parse: Expected '=', found 'value'  (at char 10), (line:1, col:11)

	Scenario 1: Empty
		s='setting_2 value-2 # comment 2\n'
		loc=10
		toks=ParseResults(['value-2'], {'value': 'value-2'})
		value1='value-2'
		value2='value-2'
		value3='value-2'
		Result:
			['setting_2', 'value-2', '#', ' comment 2']
			- name: 'setting_2'
			- trailing_comment: 			['#', ' comment 2']
			- value: 'value-2'

	Scenario 2: <SP><TAB><CR><LF>
		s='setting_2 value-2 # comment 2\n'
		loc=10
		toks=ParseResults(['value-2'], {'value': 'value-2'})
		value1='value-2'
		value2='value-2'
		value3='value-2'
		Result:
			['setting_2', ' ', 'value-2', '#', ' comment 2']
			- name: 'setting_2'
			- trailing_comment: 			['#', ' comment 2']
			- value: 'value-2'

	Scenario 3: ''
		s='setting_2 value-2 # comment 2\n'
		loc=10
		toks=ParseResults(['value-2'], {'value': 'value-2'})
		value1='value-2'
		value2='value-2'
		value3='value-2'
		Result:
			['setting_2', 'value-2', '#', ' comment 2']
			- name: 'setting_2'
			- trailing_comment: 			['#', ' comment 2']
			- value: 'value-2'
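The offsets in the output above can be checked by hand without pyparsing. The sketch below only re-does the index arithmetic on the two source lines and the loc values reported; the names cases and offsets are illustrative.

```python
# (source line, reported loc, parsed value), copied from the output above
cases = [
    ("setting_1 = value-1 # comment 1\n", 11, "value-1"),  # Line 0, scenario '='
    ("setting_2 value-2 # comment 2\n", 10, "value-2"),    # Line 1, Empty / White / ''
]

offsets = []
for s, loc, value in cases:
    start = s.index(value)        # where the value really begins in the line
    offsets.append(start - loc)   # 0 means loc points exactly at the value
    print(f"loc={loc} start={start} s[loc]={s[loc]!r} "
          f"slice={s[loc:loc + len(value)]!r}")
```

For the '=' scenario, the character at loc is the space that follows '=', which is why the slice comes out as ' value-'.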
