3d8141d26a3b01ff948e00956cb0723a89dadf7f - platform/external/expat.git

commit	3d8141d26a3b01ff948e00956cb0723a89dadf7f
author	Snild Dolkow <snild@sony.com>	Mon Nov 20 16:11:24 2023 +0100
committer	Snild Dolkow <snild@sony.com>	Mon Jan 29 17:09:36 2024 +0100
tree	0bec893af7eab5ff0ddefd05d389063b7c302e6d
parent	60b74209899a67d426d208662674b55a5eed918c

Bypass partial token heuristic when nearing full buffer

...instead of only when approaching the maximum buffer size INT_MAX/2+1.

We'd like to give applications a chance to finish parsing a large token before buffer reallocation, in case the reallocation fails. By bypassing the reparse deferral heuristic when getting close to filling the buffer, we give them this chance -- if the whole token is present in the buffer, it will be parsed at that time.

This may come at the cost of some extra reparse attempts. For a token of n bytes, these extra parses cause us to scan over a maximum of 2n bytes (... + n/8 + n/4 + n/2 + n). Therefore, parsing of big tokens remains O(n) with regard to how many bytes we scan in attempts to parse.

The cost in reality is lower than that, since the reparses that happen due to the bypass will affect m_partialTokenBytesBefore, delaying the next ratio-based reparse. Furthermore, only the first token that "breaks through" a buffer ceiling takes that extra reparse attempt; subsequent large tokens will only bypass the heuristic if they manage to hit the new buffer ceiling.

Note that this cost analysis depends on the assumption that Expat grows its buffer by doubling it (or, more generally, grows it exponentially). If this changes, the cost of this bypass may increase. Hopefully, this would be caught by test_big_tokens_take_linear_time or the new test.

The bypass logic assumes that the application uses a consistent fill size. If the app increases its fill size, it may miss the bypass (and the normal heuristic will apply). If the app decreases its fill size, the bypass may be hit multiple times for the same buffer size. The very worst case would be to always fill half of the remaining buffer space, in which case parsing of a large n-byte token becomes O(n log n).

As an added bonus, the new test case should be faster than the old one, since it doesn't have to go all the way to 1GiB to check the behavior.

Finally, this change necessitated a small modification to two existing tests related to reparse deferral. These tests are testing the deferral enabled setting, and assume that reparsing will not happen for any other reason. By pre-growing the buffer, we make sure that this new deferral does not affect those test cases.
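
The 2n bound in the cost analysis can be checked with a small stand-alone model. This is only a sketch and not Expat code: the function name bypass_scan_cost and its parameters are hypothetical, and it assumes that the buffer doubles on every reallocation and that exactly one failed reparse attempt, scanning the whole buffer, happens each time the buffer fills before the token is complete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model (not part of Expat): bytes scanned by the extra,
 * failed reparse attempts while one token of token_len bytes arrives.
 * Assumption: the buffer starts at initial_cap bytes and doubles each
 * time it fills up; every fill triggers one bypass reparse that scans
 * the whole buffer and finds only a partial token. */
size_t bypass_scan_cost(size_t token_len, size_t initial_cap) {
  assert(initial_cap > 0); /* a zero-sized buffer would never grow */

  size_t cap = initial_cap;
  size_t scanned = 0;
  while (cap < token_len) {
    scanned += cap; /* buffer is full but the token is still incomplete */
    cap *= 2;       /* assumed growth policy: doubling */
  }
  /* The final, successful parse is not counted: it would happen
   * regardless of the bypass. */
  return scanned;
}
```

For example, a 5000-byte token with a 1024-byte initial buffer hits ceilings at 1024, 2048 and 4096 bytes, so the model reports 1024 + 2048 + 4096 = 7168 scanned bytes, which stays below 2 * 5000. Because the ceilings form a geometric series whose largest term is smaller than n, their sum is always below 2n under the doubling assumption.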