commit | b0cb880cf75a9a9a07d3a676b68dce0eee66f7a6 |
---|---|
author | Tyler Pirtle <rtp@google.com> | Thu Aug 12 11:12:40 2021 -0700 |
committer | Bill Neubauer <wcn@google.com> | Wed Mar 16 15:32:32 2022 -0700 |
tree | b1f9efb67e9f2c0a436806e58a20fd03697c4f82 |
parent | f39a63e84c1fdad417cab8808c76cfbbd48ec410 |
use a map instead of a slice for the fuseRange filter.

It looks like it wants to do some form of random access based on SrcStart / SrcEnd, so this might be a better fit.

<<<
b/196234339 was the problem

There's an off-by-one that I don't entirely understand so I'm trying to split the difference here. The filter variable is currently a slice and we're seeing an off-by-one in production with the amd_vulkan/LICENSE:

```
panic: runtime error: index out of range [3] with length 3
goroutine 1646 [running]:
```

Given that SrcStart / SrcEnd appear to be positions in the text file and the `i` variable seems to move between that range, it seemed natural to replace filter with a map of indexes instead of a slice. This way we can preserve the somewhat random-access pattern that appears to be happening but avoid any range errors.
>>>

PiperOrigin-RevId: 390415408
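The failure mode described in the commit message can be illustrated with a minimal Go sketch. The names below (`markWithMap`, `lookupSlice`, `sliceFilter`, `mapFilter`) are hypothetical and are not taken from the classifier's source. The sketch only shows why a map keyed by position tolerates a lookup that would fall one past the end of a slice.

```go
package main

import "fmt"

// markWithMap records filtered positions in a map keyed by position.
// Looking up a key that was never set yields false instead of panicking.
func markWithMap(positions []int) map[int]bool {
	filter := make(map[int]bool, len(positions))
	for _, p := range positions {
		filter[p] = true
	}
	return filter
}

// lookupSlice returns filter[i] and whether i was within bounds.
// Indexing the slice directly with i == len(filter) would instead panic
// with "index out of range".
func lookupSlice(filter []bool, i int) (value, ok bool) {
	if i < 0 || i >= len(filter) {
		return false, false
	}
	return filter[i], true
}

func main() {
	sliceFilter := []bool{true, false, true} // length 3, valid indexes 0..2
	mapFilter := markWithMap([]int{0, 2})

	// Index 3 is one past the end of the slice, as in the reported panic.
	_, ok := lookupSlice(sliceFilter, 3)
	fmt.Println("slice index 3 in range:", ok)
	fmt.Println("map lookup at 3:", mapFilter[3])
	fmt.Println("map lookup at 2:", mapFilter[2])
}
```

The trade-off is that a map hides the out-of-range access rather than explaining it, which matches the commit message's own admission that the off-by-one is not fully understood.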
The license classifier is a library and set of tools that can analyze text to determine what type of license it contains. It searches for license texts in a file and compares them to an archive of known licenses. These files could be, e.g., `LICENSE` files containing one or more licenses, or source code files with the license text in a comment.
A “confidence level” is associated with each result, indicating how close the match was. A confidence level of `1.0` indicates an exact match, while a confidence level of `0.0` indicates that no known license matched the text.
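As a rough illustration of how a caller might act on confidence levels, the following Go sketch keeps only results at or above a chosen threshold. The `Match` type, the `filterByConfidence` helper, and the `0.8` threshold are all assumptions made for this example and are not part of the library's API.

```go
package main

import "fmt"

// Match is a hypothetical result record used only for this sketch.
type Match struct {
	Name       string
	Confidence float64 // 0.0 (no match) through 1.0 (exact match)
}

// filterByConfidence keeps the matches whose confidence is at least threshold.
func filterByConfidence(matches []Match, threshold float64) []Match {
	var kept []Match
	for _, m := range matches {
		if m.Confidence >= threshold {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	matches := []Match{
		{Name: "MIT", Confidence: 1.0},
		{Name: "GPL-2.0", Confidence: 0.82},
		{Name: "BSD-3-Clause", Confidence: 0.4},
	}
	// The 0.8 threshold is arbitrary and chosen only for illustration.
	for _, m := range filterByConfidence(matches, 0.8) {
		fmt.Printf("%s (confidence: %v)\n", m.Name, m.Confidence)
	}
}
```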
Adding a new license is straightforward:

1. Create a file in `licenses/`. Where applicable, append “`.header`” to its name. See `licenses/README.md` for more details.
2. Add the license name to the list in `license_type.go`.
3. Regenerate the `licenses.db` file by running the license serializer:

       $ license_serializer -output licenseclassifier/licenses

4. Create and run appropriate tests to verify that the license is indeed present.
`identify_license` is a command-line tool that can identify the license(s) within a file.
$ identify_license LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)
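If the output above needs to be consumed by another program, one option is to parse each result line. The Go sketch below assumes the line format is exactly as shown in the sample. The `Result` type, `linePattern`, and `parseLine` are hypothetical names introduced here and are not provided by the tool.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Result holds the fields of one output line. Hypothetical type for this sketch.
type Result struct {
	File       string
	License    string
	Confidence float64
	Offset     int
	Extent     int
}

// linePattern matches lines such as:
//   LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)
var linePattern = regexp.MustCompile(
	`^(.+?): (\S+) \(confidence: ([0-9.]+), offset: (\d+), extent: (\d+)\)$`)

// parseLine converts one result line into a Result.
func parseLine(line string) (Result, error) {
	m := linePattern.FindStringSubmatch(line)
	if m == nil {
		return Result{}, fmt.Errorf("unrecognized line: %q", line)
	}
	conf, err := strconv.ParseFloat(m[3], 64)
	if err != nil {
		return Result{}, err
	}
	offset, err := strconv.Atoi(m[4])
	if err != nil {
		return Result{}, err
	}
	extent, err := strconv.Atoi(m[5])
	if err != nil {
		return Result{}, err
	}
	return Result{File: m[1], License: m[2], Confidence: conf, Offset: offset, Extent: extent}, nil
}

func main() {
	r, err := parseLine("LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```

A regular expression is fragile if the tool's output format changes, so this is best treated as a convenience for scripts rather than a stable interface.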
The `license_serializer` tool regenerates the `licenses.db` archive. The archive contains preprocessed license texts for quicker comparisons against unknown texts.
$ license_serializer -output licenseclassifier/licenses
This is not an official Google product (experimental or otherwise); it is just code that happens to be owned by Google.