Commit 2c377b3

Move up key phrase filter to improve results
This will prevent false negatives in the following scenario:

1. There are multiple matches, with a smaller match that is contained in a larger match.
2. The smaller match is filtered out because it is contained in the larger match.
3. The larger match is discarded because it is missing key phrases.
4. There is no match left: we now have a false negative.

As a practical example:

1. The query contains `licensed under the AGPL 3.0`.
2. rule1 contains `AGPL 3.0`.
3. rule2 contains `licensed under the {{AGPL 3.0 license}}`.
4. The query initially matches both rules.
5. The match to rule1 may be discarded as contained in the rule2 match.
6. The match to rule2 may be discarded as missing a key phrase.

By moving the key phrase filter up, before the containment filter, the larger match may be filtered out first, giving the smaller matches a chance to stay.

Signed-off-by: Mike Rombout <mike.rombout@elastisys.com>
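The ordering problem described in this commit message can be reproduced with a toy model. The sketch below is illustrative only: `Match`, `filter_contained` and `filter_key_phrases` are hypothetical stand-ins written for this example, not the functions in `src/licensedcode/match.py`, and spans are plain positions of whitespace-separated query words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    # Hypothetical, simplified stand-in for a license match.
    rule: str
    start: int             # first matched query word position (inclusive)
    end: int               # last matched query word position (inclusive)
    has_key_phrases: bool  # True if no {{key phrase}} of the rule is missing


def filter_contained(matches):
    """Drop any match whose span lies fully inside a strictly longer match."""
    kept = []
    for m in matches:
        contained = any(
            o is not m
            and o.start <= m.start
            and m.end <= o.end
            and (o.end - o.start) > (m.end - m.start)
            for o in matches
        )
        if not contained:
            kept.append(m)
    return kept


def filter_key_phrases(matches):
    """Drop any match that is missing one of its rule's key phrases."""
    return [m for m in matches if m.has_key_phrases]


# Query "licensed under the AGPL 3.0" has words at positions 0..4.
rule1 = Match("rule1", 3, 4, True)   # "AGPL 3.0": no key phrase to miss
rule2 = Match("rule2", 0, 4, False)  # "license" of {{AGPL 3.0 license}} is absent
matches = [rule1, rule2]

# Old order: containment first, key phrases last.
old_order = filter_key_phrases(filter_contained(matches))
# New order: key phrases first, containment afterwards.
new_order = filter_contained(filter_key_phrases(matches))
```

With the old order both matches are lost, while the new order keeps the `rule1` match, which is the behavior the commit is aiming for.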
1 parent 40166d0 commit 2c377b3

15 files changed, +21 -14 lines changed
@@ -0,0 +1 @@
+This SDK is distributed under the Apache License, Version 2.0, see LICENSE for more information.
@@ -0,0 +1,3 @@
+license_expression: apache-2.0
+is_license_notice: yes
+relevance: 100

src/licensedcode/data/rules/apache-2.0_571.RULE

+1-1
@@ -1,6 +1,6 @@
 <licenses>
 <license>
-<name>Apache License 2.0</name>
+<name>{{Apache License 2.0}}</name>
 <url>http://repository.jboss.org/licenses/apache-2.0.txt</url>
 <distribution>repo</distribution>
 </license>
@@ -1 +1 @@
-<licenses> <license> <name>Apache License, Version 2.0</name> <url>http://apache.org/licenses/LICENSE-2.0</url> <distribution>repo</distribution> </license> </licenses>
+<licenses> <license> <name>{{Apache License}}, Version 2.0</name> <url>http://apache.org/licenses/LICENSE-2.0</url> <distribution>repo</distribution> </license> </licenses>

src/licensedcode/data/rules/bsd-new_531.RULE

+1-1
@@ -1,6 +1,6 @@
 <licenses>
 <license>
-<name>Berkeley Software Distribution (BSD) License</name>
+<name>{{Berkeley Software Distribution}} (BSD) License</name>
 <url>http://www.opensource.org/licenses/bsd-license.html</url>
 <distribution>repo</distribution>
 </license>

src/licensedcode/data/rules/mpl-1.0_14.RULE

+1-1
@@ -1,6 +1,6 @@
 MPL:

-"The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/
+"The contents of this file are subject to the {{Mozilla Public License}} Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/

 Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.

src/licensedcode/data/rules/mpl-1.0_8.RULE

+1-1
@@ -1,4 +1,4 @@
-* The contents of this file are subject to the Mozilla Public License
+* The contents of this file are subject to the {{Mozilla Public License}}
 * Version 1.0 (the "License"); you may not use this file except in
 * compliance with the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/

src/licensedcode/data/rules/mpl-1.1_36.RULE

+1-1
@@ -1,6 +1,6 @@
 The MPL v1.1:

-The contents of this file are subject to the Mozilla Public License
+The contents of this file are subject to the {{Mozilla Public License}}
 Version 1.1 (the "License"); you may not use this file except in
 compliance with the License. You may obtain a copy of the License at
 http://www..com/mpl.html

src/licensedcode/data/rules/mpl-1.1_40.RULE

+1-1
@@ -1,4 +1,4 @@
-The contents of this file are subject to the Mozilla Public License Version 1.1
+The contents of this file are subject to the {{Mozilla Public License}} Version 1.1
 (the License); you may not use this file except in compliance with the License.
 You may obtain a copy of the License at http://www.mozilla.org/MPL/
 Software distributed under the License is distributed on an AS IS basis,

src/licensedcode/data/rules/npl-1.1_1.RULE

+2-2
@@ -1,5 +1,5 @@
-* The contents of this file are subject to the Netscape Public
-* License Version 1.1 (the "License"); you may not use this file
+* The contents of this file are subject to the {{Netscape Public
+* License}} Version 1.1 (the "License"); you may not use this file
 * except in compliance with the License. You may obtain a copy of
 * the License at http://www.mozilla.org/NPL/
 *

src/licensedcode/data/rules/proprietary_133.RULE

+1-1
@@ -1,4 +1,4 @@
-* The contents of this file are subject to the KnowledgeTree Public
+* The contents of this file are subject to the {{KnowledgeTree}} Public
 * License Version 1.1 ("License"); You may not use this file except in
 * compliance with the License. You may obtain a copy of the License at
 * http://www.ktdms.com/KPL
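
The rule edits above wrap the distinguishing words of each rule in `{{...}}`, the key phrase markup referred to in the commit message. As a rough illustration of what "missing a key phrase" means, here is a minimal sketch that extracts the marked phrases from a rule text and checks whether they all occur in a query. It assumes a naive lowercase word tokenizer; the helper names (`words`, `key_phrases`, `contains_phrase`, `has_all_key_phrases`) are hypothetical and this is not the actual tokenizer or span logic in `licensedcode`.

```python
import re

# Text between {{ and }}; DOTALL lets a phrase span several lines,
# as in the npl-1.1_1.RULE change.
KEY_PHRASE = re.compile(r"\{\{(.+?)\}\}", re.DOTALL)


def words(text):
    """Naive tokenizer (an assumption): lowercase runs of letters and digits."""
    return re.findall(r"[^\W_]+", text.lower())


def key_phrases(rule_text):
    """Return each {{...}} phrase of a rule as a list of words."""
    return [words(phrase) for phrase in KEY_PHRASE.findall(rule_text)]


def contains_phrase(query_words, phrase):
    """True if the phrase words occur contiguously in the query words."""
    n = len(phrase)
    return any(
        query_words[i:i + n] == phrase
        for i in range(len(query_words) - n + 1)
    )


def has_all_key_phrases(rule_text, query_text):
    """True if every key phrase of the rule is present in the query."""
    query_words = words(query_text)
    return all(contains_phrase(query_words, p) for p in key_phrases(rule_text))


query = "licensed under the AGPL 3.0"
rule1_ok = has_all_key_phrases("AGPL 3.0", query)
rule2_ok = has_all_key_phrases("licensed under the {{AGPL 3.0 license}}", query)
```

Under these assumptions `rule1_ok` is true (the rule has no key phrase to miss) and `rule2_ok` is false, since the word `license` from the marked phrase is absent from the query.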

src/licensedcode/match.py

+4-4
@@ -1498,6 +1498,10 @@ def _log(_matches, _discarded, msg):
     all_discarded.extend(discarded)
     _log(matches, discarded, 'GOOD')

+    matches, discarded = filter_key_phrase_spans(matches)
+    all_discarded.extend(discarded)
+    _log(matches, discarded, 'KEY PHRASES')
+
     matches = merge_matches(matches)
     if TRACE: logger_debug(' #####refine_matches: before FILTER matches#', len(matches))
     if TRACE_REFINE:
@@ -1540,10 +1544,6 @@ def _log(_matches, _discarded, msg):
     all_discarded.extend(discarded)
     _log(matches, discarded, 'HIGH ENOUGH SCORE')

-    matches, discarded = filter_key_phrase_spans(matches)
-    all_discarded.extend(discarded)
-    _log(matches, discarded, 'KEY PHRASES')
-
     if merge:
         matches = merge_matches(matches)

@@ -1,3 +1,4 @@
 license_expressions:
+- apache-2.0
 - apache-2.0
 - apache-2.0

@@ -1,2 +1,3 @@
 license_expressions:
 - mit
+- mit

@@ -1,2 +1,3 @@
 license_expressions:
+- erlangpl-1.1
 - erlangpl-1.1
