Project: nerval (Named Entity Recognition)

Commit ede796fc: Fix score computation when threshold=1.0
Authored 10 months ago by Solene Tarride; committed 10 months ago by Yoann Schneider
Parent commit: b6d65f0f
Related merge request: !54 "Fix score computation when threshold=1.0"
Pipeline #185561 passed 8 months ago (stages: test, release)
1 changed file: nerval/evaluate.py (21 additions, 9 deletions)
@@ -23,6 +23,26 @@ PRED_COLUMN = "Prediction"
 CSV_HEADER = [ANNO_COLUMN, PRED_COLUMN]
 
 
+def match(annotation: str, prediction: str, threshold: float) -> bool:
+    """
+    Test if two entities match based on their character edit distance.
+    Entities should be matched if both entity exist (e.g. not empty strings) and their Character Error Rate is below the threshold.
+    Otherwise they should not be matched.
+
+    Args:
+        annotation (str): ground-truth entity.
+        prediction (str): predicted entity.
+        threshold (float): matching threshold.
+
+    Returns:
+        bool: Whether to match these two entities.
+    """
+    return (
+        annotation != ""
+        and prediction != ""
+        and editdistance.eval(annotation, prediction) / len(annotation) <= threshold
+    )
+
+
 def compute_matches(
     annotation: str,
     prediction: str,
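The new helper is what makes the threshold=1.0 case behave: against a non-empty annotation, an empty prediction has a Character Error Rate of exactly 1.0, so a bare CER <= threshold test accepts it when the threshold is 1.0, whereas match() rejects it because it also requires both entities to be non-empty. A minimal sketch of that difference (the example strings are made up; it assumes the editdistance package and this version of nerval are installed):

import editdistance

from nerval.evaluate import match  # helper added by this commit

annotation, prediction = "Paris", ""

# Character Error Rate of an empty prediction against a 5-character annotation.
cer = editdistance.eval(annotation, prediction) / len(annotation)  # 5 / 5 = 1.0

# Old behaviour: the inline check accepted this pair at threshold=1.0.
print(cer <= 1.0)  # True

# New behaviour: empty entities never match, whatever the threshold.
print(match(annotation, prediction, 1.0))  # False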
@@ -158,24 +178,17 @@ def compute_matches(
             # Normalize collected strings
             entity_ref = "".join(current_ref)
             entity_ref = entity_ref.replace("-", "")
-            len_entity = len(entity_ref)
             entity_compar = "".join(current_compar)
             entity_compar = entity_compar.replace("-", "")
-            # One entity is counted as recognized (score of 1) if the Levenhstein distance between the expected and predicted entities
-            # represents less than 30% (THRESHOLD) of the length of the expected entity.
-            # Precision and recall will be computed for each category in comparing the numbers of recognized entities and expected entities
-            score = (
-                1
-                if editdistance.eval(entity_ref, entity_compar) / len_entity <= threshold
-                else 0
-            )
+            score = int(match(entity_ref, entity_compar, threshold))
             entity_count[last_tag] = entity_count.get(last_tag, 0) + score
             entity_count[ALL_ENTITIES] += score
             current_ref = []
             current_compar = []
     return entity_count
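For readers skimming the hunk, the two entity_count lines implement a simple per-tag tally: each entity contributes its 0/1 score both to its own tag and to an overall counter. A standalone sketch of that pattern (the tag names, scores, and the "All" key are illustrative stand-ins, not values from the repository):

ALL_ENTITIES = "All"  # stand-in for the module-level constant in nerval/evaluate.py

entity_count = {ALL_ENTITIES: 0}
scored_entities = [("PER", 1), ("LOC", 0), ("PER", 1)]  # (tag, 0/1 score) pairs

for last_tag, score in scored_entities:
    # Same accumulation as in compute_matches: per-tag count plus a global count.
    entity_count[last_tag] = entity_count.get(last_tag, 0) + score
    entity_count[ALL_ENTITIES] += score

print(entity_count)  # {'All': 2, 'PER': 2, 'LOC': 0}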
@@ -263,7 +276,6 @@ def compute_scores(
             if (prec + rec == 0)
             else 2 * (prec * rec) / (prec + rec)
         )
         scores[tag]["predicted"] = nb_predict
         scores[tag]["matched"] = nb_match
         scores[tag]["P"] = prec
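The visible tail of compute_scores guards F1 against a zero denominator and stores the per-tag counts. A quick worked check of that formula, assuming the usual definitions precision = matched / predicted and recall = matched / expected (the numbers are illustrative only):

nb_predict, nb_match, nb_expected = 8, 6, 10

prec = nb_match / nb_predict   # 0.75
rec = nb_match / nb_expected   # 0.6
f1 = 0 if (prec + rec == 0) else 2 * (prec * rec) / (prec + rec)

print(prec, rec, round(f1, 3))  # 0.75 0.6 0.667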