Implement bag-of-entities metric

Create a new command ie-eval boe --label-dir labels/ --prediction-dir predictions/ to compute bag-of-words metrics over entities

Similar to PRImA Research's texteval but for entities
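
As a rough illustration of the command line described above, here is a minimal sketch using argparse from the standard library. The subcommand name and the two options come from this issue; the use of argparse, the build_parser helper and the help text are assumptions, and the project may wire its CLI differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the proposed `boe` subcommand (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="ie-eval")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # Options taken from the issue text; the help strings are illustrative only.
    boe = subcommands.add_parser("boe", help="Bag-of-entities metrics")
    boe.add_argument("--label-dir", type=Path, required=True)
    boe.add_argument("--prediction-dir", type=Path, required=True)
    return parser


# Parse the example invocation from this issue.
args = build_parser().parse_args(
    ["boe", "--label-dir", "labels/", "--prediction-dir", "predictions/"]
)
```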

Group the entity text by entity type

>>> label.text 
"Anne Marie and Louise were on a ski trip this week"
>>> label.entities
[("per", "Anne"), ("per", "Marie"), ("per", "Louise"), ("date", "this week")]
>>> merge_entities(label.entities)
{"per": "Anne Marie Louise", "date": "this week"}

>>> prediction.text 
"Anne and Marie were on a ski trip this past week"
>>> prediction.entities
[("per", "Anne"), ("per", "Marie"), ("date", "this past week")]
>>> merge_entities(prediction.entities)
{"per": "Anne Marie", "date": "this past week"}

Compute True Positives / False Negatives / False Positives

Matching is based either on counts (tokens are treated as a list, so duplicates are counted) or on an index (tokens are treated as a set, so duplicates are ignored)
Tokens can be words, subwords or characters

Category	TP	FP	FN
date	2	1 (past)	0
per	2	0	1 (Louise)
total	4	1 (past)	1 (Louise)
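
The per-category counts above can be reproduced with multiset operations. The following is a sketch assuming whitespace word tokenization; count_matches and its by_index flag are hypothetical names, and a subword or character tokenizer could replace str.split.

```python
from collections import Counter


def count_matches(label: str, prediction: str, by_index: bool = False) -> tuple[int, int, int]:
    """Return (TP, FP, FN) between two merged entity strings."""
    label_tokens = label.split()
    pred_tokens = prediction.split()
    if by_index:
        # Index mode: each distinct token is counted once, duplicates are ignored.
        label_bag, pred_bag = Counter(set(label_tokens)), Counter(set(pred_tokens))
    else:
        # Count mode: duplicates are kept.
        label_bag, pred_bag = Counter(label_tokens), Counter(pred_tokens)
    tp = sum((label_bag & pred_bag).values())  # tokens found in both bags
    fp = sum((pred_bag - label_bag).values())  # predicted tokens missing from the label
    fn = sum((label_bag - pred_bag).values())  # label tokens missing from the prediction
    return tp, fp, fn


date_counts = count_matches("this week", "this past week")
per_counts = count_matches("Anne Marie Louise", "Anne Marie")
```

The two modes only differ when a token repeats: for the label "a a b" and the prediction "a b b", count mode gives one false positive and one false negative, while index mode gives none.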

Compute and display Precision, Recall, F1

Category	Precision (%)	Recall (%)	F1 (%)	Support
date	66.7	100.0	80.0	2
per	100.0	66.7	80.0	3
total	80.0	80.0	80.0	5
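
The scores follow from the counts using the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two. Below is a minimal sketch; the function name is hypothetical, returning 0.0 on an empty denominator is an assumed convention, and Support is taken to be TP + FN (the number of label tokens), which agrees with the per-category rows.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 (as fractions) from raw counts."""
    # Returning 0.0 when a denominator is empty is an assumption, not a project rule.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Counts for the "date" and "per" categories from the example.
date_scores = precision_recall_f1(tp=2, fp=1, fn=0)
per_scores = precision_recall_f1(tp=2, fp=0, fn=1)
print(" ".join(f"{100 * score:.1f}" for score in date_scores))
```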