Models
Entity
Entity is the model that puts the 'E' in 'NER'. spaCy defines an Entity as "a 'real-world object' that's assigned a name – for example, a person, a country, a product or a book title".
Fields
label (primary key) - machine-friendly representation of the Entity, as used in spaCy
name - user-friendly representation of the Entity
Example
from ner_trainer.models import Entity
entity = Entity.objects.create(
    label='PROVINCE',
    name='Province'
)
Phrase
Phrase is text that may contain zero or more Entities.
Fields
text (unique) - phrase text. This field is unique to prevent duplicate Phrases during bulk imports.
skipped - whether the phrase is skipped. Useful when a phrase could be bulk imported more than once.
Methods
as_spacy_train_data() - returns a spacy.gold-compatible representation of the tagged Phrase:
('I like London and Berlin.', {
    'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]
})
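The structure above can be assembled from plain (start, end, label) spans. The following is a minimal sketch in plain Python, not the ner_trainer implementation: the helper name build_train_data is hypothetical, and sorting the spans by start offset and rejecting out-of-range offsets are assumptions made for illustration.

```python
def build_train_data(text, spans):
    """Return a (text, {'entities': [...]}) tuple in the shape shown above.

    `spans` is an iterable of (start, end, label) tuples holding character
    offsets into `text`; `end` is exclusive, as in the London/Berlin example.
    This helper is hypothetical and only mirrors the documented output shape.
    """
    entities = []
    for start, end, label in sorted(spans):
        # Reject offsets that are empty, inverted, or outside the text.
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) does not fit the text")
        entities.append((start, end, label))
    return (text, {'entities': entities})


# Spans may arrive in any order; the result lists them by start offset.
example = build_train_data(
    'I like London and Berlin.',
    [(18, 24, 'LOC'), (7, 13, 'LOC')],
)
```

Slicing the text with each pair of offsets, for instance 'I like London and Berlin.'[7:13], is a quick way to confirm that a span covers the intended entity.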
Example
from ner_trainer.models import Phrase
phrase = Phrase.objects.create(
    text="Nova Scotia is one of Canada's three maritime provinces."
)
Custom Managers
active_objects - returns a queryset of all Phrase instances that haven't been skipped:
>>> Phrase.active_objects.all() == Phrase.objects.filter(skipped=False)
True
tagged_objects - returns a queryset of active Phrase instances that have been tagged (i.e., have related PhraseEntity objects):
>>> Phrase.tagged_objects.all() == Phrase.objects.filter(skipped=False, entities__isnull=False)
True
PhraseEntity
A PhraseEntity joins an Entity to a Phrase and stores where in the phrase that entity is located.
Fields
phrase - Phrase containing this named entity
entity - Entity defined between start_index and end_index
start_index - start character index of the Entity in the Phrase
end_index - end character index of the Entity in the Phrase
Methods
as_spacy_tuple() - returns a tuple of start_index, end_index, and entity.label for use in training the spaCy NER model.
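To illustrate how the fields and the tuple relate, here is a small sketch that uses plain dataclasses as stand-ins for the Django models. EntityStub, PhraseEntityStub, and covered_text are hypothetical names that are not part of ner_trainer, and the sketch assumes end_index is exclusive, matching the offsets in the as_spacy_train_data() example.

```python
from dataclasses import dataclass


@dataclass
class EntityStub:
    """Hypothetical stand-in for the Entity model (label and name only)."""
    label: str
    name: str


@dataclass
class PhraseEntityStub:
    """Hypothetical stand-in for PhraseEntity, holding the phrase text directly."""
    text: str            # stands in for phrase.text
    entity: EntityStub
    start_index: int
    end_index: int       # assumed exclusive

    def as_spacy_tuple(self):
        # Same field order as described above: start, end, label.
        return (self.start_index, self.end_index, self.entity.label)

    def covered_text(self):
        # The slice of the phrase that the indices point at.
        return self.text[self.start_index:self.end_index]


province = EntityStub(label='PROVINCE', name='Province')
tagged = PhraseEntityStub(
    text="Nova Scotia is one of Canada's three maritime provinces.",
    entity=province,
    start_index=0,
    end_index=11,
)
```

Here tagged.as_spacy_tuple() gives (0, 11, 'PROVINCE') and tagged.covered_text() gives 'Nova Scotia', which shows the indices bracketing exactly the tagged entity.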