Annotation of the form

Annotation of the form includes seven steps: lemmatization, linguistic structure, standardized structure, formal structure, poetical structure, type of speech, and figure of speech. Common annotation rules have been established with the help of linguists specializing in each of the annotated languages. Annotators can explain and justify all their annotations in a note.

Lemmatization reduces conjugated or declined forms to their minimal form and helps eliminate morphological or graphical variants. For the neo-Latin languages, the dictionary entry is used: the infinitive for verbs, the masculine singular for adjectives. The terms of the BSS are the starting point; their form or spelling is modernized. Elements derived from the medieval lexicon are kept and modernized according to current usage. For Semitic languages, words are not reduced to their roots: the categories (noun, verb, adjective…) are kept, and verbs are given in the 3rd person masculine singular past. For Arabic, roots are indicated in a note. The number of lemmatized items should correspond exactly to the number of elements of the BSS: articles, pronouns, and prepositions attached to a word are lemmatized independently and joined to it with a hyphen.

Example: lavóse > lavar-se; agora > ahora; tuelle > toller
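The one-to-one correspondence between BSS items and lemmatized items can be checked mechanically. The following is only a minimal sketch, assuming that items are separated by whitespace; the function names are hypothetical and not part of the project's tooling.

```python
def split_items(line: str) -> list[str]:
    """Split an annotation line into items (assumption: whitespace separates items)."""
    return line.split()


def is_aligned(bss: str, lemmas: str) -> bool:
    """Return True when the lemma line has exactly as many items as the BSS."""
    return len(split_items(bss)) == len(split_items(lemmas))


def split_compound(lemma: str) -> list[str]:
    """Split a hyphen-joined lemma into its independently lemmatized parts."""
    return lemma.split("-")


# "lavóse" is one BSS item, so its lemma stays one item joined by a hyphen.
print(is_aligned("lavóse", "lavar-se"))
print(split_compound("lavar-se"))
```

Writing the lemma as two space-separated words ("lavar se") would make the check fail, which is the situation the hyphen rule is meant to prevent.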

The linguistic structure should also be aligned with the lemmatization and the text of the BSS: it contains the same number of items in the same order, linked with a hyphen where necessary. The annotated categories are basic, and the labels are taken from the list in the Leipzig Glossing Rules (p. 8-10). Global labels such as DET (determiner) are preferred to article, demonstrative, or possessive. Verb labels mainly indicate the person. Additional information is separated by periods; a space separates two units, and compound units are linked with a hyphen.
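The three separators described above (space, hyphen, period) define a simple nesting that can be read programmatically. The sketch below is illustrative only: the function name is hypothetical, and the labels in the sample line are assumptions chosen for the example, not the project's actual label set.

```python
def parse_gloss(line: str) -> list[list[list[str]]]:
    """Parse a gloss line into units, compound parts, and labels.

    Space  -> separates two units
    Hyphen -> links the parts of a compound unit
    Period -> separates a label from additional information
    """
    return [
        [part.split(".") for part in unit.split("-")]
        for unit in line.split()
    ]


# Hypothetical gloss line with three units; the first one is a compound.
parsed = parse_gloss("V.3SG-PRO DET N")
print(len(parsed))   # number of units, to compare with the number of BSS items
print(parsed[0])     # parts of the first (compound) unit
```

Comparing `len(parsed)` with the number of items in the BSS and in the lemmatization gives a quick check that the three lines stay aligned.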

A standardized structure, or formal mould (pattern), helps bring out models of sententious or proverbial formalisation. The lexical schema is repeated and modernised if necessary; varying verbal syntagms are represented by Y followed by a number, and varying nominal syntagms by X followed by a number.
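The substitution of numbered placeholders can be sketched as follows. This is a minimal illustration, assuming the annotator has already marked which items are varying nominal (X) or verbal (Y) syntagms; the function name and the sample input are hypothetical and not taken from the corpus.

```python
from typing import Optional


def standardize(items: list[tuple[str, Optional[str]]]) -> str:
    """Build a pattern: fixed items keep their form, varying ones become X1, Y1, ...

    Each item is (form, slot), where slot is None for a fixed item,
    "X" for a varying nominal syntagm, or "Y" for a varying verbal syntagm.
    """
    counters = {"X": 0, "Y": 0}
    pattern = []
    for form, slot in items:
        if slot is None:
            pattern.append(form)
        elif slot in counters:
            counters[slot] += 1          # X and Y are numbered independently
            pattern.append(f"{slot}{counters[slot]}")
        else:
            raise ValueError(f"unknown slot type: {slot!r}")
    return " ".join(pattern)


# Illustrative input only: two varying verbal syntagms in a fixed frame.
example = [("quien", None), ("mucho", None), ("abarca", "Y"),
           ("poco", None), ("aprieta", "Y")]
print(standardize(example))
```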

A formal structure corresponds to the logical breakdown of the BSS into separate clauses; each clause is enclosed in angle-bracket tags following XML syntax: <E.1> </E.1> <E.2> </E.2>.
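The clause markup can be generated and checked for well-formedness with a few lines of code. The sketch below assumes the clauses have already been identified by the annotator; the function name, the sample clauses, and the wrapping root element used for parsing are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_clauses(clauses: list[str]) -> str:
    """Enclose each clause in numbered tags: <E.1>...</E.1> <E.2>...</E.2>."""
    return " ".join(
        f"<E.{i}>{escape(clause)}</E.{i}>"   # escape &, <, > in the clause text
        for i, clause in enumerate(clauses, start=1)
    )


marked = wrap_clauses(["quien mucho abarca", "poco aprieta"])
print(marked)

# Well-formedness check: wrap in a hypothetical root element and parse.
root = ET.fromstring(f"<BSS>{marked}</BSS>")
print([element.tag for element in root])
```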

The type of speech indicates the kind of enunciation, chosen from a pre-established list. Priority is given to the speech categories relevant to the BSS: truncated dialogue, invocation, conjecture…

The poetical structure is optional and its formulation is free. It is intended for annotators interested in rhythm, metrical characteristics, rhyme, and poetic types.

The figures of speech are chosen from a pre-established list. The list does not cover every figure, only those most common in our BSS and the best known, so that annotators who are not specialists in stylistics can give the appropriate information.

Example of formal labeling