TEI and Persian


    Irene K. F. Kirchner (admin)
    Keymaster

    Last year, we at IDHN got a query about problems with TEI and Persian. Specifically, the team was transcribing early modern legal documents and encoding them in TEI XML using oXygen’s XML editor, for publication online. Here is the report by Elizabeth Williamson (https://humanities.exeter.ac.uk/english/staff/ewilliamson/) on the suggested solutions and the team’s experience with them:

     

    Possible solutions for encoding right-to-left text in oXygen XML editor

    We would like to thank the IDHN members for their multiple helpful suggestions!

     

    Problem:

    Our project team is transcribing early modern legal documents and encoding them in TEI XML using oXygen’s XML editor, for publication online. However, we are having problems with oXygen’s handling of right-to-left languages: it is difficult to tag words and phrases in oXygen without the text ‘jumping’ around.

     

    Solutions:

    1.     A work-around, which isn’t perfect but is easy to do, is to change oXygen’s settings so that whenever we add a new tag, the text is put on a new line in the editor (without adding a line break in the TEI) – this doesn’t disrupt the eventual display but does seem to stop the words jumping around. We have gone for this option, and have also changed our encoding process slightly so that we include much of the richness in the TEI header, and paste content into pre-prepared tags in the body of the text.

     

    2.     Suggestion by Matthew Thomas Miller <mtmiller@umd.edu>

    “Because of these technological issues and a few other practical ones, both in the manuscript project mentioned above and our broader Open Islamicate Texts Initiative (OpenITI) we have decided to go another route to TEI. Publishing in TEI is still our ultimate goal, but we have elected to put our texts first in an intermediate text “standard”: the much more simple and Arabic-script compatible OpenITI mARkdown. As a part of the OpenITI AOCP project, for which we just received a large grant from the Mellon Foundation, we are producing an automatic OpenITI mARkdown-to-TEI XML converter that will be developed by Dr. Raff Viglianti, who is on the TEI International Technical Council. As a part of this project Raff will also be working with us on adding/adjusting TEI for Arabic-script needs and ultimately converting all of our texts to a TEI version with the converter he is developing for the project.”

     

    We looked into this option but decided not to use this form of markdown: the converter is not ready yet, we have already invested work in creating a detailed TEI schema, and we find TEI richer for encoding the details of the varied manuscripts we are interested in.

     

    3.     Suggestion by Mohammad Emami <mohammad.emami@wadham.ox.ac.uk>

    Add a Right-to-Left Embedding character (U+202B) at the beginning of the Persian/Arabic chunk of text. It is available through oXygen’s Menu > Edit > Insert from Character Map, where it can be found by its Unicode code point, U+202B. Tagging numeric text, e.g. a date, may be tricky: you would additionally need to add a Right-to-Left Mark (U+200F) before inserting the tag.
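    For readers who want to check which characters are involved, here is a minimal Python sketch. It only looks up the Unicode names of the control characters and wraps a chunk of text in an embedding. The helper names `describe` and `embed_rtl` are hypothetical, and closing the embedding with U+202C (Pop Directional Formatting) is an assumption based on general Unicode practice; the suggestion above mentions only the opening character.

```python
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING, the character named in the suggestion
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, suggested before tags around numeric text
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (assumption: used to close an embedding)


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def embed_rtl(chunk: str, close: bool = True) -> str:
    """Prefix a chunk with RLE; optionally close the embedding with PDF."""
    return RLE + chunk + (PDF if close else "")


if __name__ == "__main__":
    for ch in (RLE, RLM, PDF):
        print(describe(ch))
```

    These characters are invisible in most editors, so printing their names is a quick way to confirm what was actually inserted.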

     

    This does work and stops the text jumping around. However, it corrupts the way the tagging is displayed, making it look like invalid XML even though the software processes it correctly. On reflection we decided this would be more difficult and confusing for the encoding team than the transcribed text moving (which is frustrating, but they can put it back where it is meant to be). It also seems that an embedding character has to be added at the start of each line, which gets laborious.
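    The per-line step described above could in principle be scripted rather than done by hand. The following is a rough sketch, not something the project team used: it assumes the transcription is available as plain text before tagging, and the function names `add_rle_per_line` and `strip_bidi_controls` are hypothetical. The second function removes the control characters again, in case they are not wanted in the published files.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING

# Explicit directional controls and marks that this sketch knows how to strip.
BIDI_CONTROLS = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e", "\u200e", "\u200f"}


def add_rle_per_line(text: str) -> str:
    """Insert RLE before the first non-space character of every non-blank line."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        # Skip blank lines and lines that already start with RLE.
        if stripped.strip() and not stripped.startswith(RLE):
            indent_len = len(line) - len(stripped)
            line = line[:indent_len] + RLE + line[indent_len:]
        out.append(line)
    return "".join(out)


def strip_bidi_controls(text: str) -> str:
    """Remove the directional control characters listed in BIDI_CONTROLS."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)
```

    Running `add_rle_per_line` twice leaves the text unchanged after the first pass, so it is safe to re-run on a partly processed file. It does not address the display problem described above, where the tagging looks like invalid XML in the editor.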

     

    4.     Other helpful suggestions included writing the Arabic-script text in another program and then pasting it into oXygen (bearing in mind that this doesn’t prevent the jumping text issue for tags added to the pasted body text). Some suggested transliterating the text into ASCII as a preprocessing step, though this does not fit easily into our workflow.

     
