
Just to let you know, we are no longer updating this section.
This is retained as a resource but nothing new has been added since late 2009. No further additions will be made by the DCC.


Date added 13 December 2006
Last edited 12 November 2009

Full Title

Text Encoding Initiative


The Text Encoding Initiative (TEI) is a major international initiative within the academic community. It provides a standard set of Standard Generalized Markup Language (SGML) and Extensible Markup Language (XML) tag definitions for representing all kinds of electronic information, in particular the datasets generated and used by research projects in linguistics, literature and the humanities in general.

TEI is highly modular and extensible and is particularly relevant for bibliographic material. Basic tag sets are provided for prose, verse, drama, speech, dictionaries and terminological databases, and a method has been defined for creating customized mixes from these basic sets.

Additional tag sets are provided to capture information related to linking, analysis (including feature structure analysis), certainty, transcriptions, critical apparatus, names and dates, nets (graphs, digraphs, trees, etc.), figures and corpora.
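
As a rough illustration of what TEI-encoded data looks like in practice, the sketch below embeds a minimal TEI P5 document that uses the verse elements (lg for a line group, l for a line) and reads it with Python's standard library. The sample text, the title and the helper names verse_lines and document_title are invented for this example and are not taken from the TEI Guidelines; a real project would normally validate against a schema generated from its own TEI customization, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# A minimal, invented TEI P5 document: a header plus a two-line verse group.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample verse (hypothetical)</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Invented for this example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="couplet">
        <l n="1">First invented line of verse,</l>
        <l n="2">second invented line of verse.</l>
      </lg>
    </body>
  </text>
</TEI>"""


def document_title(tei_xml):
    """Return the text of the first <title> element in the document."""
    root = ET.fromstring(tei_xml)
    return root.findtext(".//{%s}title" % TEI_NS)


def verse_lines(tei_xml):
    """Return (line number, text) pairs for every <l> element."""
    root = ET.fromstring(tei_xml)
    ns = {"tei": TEI_NS}
    return [
        (line.get("n"), "".join(line.itertext()).strip())
        for line in root.iterfind(".//tei:l", ns)
    ]


if __name__ == "__main__":
    print(document_title(SAMPLE))
    for number, text in verse_lines(SAMPLE):
        print(number, text)
```

The same pattern extends to the other basic tag sets (prose, drama, speech and so on): only the element names queried would change, since all TEI P5 elements share the one namespace.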

Standards Developing Organisations



No information available.

Lifecycle Actions

Access, Use and Reuse
Create or Receive
Description and Representation Information

Standard Frameworks

Digital Archive Standards
Digital Repository Standards

Standard Type

XML DTD and Schema

Current Version

2008 - TEI P5
Schema and implementation guidelines for downloading.

Further Information

Alternative Current Versions


Previous Versions

1999 - TEI P3
Documentation no longer available.
2002 - TEI P4
DTD and implementation guidelines for downloading.

Referenced Standards