Canonical Text Service


Public Text Inventories
(!) = in editing process
(capped at 10k)
dsb (!) | 1 | Lower Sorbian text corpus | Serbski Institute
pbc | 1 | 20 copyright-free multilingual parallel Bible translations | Parallel Bible Corpus
pcp | 1 | Chrétien de Troyes's Le Chevalier de la Charrette (Lancelot, ca. 1180) | The Princeton Charrette Project
tg | 1 2 3 4 5 6 7 | Textgrid | The Digital Library in Textgrid
tgap | 1 | Thomas Gray Archive Poems | Thomas Gray Archive
voth | 1 | David Boder: Voices of the Holocaust | Voices of the Holocaust Project
Tools
Namespace Resolver: provides endpoint URLs based on URN namespaces
CTS Explorer: provides a meta overview of available CTS instances
Source Code Resources
CTS Server (Git hosted via
Python API WIP
Suggested Citation
Tiepmar, J. (). Canonical Text Service.

Legal Notice (Impressum) and Data Protection Policy

This is a non-commercial academic research and data webservice.

Dr. Jochen Tiepmar, Fraunhoferstraße 7, 04178 Leipzig, Saxony, Germany.
Preferably Email: tiepilab at or the usual academic communication channels.

Data Protection Policy
No user data is collected other than the IP access logs stored by the Apache server software. These access logs are deleted automatically. Data sets are provided according to their public license or prior individual agreements. Tools may include publicly available software under its own license (namely plotly.js and cytoscape.js).


What is a Canonical Text Service?

The Canonical Text Services (CTS) protocol defines the interaction between a client and a server that provides identification of texts and retrieval of canonically cited passages of texts. The official specifications by David Neel Smith and Christopher Blackwell can be found here. Put simply, CTS serves text passages that are specified by URN-like references. The protocol is designed so that a CTS URN can be created for any possible text passage in a document. Data is requested with GET requests encoded in a URL. Each request must contain one parameter, request, which specifies the CTS function to use. Function-specific parameters, such as the URN, are added as additional GET parameters.
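
The request scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official client: the endpoint URL is a placeholder, the function name GetPassage and the parameter name urn follow the CTS specification as commonly documented, and the example URN is illustrative and not necessarily served by this instance. The helper names build_cts_url and split_cts_urn are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute the URL of a real CTS instance.
ENDPOINT = "http://example.org/cts/"


def build_cts_url(endpoint, request, **params):
    """Build a GET URL: the mandatory 'request' parameter names the CTS
    function, function-specific parameters follow as further GET parameters."""
    query = {"request": request}
    query.update(params)
    return endpoint + "?" + urlencode(query)


def split_cts_urn(urn):
    """Split a CTS URN into its colon-separated components.

    Assumes the layout urn:cts:<namespace>:<work>[:<passage>].
    """
    parts = urn.split(":", 4)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError("not a CTS URN: " + urn)
    return {
        "namespace": parts[2],
        "work": parts[3],
        "passage": parts[4] if len(parts) == 5 else None,
    }


# Illustrative URN (assumption): namespace, work identifier, passage reference.
example_urn = "urn:cts:greekLit:tlg0012.tlg001:1.1"
print(split_cts_urn(example_urn))
print(build_cts_url(ENDPOINT, "GetPassage", urn=example_urn))
```

The URL is assembled with urlencode so that the colons in the URN are percent-encoded; the resulting string can be passed to any HTTP client. No network call is made in the sketch itself.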

Is the implementation feature complete?

Not yet. Subpassage notation, GetPassagePlus and error messages are missing but will be implemented soon, along with many additional features that extend the CTS protocol (e.g. license management at the passage request level). See this dissertation for more information about what is planned.

How about data persistency and versioning? Can I reliably cite text passages via URNs or can the text content change?

CTS URNs are meant to be persistent references. However, mistakes and improvements happen, and structural markup can change while documents are still being edited. There is no clear solution to this problem, but some kind of versioning will be implemented (e.g. numbered updates). Text corpora that are still being worked on are marked with (!) in the above table. Generally, CTS URNs can be considered safe for citation purposes.

How reliable is this service? Will you monetize it once people depend on it?

The server is financed privately, and I use these webservices for my own programming work and research. The software is open source and can be recreated by anyone. CTS Cloning is planned, which will allow decentralized, distributed backups of texts once they have been "CTSified"; this will eliminate any dependency on individual servers, as it will allow anyone to mix and host their own data instances. Monetizing this service will not be necessary and would be counterproductive for me personally.