NTI Chinese-English Dictionary and Buddhist Text Project

This is an open source digital humanities project that combines source code with language data and creative artifacts. The software is intended for people studying Chinese and Buddhism, translators, and researchers analyzing Buddhist texts for religious practice, personal interest, or academic study. The project has three parts: (1) dictionary and corpus data, (2) a PHP web project, and (3) Python command line tools.

This GitHub project is intended for contributors and others who would like to reuse the resources included. If you simply need to use the dictionary and language tools, use the web site chinesenotes.com. A preview of the redesigned web site is here.

Dictionary Data

The project data is located under the data directory, which contains one directory for corpus data and one for dictionary data. The dictionary data can either be read as text files by the Python programs or loaded into the database for use on the web site. For details about the structure of the dictionary, see the file dictionary.ddl. For details on loading the data into MySQL, see the file dictionary-readme.txt. You do not need to load the dictionary data into a database in order to use the command line tools.
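As a rough illustration of reading the dictionary as a text file, the sketch below loads tab-separated entries into a lookup table. The column layout (`FIELDS`), the `load_dictionary` helper, and the two sample entries are assumptions made for this example only; the actual structure is defined in dictionary.ddl and should be checked there.

```python
# -*- coding: utf-8 -*-
import io

# Hypothetical column order, for illustration only.
# The real layout is defined in dictionary.ddl.
FIELDS = ["simplified", "traditional", "pinyin", "english"]


def load_dictionary(stream):
    """Read tab-separated entries from a text stream into a lookup table.

    Returns a dict mapping each headword (simplified and traditional)
    to a list of entry dicts, since one headword may have several senses.
    """
    entries = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = line.split("\t")
        if len(row) < len(FIELDS):
            continue  # skip rows that do not have every assumed column
        entry = dict(zip(FIELDS, row))
        # Index under both written forms so either can be looked up.
        for key in set([entry["simplified"], entry["traditional"]]):
            entries.setdefault(key, []).append(entry)
    return entries


# Two made-up sample rows standing in for a real dictionary file.
SAMPLE = u"佛\t佛\tfó\tBuddha\n经\t經\tjīng\tsutra\n"

words = load_dictionary(io.StringIO(SAMPLE))
print(words[u"經"][0]["english"])  # prints "sutra"
```

To read a real file instead of the in-memory sample, pass a stream opened with `io.open(path, encoding="utf-8")`, which behaves the same way on Python 2.7 and Python 3.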

There are many illustrations for the dictionary, which are kept under the directory web/images. These are intended to illustrate specialist terms relating to symbols, characters, artwork, and deities.

Corpus Data

The corpus is the text that we want to analyze. It consists mainly of text from the Buddhist Canon and related documents written in literary Chinese and Sanskrit in the ancient and classical periods. Because of its sheer size and the difficulty of translating it, most of the Chinese Mahayana Buddhist Canon has not yet been translated into any modern language. In addition to the Canon, there are many related historic documents. Beyond the quantity of documents, problems arise from the myriad ways in which Sanskrit terms were translated into Chinese and from understanding the Sanskrit terms themselves. Buddhist Sanskrit has a very different vocabulary from the variety of Sanskrit used in Hindu texts. Because Sanskrit grammar is complex and the language can be extended with long compound words, Sanskrit dictionaries cannot list all possible inflections and compounds. See the Resources section of the web site for details on the linguistic challenges.

Command Line Tools

If you want to do some text crunching on a local system, this is a good place to start, although it is the least developed part of the project. The tools write output to the command line and generate HTML. All you need to do is download the project to your local machine and install Python 2.7, which is standard on Linux. The details of the tools are in the readme-python.txt file.
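To give a feel for the kind of text crunching meant here, the sketch below splits a Chinese string into words by greedy longest match against a vocabulary and counts the results. This is a common textbook approach, not necessarily what the project's own tools do; the `segment` function, the `max_len` parameter, and the small `VOCAB` set are all assumptions made for this example. See readme-python.txt for the actual tools.

```python
# -*- coding: utf-8 -*-
from collections import Counter


def segment(text, vocabulary, max_len=4):
    """Split text into tokens by forward longest match.

    At each position the longest substring found in the vocabulary
    (up to max_len characters) is taken as a token. A single
    character is emitted when nothing longer matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# A tiny stand-in vocabulary; in practice the headwords would come
# from the dictionary data.
VOCAB = set([u"般若", u"波羅蜜多", u"心經"])

tokens = segment(u"般若波羅蜜多心經", VOCAB)
counts = Counter(tokens)
print(u" / ".join(tokens))  # prints "般若 / 波羅蜜多 / 心經"
```

Greedy matching is simple and fast but can pick the wrong split when words overlap, so its output is best treated as a first pass.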

Web Project

The web project provides a web interface for using the dictionary. The web pages can search the dictionary and navigate the corpus. To install the web project locally:

Install the Apache HTTP server with the PHP extension and MySQL. A LAMP stack is the simplest option.
Load the dictionary into the database, following the instructions in dictionary-readme.txt.
Set the Apache HTTP server document root to the location of the web project.
Download the Prototype JS JavaScript Library and place it in the script directory under the web document root.
The web project also depends on the AngularJS JavaScript Library and Bootstrap. These are loaded from external locations, so they do not need to be checked into GitHub or downloaded to set up the web project. AngularJS will eventually replace Prototype JS.

License

The license for the web site and dictionary content is Creative Commons Attribution-Share Alike 3.0. The license for source code and markup templates is Apache 2.0. All materials are copyright Fo Guang Shan Nan Tien Institute (佛光山南天大學).
