Chinese Lexical Database (2016)


I created the Chinese Lexical Database (CLD), a large-scale lexical database for Mandarin Chinese that provides over 150 descriptive and lexical-distributional variables for more than 30,000 words in simplified Chinese. The information in the CLD can be used to construct experimental stimuli and to analyze experimental data in psycholinguistic research on simplified Chinese. The CLD can be downloaded for free, and an online search interface is provided at


Taiwanese Southern Min Corpus (2009)


This is a small spoken corpus of Taiwanese Southern Min that Dr. John Newman and I built during the summer of 2009. All of the spoken data come with audio files, transcriptions, POS tagging, and free translations. The corpus still contains errors that need to be fixed, and I hope to enlarge and improve it in the near future.

Alexander von Humboldt Professorship introduction video