About Coquery

Features

Coquery is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpus.

Corpora

  • Use the corpus manager to install one of the supported corpora
  • Build your own corpus from PDF, MS Word, OpenDocument, HTML, or plain text files
  • Filter your query, for example by year, genre, or speaker gender
  • Choose which corpus features will be included in your query results
  • View every token that matches your query within its context

Queries

  • Match tokens by orthography, phonetic transcription, lemma, or gloss, and restrict your query by part-of-speech
  • Use string functions, e.g. to test whether a token contains a letter sequence
  • Use the same query syntax for all installed corpora
  • Automate queries by reading them from an input file
  • Store your results as CSV files or Praat TextGrid files (for time-annotated corpora)

Analysis

  • Summarize the query results as frequency or contingency tables
  • Create a G-test matrix for query results to detect statistically significant differences
  • Run statistical tests of independence, and estimate the effect sizes
  • Calculate entropies and relative or normalized frequencies
  • Fetch collocations, and calculate association statistics like mutual information scores or conditional probabilities
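The measures named in this list can be illustrated with a few lines of plain Python. The following is only a minimal sketch of the underlying formulas, not Coquery's own code: all function names are hypothetical, and the normalization base of one million words is an assumption chosen for illustration.

```python
import math


def relative_frequencies(counts):
    """Divide each count by the total, so that the values sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def per_million_words(count, corpus_size):
    """Normalized frequency: occurrences per one million corpus tokens."""
    return count * 1_000_000 / corpus_size


def entropy(counts):
    """Shannon entropy (in bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0
    )


def mutual_information(pair_count, word_count, collocate_count, corpus_size):
    """Pointwise mutual information (in bits) of a word and a collocate."""
    p_pair = pair_count / corpus_size
    p_word = word_count / corpus_size
    p_collocate = collocate_count / corpus_size
    return math.log2(p_pair / (p_word * p_collocate))


def g_statistic(a, b, c, d):
    """G statistic for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

For example, `g_statistic(30, 70, 50, 50)` compares how often a form occurs in two subcorpora of 100 tokens each; larger values indicate a stronger departure from independence.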

Visualizations

  • Use bar charts, heat maps, or bubble charts to visualize frequency distributions
  • Illustrate diachronic changes by using time series plots
  • Show the distribution of tokens within a corpus in a barcode or a beeswarm plot

Databases

  • Either use easy-to-use internal databases, or connect to a powerful MySQL server
  • Access large corpora on a MySQL server over the network
  • Link data tables from different corpora, e.g. to include phonetic transcriptions in a corpus that does not contain them
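Conceptually, linking data tables resembles a database join. The sketch below illustrates the idea with Python's built-in sqlite3 module as a stand-in database. It does not show how Coquery implements the feature: the table names `corpus_a` and `lexicon_b`, the function name, and the case-insensitive match on the word form are all assumptions made for illustration.

```python
import sqlite3


def link_transcriptions(tokens, lexicon):
    """Attach transcriptions from a lexicon table to the tokens of a corpus.

    tokens:  list of word forms from a corpus without transcriptions
    lexicon: dict mapping lower-case word forms to transcriptions
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables: one per data source.
    conn.execute("CREATE TABLE corpus_a (id INTEGER PRIMARY KEY, word TEXT)")
    conn.execute(
        "CREATE TABLE lexicon_b (word TEXT PRIMARY KEY, transcription TEXT)"
    )
    conn.executemany(
        "INSERT INTO corpus_a (word) VALUES (?)", [(t,) for t in tokens]
    )
    conn.executemany("INSERT INTO lexicon_b VALUES (?, ?)", lexicon.items())
    # LEFT JOIN keeps every corpus token, even those missing from the lexicon.
    rows = conn.execute(
        "SELECT a.word, b.transcription "
        "FROM corpus_a AS a "
        "LEFT JOIN lexicon_b AS b ON lower(a.word) = b.word "
        "ORDER BY a.id"
    ).fetchall()
    conn.close()
    return rows
```

Tokens without a match in the second table are kept and receive `None` as their transcription, so the link never removes rows from the query results.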

Supported corpora

Coquery already has installers for the following linguistic corpora:

Note that in order to use these corpora, you first need to obtain the corpus data from the linked websites.

If the corpus you need is not in the list of supported corpora, you can either program a custom installer for it, or ask the Coquery developer whether an installer for your corpus may be included in a future release of Coquery.

Development

See the Change log for changes.

License

Coquery is free software released under the terms of the GNU General Public License (version 3). This license gives you the freedom to use Coquery for any purpose. It also allows you to copy, modify, and redistribute the software, as long as the modified software is also licensed under the GNU GPL.