Language identification

The goal of language identification is to determine the language of a given text using statistical methods. It is also called language guessing (1).

Klingon has several distinctive characteristics that can help identify it (2):
  • Capitalization is significant: q and Q are different letters; D, I and S appear only capitalized; ch, tlh and v appear only in lower case.
  • The apostrophe appears relatively often, frequently at the beginning or end of a syllable (see Phonology).
  • The apostrophe also appears in the middle of a word, sometimes doubled.
  • -be' and -'a' appear frequently as suffixes.

What Klingon does NOT have:
  • The letters f, k, x and z do not occur.
  • No word starts with a vowel.
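
The features listed above can be turned into a rough rule-based check. The following Python sketch is only an illustration under simplifying assumptions: the function name klingon_hints and the scoring scheme are hypothetical, only some of the listed features are covered, and it is not how any existing tool works. It counts hints for and against Klingon, word by word.

```python
import re

# Assumptions taken from the lists above; this is a heuristic, not a real classifier.
FORBIDDEN = set("fkxzFKXZ")    # letters that Klingon does not have
WRONG_CASE = set("disV")       # D, I, S are always capitalized; v never is
VOWELS = set("aeiouAEIOU")     # no Klingon word starts with a vowel


def klingon_hints(text):
    """Count simple per-word hints for and against the text being Klingon."""
    words = re.findall(r"[A-Za-z']+", text)
    hints = {"for": 0, "against": 0}
    for word in words:
        letters = set(word)
        # Evidence against: a forbidden letter or a letter in the wrong case.
        if FORBIDDEN & letters or WRONG_CASE & letters:
            hints["against"] += 1
        # Evidence against: the word starts with a vowel.
        if word[0] in VOWELS:
            hints["against"] += 1
        # Evidence for: the word contains an apostrophe.
        if "'" in word:
            hints["for"] += 1
        # Evidence for: the word ends in one of the frequent suffixes.
        if word.endswith(("be'", "'a'")):
            hints["for"] += 1
    return hints


print(klingon_hints("tlhIngan Hol Dajatlh'a'"))  # {'for': 2, 'against': 0}
print(klingon_hints("Do you speak Klingon"))     # {'for': 0, 'against': 2}
```

Such rules misfire on short texts (an English contraction such as "don't" counts as a hint for Klingon), which is one reason statistical approaches are generally preferred.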

guesslanguage.js (3) is an open-source JavaScript project that can recognize Klingon along with other languages. It relies on statistical data based on n-grams.
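
To illustrate the general idea of n-gram based identification, here is a minimal Python sketch. It is not the algorithm used by guesslanguage.js; the function names, the tiny SAMPLES texts, and the choice of cosine similarity over character trigrams are all assumptions made for this example. A real system would build its profiles from large text collections.

```python
import math
from collections import Counter

# Toy training texts (assumed for illustration; far too small for real use).
SAMPLES = {
    "Klingon": "tlhIngan Hol Dajatlh'a' nuqneH Qapla' jIyajbe'",
    "English": "do you speak the language what do you want",
}


def trigram_profile(text):
    """Count character trigrams. Case is kept because it is meaningful in Klingon."""
    padded = f" {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_language(text, samples):
    """Return the name of the sample whose trigram profile is most similar."""
    profile = trigram_profile(text)
    return max(
        samples,
        key=lambda lang: cosine_similarity(profile, trigram_profile(samples[lang])),
    )


print(guess_language("Hol vIjatlhbe'", SAMPLES))
print(guess_language("what do you speak", SAMPLES))
```

With these toy samples the first call selects "Klingon" and the second "English", because each input shares trigrams with only one of the two sample texts.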

References

1 : Language identification on Wikipedia, retrieved 21 October 2016

2 : Wikipedia:Language recognition chart, retrieved 21 October 2016

3 : Project page on GitHub, retrieved 21 October 2016

Category: General    Latest edit: 24 Jul 2017, by KlingonTeacher    Created: 24 Feb 2017 by DirkSchlSser



All text is available under the terms of a Creative Commons License.
Star Trek™, Klingon™ and related names are trademarks of CBS Corporation and Paramount Pictures, and are used under "fair use" guidelines.