Languages 101: Alphabet soup

Tuesday, 10 Jul 2012

Posted by polyglossic in Languages 101, writing

Tags

alphabets, Arabic, languages, linguistics, Native American, writing, writing systems

I may have mentioned this before, but I love writing systems. From a purely aesthetic perspective, I think writing systems are beautiful and I could stare at them all day. From an intellectual perspective, I am fascinated by how writing systems encode language, how writers choose to organize their thoughts, and how the originators of each system worked out how to represent their languages with their unique sounds.

Notice that I didn’t just say “I love alphabets,” which would be less of a mouthful to say. That’s because not all writing systems are, strictly speaking, alphabets. Writing systems come in several different types!

The first distinction to be made between writing systems is whether they are phonemic, meaning each character represents a sound or set of sounds, or logographic, meaning each character represents a word or idea. Chinese characters are logograms – they each represent a word or a meaningful part of a word. Of course, since languages are made up of many thousands of words, this means that Chinese readers and writers must learn thousands upon thousands of unique characters. But this also means that speakers of China’s many wildly different dialects can all read the same books and newspapers, even if they can’t understand each other’s speech – because the characters aren’t connected to pronunciation, speakers of Mandarin and Cantonese and Wu can all understand the idea represented no matter how different their pronunciation of that idea might be.

Phonemic systems are far more common, and come in a variety:

Alphabets – a true alphabet has characters (“letters”) for consonants and vowels. Sometimes sounds have to be represented using a combination of letters – like th or ay in English – but all of the sounds are explicitly written out in the script. The Roman alphabet that I’m typing in right now is a true alphabet; so are Cyrillic and Greek, for example.

Abjads – abjads are also called “consonant alphabets,” because they have characters for their consonants but infrequently or never mark vowels. Arabic is the abjad I’m currently waltzing with; long vowels are written out in Arabic and short vowels can be marked using diacritics, but usually these are left off in anything other than beginning Arabic textbooks. For someone used to true alphabets, these systems can be a bit like riding a bike without training wheels…does that say kataba? Kitabu? Kutba? Hebrew works this way as well.

Abugidas – these can also be called “alphasyllabaries”, and do behave like a hybrid between a true alphabet and a true syllabary. Consonants are represented, and bare consonants are assumed to be followed by a particular vowel unless another specific vowel is represented. Devanāgarī, the script used to write Sanskrit and many other Indian languages, works this way: consonants are followed by an /a/ sound unless otherwise marked, so if you see the characters r – m – y – n, you read “ramayana”. Cultures that were influenced by Sanskrit writing tend to have systems like this, even if they use a different script; Tibetan and Thai are examples.

सर्वे मानवाः स्वतन्त्राः समुत्पन्नाः वर्तन्ते अपि च, गौरवदृशा अधिकारदृशा च समानाः एव वर्तन्ते। एते सर्वे चेतना-तर्क-शक्तिभ्यां सुसम्पन्नाः सन्ति। अपि च, सर्वेऽपि बन्धुत्व-भावनया परस्परं व्यवहरन्तु।

Syllabaries – in a syllabary, each character represents a whole syllable, instead of a single sound. That means that the syllables ha, hi, and hu would each have a different character. Many orthographies developed for Native American languages are syllabaries (they happen to be well suited to their phonologies), with Cherokee being probably the most famous. The word “Cherokee” is our anglicized way of pronouncing the word tsa-la-gi; since that word is three syllables, it is three characters in the syllabary. One of the sets of characters used for Japanese is also a syllabary.
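For the programmers in the audience, here is a rough Python sketch of how three of these types look at the character level. It is only an illustration under some loud assumptions: the function names are my own, the Devanāgarī romanization table is a toy covering just the handful of characters in the example (it is not a real transliteration scheme), and the Arabic and Cherokee words are typed in as Unicode escapes.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop combining marks (category Mn), such as Arabic short-vowel diacritics."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Abjad: two fully vowelled Arabic words that share the consonants k-t-b.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"          # kutub, "books"

# Abugida: toy table (an assumption for this sketch, not a full scheme).
CONSONANTS = {"\u0930": "r", "\u092E": "m", "\u092F": "y", "\u0928": "n"}
VOWEL_SIGNS = {"\u093E": "aa", "\u093F": "i", "\u0941": "u"}
VIRAMA = "\u094D"  # mark that cancels the inherent vowel


def romanize(text: str) -> str:
    """Romanize Devanagari, adding the inherent 'a' after bare consonants."""
    out = []
    for i, ch in enumerate(text):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # The inherent vowel applies only when nothing overrides it.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        # The virama and any character outside the toy table add nothing.
    return "".join(out)


# Syllabary: tsa-la-gi is three syllables, so three Cherokee characters.
TSALAGI = "\u13E3\u13B3\u13A9"


def syllable_names(text: str) -> list[str]:
    """Return the last word of each character's Unicode name (the syllable)."""
    return [unicodedata.name(ch).split()[-1] for ch in text]


if __name__ == "__main__":
    # Without diacritics, both Arabic words collapse to the same skeleton.
    print(strip_marks(KATABA) == strip_marks(KUTUB))
    # Four bare consonants r, m, y, n read with the inherent vowel.
    print(romanize("\u0930\u092E\u092F\u0928"))
    # One character per syllable.
    print(len(TSALAGI), syllable_names(TSALAGI))
```

The abjad part shows why unvowelled text can be ambiguous: once the diacritics are gone, the two words are the same string. The abugida part only adds an "a" when no vowel sign or vowel-cancelling mark follows the consonant, and the syllabary part just counts characters.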

So that’s how writing systems are classified! Of course, as with anything else humans do, sometimes systems don’t fit into neat little categories – I realize I’ve really oversimplified Chinese, for example, but you get the gist.

Do you have a favorite script?

16 thoughts on “Languages 101: Alphabet soup”

Wayne Pearson (@Crwth) said:

July 10, 2012 at 12:37 pm

Writing systems are what drew me to linguistics as well. The old Collier’s Encyclopedia I grew up with had a chart showing the progression of the alphabet from old Etruscan to what we have today, and I was hooked.

One of my favorite scripts is probably Balinese; I discovered it when researching the Unicode Standard and found that no one had developed a Unicode font for it. Poor, neglected script! So, I developed my own, and while doing so, fell in love with the flow of the characters.

I’ve also written an app for Windows Phone which allows me to browse many of the scripts in Unicode. My five-year-old daughter, any time she sees my phone, asks if she can “play with your funny letters!”

- polyglossic said:
  
  July 10, 2012 at 2:46 pm
  
  Wayne,
  I had never looked at Balinese before…how gorgeous!! (Here is a sample for anyone else who’s curious.)
  
  I have zero knowledge of how Unicode fonts actually work. Was it hard to develop? I know that it was a big deal for the Cherokee Nation when they got one for their syllabary, and I’ve been talking with some people recently about the effects of compatibility (or lack thereof) of scripts for internet usage. In this age, it seems like not being usable on the computer could be the kiss of death for a writing system.
  
  I also don’t know much about the kinds of computer fonts that can be read by the different devices. Do Windows devices not usually read all Unicode fonts?
  
  Sorry for all the questions, you just know a lot about things that I’m clueless about but interested in!
  
  - Wayne Pearson (@Crwth) said:
    
    July 11, 2012 at 2:35 pm
    
    A’s analogy below is pretty good regarding Unicode. The issue here is that while there are various Balinese font files using the non-standard encodings A mentioned, there are no TrueType ones that follow Unicode mapping.
    
    It wasn’t hard, but only because I kinda ‘stole’ the characters from somewhere else, and wrote software to create a font for me based on those. Because of that, I’m not able to release this font to the general public (the source of the stolen characters specifically says that you may not use them for such a purpose.)
    
    Cherokee does indeed have support in Unicode, and fonts are available for rendering it on a computer. I just browsed it on my phone now! *:^)
Zen said:

July 10, 2012 at 4:41 pm

Arabic being my first language, I can write the letters pretty easily, but I always hear people saying that it looks more like squiggles than anything else, haha. The diacritics are especially tricky because many times they relate to grammar, and that’s something a lot of Arabic speakers still face difficulty with.

- polyglossic said:
  
  July 10, 2012 at 4:57 pm
  
  The script is one of the main reasons I wanted to learn Arabic! They’re beautiful squiggles 😉
  
  I wonder what learning to read Arabic as a first language is like. When you were little and learning to read, did they mark the vowels everywhere for you like they do for second-language Arabic learners? Or do they already assume you know what the words are?
  
  - Zen said:
    
    July 10, 2012 at 5:09 pm
    
    I actually don’t remember, haha. I do know that I used to read a lot when I was a kid, and most of the time I was a step ahead of my classmates.
    
    Though I have noticed with my brother (first grader) that they emphasize the vowels and diacritics a lot in their reading lessons, and have several reading exercises just for practicing correct pronunciation.
  - polyglossic said:
    
    July 10, 2012 at 5:20 pm
    
    I just thought of another question that I never thought to ask before now. Do they teach you to read in MSA or in your own dialect?
    
    Sorry for all my questions 🙂
  - Zen said:
    
    July 10, 2012 at 5:27 pm
    
    No worries! All students are taught to read in MSA. Some dialects are difficult to understand… for example, I can never for the life of me get what Tunisians and Moroccans are trying to say; so the written language always employs MSA to at least have one understandable means of communication between Arab countries.
A said:

July 10, 2012 at 11:16 pm

I think part of what drew me to Japanese was the writing system. And the more I learned about the language, the more I appreciated the total mishmash of different scripts. I can’t think of any other language where it’s totally normal to find four different ones (Kanji, hiragana, katakana, and Roman letters) in a single sentence. And I found the inflectional system — Kanji root character with hiragana endings, as in 食べます– to be especially elegant.

Also, I think it should be “abugida,” not “abudiga.” I can always remember this because the letters loosely follow the order A-B-C-D, just like “abjad.”

As for your Unicode question: there are a couple separate things going on here. Here’s an analogy that might help:

Consider a radio group that owns a certain range of frequencies. A station needs a certain piece of that range to express itself. Some stations don’t need many frequencies, but some need a lot. But even if these stations buy some frequencies, people can only hear them if they have the right radios. And since some frequencies are rare, it’s not always worth the money to make a new radio to support them.

The radio group is the Unicode Consortium, the frequencies are “code points,” the stations are different scripts, and the radios are different typefaces (also known as “fonts”). Even if a script has a reserved range of code points, it might not be supported by a Unicode typeface, especially if the script was added recently or if the typeface was created by a small company without many funds.

In the days before Unicode, there was pretty much just a bunch of ham radio. Some old sites still use pre-Unicode encodings that modern computers don’t understand. You can still use non-standard encodings for different scripts, but without a standard scheme, it’s extremely unlikely that a script will thrive in the digital world.

- polyglossic said:
  
  July 11, 2012 at 9:37 am
  
  You are absolutely right about the spelling of that word! I have been transposing those letters, both in spelling and in pronunciation, apparently forever. I guess this is a demonstration of how sometimes the brain only sees what it thinks is there and not what’s really there. 🙂 I’ve corrected it now.
  
  I didn’t know that Japanese might use different scripts in the same word; for some reason I thought they would use each one for different contexts. Does that mixing happen frequently or just in specific circumstances, like inflectional endings?
  
  - A said:
    
    July 11, 2012 at 11:21 pm
    
    I’m not sure; it’s been a while since I studied Japanese. But as far as I can recall, it’s mostly tied to inflectional endings, mainly for verbs and adjectives.
    
    Most nouns are written solely in Kanji, but some are written in just hiragana and some use a mix of both (e.g. 折り紙 “origami”). Incidentally, note that the “ori” part of “origami” is itself a modified verb, where 折 is the root sense (“fold”) and り (“ri”) is the inflectional ending.
    
    I’ve heard that katakana can be used instead of hiragana for the sake of emphasis. But usually it’s just used for onomatopoeia, loanwords, and neologisms.
Joshua Chandler said:

July 12, 2012 at 3:12 pm

Greek, Arabic and Hebrew scripts all look amazing! I don’t know if I could choose a favorite from among those… xD but I also like the Swedish and Danish/Norwegian alphabets. That Cherokee syllabary looks interesting! Is there an equivalent to lower case / upper case in Cherokee or do all the syllables have one size?

- polyglossic said:
  
  July 12, 2012 at 3:53 pm
  
  Joshua,
  All of the Cherokee characters are the same size. I think capitalizing certain letters or characters is actually not the norm among all of the writing systems of the world, at least in terms of just a tally of the systems. (I’m going to read up on that though and get back to you if I’m wrong!)
  
Joshua Chandler said:

July 12, 2012 at 3:17 pm

Oh yes, also, I like the runic scripts adopted by the old Germanic cultures :3 I’ve met a neo-pagan or two who write in runes on occasion!

- polyglossic said:
  
  July 12, 2012 at 4:18 pm
  
  Oooo runes are pretty fascinating too. I hadn’t thought about those in a while!
  
  Not far from where I grew up we have our own little runic mystery. When I was a little kid I thought it was the coolest thing in the world that Vikings had explored Oklahoma…turns out it’s probably not true. But it was a neat thought 🙂
  
Molly said:

July 15, 2012 at 8:04 pm

I, too, love me some writing systems – yet another impressive demonstration of just how boundless and ever-varying languages can be! As for a favorite… well, I’m fickle. I’m currently taken with the elegance of Devanagari, but I have no doubt that I’ll easily expand my affections to a new writing system, whenever I start working on another new language. 🙂
