Fantasdeck Intro: The Three 'S's

The First 'S': Sorting

Making sure that you're learning the right things at the right time involves three steps:
  1. Measure the difficulty of all of your content.
  2. Master the easier stuff first.
  3. Advance toward the harder stuff.
They're practically axioms of pedagogy. However, following these axioms is exactly where foreign-language curriculum developers tend to suck the most. While many deep confusions go into your average foreign-language textbook, the most obvious is the attempt to teach the main elements of written and spoken language (phonemes, morphemes, lexemes, graphemes, and tagmemes) in an order like this:

The Sucky Sorting Order:

  1. Learn most or all of the phonemes and graphemes first.
  2. Learn the remaining units in the following order: tagmeme → morpheme/lexeme.
  3. Increase the number of tagmemes.
  4. From (1~3), learn more morphemes and lexemes.
  5. From (1~4), learn the language (hopefully)!
"There's got to be a better way!"
This method basically guarantees that you'll never really master proper pronunciation or decoding while you're being drilled on increasingly difficult grammar points and vocabulary items. Few adult learners can make it through this balancing act. (Also, good luck retaining those Chinese graphemes!)

That's not to mention the basic lack of repeated exposure in any of these steps. Unless you have a perfect memory, you're better off doing nothing at all than following such a sucky learning order. At least you'll save yourself the frustration of not having learned the language "because it was too hard." Here's the secret, though: It's not you; it's them.

Publishers of foreign-language educational materials laugh all the way to the bank on your frustration. In other words, as long as you're willing to blame yourself, someone else profits by selling inadequate materials. But, really, what are these companies going to do? Actually teach you a language? How would that business model survive?

Back to the topic at hand: there is one way out of this balancing act, and that is to exercise measurable constraint. It makes sense to learn a target language by studying an artificially limited part of it, but why not set those constraints on things that can actually be calculated? There's virtually no reliable way of measuring which tagmemes are more common across all languages. With graphemes, however, it is plainly doable. And, with this shift, the rest of the learning order falls into place more naturally.
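To make "plainly doable" concrete, here is a minimal Python sketch of measuring grapheme frequency in a corpus. It assumes that one grapheme is one Unicode character and that punctuation and spaces don't count (`str.isalpha()` decides); real grapheme clusters, such as letters with combining accents, would need more careful segmentation. The function name `grapheme_ranking` and the toy corpus (the six Chinese sentences used in the Filter example of this post) are illustrative, not Fantasdeck's actual code.

```python
from collections import Counter


def grapheme_ranking(corpus: list[str]) -> list[str]:
    """Rank graphemes from most to least frequent across a corpus.

    Assumption: one grapheme = one Unicode character; punctuation and
    whitespace are skipped via str.isalpha().
    """
    counts = Counter(ch for sentence in corpus for ch in sentence if ch.isalpha())
    # most_common() sorts by count; ties keep the order of first appearance.
    return [grapheme for grapheme, _ in counts.most_common()]


# Toy corpus: the six sentences from the Filter example in this post.
corpus = [
    "貓被狗追了。",
    "那隻老鼠死了。",
    "那隻老鼠死了。",
    "貓吃了那隻老鼠。",
    "被狗追的貓吃了那隻老鼠。",
    "被狗追的貓吃的那隻老鼠死了。",
]
print(grapheme_ranking(corpus))
# ['了', '那', '隻', '老', '鼠', '貓', '被', '狗', '追', '死', '吃', '的']
```

On a real corpus of hundreds of thousands of sentences, the ties that dominate this toy example would mostly disappear.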

A Fantastic Learning Order:

  1. Learn basic units in the following order: grapheme → phoneme. 
  2. Increase the number of graphemes. 
  3. From (1~2), learn more phonemes.
  4. From (1~3), learn more morphemes and lexemes. 
  5. From (1~4), learn more tagmemes. 
  6. From (1~5), learn the language!
Even when the written language is logographic instead of phonemic (like Chinese is), the number of phonemes that you first learn is still constrained and ordered. Thus, every language that we practice in this way lets us explore combinations of sounds in words and of words in sentences. At first, those combinations are limited, but they gradually expand to cover the whole language as we introduce new ones. (The only real challenge lies in determining the order in which a language's graphemes should be introduced.)
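Once a grapheme order exists, one simple way to apply the constraint is to give each sentence a "level": the position, in that order, of the latest grapheme it needs. A sentence becomes available only after all of its graphemes have been introduced. The Python sketch below is a hypothetical illustration; the function names and the toy English order `a, t, c, s` are assumptions, not taken from Fantasdeck.

```python
def grapheme_level(sentence: str, order: list[str]) -> int:
    """Index (in `order`) of the latest-introduced grapheme the sentence needs.

    Graphemes missing from `order` count as len(order), i.e. not yet unlocked.
    Sentences with no letters at all get level -1.
    """
    rank = {g: i for i, g in enumerate(order)}
    return max(
        (rank.get(ch, len(order)) for ch in sentence.lower() if ch.isalpha()),
        default=-1,
    )


def sort_by_grapheme_level(sentences: list[str], order: list[str]) -> list[str]:
    # sorted() is stable, so sentences on the same level keep their input order.
    return sorted(sentences, key=lambda s: grapheme_level(s, order))


order = ["a", "t", "c", "s"]  # hypothetical introduction order
sentences = ["cats", "at", "a cat", "act", "tact", "cast"]
print(sort_by_grapheme_level(sentences, order))
# ['at', 'a cat', 'act', 'tact', 'cats', 'cast']
```

Every sentence on level 2 uses only "a", "t", and "c". Nothing in this scheme caps how many sentences a single level can hold, which is the problem raised in the next paragraph.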

However, a big problem can arise if we only sort graphemes and start adding data. We could end up with infinitely many entries (words and sentences) before we ever reach the next grapheme. Thankfully, the next two 'S's save our skin.

The Second 'S': Splitting

Languages can do quite a lot with very little. Even under measurable constraints, people can write whole novels without using the most common vowel, and many people speak a language convincingly despite not knowing some basic words. Corpus linguists and logicians (and I am both) know this all too well. Just by combining basic syntax (i.e., tagmemes), we can make sentences that even native speakers have trouble understanding; for instance:
  • That rat that the cat that the dog chased ate died.
Now, most people would attempt to cover parts of the sentence to make sense of the whole thing. That's a good tactic when it's your native language. But, what if it's not? What if the sentence instead read:
  • 被狗追的貓吃的那隻老鼠死了。
Unless you already know the language, you won't know what to cover where. That's where Splitting comes in:
  1. That rat that the cat that the dog chased ate died.
  2. That rat died.
    The cat that the dog chased ate that rat.
  3. That rat died.
    The cat ate that rat.
    The dog chased the cat.
  1. 被狗追的貓吃的那隻老鼠死了。
  2. 那隻老鼠死了。
    被狗追的貓吃了那隻老鼠。
  3. 那隻老鼠死了。
    貓吃了那隻老鼠。
    貓被狗追了。
Making each piece more manageable makes the larger whole easier to acquire. And, going deeper, we can see where and how the language is put together (that is, we come to understand its tagmemes).
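The post doesn't say how Splitting is carried out, so the following is only a toy model in Python. It assumes a sentence has already been parsed into clauses, where a noun phrase may carry a relative clause and `{gap}` marks the slot that the relative clause leaves empty. Each step detaches one more layer of embedding and promotes it to a standalone sentence, filling the gap with the head noun. `Clause`, `NP`, and `split` are hypothetical names, and the templates handle only simple English relative clauses like the rat example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clause:
    template: str  # e.g. "{subj} ate {gap}"; {gap} is the relativized slot
    args: dict[str, NP] = field(default_factory=dict)


@dataclass
class NP:
    head: str                  # e.g. "the cat"
    rel: Clause | None = None  # optional relative clause modifying the head


def _tidy(text: str) -> str:
    return " ".join(text.split())  # collapse spaces left by an empty {gap}


def max_depth(clause: Clause) -> int:
    """How many layers of relative clauses are nested inside `clause`."""
    return max(
        (1 + max_depth(p.rel) for p in clause.args.values() if p.rel is not None),
        default=0,
    )


def split_level(root: Clause, level: int) -> list[str]:
    """Detach every relative clause embedded `level` or fewer layers deep."""
    found: list[tuple[int, str]] = []

    def np_text(phrase: NP, depth: int) -> str:
        if phrase.rel is None:
            return phrase.head
        if depth <= level:
            # Promote the relative clause to its own sentence; the head fills its gap.
            found.append((depth, sentence(phrase.rel, depth, gap=phrase.head)))
            return phrase.head
        return f"{phrase.head} that {clause_text(phrase.rel, depth, gap='')}"

    def clause_text(clause: Clause, depth: int, gap: str) -> str:
        filled = {name: np_text(p, depth + 1) for name, p in clause.args.items()}
        return _tidy(clause.template.format(gap=gap, **filled))

    def sentence(clause: Clause, depth: int, gap: str) -> str:
        text = clause_text(clause, depth, gap)
        return text[0].upper() + text[1:] + "."

    found.append((0, sentence(root, 0, gap="")))
    # Outer clauses first, matching the order of the English example above.
    return [text for _, text in sorted(found, key=lambda item: item[0])]


def split(root: Clause) -> list[list[str]]:
    return [split_level(root, level) for level in range(max_depth(root) + 1)]


dog_chased = Clause("{subj} chased {gap}", {"subj": NP("the dog")})
cat_ate = Clause("{subj} ate {gap}", {"subj": NP("the cat", dog_chased)})
rat_died = Clause("{subj} died", {"subj": NP("that rat", cat_ate)})

for step, sentences in enumerate(split(rat_died), start=1):
    print(step, sentences)
# 1 ['That rat that the cat that the dog chased ate died.']
# 2 ['That rat died.', 'The cat that the dog chased ate that rat.']
# 3 ['That rat died.', 'The cat ate that rat.', 'The dog chased the cat.']
```

The three printed steps reproduce the English Splitting above. The hard part in practice is the parsing this sketch takes for granted, and a language like Chinese, where relative clauses precede the noun with 的, would need different templates.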

Splitting can also lead us to question parts that don't fit perfectly. For instance, "貓被狗追了" in (3) doesn't translate directly to "The dog chased the cat"; more literally, it translates to "The cat was chased by the dog." However, rather than leaving you to sort that out yourself, I fill in these details for you with transliterations that remain grammatical in the source language (English in this case).

This is especially true when parts are elliptical. Native English speakers know that these two sentences say the same thing:
  1. That rat that the cat that the dog chased ate died.
  2. That rat the cat the dog chased ate died.
English learners don't, however. Without knowing what's missing, they have little hope of knowing what we, as native speakers, insert there. I resolve this in the Splitting process with one rule: The target language's elliptical parts remain elliptical if, and only if, they are also elliptical in the source language. In all other cases, I fill in the ellipses and create a maximally natural transliteration from the result before offering a translation. This process makes longer, harder parts of the language less daunting, and it lets you learn the language's syntax without explicit grammar instruction.
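The ellipsis rule can be written down as a tiny decision procedure. The Python sketch below assumes each sentence has already been aligned into slots, each flagged for whether it is elided in the target language and in the learner's source language; producing that alignment is the hard part and isn't shown. `Slot`, `fill_ellipses`, and `rat_sentence` are hypothetical names, the source language in the example is a hypothetical one that either can or cannot drop the relativizer, and restored words appear in square brackets purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    word: str
    elided_in_target: bool = False
    elided_in_source: bool = False


def fill_ellipses(slots: list[Slot]) -> str:
    """Keep an ellipsis if, and only if, the source language also elides that slot."""
    out = []
    for slot in slots:
        if not slot.elided_in_target:
            out.append(slot.word)
        elif not slot.elided_in_source:
            out.append(f"[{slot.word}]")  # elided only in the target: restore it
        # elided in both languages: leave it out
    return " ".join(out)


def rat_sentence(source_drops_relativizer: bool) -> list[Slot]:
    # Target (English): "That rat the cat the dog chased ate died.", both "that"s elided.
    rel = Slot("that", elided_in_target=True, elided_in_source=source_drops_relativizer)
    words = ["That", "rat", rel, "the", "cat", rel, "the", "dog", "chased", "ate", "died."]
    return [w if isinstance(w, Slot) else Slot(w) for w in words]


print(fill_ellipses(rat_sentence(source_drops_relativizer=False)))
# That rat [that] the cat [that] the dog chased ate died.
print(fill_ellipses(rat_sentence(source_drops_relativizer=True)))
# That rat the cat the dog chased ate died.
```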

In other words, the point of Splitting is: I break the language down; you put it back together.

But, do you have to learn five split sentences just to learn the sixth? Maybe. It depends on how distinct they are, both apart and together. And that is where our third 'S' comes in.

The Third 'S': Saturation

There are way too many sentences in a language to even try to list every one. In fact, it's mathematically impossible to do so! So, when does this Splitting and Sorting end? When will a sentence database finally be "complete"?

To answer this question, I have produced an algorithm (called "the Filter") that measures the distinctness of a sentence against everything that comes before it in the sorted order. If the Filter determines that a sentence is distinct enough, it is worth trying to split that sentence into pieces that are (a) morphemes or lexemes, (b) idiomatic sentences, or (c) distinct sentences in themselves, and then adding those pieces. If a sentence is not distinct enough, I delete it. Adding the smaller pieces makes it harder for the larger whole to be added as well, but sometimes the whole makes it, too. Other times, the larger whole is the only part that makes it.

The Filter gives due respect to the learner. If you understand, "I like her," and, "I saw him at the park," you probably don't need to be shown, "I saw her at the park," or, "I like him." Your brain will do that for you. The Filter helps you avoid redundant learning. Besides, I run hundreds of thousands of sentences through the Filter for each language. There are plenty of unique sentences out there to find and to build. There's no need to force one in when so many others are waiting.
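The post doesn't spell out how the Filter measures distinctness, so here is a stand-in, not the actual algorithm: a Python sketch that scores each sentence by the fraction of its distinct words not yet seen in previously accepted sentences and keeps it only if that fraction meets a threshold. The word-level measure, the 0.5 threshold, and the function names are all assumptions. Run on the example above, it keeps the first two sentences and drops the two recombinations.

```python
import re


def words(sentence: str) -> set[str]:
    return set(re.findall(r"[a-z']+", sentence.lower()))


def novelty(sentence: str, seen: set[str]) -> float:
    """Fraction of the sentence's distinct words not already in `seen`."""
    units = words(sentence)
    return len(units - seen) / len(units) if units else 0.0


def filter_sentences(sorted_sentences: list[str], threshold: float = 0.5) -> list[str]:
    seen: set[str] = set()
    kept: list[str] = []
    for sentence in sorted_sentences:
        if novelty(sentence, seen) >= threshold:
            kept.append(sentence)
            seen |= words(sentence)  # only accepted sentences count as "learned"
    return kept


candidates = ["I like her.", "I saw him at the park.", "I saw her at the park.", "I like him."]
print(filter_sentences(candidates))
# ['I like her.', 'I saw him at the park.']
```

Because it ignores word order, this measure would also reject a sentence like "Her I like," which a measure built on tagmemes might keep.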

However, the main purpose of the Filter is to bring every database to Saturation. Saturation is the state in which no more sufficiently distinct sentences from a corpus of text remain to be considered for Splitting. There is an actual, mathematical definition for this, as well. But, what you need to know is this: The point of a database is to help you acquire enough of a language to not need a database. You don't want to spend your whole life using flashcards or apps, no matter how sophisticated they are. Your goal is to have no need for them because you'll be fluent and literate in the language. That's why Saturation is so important. It mathematically decides a legitimate end to your study.
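The post mentions a mathematical definition of Saturation without giving it. One simple way to operationalize the idea, assumed here rather than taken from Fantasdeck, is to stream a corpus through a filter and declare the database saturated once a fixed window of consecutive candidates yields nothing new enough to keep. This Python sketch uses a simple word-novelty test (the share of a sentence's distinct words not yet seen) as a stand-in for the Filter; `window`, `threshold`, and `run_until_saturated` are illustrative.

```python
import re
from collections.abc import Iterable


def words(sentence: str) -> set[str]:
    return set(re.findall(r"[a-z']+", sentence.lower()))


def run_until_saturated(
    candidates: Iterable[str], threshold: float = 0.5, window: int = 1000
) -> tuple[list[str], bool]:
    """Return (kept sentences, whether saturation was reached)."""
    seen: set[str] = set()
    kept: list[str] = []
    misses = 0  # consecutive candidates rejected since the last acceptance
    for sentence in candidates:
        units = words(sentence)
        if units and len(units - seen) / len(units) >= threshold:
            kept.append(sentence)
            seen |= units
            misses = 0
        else:
            misses += 1
            if misses >= window:
                return kept, True  # nothing new in `window` candidates: saturated
    return kept, False  # corpus ran out first: not (yet) saturated


corpus = [
    "I like her.",
    "I saw him at the park.",
    "I saw her at the park.",
    "I like him.",
    "The dog chased the cat.",
]
print(run_until_saturated(corpus, window=2))
# (['I like her.', 'I saw him at the park.'], True)
print(run_until_saturated(corpus, window=3))
# (['I like her.', 'I saw him at the park.', 'The dog chased the cat.'], False)
```

The window size is the design choice that matters here: with `window=2`, the toy run declares saturation before it ever reaches "The dog chased the cat.", while `window=3` finds it. A real corpus would need a much larger window, and the default of 1000 is arbitrary.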

Taking only the six sentences from the Splitting example, we can see the Filter in action:

Sorted Only:

  • 貓被狗追了。
  • 那隻老鼠死了。
  • 那隻老鼠死了。
  • 貓吃了那隻老鼠。
  • 被狗追的貓吃了那隻老鼠。
  • 被狗追的貓吃的那隻老鼠死了。

Sorted and Filtered:

  • 貓被狗追了。 
  • 那隻老鼠死了。 
  • 被狗追的貓吃的那隻老鼠死了。
As we can see, only half of the six sentences got through the Filter. In a much larger database, however, many novel sentences can provide the tagmemes that were filtered out here. And that brings us back to the beauty of measurable constraint: You acquire a great mass of language by patiently growing the pool of its parts. Learners need only be honest in their answers and be willing to practice routinely. Practice with sorted, split, and saturated data does the rest.
