My current outlook on LLMs is that they are some combination of bullshit to fool people who are looking to be fooled, and a modest but potentially very important improvement in the capacity to search large corpuses of text in response to uncontroversial natural-language queries and automatically summarize the results. Beyond this, I think they’re massively overhyped. The most aggressive hype is that they are an AGI development project - in other words, that they’re close to being conscious, generative minds on the same order as ours, which can do as wide a range of tasks as a human. This is clearly false. The more moderate hype is that they can do meaningful generative work within the domain where they were trained: written language content (which can of course be converted to and from audio language content pretty well). For instance, they might in some limited sense be able to internally represent the content of the language they're indexing and reproducing. This would necessarily entail the capacity for "regular expressions for natural language." I believe that even this much more limited characterization is false, but I am less confident in this case, and there are capacities they could demonstrate that would change my mind. Language learning software seems like a good example. It seems to me that if LLMs contain anything remotely like the capacity of regular expressions for natural language that take into account the semantic values of words, they should make it relatively easy to create a language learning app that is strictly better than the best existing automated resources for smartphone users trying to learn the basics of a new-to-them language.
The consensus recommendations for a way to learn the very basics of a spoken language with relatively low time investment - filling the gap that another audiobook or podcast might fill - seem to be the Pimsleur or Paul Noble audio courses, both of which I've tried. They satisfy the following desiderata:
Not a phrasebook: New words and grammatical forms are introduced and explained in a logical series, so that later learning builds on earlier learning, and each incremental package of information is as small as possible.
No nonsense: Words are combined into sentences that make sense, and sentences are eventually combined in ways that are contextually appropriate. For example, the user should never be asked to form the sentence “the elephant is taking a shower,” except in specific contexts that make that sentence an exceptionally likely one. (Duolingo fails this criterion.)
Reuse: Already-learned words are repeated in new contexts and combinations (flashcards fail this criterion), which helps with:
Spaced repetition: At first, a new word is used several times in a relatively short interval. Then it’s occasionally brought up again, often enough to make it easy to retain material at minimal review cost. (A minimal scheduling sketch follows this list.)
Prioritization: Common and simple words come first, along with the words a user is most likely to need even as a very basic speaker (e.g. times of day, and the words a tourist needs for meals and hotels).
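To make the spaced-repetition criterion concrete, here is a minimal sketch of the kind of interval scheduling I have in mind, loosely in the spirit of the SM-2 family of flashcard algorithms; the intervals and multipliers are illustrative assumptions of mine, not anything the Pimsleur or Paul Noble courses publish:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Item:
    """A single word or grammatical form being tracked."""
    name: str
    interval_days: float = 0.25          # first review ~6 hours after introduction
    due: datetime = field(default_factory=datetime.now)

def review(item: Item, correct: bool, now: datetime) -> None:
    """Grow the gap after a success, shrink it after a failure.
    The 2.0 and 0.3 factors are illustrative guesses, not any course's numbers."""
    if correct:
        item.interval_days *= 2.0
    else:
        item.interval_days = max(0.25, item.interval_days * 0.3)
    item.due = now + timedelta(days=item.interval_days)

def due_items(items: list[Item], now: datetime, limit: int) -> list[Item]:
    """Most-overdue items first; any remaining session slots go to new material."""
    overdue = sorted((i for i in items if i.due <= now), key=lambda i: i.due)
    return overdue[:limit]
```

One pleasant property of scheduling by due date is that a long break automatically tilts the next session toward review: overdue items pile up and crowd out new material, which is roughly the adjustment the interruption case below calls for.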
The main limit of the Pimsleur and Paul Noble courses is that they are static: they can’t adapt to the learner’s particular needs or conditions. Making an app interactive increases its complexity, and thus the difficulty of producing it at a given level of quality. The developers of most popular interactive language apps have responded to this problem by reducing the complexity of the material presented to the user, so their apps frequently fail to satisfy even all of the above criteria. My friend Micah and his cofounder Ofir created a program, LanguageZen, that satisfies the above desiderata, and additionally uses automation to generate new material with these further virtues:
Automatic adaptive prioritization: The program evaluates the learner’s responses, identifies which specific words or grammatical concepts they’re having trouble with, and prioritizes these for more frequent review (sketched in code after this list).
Specialized content libraries: They built a variety of libraries of topic-specific material that the user can select from depending on their needs and interests (e.g. ordering in restaurants, business language, etc.), which are then integrated with what the user has already learned.
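The bookkeeping side of adaptive prioritization is simple; the hard linguistic work is attributing each error to the right word or concept. A sketch of the easy half, assuming that attribution is handled elsewhere (the class and threshold values are mine, not LanguageZen’s):

```python
from collections import defaultdict

class Prioritizer:
    """Track per-concept error rates and surface the shakiest concepts first."""
    def __init__(self) -> None:
        self.attempts: dict[str, int] = defaultdict(int)
        self.errors: dict[str, int] = defaultdict(int)

    def record(self, concept: str, correct: bool) -> None:
        self.attempts[concept] += 1
        if not correct:
            self.errors[concept] += 1

    def trouble_spots(self, min_attempts: int = 3, top: int = 5) -> list[str]:
        """Concepts with the worst error rates, given enough data to judge."""
        rated = [
            (self.errors[c] / self.attempts[c], c)
            for c in self.attempts
            if self.attempts[c] >= min_attempts
        ]
        return [c for rate, c in sorted(rated, reverse=True)[:top] if rate > 0]
```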
LanguageZen was initially developed on a scrappy startup budget, and the team built two excellent products: Spanish for English speakers, and English for Portuguese speakers. But their development effort necessarily involved the up-front capital cost of hiring skilled linguists to shape the material, and because not everyone wants to learn the same language, two language offerings were simply not enough to take off virally: friends could only effectively recommend LanguageZen to friends who wanted to learn one of those same languages. (By contrast, someone who likes Duolingo for German can recommend it to a friend who wants to learn French or Hebrew or Chinese, not just a friend who wants to learn German.) So while the product was good enough to attract and retain a significant user base, the project won't take off until and unless investors step up to help them over that hurdle.
But if LLMs can meaningfully and usefully generate new structured language material, they should make it much easier not only to extend the capacities of LanguageZen into new languages and expand its static content libraries, but to implement the following improvements:
Adapting spaced repetition to interruptions in usage: Even without parsing the user’s responses (an approach that stays robust in difficult audio conditions), if the user rewinds or pauses on some answers, the app should be able to infer that the user is having some difficulty with the relevant material, and dynamically generate new content that repeats those words or grammatical forms sooner than the default. Likewise, if the user takes a break for a few days, weeks, or months, the ratio of old to new material should automatically adjust accordingly, since forgetting is more likely, especially of relatively new material. (And of course with speech-to-text, an interactive app that interpreted the user’s responses could and should be able to replicate LanguageZen’s ability to specifically identify (and explain) which part of a user’s response was incorrect, and why, and use this information to adjust the schedule on which material is reviewed or introduced.)
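A sketch of how those two signals might feed the scheduler, assuming the audio player exposes rewind and pause events and the app records when the user last studied (both assumptions of mine, and the specific factors are illustrative):

```python
from datetime import datetime, timedelta

def on_playback_signal(item, signal: str, now: datetime) -> None:
    """Treat a rewind or a long pause before answering as implicit difficulty,
    and pull the item's next review closer (item as in the earlier Item sketch)."""
    if signal in ("rewind", "long_pause"):
        item.interval_days = max(0.25, item.interval_days * 0.5)
        item.due = now + timedelta(days=item.interval_days)

def review_fraction(last_session: datetime, now: datetime) -> float:
    """Share of the next session devoted to old material, rising with the
    length of the break; the 0.3 floor, 0.05 slope, and 0.9 cap are guesses."""
    days_away = (now - last_session).days
    return min(0.9, 0.3 + 0.05 * days_away)
```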
Automatic customization of content through passive listening: I should be able to switch the app into “listen” mode during a conversation with speakers of a foreign language. For instance, I study Tai Chi with some Chinese speakers, few of whom speak much English. So my teacher has limited ability to instruct me verbally, and I can’t follow much of the conversation when I break for lunch. In “listen” mode, the app should be able to identify words and concepts that come up frequently in such conversations, along with related words and concepts, and generate new material that introduces these, with timing and context that satisfies all the above criteria, without retaining a transcript or recording of those conversations (to satisfy privacy concerns).
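The privacy constraint has a clean software expression: only aggregate counts ever get stored. A sketch, assuming an on-device speech recognizer that emits recognized words one at a time (the recognizer itself is far beyond this snippet, and the function names are mine):

```python
from collections import Counter
from typing import Iterable

def listen(word_stream: Iterable[str]) -> Counter:
    """Fold a live stream of recognized words into bare frequency counts.
    No transcript, word order, or audio is kept, so the stored counts
    cannot be used to reconstruct the conversation."""
    counts: Counter = Counter()
    for word in word_stream:
        counts[word] += 1   # each word is discarded as soon as it is counted
    return counts

def lesson_seeds(counts: Counter, known_words: set[str], top: int = 20) -> list[str]:
    """Frequent words the learner hasn't studied yet, as seeds for new material."""
    return [w for w, _ in counts.most_common() if w not in known_words][:top]
```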
Specifically, a rules-based system tracking the above considerations could detect the need to insert additional content into the sequence, and instruct an LLM to generate that content within well-specified parameters. For instance, it might give the LLM a prompt equivalent to "generate twenty sentences, limited to [range of grammatical forms] and [list of already-learned vocabulary], with at least one word from [list of prioritized words] in each sentence." Then it could implement some mixture of asking the user to form those sentences in the target language and asking the user to translate those sentences from the target language. More complex requests, like constructing short conversations, may also be feasible.
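In this architecture the rules-based controller owns all the constraints and the LLM only fills them in. A minimal sketch of such a prompt builder, plus a validation pass, since LLMs do not reliably honor vocabulary constraints and the controller would need to check and discard violations (the function and slot names here are mine):

```python
def build_prompt(grammar_forms: list[str],
                 learned_vocab: list[str],
                 prioritized: list[str],
                 n_sentences: int = 20) -> str:
    """Assemble the constrained generation request described above.
    The controller, not the LLM, decides what goes in each list."""
    return (
        f"Generate {n_sentences} sentences.\n"
        f"Use only these grammatical forms: {', '.join(grammar_forms)}.\n"
        f"Use only words from this vocabulary: {', '.join(learned_vocab)}.\n"
        f"Each sentence must contain at least one of: {', '.join(prioritized)}."
    )

def validate(sentences: list[str], learned_vocab: set[str]) -> list[str]:
    """Discard any sentence that uses a word outside the learned vocabulary;
    a real system would also need to check the grammar-form constraint."""
    kept = []
    for s in sentences:
        words = {w.strip('.,!?¿¡').lower() for w in s.split()}
        if words <= learned_vocab:
            kept.append(s)
    return kept
```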
My impression is that current AI technology is simply not good enough to implement a high-quality version of this product, between two commonly spoken languages with large text corpuses, without a huge time investment from experts carefully shaping and vetting its material, effectively curating static topic libraries within which the automation could at best make minor or highly supervised, human-in-the-loop variations. Someone might be able to make a lot of money changing my mind.