Note: This article was updated in July 2024 to reflect that I’ve adopted LLMs to handle transcription errors much more efficiently.
Journaling - by which I mean keeping a personal diary - didn’t come naturally to me. I rarely miss a day now and have found it useful both in the moment and as a resource to return to. The key has been to reduce friction and to go all-in on voice.
How It Started
In late 2020, during “the pandemic”, I created a new Google Doc and typed:
2020-12-06 9:30pm
First session, after the initial measurement a week ago. I think mostly rested except maybe close to the hip; definitely tired so not in best place for training. 6kg/side for Tailor, 6kg for the rest. Horse Stance hurts at the *top*? Had about an 80cm heel-to-heel width, getting thighs close-ish to parallel to the ground. [...]
A few times per week, I would add similar entries, with varying degrees of sparsity, on my workouts. Eventually this stopped and picked up again a few months later, in February 2023, stating,
Haven’t been using this for a long time, going to use it as a personal log for everything now. I’ve been doing that for work for well over a year now and works great.
That’s right, I got into journaling through work, where it’s evidently useful (took me long enough). As promised in that entry, the habit did stick in private life too, so I can forever date treasured memories such as this one:
Ugh sliced my left index finger w bread knife
or the below, variations of which are sadly a commonplace occurrence,
Just really crap eating basically, finished the day with chocolate followed by Magnum followed by chocolate again
But I remember that it was difficult, for a long time, to maintain the habit. It felt like a chore, one in which I was relying on willpower as a strategy. The need to type these entries out - often in bed on the phone after the kids were in bed and I was more than ready for the same - discouraged verbosity and effectively rendered many entries impersonal and terse.
That was fine at work, but if one of my kids did something new and cute one day or I’d had a big fight, the diary would neither serve as a prompt to reflect on this nor would it leave much of an account of what actually transpired. My personal journal was useful, but left a lot to be desired.
Things got noticeably better when voice typing on Android improved (and I realized that disabling multi-language would significantly improve transcription accuracy and latency). I could dictate my diary while reading along on the screen and correcting any errors. This made entries significantly more verbose and their content more conversational, allowing me to more easily reflect on the happenings of the day, at least on some days.
But it still wasn’t good enough — too much friction, and lots of transcription errors slipping through, requiring eyes on the screen and yet resulting in entries that are tricky to decipher even today. When I took an extended leave from work this past year, I decided to revisit once again and looked into third-party transcription services.
Not wanting to host any infrastructure myself, I found audionotes.ai, whose free tier was enough to experiment with. I’d transcribe on the app, and would copy over into the doc. The output was a lot better - speaking clearly is required but transcription errors occur mostly for names. Also, transcription not occurring in real time surprisingly ended up being better: I would pay less attention to the phone while journaling.
At some point, they added a WhatsApp bot, which is such a little change from opening the app, but it significantly reduced (perceived) friction for me.
This brings us to…
How It’s Going
Every night, or rarely multiple times during a day, I’ll excuse myself for a few minutes and find a relatively quiet, private area where I record a WhatsApp voice message to the audionotes bot. As of recent updates, voice messages can be paused while recording, which means I can record these while walking outside, which I often do, despite occasional loud cars or passersby. I have an audionotes lifetime membership, to avoid limitations on payload sizes.
I speak for usually five to ten minutes, during which I try to reflect on the day on top of chronicling it. The transcription comes back usually within a minute of submitting the message.
Transcriptions are not perfect, and in particular names or foreign words are often mangled. After all, how would the model know that my daughter’s name is spelled “Ajuna” but pronounced Ah-yoo-nah? I used to correct this manually, which was a drag and especially punishing once a recording went past the five-minute mark; you’d correct the same things over and over.
Luckily, LLMs with longer context windows came along (starting with Claude 3 Opus). Now I have a long thread with Claude (at the time of writing, 3.5 Sonnet) into which I paste the transcription (following the prompt “here’s the next entry:”), resulting in an AI-edited artifact. I glance over that one and if something looks off, I’ll send a follow-up message (“It’s Kathrin not Catherine”) and the model will reproduce the fixed entry. Each kind of mistake only has to be corrected once: I use a single conversation [1] for all journal entries, so corrections made in one are remembered for all future entries (until the thread one day needs refreshing, once the corrections fall out of the context window). Once I’m satisfied, I copy-paste the result at the top of my Google Doc, under a dated heading.
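The "correct once, remembered afterwards" idea can also be sketched without an LLM. The snippet below is not what the Claude thread does internally; it is a deterministic stand-in that keeps a small dictionary of known mis-transcriptions and applies it to every new entry. The function name `apply_corrections` and the dictionary are hypothetical, and the only correction shown is the Kathrin/Catherine one mentioned above.

```python
import re


def apply_corrections(transcript: str, corrections: dict[str, str]) -> str:
    """Replace each known mis-transcription with its corrected spelling."""
    fixed = transcript
    for wrong, right in corrections.items():
        # \b restricts the match to whole words; re.escape guards against
        # regex metacharacters in the misheard spelling.
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", flags=re.IGNORECASE)
        # A function replacement inserts `right` literally, so backslashes
        # in a corrected spelling are not treated as group references.
        fixed = pattern.sub(lambda _match, right=right: right, fixed)
    return fixed


# Hypothetical "memory" of corrections; it grows each time a new error is spotted.
corrections = {"Catherine": "Kathrin"}

entry = apply_corrections("Had lunch with Catherine today.", corrections)
print(entry)
```

Unlike the chat thread, this only fixes spellings it has already been told about and does no other editing, but the dictionary never falls out of a context window.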
Occasionally, I’ll still type shorter entries directly on the phone or the laptop, but the above has become the default.
And that’s it! Journaling has turned from a chore that I did because I knew it was worth it to a part of my evening wind-down, and the fidelity has improved significantly in the process.
As for using the diary - its primary use for me is in reflection, but I also use it to look stuff up. For example, if on a recent phone call you told me that your significant other was mad at their boss the other day and might quit, I probably jotted it down here, because this stuff matters once we speak again and I am otherwise poor at remembering it. The large context windows mean you can get passable answers by directly copy-pasting the entire thing into the prompt and asking questions. But more often than not, I’ll just search for things directly in Google Docs.
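For anyone who would rather search outside Google Docs, here is a minimal sketch of a keyword lookup. It assumes the journal has been exported as plain text and that every entry starts with a heading line shaped like the "2020-12-06 9:30pm" example earlier (date, optionally followed by a time). The names `split_entries` and `search` are made up for illustration.

```python
import re

# Heading line: an ISO date, optionally followed by a time such as "9:30pm".
HEADING = re.compile(
    r"^(\d{4}-\d{2}-\d{2})(?:\s+\d{1,2}:\d{2}\s*(?:am|pm))?$",
    re.IGNORECASE,
)


def split_entries(text: str) -> list[tuple[str, str]]:
    """Split an exported journal into (date, body) pairs."""
    entries: list[tuple[str, str]] = []
    date, lines = None, []
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            # A new heading closes the previous entry, if there was one.
            if date is not None:
                entries.append((date, "\n".join(lines).strip()))
            date, lines = match.group(1), []
        elif date is not None:
            lines.append(line)
    if date is not None:
        entries.append((date, "\n".join(lines).strip()))
    return entries


def search(text: str, keyword: str) -> list[tuple[str, str]]:
    """Return the entries whose body contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [(d, body) for d, body in split_entries(text) if needle in body.casefold()]
```

Calling `search(exported_text, "boss")` would return the dated entries that mention the word, in the order they appear in the document.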
I assume “diary-like” functionality will be a part of future generations of mobile phone assistants (assuming “pendants” don’t take that market, which I assume they won’t). I’ve noticed that the current-gen Pixel phones already transcribe conversations (even with multiple speakers), and do it very well.
If you found this post useful or think there’s something I should try, I would be happy to hear about it.
[1] If you use a “custom GPT” or the like, I don’t know how to make the error corrections persist. I don’t want to have to edit the GPT every time I spot a transcription error, as this is a more cumbersome process than just replying inline.