How we built Memory at Mindsera
A behind-the-scenes look at how we designed a memory system for an AI journaling app.
Why memory
One of the most common pieces of feedback from Mindsera users in the past 12 months: their journal keeps forgetting important details. Their partner’s name, the goals they’ve set, the issues they’re facing.
We’ve had basic vector search (RAG) since the early days. It finds relevant entries by similarity and pulls their content into the prompt. Still useful, but there’s a lot of luck involved: with a hundred entries, the chance that RAG surfaces every important detail is low.
We also considered adding a simple memory textbox that users could fill out themselves. But memory is one of those features that should work entirely in the background, just like the brain does. You don’t want to manually pick what to remember and what to forget.
So since early this year, we’ve been trying to crack real memory — a system that remembers what matters and forgets the noise.
What makes memory good
A well-functioning memory system comes down to two things:
- It must be accurate
- It must know when it’s needed
For general AI apps like ChatGPT and Claude, memory is a double-edged sword: useful in some cases, but in others it over-indexes its answers on stored context. Earlier this year we were building Call Mode for Mindsera (a voice journaling feature with an AI agent), and for a month almost every AI question I asked Claude got tied back to Call Mode, even when it had nothing to do with it. Super annoying and counterproductive.
Memory often turns AI chatbots into a self-boosting loop, similar to social media these days. You only see a tiny slice of content hyper-relevant to you, and you lose sight of the bigger picture. When you ask about something, the response only considers your angle and you miss alternatives. I often find myself going to another AI provider that hasn’t built memory about me to get an unbiased opinion.
For Mindsera, we think about memory differently. Its main goal is to guide you, improve the quality of questions, and find unnoticed patterns between your current and past writing. We often write about problems without realizing we had similar issues a year ago. The goal is to help users see those patterns and act on them faster.
Learning from existing solutions
Before building anything, we did thorough testing on existing memory providers.
OpenClaw is an interesting example with its markdown-based memory system. It stores a single curated file (memory.md) for long-term facts plus daily append-only memory logs. Simple, and it works nicely for chat-based systems, but it only focuses on what’s currently relevant. There’s almost no logic for actively evolving memory.
Then there are the SaaS players: Supermemory, Mem0, and similar tools. They promise advanced user profiles, memory graphs, and retrieval that gets smarter with every interaction. After testing them with my own entries, it became clear they’re also built for regular chatbots. They stored basic info about me, but it felt sterile and boring.
Building our own memory
Taking the learnings from these systems, we decided our memory would have two building blocks: facts and categories.
Facts
For every entry, we generate hard facts that serve as the truth for that point in time. Simple, short sentences like “User went to university in Netherlands” that can be clearly deduced from the entry. Every entry usually produces 5–10 facts, so you end up with hundreds of small details over time.
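As a rough sketch of the data shape (the names and parsing are our illustration, not Mindsera’s actual code), each fact can be stored as a dated record; the LLM extraction itself returns a bullet list that just needs parsing:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Fact:
    text: str         # one short, verifiable sentence
    entry_date: date  # when the source journal entry was written

def parse_facts(llm_output: str, entry_date: date) -> list[Fact]:
    """Turn the model's bullet list of facts into dated records.

    The extraction is an LLM call (prompt omitted here); this only
    handles the plumbing around its output.
    """
    facts = []
    for line in llm_output.splitlines():
        line = line.strip().lstrip("-").strip()
        if line:
            facts.append(Fact(text=line, entry_date=entry_date))
    return facts
```

Dating each fact matters later: it is what lets the cleanup step prefer newer information when old and new facts conflict.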
Categories
Once facts are generated, we group them into categories. Categories are blocks of text assembled from the facts. We settled on eight static ones: about me, preferences, people, work & career, goals & aspirations, health & habits, beliefs, and patterns.
Hardcoding them into groups helps us later pick only the relevant groups for a given question. To generate the categories, we considered two approaches:
- Chunk the facts into smaller time periods (e.g. monthly), generate a result for each period, and then have a final prompt merge the per-period results.
- Process all the facts in a single prompt
Oddly enough, option 2 produced much better results. Even though many facts are outdated, the model benefited from seeing all the context at once and could figure out what’s still relevant vs what isn’t. A year ago this would have epically failed, but with today’s thinking models, it’s near perfect.
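Option 2 amounts to concatenating the full, chronologically ordered fact history into a single prompt per category and letting a reasoning model resolve recency conflicts itself. A hedged sketch (the prompt wording and function are ours):

```python
def build_category_prompt(category: str, facts: list[tuple[str, str]]) -> str:
    """Assemble one regeneration prompt from ALL facts, oldest first.

    `facts` are (ISO date, text) pairs; dating each fact is what lets
    the model favor newer information when facts conflict.
    """
    ordered = sorted(facts)  # ISO date strings sort chronologically
    history = "\n".join(f"[{d}] {text}" for d, text in ordered)
    return (
        "Below is every fact we know about the user, oldest first.\n"
        f"Write the '{category}' section of their memory profile.\n"
        "Prefer recent facts when older ones conflict, and drop "
        "anything clearly outdated.\n\n" + history
    )
```

The single-prompt approach leans on long-context reasoning models; the chunked alternative would trade that context for smaller, cheaper calls.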
Evolving memory
With categories in place, the core memory is done. But in a real-world AI tool, a constantly evolving memory is what really matters. So the question becomes: how do you append new facts and grow memory naturally?
We use two events: daily evolve and monthly cleanup.
Every night, a cron job takes each user’s new entries and creates facts from them. Using those facts, we evolve the existing categories, which basically means merging the old categories with the new facts. This works for a while, but there’s a problem: LLMs are great at merging, but they don’t know what’s no longer relevant. We can’t just prompt them to “remove things that aren’t relevant anymore” without giving them context about what those things are.
To fix this, we run a monthly cleanup. We take all the existing facts and regenerate each category from scratch. This keeps the categories relevant and continuously filters out the noise.
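The two jobs differ only in their inputs: the nightly job merges yesterday’s facts into the existing category text, while the monthly job rebuilds every category from the full fact history. A sketch with the LLM calls stubbed out as plain functions (all names are our illustration):

```python
CATEGORIES = ["about_me", "preferences", "people", "work_career",
              "goals_aspirations", "health_habits", "beliefs", "patterns"]

def nightly_evolve(memory, new_facts, merge):
    """Daily cron: fold last night's facts into each existing category.

    `merge` stands in for an LLM call that rewrites one category's
    text in light of the new facts.
    """
    return {name: merge(text, new_facts) for name, text in memory.items()}

def monthly_cleanup(all_facts, rebuild):
    """Monthly cron: regenerate every category from scratch, so the
    model sees full context and can drop what is no longer relevant.
    """
    return {name: rebuild(name, all_facts) for name in CATEGORIES}
```

The split is the key design choice: cheap incremental merges most days, and one expensive full rebuild per month to garbage-collect stale facts.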
Using memory
I’ve been journaling for the past 6 years, and reading my memories for the first time was magical: a perfect summary of my life, the people close to me, and how far I’ve come.
But while fun to read, the real challenge was including memory in our AI responses.
Including memory in AI responses
We can’t pass the entire memory into every prompt for two reasons:
- It over-indexes responses on irrelevant details
- It makes prompts slower and more expensive
To fix this, we use two methods to pick relevant categories before every prompt call:
- Similarity-based: we calculate vector similarity between each memory category and the user’s text. If a category’s score crosses a threshold, we include it in the prompt.
- LLM-based: an LLM decides which categories are relevant.
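The similarity route can be as simple as cosine similarity between an embedding of the user’s text and a precomputed embedding per category, with a cutoff. A sketch (the 0.35 threshold is an illustrative default, not Mindsera’s actual value):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_categories(query_vec, category_vecs, threshold=0.35):
    """Return categories whose embedding clears the threshold.

    `category_vecs` maps category name -> precomputed embedding of
    that category's text; the threshold would be tuned per feature.
    """
    return [name for name, vec in category_vecs.items()
            if cosine(query_vec, vec) >= threshold]
```

The LLM-based route trades this cheap vector math for a small classification call, which is slower but handles cases where relevance isn’t literal (e.g. a question about sleep pulling in “work & career” because of deadline stress).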
This gives us precise control over how often and what parts of memory show up in which features.
User control
Whichever AI tool you use, it spells out that “AI can make mistakes”, and rightly so. For that reason, we added a feature where you can manually add, edit, or delete individual memories, or wipe the whole shebang.
Also worth noting is that your memories are visible only to you in Mindsera.
In my own memory, about 98% was spot-on but a few smaller things had to be adjusted. For example, it used my bench press PR from 2024 (when I properly started going to the gym) as the current record. Since that was the only data point it had about my gym progress, it treated it as the source of truth. The more you journal, the more accurate memory becomes.
Reflections
The fun thing about building memory: there’s no right or wrong. Another AI app could build it entirely differently (the current hype is around graph-based memory) and it might work just as well. Your specific domain and your customers should define the exact memory architecture.
We launched memory for all Mindsera users in early April. We’re still measuring results, but I’d say the feeling is significantly more magical, and the concept of an “AI journal” really does go hand in hand with a well-functioning memory system.