A couple of months ago, our team finished our own in-house handwriting recognition model, after more than three years of research and work. Features like handwritten search and converting handwriting to text are now powered fully by GoodNotes’ own technology.
We sat down for a chat with Angus, one of the machine learning engineers who made this happen, to learn more about the work that went into shipping this technology.
Hi Angus! Thanks for having a chat with us. First things first, how would you explain machine learning to a five-year-old?
Machine learning is the process of teaching computers to perform tasks that would usually only be possible for humans. We achieve this by feeding the computer examples - lots and lots of examples - kind of like how humans learn stuff!
Consider teaching a computer to label objects in photos. We would give it millions of photos containing cars, dogs, doctors, apricots, and so on. After learning patterns that distinguish them, the computer gains the ability to “guess” which of them appears in new photos.
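To make that idea concrete, here is a toy sketch of “learning from examples”: a nearest-centroid classifier that averages each label’s examples and guesses the closest match for a new input. The features and labels are made up for illustration; a real image classifier would learn from pixels, not two hand-picked numbers.

```python
# Toy "learning from examples": a nearest-centroid classifier.
# Each example is a short feature vector; in a real system these
# would be pixels or learned image features.

def train(examples):
    """examples: list of (features, label). Returns label -> centroid."""
    sums, counts = {}, {}
    for feats, label in examples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}

def predict(model, feats):
    """Guess the label whose centroid is closest to feats."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, feats))
    return min(model, key=lambda lbl: dist(model[lbl]))

# Made-up examples: (roundness, has_wheels) -> label
data = [([0.9, 0.0], "apricot"), ([0.8, 0.1], "apricot"),
        ([0.2, 1.0], "car"), ([0.3, 0.9], "car")]
model = train(data)
print(predict(model, [0.85, 0.05]))  # a new, unseen "photo"
```

The more (and more varied) examples you feed `train`, the better the “guesses” get - which is the whole trick.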
While that’s pretty neat, scientists have taken this idea further and have come a really long way in teaching machines to recognize patterns and make predictions based on data. Just 7 years ago, a computer beat a world champion at Go, a game once thought impossible for a machine to master, partly by playing against itself repeatedly, simulating millions of games. And today, we're teaching machines to create beautiful movies all by themselves - no humans needed!
P.S. And just for fun, did you know that one of the famously tricky tests for image classifiers is distinguishing between pictures of chihuahuas and muffins?
What does an average day for a machine learning engineer look like at GoodNotes?
How an engineer at GoodNotes spends their time is unique to their project and their working style. I currently own two projects that span multiple teams, so 60% of my time is spent in meetings, coordinating plans across team members, reviewing proposals and code, or giving feedback. By contrast, an engineer who is focused on a single module within a project spends more time writing code and technical plans.
That being said, here’s what an average day might look like in the company right now. In the morning, you plan your tasks for the day, check in on Slack for company- or team-wide updates, do some code reviews (or improve your own pull request based on the reviews received), or read a new ML paper. Time permitting, you get started with your main task of the day, be it whipping out a technical proposal or starting to investigate a new model.
Over a long lunch and coffee, you unwind a bit, but often end up organically discussing the project, or a newly released large language model, with your peers. After lunch, you shift gears to the meat of your project, churning out code and iteratively refining it. This often involves training an ML model, which may take up to 2 days, so we try to do the best preparation work we can (e.g., optimizing hyperparameters, setting up the right logging and configuration).
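That preparation step can be as simple as pinning down every hyperparameter, logging it, and fixing the random seed before kicking off a long run; a minimal sketch (the parameter names and values here are illustrative, not the ones we actually use, and in practice they would come from a config file or an experiment tracker like ClearML):

```python
import json
import logging
import random

# Illustrative hyperparameters for a multi-day training run.
config = {"learning_rate": 3e-4, "batch_size": 64, "epochs": 20, "seed": 42}

# Record the exact config at the start, so a run that finishes two
# days later can still be traced back to what produced it.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logging.info("starting run with config: %s", json.dumps(config, sort_keys=True))

random.seed(config["seed"])  # fix randomness so the run is reproducible
```

It looks trivial, but when a single experiment costs two days, not being able to reproduce it is the expensive mistake.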
If you’re in Hong Kong, you then usually reach the time for cross-functional meetings, reflecting on the progress and planning targets for the rest of the sprint. (If you’re in Europe, this happens in the morning.) After the meetings, you continue to sprint toward the same goal you’ve set for a couple more hours: investigating another model architecture, having an ad-hoc chat with a colleague about memory optimization, or opening a pull request for your work. ML engineers often can’t quite take their minds off the models they are training, so we periodically pull up ClearML or TensorBoard to monitor the training progress even after getting home.
Tell us more about how you and the team shipped the handwriting recognition feature.
The ML team at GoodNotes is proud to have shipped our in-house handwriting recognition engine at the end of 2022, which now supports 12 languages and powers the experience of millions of GoodNotes users. This was a highly complex, truly full-stack, three-year project that occupied a full 10-person ML team. We needed help from external contractors to collect training data and continually assess the quality of our models. In-house, we staffed a Data Operations specialist to supervise the data verification process, 2 researchers to investigate data- and model-side techniques to improve the core algorithm, at least 4 full-stack ML engineers, and an additional iOS engineer responsible for deploying the models to our production environment and monitoring their performance. As with most other product launches, our in-house handwriting recognition engine was rolled out in phases, from private beta (a limited subset of users to collect feedback), to controlled experimentation with 10% of users, to eventually powering 100% of users.
We value handwriting recognition so much because it forms the backbone for the big plans we have in store. We recently shipped automatic language detection for handwritten notes, which is a big win for foreign language learning.
What were the biggest challenges that the team faced while working on handwriting recognition?
Machine learning engineering is often portrayed as a siloed art where the practitioner spends day and night fine-tuning the most accurate model. Yet in reality, only around 10% of our time goes into this model development stage. If you’re an ML engineer, you’ll know that any ML launch requires an end-to-end pipeline spanning data collection, feature extraction, model development and evaluation, monitoring, and testing. In my opinion, three unique challenges arose from the necessity of this pipeline.
First, striking a balance between engineering and research work. For instance, the team had to find a model size that was fast and small enough to run fully on-device, while still being >95% accurate. We also had to set up stringent tests to prevent discrepancies from emerging between the research and production environments.
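As a back-of-the-envelope illustration of that engineering trade-off (the numbers below are illustrative, not our model’s): a network’s on-device footprint is roughly its parameter count times the bytes stored per weight, which is why techniques like quantization matter so much when the model must ship inside an app.

```python
def model_size_mb(num_params: int, bytes_per_weight: int) -> float:
    """Rough on-device size: parameters x storage per weight."""
    return num_params * bytes_per_weight / 1e6

# A hypothetical 10M-parameter model:
print(model_size_mb(10_000_000, 4))  # float32 weights -> 40.0 MB
print(model_size_mb(10_000_000, 1))  # int8-quantized  -> 10.0 MB
```

Shrinking weights from float32 to int8 cuts the footprint 4x, but the research side then has to verify that accuracy (say, the >95% bar) survives the compression.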
Second, every engineering team in a fast-growing startup inevitably faces organizational challenges, especially for our team which was spread across Hong Kong, Spain, Russia, and Japan! As the adage goes, there can never be a “best” process in Agile software development. Rather, we have to constantly iterate on and learn about what works best for us in terms of the frequency and format of retrospectives, tooling for planning and tracking work, and efficient cross-functional communication.
Finally, at GoodNotes we deeply believe in the value of stopping periodically after each iteration to evaluate whether the team is headed in the right direction. This was difficult, but extra important for such a complex project. On one hand, we had the long-term ambition of building the world's best handwriting recognition engine; on the other, we had to deal with product timelines and resource constraints and avoid going into rabbit holes of failed experiments. This introduced some real challenges in time allocation and prioritization, like determining whether some form of automation is worth spending time on (think CI/CD for model training), and being extra deliberate about setting up the right metrics for success.
What do you like most about working at GoodNotes?
While each of our Values at GoodNotes resonates deeply with me, two of them in particular make me love my job. Firstly, we “dream big”. More than being the best note-taking tool on iPad, we strive to fundamentally change the way people learn, collaborate, and educate through innovations in digital paper. Our culture encourages moonshots. As such, I feel empowered to propose and tackle “hard” problems in ML, such as information retrieval from your GoodNotes library, or recognition of hand-drawn doodles, music, or chemical diagrams, or automatic question generation from learning materials. Often, these ideas start from 3-person teams in our bimonthly hackathons, but end up materializing in the product roadmap and delighting users. Having a clear connection between my daily work and the company’s bigger ambitions is an immense source of motivation for me.
Closely related, we encourage “taking ownership”. We are a workplace that prioritizes relationships over process, and collective success over boundaries. As a consequence of our fast growth, each person in the company owns a big piece of the puzzle and has a direct stake in the outcome of their project. They are also given the autonomy to drive it to success the way they like, along with the opportunity to grow in the way they’d like. For instance, leading two cross-functional projects is challenging, but it lets me make a significant impact on the company’s roadmap and build trust and expertise within my team.
What advice do you have to individuals out there who might be curious about getting into machine learning?
There's no right way to get into machine learning. Part of that is because the possibilities in the field are endless! You might want to become a designer for AI products, a machine learning engineer, a data scientist, a founder. Or you might be aiming to become knowledgeable enough about how generative AI works to hold meaningful conversations in tech.
Also, some approaches may work better than others depending on your learning style. Some people learn best by doing, so if that's you, pick a domain you're interested in (stocks, sports, art, you name it) and participate in Kaggle competitions; do your own fun project like finding a dataset with annual NFL statistics and training a classifier to predict each year's winner, or train a sentiment classifier on stock market Tweets! With new generative AI platforms like Midjourney or ChatGPT, you may want to start with generating some AI art or fiction. And when you have a feel for its capabilities, you'll naturally want to know “how does it all work?”
Others (like myself) might be better off with a more principled, bottom-up approach. If that's more your style, there really is no better time than now to start. There are now well-presented, reliable resources available for different types of learners, including online courses, textbooks, blog posts, ChatGPT, and new papers from Google, OpenAI, or the open source community every week. Here are 3 resources that have helped me a lot personally:
- Andrew Ng’s Machine Learning Specialization provides the best overview of classical and modern machine learning
- The 3blue1brown YouTube channel is excellent at providing intuitive explanations of ML concepts
- The DeepMind x UCL Deep Learning Lecture Series is my favorite course for developing a foundational understanding of deep learning
The most important thing is to be open to trying and failing, and to persevere. Failed experiments can be frustrating, but building a machine learning system end-to-end is incredibly rewarding. For those wanting to build a career in ML, I would encourage you to focus on building a solid foundation by learning the basics and asking “why” all the time. I am grateful to have gotten into machine learning and I really think AI is the prime adventure of our time, so have fun!