Practical Lessons from a Kaggle Competition
Tue Jan 30 2024
I recently participated in the Fast or Slow - Predict AI Model Runtime competition, and finished 14th of 617 (top 3%). You can view my solution writeup here, but this post isn’t a technical deep-dive. Instead, I want to talk about the end-to-end journey and some lessons that became salient over the course of the challenge.
The Problem
Given an XLA representation of an AI model's architecture and a set of compiler configurations for nodes in that architecture, rank the configurations from best to worst by runtime. The XLA representation is effectively a computation graph. Each node represents a specific operation - a convolution, a parameter, a multiplication, a dot product, and so on. Some of these nodes are configurable, and different compiler parameters can affect the efficiency of the operation.
If this all feels a bit vague or unclear, don’t worry - the technical aspects aren’t important.
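For the curious, though, here's a rough, hypothetical sketch of the shape of the task: each graph comes with a set of candidate configurations, each with a measured runtime, and what matters is how well you order them. The specifics below (synthetic runtimes, Kendall's tau as the rank metric) are illustrative, not the competition's exact data or scoring.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

# One graph, 100 candidate compiler configurations, each with a measured
# runtime (synthetic stand-in data; units are arbitrary).
true_runtimes = rng.lognormal(size=100)

# The model's job is to get the *ordering* right, not the absolute values.
# Here, a noisy copy of the truth stands in for a model's predictions.
predicted_runtimes = true_runtimes * rng.lognormal(sigma=0.1, size=100)

# Rank candidates from fastest to slowest according to the predictions...
predicted_order = np.argsort(predicted_runtimes)
print("predicted best config:", predicted_order[0])

# ...and measure how well that ordering agrees with the ground truth.
tau, _ = kendalltau(predicted_runtimes, true_runtimes)
print(f"rank correlation with true runtimes: {tau:.3f}")
```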
Lesson 1: Explore the problem, not the domain*
* As long as you know the fundamentals
When you start a new project, there's a lot you don't know, and if you're not careful you'll be tempted to keep diving into theory and papers indefinitely. The theory and research on graph regression is deep, and you could spend months there. It's better to let the problem at hand guide your research, at least for a while, than to try to explore the whole domain yourself.
I did screw up here. The SOTA for this problem space was a family of Graph Neural Network variants, and I had never used one before. I probably spent weeks diving into the theoretical differences between message passing blocks. It's not that this was intrinsically a waste of time, but it wasn't optimal. Luckily I caught myself - later than I should have, but I did nonetheless. Weeks had gone by; I now thoroughly understood several GNN variants, but I hadn't really touched the problem yet. It turned out that I didn't need a good chunk of what I had learned. Had I been exploring the actual problem at the same time, I could have refined what I was studying. But I was so caught up in the idea that I didn't understand the domain that I was afraid to touch the problem at all.
Once I began actually working on the problem, I found there was plenty I still didn’t know - but at least now I knew what the real gaps were. So long as you have the bare minimum understanding, which I certainly did, you’re better off DFSing on gaps as they arise.
I’d like to stress, though, that this advice works best for intermediate ML practitioners onwards. If you’re brand new, you will absolutely suffer if you totally skip the fundamentals.
Lesson 2: Data > Model
This is something that all ML specialists know in their hearts, but there is a certain gravity that fights against the best behaviour. Here’s an inconvenient truth:
- Manipulating the data is often kind of boring, but high leverage
- Manipulating the model is deeply interesting, but low leverage
Early on, I was an idiot. I did things I knew better than to do. My early models were learning, but they were learning poorly. The loss was wildly erratic, and it would frequently jump up and “forget” what it had learned. I knew the problem: occasionally a gradient was so large that it wrecked the descent process. So I fixed it by adjusting the model - LayerNorms and GELU activations helped to stabilize training. It was still erratic, just less so.
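To make that concrete: a block along these lines, with a LayerNorm in front of a GELU feed-forward path, is a standard way to keep activations (and therefore gradients) in a sane range. This is a minimal PyTorch sketch, not my actual architecture; gradient clipping is another common guard.

```python
import torch
import torch.nn as nn

class StableBlock(nn.Module):
    """Illustrative MLP block: a pre-norm residual layout with LayerNorm and
    GELU helps tame erratic training. Not my exact architecture."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * 2),
            nn.GELU(),
            nn.Linear(dim * 2, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm residual: normalize, transform, add back.
        return x + self.ff(self.norm(x))

# Another common guard against the occasional huge gradient:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```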
It was only after I'd spent days trying to fix the model that I finally started adjusting the data: playing with different methods of normalizing it, making sure I understood the features and their distributions, all that jazz. I found that some of the features were roughly log-normal, and my blanket Z-normalization didn't handle them properly. They were the root cause of my problem. I mean, of course they were. Of course the issue was in the data. I was so blinded by the excitement of toying with a model's innards that I put off investing time in the obvious place.
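Concretely, the fix amounted to something like the sketch below: log-transform the heavy-tailed columns before standardizing, rather than z-scoring everything blindly. The helper and its arguments are illustrative, not my exact preprocessing code.

```python
import numpy as np

def normalize_features(X: np.ndarray, log_cols: list[int]) -> np.ndarray:
    """Z-normalize features, but log-transform the heavy-tailed (roughly
    log-normal) columns first so a blanket z-score doesn't mangle them.
    Illustrative sketch, not my exact preprocessing."""
    X = X.astype(np.float64).copy()
    X[:, log_cols] = np.log1p(X[:, log_cols])   # compress the long tail
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                         # guard against constant columns
    return (X - mean) / std
```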
Once I properly preprocessed the data, not only was the learning much more stable, but my scores, even with a relatively simple model, jumped up to a clean silver medal on the leaderboard. The code changes took maybe an hour. An hour, after days spent trying to make the model more robust with fancy new techniques. What a moron.
My next huge leap also came (mostly) from the data - finding heuristic graph representations that let the model learn faster and helped me iterate faster.
Model changes aren’t useless, but they should never be the first stop. I eventually found certain model tweaks and novelties that pushed my score from the middle of silver to the bottom of gold. But those explorations should happen only after you’re relatively sure you’ve squeezed your data for all it’s worth.
Note: Often, your model choice is going to dictate what sort of feature manipulations make sense. My graph augmentations only worked in the context of a downstream GNN, for instance. There’s no shame in saying “OK, this modeling approach is fine, let’s try another one” and going all the way back to the drawing board - if you have time.
Lesson 3: Run Toward Discomfort
This lesson applies to all things, not just tech.
While I am fairly experienced in machine learning, it had been a long time since I’d done a Kaggle competition. I was initially a little averse to checking out the discussion posts and notebooks that other competitors were sharing. There was a gnawing feeling, not quite conscious, that using these posts was tantamount to admitting my own skills were insufficient. Fortunately, I’d had this experience many times before in other settings, and I quickly recognized it for what it was: my ego begging me not to pop it. I will never hear its pleas.
I think the impulse is pretty typical and very human. “If I take lessons from another competitor, doesn’t that mean I’m not as good as them?”
- It doesn’t mean that; they might just know something you don’t.
- Actually maybe they are better. If so, good, that means you’re in the right place.
It’s tough to get great at something if you don’t have a lot of self-efficacy. A common byproduct of that self-efficacy is a reasonably big ego, and when you have a big ego, it can be painful to recognize your insufficiencies. But doing so is also the absolute best way to grow as an individual.
If you ever find yourself as the most capable, smartest person around, one of two things has happened.
- You’ve become delusional; seek help.
- You’ve let your fear of discomfort crush your potential.
The discomfort you feel when you see someone amazing at what they do is a wonderful thing, because that discomfort is absolutely integral to lifting yourself up to those heights. The discomfort is both aspiration and envy - and moreover, it’s a sign that you care about being great.
The trick is to run toward it.