I didn't make it past my first interview.
This first interview was one hour long and focused purely on how to programme GPUs to push them to their limits. All of my experience is on NVIDIA GPUs, so I just talked about my CUDA and PTX programming experience.
There were no coding questions in this first interview.
Although I felt like I really impressed my interviewer, I did not make it past the first round because there were better candidates, so the competition must be pretty fierce.
The interviewer was nice and friendly.
Interview questions [1]
Question 1
Provide an example of a kernel that reached high Tensor Core utilization. How did you accomplish it?
How would you get the roofline graph of a kernel (whether the kernel is compute-bound or memory-bound)?
Provide an example where PTX programming is necessary (I felt like I really impressed my interviewer here by talking about the many PTX-exclusive instructions available on NVIDIA GPUs).
Do you have experience coding GPU-specialised algorithms like FlashAttention?
Why are vectorised loads better than non-vectorised loads? (They use 128-bit load instructions.)
Etc...
The interview was really just focused on proving my knowledge of how GPUs work and how to programme them.
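
For anyone preparing for the roofline question, here is a minimal sketch of the arithmetic behind it, written in Python rather than CUDA to keep it short. It is not what was asked in the interview or a profiler replacement: the function name `roofline`, the peak figures (10 TFLOP/s, 1 TB/s) and the two example kernels are all hypothetical values picked for illustration. In practice the FLOP and byte counts would come from a profiler such as Nsight Compute, not from hand counting.

```python
def roofline(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Classify a kernel with the roofline model.

    flops          -- floating-point operations the kernel performs
    bytes_moved    -- bytes transferred to and from DRAM
    peak_flops     -- hardware peak compute in FLOP/s (assumed by the caller)
    peak_bandwidth -- hardware peak DRAM bandwidth in bytes/s (assumed)
    """
    intensity = flops / bytes_moved        # arithmetic intensity, FLOP per byte
    ridge = peak_flops / peak_bandwidth    # intensity where the two roofs meet
    # Attainable performance is capped by whichever roof is lower.
    attainable = min(peak_flops, intensity * peak_bandwidth)
    bound = "compute-bound" if intensity >= ridge else "memory-bound"
    return intensity, attainable, bound


# Hypothetical GPU: 10 TFLOP/s peak compute, 1 TB/s peak DRAM bandwidth.
PEAK_FLOPS = 10e12
PEAK_BW = 1e12

# SAXPY-like kernel in fp32: 2 FLOPs per element, 12 bytes per element
# (read x, read y, write y).
n = 1 << 24
saxpy = roofline(2 * n, 12 * n, PEAK_FLOPS, PEAK_BW)

# Idealised fp32 matmul of size m x m: 2*m^3 FLOPs, and each of the three
# matrices touches DRAM exactly once (an optimistic assumption).
m = 4096
matmul = roofline(2 * m**3, 3 * m * m * 4, PEAK_FLOPS, PEAK_BW)

print(saxpy[2], matmul[2])
```

With these assumed numbers the ridge point sits at 10 FLOP/byte, so the SAXPY-like kernel lands well to the left of it and the idealised matmul well to the right.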