- An assembly question about a SIMD architecture (I cannot remember the exact question).
- Given an array of points (x and y coordinates), implement a heap-based data structure that supports insertion and deletion and returns the n nearest points.
- Implement a CUDA kernel that computes the frequency of each number in an array (a histogram).
- Given a very large file of integers, implement an algorithm that outputs a sorted version of it (also analyze its time and space complexity).
- Implement an optimized CUDA kernel for matrix transpose (followed by a discussion of GPU architecture).
- Present a system you designed and implemented, and explain how you would change it to support the features requested during the conversation.
- Various deep learning questions, such as explaining optimization algorithms, the difference between gradient descent and stochastic gradient descent, underfitting, overfitting, gradient clipping, and more.
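
For the heap question, one possible sketch is a hand-written binary min-heap keyed on squared distance. The question does not say what "nearest" is measured against, so this assumes distance to a reference point that defaults to the origin; `Point`, `NearestHeap`, `popNearest` and `takeNearest` are hypothetical names. It is host-only C++, so it compiles with `nvcc` or any C++11 compiler.

```cuda
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Point {
    double x, y;
};

// Binary min-heap of points keyed on squared distance to (rx, ry).
// Assumption: "nearest" means nearest to a reference point, the origin by default.
class NearestHeap {
public:
    explicit NearestHeap(double rx = 0.0, double ry = 0.0) : rx_(rx), ry_(ry) {}

    // O(log m) insertion: append at the end, then sift up.
    void insert(const Point& p) {
        heap_.push_back(p);
        siftUp(heap_.size() - 1);
    }

    std::size_t size() const { return heap_.size(); }

    // O(log m) deletion of the nearest point: move the last element to the root, sift down.
    Point popNearest() {
        assert(!heap_.empty());
        Point top = heap_.front();
        heap_.front() = heap_.back();
        heap_.pop_back();
        if (!heap_.empty()) siftDown(0);
        return top;
    }

    // Removes and returns up to n nearest points, nearest first: O(n log m).
    std::vector<Point> takeNearest(std::size_t n) {
        std::vector<Point> out;
        for (std::size_t k = 0; k < n && !heap_.empty(); ++k) out.push_back(popNearest());
        return out;
    }

private:
    double key(const Point& p) const {
        double dx = p.x - rx_, dy = p.y - ry_;
        return dx * dx + dy * dy;  // squared distance: same ordering, no sqrt
    }

    void siftUp(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (key(heap_[parent]) <= key(heap_[i])) break;
            std::swap(heap_[parent], heap_[i]);
            i = parent;
        }
    }

    void siftDown(std::size_t i) {
        const std::size_t m = heap_.size();
        while (true) {
            std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < m && key(heap_[l]) < key(heap_[smallest])) smallest = l;
            if (r < m && key(heap_[r]) < key(heap_[smallest])) smallest = r;
            if (smallest == i) break;
            std::swap(heap_[i], heap_[smallest]);
            i = smallest;
        }
    }

    double rx_, ry_;
    std::vector<Point> heap_;
};
```

If only the n nearest points of a stream must be kept, a max-heap capped at n entries (evicting the farthest point whenever it overflows) bounds memory at O(n) instead of O(m).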
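
For the frequency-counting question, a common approach is a histogram with per-block privatization: each block counts into shared memory using fast shared-memory atomics, then merges into the global histogram with one `atomicAdd` per bin. This sketch assumes the values are integers in `[0, numBins)` (others are skipped) and that `numBins` is small enough for the bins to fit in shared memory; `histogramKernel` and `histogram` are hypothetical names, and CUDA error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Assumption: valid values lie in [0, numBins); out-of-range values are ignored.
__global__ void histogramKernel(const int* in, int n, unsigned int* bins, int numBins) {
    extern __shared__ unsigned int local[];  // per-block private histogram
    for (int b = threadIdx.x; b < numBins; b += blockDim.x) local[b] = 0;
    __syncthreads();

    // Grid-stride loop: each thread may process several elements.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x) {
        int v = in[i];
        if (v >= 0 && v < numBins) atomicAdd(&local[v], 1u);
    }
    __syncthreads();

    // Merge the private histogram into the global one.
    for (int b = threadIdx.x; b < numBins; b += blockDim.x)
        if (local[b] != 0) atomicAdd(&bins[b], local[b]);
}

// Host wrapper: copies data to the device, launches the kernel, returns the counts.
std::vector<unsigned int> histogram(const std::vector<int>& data, int numBins) {
    assert(numBins > 0);
    if (data.empty()) return std::vector<unsigned int>(numBins, 0);
    int n = static_cast<int>(data.size());

    int* dIn = nullptr;
    unsigned int* dBins = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dBins, numBins * sizeof(unsigned int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, numBins * sizeof(unsigned int));

    const int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (blocks > 1024) blocks = 1024;  // arbitrary cap; the grid-stride loop covers the rest
    std::size_t shmem = numBins * sizeof(unsigned int);  // must fit in shared memory (48 KB by default)
    histogramKernel<<<blocks, threads, shmem>>>(dIn, n, dBins, numBins);

    std::vector<unsigned int> result(numBins);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(result.data(), dBins, numBins * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dBins);
    return result;
}
```

The simplest correct kernel does one global `atomicAdd` per element; privatization mainly helps when many threads hit the same few bins and global atomics become contended.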
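
For the large-file sorting question, the textbook answer is an external merge sort: sort memory-sized chunks into temporary "run" files, then k-way merge the runs with a min-heap. This sketch assumes a text file of whitespace-separated integers that fit in `long long`, and a memory budget of `chunkSize` integers; `externalSort` and the `.runN` temporary file naming are hypothetical. It is host-only C++11 and compiles with `nvcc`.

```cuda
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// External merge sort sketch (assumed input: whitespace-separated integers in a text file).
void externalSort(const std::string& inPath, const std::string& outPath, std::size_t chunkSize) {
    assert(chunkSize > 0);
    std::ifstream in(inPath);
    std::vector<std::string> runs;
    std::vector<long long> buf;
    buf.reserve(chunkSize);
    long long v;

    // Phase 1: sort each chunk in memory and write it out as a sorted run.
    auto flush = [&]() {
        if (buf.empty()) return;
        std::sort(buf.begin(), buf.end());
        std::string name = outPath + ".run" + std::to_string(runs.size());
        std::ofstream run(name);
        for (long long x : buf) run << x << '\n';
        runs.push_back(name);
        buf.clear();
    };
    while (in >> v) {
        buf.push_back(v);
        if (buf.size() == chunkSize) flush();
    }
    flush();

    // Phase 2: k-way merge; the heap holds (value, run index), one entry per open run.
    std::vector<std::ifstream> readers;
    for (const std::string& name : runs) readers.emplace_back(name);
    typedef std::pair<long long, std::size_t> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t i = 0; i < readers.size(); ++i)
        if (readers[i] >> v) heap.push(Entry(v, i));

    std::ofstream out(outPath);
    while (!heap.empty()) {
        Entry e = heap.top();
        heap.pop();
        out << e.first << '\n';
        if (readers[e.second] >> v) heap.push(Entry(v, e.second));  // refill from the same run
    }
    readers.clear();  // close run files before deleting them
    for (const std::string& name : runs) std::remove(name.c_str());
}
```

With N integers, M = `chunkSize` and k = ⌈N/M⌉ runs, phase 1 costs O(N log M) comparisons and phase 2 costs O(N log k), so O(N log N) overall, while every integer is read and written twice. Memory is O(M) in phase 1 and O(k) heap entries (plus one stream buffer per run) in phase 2, and the runs need O(N) temporary disk space. If k exceeds the open-file limit, the runs would have to be merged over several passes.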
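
For the matrix transpose question, the usual optimization is shared-memory tiling: a naive transpose reads rows coalesced but writes columns with a large stride, whereas staging a tile in shared memory lets both the global read and the global write be coalesced. The extra padding column keeps the column-wise shared-memory reads free of bank conflicts. This sketch assumes a row-major `float` matrix; `TILE = 32` and `BLOCK_ROWS = 8` are common tuning choices rather than requirements, `transposeTiled` and `transpose` are hypothetical names, and error checking is omitted.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;       // tile edge, matches the warp size
constexpr int BLOCK_ROWS = 8;  // each thread copies TILE / BLOCK_ROWS elements

// Transposes the rows x cols row-major matrix `in` into the cols x rows matrix `out`.
__global__ void transposeTiled(const float* in, float* out, int rows, int cols) {
    __shared__ float tile[TILE][TILE + 1];  // +1 column avoids shared-memory bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;  // column in `in`
    int y = blockIdx.y * TILE + threadIdx.y;  // row in `in`
    for (int j = 0; j < TILE; j += BLOCK_ROWS)
        if (x < cols && y + j < rows)
            tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * cols + x];  // coalesced read
    __syncthreads();

    // Swap the block indices so this block writes the transposed tile.
    x = blockIdx.y * TILE + threadIdx.x;  // column in `out` (a row of `in`)
    y = blockIdx.x * TILE + threadIdx.y;  // row in `out` (a column of `in`)
    for (int j = 0; j < TILE; j += BLOCK_ROWS)
        if (x < rows && y + j < cols)
            out[(y + j) * rows + x] = tile[threadIdx.x][threadIdx.y + j];  // coalesced write
}

// Host wrapper: copies the matrix to the device, transposes it, copies it back.
std::vector<float> transpose(const std::vector<float>& h, int rows, int cols) {
    assert(rows > 0 && cols > 0 && h.size() == static_cast<std::size_t>(rows) * cols);
    std::size_t bytes = h.size() * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, BLOCK_ROWS);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(dIn, dOut, rows, cols);

    std::vector<float> result(h.size());
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The bounds checks let the kernel handle dimensions that are not multiples of the tile size, and no thread returns early, so every thread in the block reaches `__syncthreads()`.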
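
For the deep learning questions, the standard textbook update rules make the gradient descent vs. stochastic gradient descent distinction and gradient clipping concrete. The notation is assumed, not from the interview: $\theta$ are the parameters, $\eta$ the learning rate, $\ell$ the per-example loss over $N$ training examples, $B_t$ a randomly sampled minibatch (a batch of size 1 gives classic SGD), $g$ the gradient, and $c$ the clipping threshold.

$$
\begin{aligned}
\text{GD:}\quad & \theta_{t+1} = \theta_t - \eta \, \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \ell(x_i; \theta_t) \\
\text{SGD:}\quad & \theta_{t+1} = \theta_t - \eta \, \frac{1}{|B_t|} \sum_{i \in B_t} \nabla_\theta \ell(x_i; \theta_t), \qquad B_t \subset \{1, \dots, N\} \\
\text{Clipping by norm:}\quad & g \leftarrow g \cdot \min\!\left(1, \frac{c}{\lVert g \rVert_2}\right)
\end{aligned}
$$

GD uses the exact full-dataset gradient at every step, while SGD uses a cheaper, noisy but unbiased estimate of it. Clipping by norm leaves $g$ unchanged when $\lVert g \rVert_2 \le c$ and otherwise rescales it to length $c$ without changing its direction.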