Technical Assessment (20 minutes)
LeetCode Question: Longest Palindromic Substring
Live coding in a shared editor, where the interviewer can watch your code as you write it.
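One standard approach is expand-around-center. Every palindrome is centered either on a character (odd length) or between two adjacent characters (even length), so trying all 2n-1 centers gives O(n^2) time and O(1) extra space. A minimal Python sketch (the function name is illustrative, not prescribed by the question):

```python
def longest_palindromic_substring(s: str) -> str:
    """Return the longest palindromic substring of s using expand-around-center."""
    if not s:
        return ""
    start, end = 0, 0  # inclusive bounds of the best palindrome found so far

    def expand(left: int, right: int) -> tuple[int, int]:
        # Grow outward while the characters on both sides match.
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        # The loop overshoots by one step on each side, so undo it.
        return left + 1, right - 1

    for i in range(len(s)):
        # Odd-length center at s[i]; even-length center between s[i] and s[i + 1].
        for lo, hi in (expand(i, i), expand(i, i + 1)):
            if hi - lo > end - start:
                start, end = lo, hi
    return s[start:end + 1]
```

For example, `longest_palindromic_substring("cbbd")` returns `"bb"`. A likely follow-up is Manacher's algorithm, which solves the problem in O(n) time; dynamic programming over substrings is another O(n^2) option but needs O(n^2) memory.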
Discussion and Questions (40 minutes)
Self-Introduction
Project Discussion: Explain an LLM-related project
LLM Creation: How is a Large Language Model created?
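A common answer starts with self-supervised pretraining on next-token prediction, followed by stages such as supervised instruction fine-tuning and preference tuning (e.g. RLHF). The toy sketch below shows only the pretraining objective: targets are the input tokens shifted left by one, scored with cross-entropy. The logits are assumed to come from some model and are passed in directly here; the function name is hypothetical.

```python
import math


def next_token_loss(logits: list[list[float]], tokens: list[int]) -> float:
    """Average cross-entropy for causal language modeling.

    logits[t] is the model's score vector over the vocabulary after reading
    tokens[0..t]; the target at position t is tokens[t + 1].
    """
    assert len(logits) == len(tokens) - 1, "one prediction per token except the last"
    total = 0.0
    for t, row in enumerate(logits):
        target = tokens[t + 1]
        # Numerically stable log-sum-exp: subtract the max logit first.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]  # equals -log softmax(row)[target]
    return total / len(logits)
```

With uniform logits over a vocabulary of size V the loss is log V, which is the "knows nothing" baseline that pretraining pushes down.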
Model Architectures: Explain the different models and their architectures (e.g., GPT, LLaMA, Falcon, BLOOM)
Positional Embeddings: What is the purpose of positional embeddings?
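Self-attention by itself is permutation-invariant, so without extra information the model cannot tell token order apart; positional embeddings inject that order. The sketch below implements the fixed sinusoidal encoding from the original Transformer, PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), and adds it to token embeddings. It is illustrative only: GPT-2 uses learned absolute position embeddings, LLaMA uses rotary embeddings (RoPE), and BLOOM uses ALiBi attention biases.

```python
import math


def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> list[list[float]]:
    """Fixed sin/cos encodings, one row of length d_model per position."""
    assert d_model % 2 == 0, "d_model must be even to pair sin/cos dimensions"
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        pe.append(row)
    return pe


def add_positions(token_embeddings: list[list[float]], pe: list[list[float]]) -> list[list[float]]:
    """Element-wise sum of each token embedding with its position's encoding."""
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(token_embeddings, pe)]
```

Lower dimensions oscillate quickly and higher ones slowly, so each position gets a distinct pattern, and relative offsets correspond to fixed rotations of each sin/cos pair.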
Normalization Techniques: Difference between BatchNorm and LayerNorm
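The key difference is the axis of normalization: BatchNorm normalizes each feature across the examples in a batch, while LayerNorm normalizes across the features of each example independently. A minimal pure-Python sketch on a [batch][features] matrix, omitting the learnable scale/shift parameters and BatchNorm's running statistics:

```python
import math


def _normalize(values: list[float], eps: float) -> list[float]:
    """Zero-mean, unit-variance normalization of one vector."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # population variance (divide by N)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(x: list[list[float]], eps: float = 1e-5) -> list[list[float]]:
    """Normalize each feature column using statistics across the batch."""
    columns = [_normalize(list(col), eps) for col in zip(*x)]
    return [list(row) for row in zip(*columns)]


def layer_norm(x: list[list[float]], eps: float = 1e-5) -> list[list[float]]:
    """Normalize each example's features, independent of the rest of the batch."""
    return [_normalize(row, eps) for row in x]
```

For `x = [[1.0, 10.0], [3.0, 30.0]]`, `batch_norm` gives roughly `[[-1, -1], [1, 1]]` while `layer_norm` gives roughly `[[-1, 1], [-1, 1]]`. Because LayerNorm does not depend on batch statistics, it behaves the same at training and inference time and handles variable-length sequences, which is why Transformers use it (LLaMA uses the RMSNorm variant).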
Retrieval-Augmented Generation (RAG):
- What is RAG? Explain its purpose.
- Hallucinations can still occur even with RAG. Why does this happen, and how can you mitigate them?
- What are some recent academic advances in RAG? Describe some frameworks that are currently trending.
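RAG retrieves passages relevant to the query from an external corpus and places them in the prompt, so the model's answer can be grounded in up-to-date or private data rather than only its parameters. The sketch below assumes a bag-of-words cosine similarity as a stand-in for the dense embedding model and vector store a real system would use, and it stops at prompt construction (the LLM call is omitted). The prompt's "answer only from context, cite sources, otherwise say you don't know" instruction illustrates one common hallucination mitigation; it reduces but does not eliminate hallucinations, since retrieval can return irrelevant passages and the model can still ignore the context. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Bag-of-words term counts (stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = _vectorize(query)
    return sorted(docs, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Assemble a grounded prompt from the retrieved passages."""
    context = retrieve(query, docs, k)
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the numbered context below and cite sources like [1]. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}\nAnswer:"
    )
```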
Scenario Question: Given large-scale web data (billions of records), how can we efficiently select math-related data? Discuss distributed computing frameworks where applicable.
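One possible first pass is a cheap per-record scorer combining math keywords and LaTeX patterns, written as a pure function over a shard of records so it parallelizes trivially. The keyword list, regex, weighting, and threshold below are illustrative assumptions, not tuned values, and records are assumed to be dicts with a `text` field. A common refinement is to use heuristics like this only to seed a small labeled set, then train a lightweight classifier and apply that at scale.

```python
import re
from typing import Iterable, Iterator

# Hypothetical signals; a real pipeline would tune these or replace them with a classifier.
MATH_KEYWORDS = {"theorem", "lemma", "proof", "equation", "integral", "derivative",
                 "polynomial", "matrix", "probability", "prime"}
LATEX_PATTERN = re.compile(r"\\(?:frac|sum|int|sqrt|alpha|beta|mathbb|begin\{equation\})|\$[^$]+\$")
WORD_PATTERN = re.compile(r"[a-z]+")


def math_score(text: str) -> float:
    """Density of math signals per word; LaTeX matches are weighted double."""
    words = WORD_PATTERN.findall(text.lower())
    if not words:
        return 0.0
    keyword_hits = sum(w in MATH_KEYWORDS for w in words)
    latex_hits = sum(1 for _ in LATEX_PATTERN.finditer(text))
    return (keyword_hits + 2 * latex_hits) / len(words)


def filter_shard(records: Iterable[dict], threshold: float = 0.05) -> Iterator[dict]:
    """Stream one shard of records, keeping those that look math-related."""
    for rec in records:
        if math_score(rec.get("text", "")) >= threshold:
            yield rec
```

Because `filter_shard` maps an iterator of records to an iterator of records, it could be passed to PySpark's `RDD.mapPartitions` (or an equivalent per-partition map in Ray or Dask) so each worker filters its own partition with no shuffle. Deduplication and the classifier stage would then run only on the much smaller surviving set.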