How do you handle missing data in a dataset?
Explain the difference between apply(), map(), and applymap() in Pandas.
What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?
How do you find duplicates in a table?
How do you interpret a p-value?
How do you prevent overfitting in a machine learning model?
Explain precision, recall, and F1-score.
When would you use a decision tree over logistic regression?
How would you measure the success of a recommendation system?
Imagine you're given messy, real-world data with missing values and outliers. Walk me through how you'd clean and prepare the data.
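For the last question, one possible way to structure an answer is sketched below. It is only an illustration under stated assumptions: the `clean` helper, the `user` and `spend` columns, and the sample values are all hypothetical, and median imputation plus 1.5 × IQR clipping are common conventions rather than the only reasonable choices.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, numeric_col: str) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates, impute missing values, clip outliers."""
    # Remove exact duplicate rows first so they do not skew the statistics.
    out = df.drop_duplicates().copy()

    # Impute missing values with the median, which is less sensitive
    # to extreme values than the mean.
    median = out[numeric_col].median()
    out[numeric_col] = out[numeric_col].fillna(median)

    # Clip values that fall more than 1.5 * IQR outside the quartiles.
    q1, q3 = out[numeric_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    out[numeric_col] = out[numeric_col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out.reset_index(drop=True)


# Made-up sample data: one duplicate row, one missing value, one outlier.
raw = pd.DataFrame(
    {
        "user": ["a", "b", "b", "c", "d", "e"],
        "spend": [10.0, 12.0, 12.0, np.nan, 11.0, 500.0],
    }
)
cleaned = clean(raw, "spend")
print(cleaned)
```

In an interview it is usually worth saying why each step was chosen. For example, clipping keeps the row but caps its influence, whereas dropping the row would lose the other columns.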