Data Coding round: The interviewer shared 2 table schemas with example data rows. I had to write SQL for all the questions based on that. One table was impressions table (id, visitor_id, page_name, referrer_page_name, ts, ds) and the other was actions table (id, page_impression_id,action, ts, ds). Some helpful date functions are: current_date(), DATE_ADD(current_date, INTERVAL 1 DAY). Q1: Write a query to find which visitor visited the most number of distinct pages yesterday and how many distinct pages they visited. Q2: Write a query to find the total number of actions taken on each page (i.e. page_name) visited yesterday. Q3: Write a query to list the pages (i.e. page_name) for page_impressions where visitors used the “BUTTON CLICK” action more than once yesterday. Q4: We want to periodically send a survey to all “super users”. “Super users” are defined as visitors who have at least one impression every day over each of the last seven days. Write a query that would return the “super users” as of yesterday. Q5: We suspect that our page impression logging is currently broken, such that loggers might send duplicate impressions (two or more records exist with the same exact information). Write a query that can dedupe the events in the impression table so that duplicate events are filtered out. Q6: As an effort to simplify the application, we’ve been asked to identify which pages users spend the most time on. Write a query that provides us with an estimation of the avg duration for how long a user spends on each page. Be sure to call out any assumptions you are making and how they might affect the query results.
Check out your Company Bowl for anonymous work chats.