Apple interview question

What is a KV cache? How does it help in LLM inference?
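
For anyone working through this question, here is a minimal sketch of the idea, not a production implementation. It assumes a single attention head with causal (autoregressive) decoding, and the toy weights `W_Q`, `W_K`, `W_V` and the helper names are made up for illustration. Because the keys and values of earlier tokens do not change when a new token is appended, they can be stored and reused instead of being recomputed at every step.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    # Scaled dot-product attention for one query over all cached positions
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

# Hypothetical toy projection weights (2-dimensional embeddings)
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [-0.2, 0.3]]
W_V = [[0.2, -0.1], [0.4, 0.6]]

def decode_without_cache(tokens):
    """Recompute K and V for the whole prefix at every step."""
    outputs, projected_positions = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(W_K, x) for x in prefix]
        values = [matvec(W_V, x) for x in prefix]
        projected_positions += len(prefix)
        outputs.append(attend(matvec(W_Q, tokens[t]), keys, values))
    return outputs, projected_positions

def decode_with_cache(tokens):
    """Project only the newest token; reuse stored K and V for the rest."""
    outputs, projected_positions = [], 0
    k_cache, v_cache = [], []
    for x in tokens:
        k_cache.append(matvec(W_K, x))
        v_cache.append(matvec(W_V, x))
        projected_positions += 1
        outputs.append(attend(matvec(W_Q, x), k_cache, v_cache))
    return outputs, projected_positions

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out_plain, n_plain = decode_without_cache(tokens)
out_cached, n_cached = decode_with_cache(tokens)
print(n_plain, n_cached)  # positions pushed through the K/V projections
```

Both functions produce the same attention outputs, but the cached version projects each token once, while the uncached version re-projects the entire prefix at every step. The trade-off is memory: the cache grows linearly with sequence length (and, in a real model, with the number of layers and heads).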