Last week I wrote about LLM-led evaluations: passing a grading rubric to an LLM so that it grades LLM-generated content.
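For anyone who hasn't seen the pattern, here is a minimal sketch of rubric-based grading in Python. It is an illustration, not the setup from the post: the prompt wording, the 1-5 scale, and the `judge` callable (a stand-in for whichever model API you use) are all assumptions.

```python
import re
from typing import Callable, Optional


def build_grading_prompt(rubric: str, content: str) -> str:
    """Combine the rubric and the text to grade into a single prompt."""
    return (
        "You are grading a piece of generated text against a rubric.\n\n"
        f"Rubric:\n{rubric}\n\n"
        f"Text to grade:\n{content}\n\n"
        "Reply with a line of the form 'Score: <integer from 1 to 5>' "
        "followed by a one-sentence justification."
    )


def parse_score(reply: str) -> Optional[int]:
    """Pull the 1-5 score out of the judge's reply; None if it is missing."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def grade(rubric: str, content: str, judge: Callable[[str], str]) -> Optional[int]:
    """Send the grading prompt to the judge model and parse its score.

    `judge` is a hypothetical callable: prompt text in, reply text out.
    """
    return parse_score(judge(build_grading_prompt(rubric, content)))
```

Passing the judge in as a plain function keeps the sketch independent of any one provider, and makes it easy to swap judges when comparing models.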
My rather subjective first take, pending proper Opus, Mistral Large, and GPT-4T evals: https://open.substack.com/pub/skykhan/p/ai-wars-anthropic-strikes-back