77.5% of the novel responses generated by COMET were deemed “plausible” by human evaluators. That’s less than 10 percentage points shy of human-level performance.
Sapere Aude