GENIE: A Fine-Grained Measure for Novelty
How GENIE works
GENIE evaluates novelty in four stages: automatic feature discovery, question generation, population building, and population-relative dissimilarity scoring. The figure shows how GENIE is instantiated on the creative writing task.
Q-GENIE Visualizer
Q-GENIE scores help pinpoint the details that make a response unique with respect to the population. Given a question, explore how novel the answers extracted from target responses are relative to the answers extracted from population responses. Click the population bubble to view its contents. Target answers that lie near the population are less novel (i.e., more in-distribution with respect to the population).
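To make the idea of "near the population" concrete, here is a minimal sketch of a population-relative dissimilarity score for one question. It is an illustration only: this page does not specify the distance function GENIE uses, so the token-level Jaccard distance, the mean aggregation, the function names (`tokens`, `jaccard_distance`, `novelty`) and the example answers are all assumptions.

```python
def tokens(text):
    """Lowercased whitespace tokens of an answer, as a set."""
    return set(text.lower().split())


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over token sets (a stand-in distance, not GENIE's)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0  # two empty answers are treated as identical
    return 1.0 - len(ta & tb) / len(union)


def novelty(target_answer, population_answers):
    """Mean distance from the target answer to every population answer."""
    if not population_answers:
        raise ValueError("population must not be empty")
    total = sum(jaccard_distance(target_answer, p) for p in population_answers)
    return total / len(population_answers)


# Hypothetical answers to a question such as "Who is the protagonist?"
population = ["a lonely lighthouse keeper", "a lonely sailor", "a lighthouse keeper"]

common = novelty("a lonely lighthouse keeper", population)
unusual = novelty("a sentient tax form", population)
print(f"common: {common:.3f}, unusual: {unusual:.3f}")
```

Under these assumptions the answer that overlaps with the population receives the lower score, which matches the reading of the visualizer: the closer a target answer sits to the population, the less novel it is.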
Dataset statistics
Novelty score distribution by feature
Kernel density estimates of GENIE scores across the target document set, per feature.
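For readers unfamiliar with kernel density estimates, the sketch below shows how one such curve per feature could be computed. It assumes a Gaussian kernel with a hand-picked bandwidth; the page does not state which kernel or bandwidth was used for the plot, and the feature names and scores are made up for illustration.

```python
import math


def make_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with a Gaussian kernel."""
    if not samples:
        raise ValueError("samples must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, then normalize so the curve integrates to 1.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical per-feature scores; not taken from the GENIE dataset.
scores_by_feature = {
    "feature_a": [0.21, 0.25, 0.30, 0.32, 0.55],
    "feature_b": [0.60, 0.62, 0.70, 0.71, 0.90],
}

grid = [i / 50 for i in range(51)]  # evaluation points on [0, 1]
curves = {
    feature: [make_kde(scores, bandwidth=0.1)(x) for x in grid]
    for feature, scores in scores_by_feature.items()
}
```

Each entry of `curves` is a list of density values that can be plotted against `grid`; a smaller bandwidth gives a spikier curve and a larger one a smoother curve.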
Model novelty scores
We computed GENIE scores for target responses generated by 18 models across 50 creative writing prompts, relative to a population of 4,500 responses generated by 21 models. Select a feature to view model rankings, and use the filters to narrow by model family or type.
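As a rough illustration of how per-feature model rankings could be derived from response-level scores, the sketch below averages the scores of each model for one feature and sorts the models by that average. The use of the mean as the aggregate, the record layout, and the model and feature names are all assumptions; the page does not say how the rankings are aggregated.

```python
from collections import defaultdict
from statistics import mean


def rank_models(records, feature):
    """Rank models by mean score on `feature`, highest first."""
    by_model = defaultdict(list)
    for record in records:
        if record["feature"] == feature:
            by_model[record["model"]].append(record["score"])
    averages = ((model, mean(scores)) for model, scores in by_model.items())
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


# Hypothetical response-level scores; not taken from the GENIE results.
records = [
    {"model": "model-a", "feature": "setting", "score": 0.40},
    {"model": "model-a", "feature": "setting", "score": 0.50},
    {"model": "model-b", "feature": "setting", "score": 0.70},
    {"model": "model-b", "feature": "setting", "score": 0.60},
    {"model": "model-a", "feature": "tone", "score": 0.90},
]

ranking = rank_models(records, "setting")
for model, score in ranking:
    print(f"{model}: {score:.2f}")
```

Filtering by model family or type would amount to dropping records before calling `rank_models`, for example with a list comprehension over a hypothetical `family` field.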