1Cademy - Addressing Distribution Mismatches in Cat Apps

Essay

Addressing Distribution Mismatches in Cat Apps

Question: Explain why a cat app evaluated primarily on adult cats might have an unrepresentative dev/test set if users upload many kitten pictures.

Sample answer: If the initial dev/test set contains mostly pictures of adult cats, the system is effectively being evaluated on its ability to recognize adult cats. If users then upload far more kitten images than expected once the app has shipped, the model encounters a data distribution different from the one it was tested on. The initial dev/test set is therefore not representative of the distribution the system must perform well on in the real world.
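The effect of such a mismatch can be sketched numerically. The example below is only an illustration under stated assumptions: the per-group accuracies and the adult/kitten proportions are invented numbers, not measurements from any real app, and `expected_accuracy` is a hypothetical helper name. It computes overall accuracy as the proportion-weighted average of per-group accuracy.

```python
def expected_accuracy(mix, per_group_accuracy):
    """Overall accuracy on data whose group proportions are given by `mix`."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("group proportions must sum to 1")
    # Weight each group's accuracy by its share of the data.
    return sum(share * per_group_accuracy[group] for group, share in mix.items())


# Hypothetical per-group accuracies (assumed: the model is weaker on kittens).
per_group_accuracy = {"adult": 0.95, "kitten": 0.70}

# Hypothetical proportions: the dev/test set is mostly adult cats,
# while real uploads after launch are mostly kittens.
dev_test_mix = {"adult": 0.90, "kitten": 0.10}
production_mix = {"adult": 0.40, "kitten": 0.60}

dev_test_accuracy = expected_accuracy(dev_test_mix, per_group_accuracy)
production_accuracy = expected_accuracy(production_mix, per_group_accuracy)

print(f"dev/test accuracy:   {dev_test_accuracy:.3f}")
print(f"production accuracy: {production_accuracy:.3f}")
```

With these assumed numbers, the dev/test estimate is about 0.925 while accuracy on the real upload mix is about 0.80. The model itself is unchanged; only the mix of inputs differs, which is why a score measured on an unrepresentative dev/test set can overstate real-world performance.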

Key points:

Dev/test set mainly contained adult cats
Users uploaded unexpectedly high numbers of kittens
Dev/test set distribution is not representative
Mismatch with actual distribution

Rubric: The response must explain that evaluating on adult cats does not reflect real-world usage if users upload kittens, explicitly concluding that the dev/test set is not representative of the actual distribution.

Updated 2026-06-13
