True/False

Even after a system surpasses average human-level performance on the full dev/test set, human comparison can still provide value on specific data subsets.