What does that mean in practice? My favorite example is the tendency of image recognition systems to look at a photo of a grassy hill and say ‘sheep’. Most of the example pictures of ‘sheep’ were taken on grassy hills, because that’s where sheep tend to live, and the grass is far more prominent in those images than the little white fluffy things, so that’s where the systems place the most weight.
A more serious example came up recently with a project to look for skin cancer in photographs. It turns out that dermatologists often put rulers in photos of skin cancer, for scale, but that the example photos of healthy skin do not contain rulers. To the system, the rulers (or rather, the pixels that we see as a ruler) were just differences between the example sets, and sometimes more prominent than the small blotches on the skin. So, the system that was built to detect skin cancer was, sometimes, detecting rulers instead.
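The ruler effect is easy to reproduce in miniature. Here is a minimal sketch using invented synthetic data: each ‘photo’ is reduced to two numbers, a weak, noisy ‘lesion’ signal and a spurious ‘ruler’ feature that perfectly tracks the label. A simple logistic regression, trained on this, ends up leaning almost entirely on the ruler. The feature names and numbers are all illustrative assumptions, not from any real dataset or system.

```python
import math
import random

random.seed(0)

# Toy dataset: each "image" is reduced to two features.
#   lesion_signal: weak, noisy evidence of the actual condition
#   ruler_present: 1.0 in every positive example, 0.0 otherwise (spurious)
# All names and values here are invented for illustration.
def make_example(has_cancer):
    lesion_signal = (0.6 if has_cancer else 0.4) + random.gauss(0, 0.3)
    ruler_present = 1.0 if has_cancer else 0.0
    return [lesion_signal, ruler_present], 1.0 if has_cancer else 0.0

data = [make_example(i % 2 == 0) for i in range(200)]

# Minimal logistic regression trained by stochastic gradient descent.
w = [0.0, 0.0]
b = 0.0
lr = 0.5
for _ in range(500):
    for x, y in data:
        z = w[0] * x[0] + w[1] * x[1] + b
        p = 1 / (1 + math.exp(-z))   # predicted probability
        g = p - y                    # gradient of the log loss
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g

# The weight on the spurious ruler feature dwarfs the lesion weight:
print(f"lesion weight: {w[0]:.2f}, ruler weight: {w[1]:.2f}")
```

Nothing in the training procedure knows which feature is the ‘real’ one; the ruler is simply the more reliable pattern in the numbers, so that is what gets learned.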
A central thing to understand here is that the system has no semantic understanding of what it’s looking at. We look at a grid of pixels and translate that into sheep, or skin, or rulers, but the system just sees a string of numbers. It isn’t seeing 3D space, or objects, or texture, or sheep. It’s just seeing patterns in data.