There is a slight problem (I am not sure if this is a valid case, just thinking out loud): if the data were gathered from public sources like Facebook, they might argue that those data are public.
Yes, I think this is what they'd argue too. But something to think about: recently Google was asked (in France[1]) to start paying publishers for including news snippets in search results. That data is also "public" in the sense that it is available for crawling, but it is still owned by creators who can (and did) exercise rights to limit its use. The presence of a third-party platform like Facebook further complicates this (I don't know whether you sign away your rights to your photos when you upload them to FB, for example).
In raw, mathematical black/white terms, yes, I grant that an image with my name next to it is technically public. However, uploading a profile portrait is typically intended for someone who searches for your name to determine whether or not a particular URL represents the person they met offline. I have to walk past a security camera to go through a store checkout - my image is in that camera feed because I want to buy stuff, not because I want my image to be public - and I might present my ID to the clerk because I want to be identified as of age to legally buy alcohol, not because I want the camera to link my face to my name (and my purchase list) or anything creepy like that.
When a person uploads a profile picture or appears in a security camera feed, they typically have an intent that doesn't match Clearview's use case, and an expectation that what Clearview.ai is trying to do with the image would be humanly impossible. Historically, it has been impossible. True, some people are good with faces, and I'm sure some of them work in law enforcement or advertising, but no human can cross-reference 7 billion profile pictures against every security camera on the planet and remember who went where at what time.
I'd argue that there's a fundamental difference, based on scale, in whether a right does or does not apply. A human looking at one data point needs to be approached ethically and legislatively differently from a machine looking at a million identical data points, because the use cases are different.
Clearview.ai is trying to make a land grab on human rights, asserting that because the things they're trying to do have not yet been prohibited (because they're complicated, and because no one realized they were feasible), they ought to continue to be allowed to do them.
I imagine scraping Facebook might turn out much like hiQ v. LinkedIn, a case involving the scraping of public LinkedIn information. At the time, the EFF and many others celebrated the ruling. For example, the EFF characterized it as a victory for "...the wide variety of researchers, journalists, and companies who have had reason to fear cease and desist letters threatening liability simply for accessing publicly available information in a way that publishers object to."
I feel like there ought to be a line somewhere. Like, maybe you can access the data because it's public, but to (for example) profit off of it or even share it further, you need more than just the fact that it's public.
For some reason, open source licenses come to mind. You can use this code (or image) under terms XYZ, and this license has to travel with it. That way, letting GitHub show my code publicly doesn't give you a license to do whatever you want with it, and letting LinkedIn show my picture doesn't give you a license to do whatever you want with it either. Maybe we need something like that.
If you make your profile available publicly, then I'd argue it is indeed public. As far as I'm aware, Clearview doesn't have a relationship with Facebook to access non-public data; instead they just operate a web crawler that stores anything being served without requiring auth.
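Conceptually, that kind of crawler is trivial. A minimal sketch (the URL is made up, and a real crawler would also handle robots.txt, rate limits, and HTML parsing): it sends no cookies or auth headers, so it only ever receives what the server hands to any anonymous visitor.

```python
# Minimal sketch of an unauthenticated crawler (hypothetical URL; a real
# crawler would also handle robots.txt, rate limiting, and HTML parsing).
import requests

def fetch_public_image(url: str) -> bytes | None:
    # No cookies, no session, no auth header: we receive exactly what
    # the server serves to any anonymous visitor.
    resp = requests.get(url, timeout=10)
    if resp.ok and resp.headers.get("Content-Type", "").startswith("image/"):
        return resp.content
    return None  # login walls and 403s simply yield nothing

avatar = fetch_public_image("https://example.com/profiles/12345/avatar.jpg")
```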
An image at the top of a news article on cnn.com is "public" in the sense that anyone can access it. But the company and the photographer still retain rights to that image - you can't take it and use it for whatever you like.
What is confusing here is that everyone imagines that Clearview (and Google, and FB, ...) are really storing those pictures. In reality, they just train their AI. There is no trace of the picture on their servers once you delete it, but the AI remains capable of recognizing you: in Clearview's case from a picture; in Google's and FB's case from your picture, browsing habits, contacts, GPS coordinates, your friends, the semantics of your texts, and so on. The only difference is that Google and FB are not stupid enough to advertise this. But the capability is there.
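To make that concrete, here is roughly what "keeping the capability without keeping the photo" looks like, sketched with the open-source face_recognition library as a stand-in (the filenames are invented, and whatever Clearview actually runs is not public):

```python
# Sketch: store a face embedding, discard the photo, match later.
# Assumes exactly one face per image, for brevity.
import face_recognition
import numpy as np

img = face_recognition.load_image_file("profile_photo.jpg")
embedding = face_recognition.face_encodings(img)[0]  # 128-float vector
# The original JPEG can be deleted now; the vector alone is enough.

frame = face_recognition.load_image_file("security_cam_frame.jpg")
probe = face_recognition.face_encodings(frame)[0]

distance = np.linalg.norm(embedding - probe)
# 0.6 is the library's default matching tolerance
print("same person" if distance < 0.6 else "different person")
```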
IANAL, but I think the GDPR does not just look at the data in isolation; it considers the data together with what it is used for.
Thus, if I give a company access to my data, that does not give them carte blanche to use it however they see fit; instead, I have allowed usage of the data for a specific set of purposes.
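In code terms, the idea might be modeled like this (a toy sketch of purpose limitation, not how compliance is actually implemented anywhere):

```python
# Toy model of GDPR-style purpose limitation: holding the data is not
# enough, each individual use must match a purpose the subject agreed to.
from dataclasses import dataclass, field

@dataclass
class PersonalData:
    value: bytes
    consented_purposes: set[str] = field(default_factory=set)

def process(data: PersonalData, purpose: str) -> bytes:
    if purpose not in data.consented_purposes:
        raise PermissionError(f"no legal basis for purpose: {purpose}")
    return data.value

photo = PersonalData(b"...jpeg bytes...", {"display_on_profile"})
process(photo, "display_on_profile")  # allowed
process(photo, "train_face_matcher")  # raises PermissionError
```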
I think it could work like GitHub letting you have public repos covered under whatever license you like: being public doesn't invalidate those licenses. Or if Disney made a movie free on YouTube for a day, it wouldn't lose legal protection over that movie.