
>> "Behind the scenes, Google doesn't only have public data," says Suchanek. It can also pull in information from Gmail, Google+ and YouTube. "You and I are stored in the Knowledge Vault in the same way as Elvis Presley," Suchanek says.

I really hope Google does not use Gmail data for projects other than ads. They really need to ask users to opt in to this kind of data sharing. I'm OK with Gmail being read for ads, but almost anything else is unethical, especially some experimental knowledge base.



> I really hope Google does not use Gmail data for projects other than ads.

It's already used by the Google Now cards on Android, and it's a fantastic feature. If I book a flight, I automatically get a card that reminds me to leave for the airport at the correct time (taking traffic into account), without any interaction on my part.


If any flight itinerary hits Gmail at all, in fact, it ends up in Now, as I've found out from itineraries forwarded by family and friends. It has been borderline annoying on occasion, since I don't generally care much if someone else's flight has been delayed.


Last week I searched Google for more information about a specific compiler error code. Later that day Google Now showed me flight info for some flight that happened to have the same code.


I find that fantastically useful when I'm traveling to meet family somewhere or when family is traveling to meet me. You can handily swipe them away if they are not useful though.


You really shouldn't treat your family that way.


It doesn't make any sense to do that, for exactly the reasons you mention. You'd gain little value and basically ruin all public trust in you.

Luckily the guy who said that is from Télécom ParisTech, i.e. he was completely speculating.

Public posts from Google+ and YouTube are fine, though.


Maciej Cegłowski: The Internet with a Human Face http://idlewords.com/bt14.htm

One of the best discussions bar none of this issue I've seen.


Why should Google care what you are OK with after they already have all your data? If you don't want them to be able to engage in activities like this, then don't give them your data in the first place.


It's still my data. Post office employees are not allowed to read my letters, even though I have entrusted the letters to their care.

There are very good reasons why we, as a society, have agreed to disallow many activities that are physically possible. There's a good case to be made that such a rule should be explicitly added where organizations are entrusted with private data.


I'd love a world in which email providers would be held to the same standards as the post office.

But that's not the world we currently live in.

The only thing that sets a limit on what Google can do with your data is the amount of data you give them. They also have terms of service and privacy policies, but these can change over time and/or be reinterpreted in creative new ways to enable whatever it is they want to do next.


Well yes. The main part of your comment is descriptive: you are describing the state of the world as it is today.

However, there is a normative side to the debate as well. This is what I (and you in your first line) explicitly referred to. This side is about asking what state of the world is desirable. It is perfectly legitimate and good to ask this question, so that we might hopefully act upon the answer once it has been found. That is how progress is made in the world.


Legal restrictions do apply: for EU citizens, quite a few rights cannot be taken away by the 'terms & conditions' of online companies.


What standard? The US Post Office scans the mail it handles.


The concept of data ownership doesn't make much sense. Data is infinitely copyable and infinitely inferrable, thanks to the magic of causality (at Google scale, if I couldn't read something from your mail, I could probably correlate it out of your search queries, web browsing patterns and location history). The discussion should be about the ways to obtain a particular piece of information and the ways to use it.

The perfect example to illustrate this is actually what waterlesscloud wrote downthread:

> If I leave some loose hairs on an airline seat, does the airline now own my dna?

Do you own your DNA? What the hell would that even mean?


> Do you own your DNA?

Yes. Intellectual property, clean and simple. If someone can make a buck off my DNA, then I get my cut. Prevents exploitation such as this:

http://en.wikipedia.org/wiki/Henrietta_Lacks

"Neither Lacks nor her family gave her physician permission to harvest the cells. At that time, permission was neither required nor customarily sought. The cells were later commercialized. In the 1980s, family medical records were published without family consent. This issue and Mrs. Lacks' situation was brought up in the Supreme Court of California case of Moore v. Regents of the University of California. On July 9, 1990, the court ruled that a person's discarded tissue and cells are not their property and can be commercialized."


No, the "intellectual property" term is absurd, not clean and simple. You cannot own a piece of data like you'd own a physical object. I'll pass the mike to RMS here.

http://www.gnu.org/philosophy/not-ipr.en.html

Also, you call developing a polio vaccine exploitation? As far as I can tell from a cursory reading of that article, this "exploitation" was hugely beneficial to society.


> this "exploitation" was hugely beneficial to society.

Which explains why so many people were reluctant to acknowledge the source.


It most certainly is not your data. It's on their servers, in their apps, and running through their network. They decide what they do with it, how long they keep it, and whether or not you even have access to it. Comparing them to the post office doesn't really make sense either, considering that's a public service, and one a depressingly large number of people want to get rid of. Google is a for-profit company and their data is how they make money.

If you don't like that reality then don't use their service. It really is that simple.


Nope. My data is me. The totality of my data is literally my identity. If you know everything about me, you can steal my identity and assume my living role.

I don't like the reality of the US War Machine killing innocents simply to enrich crony war profiteers. By your reasoning, I should stop paying taxes too.


> By your reasoning, I should stop paying taxes too.

You can, depending on how principled you are about things like this. You'd still need to give up your American citizenship; otherwise it doesn't matter where on the planet you live.


I'm unconvinced: what about UPS? They're not a public service and are a for-profit company.


You can phrase that even stronger. In many countries, the post office is privatized. Examples:

http://en.m.wikipedia.org/wiki/Deutsche_Post

http://en.m.wikipedia.org/wiki/KPN


[deleted]


No it doesn't. That privacy policy is pretty clear that they won't share personal information without my consent, and I'm quite certain a court would agree.


Sure it makes sense. The post office could do the same nasty things with your mail. But they don't, even though they could. Just because you can doesn't mean you should.


If I leave some loose hairs on an airline seat, does the airline now own my dna?


The very concept of "owning data" is nonsense, as your example clearly shows.


They certainly can scan it if you leave it behind.


No, but they have access to it.


The USPS is only partially public. It is mostly a private company with a government-influenced charter, which is the same basic structure as every corporation.


Gmail users permit Google to analyze their email for its own purposes.


Emails are all about sender and receiver. Often only one participant is using a Gmail address. Using email text for ad purposes is one thing; analyzing email text where Google only acts as a carrier, and using it for A.I. purposes, is a whole different thing.


None of that makes any difference to Google. Their view is that people benefit from what they do.


Email providers, and all other providers, should eventually (hopefully soon) have access only to encrypted data, to remove the temptation to use this valuable data.


I should be able to use services from companies based on some terms and expect those terms to be respected.

You're basically saying I shouldn't expect any sort of fair treatment or rights from any service provider on the Internet. I don't want to play on your Internet.


Jacques is describing the Internet as it currently is. If you don't want to play on it, you need to do something about it or stop playing.


Google does respect its terms. Their terms of service let them do whatever they want with your information, and they can update their terms at any time.


Instead of downvoting... maybe someone could point out where and how exactly Google violated their own terms?


Google can (and does) change its terms without notice.

It's modestly better about this than many other SaaS / PaaS providers, but not by much.

I'm having a conversation at this moment with the chief architect of G+ over the G+/YouTube Anschluss in which the two services were integrated. I had separate accounts on each prior to this, repeatedly refused to combine accounts, and yet found them combined as of last November.

Worse: individual users have little or no recourse against such actions.

As for Gmail, as has been pointed out, parties not using Google directly have their private correspondence entered into Google's systems. And not just when emailing Gmail addresses, but many domains for which email is handled via Gmail.

Similar arguments could be made for many other online service providers as well. I don't consider Google to be significantly different from many of these, either for better or worse. But they're certainly a massive and major part of the problem, particularly for their size and scope.

Bruce Schneier and Eben Moglen have made this point quite well, particularly in their December, 2013 Columbia Law School talk, and Schneier's April, 2014, Stanford Law School lecture.

Maciej Cegłowski, "The Internet with a Human Face", makes the case far better yet. http://idlewords.com/bt14.htm


The actual corpus that is worth using is the book corpus. While Google can't provide public access to all of the books it has scanned, there is no restriction on them using the data in the books to feed this project. Given the amount of information they have scanned from libraries and elsewhere, that is a much better source.


Is anyone doing the same for the books scanned by Archive.org?


Reminds me of Doctorow's Scroogled. http://craphound.com/scroogled.html

The funny thing is that Doctorow makes references to "just metadata" years before it became a public issue. However, this goes beyond metadata, and will eventually contain facts about people, not just tangential stuff.

"This isn't P.I.I."—Personally Identifying Information, the toxic smog of the information age—"It's just metadata. So it's only slightly evil."


One quibble: metadata is facts about people. Logging metadata isn't a slippery slope toward also catching facts. It's a problem from the jump.

"Joe goes to the gym three times a week" is a fact.

"Joe's network activity originates from a gym on the following schedule" is not only at least an equivalent fact, in practice it's far superior to the simple case. It can give you subtleties [1], it's less susceptible to subterfuge [2], it gives you actionable evidence of specific occurrences, etc.

Consider: the CIA doesn't use metadata to target Hellfire missiles because it's less identifying than actual data. They use it because it's far better.

[1] Joe never goes to the gym on Saturday. Joe goes to the gym more during the spring than the winter. Joe almost never misses a day when Sally is at the gym. Joe and Sally nearly always leave at the same time.

[2] It's trivial for someone to say they go to the gym on a schedule they don't. It's not even too difficult to get a second or third party to fudge, embellish or outright lie on their behalf. It's much more difficult to get a second or third party to help you make your device convincingly follow the claimed routine, without you creating any conflicting metadata that gives up the ruse.
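
To make footnote [1] concrete, here is a rough sketch of how little it takes to pull those patterns out of a connection log. Everything in it is made up for illustration: the `LOG` records, the names, and the helper functions are assumptions, not anything a real provider is known to run. The point is only that the log holds no message content at all, just who connected from where and when.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical connection log: (person, place, ISO timestamp).
# No message content, only "metadata".
LOG = [
    ("joe", "gym", "2014-08-18T18:05"),
    ("sally", "gym", "2014-08-18T18:10"),
    ("joe", "office", "2014-08-19T09:00"),
    ("joe", "gym", "2014-08-20T18:02"),
    ("sally", "gym", "2014-08-20T18:07"),
    ("joe", "gym", "2014-08-22T18:01"),
    ("sally", "gym", "2014-08-22T18:15"),
    ("sally", "gym", "2014-08-23T10:00"),
]


def visit_days(log, person, place):
    """Return the set of calendar dates on which person was seen at place."""
    return {
        datetime.fromisoformat(ts).date()
        for who, where, ts in log
        if who == person and where == place
    }


def weekday_profile(log, person, place):
    """Count distinct visit days per weekday name."""
    return Counter(WEEKDAYS[d.weekday()] for d in visit_days(log, person, place))


def co_presence(log, a, b, place):
    """Fraction of b's visit days at place on which a was also seen there."""
    days_b = visit_days(log, b, place)
    if not days_b:
        return 0.0
    return len(visit_days(log, a, place) & days_b) / len(days_b)


if __name__ == "__main__":
    print(weekday_profile(LOG, "joe", "gym"))
    print(co_presence(LOG, "joe", "sally", "gym"))
```

With a few months of such records instead of one week, the same two helpers would answer "never on Saturday" and "almost never misses a day when Sally is there" without anyone reading a single email.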


This is important and should be a top-level post. I think many people miss this in discussions about privacy, and this is the reason I believe privacy is dead. Everything you do is metadata about everything else you do, and about everything else the people around you do. You can infer any piece of "data" you want given enough (meta)data sources and computing power.

Thus the only way we could keep privacy would be to roll back the last 50 years of technological progress, and that's why I'm starting to entertain the thought that we (as a society) should drop the concept entirely and tackle the change head-on, instead of being dragged there by force by the ongoing progress of technology.



