September 10, 2019

Victory! Ruling in hiQ v. LinkedIn Protects Scraping of Public Data

In a long-awaited decision in hiQ Labs, Inc. v. LinkedIn Corp., the Ninth Circuit Court of Appeals ruled that automated scraping of publicly accessible data likely does not violate the Computer Fraud and Abuse Act (CFAA). This is an important clarification of the CFAA's scope, which should provide some relief to the wide variety of researchers, journalists, and companies who have had reason to fear cease and desist letters threatening liability simply for accessing publicly available information in a way that publishers object to. It's a major win for research and innovation, which will hopefully pave the way for courts and Congress to further curb abuse of the CFAA.

The Trouble with the CFAA

Passed in 1986, the CFAA is the federal anti-hacking law, which imposes both criminal and civil liability on anyone who accesses a computer connected to the Internet "without authorization" or who "exceeds authorized access." Because the statute does not define "without authorization," courts around the country have found it notoriously difficult to interpret the phrase in the context of modern Internet usage. The hiQ case is just the latest in a series of high-profile Ninth Circuit decisions about the CFAA, in which the appeals court has too often vacillated between limiting the CFAA to its original purpose and adopting more expansive interpretations that risk criminalizing widespread, innocuous online behavior.

A key question in many early cases was whether companies and websites could enforce their computer use policies, like terms of service or corporate computer policies, through the CFAA's concept of unauthorized access. In 2012, the Ninth Circuit issued a strong ruling in United States v. Nosal (Nosal I) explaining that it refused to turn the CFAA "into a sweeping Internet-policing mandate." The court instead chose to "maintain[] the CFAA's focus on hacking," holding that violating a company's or website's terms of use cannot give rise to liability. Otherwise, nearly anyone who used the Internet would face potential criminal liability, for example by violating a social media site's terms of service that prohibited even lying on a user profile.

Unfortunately, the Ninth Circuit muddied its own clear rule in two subsequent decisions, both involving password sharing: a second decision in the Nosal case (Nosal II) and Facebook v. Power Ventures. In Nosal II, the court found that "without authorization" is not limited to the circumvention of technical access mechanisms, like password barriers, and concluded that using someone else's valid login credentials may violate the statute. Then, in Power Ventures, the court found that a data aggregator that had consent to access Facebook users' accounts using their passwords nevertheless violated the CFAA by continuing to scrape data after Facebook sent a cease and desist letter and blocked one of Power Ventures' IP addresses.

The Dispute Between hiQ and LinkedIn

EFF warned that the Ninth Circuit's misguided decisions in Nosal II and Power Ventures would enable further abuse of the CFAA, and just weeks later LinkedIn provided an example of why.

hiQ Labs' business model involves scraping publicly available LinkedIn data to create corporate analytics tools that could determine when employees might leave for another company, or what trainings companies should invest in for their employees. Perhaps because it intended to develop its own products that would compete with hiQ, LinkedIn served a cease and desist letter stating that it would implement technical measures to stop hiQ from accessing the website at all, and relying on the Power Ventures case to argue that any further access to this public information would violate the CFAA. Rather than waiting to be sued, hiQ filed suit itself and obtained a preliminary injunction from the district court, which found that hiQ was "likely to succeed" on its claims and held that automated access to public information is likely not a violation of the CFAA. (The court used conditional "likely" language because a preliminary injunction turns on how likely a party is to succeed once the merits are fully decided.)

On appeal, EFF filed an amicus brief, along with the search engine DuckDuckGo and the Internet Archive, urging the court to recognize that scraping is a commonplace technique that supports research in the public interest, among other beneficial uses. As a technical matter, web scraping is simply machine-automated web browsing: it accesses and records the same information that a human visitor to the site could gather manually. So-called good bots allow researchers to investigate racial discrimination on Airbnb, journalists to reveal price disparities on Amazon, and search engines like DuckDuckGo and Google to return useful results.

Thankfully, the Ninth Circuit recognized how damaging it would be to extend its prior rulings to publicly available information, such as the LinkedIn profiles scraped by hiQ. As the court rightly pointed out, authorization commonly implies that something is not generally available and that access requires permission of some sort, whereas here, "the default is free access." Thus, using automated scripts to access publicly available data is not the sort of "breaking and entering" into computers that the Computer Fraud and Abuse Act is intended to police. The ruling affirms the district court's grant of a preliminary injunction, but the case may still proceed to further stages of litigation.

This is an extremely important holding that limits the mistakes in the Ninth Circuit's earlier rulings in Nosal II and Power Ventures. The court says that those earlier cases control in situations where authorization is generally required (because the data is not public) and the website owner either revokes that authorization or never gives it in the first place. But the court relies on a very narrow interpretation of public information that may not hold up in practice. Once someone logs on to Facebook, for example, a wealth of "private" information is available to every user of the service, making this information essentially public. And, as we pointed out in those earlier cases, if a user grants a third party access, the third party has a form of authorization, even if the website itself would prefer that the third party not have access. In any case, if authorization turns on whether someone has to log in to a free service, websites have an incentive to shield public information behind a login page.

Next Steps

Too often, the CFAA is used to chill speech and to paint benign and even competitive uses of technology as malicious. While this decision is an important step toward limiting the use of the CFAA to intimidate researchers with the legalese of cease and desist letters, the Ninth Circuit sadly left the door open to other claims, such as trespass to chattels or even copyright infringement, that might allow actors like LinkedIn to limit competition with their products. And even with this ruling, the CFAA is subject to multiple conflicting interpretations across the federal circuits, making it likely that the Supreme Court will eventually be forced to resolve the meaning of key terms like "without authorization." Meanwhile, EFF will be on the lookout for more opportunities to protect research and innovation, and we'll continue to protect security researchers as part of our Coders' Rights Project.


☛ You can read Camille Fischer's full original article here.
