Scraping public websites likely doesn’t violate the Computer Fraud and Abuse Act, court holds

Federal appeals court ruling has potentially major implications for data journalists, favorable for newsgathering

A federal appeals court last week issued a “hugely important” decision with potentially major implications for data journalists when it held that using computer programs to collect publicly available information from the internet — or “scraping” — likely does not violate the Computer Fraud and Abuse Act, the main federal computer crime statute.

On Sept. 9, a three-judge panel of the United States Court of Appeals for the Ninth Circuit issued its opinion in hiQ Labs v. LinkedIn. HiQ is a data analytics company whose business model largely relies on using “bots” to “scrape” data from publicly available LinkedIn profiles. In May 2017, LinkedIn sent hiQ a cease and desist letter, noting that the company’s scraping violated LinkedIn’s user agreement. LinkedIn stated that further scraping of its data would violate California state law, federal copyright law, and the CFAA, which is codified at 18 U.S.C. § 1030.

Two weeks later, hiQ filed a lawsuit in the U.S. District Court for the Northern District of California seeking an injunction against LinkedIn and asking the court to declare that its conduct was legal. The district court granted the preliminary injunction in favor of hiQ, but LinkedIn appealed the decision to the Ninth Circuit.

Last week, the Ninth Circuit panel affirmed the district court’s grant of a preliminary injunction, holding that the district court did not abuse its discretion in granting it. The panel focused on the fact that the data hiQ scraped was already public information, “available to anyone with a web browser.” The court said the scraping was thus likely not a violation of the CFAA. The reasoning behind this is noteworthy for journalists for multiple reasons.

The Impact of the CFAA

Famously inspired by the 1983 Matthew Broderick hacker film “WarGames,” the CFAA has both criminal and civil provisions. For data journalists, who scrape websites all the time to build their datasets, the CFAA presents a growing legal concern: The government has used the law to aggressively prosecute internet activists and hackers, and private companies have used the civil cause of action to threaten legal action for scraping.

Such threats are possible because the CFAA does not include a definition of “hacking.” Rather, it refers to accessing a site “without authorization” or in a manner that exceeds authorization. Some courts have, in turn, interpreted “without authorization” to cover access to a site in a way that violates a website’s terms of service. Read that way, the CFAA transforms a violation of a website’s terms of service into a crime or the basis for a lawsuit.

As Orin S. Kerr, a professor at the University of California, Berkeley School of Law, notes, since Congress passed the law in 1986, the federal appeals courts have split in interpreting “without authorization.” Some circuit courts have broadly looked to whether scraping violates a website’s terms of use or service, while other courts have more narrowly interpreted it to require the technical circumvention of some kind of code-based access restriction — what most people would consider “hacking.”

These varying interpretations of “unauthorized access” have led to concerns among press freedom and open internet advocates. As one of the authors of this piece previously wrote for the Reporters Committee in 2018, “the law ballooned to cover an array of activity beyond what we would consider hacking.” The Reporters Committee for Freedom of the Press has explored the implications of the CFAA for press freedom broadly, publishing an analysis of the CFAA charge against WikiLeaks founder Julian Assange and a post in support of a proposed safe harbor for journalists who want to scrape data from Facebook.

Why the Ninth Circuit decision is significant

The Ninth Circuit’s own CFAA precedent is admittedly complicated, but the court’s reasoning in hiQ is a promising development for journalists and scrapers, mainly because the court seemed to narrow the interpretation of “without authorization.”

Indeed, open internet advocates and press freedom groups celebrated the ruling in United States v. Nosal (Nosal I), a case from 2012 in which the Ninth Circuit adopted a very narrow interpretation of “without authorization” to focus on the original intent of the law. Following a reversal and remand by a Ninth Circuit panel, the court considered the issue in an en banc opinion, which means the case was reheard by a larger panel of the court’s judges. The court noted that the government’s proposed construction of the CFAA, which would have covered merely downloading information one had authorization to access with a bad purpose, would have over-criminalized common types of computer activity. Such an interpretation would expand the CFAA’s scope “… far beyond computer hacking to criminalize any unauthorized use of information obtained from a computer.”

In United States v. Nosal (Nosal II), however, a divided panel of the Ninth Circuit held that accessing a computer “without authorization” was not limited to what is traditionally thought of as “hacking.” Instead, Nosal, a former employee of executive search firm Korn/Ferry, was held to have violated the CFAA when he accessed a “protected computer” using the valid login credentials of a current employee to access Korn/Ferry’s database. In other words, the Ninth Circuit interpreted the CFAA to cover password sharing, which prompted Judge Stephen Reinhardt, in dissent, to write, “… despite the majority’s attempt to construe Nosal I as only applicable to ‘exceeds authorized access,’ the case’s central lesson that the CFAA should not be interpreted to criminalize the ordinary conduct of millions of citizens applies equally strongly here.” Reinhardt added, “… I would hold that consensual password sharing is not the kind of ‘hacking’ covered by the CFAA.”

Then, in Facebook, Inc. v. Power Ventures, Inc., the Ninth Circuit held that Power Ventures, a company that operated a domain that allowed users to aggregate all of their social networking information in one place, had violated the CFAA by copying Facebook data in violation of that platform’s terms of service. Even though Power Ventures had consent to access Facebook users’ accounts with the users’ log-in information, Facebook sent Power Ventures an individualized cease and desist letter, putting it on notice that it was accessing Facebook data “without authorization.”

Concerns remain that the Power Ventures holding could extend CFAA liability to scrapers who receive some kind of specific notice that their conduct violates a website’s terms of service.

The court could have simply followed this thinking in hiQ. Instead, the court differentiated between scraping data available on public LinkedIn profiles and scraping data shielded behind log-in information.

First, the court explained that the information the defendant accessed in Nosal II required log-in information, and it was information that “no one could access without authorization.” Second, the court noted that in Power Ventures, Facebook “tried to limit and control access to its website” by requiring users to sign in with a username and password. Finally, the court reiterated the strong language from Nosal I, noting, “As we explained in Nosal I, we therefore favor a narrow interpretation of the CFAA’s ‘without authorization’ provision so as not to turn a criminal hacking statute into a ‘sweeping Internet-policing mandate.’” The decision even appeared to go a step further by discussing the legislative history of the law and saying, “The CFAA was enacted to prevent intentional intrusion onto someone else’s computer — specifically, computer hacking.”

Here, hiQ was scraping data that did not require logging in, and its behavior did not amount to what the court considered to be “breaking and entering.”

“It is likely that when a computer network generally permits access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA,” the court said. Under that reasoning, accessing this “publicly available data” is unlikely to give rise to liability under the CFAA.

This reasoning may represent a significant shift in understanding the meaning of “without authorization,” and an important constraint on the holdings in Nosal II and Power Ventures. Further, the decision calls into doubt the government’s reasoning in a troubling case from outside of the Ninth Circuit, United States v. Auernheimer. In that prosecution, known as the “weev” case, the government charged Andrew “weev” Auernheimer in New Jersey federal court with violating the CFAA. In 2010, Auernheimer’s friend Daniel Spitler discovered that AT&T had left the email addresses of iPad owners exposed on the internet. Spitler wrote a script, the “account slurper,” that collected roughly 114,000 email addresses from AT&T’s servers. Auernheimer distributed the list of email addresses to various media organizations to publicize the security flaw. Both were charged under the CFAA, but Spitler accepted a plea deal and Auernheimer was convicted in March 2013.

Auernheimer’s conviction was later overturned on appeal due to problems with venue. However, the decision by the U.S. Court of Appeals for the Third Circuit largely avoided addressing an important part of the case: that the data was not hidden behind a password. Instead, the court dropped the discussion into a footnote, stating, “… the Government needed to prove that Auernheimer or Spitler circumvented a code- or password-based barrier to access.” The footnote went on: “Although we need not resolve whether Auernheimer’s conduct involved such a breach, no evidence was advanced at trial that the account slurper ever breached any password gate or other code-based barrier. The account slurper simply accessed the publicly facing portion of the login screen and scraped information that AT&T unintentionally published.”

If, under hiQ, the test is whether the information is password-gated or otherwise restricted, versus open to all on the public internet, that seems to be a workable bright line in an area of law sorely lacking in them. The weev case is just one of many examples demonstrating how parties have weaponized the CFAA by sweeping up non-hacking behavior. Such bright lines are thus crucial in computer crime law, which can trench on free speech and press rights. This is particularly true with respect to data journalism, which often relies on automation to collect and collate large amounts of information.

Of course, LinkedIn may choose to fight this decision by seeking en banc review before a larger panel of Ninth Circuit judges or by continuing to litigate the merits (this decision merely affirmed a preliminary injunction against LinkedIn). And there continues to be a circuit split over the meaning of “without authorization” that many expect the Supreme Court will have to resolve. Until then, however, we can call this particular decision a favorable one for newsgathering and press freedom.

The Reporters Committee regularly files friend-of-the-court briefs and its attorneys represent journalists and news organizations pro bono in court cases that involve First Amendment freedoms, the newsgathering rights of journalists and access to public information. Stay up-to-date on our work by signing up for our monthly newsletter and following us on Twitter or Instagram.
