

What is NOT Hacking?

Law, Professional, Technology

A federal court has decided that some things are, in fact, not hacking.

Here is the Healthcare Advocates decision. We'll take the background on the case from the decision itself:

Healthcare Advocates is a patient advocacy organization that assists its members in their dealings with health care providers. The Harding firm is a boutique law firm located in suburban Philadelphia that focuses its practice on intellectual property law. Healthcare Advocates was the plaintiff in a lawsuit filed in June of 2003 by its founder and president Kevin Flynn in which he alleged that a competitor of the company infringed trademarks and misappropriated trade secrets belonging to Healthcare Advocates. The Harding firm represented the defendants in that lawsuit, an action which was dismissed by this Court on summary judgment. Flynn v. Health Advocate, Inc., 2005 WL 288989, No. 03-3764 (E.D. Pa. Feb. 8, 2005) (hereinafter the “Underlying Litigation”).

The present civil action arises out of events that occurred in the pre-discovery phase of the Underlying Litigation. The facts of this case are relatively simple. Healthcare Advocates commenced the Underlying Litigation by filing a complaint on June 23, 2003. After receiving the complaint, the Harding firm began investigating the facts behind the allegations contained therein. The investigation led the Harding firm to search on the Internet for information about Healthcare Advocates. On July 9, 2003, and July 14, 2003, employees of the Harding firm accessed a website operated by the Internet Archive (www.archive.org), and viewed archived screenshots of Healthcare Advocates’ website (www.healthcareadvocates.com) via a tool contained on Internet Archive’s website called the Wayback Machine. The Wayback Machine allowed the Harding firm to see what Healthcare Advocates’ public website looked like prior to the date the complaint was filed in the Underlying Litigation.

Viewing the content that Healthcare Advocates had included on its public website in the past was very useful to the Harding firm in assessing the merits of the trademark infringement and trade secret misappropriation claims brought against their clients. The Harding firm also printed copies of each archived screenshot of Healthcare Advocates’ public website that they viewed via the Wayback Machine. The images were used during the course of the Underlying Litigation. The Harding firm did not actively save any of the screenshots they viewed onto their computer hard drives.

In this civil action, Healthcare Advocates alleges that the Harding firm’s use of the Wayback Machine to obtain archived screenshots constituted “hacking.” While the word hacking is not defined in the Complaint, Healthcare Advocates claims that the Harding firm manipulated the Wayback Machine on July 9, 2003, and July 14, 2003, in a way that rendered useless a protective measure that it had employed on its website. The protective measure at issue was a robots.txt file. Healthcare Advocates placed this file on its website as a means of preventing the public from accessing archived screenshots of www.healthcareadvocates.com that were present on Internet Archives’ database. Healthcare Advocates believes that the robots.txt file acted like a digital padlock. Since the Harding firm did not have the “key,” Healthcare Advocates argues that they could only have obtained these protected images by breaking the robots.txt “lock.”

By way of background, the Internet Archive is a nonprofit organization that has created an online library of digital media in an effort to preserve digital content for future reference. Its digital database is equivalent to a paper library, but is filled with digital media like websites instead of books. The library includes a collection of chronological records of various websites which Internet Archive makes available at no cost to the public via the Wayback Machine. The library’s records include more than 85 billion screenshots of web pages which are stored on a computer database in California. Internet Archive’s database provides users with the ability to study websites that may have been changed or no longer exist.

The chronological records are compiled by routinely taking screenshots of websites as they exist on various days. Internet Archive collects images through a process called crawling.

[...]

Internet Archive’s adherence to the robots exclusion protocol provided two benefits to website owners in practice. First, for those websites that did not have a robots.txt file present at the website’s inception, but included it later, Internet Archive would remove the public’s ability to access any already archived screenshots stored in its database. The archived images were not deleted, but were instead rendered inaccessible to the general public. Second, the crawler employed by Internet Archive would be instructed not to gather screenshots of that website in the future. Those were the terms of the exclusion policy in effect when Healthcare Advocates placed a robots.txt file on its website.

Healthcare Advocates had not included a robots.txt file on its website prior to July 7, 2003. Consequently, Internet Archive’s database included screenshots of Healthcare Advocates’ website. Kevin Flynn, president of the company, remembered first placing a robots.txt file on the website on either July 7, 2003, or July 8, 2003. He is unsure of the exact date. Once the file was included, Mr. Flynn expected that the public would be denied access to any archived images of Healthcare Advocates’ website stored in Internet Archive’s database in accord with the exclusion policy. Normally, the public would have been denied access. However, on the dates in question Internet Archive’s servers malfunctioned, and provided Healthcare Advocates archived images to those who requested them.

The images were blocked through an automated process. When requests were made via the Wayback Machine, the servers automatically checked to see if a robots.txt file existed on the website which was the origination of the archived images being requested. If a robots.txt file was present, then the Wayback Machine would return a message stating that the archived images were blocked by the website owner via a robots.txt file. Internet Archive blocked the archived screenshots on an all or nothing basis. If a website owner blocked any portion of his website, then public access was denied for all web pages contained in the database. But, when the Harding firm used the Wayback Machine on July 9, 2003, and July 14, 2003, the servers which checked for robots.txt files and blocked the images were malfunctioning. Internet Archive’s servers did not respect the robots.txt file on Healthcare Advocates’ live website. Thus, the Harding firm was able to view and print copies of archived screenshots of Healthcare Advocates’ website stored in Internet Archive’s database.
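To make the all-or-nothing rule concrete, here is a minimal Python sketch of the request-time check the decision describes. It is not the Internet Archive's code. The function name `wayback_lookup`, the `checker_working` flag standing in for the malfunctioning servers, and the `ia_archiver` user-agent token are assumptions for illustration, and "blocked any portion" is approximated by testing each archived path against the live robots.txt with Python's standard `urllib.robotparser`. Note that nothing is deleted: the archived paths stay in storage and are simply not served.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the user-agent token the archive looks for in robots.txt.
ARCHIVER_AGENT = "ia_archiver"
BLOCK_MESSAGE = "Blocked by site owner via robots.txt"


def wayback_lookup(site, archived_paths, live_robots_txt, checker_working=True):
    """Hypothetical sketch: serve archived pages unless the live robots.txt
    excludes the archiver from any of them (all-or-nothing)."""
    if checker_working and live_robots_txt is not None:
        parser = RobotFileParser()
        parser.parse(live_robots_txt.splitlines())
        # If any archived page is excluded, deny access to every page.
        if any(not parser.can_fetch(ARCHIVER_AGENT, site + path)
               for path in archived_paths):
            return BLOCK_MESSAGE
    # No robots.txt, no exclusion, or the checking servers are down:
    # the stored copies are handed out as usual.
    return list(archived_paths)


if __name__ == "__main__":
    robots = "User-agent: ia_archiver\nDisallow: /\n"  # assumed file contents
    site = "http://www.healthcareadvocates.com"
    pages = ["/", "/about.html"]  # hypothetical archived paths
    print(wayback_lookup(site, pages, robots))
    print(wayback_lookup(site, pages, robots, checker_working=False))
```

With `checker_working=False`, the same robots.txt that would have produced the block message lets every archived page through, which is essentially the situation the decision describes for July 9 and July 14, 2003.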

Simply put, the Internet Archive's (IA's) servers failed to check the robots.txt file when IA's own policy said they should have, and as a result, a law firm was able to view archived webpages that Healthcare Advocates didn't want anyone to see. Healthcare Advocates thinks the law firm used some form of Internet magic to make IA disregard the robots.txt file. The judge disagrees.

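There's more to this case than the Wayback Machine, though: Healthcare Advocates also argued that the Harding firm should have preserved the temporary cache files on its computers. Read on: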

Healthcare Advocates reads this answer to state that the Harding firm anticipated that they would be sued for violating a statute enacted to combat “hacking” when they accessed a public website via a web browser. Healthcare Advocates’ inference that the Harding firm immediately knew that its actions in using a public website to obtain archived screenshots of another public website would open them up to liability under the DMCA is unreasonable. This answer shows that the Harding firm knew these archived images were relevant to the Underlying Litigation, and that they had a duty to preserve any copies they printed. What the Harding firm should have anticipated was that the images they copied would be relevant, which they did and saved accordingly. (See generally Pl’s Mot. Partial Summ. J. Ex. B, Bonini Dep. at 250-51.) The Harding firm had no reason to anticipate that using a public website to view images of another public website would subject them to a civil lawsuit containing allegations of hacking.

Healthcare Advocates further argues that since the Harding firm clearly knew that the cache files were relevant, they should have immediately removed the computers from further use for fear that these temporary files might be lost. Healthcare Advocates believes that the failure to take this measure simply “shocks the conscience.” (Pl’s Br. Mot. Partial Summ. J. at 23.) As stated above, this Court has not seen any evidence showing that the Harding firm knew or should have known that a lawsuit under the DMCA was likely, or that temporary cache files would be sought. Thus, the failure to immediately remove computers that the firm used everyday, when they had no reason to believe that their actions would subject them to a lawsuit for “hacking,” is not an action that shocks the conscience.

The Court goes on to explain that it is the computer itself that deletes files from the temporary cache, so the law firm can't really be held responsible for a computer doing what computers do. As the Court puts it: “The Harding firm did not purposefully destroy evidence. To impose a sanction on the Harding firm for not preserving temporary files that were not requested, and might have been lost the second another website was visited, does not seem to be a proper situation for an adverse spoilation inference.”

Now, the biggest thing you're going to hear about this case in the coming days (if you hear anything at all) is what Healthcare Advocates' attorney had to say: “We are pleased that, as a matter of first impression, a robot.txt file qualifies as a security measure that controls access.”

That’s wrong. It is just plain wrong. Here’s what the Court actually says:

The measure at issue in this case is the robots.txt protocol. No court has found that a robots.txt file universally constitutes a “technological measure effectively controll[ing] access” under the DMCA. The protocol by itself is not analogous to digital password protection or encryption. However, in this case, when all systems involved in processing requests via the Wayback Machine are operating properly, the placement of a correct robots.txt file on Healthcare Advocates’ current website does work to block users from accessing archived screenshots of its website. (Pl’s Mot. Partial Summ. J. Ex. F, Expert Report of Edward Felton at 10). The only way to gain access would be for Healthcare Advocates to remove the robots.txt file from its website, and only the website owner can remove the robots.txt file. Thus, in this situation, the robots.txt file qualifies as a technological measure effectively controlling access to the archived copyrighted images of Healthcare Advocates. This finding should not be interpreted as a finding that a robots.txt file universally qualifies as a technological measure that controls access to copyrighted works under the DMCA.

(emphasis added).

In other words, in the context of the Wayback Machine, a robots.txt file qualified as a technological measure controlling access under the DMCA. The Court was explicit that this does NOT make robots.txt such a measure universally.
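The Court's remark that the protocol "by itself is not analogous to digital password protection or encryption" comes down to who does the enforcing. The minimal Python sketch below contrasts the two. Both functions, the hypothetical secret, and the blanket `Disallow: /` file are assumptions for illustration only, not anything taken from the case; the robots.txt side uses Python's standard `urllib.robotparser`.

```python
import hmac
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay away from everything.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def server_with_password(supplied, secret):
    """Password-style control: the server decides, so a client cannot opt out."""
    if supplied is None or not hmac.compare_digest(supplied, secret):
        return "401 Unauthorized"
    return "200 OK"


def client_following_robots(agent, url, honors_robots):
    """robots.txt-style control: the *client* decides whether to obey the file."""
    if not honors_robots:
        # Nothing in the protocol itself stops the request.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/"
    print(server_with_password(None, "s3cret"))                  # refused by the server
    print(client_following_robots("ia_archiver", url, True))     # polite client stays out
    print(client_following_robots("ia_archiver", url, False))    # impolite client proceeds
```

The robots.txt "lock" holds only when the party reading the file (here, the Wayback Machine's servers) chooses to honor it, which is why the Court tied its finding to this particular setup rather than to robots.txt in general.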

So, what do we learn from this? Well…

  1. Using the Wayback Machine isn’t hacking.
  2. Unless you’re specifically asked to keep them, temporary cache files get to remain temporary.
  3. Robots.txt both is and is not a measure controlling access under the DMCA: it was one here, with the Wayback Machine, but the Court refused to say it always is.

Hat tip: Law.com

MickC @ August 16, 2007

