[Ed Note: The following post has been authored by Shivang Agarwal, currently in his final year at NALSAR University of Law. In an interesting read, he analyzes the nature of web crawling actions and the kind of information collected to assess whether claims of copyright infringement may be brought against entities running such search engines.]
Web crawling is a process by which programs, colloquially known as ‘web spiders’ or ‘web robots’, browse the World Wide Web in a methodical and automated manner in order to index the information found on every web page they come across. Many legitimate service providers, including search engines, employ web spiders to provide up-to-date information and data to their users.
Web crawling results in the creation of an index of web pages, allowing users to send queries through a search engine and receive links to the webpages that match those queries. The index is a list of entries consisting of keywords, titles, headings, metadata etc. noted by the web crawler, together with the addresses of the webpages on which they were found. Web crawling also enables archiving of webpages, which involves storing and cataloguing large sets of webpages on servers connected to the internet and updating them periodically.
Thus, any potential contravention of the Copyright Act, 1957 (‘Copyright Act’) must be evaluated against the aforementioned uses and the nature of the information indexed, stored or cached in the process of web crawling. Under Section 14 of the Copyright Act, ‘copyright’ is defined as the exclusive right, subject to the provisions of the Copyright Act, to do or authorise any of the stipulated acts in respect of a work or any substantial part thereof. Under Section 51(a)(i), copyright is deemed to be infringed when any person, without a license granted by the owner of the copyright or the Registrar of Copyrights under the Act, does anything which the owner of the copyright has the exclusive right to do.
Firstly, it would be pertinent to discuss the copyrightability of the information or data which is stored, cached or catalogued through web crawling. Courts in India have relied heavily on US copyright jurisprudence to hold that copyright does not subsist in raw facts, data, ideas, information etc. Feist Publications Inc. v. Rural Telephone Service Co. Inc, cited with approval in Eastern Book Company v. D.B. Modak, held that facts are not copyrightable since the sine qua non of copyright is originality. “Original”, as the term is used in copyright, means that the work was created independently by the author and that it possesses at least some degree of creativity. In R.G. Anand v. Delux Films, the Supreme Court held that a mere idea cannot be the subject matter of copyright.
Therefore, a contravention of the Copyright Act would firstly depend upon the material which is collected by the web crawler. Bare facts and raw data, such as historical information, data captured by sensors, machine inputs, information pertaining to unclassified commercial transactions etc., are not copyrightable, so a web crawling action which simply collects such material does not deal with any protected work. Hence, indexation, storage or usage of such data or information in any form will not constitute a contravention of the Copyright Act. However, if the crawler caches or uses works hosted on webpages in which copyright subsists under Section 13(1) of the Copyright Act, namely original literary, dramatic, musical and artistic works, cinematograph films and sound recordings, then it may infringe the copyright in those works, depending on the nature of the web crawling being carried out.
Secondly, a contravention of the Copyright Act would largely depend on the nature of the web crawling being carried out by a company. If the scope of the web crawling activities is limited to the creation of an index which is used to provide users with the location of the webpages containing the information they require, then it should not result in a contravention of the Copyright Act. The index of a large search engine can contain billions of webpages and run to well over 100,000,000 gigabytes in size. Such an index is similar to the index at the back of a book, i.e. it has an entry for every word seen on every web page indexed. When a web page is indexed, it is added to the entries for all of the words it contains. Thus, by indexing web pages, the web crawler performs the limited role of directing users to the webpages of their choice by making the URLs of such pages available to them.
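The book-index analogy can be made concrete with a minimal sketch in Python. It is an illustration only, not a description of how any particular search engine works: the `pages` dictionary, the example URLs and the function names are all hypothetical. The point it shows is that the index maps words to page addresses and a query returns only those addresses.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it.

    `pages` is a hypothetical dict of URL -> page text, standing in
    for whatever a crawler has fetched.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            # The page is added to the entry for every word it contains.
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs of pages containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

# Hypothetical pages, for illustration only.
pages = {
    "https://example.com/a": "Copyright subsists in original works",
    "https://example.com/b": "Facts and raw data are not original works",
}
index = build_index(pages)
results = search(index, "original works")
```

Here `results` holds the addresses of both example pages, while `search(index, "copyright")` returns only the first. In neither case does the search return the text of the pages themselves.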
The key question at this juncture is how the work is being made ‘available’ to the public. Under Section 2(ff) of the Copyright Act, “communication to the public” means making any work available for being viewed by the public by means of display or diffusion, without issuing copies of the work, whether or not any member of the public actually views the work. Copyright is deemed to be infringed if any person who is not the owner of the copyright in a work communicates that work to the public without a license.
Although there are no precedents in India, in my opinion the judgment in Perfect 10 v. Amazon.Com would be pertinent. There, the US Court of Appeals for the Ninth Circuit held that merely providing HTML instructions for the location of copyrighted subject-matter does not by itself cause the copyrighted subject-matter to appear on the user’s computer screen. The HTML merely gives the address of the copyrighted subject-matter to the user’s browser. The user’s browser then interacts with the computer that stores the copyrighted subject-matter. It is this interaction that causes the subject-matter to appear on the user’s computer screen. Essentially, the web crawler will only display to the public the location and address of the webpages hosting the copyrighted work rather than the work itself. This would not amount to communication of the work to the public under Section 2(ff) read with Section 51(a)(i) of the Copyright Act, as a web crawler does not host the actual work and therefore does not make it available to be seen, heard or enjoyed by users, whether directly or by means of display or diffusion.
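The distinction between giving an address and showing the work can be illustrated with a small Python sketch. It is a simplified assumption about what a results page might contain, not a reproduction of any search engine's output: the function name `render_results` and the example title and URL are hypothetical. The generated HTML carries only a title and a hyperlink, and the user's browser must contact the host site to retrieve the work itself.

```python
from html import escape

def render_results(results):
    """Build an HTML list of links from (title, url) pairs.

    Only the page title and its address are written out; the content
    hosted at each address is never included.
    """
    items = []
    for title, url in results:
        link = f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'
        items.append(f"<li>{link}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

# Hypothetical search result, for illustration only.
page_html = render_results([("A Poem", "https://example.com/poem")])
```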
Having said that, there are other scenarios in which web crawling may amount to a contravention of the Copyright Act. If a web spider or bot, in the course of crawling through webpages, stores or caches web pages or even entire websites on servers connected to the internet, it will constitute a direct contravention of the Copyright Act under Section 51(a)(i). Such an action would amount to making copies of and storing subject-matter in which copyright subsists. The Copyright Act equates the storage of any work in any medium by electronic or other means with reproduction of the work in any material form.
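The difference between the two kinds of crawling discussed above can be sketched in Python as a single flag on a crawler's record-keeping step. This is a hypothetical illustration: `CrawlRecord`, `record_page` and `keep_copy` are invented names, and real crawlers are considerably more involved. With `keep_copy=False` only the address and title are retained. With `keep_copy=True` a full copy of the page is stored, which is the scenario described in the preceding paragraph.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional

class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

@dataclass
class CrawlRecord:
    url: str
    title: str
    cached_html: Optional[str] = None  # full copy of the page, if kept

def record_page(url, html, keep_copy=False):
    """Return what the crawler retains for one fetched page."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return CrawlRecord(
        url=url,
        title=parser.title.strip(),
        # Storing the page itself is what turns indexing into caching.
        cached_html=html if keep_copy else None,
    )

# Hypothetical page, for illustration only.
html = ("<html><head><title>A Poem</title></head>"
        "<body><p>Full text of an original poem.</p></body></html>")
index_only = record_page("https://example.com/poem", html)
cached = record_page("https://example.com/poem", html, keep_copy=True)
```

In this sketch `index_only` holds nothing beyond the URL and the title, whereas `cached.cached_html` contains the entire page, including the text of the work.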
Thus, a potential contravention of the Copyright Act would largely depend on the kind of content hosted by the websites which are crawled and on the nature of the web crawling itself. Any web crawling action concerned with the indexation and storage of bare facts or raw data is legitimate. For works which are original and presuppose creativity, infringement would depend on the nature of the web crawling action. If web crawling is limited to providing the location of webpages after matching them with users’ queries, then it should not constitute a contravention of the Copyright Act under Section 51(a)(i). However, storage or creation of copies of web pages hosting copyrighted works would invariably contravene the Copyright Act.