Knowledge for All will collect or generate, and provide public access to, large quantities of journal article metadata, including such elements as title, author, date, and journal title.  How this metadata will be collected or generated has not yet been determined, and each method may have its own legal implications (for a discussion of possible methods see Journal Article Data Collection and Creation).  There is still a basic question to be addressed, however: what content is and is not subject to copyright?

The content will originate, and potential copyright owners will reside, in many different jurisdictions, primarily Canada, the United States, and the European Union.  There is no single international copyright law, but there are international agreements under which countries respect the copyright laws of other countries.  In the context of Knowledge for All, this means that for content created in another country, we will need to abide by the copyright laws of that country that protect that content.  The copyright laws of Canada, the United States, the European Union, and the United Kingdom, and their implications for the Knowledge for All project, are discussed here.  This discussion is based on information gathered from research and writings about copyright law.  We have not yet consulted a legal expert; doing so is the recommended next step.


Canada

Under Canadian copyright law, compilations of factual data, including databases, are protected if they were independently created by the author and there was "a sufficient degree of skill, judgment and labour involved in arranging and selecting of the content in the database or other compilation" (Harris, 2000).  Interface, content, and software can all be protected by copyright, but the content or data within the compilation is still protected only as part of the compilation, not as individual elements (Moyse, 2002).

United States

The United States offers less copyright protection for databases than Canada does.  As in Canada, databases are protected as "compilations," but while Canada recognizes effort in selecting and arranging data, the US requires "creativity."  A 2003 report from the United States Copyright Office calling for new legislation to protect databases summarizes current protection as follows: "What remains is a thin layer of copyright protection for qualifying databases. In order to qualify, they must exhibit some modicum of creativity in the selection, arrangement, or coordination of the data. The protection is thin in that only the creative elements (selection, arrangement, or coordination of data) are protected by copyright. Explanatory materials such as introductions or footnotes to databases may also be copyrightable. But in no case is the data itself (as distinguished from its selection, coordination or arrangement) copyrightable."

European Union and United Kingdom

In 1996, the European Union passed Council Directive 96/9/EC on the Legal Protection of Databases.  The Directive gives a high level of copyright protection to databases that qualify as "original" and a separate sui generis database right to databases that would not be considered "original" under copyright law (Commission of the European Communities, 2005).  Its purpose was to protect databases that may not be covered by copyright, in order to encourage the production of commercial databases in the EU (Commission of the European Communities, 2005).  "While 'original' databases require an element of 'intellectual creation,' 'non-original' databases are protected as long as there has been 'qualitatively or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents' of a database" (Commission of the European Communities, 2005).  Database right lasts for 15 years, but can be extended if the database is updated.

The UK implemented the EU Directive through the Copyright and Rights in Databases Regulations 1997.  These regulations similarly give copyright protection to databases that can be considered an original intellectual creation of an author, and special database right protection to databases where there was a substantial investment in obtaining, verifying, or presenting the data within.

Implications for Knowledge for All

In summary, under Canadian and United States copyright laws the majority of journal article metadata elements are factual data and so are not protected by copyright.  Metadata elements that would be considered original creations of their authors, such as subject terms and abstracts, are subject to copyright.  Subject terms and abstracts are discussed in more detail below.  Blogger and law professor Michael Carroll (2009) addresses metadata specifically under US law, writing: "Under these principles, metadata is copyrightable only if it reflects an author’s original expression. For example, a collection of simple bibliographic metadata with fields named “author,” “title,” “date of publication,” would not be sufficiently original to be copyrightable."  Under UK law, however, factual data in a database could be protected by database right or copyright.  In their research on the copyright status of metadata in the UK, Gadd, Oppenheim, and Probets (2004) note that under UK copyright law, metadata “is probably protected by copyright... [but] the key word here is 'probably.'”  They suggest that an individual record could be considered a "compilation" of data, but it is questionable whether this would be considered "original" and whether it would be protected.  They conclude that "the more creative effort that goes into the record (such as abstracting, indexing, etc.), the more likely it is that that record enjoys copyright."

This suggests that Knowledge for All should tread more carefully when harvesting metadata from databases located in the UK and EU, and perhaps favour databases located in Canada and the USA.  Copyright law with regard to databases and digital materials is changing and developing, and it is difficult to predict whether the future will bring increased or decreased protection.  A good way to avoid legal problems is to obtain permission to harvest data even when harvesting is explicitly permitted.

The digital age has brought considerable concern over unlawful copying and distribution of complete or partial works, such as full-text articles.  By not including full-text articles in its database, Knowledge for All may avoid many copyright problems.  Thus far there has not been as much concern about copying of metadata.  Gadd, Oppenheim, and Probets' (2004) survey of OAI data providers and service providers found that few were concerned with protecting metadata for individual records.

Subject terms and abstracts

Subject terms and abstracts are two elements of journal article metadata that would likely be considered literary works rather than factual information and so would be covered by copyright. 

The Knowledge for All system will ideally include both subject terms and abstracts.  Subject indexing of articles using controlled vocabularies will distinguish Knowledge for All from many commercial databases and from tools like Google Scholar, while abstracts are invaluable in giving users detailed information about the content of articles.

Knowledge for All contributors will select subject terms for articles from controlled vocabularies, except where the license terms of metadata from other sources allow copying of subject terms and those subject terms have been selected from an appropriate controlled vocabulary.  Thus, copyright is not a concern for subject terms.

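Knowledge for All contributors could write abstracts for articles, but doing so would take a significant amount of time that contributors may not be able to provide, so it is not a feasible option.  The alternative is to reproduce the abstracts already supplied by authors or publishers, which raises copyright questions.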

The first question to consider is who owns the copyright on author-supplied abstracts of published articles: the author or the publisher?  Authors are increasingly retaining some rights over their published material as institutional repositories and open access policies at institutions become more common.  Author-publisher agreements vary between publishers, so this would have to be considered on a journal-by-journal basis.  Where the publisher has supplied the abstract, clearly the publisher is the copyright owner.

Regardless of who owns copyright on abstracts, both the author and the publisher may be willing to allow reproduction of abstracts in the Knowledge for All database, because the abstracts will potentially draw more users to read and perhaps purchase articles.  Most publishers allow unrestricted access to abstracts on their websites, presumably because they see abstracts as an important part of marketing the article, so they may not object to Knowledge for All reproducing abstracts in its database.

If we are not able to include abstracts in the Knowledge for All database itself, we should link every article to its abstract via the publisher's website.  We could look at the possibility of using a pop-up window to show the abstract on the linked page. 

It is recommended that further research be done on how publishers view the distribution of abstracts and how important abstracts are to the Knowledge for All community of users.


Citations

The citations, or bibliography, of an article could be covered by copyright as an original compilation of factual data within the body of the article itself.  Thus, importing an article's citations into the Knowledge for All system for the purpose of citation analysis would likely violate copyright.  To get around this, we could obtain permission from the publisher for post-prints, obtain permission from the author for pre-prints, or add the citation information by hand, possibly through a tagging system.


