Legal issues

Copyright of journal article metadata

Knowledge for All will collect or generate and provide public access to mass quantities of journal article metadata, including such elements as title, author, date, and journal title.  How this metadata will be collected or generated has not yet been determined, and each method may have its own legal implications (for a discussion of possible methods see Journal Article Data Collection and Creation).  A basic question must be addressed first, however: what content is and is not subject to copyright?

The content will originate, or potential copyright owners will reside, in many different jurisdictions, but primarily Canada, the United States, and the European Union.  There is no single international copyright law, but international agreements require countries to respect one another's copyright laws.  In the context of Knowledge for All, this means that for content that was created in another country, we will need to abide by the copyright laws of that country which protect that content.  The copyright laws of Canada, the United States, the European Union, and the United Kingdom, and their implications for the Knowledge for All project, are discussed here.  This discussion is based on information gathered from research and writings about copyright laws.  We have not yet consulted a legal expert; doing so is the recommended next step.

Canada

Under Canadian Copyright Law, compilations of factual data, including databases, are protected if they were independently created by the author and there was "a sufficient degree of skill, judgment and labour involved in arranging and selecting of the content in the database or other compilation" (Harris, 2001).  Interface, content, and software can all be protected by copyright, but the content or data within the compilation are still only protected as a compilation, not as individual elements (Moyse, 2002).

United States

The United States offers less copyright protection for databases than Canada.  As in Canada, databases are protected as "compilations," but while Canada recognizes effort in selecting and arranging data, the US requires "creativity."  A 2003 statement from the United States Copyright Office calling for new legislation to protect databases summarizes current protection as follows: "What remains is a thin layer of copyright protection for qualifying databases. In order to qualify, they must exhibit some modicum of creativity in the selection, arrangement, or coordination of the data. The protection is thin in that only the creative elements (selection, arrangement, or coordination of data) are protected by copyright. Explanatory materials such as introductions or footnotes to databases may also be copyrightable. But in no case is the data itself (as distinguished from its selection, coordination or arrangement) copyrightable" (Carson, 2003).

European Union and United Kingdom

In 1996, the European Union passed Council Directive 96/9/EC on the Legal Protection of Databases.  The Directive gives a high level of copyright protection to databases that qualify as "original" and a separate, sui generis database right to databases that would not be considered "original" under copyright law (Commission of the European Communities, 2005).  Its purpose was to protect databases which may not be covered by copyright in order to encourage the production of commercial databases in the EU (Commission of the European Communities, 2005).  "While 'original' databases require an element of 'intellectual creation,' 'non-original' databases are protected as long as there has been 'qualitatively or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents' of a database" (Commission of the European Communities, 2005).  Database right lasts for 15 years, but can be extended if the database is updated.

The UK implemented the EU's Directive on database right with the Copyright and Rights in Databases Regulations 1997.  These regulations similarly give copyright protection to databases that might be considered an original intellectual creation of an author and special database right protection to databases where there was a substantial investment in obtaining, verifying, or presenting the data within.

Implications for Knowledge for All

In summary, under Canadian and United States copyright laws the majority of journal article metadata elements are factual data and so not protected by copyright.  Metadata elements that would be considered original creations of their authors, such as subject terms and abstracts, are subject to copyright.  Subject terms and abstracts are discussed in more detail below.  Blogger and law professor Michael Carroll (2009) addresses metadata in particular under US law in writing, "Under these principles, metadata is copyrightable only if it reflects an author’s original expression. For example, a collection of simple bibliographic metadata with fields named “author,” “title,” “date of publication,” would not be sufficiently original to be copyrightable."  Under UK law, however, factual data in a database could be protected by database right or copyright.  In their research on copyright status of metadata in the UK, Gadd, Oppenheim, and Probets (2004) note that under UK copyright law, metadata “is probably protected by copyright... [but] the key word here is 'probably.'”  They suggest that an individual record could be considered a "compilation" of data, but it is questionable whether this would be considered "original" and whether it would be protected.  They conclude that "the more creative effort that goes into the record (such as abstracting, indexing, etc.), the more likely it is that that record enjoys copyright." 
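The distinction drawn above between factual and creative metadata elements could be applied mechanically when records are harvested.  The following is a minimal sketch in Python, not a legal test: the field names and the split_record function are hypothetical and chosen only for illustration, and any field that is not on the short "factual" list is conservatively set aside for rights review.

```python
# Hypothetical field names. The report itself only names title, author,
# date, and journal title as factual elements, and subject terms and
# abstracts as elements likely to be original creations.
FACTUAL_FIELDS = {"title", "author", "date", "journal_title"}


def split_record(record):
    """Split a metadata record into (factual, needs_review) dictionaries.

    Fields listed in FACTUAL_FIELDS are treated as factual data.
    Everything else, including unknown fields, is held back for review.
    """
    factual = {}
    needs_review = {}
    for field, value in record.items():
        if field in FACTUAL_FIELDS:
            factual[field] = value
        else:
            needs_review[field] = value
    return factual, needs_review


if __name__ == "__main__":
    example = {
        "title": "An Example Article",
        "author": "A. Author",
        "date": "2010",
        "journal_title": "Journal of Examples",
        "abstract": "This abstract was written by the author.",
    }
    factual, needs_review = split_record(example)
    print(sorted(factual))
    print(sorted(needs_review))
```

Under this sketch an abstract or a set of subject terms is never copied automatically; whether it can be reused still depends on the jurisdiction and license of the source, as discussed in this section.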

This suggests that Knowledge for All should tread more carefully in harvesting metadata from databases located in the UK and EU and perhaps favour databases located in Canada and the USA.  Copyright law with regard to databases and digital materials is changing and developing, and it is difficult to predict whether the future will bring increased or decreased protection.  A good way to avoid legal problems is to obtain permission to harvest data even when harvesting is explicitly permitted.

The digital age has brought a lot of concern over unlawful copying and distribution of complete or partial works, such as full-text articles.  By not including full-text articles in its database, Knowledge for All may avoid a lot of copyright headaches.  Thus far there has not been as much concern about copying of metadata.  Gadd, Oppenheim, and Probets' survey of OAI data providers and service providers found that few were concerned with protecting metadata for individual records.

Subject terms and abstracts

Subject terms and abstracts are two elements of journal article metadata that would likely be considered literary works rather than factual information and so would be covered by copyright. 

The Knowledge for All system will ideally include both subject terms and abstracts.  Subject indexing of articles using controlled vocabularies will distinguish Knowledge for All from many commercial databases and tools like Google Scholar while abstracts are invaluable in providing detailed information to users about the content of articles. 

Knowledge for All contributors will select subject terms for articles from controlled vocabularies, except where the license terms of metadata from other sources allow copying of subject terms and those subject terms have been selected from an appropriate controlled vocabulary.  Thus, copyright is not a concern for subject terms.

Knowledge for All contributors could write abstracts for articles, but it would take a significant amount of time, which contributors may not be able to provide, and thus is not a feasible option. 

The alternative is to reproduce existing abstracts.  The first question to consider is then who owns the copyright on author-supplied abstracts of published articles: the author or the publisher?  Authors are increasingly retaining some rights over their published material as institutional repositories and open access policies at institutions become more common.  Author-publisher agreements vary between publishers, so this would have to be considered on a journal-by-journal basis.  Where the publisher has supplied the abstract, clearly the publisher is the copyright owner.

But regardless of who owns copyright on abstracts, both the author and the publisher may be willing to allow reproduction of abstracts in the Knowledge for All database because the abstracts will potentially draw more users to read and perhaps purchase articles.  Most publishers allow unrestricted access to abstracts on their websites, presumably because they see it as an important part of marketing the article, so they may not object to Knowledge for All reproducing abstracts in its database.

If we are not able to include abstracts in the Knowledge for All database itself, we should link every article to its abstract via the publisher's website.  We could look at the possibility of using a pop-up window to show the abstract on the linked page. 

It is recommended that further research be done on how publishers view the distribution of abstracts and how important abstracts are to the Knowledge for All community of users.

Citations/Bibliography/References

The citations, or bibliography, of an article could be covered by copyright as an original compilation of factual data within the body of the article itself. Thus to import an article's citations into the Knowledge for All system for the purpose of citation analysis would likely violate copyright. To get around this we could obtain permission from the publisher for post-prints, obtain permission from the author for pre-prints, or add the citation information by hand, possibly through a tagging system.

References

Carroll, Michael. "Copyright in Database." Carrollogos: A Blog about Law, Technology, and Music (20 February 2009). Retrieved 29 March 2011 from http://carrollogos.blogspot.com/2009/02/copyright-in-databases.html

Carson, David O. Statement of David O. Carson General Counsel, United States Copyright Office before the Subcommittee on Courts, the Internet, and Intellectual Property Committee on the Judiciary and the Subcommittee on Commerce, Trade and Consumer Protection Committee on Energy and Commerce. Database and Collections of Information Misappropriation Act of 2003, Hearing, September 23, 2003 (United States House of Representatives 108th Congress, 1st Session).  Retrieved 29 March 2011 from http://www.copyright.gov/docs/regstat092303.html

Commission of the European Communities. First evaluation of Directive 96/9/EC on the legal protection of databases - DG Internal Market and Services Working Paper.  Brussels: Commission of the European Communities (2005). Retrieved 29 March 2011 from http://ec.europa.eu/internal_market/copyright/prot-databases/prot-databa...

Gadd, Elizabeth, Charles Oppenheim, and Steve Probets. "RoMEO studies 5: IPR issues facing OAI data and service providers." The Electronic Library 22.2 (2004): 121-138.

Harris, Lesley Ellen. Canadian Copyright Law (3rd Edition). Toronto: McGraw-Hill Ryerson (2001).

Moyse, Pierre-Emmanuel. Database Rights in Canada. Montreal: Leger Robic Richard, avocats (2002). Retrieved 10 March 2011 from http://www.robic.ca/publications

Pohl, Adrian. "Are Bibliographies Copyrightable? – The German Case." Open bibliography and Open Bibliographic Data blog 13 July 2011.  Retrieved 17 August 2011 from http://openbiblio.net/2011/07/13/are-bibliographies-copyrightable-the-ge...

United Kingdom Intellectual Property Office. Copyright: Rights in Performances, Publication Right, Database Right - Unofficial Consolidated Text of UK Legislation to 8 April 2010. London: United Kingdom Intellectual Property Office (2010). Retrieved 29 March 2011 from http://www.ipo.gov.uk/cdpact1988.pdf
 

Licensing of K4All content and data

by Samantha Read, Amanda Stevens, and Michelle Gruda

Purpose

This report outlines different data licensing options for Knowledge for All content and data that fit within the realm of open data, as defined by the Open Knowledge Foundation, and discusses advantages and disadvantages of available licenses.  We conclude with a recommendation to adopt a Creative Commons Attribution license for content related to the Knowledge for All project and an Open Data Commons Attribution license for data related to the Knowledge for All project.

Considerations in selecting a license

It is important that the Knowledge for All data license supports Knowledge for All’s core values, as defined in the Engagement Strategy:

  • Openness. Knowledge for All is developed using open source software, open data and aims to facilitate open access to scholarly journal literature as much as possible.
  • Accessibility. Knowledge for All aims to lift all barriers in access to scholarly journal literature, whether financial, legal, formal, linguistic or otherwise.
  • Collaboration. Knowledge for All is a flat organization that invites the international library and academic community along with members of the public at large to contribute and benefit from its development as equal participants regardless of institutional or individual affiliation.
  • Interdisciplinarity. Knowledge for All content aims to span all academic disciplines and represent a diverse range of subject areas.
  • Accountability. Knowledge for All is governed, developed and maintained by the very same community it is intended to serve and invites continuous feedback and input in order to ensure needs are met.
  • Sustainability. Knowledge for All is designed as a long term, dynamic solution intended to evolve and grow with the needs of the community it serves.

In terms of openness and accessibility, the Open Knowledge Foundation has defined open knowledge with eleven key points, which can be found on their website (http://opendefinition.org/okd/), and will be summarized here.  This definition has been supported and endorsed by many organizations involved in the open data movement.

  • Access. The work must be easily accessed, easily modified, and if there is any cost associated it must only be a reasonable reproduction rate. There must be no strings attached, such as limitations on database access.
  • Redistribution. The license allows others to copy and sell or give away the work. There cannot be any royalty required.
  • Reuse. The work may be modified, and redistribution of any rework will be subject to the same terms as the original work.
  • Absence of technological restriction. The format must make the above convenient, with no price attached. This is most easily done by using open data formats.
  • Attribution. The license can require that the creators be attributed. This must be convenient, so attach a list of authors.
  • Integrity. The license can require that a modified version of the work have a different name from the original.
  • No discrimination against persons or groups.    
  • No discrimination against fields of endeavor. The license may not block a particular field from using the work. This clause is largely to prevent licenses that prevent commercial users from participating. In the spirit of Open Knowledge, everyone is welcome.
  • Distribution of license. Other licenses may not be added to the Open Knowledge license, such as a non-disclosure agreement.
  • License must not be specific to a package. If a work is part of a package, the license must apply even if the work is removed from the package and redistributed separately.
  • License must not restrict the distribution of other works. The license must not restrict other works distributed along with it, such as insisting all other works be Open Knowledge.

Knowledge for All’s core values and the Open Knowledge Foundation’s definition of open knowledge should inform the selection of a license for Knowledge for All’s journal and journal article data.  Another important consideration is the licensing terms of existing collections of open data and how our license might limit availability of existing data collections.

Content and Data Licenses

In the world of open content, the most popular licenses are those created by the Creative Commons organization. The six Creative Commons licenses allow reuse of content with various restrictions, but only two of them, Attribution (CC BY) and Attribution Share-Alike (CC BY-SA), adhere to the tenets of open knowledge, as does the CC0 public domain dedication.  While the Creative Commons licenses were developed for content, the Open Data Commons developed similar licenses specifically for databases and data contained in databases. Open Data Commons licenses which could be used for Knowledge for All data include the Attribution License (ODC-By), the Open Database License (ODbL), the Database Contents License (DbCL), and the ODC Public Domain Dedication and License.  As explained in their FAQ, separate licenses were developed because a database and the data contained within it sometimes need to be licensed separately and differently, copyright often applies differently to databases and content, and data in databases is often used differently than content. The Open Data Commons and Creative Commons licenses are explained below in order of least to most restrictive.

Public domain licenses

CC0 is the least restrictive Creative Commons license, as it removes all copyright restrictions from content and thereby places that content in the public domain.  A CC0 license can only be applied by authors or holders of copyright and related or neighboring rights over the content.  The Public Domain Mark can be applied by anyone to content that is already free of copyright restrictions.

The ODC Public Domain Dedication and License places the data and database in the public domain and allows unrestricted sharing, reuse, reproduction, and adaptation of the database.

One can also apply a separate Database Contents License (DbCL) to the contents of a database licensed under the ODbL, which waives all rights in the individual contents.

Attribution licenses

CC BY allows other users to copy or remix the work in any way, including for commercial purposes. All that users must do is attribute the work to the original creator. The creator should provide a link to the Creative Commons page explaining the user’s responsibilities, and as suggested by item 5 in the Open Knowledge Foundation’s list, provide an easily-accessed list of creators. The Open Knowledge Foundation’s web content is licensed as CC BY.

ODC-By allows users to copy, distribute, and use the database; to produce works from the database; and to modify, transform and build upon the database.  In turn, users must attribute any public use of the database or works produced from the database and make clear to others the license of the database. 

Attribution share-alike licenses

CC BY-SA places one restriction on users in addition to attribution. Users have free rein to copy, transmit and adapt the work, but anyone who remixes the work into something new must distribute that work under the CC BY-SA license, or a similarly open license. Wikipedia’s content is licensed as CC BY-SA.

Under the Open Database License (ODbL) users can do the same as under ODC-By but must also keep the database open technologically and offer any adapted version of the database or works produced from it under the ODbL.  Because adapted databases must remain open, this license can discourage proprietary commercial reuse of the database or its contents, whereas the ODC-By does not.

Discussion

All of these licenses fulfill items 2 (Redistribution), 3 (Reuse), 5 (Attribution), 7 (No discrimination against persons or groups), and 8 (No discrimination against fields of endeavor) of the Open Knowledge Foundation’s definition of open knowledge. These are the items that can be addressed in licensing; the others are matters of format. Item 8, regarding commercial ventures, is explicitly addressed on the About the Licenses page of the Creative Commons website.

Considerations, then, turn to how well each license fits Knowledge for All’s core values and best allows Knowledge for All to reach its goals in delivering open journal content to all.

The public domain licenses best facilitate openness and accessibility in the sense that they put the fewest restrictions on data use and allow data to be most open. However, these licenses could also impede accessibility if using a public domain license means not being able to harvest data from existing data sources that have an attribution license. Known data sources that require attribution include the following:

Some of these data sources have a small amount of data while others have a large amount of valuable data.  This list only includes data providers who make their licensing terms explicit.  Not being able to harvest data from these providers for the Knowledge for All system could significantly slow down the project’s progress and require additional resources.
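The harvesting constraint described above can be expressed as a simple ordering of the three license families covered in this report.  The sketch below, in Python, is a deliberate simplification and not legal advice: the Tier names and the can_ingest function are hypothetical, and the rule assumes that a project may reuse a source only if the project's own license carries at least the obligations of the source's license.  Real share-alike licenses also require that the specific licenses be compatible, which this sketch ignores.

```python
from enum import IntEnum


class Tier(IntEnum):
    """License families from this report, least to most restrictive."""
    PUBLIC_DOMAIN = 0   # e.g. CC0, ODC Public Domain Dedication and License
    ATTRIBUTION = 1     # e.g. CC BY, ODC-By
    SHARE_ALIKE = 2     # e.g. CC BY-SA, ODbL


def can_ingest(source: Tier, target: Tier) -> bool:
    """Return True if data under `source` could flow into a project
    licensed under `target`, under the simplified rule that the target
    must preserve every obligation the source imposes."""
    return target >= source


if __name__ == "__main__":
    # A public domain project could not absorb attribution-licensed data,
    # because the attribution requirement would be lost.
    print(can_ingest(Tier.ATTRIBUTION, Tier.PUBLIC_DOMAIN))
    # An attribution-licensed project could.
    print(can_ingest(Tier.ATTRIBUTION, Tier.ATTRIBUTION))
```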

Using an attribution share-alike license limits openness to some degree because it requires those who reuse Knowledge for All data and content to apply a similar license or provide an open version of a proprietary product they build with the licensed data.  On the other hand, it further supports openness and availability of data by requiring that the data remain open, which aligns with Knowledge for All’s values of openness and sustainability. 

Using an attribution license does not make data any less open, as it only requires attribution.  An attribution license would allow Knowledge for All to use the above data sources and would showcase the collaborative aspects of the project by recognizing all data contributors, indexers, and editors.  “Attribution stacking” is one commonly discussed disadvantage of attribution licenses, whereby attribution credits become large and unwieldy, but keeping records of data sources and enhancers is a good administrative practice, as well as a way to recognize contributors and cite the authority of data.
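Keeping records of data sources, as suggested above, also makes it straightforward to generate attribution credits automatically.  The following Python sketch shows one possible approach; the Source class, the attribution_notice function, and the provider names are hypothetical and are not part of any existing Knowledge for All system.  Duplicate sources are collapsed so that the credit line grows with the number of distinct providers rather than the number of records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One data provider and the license its data was obtained under."""
    name: str
    license_name: str


def attribution_notice(sources):
    """Build a single credit line from the sources behind a record set,
    listing each distinct source once, in first-seen order."""
    seen = set()
    credits = []
    for source in sources:
        if source not in seen:
            seen.add(source)
            credits.append(f"{source.name} ({source.license_name})")
    return "Data sources: " + "; ".join(credits)


if __name__ == "__main__":
    provenance = [
        Source("Example Provider A", "CC BY"),
        Source("Example Provider B", "ODC-By"),
        Source("Example Provider A", "CC BY"),  # repeated source, credited once
    ]
    print(attribution_notice(provenance))
```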

Using an attribution license would also contribute to the sustainability of the Knowledge for All project as it would mean that those who use Knowledge for All data for other projects would credit Knowledge for All for the data, thereby marketing our project and directing new users to our site. 

Recommendations

It is recommended that Knowledge for All apply a CC BY license to its content, such as website content, marketing materials, and documentation, and an ODC-By license to its data, such as journal and journal article metadata.  This allows Knowledge for All to follow its core values and conform to the Open Knowledge Foundation’s definition of open knowledge while also supporting its sustainability and ensuring access to larger amounts of existing data.

Bibliography

Admin. Comments on the Science Commons Protocol for Implementing Open Access Data (2009 February 2).  Open Knowledge Foundation Blog. http://blog.okfn.org/2009/02/02/open-data-openness-and-licensing/

Ball, Alex.  How to License Research Data (2011). Digital Curation Centre http://www.dcc.ac.uk/resources/how-guides/license-research-data#x1-20002

Creative Commons (2011). About the licenses. http://creativecommons.org/licenses/

Open Data Commons (2011). Licenses. http://opendatacommons.org/licenses/

Open Knowledge Foundation (2011). Open Definition. http://opendefinition.org/okd/

Wikipedia (2011). Wikipedia:Text of Creative Commons Attribution-ShareAlike 3.0 Unported License. http://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribut...

 

Legal documents needed

Based on a review of publicly available legal documents for similar projects and ongoing analysis of the Knowledge for All project, it is recommended that Knowledge for All compose the following legal documents.

Terms of use

The Terms of Use should cover:

  • Use of content on site
  • Copyright status of content contributed by users
  • Use of software
  • Copyright status of "look and feel" of database
  • Acceptable use of site, including unlawful uses
  • How copyright violations will be dealt with
  • Disclaimer of liability
  • Content standards
  • Termination of account
  • Deletion of content
  • Viruses and hacking
  • Links to other sites

Contributors should be made to agree to the Terms of Use when they create an account.

A working document that addresses the first two items has already been created.

Privacy Policy

The Privacy Policy should cover:

  • Information about users collected by Knowledge for All
  • Where information about users is stored
  • How information about users is used
  • Disclosure of information
  • Deleting account
  • User's right to access information about themselves

FAQ

There should also be a FAQ on the website that summarizes much of the above legal information in plain language.

Further legal research needed

It is recommended that Knowledge for All consult with legal experts with extensive knowledge of international copyright laws. All approaches recommended here should be confirmed with legal experts and they should assist with creating all legal documents. It is also recommended that Knowledge for All contact and consult with organizations that have carried out similar large-scale projects, such as Zetoc.