Knowledge for All is developing an open, comprehensive, and interdisciplinary scholarly journal index through the support and collaborative efforts of libraries and researchers around the world. We are currently in Phase 1: Planning and Development and will enter Phase 2: Product Development in February 2012. As part of the Planning and Development Phase we have created strategic planning documents that outline how the Knowledge for All project will be implemented and operated in Phase 2 and beyond, in the areas of content development, human resources and workflows, technology, and community engagement. Please select one of these areas below to learn more.
Libraries have been in the business of sharing book or monograph records for a long time, and extensive infrastructure exists to facilitate this process, from centralized record creation and distribution facilities to defined data standards and formats. Citation or journal article records, on the other hand, have largely been monopolized by research database vendors, who sell libraries access to searchable sets of records at high prices. Libraries' recent surge toward adopting “discovery layers” that allow users to search journal articles, monograph records, and other resources simultaneously has created an additional need for access to aggregate citation data, and vendors are capitalizing on this need by selling access to citation data collections at equally high prices.
While the need for libraries to have their own citation research databases and collections of citation data that they can use as they please is clear, having a centralized repository and record sharing system for journal article records is new. For ideas and models we can look to smaller scale citation data projects, reference management software tools and repositories, and broader digital collections such as institutional repositories. However, many questions need to be addressed, including what metadata standards and data elements will we use, how will we define the collection parameters, how will we generate the data, and how will we categorize and classify the records? The Content Planning section of the Planning and Development Research Report examines these questions. Recommendations are made based on information collected through a review of similar projects, literature searches, and informal interviews with key informants in the scholarly community.
This document was prepared by Amanda Stevens in April 2011, with assistance from students in Dalhousie University's School of Information Management's Digital Libraries class of fall 2010.
A formal collections policy for Knowledge for All should be developed and approved by the Board of Directors and/or appropriate Committees and members. The collections policy should delineate what will and won't be included in the Knowledge for All digital collection. Thus far it has been decided that the Knowledge for All collection will consist of journal and journal article metadata for all published scholarly journals, but within that are many grey areas which will be explored here. Recommendations are based on information found in library and information studies scholarly literature, publications by academic libraries, and interviews with researchers.
It is recommended that Knowledge for All adopt a collections policy that allows for gradual collections development and expansion, starting off with a narrow focus and outlining a schedule to expand over time. This would allow the collection to expand as the project becomes more established and acquires more resources.
The Collections Policy should define what is a scholarly journal in contrast to a book, magazine, grey literature, or other types of publications. Here is a working definition:
A scholarly journal is a publication that is published annually, semi-annually, quarterly, or monthly in print and/or electronic format. Its primary purpose is to report primary results of research or overviews of research results to other researchers. Articles are written by experts in the field who cite their sources. Scholarly journals exercise quality control on content, usually through a peer-review system.
One easy way to distinguish scholarly journals from other types of publications is whether they are peer-reviewed. However, there are journals which meet all other criteria but are not peer-reviewed and so could still be considered scholarly. The Knowledge for All community will need to decide whether to include non-peer-reviewed scholarly journals. A search in Ulrichsweb retrieves 28,693 active and "refereed" (peer-reviewed) journals. There is another option to search for "academic/scholarly" periodicals, which retrieves 47,191 publications. However, this list includes things like "1,012 GMAT Practice Questions" and "Allyn Museum Bulletin", so the numbers cannot be considered accurate for our purposes. We are in the process of gathering our own data about journals, but in the meantime we will use Ulrichsweb.
It is recommended that Knowledge for All develop a checklist for defining a scholarly journal.
By this definition, the following types of publications would not be included in the Knowledge for All collection:
However, pre-prints will be included in the collection in order to considerably increase accessibility to free, full-text versions of articles. And it may be desirable or advantageous to include other publication types noted above in the Knowledge for All database in the future, as the project grows or as requested by the community. It should also be acknowledged that scholarly research is constantly evolving, with new forms of scholarly publication appearing, such as blog posts. The Knowledge for All Collections Policy should be frequently revisited and adapted based on the fluid nature of scholarly publishing and the needs of the community.
Within some disciplines there are special types of research sources which may not strictly fit the definition of scholarly journal but could be considered for the Knowledge for All collection because of their importance to researchers in those disciplines. Further consultation is needed with subject experts to identify these sources and determine the importance of these publications. Knowledge for All will then need to consider what would be required to accommodate inclusion of these sources in the collection and then determine the best approach. Here is a working list of these publications and their disciplines:
Additional researchers should be consulted to identify other special types of publications relevant to specific disciplines.
It is recommended that we begin with an initial list of journals that fit into the standard definition of scholarly journal and add other journals and types of publications if recommended by the community. A community consultation process will be developed that facilitates making these decisions.
A major strength of the Knowledge for All citation database is that it will allow users to search across disciplines, and so the aim is to include journals from all subject areas, including scientific, technical, and medical; law; social sciences; humanities; and fine arts.
During the initial stages of building the Knowledge for All community, we may not have contributors with the expertise to index in all subject areas, or we may not have access to full-text journals in all subject areas. Thus, it may be necessary to limit by subject area during initial pilot and development phases.
Until the 1990s all journals were published in print format and article-level metadata was available in printed indexes. The first electronic journals began appearing in the late 1980s (Langschied, 1991), but did not become significant until the 1990s, when there was rapid growth. In 1991 there were 110 peer-reviewed electronic journals and by 1997 there were approximately 1,049 (Chan, 1999). A search for “refereed” and “online” active journals in Ulrichsweb, a comprehensive periodical index, yields 21,610 results while a search for just “refereed” active journals yields 28,693 results. Thus, approximately 75% of current peer-reviewed scholarly journals are published electronically. Many older print journals and their metadata are now available electronically. It is recommended that Knowledge for All include both print and electronic journals in its collection. However, we could choose to focus on electronic journals if metadata for these journals is more readily available.
Harvestable metadata could be more readily available for some journals for a number of reasons, and the Collections Policy could specify that these journals are collected first. This could include focusing on free and open access journals. The Collections Policy could also favour free and open access journals under the assumption that users will prefer to be able to link to full-text not dependent on access via subscription.
The first peer-reviewed journals were published in 1665. In the 19th century there was an explosion in the number of journals produced, caused by the increased specialisation and diversification of academic research and by inexpensive mass publication on cheap wood-pulp-based paper. Another growth period occurred post-WWII, when commercial publishers began to take up journal publishing. In 1962 it was estimated there were around 30,000 scientific and technical journals (Bourne, 1962). Data regarding the number of journals published during different time periods will be collected. As Knowledge for All aims to index all published scholarly journal literature, it will not exclude older publications. However, due to data availability issues, the resources required to index past literature, and the prioritization of research published in the last ten years for most disciplines, it is recommended that we focus first on current and recently published research, then work our way back.
It is recommended that Knowledge for All aim to include scholarly journals published in every language in its collection in order to truly be an international project. Indeed, providing localized international access to multilingual resources could be a unique advantage of K4All over other databases. A search in Ulrichsweb finds there are 25,528 refereed periodicals published in English and 3,165 that are not published in English. Some of the English language journals are also published in other languages or contain some text in other languages, but it is impossible to determine how many using Ulrichsweb. The number of non-English scholarly journals may increase if we broaden the definition of scholarly beyond peer-reviewed.
However, in order to include non-English scholarly journals for non-English speakers in the Knowledge for All collection, we will need to have:
Internationalization of Knowledge for All will be discussed further on the Internationalization page and in the Technology Plan.
It is recommended that we first focus on collecting English-language journals until the project is established enough internationally to include non-English-language journals.
A final option for gradual collection development is to focus initially on high impact journals.
Bourne, Charles P. “The World's Technical Journal Literature: An Estimate of Volume, Origin, Language, Field, Indexing, and Abstracting.” American Documentation (April 1962): 159-168.
Biblarz, Dora. Guidelines for a Collection Development Policy Using the Conspectus Model. International Federation of Library Associations and Institutions - Section on Acquisition and Collection Development (2001). Retrieved from http://www.ifla.org/VII/s14.
Chan, Lisa. “Electronic journals and academic libraries.” Library Hi Tech 17.1 (1999): 10-16.
Langschied, Linda. "The changing shape of the electronic journal." Serials Review 17.3 (Fall 1991): 7-13.
As detailed in Collections policy recommendations, the end goal of Knowledge for All is to collect journal article metadata for all current and past scholarly journals in all subject areas and languages. The Metadata elements document details the specific data elements needed for different types of content in the database. This document identifies and analyzes different methods for collecting and generating past and current journal article metadata. These ideas were generated through reading discussions on relevant listservs; informal interviews with librarians, researchers, and developers; extensive Internet searching for metadata repositories; and initial research conducted by Carly Currie, Mary Zazelenchuk, Andrea Crabbe, and Alyssa Graybeal.
Most of the journal article data needed is factual data (such as title, author, year). As discussed in Copyright of journal article metadata, factual data can be harvested from existing collections of metadata without violating copyright more easily than subject terms and abstracts, which are subject to copyright as literary works. Below I identify potential sources of factual journal article metadata and methods for collecting or generating it. The accompanying Metadata Sources document lists specific collections of journal article metadata that could potentially be harvested or acquired for the Knowledge for All database. These are by no means exhaustive, but they give a sense of what is available and what to consider in selecting methods and sources. They are categorized by type of metadata source, which corresponds with the categories noted here. I address subject indexing and abstracts in a separate document.
An important partner and resource in collecting journal article data is the Open Knowledge Foundation's Open Bibliographic Data Working Group, which is making different kinds of bibliographic data open and harvestable. It is recommended that Knowledge for All work closely with this group to share strategies and resources.
Summary of general recommendations for metadata collection and creation:
Publishers may be willing to provide their journal article metadata to Knowledge for All as a means of publicizing their content. Willing publishers could enter into a data sharing agreement with Knowledge for All and either upload their data into the system as it becomes available or have their data automatically harvested at regularly scheduled intervals. We could request both current and past data from publishers.
This option may appeal more to open access publishers and non-profit publishers. Some open access publishers, such as Public Library of Science, have already been contacted and have expressed a desire to share their data. Other open access and non-profit publishers should be approached. Commercial publishers should also be approached, especially smaller ones, and possibly when Knowledge for All has reached a more advanced stage of development and popularity.
Another group of publishers that may be interested in providing their journal article metadata is organizations that publish journals in non-Western countries and languages other than English, as this group may feel neglected by commercial publishers and want a new way to publicize their content.
Publishers are not included in the Metadata Sources document, with the exception of Public Library of Science and BioMed Central, because there are so many.
Institutional repositories (IRs), or collections of research articles, theses, and dissertations by university faculty and students at a particular institution, are becoming increasingly common at universities. They are usually managed by university libraries and sometimes accompanied by open access policies that require faculty to deposit copies of all of their published works in the IR. There are also online repositories of eprints, or digital versions of research articles, that are not restricted to a particular institution and are usually subject-specific, where authors are encouraged to deposit copies of their articles. Articles deposited in IRs and eprint archives are often pre-prints, or first drafts that have not yet undergone the peer-review editing process. This is due to copyright issues, although some journals also allow authors to deposit post-print versions of articles in repositories. The Self-Archiving FAQ for the Budapest Open Access Initiative (BOAI) states that 68% of journals allow self-archiving of post-print articles, while 32% do not. Journal policies for self-archiving can be searched on the Sherpa Romeo site.
Most institutional and eprint repositories follow an open access model and so would likely be willing to provide Knowledge for All with their metadata. In fact, their data is often made available via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). IRs are not included in the Metadata Sources list because there are too many of them, but subject-specific repositories are. The Canadian Association of Research Libraries (CARL) Institutional Repositories Pilot Project Harvester is a search tool for searching across IR content from participating Canadian institutions, and could be a useful tool for locating IRs, as well as potentially harvesting metadata. OpenDOAR is a directory of academic open access repositories with full-text resources. OpenDOAR can be used to search repository contents as well as repositories, but not to harvest metadata, as it only searches Google indexes.
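As a rough illustration of harvesting via OAI-PMH, here is a minimal Python sketch that builds a ListRecords request and pulls a few Dublin Core fields out of a response. The verb, parameter names, and XML namespaces come from the OAI-PMH and Dublin Core specifications; the endpoint URL and the sample response are invented for illustration, and a real harvester would also need to fetch the URL over the network, follow resumption tokens, and handle errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    return records


# Invented sample response; repo.example.org is a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Pre-print</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repo.example.org/oai", from_date="2011-01-01")
harvested = parse_records(SAMPLE)
```

Because every compliant repository must support the oai_dc format, one parser of this kind could in principle be pointed at many repositories, although the completeness of the Dublin Core records will vary.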
One problem with this source and method is that articles would be acquired individually, so it may be challenging to collect complete journal holdings. This method would need to be used alongside other methods. Also, records may lack complete metadata relating to their publication in journals, and this additional metadata may need to be collected separately.
The fact that these articles may be pre-prints necessitates having a field that indicates the version of the article and ensuring that the record links to the appropriate full-text version of the article. If an article record is obtained for a post-print version of the same article, the two records should be linked.
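The version field and record linking described above could be modelled roughly as follows. This is only a sketch: the class name, field names, and version labels are hypothetical and are not taken from any Knowledge for All schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    record_id: str
    title: str
    version: str                       # assumed labels: "pre-print", "post-print", "published"
    full_text_url: Optional[str] = None
    related_ids: List[str] = field(default_factory=list)


def link_versions(a: ArticleRecord, b: ArticleRecord) -> None:
    """Cross-link two records that describe versions of the same article."""
    if b.record_id not in a.related_ids:
        a.related_ids.append(b.record_id)
    if a.record_id not in b.related_ids:
        b.related_ids.append(a.record_id)


# Hypothetical records for a pre-print and a post-print of one article.
pre = ArticleRecord("r1", "An Example Article", "pre-print", "https://repo.example.org/r1.pdf")
post = ArticleRecord("r2", "An Example Article", "post-print", "https://repo.example.org/r2.pdf")
link_versions(pre, post)
```

Keeping each version as its own record, rather than overwriting one with the other, lets each record point at the full text that actually matches it.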
Many journals provide a free table of contents (TOC) subscription service in which subscribers receive the table of contents for upcoming issues of journals via e-mail or RSS feed. There are also services such as JournalTOCs or ticTOCs which provide a search interface for journal TOCs and a means of subscribing to multiple journal TOCs at once. This could be a means of acquiring current but not past journal article data. RSS feeds are in XML, and so the data could easily be ingested into the Knowledge for All database. TOC services are included in the Metadata Sources document.
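To give a sense of how TOC feed data might be ingested, the following sketch parses an RSS 2.0 feed into simple article dictionaries. The sample feed is invented, and the assumption that authors appear in dc:creator elements will not hold for every publisher; many publisher feeds use RSS 1.0 (RDF) or Atom instead, which would need different handling.

```python
import xml.etree.ElementTree as ET

DC = {"dc": "http://purl.org/dc/elements/1.1/"}


def parse_toc_feed(xml_text):
    """Turn an RSS 2.0 table-of-contents feed into a list of article dicts."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    journal = channel.findtext("title", default="")
    articles = []
    for item in channel.iterfind("item"):
        articles.append({
            "journal": journal,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Authors are optional; this yields an empty list when absent.
            "authors": [a.text for a in item.iterfind("dc:creator", DC)],
        })
    return articles


# Invented sample feed for a hypothetical journal.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Journal of Examples</title>
    <item>
      <title>First Example Article</title>
      <link>https://journal.example.org/1</link>
      <dc:creator>Doe, Jane</dc:creator>
    </item>
    <item>
      <title>Second Example Article</title>
      <link>https://journal.example.org/2</link>
    </item>
  </channel>
</rss>"""

toc = parse_toc_feed(SAMPLE_FEED)
```

Note that the second item has no author information at all, which is the kind of gap that editors would later need to fill.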
One issue is that the quality of data and the schemas of TOCs will vary among publishers. In terms of copyright issues, it may depend on which TOC service is used. An administrator of the ticTOCs Journal Tables of Contents Service stated, "ticTOCs merely directs you to the publisher's feeds, therefore any questions about re-using the content of publishers' feeds for an OA citation database should be directed at the individual publishers," whereas JournalTOCs has an API that provides free access to the metadata that has been collected by JournalTOCs. The JournalTOCs API is a very promising avenue for collecting journal article data as well as journal data and should be investigated further.
Numerous free, non-commercial online citation databases, like Knowledge for All, already exist. These primarily index journals in specific subject areas and have been created and maintained by scholars who specialize in those subjects, such as Latin American Periodicals Tables of Contents, and databases that index open access journals, such as the Directory of Open Access Journals (DOAJ). Many of the subject-specific databases are called “table of contents” services by their creators or even have the words “table of contents” or “TOC” in their names, but they differ from TOC services like ticTOCs in that they provide past as well as current journal article metadata. While the databases of open access journals are fairly sophisticated and follow current data standards, there is a lot of variation in the type of data and format of subject-specific databases, and the usability of the data varies. Citation databases are included in the Metadata Sources list.
Citation databases present an important opportunity to Knowledge for All on multiple fronts. They make good potential partners due to sharing similar values and goals – both as metadata sources and indexers. Citation databases were likely created because certain subjects are not well represented in commercial databases or their creators want to provide free and open access to research. When contacting these organizations about partnerships, Knowledge for All should emphasize that it aims to be comprehensive and provide free and open access to research. Knowledge for All can offer a better interface and search features, more complete data, a larger community of contributors, and more comprehensive content than most of these sources. Some citation databases have already been contacted and expressed willingness to contribute their data to Knowledge for All. Some have mentioned challenges with maintaining their own systems.
A concern with citation databases is that data from some sources is not in an easily harvestable or digestible format. For example, the LINGUIST citation database only has citation data in HTML. In some cases we will need to determine whether the time and effort it will take to harvest data in a challenging format is worth it. The organizations that maintain citation databases may have limited resources and expertise to provide their data in other ways.
In order to offer full-text searching to users, Knowledge for All will aim to acquire PDF files of full-text journal articles to extract and index the full text without actually storing the PDF files or making them available to users. This method could also be used to harvest citations or references in articles. In addition, journal article metadata could be extracted from PDF files using Zotero or a similar tool or process, then added to the Knowledge for All database. Contributors could upload PDF files for extraction, but it is recommended that a further process be created to upload and harvest PDF files in bulk if this is selected as a primary method of collecting metadata. Metadata extracted from PDF files is not always accurate and will need to be edited.
Use of this method depends on having access to PDF files for all articles, which then depends on having an established community of contributors with access to full-text PDF files. They, in turn, will need to ensure their institutional licenses do not prohibit metadata harvesting of PDF files for journal articles. Another issue is that PDF versions of articles may not exist or be available for all older journal content. In terms of full-text searchability, Knowledge for All may not be able to provide full-text searching for all articles in the database but could aim to provide it over time.
Reference management software, which is widely used by scholars, allows people to build personal citation databases for the purpose of storing and accessing research literature and easily generating bibliographies. Some reference management tools additionally create centralized databases of user-added citations. Thus, Knowledge for All could appeal to both individual users and organizations that manage these tools to contribute their citation databases to the Knowledge for All project. Infrastructure could be created that would allow ongoing sharing of records added to centralized reference databases or personal reference databases with Knowledge for All. In return, Knowledge for All could offer the following:
Zotero, a popular open-source reference management tool maintained by a non-profit organization, is a promising potential partner that has shown interest in a similar project through its partnership with the Internet Archive. Zotero, Mendeley, and CiteULike are included in the Metadata Sources list, but the latter two are less likely partners due to being closed source and operated by private companies.
One issue with using this source of metadata is varied metadata quality and the need to sort, combine, and edit many duplicate records. As with institutional repositories and eprint archives, journal holdings will not be complete. The Knowledge for All system would need to be able to ingest data in various reference management formats, including RIS, BibTeX, and Zotero RDF.
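As a sketch of what ingesting and de-duplicating reference manager exports might involve, the code below parses a small RIS export and merges records that appear to describe the same article. The RIS tags used (TY, AU, TI, PY, DO, ER) are standard, but the matching rule (normalized title plus year) is a deliberately naive assumption; real duplicate detection would need to compare more fields and would still require editorial review.

```python
import re

# RIS lines look like "TI  - Some title": a two-character tag, two spaces, a hyphen.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of {tag: [values]} dictionaries."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


def dedupe_key(record):
    """Naive matching key: lower-cased title without punctuation, plus year."""
    title = (record.get("TI") or record.get("T1") or [""])[0]
    year = (record.get("PY") or record.get("Y1") or [""])[0][:4]
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip(), year


def merge_duplicates(records):
    """Combine records sharing a key, keeping every distinct value per tag."""
    merged = {}
    for rec in records:
        target = merged.setdefault(dedupe_key(rec), {})
        for tag, values in rec.items():
            for value in values:
                if value not in target.setdefault(tag, []):
                    target[tag].append(value)
    return list(merged.values())


# Invented export containing the same article twice, once with a DOI.
SAMPLE_RIS = """TY  - JOUR
AU  - Doe, Jane
TI  - An Example Article
PY  - 2010
ER  -
TY  - JOUR
AU  - Doe, Jane
TI  - An example article.
PY  - 2010/05/01
DO  - 10.1234/example
ER  -
"""

combined = merge_duplicates(parse_ris(SAMPLE_RIS))
```

Merging keeps both title variants and both date forms, so an editor can later choose the preferred value instead of the system silently discarding one.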
It would also be possible to design Knowledge for All so that masses of users could upload data in the style of a reference management software project, but this might be a redundant effort compared to partnering with or utilizing an existing project.
The least desirable method of collecting journal article metadata is for contributors to enter it by hand, since this would take the most time and effort. However, it may be necessary for some articles and journals where we are not able to acquire the data using any of the other methods or sources noted here.
Even when using the automated metadata collection methods above, there will inevitably be missing data to add and errors to correct. One thing that will distinguish Knowledge for All from other citation search tools is its high-quality metadata, and so having robust quality control and editing processes is essential. A large number of volunteer editors will be needed to carry out this work. Clear and precise data standards should be created, agreed upon, maintained, and distributed to editors to ensure consistent and high-quality metadata.
After creating a collections policy for Knowledge for All that defines what types of journals will be collected, Knowledge for All will need to develop a comprehensive list of all past and current scholarly journals that will be included in its collection. The following data should be collected for each journal:
This information is needed for estimating the resources required, planning workflows, and determining how and where journal article metadata can be collected. Knowing how many journals are published, how often, in which subject areas, and in which languages, as well as the number of past journals that need to be indexed, will help determine how many volunteer indexers and editors are needed, what subject and language expertise they must have, how much time it will take to collect data, and which metadata sources can be used. In addition, journal data is an essential component of the metadata needed in the operational Knowledge for All database. Ulrichsweb was used to gather some of this aggregate data to make collections policy recommendations, but more complete data is needed, particularly as we finish planning and begin operations.
Some preliminary work has been done to gather the journal data, but this document will mainly explain how further data will be collected.
Many of the sources listed in Metadata Sources also contain journal data, but in varied forms. Some, such as the Directory of Open Access Journals, provide a file of journal data in CSV format that is easy to harvest, while others simply have HTML pages of journal lists. Data in easily harvested formats will be collected first, with other sources being used to fill in gaps if necessary. Additional sources that contain only journal data have been located through Internet searching; these are not currently listed in Metadata Sources. The Open Knowledge Foundation's Open Bibliographic Data Working Group is another source of journal title data that is constantly growing.
The most comprehensive source of journal data, in that it probably covers all published titles, is Simon Fraser University's (SFU) CUFTS Knowledgebase, a journal database used primarily by libraries to link to their full-text holdings. SFU has provided its Knowledgebase files to Knowledge for All, but the data is dispersed over 371 files that include duplicate entries and non-journals, and not every file has the same fields (although the fields that do appear use the same names in every file).
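Consolidating many files with overlapping rows and differing columns could look something like the sketch below. It assumes the files are CSV and that an `issn` column can serve as the matching key; both are assumptions made for illustration, since the actual format and field names of the Knowledgebase files are not described here. Filtering out non-journals is not attempted.

```python
import csv
import io


def merge_journal_files(files, key_field="issn"):
    """Merge rows from several CSV files into one record per key value.

    `key_field` is a hypothetical column name used to match duplicates.
    """
    merged = {}
    for f in files:
        for row in csv.DictReader(f):
            key = (row.get(key_field) or "").strip()
            if not key:
                continue  # rows without a key would need manual review
            target = merged.setdefault(key, {})
            for name, value in row.items():
                if name is None or not value:
                    continue  # skip overflow columns and empty cells
                if not target.get(name):
                    target[name] = value.strip()  # first non-empty value wins
    return merged


# Two invented files with different columns and one overlapping journal.
file_a = io.StringIO("title,issn\nJournal of Examples,1234-5678\nExample Review,2345-6789\n")
file_b = io.StringIO("issn,title,publisher\n1234-5678,Journal of Examples,Example Press\n")

journals = merge_journal_files([file_a, file_b])
```

Because rows are read by column name and not by position, files with different column orders or extra fields can be merged without special cases, which is what makes the consistent field names across files useful.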
After considering the above information and consulting with others, it has been determined that the following steps should be taken to collect journal data:
Knowing what elements of data will be included and managed in the Knowledge for All system is essential for designing and developing the technological infrastructure and approaching content collection and creation. Metadata elements have been identified through examining existing citation databases in a variety of subject areas and reading scholarly literature about metadata in digital collections.
The Knowledge for All database will contain data about 3 main types of content: journals, journal articles, and scholars/authors. These content types will be related, as journal articles will be part of journals and scholars will be linked with journal articles through the author/creator field. A journal issue content type could also be used to link articles to journals. Alternatively journal issue information could be included in every article record, but it may be desirable to have a separate journal issue content type in order to minimize data needed for journal article records and organize indexing workflows. Another significant node type in the Knowledge for All system will be contributors (indexers, editors, developers, etc.), but that will be addressed in the Contributors and Workflows section of the planning documentation. There may be considerable overlap between scholars and contributors.
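The relationships among these content types can be sketched with a few simple data classes. The class and field names below are hypothetical, chosen only to illustrate the option of a separate journal issue type; the sample values come from the Chan (1999) reference cited earlier.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scholar:
    scholar_id: str
    name: str


@dataclass
class Journal:
    journal_id: str
    title: str


@dataclass
class Issue:
    issue_id: str
    journal_id: str  # links the issue to its journal
    volume: str
    number: str
    year: int


@dataclass
class Article:
    article_id: str
    issue_id: str    # links the article to its issue, and through it to the journal
    title: str
    author_ids: List[str] = field(default_factory=list)  # links to Scholar records


def journal_for(article: Article, issues: Dict[str, Issue], journals: Dict[str, Journal]) -> Journal:
    """Resolve an article's journal by following article -> issue -> journal."""
    return journals[issues[article.issue_id].journal_id]


journal = Journal("j1", "Library Hi Tech")
issue = Issue("i1", "j1", volume="17", number="1", year=1999)
author = Scholar("s1", "Chan, Lisa")
article = Article("a1", "i1", "Electronic journals and academic libraries", ["s1"])

found = journal_for(article, {"i1": issue}, {"j1": journal})
```

With a separate issue type, volume, number, and year are stored once per issue, so each article record only needs a reference to its issue.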
Below I have made initial recommendations for data elements needed for each type of content. These elements are not mapped to any particular metadata schema.
Some subject-specific databases include other metadata elements which would largely only be relevant within that subject. These include classification, population, location, age group, tests and measures, grant information, and methodology for psychology; type of literature, time period, subject author, subject work, literary theme, literary genre, and media for literature; and study design, place of study, period of study, materials, methods, edited by, and reviewed by for medicine. We intend to include these in the Knowledge for All system to allow for highly refined searching within disciplines. However, the fields will only be searchable if a user is searching within a particular discipline rather than doing a general, interdisciplinary search, and if the data for specialized fields is not available elsewhere, volunteer indexers will need to create original data.
An article's list of citations or its bibliography is not a necessary element but ideally it will be included to allow for citation analysis. As discussed in Legal Issues, citations may be protected under copyright.
Additional administrative metadata which could be included for the above content types include:
Other administrative metadata elements will likely be added as the system's technical infrastructure and workflows are developed.
As the Knowledge for All database will not actually include digitized objects, metadata related to preservation of digital objects is not relevant.
Scholar names/Personal authors:
Other data possibly needed for name disambiguation (discussed further in Scholar Name Data Collection and Creation):
In selecting metadata schemas or standards to use for the Knowledge for All citation database, it is important to consider the information or metadata elements needed for different types of content in the database (outlined in Metadata Elements), standards used by sources that metadata will be harvested from (noted in Metadata Sources), standards used by institutions that will be harvesting data from Knowledge for All, and standards used by other tools that may be used in the data harvesting, creation, and editing process. There are many different metadata schemas for bibliographic description in use by different institutions, and standards are always changing, so interoperability is the key, as opposed to conforming strictly to one particular standard. This process will inevitably evolve as workflows are developed, data harvesting methods are determined, and technical infrastructure is designed. Here are some initial recommendations. In addition to considering standards and schemas used by other repositories, I reviewed research on metadata standards used in digital libraries and specifically metadata standards used for electronic journal article data. Initial research by Vanessa Black and Rebecca Prescott was also utilized.
There is no standard metadata schema for journal article records. The most common bibliographic metadata schema for digital materials in general is Dublin Core. Although Dublin Core is widely recognized as insufficient for describing journal article metadata due to its simplicity and limited number of elements, many institutions still use Dublin Core because of its interoperability and wide use. The Dublin Core Metadata Initiative Citation Working Group analyzed this issue and published Guidelines for Encoding Bibliographic Citation Information in Dublin Core Metadata. Many institutions which have chosen to use Dublin Core for journal articles have also documented how they adapted it to meet their needs (see references). As stated by Apps and MacIntyre (2002), “Dublin Core should remain a ‘core’ set of metadata elements, with domain-specific metadata recorded according to more complex standards, whether extensions to Dublin Core or separate standards.” In addition, Dublin Core is the metadata standard required by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which itself is quickly becoming the standard through which institutions make their metadata available. It is recommended that Knowledge for All use a form of Dublin Core or incorporate Dublin Core elements into its metadata schema.
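To make the limitation of simple Dublin Core concrete, here is a minimal Python sketch that serializes a journal article as an unqualified Dublin Core record of the kind exchanged over OAI-PMH. The sample article and the article_to_oai_dc function are hypothetical. Because the 15 core elements have no dedicated slots for volume, issue, or pages, the sketch packs them into a free-text dc:source string, which is one workaround seen in practice and not a recommendation from the guidelines cited above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for simple Dublin Core and the OAI-PMH oai_dc container.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def article_to_oai_dc(article):
    """Build an oai_dc:dc element from a plain dict describing an article."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(name, value):
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value

    add("title", article["title"])
    for author in article["authors"]:  # dc:creator is repeatable
        add("creator", author)
    add("date", article["year"])
    # Assumption: journal, volume, issue and pages are flattened into one string.
    add("source", "{journal} {volume}.{issue}: {pages}".format(**article))
    return root


# Hypothetical article used only to exercise the function.
sample_article = {
    "title": "An Example Article",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "journal": "Journal of Examples",
    "volume": "12",
    "issue": "3",
    "pages": "45-67",
    "year": "2011",
}

record_xml = ET.tostring(article_to_oai_dc(sample_article), encoding="unicode")
```

A harvester receiving this record would have to parse the dc:source string to recover the volume or page range, which is the kind of loss of granularity that motivates extending Dublin Core or pairing it with a richer schema.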
Metadata Object Description Schema (MODS) is another important metadata schema for descriptive bibliographic data because of its wide use and compatibility with MARC, the data format in which libraries exchange bibliographic records. Like Dublin Core, it is built on an XML foundation. Although not widely used for journal article metadata, MODS is a more complex and specific schema than Dublin Core and can be used in conjunction with Dublin Core for more specificity and granularity.
The Open Knowledge Foundation's Open Bibliographic Data project has developed BibJSON, a simple description of how to represent bibliographic metadata in the JSON format, and is using this schema for its open bibliographic data. This option should be examined further.
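As a rough illustration of what a BibJSON record for a journal article might look like, the following Python sketch converts a plain dict into a BibJSON-style structure and serializes it. The key names (a list of author objects with "name", a "journal" object, and BibTeX-style keys such as "volume" and "pages") reflect my reading of the BibJSON conventions and should be checked against the current specification. The sample article and the to_bibjson function are hypothetical.

```python
import json


def to_bibjson(article):
    """Convert a plain article dict into a BibJSON-style record (assumed keys)."""
    record = {
        "type": "article",
        "title": article["title"],
        # Authors are a list of objects so that more data can be attached later.
        "author": [{"name": name} for name in article["authors"]],
        "journal": {"name": article["journal"]},
        "year": article["year"],
        "volume": article["volume"],
        "number": article["issue"],
        "pages": article["pages"],
    }
    # Identifiers are optional; only add the list when a DOI is present.
    if article.get("doi"):
        record["identifier"] = [{"type": "doi", "id": article["doi"]}]
    return record


# Hypothetical article used only to exercise the function.
sample_article = {
    "title": "An Example Article",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "journal": "Journal of Examples",
    "volume": "12",
    "issue": "3",
    "pages": "45-67",
    "year": "2011",
}

bibjson_text = json.dumps(to_bibjson(sample_article), indent=2)
```

Because author entries are objects and not bare strings, fields needed for name disambiguation could later be added to each author without changing the overall record shape.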
Metadata Encoding and Transmission Standard (METS) is a standard for encoding descriptive, administrative, and structural metadata that could be used to bundle multiple metadata sets together.
Allinson, Julie, Pete Johnston, and Andy Powell. “A Dublin Core Application Profile for Scholarly Works.” Ariadne 50 (January 2007). Retrieved 8 April 2011 from http://www.ariadne.ac.uk/issue50/allinson-et-al/.
Apps, Ann and Ross MacIntyre. “Dublin Core Metadata for Electronic Journals.” Lecture Notes in Computer Science 1923 (2000): 93-102. Retrieved 31 March 2011 from http://eprints.rclis.org/bitstream/10760/12183/1/appsmacecdl2000_full.html
Apps, Ann and Ross MacIntyre. “zetoc: a Dublin Core Based Current Awareness Service.” Journal of Digital Information 2 (2002). Retrieved 31 March 2011 from http://journals.tdl.org/jodi/article/viewArticle/39
Dappert, Angela and Markus Enders. “Using METS, PREMIS and MODS for Archiving eJournals.” D-Lib Magazine 14.9/10 (September/October 2008). Retrieved 4 April 2011 from http://www.dlib.org/dlib/september08/dappert/09dappert.html
Metadata Elements notes metadata that should be collected about scholars or authors of journal articles. This page explores how that data will be collected and managed. It is the result of a literature search on scholar name data and name disambiguation, an examination of name schemas and data systems, informal interviews with key informants, and initial research by Linda MacAfee and Robert Martel.
One issue to deal with regarding scholar/author names is name disambiguation. When more than one person shares the same name, which will certainly be the case in the Knowledge for All database, it is important to have a means by which authors and their associated works can be identified and similar names can be disambiguated. Disambiguation will lead to more precise searching and make citation analysis possible. There are various tools available or in development that facilitate name disambiguation. The ideal tool would limit the names to scholars who publish, provide rich metadata about each scholar, provide unique name identifiers, and provide open data. Currently, it does not appear that such a resource exists. Current options are noted below.
The Open Researcher and Contributor ID (ORCID) project “aims to solve the author/contributor name ambiguity problem in scholarly communications by creating a central registry of unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID and other current author ID schemes.” It is a promising project but is still in the beta development stage, so it is uncertain whether this tool will meet Knowledge for All's needs for name disambiguation. We do not know yet what data will be provided for scholars, how that data will be made available, or how the tool will work. ORCID's About page states that in 2012 they plan to start charging fees to organizations that wish to use the service.
Friend of a Friend (FOAF) is a semantic RDF ontology used to describe people, their activities, and their relationships with objects and each other on the web. It does not involve a centralized database of information about people but rather is a method by which that information is made available in a standardized format. It may enable Knowledge for All to harvest scholar/author information from multiple sources. However, it is not limited to the scholarly community.
ArXiv, an open e-prints archive, has an authority records system that assigns author identifiers in an attempt to disambiguate author names and enable retrieval of all publications by a particular author. It has some limitations and is by no means comprehensive but the data is open.
There are various closed-access databases of scholar names, such as Thomson Reuters ResearcherID and Scholar Universe. These tools could potentially be used by contributors with access through their institutions but cannot be relied upon because of their closed status.
The Library of Congress Authority Files are the most established and widely-used name authority records. They are free to search but can only be downloaded one at a time. Names are uniquely identified by adding additional biographical information, such as birth and death dates. Authority records are based mainly on book publications rather than journal publications, and so may not be adequate for Knowledge for All.
The Virtual International Authority File is a joint project of multiple libraries, implemented and hosted by OCLC, that aggregates library authority files and makes some attempt to disambiguate names. It is currently available to organizations that apply and are accepted to be members, which requires contributing name authority data that meets certain criteria. In the future the service will be freely available.
The International Standard Name Identifier (ISNI) is a standard for assigning unique identifying numbers to names, similar to the ISBN. Authors must register for a number and the database is not publicly searchable.
Knowledge for All could utilize one of these external resources to identify and disambiguate scholar names or create its own internal system of author name disambiguation and unique identifiers within the database. This process is explored by Torvik and Smalheiser (2009). In his article, “Metadata for Name Disambiguation and Collocation” (2010), Jeffrey Beall recommends collecting the following additional data for name disambiguation, as necessary:
Beall, Jeffrey. “Metadata for Name Disambiguation and Collocation.” Future Internet 2.1 (2010): 1-15. Retrieved 6 April 2011 from http://www.mdpi.com/1999-5903/2/1/1/
Torvik, Vetle I. and Neil R. Smalheiser. “Author Name Disambiguation in MEDLINE.” ACM Transactions on Knowledge Discovery from Data 3.3 (July 1, 2009): 11.
The Knowledge for All system should have two levels of subject classification for articles and, ideally, an abstract in every article record. Each article in the database should be classified in one or more broad subject categories and have multiple more specific subject terms assigned to it. This system of classification will allow us to provide rich searching and browsing capabilities in the citation database. It will also be useful for connecting volunteer indexers and editors with journals in which they have subject expertise.
The primary purpose of broad subject classification in the K4All database is to allow users to limit their searching to a specific discipline or set of disciplines. While the goal should be to classify an article or journal in a single broad subject category (and categories should be broad enough to allow that), classification in more than one category should be allowed for multidisciplinary articles or journals. Broad subjects should be selected and defined thoroughly to facilitate easy classification. This could be a flat or hierarchical list of subject categories. JournalSeek provides a model flat list while Directory of Open Access Journals provides a model hierarchical list.
Broad subject classification terms have been assigned to Thesauri for Subject Indexing records and Metadata Sources records, but the vocabulary has not been 'controlled' yet. The working, uncontrolled list of subject categories can be viewed here.
Classification of articles into one or more broad subject classes can occur either at the article level or at the journal level. An advantage of classifying entire journals in one or more broad subject categories is that indexers do not need to make this classification for each individual article. However, with interdisciplinary journals or journals whose articles might typically fall into more than one broad subject class, this could result in misclassification of journal articles. As a compromise, it is recommended that journals be classified under broad subjects but that indexers be able to override that classification on a per-article basis if needed. Interdisciplinary journals will be flagged so that indexers assigned to those journals pay special attention to this field and correct it if needed.
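The recommended compromise can be sketched as a small data model in Python: articles inherit the journal's broad subjects unless an indexer has set an override, and articles in flagged journals are reported as needing review until an indexer has confirmed or corrected the classification. The class, field, and method names here are hypothetical and are not taken from any existing Knowledge for All schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Journal:
    title: str
    broad_subjects: List[str]
    # Flag set for journals whose articles often span several broad subjects.
    interdisciplinary: bool = False


@dataclass
class Article:
    title: str
    journal: Journal
    # None means "inherit from the journal"; a list is an indexer's override.
    subject_override: Optional[List[str]] = None

    def broad_subjects(self) -> List[str]:
        """Return the indexer's override if present, else the journal's subjects."""
        if self.subject_override is not None:
            return self.subject_override
        return self.journal.broad_subjects

    def needs_subject_review(self) -> bool:
        """True for articles in flagged journals that no indexer has classified yet."""
        return self.journal.interdisciplinary and self.subject_override is None


# Hypothetical journal and articles used only to exercise the model.
journal = Journal("Journal of Examples", ["Sociology"], interdisciplinary=True)
inherited = Article("First Example", journal)
corrected = Article("Second Example", journal, subject_override=["Sociology", "Health"])
```

Storing the override separately from the journal's classification keeps the journal-level default intact, so later changes to a journal's broad subjects do not silently alter articles an indexer has already reviewed.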
Subject indexing or applying subject terms from controlled vocabularies to journal articles will be an important aspect of the Knowledge for All system, as it will provide search precision that is absent from many search tools that search for keywords in full-text only, such as Google Scholar, and that use subject terms that are not from controlled vocabularies, which is the case with many commercial databases.
Unless the source's license allows it, Knowledge for All will not be able to copy subject terms from existing journal article records due to copyright restrictions. This is discussed in more detail in Copyright of Journal Article Metadata. There may be exceptions when the subject terms were chosen from the same thesaurus used by Knowledge for All (such as the ubiquitous MeSH) and the metadata record is part of an open data set. Otherwise, volunteer indexers will select subject terms from controlled vocabularies for all articles in the Knowledge for All database.
There are currently many thesauri available online for different subject areas and in different languages. They are being located and listed on the Thesauri for Subject Indexing page. Once a near-complete list is composed, the Knowledge for All community will need to decide which thesauri will be used for different subject areas and ensure that indexers consistently use those thesauri.
It may be necessary to adapt and modify existing thesauri for Knowledge for All or to import terms and structures into the Knowledge for All system. If so, thesauri that allow this should be selected or Knowledge for All can request permission from thesauri maintainers. Another option is to develop thesauri from scratch for Knowledge for All, but this takes considerable time and skill and so should be avoided where possible.
Some original thesaurus construction and modification will likely be necessary to sufficiently represent and describe all subjects covered by all published scholarly journal literature, particularly in areas that may be neglected or misrepresented by existing thesauri. These gaps will be identified once a complete list of existing thesauri is compiled and these thesauri are analyzed by subject specialists, or as indexers begin to use them. Thesaurus construction and indexing can be controversial due to the politics of language and naming (de la tierra, 2008). It is recommended that Knowledge for All make every effort to include diverse perspectives in the construction and modification of its thesauri.
Copying of abstracts from journal metadata records is also restricted by copyright, unless the source's license allows it. As discussed in detail in Copyright of Journal Article Metadata, the creation of abstracts for most articles in the system would take considerable time and is not a feasible option. Thus, it is recommended that Knowledge for All find a way around the copyright issue with abstracts and make every effort to provide access to existing abstracts of articles in the database.
de la tierra, tatiana. “Latina Lesbian Subject Headings: The Power of Naming.” Radical Cataloging: Essays at the Front. Ed. K.R. Roberto. Jefferson: McFarland, 2008. 94-102.
In the spring of 2010, Knowledge for All shared an innovative proposal, calling on the international library and academic community to collaborate in developing its own truly open research tool for scholarly journal literature. A bold response to skyrocketing access fees for scholarly journal databases, the proposed tool would be freely available to public users via the web as well as private users wishing to customize and adapt it to their own environments. The proposal was met with a welcome reception and, with funds granted by the Council of Atlantic University Libraries (CAUL), it continues to move toward realization.
One year later, the project has successfully launched into its initial Planning and Development Phase (Phase 1) and is on track for the prototype launch scheduled for January 2012, when Phase 2 of the project is set to begin. Leading up to this date, it is critical to engage stakeholder groups and the international library and academic community as active participants in the project's development. This document will serve as a guide for facilitating this process. It will inform the project’s approach to strategic engagement leading up to Phase 2, and on an ongoing basis, by:
The goals and strategies presented in this document are not intended as a final plan, but rather as a guide for facilitating meaningful engagement on an ongoing basis and a touchstone for engagement activities. They respect that, as an open and dynamic project, needs may change as the community involved in the project’s development grows. Consistent with the nature of the project itself, an open invitation for feedback, improvement and participation in their execution stands. That being said, what follows has taken into careful consideration the challenges and needs of stakeholder groups, community actors, and of the project itself to date, and represents an informed, creative effort to carry the project into its next phase and beyond.
Knowledge for All is a bold response to barriers in access to scholarly journal literature. Increasingly, these barriers are financial, spurred on by rising database subscription fees that severely limit the ability of libraries, institutions and managers of academic works to facilitate meaningful access to scholarly research. Over the last 15 years alone, the cost of journal subscriptions has increased by over 180% with no sign of stopping. These rising costs have ripple effects that extend beyond just library budgets: students, researchers and decision makers lack access to the latest findings; authors lack a platform to promote their work, receive feedback and gain credibility; members of the public lack access to publicly funded research.
Knowledge for All represents an alternative to subscription-based models of access that are failing the communities they are intended to serve. The project is a move to place scholarly journal literature back into the hands of those who create, manage and seek it, calling on the international library and academic community to collaborate in developing their own truly free, truly open tool for scholarly journal research. The first project of its kind, Knowledge for All utilises open source software and open data to deliver and promote open access to scholarly journal content via its dynamic online interface. The full tool is also available for download and customization by the private user in need of a cost-effective, accessible alternative to commercial products.
Though Knowledge for All stands to benefit creators, managers and seekers of scholarly journal literature across the board, the project has deep roots in the academic library community where it began. Faced with a 120% fee increase for ISI’s Web of Science, University of Prince Edward Island’s Robertson Library declared enough was enough. In an open letter to library patrons, University Librarian Mark Leggott outlined why the increase was not only infeasible, but unacceptable. The fee hike, combined with a challenging fiscal climate and a restrictive contract, posed a major challenge to the library’s ability to effectively connect faculty, students and members of the campus community with journal content - contrary to the very purpose of scholarly journal databases. The letter announced that, as such, Robertson Library would discontinue its subscription to Web of Science and instead develop a free, open index of scholarly journal literature that would meet the needs of the international academic library community without causing added financial strain.
Following this announcement, a proposal for Knowledge for All was drafted and shared with local and international library consortia, and with funds granted by the Council of Atlantic University Libraries, the project launched into operation. Now on track for a product launch scheduled for November 2012, the project continues to gain momentum and support and represents a collaborative effort to tackle challenges in access to scholarly journal literature. It invites libraries, research institutions, journals, publishers, students, authors and members of the international library and academic community at large to participate in developing a tool that is built by and for them, decreasing reliance on expensive commercial products.
At its core, the Knowledge for All project is based on principles of
Openness. Knowledge for All is developed using open source software and open data, and aims to facilitate open access to scholarly journal literature as much as possible.
Accessibility. Knowledge for All aims to lift all barriers in access to scholarly journal literature, whether financial, legal, formal, linguistic or otherwise.
Collaboration. Knowledge for All is a flat organization that invites the international library and academic community along with members of the public at large to contribute and benefit from its development as equal participants regardless of institutional or individual affiliation.
Interdisciplinarity. Knowledge for All content aims to span all academic disciplines and represent a diverse range of subject areas.
Accountability. Knowledge for All is governed, developed and maintained by the very same community it is intended to serve and invites continuous feedback and input in order to ensure needs are met.
Sustainability. Knowledge for All is designed as a long term, dynamic solution intended to evolve and grow with the needs of the community it serves.
Knowledge for All emerges at a unique moment when opportunities for better access to scholarly journal literature are at once incredibly limited and incredibly abundant. Limited because, under the strain of a weakening economic climate and shrinking library budgets, commercial providers of access to scholarly journal literature continue to increase subscription fees, placing greater and greater strain on the ability of libraries and other institutions to facilitate access to academic works. Abundant because, at the same time, the shift toward more open, innovative models of access continues to accelerate along with developments in the open source, open data and open access movements. Knowledge for All homes in on the abundances in the present landscape. It leverages the momentum building in these open movements to seize the opportunity for better access to scholarly journal literature.
Within this landscape, the impact of Knowledge for All is far reaching. The project’s benefits extend to a wide range of stakeholders which, broadly speaking, can be broken down into three distinct groups: (1) creators, (2) managers, and (3) seekers of scholarly journal literature.
Creators include all institutions and individuals involved in the production of research, analysis and content that make up an item of scholarly journal literature. They are researchers, authors, writers, commentators, editors and others contributing original works to the realm of scholarly publication. Their aims in doing so might be to verify existing literature, discover new insights, or solve problems. To achieve these aims they may require access to existing data, findings or expertise or be in need of funds, resources, acknowledgement or other support. Some challenges they face in achieving these aims include: (1) isolation and lack of connection to other related institutions, individuals or published works in their field, (2) absence of an effective platform to promote and share their work, (3) limited acknowledgement for their work, and (4) high costs and limited funding or other resources needed to complete their work.
Managers include all institutions or individuals involved in the distribution or maintenance of scholarly journal literature. They are librarians, knowledge managers, educators, archivists, curators, advocacy and service groups and others facilitating the discovery and use of published academic works. Their aims in doing so are to preserve the products of scholarly work and to act as connectors between scholarly journal literature and the institutions and individuals who seek or stand to benefit from it. To achieve these aims they require accurate records of available literature and proper tools for collecting, managing and delivering it to others. Some challenges they face in achieving these aims include: (1) absence of adequate tools to meet diverse needs, (2) lack of representation and input in tool development, (3) high costs and limited funding or other resources to acquire available tools, and (4) difficulty in coordinating widespread distribution and maintenance of large collections.
Seekers include all institutions or individuals involved in the direct consumption and use of scholarly journal literature. They are researchers, students, professionals, policy and decision makers and others in search of relevant analysis, data or findings to support their own pursuits. Their aims in doing so are to deepen their understanding of an area of inquiry, inform and strengthen decision making and improve their ability to solve problems or fulfill their roles and obligations. To achieve these aims they require adequate means of discovering available literature, identifying what items are relevant and reviewing content that might satisfy their search. Some challenges they face in achieving these aims include: (1) absence of adequate tools to discover relevant literature, (2) high costs and limited funding for reviewing and accessing content, and (3) no guarantee of relevance or usability of purchased content.
Knowledge for All seeks to address these common challenges and needs by inviting institutions and individuals from all stakeholder groups to become active participants in the development of a free, open, dynamic tool that will meet their diverse needs. The tool’s strengths and weaknesses, along with the opportunities and threats it will meet in the current landscape, are outlined below.
Within the project community stakeholders are invited to fulfill various roles that are critical to its ongoing development and success. These roles include:
Users include both (1) institutions and individuals utilizing the tool via its public online interface, as well as (2) institutions and individuals downloading or adapting the tool to their own private environments. In both cases, users are considered as active participants in the project’s development and are encouraged to share feedback and contribute to the project by fulfilling additional roles to ensure their needs as users are truly met.
Contributors are institutions and individuals lending their time and skills to the project on a voluntary basis. Their commitment and contribution may be casual or long-term and may extend to a range of specialized tasks or areas of project development. Contributors may include indexers, developers, translators, or participants in the project’s working groups and committees including the Board of Directors, Advisory Committee, Content Working Group, Technical Working Group, Internationalization Working Group and Marketing and Fundraising Working Group.
Staff are paid contributors to the project who are tasked with coordinating and executing specialized administrative and technical aspects of project development. Staff are contracted for a period of one year or more. As a lean organization, staffing is kept to a minimum as dictated by the needs of the project at each Phase. Staff may include a System Administrator, System Development Coordinator, Data Technicians, Metadata Librarian, Senior Indexer and Engagement Coordinators.
Partners are institutions or individuals lending in-kind resources or collaborative support to the project as Strategic Partners or offering financial support to the project as Funding Partners. Partners may contribute on a casual or ongoing basis.
To facilitate communication and collaboration among stakeholders and community members in all roles the following channels will be utilized:
The project website at www.k4all.ca will serve as a central hub for project information and activity. As Knowledge for All’s public facing site, it will host all project plans and documentation and serve as a connecting point to other project channels via posted contact information, sign up and links. In addition, the project website will function as a community portal for existing and prospective community members in all roles. The portal will be accessed by registration and login via the website and will include interactive features to facilitate collaboration among community members including discussion forums, groups, wiki pages and news feeds.
The project newsletter will be published monthly and will include linked content with relevant updates on key areas of project development including system development, content development and fundraising. It will reach all community members registered on the project website or signed up via the online sign up form.
Social media and online community presence
Knowledge for All will establish accounts on all relevant social media channels to broadcast project developments and connect with community members. Social media accounts will include, but are not limited to, LinkedIn, Twitter and Vimeo. The project will also establish a presence in relevant online communities such as Wikipedia by creating profiles or pages.
Project meetings and events
Knowledge for All will establish a schedule of regular meetings for stakeholders and community members to receive updates and share feedback on project development as well as for contributors working on specific aspects of project development. Meetings and events will take place online via web-conferencing. To complement its regular schedule of project meetings, an annual flagship event might also be launched that invites stakeholders and members of the project community to address challenges and developments within the broader project landscape.
To grow and engage an active community of support around the project who will collaboratively develop, maintain and use its proposed tool and ensure the project's sustainability.
Objective 1. Establish project within broader landscape as a viable solution to stakeholder challenges
Strategy 1. Highlight stakeholder challenges and showcase how project addresses them
Action 1. Clearly identify and share evidence of stakeholder challenges on project website via interviews and stories. (Oct 2011)
Strategy 2. Acknowledge other available solutions within landscape and highlight gaps that project addresses
Action 1. Develop product/solution comparison table (Oct 2011)
Action 2. Publish list of available solutions and similar initiatives (Nov 2011)
Strategy 3. Align project with complementary projects/initiatives addressing common challenges and establish partnerships
Action 1. Reach out to administrators of complementary projects/initiatives (Nov 2011, Ongoing)
Action 2. Publish statements on open access, open data, open source to clarify project's positioning with respect to these movements (Nov 2011)
Action 3. Include project in external resource/project directories and lists (e.g. Wikipedia pages, Library eResource/project listings) (Nov 2011)
Objective 2. Leverage stakeholder/community experience to showcase project value and elicit participation
Strategy 1. Facilitate storytelling among stakeholders
Action 1. Develop collection of stakeholder interviews/testimonials that touch on common challenges and project solutions (Oct 2011)
Action 2. Broadcast an invitation for community members to tell their stories (Oct 2011)
Action 3. Develop case studies of successful project applications in the community (Jan 2013)
Strategy 2. Invite 'champions' to step forward as project ambassadors
Action 1. Encourage community members to share their project experience within their external networks and facilitate partnerships and participation (Dec 2011)
Action 2. Provide champions with all outreach materials (presentations, brochures, case studies, etc.) needed to share project within external networks (Oct 2011)
Action 3. Allow community members to self identify as 'champions' in user profiles (Oct 2011)
Objective 3. Fill all community roles necessary to proceed with project development and maintenance
Strategy 1. Broadcast opportunities
Action 1. Publish recruitment announcements for all community roles (users, board members, staff, contributors, partners, funders) on all internal project channels (website, mailing lists, groups, discussion forums) (Oct 2011)
Action 2. Publish recruitment announcements for all community roles (users, board members, staff, contributors, partners, funders) on external channels (mailing lists, groups, discussion forums, newsletters, online and offline bulletin boards, volunteer/job boards) (Oct 2011, Ongoing)
Action 3. Connect with resource centers and/or representatives of likely participants, e.g. Career Centers, Campus community involvement offices, Library professionals' associations (Sep 2012, 2013)
Action 4. Execute contributor recruitment campaigns with focus on recruiting indexing contributors during Phase 1 & 2 (Open Index Week; Data Entry Blitz; Monthly calls for contributors) (Oct 2011, Jan 2012, Ongoing)
Action 5. Participate in external community involvement and/or volunteer fairs and events (Sep 2012, Sep 2013, Ongoing)
Strategy 2. Reach out to stakeholder groups and invite them to fill community roles
Action 1. Develop and maintain comprehensive list of prospective candidates for all community roles and establish contact (Oct 2011, Ongoing)
Action 2. Develop outreach materials to accompany contact including project presentation(s) and information packages that include specialized information relevant to each community role (i.e. specialized info for users, contributors, partners etc.) (Oct 2011, Jan 2012, Nov 2012)
Action 3. Develop project 'wishlist' to clearly identify project needs to stakeholders and invite them to participate in addressing them (Oct 2011, Jan 2012, Nov 2012)
Strategy 3. Acknowledge community member status and activities
Action 1. Allow community members to self-identify roles and activities in member profiles (e.g. John Smith is a User & Indexer indexing Journal __) (Sep 2011)
Action 2. Include announcement/sharing options for community members to broadcast their activities in external networks and invite their contacts to participate (e.g. Tweet "I've just posted to the Internationalization Working Group discussion on k4all.ca") (Jan 2012)
Action 3. Select and highlight activities of a featured community member on a monthly basis. Post announcement to homepage and internal channels (Feb 2012)
Action 4. Incorporate goal setting features into member profiles for contributors to track their activities and participation in the project (e.g. Journal Indexing Goal & Journals Indexed represented numerically and/or graphically) (Jan 2012)
Action 5. Incorporate badges and/or ranked status fields into member profiles based on expertise, activities and participation. (Jan 2012)
Action 6. Draft and send thank you messages and goal reminders to community members in all roles (Jan 2012)
Objective 4. Establish central network to facilitate and encourage collaborative participation
Strategy 1. Develop website as central community portal and incorporate community-based design features
Action 1. Move registration to a prominent place on website (Oct 2011)
Action 2. Include links to community portal in all project outreach materials and communications including email signatures (Sep 2011, Oct 2011)
Action 3. Update community member profiles with additional descriptive fields that allow users to self-identify expertise, interests & activities (Sep 2011, Oct 2011)
Action 4. Enable commenting on all website pages to facilitate feedback (Sep 2011, Oct 2011)
Action 5. Enable RSS feeds on all website pages (Sep 2011, Oct 2011)
Action 6. Include "share this" features on key pages and News items (Sep 2011, Oct 2011)
Action 7. Develop organic groups and discussion boards where registered community members can self-organize (Sep 2011, Oct 2011)
Action 8. Design badges for community members to feature on external networks, blogs and websites (Jan 2012)
Action 9. Revise "Get Involved" page to include immediate action steps (join/start a discussion, comment on our strategic plans, sign up to a mailing list, join a working group, become a contributor, “share” project link in external network, subscribe to RSS feed) (Sep 2011, Oct 2011)
Action 10. Utilize surveys as needed to gather community feedback on specific community issues (Feb 2012, Ongoing)
Strategy 2. Establish presence on complementary networks and channels to direct interested stakeholders back to website
Action 1. Establish external social media accounts that link back to central network (Feb 2012)
Action 2. Regularly announce/invite audiences on external networks to join central network (Feb 2012)
Objective 4. Support contributors with resources needed to collaborate and self-organize for ongoing development
Strategy 1. Establish collaborative spaces for contributors
Action 1. Set up announce & discuss lists; facilitate conversations on internal discussion forums; create internal groups via Drupal (Sep 2011, Oct 2011)
Action 2. Facilitate conversations on internally hosted discussion forums (Sep 2011, Oct 2011)
Action 3. Create and/or allow community members to create internally hosted groups for specialized collaboration (e.g. indexing groups, various working groups) (Sep 2011, Oct 2011)
Strategy 2. Identify contacts for specialized project development areas
Action 1. Publish staff contact information for specialized project development areas via website (Sep 2011, Oct 2011)
Action 2. Allow registered community members to self-identify as experts in specialized project development areas (Sep 2011, Oct 2011)
Action 3. Publish directory of registered community members with options to sort by project development area and level of participation (Sep 2011, Oct 2011)
Strategy 3. Provide adequate orientation/training/documentation materials for contributors involved in specialized tasks (indexing, software development, etc)
Action 1. Develop contributor toolkits/documentation for specialized tasks (indexing etc.) (Jan 2012)
Action 2. Allow contributors to create pages to write and share self-authored documentation (Jan 2012)
Strategy 4. Coordinate regular contributor meetings
Action 1. Select online forum to host regular meetings facilitated by staff and/or registered community members (Jan 2012)
Action 2. Publish and announce meeting schedules for all working groups and contributors (Jan 2012)
Objective 5. Keep community informed of project developments and successes
Strategy 1. Establish internal channels to regularly share project news on long-term and short-term developments
Action 1. Institute quarterly report and/or newsletter (Jan 2011)
Action 2. Establish newsletter/announce list to broadcast more frequent/timely news items (Oct 2011)
Action 3. Draft and publish community news items in News section of website with RSS options (Oct 2011, Ongoing)
Action 4. Establish appropriate social media channels to share project news (Twitter; Identi.ca) (Mar 2012)
Strategy 2. Coordinate opportunities to showcase project development
Action 1. Pursue conference, event, demo opportunities (Ongoing)
Action 2. Host regular product demos (Jan 2012, Ongoing)
Action 3. Invite community to sit in on regular contributor meetings (Jan 2012, Ongoing)
Strategy 3. Celebrate project milestones
Action 1. Identify project milestones and share on public calendar (Oct 2011)
Action 2. Publish announcements on internal and external channels when milestones are reached (Ongoing)
Action 3. Identify and acknowledge key contributors that helped to achieve milestones (Ongoing)
To raise project profile and build awareness among stakeholder groups that translates into support and ensures continued development
Objective 1. Establish memorable brand consistent with project values
Strategy 1. Design branding materials (name, logo, tag line)
Action 1. Clarify branding objectives and identify potential concepts (Oct 2011)
Action 2. Obtain necessary design support as needed (e.g. graphic designer) (Sep 2011)
Action 3. Share design considerations with community and invite feedback/ideas on branding decisions (Jan 2012)
Strategy 2. Incorporate branding elements into all internal and external channels
Action 1. Include project name, logo, tag line in website/page headers and footers (Ongoing)
Action 2. Use branding elements to develop consistent email signature for staff, contributors and community at large (Ongoing)
Action 3. Use project name, logo, tag line on external channels (Wikipedia, YouTube, LinkedIn, Twitter, Identi.ca) (Ongoing)
Action 4. Design and share badges with branding elements on website for others to use on external channels (Ongoing)
Objective 2. Gain project exposure via external networks and channels
Strategy 1. Identify and reach out to prospective media partners
Action 1. Maintain comprehensive list of conventional and social media contacts including news outlets, industry and community publications, journalists and bloggers (Sep 2011, Ongoing)
Action 2. Connect and engage with media partners on appropriate online and social media channels and invite them to register as members of the project community (Oct 2011, Ongoing)
Action 3. Develop content for distribution to media contacts including press releases and media kits (Oct 2011, Ongoing)
Action 4. Pursue opportunities to share project information with media contacts via interviews and guest editorials/blog posts (Oct 2011, Ongoing)
Strategy 2. Establish social media presence
Action 1. Set up profiles on selected social networks including Twitter, Identi.ca, Facebook, LinkedIn, YouTube, Vimeo and integrate links into project website (Mar 2012)
Action 2. Participate and seek membership in other relevant online communities and projects such as Wikipedia (Nov 2011)
Strategy 3. Leverage opportunities to showcase project in external community spaces and forums
Action 1. Maintain calendar of relevant online and offline events, conferences, workshops, seminars and meetings, and pursue presentation and/or exhibiting opportunities where possible (Sep 2011, Ongoing)
Action 2. Develop presentation-ready materials including posters, slide decks, handouts, pamphlets and business cards (include QR codes where appropriate) (Nov 2011)
Action 3. Explore opportunities for sponsorship and cross-promotion of relevant external community events (Nov 2011)
Action 4. Where possible, add project links to relevant external forums, directories and project/resource lists (Oct 2011)
Objective 3. Generate attention and interest around key aspects of project development
Strategy 1. Host regular project showcases to highlight project development and progress
Action 1. Host offline project showcase/demo/info session at library venue and invite stakeholders to attend (Jan 2012)
Action 2. Host online project showcase/demo/info session via webconference (Oct 2011, Jan 2012)
Action 3. Develop flagship event/conference that addresses the broader issues addressed by the project. Invite partners and contributors to participate as speakers. (Jan 2014)
Strategy 2. Develop and execute marketing campaigns and events (See Appendix B: Proposed Campaigns).
To secure sufficient resources and funds to facilitate project development and operations
Objective 1. Obtain core funding to facilitate project development during Phase 1, Phase 2 & ongoing operations
Strategy 1. Identify and pursue grant opportunities
Action 1. Maintain comprehensive list of appropriate granting agencies that includes grant details and application timelines (Sep 2011, Ongoing)
Action 2. Assemble committee to prepare and submit grant proposals (Nov 2011)
Strategy 2. Identify and build relationships with prospective funding partners
Action 1. Maintain list of prospective funding partners (institutions and/or individuals) (Sep 2011, Ongoing)
Action 2. Prepare sponsorship/partnership package and outreach materials (Nov 2011)
Action 3. Reach out to prospective funding partners to explore opportunities (Nov 2011)
Action 4. Make sponsorship/partnership package and outreach materials publicly available and invite community to share them in external networks (Nov 2011)
Strategy 3. Identify sources of in-kind support
Action 1. Maintain list of prospective partners able to offer in-kind resources and support (Sep 2011, Ongoing)
Action 2. Develop 'wishlist' of support needed and share publicly (Oct 2011)
Action 3. Reach out to prospective in-kind partners (Nov 2011)
Strategy 4. Execute fundraising campaigns (See Appendix B: Proposed Campaigns)
Objective 2. Develop sustainable funding structures to ensure resources and funds for ongoing operations are obtained
Strategy 1. Leverage online fundraising tools to solicit individual donations
Action 1. Set up PayPal account to accept donations (Oct 2011)
Action 2. Incorporate donations button/link into all appropriate web pages and project resources (presentations/outreach materials/email signatures) (Oct 2011)
Action 3. Reach out to prospective donors in appropriate online forums and via social media (Nov 2011)
Action 4. Establish presence in web-based fundraising communities such as Kickstarter (Nov 2011)
Action 5. Acknowledge funding partners on project website, user profiles, in mailing list content and via user-activated badges and/or social media updates (e.g. an “I support Knowledge for All” “tweet this” option upon donor payment) (Sep 2011)
Strategy 2. Explore revenue models to secure funding
Action 1. Investigate paid membership models as method of generating funds (Nov 2011)
Action 2. Provide fee-based advanced technical support and customization services for users (Nov 2012)
The strategies and actions outlined above are intended to support overall project goals and development and contribute to positive outcomes for stakeholders and community members. These outcomes, along with measurable indicators of achievement, are identified below and will be used to gauge the effectiveness of chosen strategies and make improvements as needed.
Knowledge for All is a collaborative project of libraries around the world that aims to provide a free, online, comprehensive, and interdisciplinary index of scholarly journal literature. The Knowledge for All concept was devised in early 2010 and the initial one-year Planning and Development Phase began in January 2011 with funding from the Council of Atlantic University Libraries/Conseil des bibliothèques universitaires de l'Atlantique. This document is a deliverable of the Planning and Development Phase: Work Plan (October 2010). In February 2012 the project will move into Phase Two: Product Development, a two-year phase which will run from February 2012 to January 2014. This document presents a financial strategy and budget for year one of Phase Two. The strategic focus of year one of Phase Two will be developing a production version of the Knowledge for All software, developing a community of contributors and partners, and developing content for the journal index. Further details on the activities of Phase Two are available in the Human Resources and Workflows Plan (September 2011).
Knowledge for All's proposed financial strategy and business model during Phase Two include the following features:
Knowledge for All will operate as a not-for-profit corporation, as distinguished from a business or a charity. This means all revenue generated by the organization will go back into the organization rather than a portion being distributed to shareholders as profit. A charity is a special type of not-for-profit organization that is restricted to carrying out activities defined as “charitable” by the federal government. Charitable status allows an organization to issue tax-deductible receipts to its donors and makes it eligible for certain kinds of funding, but it does not allow an organization to advocate for a specific cause. As a not-for-profit organization, Knowledge for All will be able to generate revenue through donations and contributions but will not need to raise additional revenue to pay shareholders or be limited by restrictions placed on charities.
The main purpose of Knowledge for All is to develop an online index of scholarly journal articles that can be accessed by anyone in the world with a computer, web browser, and internet connection. Therefore, providing barrier-free access to Knowledge for All's products (journal index, data, and software) is an important part of its business model. The model assumes that a percentage of the organizations and individuals that use Knowledge for All will voluntarily contribute funding, as well as time and expertise toward maintaining the project, system, and data collection, because the project benefits them and saves them money on commercial database subscriptions.
Revenue to fund development and ongoing operations of the Knowledge for All project will primarily come from one-time or ongoing contributions from organizations, with a small portion of it coming from grants and donations by individuals. A reliance on grant funding can lead to instability and a great deal of time and energy invested in writing grant applications and administering grants. Instead, contributions will be solicited from the user community as a primary revenue source, while grant funding will be pursued to supplement primary revenue. Knowledge for All will make the case that a library that contributes a small amount of money to Knowledge for All will save hundreds of thousands of dollars in electronic database subscription fees, which can instead be spent on other resources and services. Academic libraries will be the primary funding partners but other types of libraries and organizations which benefit from the Knowledge for All products will also be approached.
After the Knowledge for All products are fully developed and the organization has entered its Operations Phase, and if there is sufficient demand from the user community, Knowledge for All could provide additional services for a fee, such as customized software development and dedicated technical support. Organizations that wish to customize the Knowledge for All software for their own unique needs but lack staff resources to do this internally may have a need for fee-based customization services. Knowledge for All will provide free technical support to all users in collaboration with the user community (detailed in the Human Resources and Workflows Plan). However, organizations or individuals that need support beyond this may wish to pay for extended support. This could generate additional revenue for Knowledge for All and provide its user community with needed services that would allow use by a broader community. If fee-based extended support is offered in the future, what constitutes basic support and extended support will need to be clearly defined for users.
Estimated expenses for year one of Knowledge for All's two-year Product Development Phase are noted in the attached spreadsheet. The total amount of funding needed for this phase is approximately $500,000, or approximately $41,000 per month. Expense categories are explained below.
The budget includes wages for seven full-time staff members at $30 per hour, 35 hours per week. Job descriptions for these roles are included in the Human Resources and Workflows Plan. During the first year of the Product Development Phase, staff will be employed as contractors who work from home and are responsible for their own employment taxes and home office expenses, in order to save on the costs of setting up a Knowledge for All office and administering payroll. The exception will be a summer student employee, who will be hired if Knowledge for All obtains funding from Service Canada. The summer student's wages and estimated related employment costs are indicated on the spreadsheet. Staff will be reimbursed for costs such as printing, long-distance phone calls, and office supplies that were used solely to conduct Knowledge for All business, but not for expenses such as computers or Internet service. If individual staff members wish, Knowledge for All will seek office space provided in-kind by partner institutions. This will be a particular priority for the Project Manager, who will have a higher need to access office equipment and space to store records.
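As a rough check on how the stated wage figures relate to the overall budget, the arithmetic can be sketched as follows. The staff count, hourly rate, weekly hours, and approximate total come from this document; the 52 paid weeks per year is an assumption made only for illustration, and the summer student position is left out because its cost appears only in the spreadsheet.

```python
# Figures stated in the budget narrative.
STAFF_COUNT = 7
HOURLY_RATE = 30          # dollars per hour
HOURS_PER_WEEK = 35
TOTAL_BUDGET = 500_000    # approximate year-one total, in dollars

# Assumption (not stated in the source): contractors are paid for 52 weeks.
WEEKS_PER_YEAR = 52


def annual_staff_wages(staff=STAFF_COUNT, rate=HOURLY_RATE,
                       hours=HOURS_PER_WEEK, weeks=WEEKS_PER_YEAR):
    """Return total yearly wages in dollars for all full-time staff."""
    return staff * rate * hours * weeks


wages = annual_staff_wages()
print(f"Annual staff wages: ${wages:,}")
print(f"Share of approximate total budget: {wages / TOTAL_BUDGET:.0%}")
```

Under these assumptions, staff wages come to $382,200, or roughly three quarters of the approximate $500,000 total, which leaves the remainder for the other expense categories described below.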
Contract services, or irregular work performed by non-staff members, will include bookkeeping / accounting services, legal services, software development and design that is beyond the skills of staff developers, usability testing, and technical writing and documentation development. Volunteers will be utilized for most of these services when they are available, but the services have been included in the budget in case volunteers are not available.
The Knowledge for All Internationalization Strategy recommends that professional translators translate the content about Knowledge for All on the website, the Knowledge for All system interface, and software documentation. Most translation will occur in year two of the Product Development Phase – after product launch – but translation of content about Knowledge for All on the website should occur in year one and is included in the budget.
Office and communication expenses during year one of Phase Two will be minimal because Knowledge for All will not be maintaining an office. However, some money will be budgeted for postage, long-distance telephone calls by staff, office supplies, and printing and copying.
Travel and meeting expenses will consist of conference fees and travel to conferences, local travel by staff to attend meetings with partners or to carry out Knowledge for All-related errands, and food and accommodations for occasional meetings between staff and board members. As most staff and board members will be located in different cities, there will be few in-person meetings between them. There will be an annual general meeting or strategic planning meeting for all staff and board members, and Knowledge for All will cover travel expenses for staff members. The budget assumes this meeting will happen in July, following the timing of the 2011 strategic planning meeting, but the date could change.
It is expected that a Knowledge for All representative will present at the following conferences in 2012:
The budget allocates additional funding for attendance at other conferences as appropriate during the “conference season.” Conference and travel fees for board members presenting on behalf of Knowledge for All may also be covered by their employers. Conference attendance and presentations will increase in 2013 after the production version of the system has been launched and more funding support is available.
Marketing and advertising expenses include the costs of creating marketing materials, as well as advertising for vacant staff positions in the organization. It is assumed that most hiring will occur at the beginning of Phase Two.
Knowledge for All's monthly bank fees are high because it is an international organization with employees and vendors located in different countries and the organization does its banking electronically to accommodate staff and board signing authorities in different cities. These projected figures are based on the current costs of using the EFT payment service, wire payment service, and government tax payment services at Scotiabank. Knowledge for All is set up to receive online donations through PayPal. PayPal charges 2.9% on all donations, but the fee is reduced to 2.2% for charities. Knowledge for All is currently determining whether it qualifies for a reduced fee as a non-profit organization. PayPal fees are calculated under the assumption that Knowledge for All will receive approximately three donations totalling $15,000 per month via PayPal. Other donations could be paid by cheque, credit card, or wire transfer.
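To illustrate what the two PayPal rates would mean under the stated assumption of $15,000 in monthly donations, the percentage fees can be worked out as below. This is a minimal sketch that applies only the percentage rates quoted in this document; any fixed per-transaction charges are not mentioned here and are therefore ignored, and the function name is hypothetical.

```python
from decimal import Decimal

# Rates and donation volume as stated in the budget narrative.
STANDARD_RATE = Decimal("0.029")   # 2.9% on all donations
REDUCED_RATE = Decimal("0.022")    # 2.2% reduced rate for charities
MONTHLY_DONATIONS = Decimal("15000")


def percentage_fee(amount, rate):
    """Return the percentage-based fee, rounded to the nearest cent.

    Assumption: only the percentage rate applies; fixed per-transaction
    charges, if any, are outside the scope of this sketch.
    """
    return (amount * rate).quantize(Decimal("0.01"))


standard = percentage_fee(MONTHLY_DONATIONS, STANDARD_RATE)
reduced = percentage_fee(MONTHLY_DONATIONS, REDUCED_RATE)

print(f"Monthly fee at 2.9%: ${standard}")
print(f"Monthly fee at 2.2%: ${reduced}")
print(f"Monthly difference:  ${standard - reduced}")
```

On these figures the standard rate costs $435.00 per month and the reduced rate $330.00, a difference of $105.00 per month if the organization qualifies for the lower fee.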
Computer software and hardware costs will be minimal because open source tools will be used whenever possible and staff will be responsible for supplying and maintaining their own standard hardware and software. However, Knowledge for All will purchase special hardware and software, such as project management software, for staff use when needed and when an adequate open source alternative is not available.
Other products and services that will be provided in-kind by partners will include the following:
| Product or service | Provided by |
|---|---|
| Hosting of Knowledge for All system | University of Prince Edward Island |
| Hosting of other software | University of Prince Edward Island |
| Office space for staff | Partner institution to be determined |
| Meeting space | Requested as needed from partners |
Knowledge for All is a non-profit organization that aims to provide a single, comprehensive scholarly research tool that can be used by anyone in the world with an internet connection, computer, and web browser to search all published scholarly journal literature regardless of financial resources or institutional affiliation. The organization also aims to provide libraries and other organizations and individuals with an open source software tool that can be customized, modified, and integrated with existing tools and systems and with open journal and journal article metadata that can be used for any purpose. This will be accomplished through coordinating the collaborative efforts and resources of librarians, researchers, and developers around the world, who will contribute data, expertise, and time as indexers, translators, and developers.
Knowledge for All is a large and complex project that depends on the work of different kinds of paid and volunteer contributors with varied relationships to the project and working different amounts of time in many different locations around the world. Therefore, it is imperative that we create organizational, communication, and workflow structures and processes that facilitate effective and open information sharing, communication, collaboration, and decision-making, and a sense of community among a varied and dispersed group of organizations and individuals.
This report summarizes the major processes and activities that will compose the Knowledge for All project during Phase 2 and Operations and provides recommendations on how they will be carried out and by whom. First it outlines different roles for individuals and organizations who will participate in the project, with job descriptions for each role, and then it details workflow processes for all of the organization’s major activities. Recommendations in this document were informed by studies of management and governance of non-profit organizations in the libraries, open source software, and publishing fields, as well as discussions with stakeholders.
The Knowledge for All concept was devised in early 2010 and the organization was incorporated in Canada in late 2010. The initial one-year Planning and Development Phase began in January 2011 with funding from the Council of Atlantic University Libraries/Conseil des bibliothèques universitaires de l'Atlantique. It includes strategic planning for all aspects of the project's development and operations, initial community engagement, and development of a system prototype. This document is a deliverable of the Planning and Development Phase: Work Plan (October 2010). In February 2012 the project will move into Phase 2: Product Development. Phase 2: Product Development will run from February 2012 to January 2014 and consist of developing a production version of the system, further community engagement, and content development. Following that, the organization will begin ongoing Operations. We acknowledge, however, that as the project grows and evolves this plan may change, particularly during the Operations phase.
Knowledge for All will be a community-driven project, where volunteer members of the community carry out most of the tasks involved in running the project - from content creation to software development to management. Our vision is a participatory, non-hierarchical organization that is managed from the bottom-up, with a small number of staff supporting a large number of volunteer contributors. However, during Phase 2, while community support is still being established, greater staff resources will be needed to coordinate operations and create community infrastructure.
The proposed organizational structure of Knowledge for All is shown in the attached organizational chart. Categories of roles are as follows:
Although people who speak a variety of different languages will be needed and welcomed to fill all different roles in the Knowledge for All project, effective communication among decision-makers in the project – staff and working group and committee members – requires that a common language be spoken. Because this project is currently being developed and led by English-speaking Canadians, we recommend that English-language proficiency be a requirement for these roles. We acknowledge, however, that this could change in the future and by no means believe that English should be the dominant language.
As explained above, contributors will include individuals who are contributing time and work toward the project on a volunteer basis, including Indexers, Senior Indexers, Software Developers, Translators, and members of working groups. Work hours will be flexible but contributors will be asked to work a suggested number of hours per week, depending on their role. If they work for a partner organization they may be asked to spend part of their paid work time doing work for Knowledge for All.
Appendix A: Figure 2 shows a suggested workflow for user registration. When a new user registers they will be invited to take on a contributor role, then directed toward appropriate orientation and training resources if a role is adopted. This process is discussed further for specific roles in the Workflow section below.
Knowledge for All will be governed by a number of working groups composed of community members and tasked with directing and managing different aspects of the project. Most working groups will be open to anyone who wants to participate in them and operate in an open and non-hierarchical manner. The Board of Directors is considered a working group, although it will have a greater responsibility than the other working groups because it is fiscally and legally responsible for the organization, and its members will be appointed. Members of a Steering Committee will also be appointed, due to the need for high-profile members with specialized expertise on this committee.
The members of working groups should be allowed to structure and run the groups in whatever way is most efficient and appropriate for their work, but all groups should meet a minimum of once per quarter and follow a broad policy, to be developed by the Board of Directors, that governs all working groups. The policy should ensure a non-hierarchical structure and open participation at all levels.
Each working group can develop its own specific decision-making processes, but it is recommended that they use a consensus approach as opposed to Robert’s Rules of Order. Consensus decision-making involves collectively finding a solution that everyone can live with through dialogue and deliberation rather than majority rule. This approach will allow diverse perspectives to be heard and will not involve learning a complicated set of culturally-specific rules. Resources and training should be provided to working group members on the consensus process and effective facilitation.
As an international project that will require participation from diverse contributors and strive to be decentralized, it will be important to ensure broad and diverse participation in working groups, as well as participants with expertise relevant to the activities of the group. Thus, policies should be created and practices followed to attract participants in working groups from a variety of:
This report recommends the following working groups:
Additional temporary and ongoing committees and working groups will be formed as needed and as directed by the community.
Knowledge for All was incorporated federally as a non-profit organization in Canada in 2010. An initial set of by-laws to govern the organization was created at the time of incorporation and filed with Corporations Canada. These by-laws should be reviewed by the Board of Directors and amended as necessary to reflect the proposed new structure.
Staff will be responsible for coordinating different processes and more intensive tasks that cannot be managed by volunteers. The structure should be flexible and able to respond to the needs of the organization. To fit with the values of the organization, it is recommended that Knowledge for All employ a non-hierarchical staff structure, where all staff members have the same level of seniority, receive the same wage/contract fee and benefits, and make decisions collaboratively, but they will have responsibility for and decision-making authority in different areas.
In this document, roles are considered “staff” if they are paid and last for a period of one year or longer. Other positions, which provide special services to the organization for a shorter period of time, will be considered “contract.” That being said, during Phase 2 it is recommended that staff be employed as independent contractors who are paid a monthly fee with no employment taxes deducted and are responsible for their own office expenses. This is recommended because Knowledge for All will not have the administrative capacity or financial resources needed to have regular employees. However, it is also recommended that Knowledge for All consider evolving towards having regular employees in its Operations phase so that long-term staff can enjoy the benefits of vacation pay, health insurance, sick leave, and parental leave.
Staff should receive a competitive wage to be set by the Board of Directors, which takes into consideration that staff are responsible for their own expenses. The attached job descriptions suggest full-time hours (35 hours per week) for all staff, but the organization should be flexible and offer alternative arrangements when possible, such as job-sharing for staff who desire work-life balance.
As an international project that will require participation from diverse contributors and strive to be decentralized, staff should also come from a variety of geographic locations, cultural backgrounds, and linguistic groups.
It is recommended that most staff work remotely from their 'home offices,' as this will allow broad geographic dispersal, a less centralized project, and fewer project expenses. However, it is recommended that the Project Manager be provided with free office space by a partner institution so that the Project Manager has access to office equipment such as a photocopier and fax machine and space to store and manage records. Acquiring space from a partner institution rather than renting an office space allows flexibility in the location of the Project Manager, as the location may change over time as new people take on the role. Knowledge for All could also attempt to find office space from a partner institution for other staff members. Partner institutions will be asked to provide meeting space to Knowledge for All staff when needed.
Staff roles will include the following:
Organizations that provide financial, in-kind, or human resource support to Knowledge for All will be considered “partners.” Support could include but is not limited to:
Partners will be recognized for their contributions, with their permission, on the Knowledge for All website and publicity material. They will be invited to contribute input and feedback on various aspects of the organization's planning and operations.
The following position descriptions provide an overview of duties, responsibilities, reporting relationships, and qualifications for all major positions in the Knowledge for All organization and project, including staff positions, volunteer contributor positions, and working groups.
Type of position: Volunteer
Length of term: 2 years
Hours per week: Flexible
The Advisory Committee will largely be composed of high-profile individuals who are committed to the Knowledge for All project and have relevant expertise, but are not able to make a great time commitment to the project. They will meet as a committee once per year and be invited to attend the annual meeting or conference. They will have no decision-making authority but will advise the Board and staff on issues outside of the Board's expertise as needed.
Type of position: Volunteer
Length of term: 2 years
Hours per week: 1-4
The Board of Directors will be responsible for setting broad organizational policy and direction, addressing issues that affect the whole organization, hiring and evaluating the Executive Director, and overseeing finances. The Board will include a Chair, Secretary, and Treasurer (roles detailed below) to manage specific functions, and at least one member of each Working Group and Committee will sit on the Board. Directors will be appointed by existing Board members. 25% of the directors must be residents of Canada, as mandated by Corporations Canada. The Administrator will report to the Chair of the Board and attend Board meetings.
Type of position: Volunteer
Length of term: Flexible
Hours per week: 1-4
The Content Working Group is an open network of individuals who are involved in content development and want to participate in decision-making and management related to content in the Knowledge for All system. It will largely be made up of Indexers but will be open to others with relevant interests and expertise. The group will be open to all participants and communicate using open methods. Although the group will be loosely organized, efforts will be made to have representation from a broad range of geographic, linguistic, and subject areas. As needed, sub-groups could be formed for specific subject areas or to carry out specific content-related tasks for the project. The Content Working Group will coordinate the work of Indexers and Senior Indexers. At least one member of the Content Working Group will sit on the Board of Directors and report on the activities of the Group. The Data Technician will participate in the Content Working Group. During Phase 2 some of the Content Working Group's responsibilities will be managed by the Metadata Librarian, and the Metadata Librarian will participate in and coordinate the Content Working Group.
Type of position: Staff
Length of term: 2-year contract (renewable)
Hours per week: 35
The Data Technician will be responsible for harvesting data for the Knowledge for All system, coordinating automated data harvesting, and overall data management. The Data Technician will provide technical support to Indexers in development of content. The Data Technician will work with the technical team to resolve technical issues related to content and will provide advice to the Metadata Librarian during Phase 2 on development of data standards and practices. The Data Technician will participate in the Content Working Group.
Type of position: Staff
Length of term: 2-year contract (renewable)
Hours per week: 35
The Developers will be responsible for developing new applications and functionality for the Knowledge for All system. During Phase 2, one or more Developers will work under the direction of the System Development Coordinator to initially develop the Knowledge for All system, while during Operations one or more Developers will work closely with the System Administrator and Data Technician to lead and carry out ongoing development projects related to the Knowledge for All system and website. During Operations, the Developer(s) will take on leadership duties as needed, including project management, requirements analysis, and design, whereas during Phase 2 these duties will be the responsibility of the System Development Coordinator.
Type of position: Staff
Length of term: 2-year contract (renewable)
Hours per week: 35
The Engagement Coordinator will coordinate community development, marketing, and fundraising for Knowledge for All and provide broad coordination of the organization's network of users, contributors, and supporters. The Engagement Coordinator will work with the Project Manager to manage donations and ensure fundraising targets are being met. The Engagement Coordinator will work with the Content Working Group and Internationalization Working Group to monitor participation from different groups and undertake targeted recruitment campaigns as needed. The Engagement Coordinator will work with the Marketing and Fundraising Working Group to carry out fundraising activities.
Community development and coordination
Type of position: Volunteer
Length of term: Flexible
Hours per week: 4-8 (suggested)
Indexers will be responsible for creating and editing metadata for journal article records in the Knowledge for All system. They will work as many hours, and adopt as many journals to index, as they are able. If they are employed by a partner organization, their organization may ask them to contribute to the Knowledge for All project. They will be coordinated by the Content Working Group and communicate issues and suggestions to the Content Working Group.
Type of position: Volunteer
Length of term: 1 year
Hours per week: 1-4
The Internationalization Working Group will be responsible for coordinating translation of journal article records and directing and advising on internationalization of the Knowledge for All project. They will work with and advise the System Development Coordinator, System Administrator, and Developers on internationalization and localization of software and translation of the user interface. They will work with and advise the Metadata Librarian, Content Working Group, and Indexers on multilingual content development. They will work with and advise the Engagement Coordinator on recruitment and coordination of contributors from different geographic regions and with different linguistic and cultural backgrounds. At least one member of the Internationalization Working Group will sit on the Board of Directors and report to and advise the Board on internationalization.
Type of position: Volunteer
Length of term: 1 year
Hours per week: 1-4
The Marketing and Fundraising Working Group is responsible for helping Knowledge for All raise funds needed to carry out its operations and reach its goals. It works with the Engagement Coordinator to design and carry out fundraising campaigns, create fundraising strategies, and meet fundraising targets. At least one member of the Marketing and Fundraising Working Group will sit on the Board of Directors and report on the activities of the Working Group.
Type of position: Staff
Length of term: 1 year
Hours per week: 35
The Metadata Librarian will coordinate content development and management for the Knowledge for All system during the first year of Phase 2. Later the community (specifically the Content Working Group) will take over these responsibilities but a staff person is needed to develop and implement systems while the community is being developed. The Metadata Librarian will establish and participate in the Content Working Group. The Metadata Librarian will coordinate the content development work of Indexers and Senior Indexers during Phase 2 and work with the Data Technician and System Development Coordinator to establish content development processes.
Type of position: Staff
Length of term: 2-year contract (renewable)
Hours per week: 35
The Project Manager will be responsible for management and administration of the organization as a whole, including staff management and finances. The Project Manager will report to the Board of Directors and share relevant information with the Board of Directors. The Project Manager will attend Board meetings and provide support to the Board's activities as needed, working specifically with the Treasurer on financial matters. The Project Manager will oversee the work of staff and coordinate further development of the organization and project. The Project Manager's title will change to "Administrator" during Phase 2.
Type of position: Volunteer
Length of term: Flexible
Hours per week: 4-8 (suggested)
Senior Indexers oversee the work of Indexers and act as mentors to new Indexers. They also edit journal article records created by Indexers. When an Indexer has had a certain amount of experience indexing on the Knowledge for All project with no major quality concerns, they will advance to a Senior Indexer position. Senior Indexers will be coordinated by the Content Working Group and communicate issues and suggestions to the Content Working Group.
Type of position: Staff
Length of term: 2-year contract (renewable)
Hours per week: 35
The System Administrator is responsible for administration, ongoing maintenance, and support for the Knowledge for All system, website, and other software and hardware used by the organization. The System Administrator will work closely with the Developers, who will be responsible for developing new applications and functionality, and with the Data Technician, who will be responsible for developing and managing the data in the Knowledge for All system. The System Administrator will participate in the Technical Working Group.
Type of position: Staff
Length of term: 2-year contract
Hours per week: 35
The System Development Coordinator will be responsible for leading and coordinating the design and implementation of the Knowledge for All system during Phase 2. The System Development Coordinator will supervise and direct Developers to help carry out development of the system. The System Development Coordinator will also do ongoing maintenance of the system, provide support to staff and users as needed, and assist with the development of technical support infrastructure for the project. In addition, the System Development Coordinator will participate in the Technical Working Group, provide timely technical information to the Project Manager, and work with the Data Technician on data issues related to the system and its development.
Type of position: Volunteer
Length of term: Flexible
Hours per week: 1-4
The Technical Working Group is an open network of individuals who want to participate in decision-making and management of technical issues in the Knowledge for All project. The Technical Working Group will provide a forum to discuss and resolve problems, direct technical development, and advise on tools and standards. The Group will be open to all participants and communicate using open methods. The Group will largely be composed of volunteer developers and other community members with technical expertise. At least one member of the Technical Working Group will sit on the Board of Directors and be responsible for representing the Technical Working Group at board meetings. The Group may be asked to carry out or provide input on technology-related tasks or projects by the Board of Directors or staff. Development staff will attend Technical Working Group meetings and will report regularly on technical issues that can be addressed by the Group.
Type of position: Volunteer
Length of term: Flexible
Hours per week: 2-6 (suggested)
Translators will translate journal article records and additional material and proofread translated records on a volunteer basis according to the will and schedule of each Translator. They will be coordinated by (and may participate in) the Internationalization Working Group. They will work with the Metadata Librarian, Content Working Group, and Indexers on matters related to multilingual content development as needed.
Specific communication processes and reporting relationships are outlined throughout this document, but below is a summary of major tools that will be used for communication between community members.
Automated workflow system
The Knowledge for All system for managing and providing access to journal article data will include a robust automated workflow system for creating and editing data, translating records, and managing users. This will limit the amount of direct communication needed between contributors around these processes.
Contributors will utilize free online meeting tools such as Skype for calls and video conferencing.
Knowledge for All website
The organization's website will provide a space for collaboration, document sharing, and discussion between members of working groups and committees, between contributors, and with the broader community.
Annual conference or meeting
The project will have an annual conference or meeting (or multiple conferences or meetings in different regions) at which contributors can get together to do strategic planning, do software development, receive and give training, and discuss issues in person. This will require significant resources to organize and host, so during Phase 2 it could piggyback on one or more conferences or events organized by another group.
Regional meetings and groups
Although various tools and processes will be used to facilitate communication by geographically dispersed contributors, users and contributors in the same geographic area could be encouraged and supported to form regional groups and meet in-person when possible. This could be useful for training and development, as well as organizing around common cultures and languages.
Below is a list of tools and options for managing our community.
An open journal index containing article-level metadata about tens of thousands of scholarly journals is the primary product and service being offered by Knowledge for All, and so development of content for the index is the largest and most complex process of the organization. The Content Development Strategy document identifies what content will be included in the index and how it will be generated and organized, while this document focuses on the workflow processes involved in developing and managing that content. Appendix A: Figure 1 shows the content development workflow.
Some of the development or collection of content for the Knowledge for All journal index will be automated and some will be done by people. Most factual metadata will be collected or generated through automated means, while volunteer Indexers will edit the data and add subject headings and other special metadata that is covered by copyright or not available elsewhere. A Data Technician, a full-time staff member, will coordinate automated data harvesting and overall data management. Full records will then be sent to a volunteer Senior Indexer for final editing and quality control before being published.
The Knowledge for All system will have an automated workflow process by which Indexers will be assigned articles to index that match their subject expertise. If they do not complete the task within a specific time frame, it will be assigned to another Indexer. When the article has been indexed, it will be sent automatically to a Senior Indexer for editing before it is published. The Data Technician will oversee and troubleshoot this process, although higher level systems-related issues will be passed on to the System Development Coordinator during Phase 2 and the System Administrator during Operations.
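The assignment rules in the paragraph above can be illustrated with a minimal sketch. It is not a design for the actual system: the class names, the status values, and the 14-day time limit are all assumptions made for illustration, since this report does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical time limit; the report only says "a specific time frame".
ASSIGNMENT_WINDOW = timedelta(days=14)


@dataclass
class Indexer:
    name: str
    subjects: set[str]
    senior: bool = False


@dataclass
class Article:
    title: str
    subject: str
    status: str = "unassigned"  # unassigned -> assigned -> in_review
    assignee: Optional[Indexer] = None
    assigned_at: Optional[datetime] = None
    timed_out: list[str] = field(default_factory=list)  # Indexers who missed the deadline


def assign(article, indexers, now):
    """Give the article to the first regular Indexer whose expertise matches."""
    for indexer in indexers:
        if (not indexer.senior
                and article.subject in indexer.subjects
                and indexer.name not in article.timed_out):
            article.assignee, article.assigned_at = indexer, now
            article.status = "assigned"
            return indexer
    # Nobody suitable is available; leave the article in the queue.
    article.assignee, article.assigned_at = None, None
    article.status = "unassigned"
    return None


def reassign_if_overdue(article, indexers, now):
    """Pass an overdue article on to another matching Indexer."""
    if article.status == "assigned" and now - article.assigned_at > ASSIGNMENT_WINDOW:
        article.timed_out.append(article.assignee.name)
        return assign(article, indexers, now)
    return article.assignee


def submit_for_review(article, indexers):
    """Route an indexed article to a Senior Indexer for editing before publication."""
    for indexer in indexers:
        if indexer.senior and article.subject in indexer.subjects:
            article.assignee, article.status = indexer, "in_review"
            return indexer
    return None
```

In this sketch an article that no available Indexer can take simply returns to the unassigned queue, which is the kind of case the Data Technician would be expected to troubleshoot.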
During Operations the content development process will be managed by a Content Working Group, an open network of individuals who want to participate in decision-making and management around content and most of whom will be involved in content development. This group will communicate via a discussion list and monthly calls. It will be open to anyone who wants to participate. The Content Working Group will provide a forum to discuss and resolve content and indexing issues, direct metadata development, and advise on standards. At least one member of the Content Working Group will sit on the Board of Directors and be responsible for representing the Content Working Group at board meetings. The Group may be asked to carry out or provide input on content-related tasks or projects by the Board of Directors or staff.
The Data Technician will attend Content Working Group meetings and will report regularly on issues that concern the Group. The Content Working Group will be responsible for addressing all content related issues, but will ask for support from staff members when needed.
During the first year of Phase 2, content development will be coordinated by the Metadata Librarian, as the infrastructure and processes for content management will need to be developed during this phase. The Metadata Librarian will be responsible for establishing the Content Working Group, developing authority files and thesauri, establishing metadata standards and guidelines, and assisting with the development of content-related documentation.
Some fields in the Knowledge for All system will link to authority files, and these files will need to be developed during Phase 2 and maintained throughout Operations. The most significant files that will need to be developed are the author name authority file and subject thesauri for all major disciplines.
The Metadata Librarian will be responsible for coordinating these processes with extensive input from subject experts in the Knowledge for All community, perhaps through the creation of ad hoc committees of specialists in different disciplines who will assess available thesauri and make recommendations. In some cases an existing thesaurus could be adopted as-is or with modifications, while in others an entirely new thesaurus will need to be developed.
In the later part of Phase 2 and the Operations phase, the Content Working Group will be responsible for coordinating ongoing development and maintenance of authority files and thesauri. They may task individual Indexers with responsibility for maintaining specific thesauri or utilize other resources, such as students of Master of Library Studies programs through partnerships with schools.
The Knowledge for All project will require a large number of volunteer Indexers with different linguistic and subject expertise who will be geographically dispersed around the world. Initial and ongoing recruitment of Indexers will be done by the Engagement Coordinator, with assistance from the existing community of Indexers and other contributors. Targeted recruitment will be carried out as needed.
As shown in Appendix A: Figure 3: Indexer Training Workflow, new Indexers who register on the Knowledge for All site will be directed to an online orientation, online training modules, and documentation and be invited to join the Content Working Group. They will also be assigned a Senior Indexer who will act as a mentor. The system will keep track of which online training sessions have been completed by the Indexer and the mentor will be able to monitor this. Once a new Indexer has completed the necessary training tutorials, they will receive journal article records to index. The mentor will check in with the Indexer and be available to provide support and guidance. The mentor will edit the new Indexer's work initially and address quality issues with the Indexer as needed. Most of this will occur through automated workflow processes, but the Content Working Group during Operations and the Metadata Librarian during Phase 2 will be responsible for overall coordination of this process and will address problems as they arise.
During Phase 2, documentation and training videos will be created by the Technical Writer on creating and editing content in the Knowledge for All system. Further support will be provided through a support forum managed by the Content Working Group. Orientation materials will be created and maintained by the Engagement Coordinator.
When someone has had a certain amount of experience indexing with no major quality concerns, they will advance to a Senior Indexer position, which means they will take on more editing responsibilities and act as mentors to new Indexers. Senior Indexers will receive initial online training and orientation but will also have a mentor - an experienced Senior Indexer - to guide them through the process and provide support as needed.
As an international project, it is important that the Knowledge for All system is accessible to users of different linguistic and cultural backgrounds who want to participate in different ways. This includes:
These recommendations were informed by the Internationalization Strategy.
The Knowledge for All system will include records for journals and journal articles that are published in languages other than English. These records will need to be indexed by Indexers who are proficient in languages other than English. Records for these articles should be created in the original language of the article, although they may also be translated into English. The Content Development Workflow (Appendix A: Figure 1) will be followed to accomplish this. Multilingual authority files and thesauri should be available to facilitate this process. Where existing multilingual thesauri in major disciplines cannot be located, they will need to be translated from English thesauri. Volunteer Translators will carry out this activity, coordinated by the Metadata Librarian and Content Working Group.
Journal article records will be translated into different languages by volunteers according to the needs of the community. As outlined in the Translation Workflow (Appendix A: Figure 4), users can flag article records they would like to have translated and specify the desired language, and these records will be forwarded to volunteer Translators with appropriate language expertise. Translated records will be proofread by another Translator before being published.
When a new user registers in the system and accepts the role of Translator, the user will be asked to identify language expertise, given translation guidelines, and given access to journal article records that need to be translated in their known languages.
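The translation steps described above (flagging, routing by language expertise, and proofreading by a second Translator) could be modelled roughly as follows. This is only a sketch under stated assumptions: the class names, field names, and status values are hypothetical and are not taken from the Translation Workflow figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Translator:
    name: str
    languages: frozenset  # languages identified at registration


@dataclass
class TranslationRequest:
    record_id: str
    target_language: str  # language requested by the user who flagged the record
    translator: Optional[str] = None
    proofreader: Optional[str] = None
    status: str = "flagged"  # flagged -> translated -> published


def pick(translators, language, exclude=None):
    """Return the first Translator who knows the language, skipping `exclude`."""
    for t in translators:
        if language in t.languages and t.name != exclude:
            return t
    return None


def translate(request, translators):
    """Forward a flagged record to a Translator with matching expertise."""
    t = pick(translators, request.target_language)
    if t is not None:
        request.translator, request.status = t.name, "translated"
    return t


def proofread(request, translators):
    """Publish only after a different Translator has proofread the record."""
    if request.status != "translated":
        return None
    p = pick(translators, request.target_language, exclude=request.translator)
    if p is not None:
        request.proofreader, request.status = p.name, "published"
    return p
```

Note that under this rule a record stays unpublished when only one Translator knows the requested language, which suggests that each supported language needs at least two active Translators.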
The Knowledge for All website, system interface, and documentation should be translated and localized into different languages in order to make the project welcoming and accessible to contributors and users of different linguistic and cultural backgrounds. This should be done by paid translators on contract to ensure it is done well and in a timely fashion, unless willing and capable volunteers are available. Initially this material will be translated into the ten languages most common in the academic community, and others will follow later if there is demand and as resources permit.
Translation of records and internationalization of the Knowledge for All project will be coordinated by an Internationalization Working Group. The group will advise on and direct internationalization of the Knowledge for All project and coordinate the record translation process. This group may subdivide into culture, language, or geographic area-specific sub-groups if needed.
Knowledge for All is a collaborative project that depends on the contributions of thousands of institutions and individuals, so effective community engagement and coordination, marketing, and fundraising are essential for success. A part-time Engagement Coordinator began this work in month 7 of Phase 1. During Phase 2 there will be a greater emphasis on community development and engagement, while during Operations there will be a greater emphasis on community coordination. One or more full-time Engagement Coordinators will continue this work during Phase 2 and Operations. If more than one person is needed for this role, it could be divided into different roles, such as a Fundraising Coordinator and Engagement Coordinator.
The Engagement Coordinator will receive assistance and support with fundraising from a Marketing and Fundraising Working Group – a group of volunteers with experience with and enthusiasm for fundraising. Some community coordination will be done by the Metadata Librarian, Content Working Group, technical team, and Internationalization Working Group in their specific areas. The Project Manager will assist with fundraising since it involves accessing and managing the organization's finances. Depending on the skills and experience of the Engagement Coordinator(s), additional contractors may be needed for graphic design at various times.
Community development or engagement involves recruiting contributors in all roles for the Knowledge for All project and ensuring their continued engagement and involvement with the project. This will be accomplished through regular communication with the community, development and maintenance of the website, involvement of the community in decision-making and ongoing operations, and user registration processes (Workflow shown in Appendix A: Figure 2). The Engagement Coordinator will monitor and maintain information about users and undertake targeted recruitment campaigns as needed. Strategic partnerships with organizations will be developed and maintained by the Engagement Coordinator and working group members, but information about partners will be managed by the Engagement Coordinator. Community development and engagement during Phases 1 and 2 are outlined in more detail in the Engagement Strategy.
Ongoing marketing activities and campaigns will help raise awareness of Knowledge for All and communicate our vision and message to the community. This will include development of marketing materials and content for social media and the Knowledge for All website, planning and execution of marketing campaigns, and presentations at conferences and events. Marketing activities will largely be carried out by the Engagement Coordinator, but contributors and partners could be invited to create content (such as a blog post), community members will be provided with materials to promote the project in their networks, board members will deliver presentations and do promotion when possible, and the System Administrator and Developers will provide technical assistance with maintaining and developing the website.
Knowledge for All will be funded mainly through contributions from partner organizations and individuals, and partly from grants. The Engagement Coordinator will be responsible for seeking grant funding, preparing grant applications, and preparing reports for granting institutions, with assistance from the Marketing and Fundraising Working Group. The Engagement Coordinator and Marketing and Fundraising Working Group will solicit donations and undertake fundraising campaigns. The Engagement Coordinator will manage and collect financial contributions from institutions and individuals, which will involve managing donation schedules, setting up and managing payment systems, managing information about donors, and communicating with donors. The Project Manager will provide support. The Engagement Coordinator will be responsible for meeting fundraising targets set in the annual budget, keeping track of amounts raised, and reporting this information to the Project Manager on a regular basis.
As the project grows on an international scale, additional Engagement Coordinators could be hired to coordinate marketing activities, recruit contributors, and manage communities in different geographic areas, where particular language proficiencies, cultural understanding, and geographic proximity to the communities in question are important for doing this effectively.
Development of the Knowledge for All journal index and technical infrastructure began in the latter half of Phase 1: Planning and Development, when a full-time System Development Coordinator was hired to work with staff and the community to design the system and develop a prototype that could be used for initial data harvesting, data entry, and user testing. In Phase 2, a System Development Coordinator will continue to develop the system until it has all features noted in the System Requirements Specification document, has undergone usability testing, and is ready for an official product launch. The System Development Coordinator will lead this process, supported by one or two additional Developers as needed.
During Operations of the Knowledge for All project, one or two Developers will work to continuously develop and improve system functionality while a System Administrator will maintain and support the system. The system will be hosted by a partner organization, so high level server and network administration will be done by staff at that institution.
Depending on the skills and abilities of the development staff and volunteer contributors, additional contractors may be hired to carry out web design/theming as needed.
Usability testing will be carried out by volunteer contributors when possible or contractors as needed.
The Knowledge for All end-user community will play an important role in providing feedback and input on system development and participating in user testing. Members of the end user community with appropriate skills will also be invited to participate in software development. It is expected that participation in software development will be minimal initially, as there are not many developers or programmers in the academic library community, but that participation is expected to grow over time as the project gains momentum and broader interest. Community contributions to Knowledge for All software development will be solicited and coordinated by the System Development Coordinator. The Engagement Coordinator will assist with recruiting volunteers through general marketing activities and special campaigns.
The Technical Working Group will be an open group of volunteers who want to participate in decision-making and management of technical issues in the Knowledge for All project. The Technical Working Group will provide a forum to discuss and resolve problems, direct technical development, and advise on tools and standards. It will be open to anyone who wants to participate and will likely be composed of volunteer developers and other community members with technical expertise. At least one member of the Technical Working Group will sit on the Board of Directors and be responsible for representing the Technical Working Group at board meetings. The Group may be asked to carry out or provide input on technology-related tasks or projects by the Board of Directors or staff. Development staff will attend Technical Working Group meetings and will report regularly on technical issues that can be addressed by the Group.
Users of the Knowledge for All system who may require technical support will include those who:
These user groups will require different kinds of technical support. It is recommended that Knowledge for All offer the following types of support.
An e-mail discussion list or online forum requires few resources to maintain because it allows users to support each other. An online forum allows easier and better organized archiving of past questions and responses, but an e-mail discussion list is more immediate. This tool can be used by all of the user groups noted above but should be divided into broad topics that are relevant to specific user groups. Forums or lists should also be offered in other languages once there are sufficient users who speak those languages to maintain them. Although users will mainly support each other, the System Administrator should monitor the list and respond to questions when needed.
Phone and chat support requires significant resources but can be valuable for those who require immediate or more complex support, and phone support is valuable for users who are not technologically savvy. This option is feasible if the task is shared by members of the user community rather than done by staff. It would be scheduled and coordinated by the System Administrator.
As discussed above, regional groups would give people the opportunity to provide and receive support and training in person, from users whom they may already know or who have common needs and interests due to their geographic proximity. Users and contributors in the same geographic area will be encouraged and supported to form regional groups and meet in person when possible and as needed.
Different kinds of users who are using the Knowledge for All tool in unique ways, such as special libraries, may form groups for networking, development, and technical support. The Knowledge for All organization will support these groups by setting up online discussion groups for them.
Clear and thorough documentation will be needed for all types of users noted above. Some software projects rely on users to create documentation, but this often leads to poor or insufficient documentation. Instead it is recommended that Knowledge for All employ a technical writer during Phase 2 to create professional documentation, which will include written documentation and online tutorials. Staff will assist with creation of documentation and community members will be invited to contribute, but a Technical Writer will coordinate the process and will edit and finalize all documentation. The Technical Writer could also create documentation guidelines, or documentation about documentation, to guide staff and community members in creating future documentation. Ongoing maintenance and updating of documentation will be done by community members and staff members with relevant expertise.
As a federally incorporated not-for-profit organization in Canada, Knowledge for All is required to have a Board of Directors which oversees and takes legal responsibility for the organization. As a community-driven, non-hierarchical organization that is managed from the bottom up, ideally most of the decisions directing the organization will come from the community, and volunteers and staff will carry out the day-to-day operations. Thus, during Operations an Administrator and the Board of Directors should focus on administration and management in areas that involve the organization as a whole, such as human resources management and broad policy and planning. However, during the beginning of Phase 2 the Board may play a stronger role in day-to-day operations and decision-making and take on some of the responsibilities of other working groups while they are being established. Duties of the Administrator are currently carried out by the Project Manager, so the title Project Manager will be used in this document to avoid confusion.
The Board of Directors will be composed of supporters of the Knowledge for All project with relevant expertise and at least one member of each other working group (except the Advisory Committee). The Board will have roles for Chair, Treasurer, and Secretary and further roles as needed. These roles will not serve an executive function outside of the larger Board, but rather be tasked with certain responsibilities. The Chair will organize and facilitate meetings and manage board development; the Treasurer will oversee finances; and the Secretary will keep meeting minutes and manage records related to the Board's legal status and responsibilities. These roles should rotate as often as is efficient.
There will also be an Advisory Committee, which will meet infrequently and have no decision-making authority, and whose members will advise the Board and staff on issues outside of the Board's expertise as needed.
The Board of Directors will be responsible for planning and overseeing Knowledge for All's finances. The Treasurer specifically will ensure that effective financial measures, controls, and procedures are put in place in line with good practice and in accordance with legal requirements, and will report to the Board of Directors at regular intervals about the financial health of the organization. The Project Manager will be responsible for day-to-day management of finances, including accounts payable and accounts receivable, payroll, banking, and cash flow management. The Project Manager and Treasurer will work together to ensure effective systems are in place and that the Board has all necessary financial information. The Project Manager will also work with a bookkeeping service, which will maintain and post transactions in an electronic accounting system and provide the organization with information and reports as needed. An accountant will be contracted as needed to provide auditing services.
The Project Manager will be largely responsible for human resources management for Knowledge for All, including hiring, firing, evaluating, and directing staff and projecting short- and long-term human resources needs. The Board of Directors will establish human resources policies under advisement from the Project Manager and will assist with hiring new staff. The Board of Directors will be responsible for hiring, firing, evaluating, and overseeing the Project Manager.
While Working Groups and Committees will direct specific areas of the Knowledge for All project, the Board of Directors will provide broad direction and planning for the organization. This will include setting objectives, establishing strategies, making action plans, and allocating resources to carry them out. The Project Manager will assist with the development of strategies and action plans as needed and will advise the Board on planning needs.
The Project Manager will assign responsibilities to staff and volunteers as appropriate and work with staff and volunteers to develop efficient systems and ensure effective performance. The Project Manager will store and manage the organization's records while the Secretary of the Board will manage Board records. The Board will approve operational policies that are developed by staff and working groups.
Developing an open access, open source, multi-institutional, multilingual, international digital library requires robust strategic planning and technological infrastructure. This paper seeks to outline a series of “Best Practices” as related to managing multilingual content for the upcoming Knowledge for All journal catalogue.
The creation and maintenance of appropriate multilingual metadata and multilingual indexing data are essential components of providing international access to the Knowledge for All journal index. Success in this area means that a proper community will be developed which represents as many languages and cultural backgrounds as possible. After proper user testing and consultation with the System Development Coordinator, this outline of ‘best practices’ should provide grounding for the development of a strategic plan and timeline to allow Knowledge for All to properly manage multilingual content.
According to Wikipedia (2011):
Internationalization and localization (also spelled internationalisation and localisation) are means of adapting computer software to different languages, regional differences and technical requirements of a target market. Internationalization is the process of designing a software application so that it can be adapted to various languages and regions without engineering changes. Localization is the process of adapting internationalized software for a specific region or language by adding locale-specific components and translating text.
The world is a diverse place. If Knowledge for All wishes to expand beyond its core user base in English Canada, proper internationalization strategies must be implemented. Ideally, users should be able to search in one language, their native language, without having to rely on a machine translator or interpreter. This shows respect for diverse populations and diverse needs. It also shows commitment to diverse communities, whoever they may be. In the end, it is always preferable to adapt to users' needs rather than the other way around.
In most of the research there is confusion between internationalization, localization, and translation. This document focuses on translation, as that is the author's area of expertise. Technology-based decisions regarding internationalization will ultimately be made by Knowledge for All's System Development Coordinator; however, an outline of some suggested plans of action on internationalization is included in this strategy.
The following questions are pertinent to Knowledge for All's strategic planning. These questions will be addressed throughout this document.
The figure above shows the top ten languages used on the internet. There is little information available on the top languages used in academia. In fact, researchers (Grin, Hughes, Emma) point out a need for more policy research in this area. In addition, it is well recognized in the above literature that the languages used on the internet and in academia do not fully represent all of the languages spoken in the world, and so Knowledge for All must take account of this digital divide above all else. However, we can say that the linguistic practices of researchers in academia reflect broader socio-cultural conditions (Grin, 2010, p.4). Indeed, Wikipedia has evolved to include 264 languages with 11,389,385 articles in those languages, since its creation in 2001. Based on that assumption, the internationalization of the Knowledge for All catalogue will reflect the user community we are able to recruit and maintain, as well as the desire to translate metadata from other languages into English.
As this is indeed a subject which seems to be lacking in prior research (Yang, 2009), a user survey of various international research groups should be conducted in order to discern their expectations in terms of searching in a scholarly journal article database. It is well understood by many academics that English is the lingua franca of the academic world. Knowledge for All can use this fact to its advantage but should not exclude those potential researchers who operate solely in Spanish, for example. On another note, language use in research can have many different phases, from the initial searching to the writing and publication of an article. At Knowledge for All, we are most interested in the initial research phase of this process.
Keeping the above in mind, we can ask the following questions:
To maintain a competitive advantage over other similar services, Knowledge for All should consider the approach of commercial vendors to internationalization and translation. By doing so, Knowledge for All can improve on the practices of other major products to provide the user community with a comprehensive search experience. Here is a list of the vendors which were considered:
To see the full results of this inquiry, please consult Appendix A. In each of these cases, interface language support and multilingual search options were considered. It was found that most databases offered multilingual text searching in many languages, and interface translation in many languages. In all cases, it was possible to limit results to a designated list of languages. It is assumed that this feature became available as articles in those languages became available in each specific database. Knowledge for All may wish to use these language lists as a starting point for its journal index, and as a starting list for languages in which to acquire articles.
To allow for proper internationalization, the following aspects of the Knowledge for All journal index should be translated:
Internationalization means much more than translating text. All across the literature, many of the non-technical aspects that came up seemed to stem from general cross-cultural communication differences. Witte (2011) has suggested just a few examples of important things to note:
Sort order: Some cultures list alphabetically, others do not.
Calendar use: Different date representations occur across the world.
Reading order: Arabic and Hebrew are read right to left.
Casing of words: The French do not use accents on their capital letters, and capitalize only the first word in a title.
Names: In some cultures people have more than two names, used in different orders.
Differences in symbols: For example, the French quote text not like this “” but like this: « ».
These issues should be taken into account when developing the system. The availability of Unicode is necessary to ensure proper communication across all cultures.
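A few of the issues listed above can be illustrated with a minimal Python sketch using only the standard library. It is not a proposed design for the Knowledge for All system: the function names and the `DATE_PATTERNS` table are hypothetical, and a production system would more likely rely on a full locale library than on hand-written rules.

```python
import unicodedata
from datetime import date

# Hypothetical per-locale date patterns, for illustration only.
DATE_PATTERNS = {
    "en-CA": "%Y-%m-%d",
    "en-US": "%m/%d/%Y",
    "fr-FR": "%d/%m/%Y",
}


def format_date(d, locale_code):
    """Render the same date according to a locale's (assumed) pattern."""
    return d.strftime(DATE_PATTERNS[locale_code])


def accent_insensitive_key(text):
    """Sort key that ignores accents and case.

    The text is decomposed (NFD) so that accents become separate
    combining marks, which are then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def is_right_to_left(text):
    """Guess reading order from the first strongly directional character.

    Unicode bidirectional classes "R" and "AL" cover Hebrew and Arabic
    letters; class "L" covers left-to-right letters.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return True
        if bidi == "L":
            return False
    return False


authors = ["Zola", "Éluard", "Eco"]
print(sorted(authors))                              # plain code-point order
print(sorted(authors, key=accent_insensitive_key))  # accents ignored
print(format_date(date(2011, 9, 5), "fr-FR"))
print(is_right_to_left("مكتبة"))
```

Sorting by raw code points places "Éluard" after "Zola", whereas the accent-insensitive key places it between "Eco" and "Zola". This is only one possible ordering rule; as noted above, sort conventions differ between languages.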
Just like other facets of the Knowledge for All catalogue, multilingualism and the degree to which things are translated or localized for different user communities should be an organic, grassroots process. As stated by Jilovsky et al. (2006), “the range of cultural, historical and political knowledge that [users] need along with general indexing and specialist language skills” is necessary for the proper internationalization of the Knowledge for All tool.
Of course, the above depends on user input and so the catalogue will naturally evolve by soliciting the language expertise of its core user group. Here are some recommendations for utilizing the user community in development of multilingual content for the Knowledge for All system.
Translators should be flagged in their user profiles as willing to translate from Language A to Language B.
They may also be flagged as willing to proofread translations in English or in another language. Ideally proofreaders will be proficient in both the original and translated language, but proofreading could also be done by contributors who are only proficient in the translated language.
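The profile flags described above could be modelled along the following lines. This is a minimal sketch under stated assumptions: the `ContributorProfile` class, its field names, and the use of two-letter language codes are hypothetical and are not part of any existing Knowledge for All design.

```python
from dataclasses import dataclass, field


@dataclass
class ContributorProfile:
    """Hypothetical user profile carrying translation-related flags."""
    name: str
    # (source, target) language-code pairs the contributor will translate.
    translates: set = field(default_factory=set)
    # Language codes in which the contributor is willing to proofread.
    proofreads: set = field(default_factory=set)


def find_translators(profiles, source, target):
    """Names of contributors flagged as translating from source to target."""
    return [p.name for p in profiles if (source, target) in p.translates]


def find_proofreaders(profiles, source, target):
    """Names of proofreaders, those proficient in both languages first.

    Contributors who proofread only in the target language are listed
    after them as a fallback.
    """
    bilingual = [p.name for p in profiles if {source, target} <= p.proofreads]
    target_only = [
        p.name for p in profiles
        if target in p.proofreads and source not in p.proofreads
    ]
    return bilingual + target_only


profiles = [
    ContributorProfile("Ana", translates={("es", "en")}, proofreads={"en"}),
    ContributorProfile("Luc", translates={("fr", "en")}, proofreads={"fr", "en"}),
    ContributorProfile("Mei", proofreads={"es", "en"}),
]
print(find_translators(profiles, "es", "en"))
print(find_proofreaders(profiles, "es", "en"))
```

Storing translation direction as an ordered pair keeps "Spanish to English" distinct from "English to Spanish", which matters because a contributor may be comfortable translating in only one direction.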
In theory, the translating process should appear as similar as possible to the indexing process:
In order to properly implement such a strategy, Knowledge for All will have to ensure that proper guidelines are available for the translation process in all languages used. These guidelines will consist of a well-defined explanation of the translation process, and will serve to instruct the translation community. They may also include some general translation tips and suggestions. Above all, the guidelines themselves should be well translated, meaning that Knowledge for All may wish to hire a professional translator to translate them. This will streamline the process and ensure quality control.
It is advisable to avoid using direct machine translation. Human translation, while riddled with faults, is most often the preferred method. Kitsite is a London-based company focused on web application development and content management projects, including multilingual content management. According to the Kitsite.com company blog, “machine translation should be considered with extreme caution, but it may be a plausible alternative for infrequently-accessed pages containing non-essential content. In this case, the use of short, unambiguously structured sentences and the avoidance of idiomatic phrases are essential, and sub-editing is likely to be a necessity.” This is a sentiment shared by similar internationalization consulting services, such as Lingoport. It is also a widely held view in the linguistic field and the French-language academic world.
Ideally, all multilingual content will be read and revised by native speakers of that language. Whether or not they use a machine translator or other available language resources (dictionaries, thesauri, grammar books, etc.) should be up to the user. Knowledge for All may wish to provide links to appropriate and well-reputed translation resources such as, for example only, Le Bon Patron.
The web interface is where the translation takes place. It should have embedded language tools such as dictionaries and a multi-lingual keyboard. Here are a few trusted sites:
As suggested by Notess (2008), although machine translation provides a less-than-ideal result, it may work in Knowledge for All’s favour in translating text to English. In the case of translating an abstract, a proofreader would only be required to plug an abstract from their native language into a machine translation tool, from which the general sense of the abstract could be garnered. The next step would be to simply proofread the translated text and prepare it for wider use.
If Knowledge for All wants to maintain high search accuracy in languages other than English, subject thesauri for all major disciplines in major languages other than English should be made available. In some cases, these already exist and only need to be located, while in other cases they will need to be translated. Quality translation of subject thesauri will likely require someone who is both a skilled translator and a taxonomist.
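To illustrate why translated subject thesauri matter for search accuracy, the following minimal sketch shows one way a multilingual thesaurus could map a subject term entered in any supported language to its equivalents in the others. The concept identifiers, the sample terms, and the function names are hypothetical; real thesauri are far richer (broader and narrower terms, synonyms, scope notes).

```python
# Hypothetical concept records: one identifier per concept, with a
# preferred label for each supported language.
THESAURUS = {
    "C001": {"en": "libraries", "fr": "bibliothèques", "es": "bibliotecas"},
    "C002": {"en": "copyright", "fr": "droit d'auteur", "es": "derechos de autor"},
}


def build_label_index(thesaurus):
    """Map every label, in every language, back to its concept identifier."""
    index = {}
    for concept_id, labels in thesaurus.items():
        for label in labels.values():
            index[label.casefold()] = concept_id
    return index


def expand_query(term, thesaurus, index):
    """Return the term's equivalents in all languages of the thesaurus.

    A term that is not in the thesaurus is returned unchanged, so the
    search can still run on the user's original wording.
    """
    concept_id = index.get(term.casefold())
    if concept_id is None:
        return [term]
    return sorted(set(thesaurus[concept_id].values()))


LABEL_INDEX = build_label_index(THESAURUS)
print(expand_query("Bibliothèques", THESAURUS, LABEL_INDEX))
print(expand_query("open access", THESAURUS, LABEL_INDEX))
```

With such a mapping, a record indexed under the English subject term could be retrieved by a user searching with the French or Spanish equivalent, provided the translated labels are accurate. That dependency on label quality is why the translation of thesauri calls for both translation and taxonomy skills.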
It is recommended that the static text available on the Knowledge for All system interface and website be available in many languages. This would include the search interface, information about the project, a “contact us” page, and user documentation. Making this content available in many languages would be essential for attracting multilingual contributors to the project and ensuring accessibility. As translation of this content should be high quality, it is recommended that Knowledge for All hire professional translators. Where this is not possible, Knowledge for All could rely on the user community as outlined above.
Finally, if Knowledge for All wishes to provide technical support in any language other than English, we may wish to recruit international volunteer contributors or hire contractors for this task, depending on the enthusiasm of the user community. As mentioned in the Human Resources and Workflows Plan, below is a list of the types of technical support Knowledge for All wishes to provide, with comments on how to achieve this in a multilingual environment. As the Knowledge for All tool gets up and running, staff will get a sense of which services may be necessary.
As this is another community effort, especially when considering an online forum, Knowledge for All should encourage members of all linguistic communities to contribute in whichever language they are comfortable using. By doing so, they will receive support in their own language if it is available from within the community. If this is not satisfactory, Knowledge for All may wish to ask one translator from each major language to monitor the discussion forums. These translators may require a small honorarium.
If resources allow, the user community may be utilized by having individuals sign up for designated time periods of support. Attempts should be made to recruit users with different language expertise to provide this service.
If users wish to form regional groups to provide and receive support and training in-person, specific linguistic groups will occur naturally. They may emerge as special interest groups with a focus on translation. Users may wish to get together to comment on their challenges and solutions when translating, or to improve on interface translation, for example. The Knowledge for All organization should provide support for these groups via online forums.
Clear and thorough documentation will be necessary for all users to fully understand the Knowledge for All tool. It is recommended that Knowledge for All hire a translator to translate documentation and ensure it is high quality. The user community will be able to comment on and recommend changes to this documentation, with revisions made by volunteers and staff members with relevant expertise.
As Knowledge for All branches out into more and more international communities, some of Knowledge for All's user community may naturally have the ability to function in other languages, and thus a translation community may appear naturally. Although the material of this strategy focuses primarily on translation from one language to English, and vice versa, a language community which includes translations to and from languages other than English may also naturally grow as the tool is developed. This may include, for example, translations from Spanish to French. However, in order to promote the growth of such a community, it is recommended, as noted above, to provide a multilingual interface and documentation.
Another potential source of volunteer translators is translation students looking for experience. To recruit international translators, Knowledge for All may wish to use KDE’s website «Get involved in Translation» as a model. Other potential resources include (for networks of researchers in literature in history, in French):
An Internationalization Working Group could be established to develop, coordinate, and support a Knowledge for All translation community and advise on other aspects of internationalization.
Archer, J. "Internationalisation, technology and translation." Perspectives Copenhagen and Clevedon Studies in Translatology 10 (January 01, 2002): 87-118.
Borgman, C.L. "Multi-media, multi-cultural and multi-lingual digital libraries: or how do we exchange data in 400 languages?" D-Lib Magazine (June 1997). Retrieved from http://www.dlib.org/dlib/june97/06borgman.html
Cousins, S., & Hartley, R. "Towards multilingual online public access catalogues." Libri: International Journal of Libraries & Information Services, 44.1 (1994): 47-62. Retrieved from EBSCOhost.
EBSCOhost. "Translating an article." Retrieved September 2, 2011 from http://support.ebsco.com.ezproxy.library.dal.ca/help/index.php?help_id=98
Emma, M. C., Ali, S., & Dennis, N. "Challenges and issues in terminology mapping: a digital library perspective." The Electronic Library 23.6 (January 01, 2005): 671-677.
Grin, F., (2010). “Managing languages in academia: Pointers from education economics and language economics”, Paper presented at the Conference Professionalising Multilingualism in Higher Education, Luxembourg, 4 February 2010.
Google Scholar. (n.d.). Retrieved September 5, 2011 from http://scholar.google.com
Guerrini, M. "In praise of the un-finished: The IFLA statement of international cataloguing principles." Cataloging & Classification Quarterly 47.8 (2009): 722-740.
Hughes, R. "Internationalisation of higher education and language policy: Questions of quality and equity." Higher Education Management and Policy 20.1 (2008): 1-18.
Jilovsky, C., Sukkar, L., & Varga, E. "Multi-lingual cataloguing: Culture, practice and systems." CAVAL Collaborative Solutions (2006). Retrieved from http://www.caval.edu.au/assets/files/Research_and_Advocacy/Openroad_Mult...
Kitsite. Multilingual content management (n.d.). Retrieved August 25, 2011 from http://www.kitsite.com/articles/multilingual-content-management.html
Lingoport. An internationalization blog by Lingoport (August 04, 2011). Retrieved August 25, 2011 from http://i18nblog.com/
Notess, G. "On the net. Multilingual searching: search engine language tools." Online 32.3 (2008): 40-42.
Thomson Reuters. Web of Knowledge (n.d.). Retrieved September 5, 2011, from Web of Science: http://wokinfo.com/
Translatewiki. Web interface (n.d.). Retrieved July and August 2011 from http://translatewiki.net/wiki/Web_interface
US National Library of Medicine. (n.d.). Retrieved September 5, 2011 from http://www.ncbi.nlm.nih.gov/pubmed/
Wang, J., Teng, J., Lu, W., & Chien, L. "Exploiting the Web as the multilingual corpus for unknown query translation." Journal of the American Society for Information Science & Technology 57.5 (2006): 660-670.
Witte, Carsten. "Improved usability through internationalization." In A. Marcus (Ed.) Design, User Experience and Usability (2011): 111-116.
Wikipedia. Wikipedia: Multilingual statistics (n.d.). Retrieved July and August 2011 from http://en.wikipedia.org/wiki/Wikipedia:Multilingual_statistics
Wikipedia. Internationalization and localization (n.d.). Retrieved July and August 2011 from http://en.wikipedia.org/wiki/Internationalization_and_localization
Wikipedia. Wikipedia: Translation (n.d.). Retrieved July and August 2011 from http://en.wikipedia.org/wiki/Wikipedia:Translation
Wooldridge, B., Taylor, L., & Sullivan, M. "Managing an open access, multi-institutional, international digital library: The digital library of the Caribbean." Resource Sharing & Information Networks 20.1/2 (2009): 35-44.
Wu, Y.D., & Lui, M. "Content management and the future of academic libraries." The Electronic Library 19.6 (2001): 432-439.
Yang, H.C., Lee, C.H., & Chen, D.W. "A method for multilingual text mining and retrieval using growing hierarchical self-organizing maps." Journal of Information Science 35.1 (February 01, 2009): 3-23.
Here is a detailed description of the multilingual services provided by major competitors to Knowledge for All.
EBSCOhost databases and discovery technologies are the most-used, premium online information resources for tens of thousands of institutions worldwide, representing millions of end-users.
EBSCOhost offers multilingual searching in any of the supported interface languages. As a user, you can change the interface text of EBSCOhost from English to one of the following languages:
2. Bhasa Indonesian
3. Brazilian Portuguese
18. Simplified Chinese
22. Traditional Chinese
In the «Advanced Search» options, it is possible to limit results to any of the above-listed languages. That said, it is also possible to enter key words and subject headings to conduct multilingual searching in any of those languages as well.
In addition, EBSCOhost screens are presented in English, by default. If permitted, a user can translate a full text article from English into one of the following languages:
3. Simplified Chinese
4. Traditional Chinese
EBSCOhost provides this service by using automatic translation software. Here is the information available from that software, as presented on their Help page:
"Automatic translation software systems use sophisticated translation technology with comprehensive dictionaries and a collection of linguistic rules that translate one language into another without relying on human translators. An automatic translation software system interprets the structure of sentences in the source language (the language the user is translating from) and generates a translation based on the rules of the target language (the language the user is translating to). The process involves breaking down complex and varying sentence structures; identifying parts of speech; resolving ambiguities; and synthesizing the information into the components and structure of the new language. Machine translation is considered a “gisting” application, producing translations that enhance the end-user’s understanding of the original document. It does not produce the same level of translation that a human translator could provide."
Provides a search of scholarly literature across many disciplines and sources, including theses, books, abstracts and articles.
The Google Scholar interface is available in the following languages:
5. Chinese (Simplified)
6. Chinese (Traditional)
31. Portuguese (Brazil)
32. Portuguese (Portugal)
In addition to searching for pages in any language, it is possible to search for pages written in only these languages:
1. Chinese (Traditional)
2. Chinese (Simplified)
Multilingual keyword searching is available in any of the 43 interface languages.
PubMed comprises more than 21 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.
There is no mention of interface availability in a language other than English.
Under the «limits» section below the search screen, it is possible to limit results to any or all of the following languages:
25. Modern Greek
47. Scottish Gaelic
It is assumed that multilingual searching is available for any of the above-listed languages, although no information confirming this was found in an «About Us» section of the website.
Web of Science ® provides researchers, administrators, faculty, and students with quick, powerful access to the world's leading citation databases. Authoritative, multidisciplinary content covers over 10,000 of the highest impact journals worldwide, including Open Access journals and over 110,000 conference proceedings. You'll find current and retrospective coverage in the sciences, social sciences, arts, and humanities, with coverage available to 1900.
The interface of the entire web site is available in English, Japanese and Simplified Chinese.
As stated explicitly on the Advanced Search page, search terms must be entered in English. However, it is possible to restrict results to the following languages:
Knowledge for All will collect or generate and provide public access to mass quantities of journal article metadata, including such elements as title, author, date, and journal title. How this metadata will be collected or generated has not yet been determined and each method may have its own legal implications (for a discussion of possible methods see Journal Article Data Collection and Creation), but there is still a basic question to be addressed: what content is and is not subject to copyright?
The content will originate, or potential copyright owners will reside, in many different jurisdictions, but primarily Canada, the United States, and the European Union. There are no international copyright laws, but there are international agreements under which countries respect the copyright laws of other countries. In the context of Knowledge for All, this means that for content that was created in another country, we will need to abide by the copyright laws of that country which protect that content. The copyright laws of Canada, the United States, the European Union, and the United Kingdom and their implications for the Knowledge for All project will be discussed here. This discussion is based on information gathered from research and writings about copyright laws. We have not yet consulted a legal expert, but that is the recommended next step.
Under Canadian Copyright Law, compilations of factual data, including databases, are protected if they were independently created by the author and there was "a sufficient degree of skill, judgment and labour involved in arranging and selecting of the content in the database or other compilation" (Harris, 2000). Interface, content, and software can all be protected by copyright but the content or data within the compilation are still only protected as a compilation, not as individual elements (Moyse, 2002).
The United States offers less copyright protection for databases than Canada. As in Canada, databases are protected as "compilations," but while Canada recognizes effort in selecting and arranging data, the US requires "creativity." A report from the United States Copyright Office in 2003 calling for new legislation to protect databases summarizes current protection as follows: "What remains is a thin layer of copyright protection for qualifying databases. In order to qualify, they must exhibit some modicum of creativity in the selection, arrangement, or coordination of the data. The protection is thin in that only the creative elements (selection, arrangement, or coordination of data) are protected by copyright. Explanatory materials such as introductions or footnotes to databases may also be copyrightable. But in no case is the data itself (as distinguished from its selection, coordination or arrangement) copyrightable."
In 1996, the European Union passed Council Directive 96/9/EC on the Legal Protection of Databases. The Directive gives a high level of copyright protection to databases that qualify as "original" and a separate, sui generis database right to databases that would not be considered "original" under copyright law (Commission of the European Communities, 2005). Its purpose was to protect databases that may not be covered by copyright in order to encourage the production of commercial databases in the EU (Commission of the European Communities, 2005). "While 'original' databases require an element of 'intellectual creation,' 'non-original' databases are protected as long as there has been 'qualitatively or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents' of a database" (Commission of the European Communities, 2005). Database right lasts for 15 years, but can be extended if the database is updated.
The UK implemented the EU's Directive on database right with the Copyright and Rights in Databases Regulations 1997. These regulations similarly give copyright protection to databases that might be considered an original intellectual creation of an author, and special database right protection to databases where there was a substantial investment in obtaining, verifying, or presenting the data within.
In summary, under Canadian and United States copyright laws the majority of journal article metadata elements are factual data and so not protected by copyright. Metadata elements that would be considered original creations of their authors, such as subject terms and abstracts, are subject to copyright. Subject terms and abstracts are discussed in more detail below. Blogger and law professor Michael Carroll (2009) addresses metadata in particular under US law in writing, "Under these principles, metadata is copyrightable only if it reflects an author’s original expression. For example, a collection of simple bibliographic metadata with fields named “author,” “title,” “date of publication,” would not be sufficiently original to be copyrightable." Under UK law, however, factual data in a database could be protected by database right or copyright. In their research on copyright status of metadata in the UK, Gadd, Oppenheim, and Probets (2004) note that under UK copyright law, metadata “is probably protected by copyright... [but] the key word here is 'probably.'” They suggest that an individual record could be considered a "compilation" of data, but it is questionable whether this would be considered "original" and whether it would be protected. They conclude that "the more creative effort that goes into the record (such as abstracting, indexing, etc.), the more likely it is that that record enjoys copyright."
This suggests that Knowledge for All should tread more carefully in harvesting metadata from databases located in the UK and EU and perhaps favour databases located in Canada and the USA. Copyright law with regard to databases and digital materials is changing and developing, and it is difficult to predict whether the future will bring increased or decreased protection. A good way to avoid legal problems is to obtain permission to harvest data even when harvesting is explicitly permitted.
The digital age has brought a lot of concern over unlawful copying and distribution of complete or partial works, such as full-text articles. By not including full-text articles in its database, Knowledge for All may avoid a lot of copyright headaches. Thus far there has not been as much concern about copying of metadata. Gadd, Oppenheim, and Probets' survey of OAI data providers and service providers found that few were concerned with protecting metadata for individual records.
Subject terms and abstracts are two elements of journal article metadata that would likely be considered literary works rather than factual information and so would be covered by copyright.
The Knowledge for All system will ideally include both subject terms and abstracts. Subject indexing of articles using controlled vocabularies will distinguish Knowledge for All from many commercial databases and tools like Google Scholar while abstracts are invaluable in providing detailed information to users about the content of articles.
Knowledge for All contributors will select subject terms for articles from controlled vocabularies, except where the license terms of metadata from other sources allow copying of subject terms and those subject terms have been selected from an appropriate controlled vocabulary. Thus, copyright is not a concern for subject terms.
Knowledge for All contributors could write abstracts for articles, but it would take a significant amount of time, which contributors may not be able to provide, and thus is not a feasible option.
The first question to consider is who owns the copyright on author-supplied abstracts of published articles - the author or the publisher? Authors are increasingly retaining some rights over their published material as institutional repositories and open access policies at institutions become more common. Author-publisher agreements vary between publishers, so this would have to be considered on a journal-by-journal basis. Where the publisher has supplied the abstract, clearly the publisher is the copyright owner.
Regardless of who owns copyright on abstracts, both the author and the publisher may be willing to allow reproduction of abstracts in the Knowledge for All database because the abstracts will potentially draw more users to read and perhaps purchase articles. Most publishers allow unrestricted access to abstracts on their websites, presumably because they see it as an important part of marketing the article, so they may not object to Knowledge for All reproducing abstracts in its database.
If we are not able to include abstracts in the Knowledge for All database itself, we should link every article to its abstract via the publisher's website. We could look at the possibility of using a pop-up window to show the abstract on the linked page.
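One way to picture this fallback during harvesting is a filter that always keeps factual fields, stores the abstract only when permission has been recorded, and otherwise keeps a link to the publisher's page. The following is a minimal Python sketch under stated assumptions: the field names (`abstract`, `publisher_url`, and the rest) and the `abstract_permitted` flag are hypothetical and are not taken from any Knowledge for All schema.

```python
# Hypothetical field names; not taken from any Knowledge for All schema.
FACTUAL_FIELDS = ("title", "authors", "journal", "date_published", "publisher_url")


def prepare_record(harvested, abstract_permitted):
    """Return a copy of a harvested record that is safe to store.

    Factual fields are always kept. The abstract is kept only when
    permission to reproduce it has been recorded; otherwise the record
    carries a link to the abstract on the publisher's website.
    """
    record = {k: harvested[k] for k in FACTUAL_FIELDS if k in harvested}
    if abstract_permitted and harvested.get("abstract"):
        record["abstract"] = harvested["abstract"]
    elif "publisher_url" in harvested:
        record["abstract_link"] = harvested["publisher_url"]
    return record
```

Whether permission exists would still have to be decided journal by journal, as discussed above; the flag only records the outcome of that decision.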
It is recommended that further research be done on how publishers view the distribution of abstracts and how important abstracts are to the Knowledge for All community of users.
The citations, or bibliography, of an article could be covered by copyright as an original compilation of factual data within the body of the article itself. Thus to import an article's citations into the Knowledge for All system for the purpose of citation analysis would likely violate copyright. To get around this we could obtain permission from the publisher for post-prints, obtain permission from the author for pre-prints, or add the citation information by hand, possibly through a tagging system.
Carroll, Michael. "Copyright in Databases." Carrollogos: A Blog about Law, Technology, and Music (20 February 2009). Retrieved 29 March 2011 from http://carrollogos.blogspot.com/2009/02/copyright-in-databases.html
Carson, David O. Statement of David O. Carson General Counsel, United States Copyright Office before the Subcommittee on Courts, the Internet, and Intellectual Property Committee on the Judiciary and the Subcommittee on Commerce, Trade and Consumer Protection Committee on Energy and Commerce. Database and Collections of Information Misappropriation Act of 2003, Hearing, September 23, 2003 (United States House of Representatives 108th Congress, 1st Session). Retrieved 29 March 2011 from http://www.copyright.gov/docs/regstat092303.html
Commission of the European Communities. First evaluation of Directive 96/9/EC on the legal protection of databases - DG Internal Market and Services Working Paper. Brussels: Commission of the European Communities (2005). Retrieved 29 March 2011 from http://ec.europa.eu/internal_market/copyright/prot-databases/prot-databa...
Gadd, Elizabeth, Charles Oppenheim, and Steve Probets. "RoMEO studies 5: IPR issues facing OAI data and service providers." The Electronic Library 22.2 (2004): 121-138.
Harris, Lesley Ellen. Canadian Copyright Law (3rd Edition). Toronto: McGraw-Hill Ryerson (2001).
Moyse, Pierre-Emmanuel. Database Rights in Canada. Montreal: Leger Robic Richard, avocats (2002). Retrieved 10 March 2011 from http://www.robic.ca/publications
Pohl, Adrian. "Are Bibliographies Copyrightable? – The German Case." Open bibliography and Open Bibliographic Data blog 13 July 2011. Retrieved 17 August 2011 from http://openbiblio.net/2011/07/13/are-bibliographies-copyrightable-the-ge...
United Kingdom Intellectual Property Office. Copyright: Rights in Performances, Publication Right, Database Right - Unofficial Consolidated Text of UK Legislation to 8 April 2010. London: United Kingdom Intellectual Property Office (2010). Retrieved 29 March 2011 from http://www.ipo.gov.uk/cdpact1988.pdf
by Samantha Read, Amanda Stevens, and Michelle Gruda
This report outlines different data licensing options for Knowledge for All content and data that fit within the realm of open data, as defined by the Open Knowledge Foundation, and discusses advantages and disadvantages of available licenses. We conclude with a recommendation to adopt a Creative Commons Attribution license for content related to the Knowledge for All project and an Open Data Commons Attribution license for data related to the Knowledge for All project.
It is important that the Knowledge for All data license supports Knowledge for All’s core values, as defined in the Engagement Strategy:
In terms of openness and accessibility, the Open Knowledge Foundation has defined open knowledge with eleven key points, which can be found on their website (http://opendefinition.org/okd/), and will be summarized here. This definition has been supported and endorsed by many organizations involved in the open data movement.
Knowledge for All’s core values and the Open Knowledge Foundation’s definition of open knowledge should inform the selection of a license for Knowledge for All’s journal and journal article data. Another important consideration is the licensing terms of existing collections of open data and how our license might limit availability of existing data collections.
In the world of open content, the most popular licenses are those created by the Creative Commons organization. Creative Commons offers six licenses that allow reuse of content with various restrictions, along with the CC0 public domain dedication. Three of these tools adhere to the tenets of open knowledge: Attribution (CC BY), Attribution Share-Alike (CC BY-SA), and CC0. While the Creative Commons licenses were developed for content, Open Data Commons developed similar licenses specifically for databases and the data contained in them. Open Data Commons licenses that could be used for Knowledge for All data include the Attribution License (ODC-By), Open Database License (ODbL), Database Contents License (DbCL), and ODC Public Domain Dedication and License. As explained in the Open Data Commons FAQ, separate licenses were developed because a database and the data contained within it sometimes need to be licensed separately and differently, copyright often applies differently to databases and content, and data in databases is often used differently than content. The Open Data Commons and Creative Commons licenses are explained below in order from least to most restrictive.
CC0 is the least restrictive Creative Commons option, as it waives all copyright restrictions on content and thereby places that content in the public domain. CC0 can only be applied by authors or holders of copyright and related or neighboring rights over the content. The Public Domain Mark can be applied by anyone to content that is already free of copyright restrictions.
The ODC Public Domain Dedication and License places the data and database in the public domain and allows unrestricted sharing, reuse, reproduction, and adaptation of the database.
One can also apply a separate Database Contents License (DbCL) to the contents of a database licensed under the ODbL, which waives all rights in the individual contents.
CC BY allows other users to copy or remix the work in any way, including for commercial purposes. All that users must do is attribute the work to the original creator. The creator should provide a link to the Creative Commons page explaining the user’s responsibilities and, as suggested by item 5 in the Open Knowledge Foundation’s list, provide an easily accessed list of creators. The Open Knowledge Foundation’s web content is licensed as CC BY.
ODC-By allows users to copy, distribute, and use the database; to produce works from the database; and to modify, transform and build upon the database. In turn, users must attribute any public use of the database or works produced from the database and make clear to others the license of the database.
CC BY-SA places a restriction on the user in addition to attribution. Users have free rein to copy, transmit, and adapt the work, but anyone who remixes the work into something new must distribute that work under the CC BY-SA license or a similarly open license. Wikipedia’s content is licensed as CC BY-SA.
Under the Open Database License (ODbL), users have the same freedoms as under ODC-By but must also keep the database open technologically and offer any adapted version of the database, or works produced from it, under the ODbL. These share-alike and keep-open requirements can constrain commercial reuse of the database or its contents, whereas ODC-By imposes no such conditions.
All of these licenses fulfill items 2 (Redistribution), 3 (Reuse), 5 (Attribution), 7 (No discrimination against persons or groups), and 8 (No discrimination against fields of endeavour) of the Open Knowledge Foundation’s definition of open knowledge. These are the items that can be addressed in licensing; the others are matters of format. Item 8, regarding commercial ventures, is explicitly addressed on the About the Licenses page of the Creative Commons website.
Considerations, then, turn to how well each license fits Knowledge for All’s core values and best allows Knowledge for All to reach its goals in delivering open journal content to all.
The public domain licenses best facilitate openness and accessibility in the sense that they put the fewest restrictions on data use and allow data to be most open. However, these licenses could also impede accessibility if using a public domain license means not being able to harvest data from existing data sources that have an attribution license. Known data sources that require attribution include the following:
Some of these data sources have a small amount of data while others have a large amount of valuable data. This list only includes data providers who make their licensing terms explicit. Not being able to harvest data from these providers for the Knowledge for All system could significantly slow down the project’s progress and require additional resources.
Using an attribution share-alike license limits openness to some degree because it requires those who reuse Knowledge for All data and content to apply a similar license or provide an open version of a proprietary product they build with the licensed data. On the other hand, it further supports openness and availability of data by requiring that the data remain open, which aligns with Knowledge for All’s values of openness and sustainability.
Using an attribution license does not make data any less open, as it only requires attribution. An attribution license would allow Knowledge for All to use the above data sources and would showcase the collaborative aspects of the project by recognizing all data contributors, indexers, and editors. “Attribution stacking” is one commonly discussed disadvantage of attribution licenses, whereby attribution credits become large and unwieldy, but keeping records of data sources and enhancers is a good administrative practice, as well as a way to recognize contributors and cite the authority of data.
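The record keeping described above can be pictured as a small provenance structure attached to each record, from which a single de-duplicated credit line is generated. This is only an illustrative Python sketch: the `Provenance` class, its fields, and the format of the credit line are assumptions made for the example, not part of any Knowledge for All design.

```python
from dataclasses import dataclass, field


@dataclass
class Provenance:
    """Who supplied and who enhanced one record (hypothetical structure)."""
    source: str                                    # original data provider
    enhancers: list = field(default_factory=list)  # indexers, editors

    def add_enhancer(self, name):
        # Record each contributor once, in order of first contribution.
        if name not in self.enhancers:
            self.enhancers.append(name)


def attribution_line(records):
    """Build one de-duplicated credit line for a set of records."""
    seen = []
    for prov in records:
        for name in [prov.source, *prov.enhancers]:
            if name not in seen:
                seen.append(name)
    return "Data from: " + "; ".join(seen)
```

Collapsing repeated names keeps the credit list from growing with every record, which is one modest way to soften attribution stacking while still recognizing each contributor.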
Using an attribution license would also contribute to the sustainability of the Knowledge for All project as it would mean that those who use Knowledge for All data for other projects would credit Knowledge for All for the data, thereby marketing our project and directing new users to our site.
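The trade-off between the three license families can be summarized as a simple ordering check. The Python sketch below assumes, as a deliberate simplification, that a source can be harvested only when its license family is no more restrictive than the license Knowledge for All applies to its own data. The family names and the `can_harvest` function are hypothetical, real compatibility between specific licenses (for example CC BY-SA and ODbL) is more nuanced, and none of this is legal advice.

```python
# Ordering follows the "least to most restrictive" grouping used above.
# A deliberate simplification for illustration; not legal advice.
RESTRICTIVENESS = {
    "public-domain": 0,  # e.g. CC0, ODC Public Domain Dedication and License
    "attribution": 1,    # e.g. CC BY, ODC-By
    "share-alike": 2,    # e.g. CC BY-SA, ODbL
}


def can_harvest(source_license, target_license):
    """Return True if data under source_license could be redistributed
    under target_license, assuming a source is usable only when it is
    no more restrictive than the target."""
    return RESTRICTIVENESS[source_license] <= RESTRICTIVENESS[target_license]
```

Under this simplified rule, `can_harvest("attribution", "public-domain")` is false while `can_harvest("attribution", "attribution")` is true, which mirrors the concern that a public domain license would exclude data sources requiring attribution.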
It is recommended that Knowledge for All apply a CC BY license to its content, such as website content, marketing materials, and documentation, and an ODC-By license to its data, such as journal and journal article metadata. This allows Knowledge for All to follow its core values and conform to the Open Knowledge Foundation’s definition of open knowledge while also supporting its sustainability and ensuring access to larger amounts of existing data.
Admin. Comments on the Science Commons Protocol for Implementing Open Access Data (2009 February 2). Open Knowledge Foundation Blog. http://blog.okfn.org/2009/02/02/open-data-openness-and-licensing/
Ball, Alex. How to License Research Data (2011). Digital Curation Centre http://www.dcc.ac.uk/resources/how-guides/license-research-data#x1-20002
Creative Commons (2011). About the licenses. http://creativecommons.org/licenses/
Open Data Commons (2011). Licenses. http://opendatacommons.org/licenses/
Open Knowledge Foundation (2011). Open Definition. http://opendefinition.org/okd/
Wikipedia (2011). Wikipedia:Text of Creative Commons Attribution-ShareAlike 3.0 Unported License. http://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribut...
Based on a review of publicly available legal documents for similar projects and ongoing analysis of the Knowledge for All project, it is recommended that Knowledge for All compose the following legal documents.
A working document that addresses the first two items has already been created.
There should also be a FAQ on the website that summarizes much of the above legal information in plain language.
It is recommended that Knowledge for All consult with legal experts with extensive knowledge of international copyright laws. All approaches recommended here should be confirmed with legal experts and they should assist with creating all legal documents. It is also recommended that Knowledge for All contact and consult with organizations that have carried out similar large-scale projects, such as Zetoc.
Knowledge for All aims to provide a single, comprehensive scholarly research tool that can be used by anyone in the world with an internet connection, computer, and web browser to search all published scholarly journal literature regardless of financial resources or institutional affiliation. The organization also aims to provide libraries and other organizations and individuals with an open source software tool that can be customized, modified, and integrated with existing tools and systems and with open journal and journal article metadata that can be used for any purpose. This will be accomplished through coordinating the collaborative efforts and resources of librarians, researchers, and developers around the world, who will contribute data, expertise, and time as indexers, translators, and developers.
The Knowledge for All system will store and manage the journal and journal article data; provide a sophisticated user interface through which users can search for, use, and harvest the data; and provide a structure with which staff and volunteer contributors can develop and manage content and manage users and workflow processes. It will be an open source tool that can be customized, modified, and further developed by the user community and will be internationalized for accessibility by users around the world.
This document identifies all currently known requirements for the Knowledge for All system. Each feature is assigned a priority from 1 to 5, with 1 indicating a feature that should be developed first and 5 indicating a feature that can be developed last. It is understood that different features may be incorporated into different versions of the software over time, depending on the resources available. A realistic development schedule will be prepared by the System Development Coordinator based on this document.
Open the attached PDF file below to read the full Software Requirements Specification document.