How should software be cited? #12
Comments
We currently capture this on figshare, based on the data citation principles, as follows. We will follow the advice of the community here:
Sparks, Adam (2014): Global-Late-Blight-Modelling. figshare. |
The DataCite metadata page for that code/dataset has a link for the XML that describes it:
<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-2.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://datacite.org/schema/kernel-2.2
http://schema.datacite.org/meta/kernel-2.2/metadata.xsd">
<identifier identifierType="DOI">10.6084/M9.FIGSHARE.963593</identifier>
<creators>
<creator>
<creatorName>Adam Sparks</creatorName>
</creator>
</creators>
<titles>
<title>Global-Late-Blight-Modelling</title>
</titles>
<publisher>Figshare</publisher>
<publicationYear>2014</publicationYear>
</resource> |
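As a rough illustration of how the fields in that DataCite record map onto the figshare-style citation, here is a minimal Python sketch. The function name `datacite_to_citation` is hypothetical, the output layout is only one possible reading of the citation quoted above, and appending the DOI as a `https://doi.org/` link is my own assumption rather than something figshare is confirmed to do.

```python
import xml.etree.ElementTree as ET

# The record quoted above, without the schemaLocation attribute (not needed for parsing).
DATACITE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-2.2">
  <identifier identifierType="DOI">10.6084/M9.FIGSHARE.963593</identifier>
  <creators>
    <creator>
      <creatorName>Adam Sparks</creatorName>
    </creator>
  </creators>
  <titles>
    <title>Global-Late-Blight-Modelling</title>
  </titles>
  <publisher>Figshare</publisher>
  <publicationYear>2014</publicationYear>
</resource>"""

# Prefix "dc" is arbitrary; the URI must match the xmlns of the record.
NS = {"dc": "http://datacite.org/schema/kernel-2.2"}


def datacite_to_citation(xml_text):
    """Build a one-line citation from a DataCite kernel-2.2 record (hypothetical helper)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    creators = [
        c.text.strip()
        for c in root.findall("dc:creators/dc:creator/dc:creatorName", NS)
    ]
    title = root.findtext("dc:titles/dc:title", namespaces=NS)
    publisher = root.findtext("dc:publisher", namespaces=NS)
    year = root.findtext("dc:publicationYear", namespaces=NS)
    doi = root.findtext("dc:identifier", namespaces=NS)
    # Assumption: the DOI is appended as a resolver link.
    return f"{'; '.join(creators)} ({year}): {title}. {publisher}. https://doi.org/{doi}"


print(datacite_to_citation(DATACITE_XML))
```

Note that the record carries no version, license or platform fields, so none of them can appear in a citation derived from it.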
Some comments:
|
It is not clear whether we are discussing software (a relatively stable, executable package) or a code base (a rapidly evolving, versioned object). While in many cases referring to the software is enough, in other cases there is actually no software, just an evolving code base. |
@seinecle that's a good point, maybe it would be useful to identify what the citation is intended to support? E.g:
The final one is perhaps not a typical goal for scientific citations, but I think for both data and software citations, the "live"/current version ought to be discoverable from the citation. |
With the fidgit project progressing in parallel to this, would it make sense to include a meta pointer to the DOI assigned by figshare? This would eliminate having to put a link to the code in a tag manually. |
One comment from looking at the metadata, especially in light of the earlier thread from @bobbledavidson, @npch and others about the minimal information we need to capture: is the other project metadata being captured anywhere? The intermediary page currently asks you to input language, platform, maintainer, description, and (probably most importantly) license, so was that entered for the above example?
@IDodds Re citation etiquette: I don't know how many authors the system can handle before it breaks down (we've added >90 to some DataCite DOIs), but if you follow the practice of journals and the style of the Human Genome Project, you could list a single group author standing for a massive group of contributors, e.g. [x consortia] or [x community of developers]. DataCite metadata can also describe research objects at different levels of granularity if you want to credit separate units of code: the RelatedIdentifier field can precisely describe relationships to other research objects through values like IsSupplementTo/IsContinuedBy/IsNewVersionOf/IsDocumentedBy/IsCompiledBy/etc. |
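To make the RelatedIdentifier idea concrete, here is a small Python sketch that emits such a fragment. The element and attribute names (`relatedIdentifiers`, `relatedIdentifier`, `relatedIdentifierType`, `relationType`) follow my understanding of the DataCite schema and should be checked against the schema version in use; the DOI is a placeholder, and the helper only accepts the relation types named in the comment above.

```python
import xml.etree.ElementTree as ET

# Only the relation types mentioned in this thread; the real schema defines more.
RELATION_TYPES = {
    "IsSupplementTo",
    "IsContinuedBy",
    "IsNewVersionOf",
    "IsDocumentedBy",
    "IsCompiledBy",
}


def related_identifier(doi, relation_type):
    """Return a relatedIdentifier element pointing at another DOI (hypothetical helper)."""
    if relation_type not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {relation_type}")
    element = ET.Element(
        "relatedIdentifier",
        {"relatedIdentifierType": "DOI", "relationType": relation_type},
    )
    element.text = doi
    return element


# Placeholder DOI: says "this record is a new version of 10.1234/example.v1".
container = ET.Element("relatedIdentifiers")
container.append(related_identifier("10.1234/example.v1", "IsNewVersionOf"))
xml_fragment = ET.tostring(container, encoding="unicode")
print(xml_fragment)
```

A chain of such links is one way a DOI for the software as a whole could be tied to the DOIs of individual releases.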
There's also the difference between the maintainer(s) of a project (who is/are currently responsible for it), and the contributors (those who have committed code to the project, or made other contributions). I imagine it would be the maintainers (and possibly also previous maintainers) that would be cited, though there are bound to be exceptions.
I think this can be optional, but might be useful when software is produced solely by a specific university or software company. A better analogy to book publishers, though, might be the code hosting (e.g. "GitHub") or archiving (e.g. "figshare") service. The geographic location is probably irrelevant, unless it's necessary for distinguishing between multiple entities with the same name.
Yes, I think being able to cite the snapshot as well as providing details of the current codebase (even if it's just the equivalent of an "Available from" or "Accessed at" URL) needs to be in there.
I guess the version would be the hash, in that case, and it would be nice to add a URL for it… |
@pbulsink Yes, the DOI should definitely be in there, though there is perhaps ambiguity between whether it's an identifier for the snapshot, an identifier for the specific release, or an identifier for the software as a whole (which could be assigned a separate DOI, linked to DOIs for specific releases using versioning metadata). |
@ScottBGI Some of the metadata that could be attached to the project is most useful for discovery, rather than citation (which just needs to identify the software specifically enough that a reader could find it). "platform" should be in the citation, I think, and possibly "maintainer" (see the comment above) but the code language, description and license probably don't need to be. |
It looks like we are discussing a couple of different problems in parallel:
As for 1), I see two distinct cases. Software cited for the scientific record (we used package X) should exist in an archive and be cited with a DOI. The reference should be to a precise version. Software cited as a recommendation for use (we implemented our algorithm in package X, ...) should be on a development site such as GitHub, and referenced there. Point 2) is already a standard situation in citing Web resources such as Wikipedia. The habit is to state the date at which the resource was consulted. Point 3) doesn't have a good solution in the academic tradition. We are very attached to citing specific people, maybe companies, but not communities. |
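The two cases in 1), together with the access-date habit from 2), could be sketched in Python roughly as follows. The function names, the output layout and all example values (authors, DOI, URL, dates) are hypothetical; only the distinction between an archived, versioned reference and a dated reference to a development site comes from the comment above.

```python
from datetime import date


def cite_archived(authors, year, title, version, doi):
    """Citation for the scientific record: precise version plus archive DOI."""
    return f"{authors} ({year}): {title}, version {version}. https://doi.org/{doi}"


def cite_development(authors, title, url, accessed):
    """Citation as a recommendation for use: development site plus access date."""
    return f"{authors}: {title}. Available from {url} (accessed {accessed.isoformat()})"


# Placeholder values for illustration only.
archived = cite_archived("Doe J", 2014, "Package X", "1.2.0", "10.1234/example")
development = cite_development(
    "Doe J", "Package X", "https://github.com/example-org/package-x", date(2014, 3, 1)
)
print(archived)
print(development)
```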
@mikej888 did some work for the Software Sustainability Institute looking at citing software in traditional outputs (http://software.ac.uk/so-exactly-what-software-did-you-use). This includes a summary of what various journals ask for, as well as some software platforms like R. @seinecle's and @khinsen's comments are very insightful: the "citation" metadata associated with a piece of software conflates a number of issues. Taking @khinsen's points in order:
Author List
Now this means that for 2), work in progress, citation is no different: the code version identifier and code location identifier will just point to a work-in-progress version. However, by advertising that version through a citation, you're effectively identifying a new version of the code. Given that most repositories (like GitHub) provide some sort of hash identifier for each commit, you could simply use that as an (automatically generated) identifier.
I don't think that platform, language or potentially even license information should be part of the citation metadata (though I might be willing to budge on license). When we were undertaking the SoftwareHub project for Jisc, looking at creating "showcase catalogues" of software funded by Jisc, we quickly realised that things like platform or programming language were not useful either for citation or first-level discovery. They are useful for categorisation and filtering, but they aren't as useful as they first appear. |
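A minimal Python sketch of the commit-hash idea above: the hash serves as the version identifier and also yields a URL for that exact state of the code. The `SoftwareSnapshot` class, the citation layout, the repository URL and the hash are all made up for illustration; the `/tree/<hash>` path is the GitHub URL form as I understand it, and other hosts will differ.

```python
from dataclasses import dataclass


@dataclass
class SoftwareSnapshot:
    """Fields needed to cite one commit of a work-in-progress code base (hypothetical)."""
    authors: str
    year: int
    title: str
    host: str
    repo_url: str
    commit: str  # full commit hash, used as the version identifier


def cite_snapshot(snapshot):
    """Format a citation whose version is the abbreviated commit hash."""
    short_hash = snapshot.commit[:7]
    return (
        f"{snapshot.authors}. {snapshot.year}. {snapshot.title} "
        f"(commit {short_hash}). {snapshot.host}. "
        f"Available from {snapshot.repo_url}/tree/{snapshot.commit}"
    )


# Placeholder values for illustration only.
example = SoftwareSnapshot(
    authors="Doe J",
    year=2014,
    title="Example-Code-Base",
    host="GitHub",
    repo_url="https://github.com/example-org/example-repo",
    commit="0123456789abcdef0123456789abcdef01234567",
)
citation = cite_snapshot(example)
print(citation)
```

Keeping the full hash in the URL and only abbreviating it in the visible text keeps the citation readable without losing precision.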
Another example: The PERMANOVA+ add-on for PRIMER is often referenced as a book citation:
The makers of the software don't provide citation examples for that specific add-on, but do provide citation examples for the main software package, by citing the user manual:
The citation most commonly used for the PERMANOVA add-on doesn't include information about which version of the software was used, or on which platform - this is usually described in the Methods section instead. |
Users of R are also asked to cite the manual:
@Manual{,
title = {R: A Language and Environment for Statistical
Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = 2013,
url = {http://www.R-project.org}
} |
JATS 1.1 provides publication-type="software" on element-citation:
<element-citation publication-type="software">
<person-group person-group-type="author">
<name><surname>Goddard</surname><given-names>TD</given-names></name>
<name><surname>Kneller</surname><given-names>DG</given-names></name>
</person-group>
<data-title>SPARKY 3</data-title><!-- could be "software-title"? -->
<version designator="3.114">3.114, Windows</version><!-- needs a "platform" element or attribute? -->
<!-- use a "source" element for the host, e.g. "GitHub"? -->
<year iso-8601-date="2007">2007</year>
<publisher-loc>San Francisco</publisher-loc>
<publisher-name>University of California</publisher-name>
<uri>http://www.cgl.ucsf.edu/home/sparky/</uri>
</element-citation> |
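To check that the JATS markup carries everything in the plain-text SPARKY citation, here is a Python sketch that renders one from the other. The function name `jats_to_text` and the output layout are my own; the XML is the example above with its comments dropped, and the sketch assumes every element is present (it does no error handling).

```python
import xml.etree.ElementTree as ET

JATS_XML = """<element-citation publication-type="software">
  <person-group person-group-type="author">
    <name><surname>Goddard</surname><given-names>TD</given-names></name>
    <name><surname>Kneller</surname><given-names>DG</given-names></name>
  </person-group>
  <data-title>SPARKY 3</data-title>
  <version designator="3.114">3.114, Windows</version>
  <year iso-8601-date="2007">2007</year>
  <publisher-loc>San Francisco</publisher-loc>
  <publisher-name>University of California</publisher-name>
  <uri>http://www.cgl.ucsf.edu/home/sparky/</uri>
</element-citation>"""


def jats_to_text(xml_text):
    """Render a software element-citation as a one-line reference (hypothetical helper)."""
    citation = ET.fromstring(xml_text)
    authors = ", ".join(
        f"{name.findtext('surname')} {name.findtext('given-names')}"
        for name in citation.findall("person-group/name")
    )
    year = citation.findtext("year")
    title = citation.findtext("data-title")
    version = citation.findtext("version")  # includes the platform, as in the example
    location = citation.findtext("publisher-loc")
    publisher = citation.findtext("publisher-name")
    uri = citation.findtext("uri")
    return (
        f"{authors}. {year}. {title} (v{version}). "
        f"{location}: {publisher}. Available from {uri}"
    )


print(jats_to_text(JATS_XML))
```

The platform only survives because it is packed into the text of the version element, which is the gap the inline comment about a "platform" element or attribute points at.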
Does this still stand as the best source for this? I will recommend your suggestions to JATS4R. |
@Melissa37 This probably needs updating to take into account the Force11 Software Citation Principles. |
Here's an example of a software citation. Does it include all the appropriate information, and can it be improved? When a snapshot of the code has been archived somewhere, how should that be included in the citation?
Goddard TD, Kneller DG. 2007. SPARKY 3 (v3.114, Windows). San Francisco: University of California. Available from http://www.cgl.ucsf.edu/home/sparky/
The citation in JATS XML: