Metadata Schemas/DTDs for ETDs
Here we examine the potential metadata elements that we wish to collect when an ETD is submitted online. We look at the default configuration of the qualified Dublin Core register in DSpace, as well as the metadata standards employed by two ETD specific schemas: the ETD-MS (Electronic Theses and Dissertations Metadata Schema) from Virginia Tech and the NDLTD and the TDM DTD (Theses and Dissertations Markup Document Type Definition).
DSpace Default DC Registry Contents
This metadata set comes built into DSpace and can be modified for use in the archive. It is built upon qualified Dublin Core.
|
Element |
Qualifier |
Description |
|
contributor |
A person, organization, or service responsible for the content of the resource. Catch-all for unspecified contributors. |
|
|
advisor |
Use primarily for thesis advisor. |
|
|
author |
||
|
editor |
||
|
illustrator |
||
|
other |
||
|
coverage |
spatial |
Spatial characteristics of content. |
|
temporal |
Temporal characteristics of content. |
|
|
creator |
Do not use; only for harvested metadata. |
|
|
date |
Use qualified form if possible. |
|
|
available |
Date or date range item became available to the public. |
|
|
accessioned |
Date DSpace takes possession of item. |
|
|
copyright |
Date of copyright. |
|
|
created |
Date of creation or manufacture of intellectual content if different from date.issued. |
|
|
issued |
Date of publication or distribution. |
|
|
submitted |
Recommend for theses/dissertations. |
|
|
description |
Catch-all for any description not defined by qualifiers. |
|
|
abstract |
Abstract or summary. |
|
|
provenance |
The history of custody of the item since its creation, including any changes successive custodians made to it. |
|
|
sponsorship |
Information about sponsoring agencies, individuals, or contractual arrangements for the item. |
|
|
statementofresponsibility |
To preserve statement of responsibility from MARC records. |
|
|
tableofcontents |
A table of contents for a given item. |
|
|
uri |
Uniform Resource Identifier pointing to description of this item. |
|
|
format |
Catch-all for any format information not defined by qualifiers. |
|
|
extent |
Size or duration. |
|
|
medium |
Physical medium. |
|
|
mimetype |
Registered MIME type identifiers. |
|
|
identifier |
Catch-all for unambiguous identifiers not defined by qualified form; use identifier.other for a known identifier common to a local collection instead of unqualified form. |
|
|
citation |
Human-readable, standard bibliographic citation of non-DSpace format of this item |
|
|
govdoc |
A government document number |
|
|
isbn |
International Standard Book Number |
|
|
ismn |
International Standard Music Number |
|
|
issn |
International Standard Serial Number |
|
|
other |
A known identifier type common to a local collection. |
|
|
sici |
Serial Item and Contribution Identifier |
|
|
uri |
Uniform Resource Identifier |
|
|
language |
Catch-all for non-ISO forms of the language of the item, accommodating harvested values. |
|
|
iso |
Current ISO standard for language of intellectual content, including country codes (e.g. "en_US"). |
|
|
publisher |
Entity responsible for publication, distribution, or imprint. |
|
|
relation |
Catch-all for references to other related items. |
|
|
haspart |
References physically or logically contained item. |
|
|
hasversion |
References later version. |
|
|
isbasedon |
References source. |
|
|
isformatof |
References additional physical form. |
|
|
ispartof |
References physically or logically containing item. |
|
|
ispartofseries |
Series name and number within that series, if available. |
|
|
isreferencedby |
Pointed to by referenced resource. |
|
|
isreplacedby |
References succeeding item. |
|
|
isversionof |
References earlier version. |
|
|
replaces |
References preceeding item. |
|
|
requires |
Referenced resource is required to support function, delivery, or coherence of item. |
|
|
uri |
References Uniform Resource Identifier for related item. |
|
|
rights |
Terms governing use and reproduction. |
|
|
uri |
References terms governing use and reproduction. |
|
|
source |
Do not use; only for harvested metadata. |
|
|
uri |
Do not use; only for harvested metadata. |
|
|
subject |
Uncontrolled index term. |
|
|
classification |
Catch-all for value from local classification system; global classification systems will receive specific qualifier |
|
|
ddc |
Dewey Decimal Classification Number |
|
|
lcc |
Library of Congress Classification Number |
|
|
lcsh |
Library of Congress Subject Headings |
|
|
mesh |
MEdical Subject Headings |
|
|
other |
Local controlled vocabulary; global vocabularies will receive specific qualifier. |
|
|
title |
Title statement/title proper. |
|
|
alternative |
Varying (or substitute) form of title proper appearing in item, e.g. abbreviation or translation |
|
|
type |
Nature or genre of content. |
ETD-MS Schema Structure
Data Types
These items define what sort of data can be stored within each element.
|
Name |
Extension |
Restriction |
Attribute |
Enumeration Value |
Pattern |
Description |
|
freeTextType |
string |
translated |
When a free text field is translated by someone other than the author, that person's name should appear as the value to the translated attribute. |
|||
|
AG: specialAttrs |
||||||
|
controlledTextType |
string |
scheme |
When the content of a field is controlled, as in subject or date fields, the controlling scheme should be annotated with either or both the name of the scheme and/or a URI describing the controlled object within the context of the scheme. |
|||
|
resource |
||||||
|
AG: specialAttrs |
||||||
|
authorityType |
string |
resource |
Each reference to an individual or institution in any field should contain a string representing the name of the individual or institution as it appears in the work. Where possible, the reference should also contain a URI which points to an authoritative record for that individual or institution. |
|||
|
descriptionRoleType |
string |
note |
Optional attributes to qualify the meaning of the description tag. "note" indicates additional information regarding the thesis or dissertation. Example: acceptance note of the department. "release" indicates a description of the version of the work. Should only be used for errata, etc.. |
|||
|
release |
||||||
|
dateType |
string |
YYYY-MM-DD |
As defined in ISO 8601 and the profile recommended for implementing ISO 8601 dates in Dublin Core. |
AG = Attribute Group, in this case provided by W3C.
These are the actual metadata elements in ETD-MS, along with their datatypes from above and whether they are required or not.
|
Element |
Sub-Element |
Data Type |
Additional Attributes |
Required |
Description |
|
title |
freeTextType |
y |
A name given to the resource. In the case of theses and dissertations, this is the title of the work as it appears on the title page or equivalent. |
||
|
alternativeTitle |
freeTextType |
n |
Alternative title of the thesis or dissertation. |
||
|
creator |
authorityType |
y |
An entity primarily responsible for making the content of the resource. In the case of theses or dissertations, this field is appropriate for the author(s) of the work. Like other names and institutions, this field should be entered in free text form as it appears on the title page or equivalent, with a link to to an authority record if available. |
||
|
subject |
controlledTextType |
y |
The topic of the content of the resource. In the case of theses and dissertations, keywords or subjects listed on the title page can be entered as free text. The "scheme" qualifier should be used to indicate a controlled vocabulary. |
||
|
description |
freeTextType |
n |
An account of the content of the resource. In the case of theses and disserations, this is the full text of the abstract unless otherwise qualified. |
||
|
descriptionRoleType |
role |
||||
|
publisher |
authorityType |
n |
An entity responsible for making the resource available. This is typically the group most directly responsible for digitising and/or archiving the work. The publisher may or may not be exactly the same as thesis.degree.grantor. Like other institutional names, this field should be entered in free text form as it appears on the title page or equivalent, with a link to to an authority record where available. |
||
|
contributor |
freeTextType |
n |
An entity responsible for making contributions to the content of the resource. Typical use would be for co-authors of parts of the work as well as advisors or committee members. Co-authors of the entire work would be more appropriate for the creator field. |
||
|
string |
role |
||||
|
anyURI |
resource |
||||
|
date |
dateType |
y |
A date associated with an event in the life cycle of the resource. In the case of theses and dissertations, this should be the date that appears on the title page or equivalent of the work. Should be recorded as defined in ISO 8601 and the profile recommended for implementing ISO 8601 dates in Dublin Core. |
||
|
type |
freeTextType |
y |
The nature or genre of the content of the resource. This field is used to distinguish the resource from works in other genres and to identify the types of content included in the resource. The string "Electronic Thesis or Dissertation" is recommended as one of the repeatable values for this element. In addition, specify types of content using the standard vocabulary found at: http://dublincore.org/documents/dcmi-type-vocabulary/. Degree and Education Level are now handled by the thesis.degree field. |
||
|
format |
freeTextType |
n |
The physical or digital manifestation of the resource. In the case of an electronic thesis or dissertation, this should contain a list of the electronic format(s) in which the work is stored and/or delivered. Use the standard MIME type whenever possible (for a list of "registered" MIME types, visit ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/media-types). List as "unknown" if no format information is available, omit if the work is not available in electronic form. |
||
|
identifier |
string |
y |
An unambiguous reference to the resource within a given context. This can and should be used to provide a URI where the work can be viewed or downloaded. Persistent URNs such as PURLs (http://purl.org/) or Handles (http://handle.net/) are recommended. |
||
|
language |
string |
n |
A language of the intellectual content of the resource. This should be the primary language the work is recorded in. Portions of the larger work that appear in other languages should use the lang qualifier. See Global Qualifiers. Language names themselves should be recorded using ISO 639-2 (or RFC 1766). If the language is not specified, it is assumed to be English (en). |
||
|
coverage |
controlledTextType |
n |
The extent or scope of the content of the resource. Should be used for time periods or spatial regions. |
||
|
rights |
freeTextType |
n |
Information about rights held in and over the resource. Typically, this describes the conditions under which the work may be distributed, reproduced, etc., how these conditions may change over time, and whom to contact regarding the copyright of the work. |
||
|
degree |
n |
The degree associated with the work. |
|||
|
name |
freeTextType |
Name of the degree associated with the work as it appears within the work (example: Masters in Operations Research). |
|||
|
level |
string |
Level of education associated with the document. Example: bachelors, masters, doctoral, postdoctoral, other. |
|||
|
discipline |
freeTextType |
Area of study of the intellectual content of the document. Usually this will be a department name. |
|||
|
grantor |
authorityType |
Institution granting the degree associated with the work. Like other institution names, this field should be entered in free text form as it appears on the title page or equivalent, with a link to an authority record where available. |
TDM DTD Specification
This shows the elements and their various sub-elements along with the basic datatype they take and a description taken from the DTD itself. The TDM is designed not only for building the metadata for the item but also laying out the item in full. All references to the layout elements have been removed since we are only interested in storing the true metadata.
|
Element Path |
Content |
Description |
|
date |
PCDATA |
Any date can be entered. For truly precise, the attribute of "notation" allows selection of European/"eur" (day-month-year) or U.S.A./"usa" (month-day-year). |
|
pages |
PCDATA |
Identification of the page numbers in a citation if desired.
|
|
publisher |
PCDATA |
For identifying the publisher of a work.
|
|
editor |
PCDATA |
Part of a citation, identifies editor(s) of a given work, if desired.
|
|
pubPlace |
PCDATA |
For identifying the place of publication for a work.
|
|
volumeissue |
PCDATA |
Volume and issue of a serial publication.
|
|
head |
One can specify how to render this tag- e.g., italic.
|
|
|
head.title |
For now, limited to appearing only in the head element. See "worktitle." |
|
|
body |
The three main elements of the ETD, the front matter (such as certifcate of approval, abstract, titlepages, epigraphs, etc.), primary content (chapters, which are here designated TEI-friendly as "div"), and back (the bibliography and appendix) are included in the "body" element as options (hence the "?" after each) to permit the assembly of the ETD in multiple files. Effectively, there will be one "front.html" and several "div.html" files, and one "back.html" file. So that this same DTD can be used with each, those three main subdivisions are made optional.
|
|
|
body.front |
This will be a separate file, logically named as "front.html" in your ETD directory. Each item is separated by a "pb"/pagebreak tag (TEI), and visually in HTML by a horizontal rule/"hr" tag. You may find that several line breaks/"br" are useful to visually separate the material. Use the template file you have been given on disk to assure proper format. Preset, pre-formatted pages are layed out |
|
|
body.front.titlePage |
See Graduate College Thesis Manual, page 3, also examples on pages 15-16. Don't forget to add the "pb"/page break tag at the end. Use docDate for the graduation date.
|
|
|
body.front.titlePage.docTitle |
PCDATA |
This this the title of the thesis or dissertation. Since some topics may cover foreign, mathematical, author's works, etc., these sub-elements are allowed- and you are encouraged to use them- in order to assure the most precise categorization of your dissertation or thesis.
|
|
body.front.titlePage.docTitle.worktitle |
PCDATA |
Since HTML limits "title" to the "head" element, for now, titles other than "docTitle" (the title of your dissertation or thesis), will use worktitle. The "level" attribute is required, signifies whether it is "m" (monographic- a book, monograph, or other publication under a single autonomous title), "s" (a series title), "j" (a journal title), "u" (unpublished title, like a dissertation ;-). Additionally, you can use "type" to specify it is an "abbreviated" version of the title, if it is the "main" title, if it is a "subordinate" title, or, if it is a translation, for instance, it is a "parallel" title. |
|
body.front.titlePage.docTitle.author |
PCDATA |
identifying an author of a quote work, etc. In other words, an author other than the dissertation/thesis writer, identified a docAuthor. "Name" option is if the given, middle, and surnames are to be specified. |
|
body.front.titlePage.docAuthor |
PCDATA |
Fill in the dissertation or thesis author according to the sub-elements if desired to be specific as to given, middle, surname found under the "name" element. Otherwise, simple typing of the name with no such delineation is allowed.
|
|
body.front.titlePage.docDate |
PCDATA |
The graduation date (e.g., "May 1999").
|
|
body.front.titlePage.thesisadvisor |
PCDATA |
Identifies the thesis advisor.
|
|
body.front.thesiscopyright |
PCDATA |
The thesis copyright statement is optional, but should be centered and followed by a page break.
|
|
body.front.certifapproval |
PCDATA |
This is the most important part of the front matter in many ways. Candidates are encouraged to scan the actual committee signatures and place them in this section. To enable a more precise layout, the table tag is allowed. Content must conform to the Graduate College Thesis Manual, page 3, and the example pages 17-18. Ending with a pagebreak is advised. Use docDate if graduation date is required.
|
|
body.front.dedication |
PCDATA |
This is the dedication, if any. Graduate College Thesis Manual, page 3.
|
|
body.front.epigraph |
PCDATA |
for front matter, including any epigraph desired according to the Graduate College Thesis Manual, page 4.
|
|
body.front.acknowledgements |
PCDATA |
Anyone you wish to thank would go here, according to the Graduate College Thesis Manual page 4.
|
|
body.front.abstract |
This should include the full abstract submitted according the the Graduate College Thesis Manual pages 4, 9, and 29. You are encouraged to tag this in detail as many search mechanisms only go as far as an abstract.
|
|
|
body.front.abstract.abstractcover |
PCDATA |
This is the cover sheet to the abstract. For the doctoral candidate, it should be formatted as close to the Graduate Thesis Manual specifications on page 29 as possible (but, of course, you still have to submit a proper print abstract upon deposit). Use docDate for the graduate date.
|
|
body.front.abstract.abstractext |
PCDATA |
The text of your abstract goes here, see Graduate College Thesis Manual pages 4 and 9. You will also, of course, be handing in a printed abstract for UMI if you are a doctoral candidate. Using the "pre"/preformat tag can guarantee a proper double-spaced printout.
|
|
body.front.abstract.abstractext.worktitle |
PCDATA |
Since HTML limits "title" to the "head" element, for now, titles other than "docTitle" (the title of your dissertation or thesis), will use worktitle. The "level" attribute is required, signifies whether it is "m" (monographic- a book, monograph, or other publication under a single autonomous title), "s" (a series title), "j" (a journal title), "u" (unpublished title, like a dissertation ;-). Additionally, you can use "type" to specify it is an "abbreviated" version of the title, if it is the "main" title, if it is a "subordinate" title, or, if it is a translation, for instance, it is a "parallel" title. |
|
body.front.abstract.abstractapproval |
PCDATA |
The line indicating the thesis advisor's
approval of your abstract goes here, limited according the
Graduate College Thesis Manual page 9. If possible, a scanned
image of the signature is desirable. "table" is allowed to
enable precise placement of the text and signature.
|
|
body.front.toc |
PCDATA |
Table of contents.
|
|
body.front.tablelist |
PCDATA |
The table list, if applicable, is much like any ordered/unordered list. The preference of the specific department/committee chair is honored. Cosmetic spacing with the horizontal rule/"hr" is permitted. TEI-compliant "pb" is required at the end.
|
|
body.front.figurelist |
PCDATA |
The figure list, if applicable, is much like any ordered/unordered list. The preference of the specific department/committee chair is honored. Cosmetic spacing with the horizontal rule/"hr" is permitted. TEI-compliant "pb" is required at the end.
|
|
body.front.symbolabbrevlist |
PCDATA |
The symbol abbreviation list, if applicable, is much like any ordered/unordered list. The preference of the specific department/committee chair is honored. Cosmetic spacing with the horizontal rule/"hr" is permitted. TEI-compliant "pb" is required at the end. In case there are symbols rendered only as images, the "img." tag is allowed.
|
|
body.front.preface |
PCDATA |
The preface is optional, and should conform with the Graduate CollegeThesis Manual guidelines for content on page 5.
|
|
body.back |
This is all the matter following the text proper, bibliography, appendix(es, if any), and the notes.
|
|
|
body.back.notes |
PCDATA |
These are the end, or foot, notes. The notes follow an unordered list, preferably. If they were to be identified by number, as revisions occur the chance of broken references (a href="#mycitedsource" rel="note") increases. It can also be more intuitive to remember notes by topic for editing and cross-referencing ease. However, the ordered, or numbered, list is also allowed, for those who prefer the conventional methods. In this case, "sup" is also included so a conventional superscript numeral can designate the link to the foot/endnote.
|
|
body.back.bibl |
PCDATA |
The bibliography is entered here, to the degree of detail desired.
|
|
body.back.appendix |
PCDATA |
This model for the appendix accounts for basic text, links, images, and some formatting. If you have more than one appendix, use the "id" attribute to differentiate. "Argument," instead of "hi" is the tag for identifying revisions (n) and any actual argument or rhetorical structures (id). |
|
body.back.appendix.argument |
ANY |
to which attributes corresponding to the arguments identified above (id) and/or specific revision-related material (n) that occurs in an appendix (e.g., a committee member states that given material should be moved to an appendix, or material is too big for just a note). As TEI does not accept "hi" in the appendix, this tag is used instead.
|