Digital Archive Sabbatical

This blog is for anyone interested in or experienced with digital archives and institutional repositories, especially in science and technology libraries.

Friday, August 05, 2005

Friday with DRC-Dev

Last Friday (July 29) the development team missed a conference call because Peter Murray, our leader at OhioLINK, was sick with walking pneumonia. However, we resumed today, using Gizmo software for conferencing. See www.gizmo.com. Gizmo works fine for 2-3 people, but with an entire group on the line it was difficult to hear, and people got dropped. Perhaps we will return to using Elluminate.

New terms arose today, as seems always to be the case with me! Terms such as
FOXML
RDF - Resource Description Framework
XACML - has to do with access control
semantic web
Kowari triples - RDF is structured as a triple or triples. A triple holds three pieces of info about an object: the subject, a predicate (one of its attributes), and the object (that attribute's value).
CBS trees - Complete Binary Search tree
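
To make the triple idea concrete for myself, here is a minimal sketch in plain Python. It does not use any RDF library, and the identifiers ("demo:photo1" and so on) and predicate names are made up for illustration; they are not from Fedora or from the DRC project. It only shows how a handful of subject-predicate-object statements can link dissimilar objects.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # the thing being described
    predicate: str  # the attribute or relationship
    obj: str        # the value, or another object's identifier


# Hypothetical identifiers and predicate names, for illustration only.
triples = [
    Triple("demo:photo1", "title", "Bridge under construction"),
    Triple("demo:photo1", "isMemberOf", "demo:engineering-collection"),
    Triple("demo:report7", "title", "Bascule bridge design notes"),
    Triple("demo:report7", "isAbout", "demo:photo1"),
]


def related_to(identifier: str, store: List[Triple]) -> List[Triple]:
    """Return every triple that mentions the identifier as subject or object."""
    return [t for t in store if identifier in (t.subject, t.obj)]


if __name__ == "__main__":
    for t in related_to("demo:photo1", triples):
        print(t.subject, t.predicate, t.obj)
```

Asking for everything related to the photo also turns up the report that is about it, even though the report is a different type of object and could live in a different collection.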

I discovered a great Scientific American article (May 2001) on the semantic web by Tim Berners-Lee et al.: "The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities."

This helps to explain some of the concepts we discussed today. The goal is to create an institutional repository that will be flexible in describing objects and creating relationships among them, even when they are dissimilar object types or in different "collections."

Thursday, August 04, 2005

Strauss' UC career

Today I went to the UC Archives and perused the UC Catalogues for the years that Strauss was at UC. He started as a freshman in 1888 and graduated in Civil Engineering in 1892. Back then there were only about 125 students in the entire Academic Department of the University, with about 16 of them in Civil Engineering. One professor, Henry Turner Eddy, taught the entire Civil Engineering curriculum back then, and also served as dean of the Academic Department (as distinct from the medical, dental, and pharmaceutical departments of the university). In fact back in 1884-1885 Eddy taught Civil Engineering, Math and Astronomy!

The year after graduation Strauss was listed as a graduate candidate for the Master of Letters, and also as living in Trenton NJ as a draftsman at the Steel and Iron Company. In 1893-1894 he was again listed as a master's student, though not a candidate for a degree. His address was back in Cincinnati at 290 West 7th Street (rather than his family home at 360 West 9th Street). In 1894-1895 he was an instructor in the Civil Engineering program, teaching alongside Professor Ward Baldwin, who served also as registrar for the Academic Department. By this time the Academic Department had grown to 247 students.

Back then the catalogues listed all the graduates for each year, their credentials, and what they were doing. Strauss was listed in the 1895-1896 catalogue as a draftsman at Elmira Bridge Company in Elmira NY 1895-. In 1910 the university couldn't keep up with the graduate listings, but created a directory of graduates. Strauss was listed in the 1926 directory as living in Chicago at 3100 Sheridan Road. His bridge company was in Chicago, but by 1926 he was also spending much time in San Francisco selling the idea of the Golden Gate Bridge.

Sunday, July 24, 2005

Fedora discussion

Friday the DRC-Dev team discussed the underpinnings of Fedora, the platform selected for creating the Digital Resource Commons. The presentation method itself was interesting, using Elluminate software available through the Ohio Learning Network (OLN) to push information on a whiteboard to remote users and allow discussion and text messaging among participants. The process was definitely successful.

We learned about Fedora's digital object model, with four components: a DOI or handle as identifier; methods to disseminate or view the object; content; and system metadata. And there are four content types: managed, external (like a URL), redirects to other sites, and XML. The overall architecture consists of an interface (web service plus OAI provider), application logic in Java, and storage in a relational database management system (RDBMS).
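
As a way of keeping the four components and four content types straight, here is a rough sketch in Python. This is my own note-taking model, not Fedora's actual API or storage format (Fedora itself is written in Java); the class names, field names, and the sample handle are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ContentType(Enum):
    """The four content types described in the presentation."""
    MANAGED = "managed"
    EXTERNAL = "external"   # e.g. a URL
    REDIRECT = "redirect"   # sends the user on to another site
    XML = "xml"


@dataclass
class Content:
    label: str
    content_type: ContentType
    location: str  # hypothetical: a path, URL, or inline XML, depending on type


@dataclass
class DigitalObject:
    """Four components: identifier, dissemination methods, content, system metadata."""
    identifier: str                                            # DOI or handle
    disseminators: List[str] = field(default_factory=list)     # ways to view the object
    contents: List[Content] = field(default_factory=list)
    system_metadata: Dict[str, str] = field(default_factory=dict)

    def content_of_type(self, content_type: ContentType) -> List[Content]:
        """Return only the content items of the given type."""
        return [c for c in self.contents if c.content_type is content_type]


if __name__ == "__main__":
    # Hypothetical object; the handle and URLs are placeholders.
    obj = DigitalObject(
        identifier="hdl:0000/example-1",
        disseminators=["thumbnail", "full-size view"],
        contents=[
            Content("master image", ContentType.MANAGED, "images/example-1.tif"),
            Content("related page", ContentType.EXTERNAL, "http://example.org/page"),
            Content("descriptive record", ContentType.XML, "<dc><title>Example</title></dc>"),
        ],
        system_metadata={"created": "2005-07-22", "state": "active"},
    )
    for item in obj.content_of_type(ContentType.MANAGED):
        print(item.label, item.location)
```

The point of the sketch is only that one object bundles several pieces of content of different types under a single identifier, along with the methods for viewing it.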

There are more acronyms and terms to learn and/or review:
digital object serialization defined by XML schema
extensible = can associate services with objects
extensible object model
DOI and handle
OAI-DC
SOAP-based versus web-based
web service
server container package

The University of Virginia is developing it for digital archive applications. See http://www.lib.virginia.edu/digital/resndev/fedora.html .
Their archive using Fedora? http://www.lib.virginia.edu/digital/collections/image/

A May 2005 Users Group conference hosted 110 implementers with objects as diverse as streaming data from a temperature sensor.

Wednesday, July 20, 2005

Faculty presentation

Today I presented a brief description of my academic leave activities to the University Libraries Faculty. The information is summarized in a PowerPoint presentation.

Tuesday, July 19, 2005

USC Digital Archive

Today I reviewed the USC Digital Archive. It's been redesigned! I remember serving on the committee that started the redesign back in the fall. They have gone live with it. You can browse collections, and search across collections. You can even search within specified collections with the Advanced Search. You can scroll through images. You can see the metadata for the images. And of course you can get a larger view of the images. It's fun to see something I was working on come to fruition!

Friday, July 08, 2005

DRC-Dev homework

I can see that keeping up with the DRC-Dev group will require some review on my part. After the conference call today I made a list of all those feisty acronyms and terms with which I need to be more conversant. Things like

  • Metabuddy
  • VRA 4
  • DC (Dublin Core) and FGDC
  • CEN
  • Getty Crosswalk
  • Shibboleth
  • DMAP (descriptive metadata application profile)
  • Luna crosswalk - CWA
  • other crosswalks - CDWA XML, etc
  • Handles and handle servers
  • RDF (resource description framework)

I covered many of these way back in October but they haven't "stuck" yet.... We are still debating about multiple schemas and interfaces for users - how can we be complex but simplify for use? We are to look at interfaces of other places. We should stress the needed search functionality and design to that need.

Friday, July 01, 2005

DRC-Dev Team

On April 22 I wrote that it looked like OhioLINK would be using Fedora to build the Digital Resource Commons rather than Documentum. That's indeed what's happening. Fedora is open source software developed at Cornell. (Interestingly, the Cornell librarians at ASEE reported they are not using Fedora at Cornell for their digital archive....) The University of Virginia is using Fedora as its platform though, and has done much development work.

Fedora out of the box is very "raw," a basic structure upon which to build. It is supposedly very robust, allowing capabilities not provided by other open source software such as DSpace. For example it allows not only cross-collection searching, but more importantly, the specification of object-to-object relationships. It enables mixing object types (videos, text, data sets, simulations, etc.) and varying uses (e-publishing, repositories, exhibitions, portfolios, etc.).

Peter Murray is heading the OhioLINK DRC-Development Team meetings. The conference calls are interesting, with participants from across Ohio, plus developers at OhioLINK itself. As Peter stated, the change from using proprietary software such as Documentum to open source software such as Fedora signals a change in OhioLINK's approach to system development. OhioLINK members have become engaged in the development process via the DRC-Dev team. The project can be viewed at http://drc-dev.ohiolink.edu/wiki .

Today the DRC-Dev conference call struggled with the concept of object-specific application profiles and where the application profile for the metadata should reside - with the object? with a collection? Having different metadata sets for different types of objects means having different ingestion profiles for entering data into the system. This can get confusing, depending on who is doing the inputting. A trained team of "ingesters" is different than occasional inputting by faculty participating in a department repository. We want to describe the various objects adequately while at the same time supporting global retrieval of all content.
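
One way I picture the trade-off is a small set of core fields shared by every object (for global retrieval) plus extra fields per object type (for adequate description). The Python sketch below is only my illustration of that idea, not anything the DRC-Dev team has decided; the field names and object types are invented.

```python
from typing import Dict, List, Set

# Hypothetical core fields every object carries, so one search can span all content.
CORE_FIELDS: Set[str] = {"title", "creator", "date"}

# Hypothetical per-type additions; these field names are made up for illustration.
PROFILES: Dict[str, Set[str]] = {
    "image": {"dimensions"},
    "dataset": {"format", "units"},
    "text": {"language"},
}


def missing_fields(object_type: str, metadata: Dict[str, str]) -> List[str]:
    """List the required fields an ingester still has to supply for this object type."""
    required = CORE_FIELDS | PROFILES.get(object_type, set())
    return sorted(required - set(metadata))


if __name__ == "__main__":
    record = {"title": "Bridge photo", "creator": "Unknown"}
    print(missing_fields("image", record))
```

An occasional faculty contributor might only ever see the core fields, while a trained ingester fills in the type-specific ones; either way every object stays searchable on the shared core.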

I will be participating on the development team, lending what insight I can relating to

  • supporting teaching and research needs in terms of function
  • contributing content
  • helping to track relevant technologies
  • sharing ideas on what the DRC should look like
  • planning to use the DRC in my work.

The DRC will be the platform where I hope to put the engineering repository. It will be a while before it's ready....