Summing it all up. December 4, 2008
Posted by jeremyjord in assignment posts. Tags: assignment posts, digital collections
Given that this is my last post for the class, I thought it only appropriate to reflect on what I have learned. The first thought that comes to mind is ‘I know almost nothing’. Every single issue I researched led to an increasingly complex series of questions, and I will avoid the obvious Hydra reference here (meta-hydra references, you have to love them). This is hardly surprising to me at this point in my life, but it is refreshingly humbling. Yet when I look back at what I knew coming into the class, I can see that I have learned quite a bit about the construction of digital collections.
Perhaps my greatest regret was not being able to build a collection during the class–unfortunately my work schedule didn’t move quickly enough–but I will be creating a digital collection in the near future, so this class will undoubtedly be useful at that time.
While I did find the loose structuring of the class maddening at times, I have to admit that it allows you to pursue your own interests, which increases your investment in the process. (Which I am sure was the point.) Another advantage of the ‘browsing collections’ model is that you get to see a lot of very good, and very bad, digital collections. This will also be useful when building my own collection.
Finally, this class has made me do something that I have never done before: blog. While I don’t think I will be lighting up the blogosphere anytime soon, I have to admit that it has been a good experience, and it has given me an interesting way to communicate with instructors. (I am using a blog in my internship this coming semester.)
It’s been fun,
Jeremy
Creating Digital Collections for Illiterate Users December 4, 2008
Posted by jeremyjord in assignment posts. Tags: assignment posts, digital collections, illiteracy
After my last post I did some web surfing of sites in other languages. The results were fairly frustrating, as I realized that even basic navigation on most websites assumes a certain amount of literacy in the host language. In some cases, basic navigation is possible using a combination of educated guesses, trial and error, and pictures. In other cases, especially on heavily text-based websites, almost no meaningful information could be gleaned.
This led me to the conclusion that many websites and digital collections could benefit from making their sites friendly to multiple languages. On the heels of that came the thought, ‘what about digital collections for people who cannot read or do not read well?’ I suspect that for many people, the basic assumption is that anyone who uses the internet is, by default, literate. While in many cases this is correct, it is not the case in all environments. (See page three of this presentation by the University of Waikato.) It also ignores the fact that literacy skills fall along a continuum, with a wide range of literacy profiles.
Studies by multiple organizations in various parts of the world have indicated strong correlations between internet use, literacy levels, and formal education. The United Kingdom’s Office for National Statistics issued this study linking educational attainment and internet access. (To be valid, this obviously requires a country where internet infrastructure is readily available.) Similarly, in the United States, a Pew Internet survey arrived at a similar conclusion when looking at the Latino population. (It should be noted that the Pew data has several methodological problems, such as relying on land-based phone lines to reach the survey group: no phone means an unsurveyed population.)
What I found interesting in the Pew data was the language barrier. Specifically, they noted, “Language is also a powerful factor, as internet use is much higher among Latinos who speak and read English fluently than among those who have limited English abilities or who only speak Spanish.” (ii) This population may be literate in Spanish but not in English, which greatly limits their internet options. (This also throws an interesting wrench into the basic literate/illiterate distinction.)
Before I get too far off on a tangent, let me bring us back to my main point: ‘how can digital collections be made more accessible?’ Obviously, creating them in multiple languages would be helpful, but what about those who are not completely literate in any language? The University of Waikato paper provides some interesting ideas on how digital collections might cater to illiterate information seekers, ranging from increased pictorial or icon usage to audio versions of collections. It would be interesting to see what other solutions could be employed if more people considered this often neglected portion of the population when designing digital collections.
On a lighter note… December 4, 2008
Posted by jeremyjord in assignment posts. Tags: architecture, assignment posts
Well, my viewing public, it is time to take a look at one of my hobbies: architecture. It seems that I am not alone in my predilection for structures, since others have built digital archives on the topic. Take the Digital Archive of American Architecture, for example. This site was created by a professor at Boston College to serve as a supplement to his course, but it is interesting in its own right. It is a rather simple, plain website that has several interesting ways to group the images. It can also be searched, and it has some basic metadata built in. For example, when you search for ‘church’ you also get cloisters and other religious buildings. You can also search for specific architects or particular buildings. Not to mention, it has its own section on libraries, which boosts this collection in the ratings, in my opinion. While not the flashiest collection out there, it is a good example of a small collection that anyone could create and that should be easy to maintain.
I found this nice collection that focuses on Islamic Architecture. Interestingly enough, it is a nice little web community that includes spaces for professionals and students. Classes can use the posts to conduct their affairs, architects can use it to discuss the finer points, and I can look at it just to see an interesting architectural tradition that is not well represented in my corner of the globe. Its searches have good sort functions, and it is also browsable. My only complaint while browsing is that it can sometimes take a bit before you actually get to the image, but this is a minor issue. Another feature I really enjoyed is that the site gives information about each photograph (the photographer, who holds the copyright, what it was taken with) that many sites do not include, which makes it a lot easier to contact the owner of the images.
Finally, take a look at this gem. Really, go look at it right now! The reason I gave you this link was to give you a taste of what a good part of the world gets to see when they browse the Internet: things in a language that is not their native tongue. The nice thing about this collection is that if you click on the United Kingdom flag icon on the left, you are back in English. I really wish more collections in the United States (and everywhere else, for that matter) had this feature; it can really expand your audience, in addition to just being a nice thing to do. It also offers several resolutions of the same photo, which lets you zoom in if you want to see more detail. The search function seems to work well, even when searching using English terms. It ties art to architecture, a relation I always enjoy emphasizing. The browse function also works well, with the groupings being fairly intuitive. Well done.
Enjoy!
Curse You Nicholson Baker Part Two December 4, 2008
Posted by jeremyjord in assignment posts. Tags: assignment posts, card catalog, Nicholson Baker
Greetings Everyone,
Here is my plan to solve the nasty little conundrum mentioned in my last post. To start with, I would perform a manual audit of the records to ensure data entry integrity. This is tedious and takes a lot of person-hours, but it has several benefits. 1) You really get to know your records this way. I have learned a lot about the institution and its holdings by performing this task. I have also come across many unresolved issues that have since been solved (like finding out that they were still owed money from an insurance settlement over 10 years ago). 2) The collection is still small enough to allow this. As their collection grows, auditing it in this manner will become increasingly impractical. 3) The job gets done right. Call me old-fashioned, but I still hold to this quaint custom. 4) I now understand the needs of each subset of their records; that is, I have important institutional knowledge about their record keeping needs that probably couldn’t be gained as quickly any other way.
Having largely finished this task, I can adjust the database that will be adopted to better suit the institution’s needs, which will greatly increase its functionality compared to the original. (I know, Dr. Martens, you are ready for me to get to the Digital Collections part, so brace yourself.) The next step is to optically scan the original documents and attach each scan to its record. Actually, I would build a link to another data storage area on the network drive, and then give the record and the scan file name a shared unique identifier so you always know which record corresponds to which scan, even if the link structure changes or is later removed. Keeping the image off the record itself speeds up the load time and keeps the database usable on reasonable PCs. Additionally, most people will not need direct access to the scan, so embedding it seems unnecessary. Back the whole thing up on the server and on an external server and you have a reasonably safe backup; just to be safe, drop it all on an external hard drive as well.
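For the curious, here is a minimal sketch of the record-to-scan pairing described above. It is only an illustration under my own assumptions: the field names (`record_id`, `scan_path`), the `scans` folder, and the `.tif` extension are hypothetical and not taken from the actual database.

```python
import uuid
from pathlib import Path


def assign_scan_name(record: dict, scan_dir: Path, extension: str = ".tif") -> dict:
    """Give a record a unique identifier and derive the scan file name from it."""
    # Reuse an existing identifier so that re-running never renames a scan.
    record_id = record.get("record_id") or uuid.uuid4().hex
    updated = dict(record)  # work on a copy; the caller's record is untouched
    updated["record_id"] = record_id
    # The file name itself carries the identifier, so the pairing survives
    # even if the stored path (the "link") later changes or is removed.
    updated["scan_path"] = str(scan_dir / f"{record_id}{extension}")
    return updated


# Hypothetical record used only for illustration.
record = {"title": "Specimen ledger, page 12"}
linked = assign_scan_name(record, Path("scans"))
```

The point of the design is that the identifier lives in two places, the record and the file name, so neither depends on the link between them staying intact.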
It would take up a fair amount of digital space, but the optical scan portion won’t be growing, since the new records are born digital. This allows you to throw away all of the old paper copies and still keep those useful little bits of handwritten information and symbols that can’t be captured in the new database. It also means you can get rid of all of the file cabinets that they are kept in. And it leaves the file structure open, so that if you want to go public with the images later, you can harvest the links or copy the original images.
So that leaves me with one basic problem from the Baker essay: what if someone wants the original at some point in the future? I suppose that there is nothing stopping the institution from storing the documents, but I doubt that this will happen. In the final equation, I suppose my answer is that the physical document is less important than the information that it conveys. I realize that some of the information is the document, in the same way that the new electronic document may have some interesting historical value in the future. As Baker mentioned, if someone wanted to look at the type of paper used, they won’t be able to do so. But I do feel that the majority of the original information is being passed forward, the ability to retrieve the information is being improved, and hopefully the long-term storage prospects for electronic media will prove to be better than those for paper. (Though this still needs to be demonstrated.)
Curse You Nicholson Baker! December 3, 2008
Posted by jeremyjord in assignment posts. Tags: Digitization, assignment, Nicholson Baker, card catalog
If you haven’t read Nicholson Baker’s essay “Discards”, please do go out and read it this instant. The rest of this post will make far more sense. (Baker, Nicholson. 1994. “Discards.” New Yorker 70, no. 7: 64-86)
Let’s talk about me. I have spent much of the last four years of my life pressuring various institutions to hurl all of their paper records out the window. Having been involved in a startling number of digitization projects, along with a lot of projects that involved retrieving records from older, more paper-based systems, I can only say that I generally believe digital retrieval is superior (and certainly preferable for those of us with dust allergies!). This is hardly a revolutionary statement in this day and age, but there it is. Now let’s move forward to my current position.
I am currently involved in preparing the database of an office at a mid-sized institution for transfer into another database. The majority of these records were not entered in a uniformly electronic fashion until 2002. At that time, someone had the joy of manually entering all of the records, around 10,000 of them. Each of these records was derived from a variety of older paper records, ranging from a handwritten note recorded on a feed and seed bag to typed pages. To complicate the issue, the records come from many different scientific disciplines, so the odds of anyone understanding the content of all of them are next to nothing.
Throw in a few score of intervening data entry personnel and you get to me. I inherit a database maintained by dozens of people who each had their own (apparently unique, to be polite) system of data entry. This makes the age-old solution of writing scripts or macros to neatly drag the old information into the appropriate new field impossible (or at least not cost efficient). Additionally, it is known that previous people in my position loved working with the paper records and would just type “see paper copy” in many of the fields. Since the institution wasn’t considering digitizing the original records, manual editing and reentry has again reared its ugly head.
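Even when automated field mapping is off the table, a small script can at least list which fields will need a trip to the filing cabinet. The sketch below is only an illustration: the field names and the sample record are hypothetical, and the only thing taken from the real database is the “see paper copy” placeholder.

```python
PLACEHOLDER = "see paper copy"


def fields_needing_review(record: dict) -> list:
    """Return the names of fields that are empty or merely point at the paper file."""
    flagged = []
    for name, value in record.items():
        text = "" if value is None else str(value).strip().lower()
        # Flag blank fields and any field containing the placeholder phrase.
        if not text or PLACEHOLDER in text:
            flagged.append(name)
    return flagged


# Hypothetical record used only for illustration.
sample = {"collector": "J. Smith", "locality": "See paper copy", "notes": ""}
needs_review = fields_needing_review(sample)
```

This does not fix anything by itself; it only gives a rough count of how much manual reentry is waiting.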
That is when a strange thing began to happen. I noticed that the old records would routinely contain information that was never transferred to the new record. In some cases, this was because there was no appropriate field to add it to; in others, it was simply missed. But in almost all cases, the paper copies had more information on them than any of the electronic copies. Some of this information meant nothing to me, but when shown to other members of the institution, it turned out to be useful. Worse yet, some of the information consisted of handwritten symbols, strange notes, personal codes, etc. Much of it was worthless, but who knows what might turn out to be useful?
This throws a wrench into the works: how am I going to get all of these records entered in a timely fashion and still maintain the integrity of the data? I have proof that some of the notations are useful, but only a small percentage of them are likely to ever be relevant in the future. Perhaps we could optically scan the originals? Nope, too much time and cost involved in the long-term storage. How about a third-party digitization service? Not going to happen; they cost too much and they will have no chance of knowing what is important, especially those strange hieroglyphs (and I am not kidding). The answer? Type as much information as can be transferred and leave a note in the notes field: “see paper copy for more notes”.
For those of you who know me, this makes me want to puke. Instead of having a paper filing system, they now get to maintain a paper filing system and an electronic filing system. Forever. (Or at least until someone figures out an answer or throws more money at the project.)
Nicholson Baker talks about my problem, writ large, as it happened when libraries moved away from card catalogs. He outlines how, during the conversion process, there were data entry errors, searchability issues, orphan files, and missed records. The combination of these problems means that in some cases it would be easier to reference the original card catalogue than to use the electronic version. But the growing amount of data precludes continuing to use that option. To make matters worse, costly and time-consuming fixes have to be made to all of the old information that was entered incorrectly in the first generation of data entry, and in many cases that information can only be verified by reference to the original (paper) record.
This means that you need to hang on to these original documents, thereby negating one of the advantages of digitization: being able to throw away the original and not have to store it physically anymore. Worse yet, after patrons search for and find a record digitally, it is possible that they will then have to march down to wherever those records are kept (if they have been kept) and look them up the old-fashioned way. This means that they get to perform two searches instead of one. So much for progress… Stay tuned for more on this topic.
Real problems November 19, 2008
Posted by jeremyjord in assignment posts. Tags: assignment posts, data management policy, Digitization, Internet Archives, Taxacom listserve
If you surf the internet long enough, you will find the most interesting things…
Witness this forum thread on the Taxacom listserv. Taxacom, for those of you who are not already regular subscribers, is used in the natural sciences, particularly for biological/botanical taxonomy discussions. Still with me? While the discussions are interesting if this is your field, they are highly specialized. This thread, spelling aside, points out several very interesting issues about electronic data in general.
First, internet data can be ephemeral. Web pages can be created quickly and can disappear just as quickly. There are some organizations attempting to capture the internet, or at least parts of it, at particular moments (see my earlier post on the Internet Archive), but a great many of the internet’s web pages rely on the servers of their creators.
Second, educated people can have difficulty finding what they need on the internet. Consider the linked thread, which was started by M. Alex Smith, PhD, about not being able to find some web pages. This is a PhD who presumably has better than average research skills. The root of his problem was that the web page links originally posted as a reference for scientific data were incorrect or outdated. He posts his problem, and some good souls are able to direct him to the pages’ current location. When he gets there, he is able to open the file in question, but the data is no longer correctly aligned. I can only wonder whether at this point he really wants to go and look the information up in hard copy.
Third, migrations to different formats can create problems. This isn’t really news to anyone who has been involved in computer or library science at any point over the last 30 or so years, but it doesn’t seem to be going away, and it may be getting worse as the speed of technological innovation accelerates. We also don’t really know the long-term reliability of electronic storage media. CDs, DVDs, external hard drives, etc. just haven’t been around long enough to provide good temporal studies. Some have argued that it is largely an irrelevant question at this point, since even if your CD lasts 250 years, it is highly unlikely that there will be anything around capable of reading it by then. The solution to the problem seems to be migrate, migrate, migrate. Of course, every time you transfer that information to a new format you have to be careful that nothing is altered, lost, etc. This isn’t impossible, but it does take a lot of planning and work. And as the volume of information increases, it may become easier to lose important digital data.
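One small piece of the “be careful that nothing is altered” step can be automated. The sketch below is a minimal example of a checksum (fixity) check using Python’s standard library, and it comes with a loud caveat: it only confirms that a bit-for-bit copy to new storage is identical to the source. It says nothing about a conversion to a different file format, where the bytes are expected to change. The file names are made up for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy a file and report whether the copy matches the source exactly."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    return sha256_of(destination) == before


# Demonstration with a throwaway file; names are illustrative only.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "master.bin"
    original.write_bytes(b"example image bytes")
    copy_ok = copy_and_verify(original, Path(folder) / "copy.bin")
```

Storing the checksum alongside each file also lets you re-check old copies later to see whether the storage medium has quietly damaged them.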
All of this may make it seem as if I am against digitizing information. I am definitely not. Consider the vast amounts of documents/artifacts/information lost to natural disasters, wars, etc. and you can see that print documents share many of the same problems that electronic ones do. (Can anyone in the room read Old English? That is not entirely different from the electronic format obsolescence problem.) It does, however, point to how vital a well thought out data management policy is to the success of any electronic collection.
Fair Use and Copyright! OH MY! November 15, 2008
Posted by jeremyjord in assignment posts. Tags: assignment, copyright, fair use, guide, law, multimedia, resource
I am posting this information to address some of the issues confronting digital collections. After performing some preliminary research for a project I am involved with, I thought it might be nice to consolidate some of the resources that others could use as a guide when creating and maintaining their collections.
The Library of Congress has a great deal of quality information on copyright issues on its U.S. Copyright Office website. This site allows you to search for copyrighted documents and also gives you useful information on how to copyright your own documents. Links are also available to various copyright licensing organizations and publication rights clearinghouses. (See here.)
Stanford University also has an interesting website that tracks current legal cases involving fair use issues as well as having a wealth of resources concerning copyright law–with a heavy emphasis on fair use related issues.
The University of Texas has a great tutorial concerning the basics of copyright use issues in a variety of media. It is set up to be easily understandable, and it explains the differences in use depending on what type of organization you are, ranging from universities to libraries to lectures.
Interestingly enough, there are some conditions where you may be able to show an object under fair use law but still need permission to exercise the copyright owner’s rights. The UT site addresses this sort of issue, which frankly I hadn’t considered before now. (Think trademark and patent use.)
Hopefully this will get you started on the long journey of copyright compliance.
Enjoy,
Jeremy
Digital Preservation of Photos November 2, 2008
Posted by jeremyjord in Uncategorized.
In case you can’t tell, I have been a bit hung up on digitizing photographs lately. One of the burning reasons for this is that photographs, especially those in color, tend to break down quickly. I have also been presented with about 3000 photographs, ranging from the 1930s to 2000. The good news is that most of them have been kept in museum grade storage conditions for most of their lives, so their deterioration is minimal. The bad news is that they are, of course, still deteriorating.
The economy being what it is, there is only a minimal budget in place for this program, so cost is a consideration. Many people find themselves in this sort of situation, either with their personal collection or with institutional collections. A useful and relatively easy to understand website on photograph digitization can be found here. I will highlight a few relevant concerns common to photographs.
1) Where and how are the photos being stored? If you answered in a little plastic photo album with the fold-over transparency sheet cover, abandon ye all hope. It is not unusual, when removing the plastic cover from the photograph, to take about half the photograph with it. Over time it will bind with the page. (Run out now and remove them!) Is the storage area climate controlled? Generally speaking, you want them stored in a cool, low humidity area; otherwise the degradation rate of the photograph will increase. (You will tend to see the colors fade.) Is the area low or no light? Light will also damage photographs, so avoid placing them in front of windows or other light sources.
2) If you plan to digitize your collection, what are you going to use to do it? Flatbed scanners are probably the most common device and can do a good job, depending upon the quality of the scanner, but they raise a few concerns. First, you must keep the scanning area clean. This involves making sure your photos are dust/particle/ketchup free before scanning them. The scanner itself will also need frequent cleaning or you will end up with less than ideal photographs. Second, if you plan to use a scanner, make sure you have a reasonably good one. Depending on what you are scanning, your selection will vary, but if you are spending less than $150, you are not spending enough. Third, you can use a digital camera to photograph your photographs. Lighting is important and you need a fairly advanced digital camera. (Probably in the $400 range; see recommendations here from the Getty Museum.) Keep in mind that your lighting generates heat, which is bad for your photographs, so try to limit their exposure and work in a cool area.
3) Where are you going to store your digital images? You will want a high resolution ‘master’ version of your photograph that you will not use routinely, and probably at least one other version for more common use. All this adds up to a lot of space. (PS: don’t store it all in the same place; in case of a crash, you will lose everything.) Do the math in advance and purchase the appropriate size external hard drive, server, network space, etc. for your collection.
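To show what “doing the math” might look like, here is a rough back-of-the-envelope sketch. Only the count of about 3000 photographs comes from my situation; the 4 x 6 inch print size, the 600 dpi scan resolution, the 24-bit color depth, and the two stored copies are assumptions picked purely for illustration, and the estimate covers uncompressed master files only (no smaller access versions).

```python
def uncompressed_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Estimate the size of one uncompressed scan in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * channels * bits_per_channel // 8


def collection_gigabytes(photo_count, bytes_per_master, copies=2):
    """Estimate total storage in (decimal) gigabytes across all stored copies."""
    return photo_count * bytes_per_master * copies / 1_000_000_000


# Assumed values: 4 x 6 inch prints scanned at 600 dpi in 24-bit color,
# with two copies kept in separate places.
per_master = uncompressed_bytes(4, 6, 600)        # 25,920,000 bytes, about 26 MB
total_gb = collection_gigabytes(3000, per_master)  # about 155.5 GB
```

Change the assumptions and the answer moves a lot: doubling the resolution to 1200 dpi quadruples the pixel count, and so roughly quadruples the space required.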
Finally, read the literature. There are many websites, books, and magazine articles on the topic that can be quite helpful. Just make sure they come from a respected source in the field.
Enjoy!
JPEG 2000 October 24, 2008
Posted by jeremyjord in assignment posts. Tags: assignment posts, Digitization, JPEG 2000, NARA, Photo Archiving
It is a brisk–some would say cold–October night and I find myself curled up with my computer, the NARA Technical Guidelines for Digitizing Archival Materials for Electronic Access manual, and a few articles on JPEG 2000. I know, it sounds like a recipe for romance…
Strangely enough, perhaps it is… I have been investigating various file formats to use for a photographic collection and found myself becoming more impressed by the moment. JPEG 2000 has several interesting and beneficial features that were not present in JPEG files. JPEGs are well known for being ‘lossy’ files, which means that when the format compresses an image, you lose detail. This loss occurs not only on the initial save, but each subsequent time you save the file after modification. This makes the format completely unsuitable for anything beyond photoshopping a tutu on your dog. Yes, a lot of people do spend their time doing just that; witness an example. The obvious advantages of a JPEG are its relatively small file size and its ease of use in web-based applications.
JPEG 2000 makes several notable improvements. First, you have the option of choosing a ‘lossy’ or a ‘lossless’ file compression mode. This makes it a viable choice for use in a more archival setting, as you can edit an image without losing fine detail (or at least losing very little). It also extends support to color profiles and layers, and generally makes it easier to modify the image in question. As Lancefield notes in the earlier JPEG2000 link, it also contains an embedded metadata set as well as space for XML-tagged data, which makes it possible to put information about the image in the image itself. This can be very useful for archival purposes. The format also gives much higher resolution rates than the previous versions, which makes it a possibility for museum grade production master file use. (Instead of the more traditional, and perhaps the standard in the field, TIFF files.) It is also compatible with Dublin Core and is compliant with most of the ISO international standards. Ahh, feel the warm fuzzy glow. Next entry, I plan on spurning my newfound love by locating all of its dirty little secrets! The downside is coming, stay tuned.
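To make the XML-tagged metadata idea a little more concrete, here is a minimal sketch of a Dublin Core style record built with Python’s standard library. Take it as an illustration only: the wrapper element name, the field values, and the function name are my own inventions, and the sketch only produces the XML text. Actually embedding that text inside a JPEG 2000 file would take an imaging library, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Build a small XML record with one dc: element per field."""
    root = ET.Element("metadata")  # arbitrary wrapper element for this sketch
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values used only for illustration.
xml_text = dublin_core_record({
    "title": "Field station, north elevation",
    "creator": "Unknown photographer",
    "date": "1936",
    "format": "image/jp2",
})
```

The appeal for archival work is that the description travels with the image instead of living only in a separate database.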
Jeremy
Your two cents October 19, 2008
Posted by jeremyjord in assignment posts. Tags: assignment posts, Digitization, UNESCO, NISO, federal
Have you ever wanted to comment on a Federal interagency initiative during its developmental stages instead of complaining about the finished result? Congratulations, you can! About a dozen government agencies have come together to form two work groups, one focusing on still content that can be captured digitally and the other focusing on sound, video, and motion picture content. See the project’s website here.
The agencies are focused on how to preserve America’s cultural heritage digitally and what the appropriate standards should be for building governmental digital collections. The hope is that creating standards will save time, increase access, reduce cost, and make their digital collections more accessible to other agencies and entities. See here for a related article.
If this initiative can be implemented successfully, it would represent a great leap forward in governmental digital collections policy. While many agencies have their own standards, the benefits of an interagency standard should be obvious to everyone. Admittedly, it does feel like it is reinventing the wheel, as any number of international and national bodies have already developed standards (UNESCO, NISO, etc.), but one can see that those standards have obviously been examined.
More on this story as it develops.