Steps from Scanning to Omeka Archive

Open the Toshiba scanning software and set the beginning series number for the scans you are about to do. It will be something like “scpa00700003001”. Use the Crossroads File Name Conventions (/documents/staff-resources). Note that a number containing the hyphens from the suggested names will not increment when you drag it down a spreadsheet column, so we have been leaving out the hyphens.
Use the PMDublinFields.xls spreadsheet to start a new spreadsheet. Open it and save it under a new name for the collection you are working on.
Enter the beginning series number from step one in cell A2 of the spreadsheet.
Start scanning and enter new item metadata in a new row of the spreadsheet as you go. You can drag the item number down to automatically increment the number.
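If you would rather generate the item numbers than drag them down in the spreadsheet, the increment can be sketched in a few lines of Python. This is only an illustration: the function name next_item_numbers is made up here, and it assumes the item number is a text prefix followed by a run of digits, as in the “scpa00700003001” example.

```python
import re


def next_item_numbers(start: str, count: int) -> list[str]:
    """Return `count` consecutive item numbers beginning with `start`.

    Assumes the number is a prefix followed by digits (no hyphens),
    e.g. "scpa00700003001". Leading zeros are preserved.
    """
    match = re.fullmatch(r"(.*?)(\d+)", start)
    if match is None:
        raise ValueError(f"{start!r} does not end in digits")
    prefix, digits = match.groups()
    width = len(digits)   # pad back to the original number of digits
    first = int(digits)
    return [f"{prefix}{first + i:0{width}d}" for i in range(count)]


if __name__ == "__main__":
    for number in next_item_numbers("scpa00700003001", 3):
        print(number)
```

The printed values can be pasted into column A in place of the dragged cells.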
When you have scanned all the documents, save the spreadsheet.
This would be a good time to back up the images to an external hard drive. [Break Time]
Now we need to build a text file that Photo Mechanic can use to embed the metadata into the images. With the metadata spreadsheet open, export it from an .xls file to a .csv file in UTF-8, tab delimited (no commas or quotation marks). Warning: Excel does not do a good job of exporting to UTF-8, so we need to use a text editor to make some changes and ensure that it is saved properly.
Open the .csv file in Textpad or a similar text editor. Delete the top row with the column names. Photo Mechanic (PM) only uses column order. Save it with an extension of .txt in UTF-8.
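The text-editor cleanup in the last two steps could also be done with a short script. The sketch below is not part of our standard procedure and makes several assumptions: the function name is made up, the export is tab delimited, and Excel wrote it in the Windows cp1252 encoding (change src_encoding if your export is different). It drops the header row, removes any quotation marks Excel wrapped around fields, and saves the result as UTF-8.

```python
import csv


def excel_export_to_pm_text(src_path, dst_path, src_encoding="cp1252"):
    """Write a headerless, tab-delimited UTF-8 copy of an Excel export.

    Returns the number of data rows written. The cp1252 default for
    `src_encoding` is an assumption about how Excel saved the file.
    """
    with open(src_path, newline="", encoding=src_encoding) as src:
        rows = [row for row in csv.reader(src, delimiter="\t") if row]

    data_rows = rows[1:]  # the first row holds the column names; PM ignores them
    with open(dst_path, "w", encoding="utf-8") as dst:
        for row in data_rows:
            # Collapse tabs, line breaks, and repeated spaces inside a cell
            # so that every row stays on one line with one tab per column.
            cleaned = [" ".join(cell.split()) for cell in row]
            dst.write("\t".join(cleaned) + "\n")
    return len(data_rows)
```

For example, excel_export_to_pm_text("collection.csv", "collection.txt") would produce the .txt file to use as the code replacement file (both file names are placeholders). It is still worth opening the result in a text editor to confirm that accented characters survived.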
Use PM's Edit->Settings->Set Code Replacements (Ctrl-Alt-C) to set the code replacement file to the .txt file that you just saved as UTF-8. Remove any other files that may be displayed on the Code Replacement screen. You can edit the file from that screen to make sure it is right. Also check that the Delimiter Character is “\” and the Default Replacement is “Missing”.
In PM, select all the files in the folder with the scanned images (the “contact sheet”, in PM's terms).
In PM, go to Image->IPTC Stationery Pad (Ctrl-I) to open the stationery pad.
We need to set the Transmission Reference of each image to its file name. Use the Clear All button to clear anything left in the stationery pad. Then put {filename} in the Transmission Ref: field and push the “Apply Stationery to Selected” button at the bottom of the page. This will load the filenames into the metadata for each file. You can verify this by clicking on the “I” circle in the lower left of any image you place your cursor over.
Now go back to the IPTC Stationery Pad (Ctrl-I) and clear it again. Now you need to load the PMStationary.XMP file, which uses the file number in the Transmission Ref field ({transref}) to link to the metadata by column number in the .txt file. You can download the current standard PMstationary.XMP file here: Make sure that “{filename}” is not left in the Transmission Ref field and that the field is no longer checked.
With the PMstationary file loaded, push the “Apply Stationery to Selected” button at the bottom of the page again. This should show a progress bar while loading your metadata into the proper images. Check the “I” circle on any image to see if the metadata shows up. If it is not there, or all the fields say “Missing”, there is a problem with the .txt file or the filenames. Check for capital or small letters in the filenames, UTF-8, tab delimiting, etc. PM gives no error messages to tell you exactly what went wrong.
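Since PM gives no error messages, a small script can help narrow down a “Missing” result by comparing the first column of the .txt file with the scanned file names. This is a rough sketch under stated assumptions: the function name is made up, the first column of the .txt file is assumed to hold the item number, and it is assumed to match the file name without its extension (pass strip_extension=False if your first column includes the extension). The comparison is case sensitive on purpose.

```python
from pathlib import Path


def find_mismatches(txt_path, image_dir, strip_extension=True):
    """Compare the first column of the .txt file with the scanned file names.

    Returns (files_without_metadata, metadata_without_files), both sorted.
    Opening the file fails with UnicodeDecodeError if it is not valid
    UTF-8, which is itself a useful clue. A leading byte-order mark is ignored.
    """
    keys = set()
    with open(txt_path, encoding="utf-8-sig") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.strip():
                # Everything before the first tab is the lookup key.
                keys.add(line.split("\t", 1)[0])

    names = set()
    for path in Path(image_dir).iterdir():
        if path.is_file():
            names.add(path.stem if strip_extension else path.name)

    return sorted(names - keys), sorted(keys - names)
```

If both returned lists are empty, the names line up and the problem is more likely in the delimiter or the code replacement settings. Keep the .txt file outside the image folder so it is not counted as an image.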
These original images, usually .tif files, are typically too large to put on the Omeka site. We save them for the State Archives or local archives in case someone wants to use the original high-resolution scan. Also, they typically have Kodak color separators in the image, which we don’t want in the presentation image. To create cropped .jpg files of the desired resolution we need to set the crop for each image. Double click on the first image. You will get a screen with the crop tool on the left and the series of photos along the bottom. Create a crop box on the photo and then click on the next one below. Move through all of the images, cropping out the color separator.
The crop box will not take effect until the selected images are saved as .jpg. Go to the File drop-down menu, pick Save Photos As (Ctrl-S), and choose Image Type .JPEG with the third tick on Quality. Check Apply Cropping, and under custom scaling set custom width to 900. (Situations might vary on these parameters and some testing might be needed in each case.) Resolution at 200 dpi should be good enough. Under operations, check “Preserve EXIF information when possible”. Under Destination check “Create Subfolder” and name a new subfolder. You can also check “Open destination as a contact sheet.” Now click “OK” and the cropping, the copying to a new file type and destination, and the resizing will all take place automatically with the metadata intact. We now have a folder of files ready to upload to the Omeka Archive. But before we do, we need to create a .csv file to Omeka specifications.
Open the original .xls file again and save as a .csv delimited with commas and quotes.
Open the .csv file to make sure that the top row has the column names from PMDublinFields.xls. More importantly, we need to do a search and replace to put the URL to each file we upload into the first column. We have been staging these files in a folder on the Omeka site called “upload”. So to create the URL, we replace the “scpa” with “ for each file name using a replace all function. Linux is case sensitive, so make sure there are no capitals in the URL. Now save and close the .csv file.
You may need to move the image folder and .csv file to a computer with FTP software and/or a high-speed connection for the next steps.
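The search and replace can also be scripted, which takes care of the lower-casing at the same time. Treat the following as a sketch only: the function name is made up, the base URL in the usage note is a placeholder and not our real upload address, and the extension parameter is an assumption, since whether the first column already includes “.jpg” depends on how your spreadsheet was filled in.

```python
import csv


def add_upload_urls(src_path, dst_path, base_url, extension=""):
    """Rewrite column 1 of a comma-delimited .csv as a lower-case file URL.

    `base_url` is the address of the staging folder (supplied by you).
    `extension` is appended to each name; leave it empty if the first
    column already ends in the file extension.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = [row for row in csv.reader(src) if row]
    if not rows:
        raise ValueError("the .csv file is empty")

    header, items = rows[0], rows[1:]
    for row in items:
        # The server is case sensitive, so force the whole URL to lower case.
        row[0] = (base_url + row[0].strip() + extension).lower()

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)  # commas and quotes
        writer.writerow(header)   # column names stay untouched
        writer.writerows(items)
    return len(items)
```

A call might look like add_upload_urls("collection.csv", "collection_omeka.csv", "http://example.org/upload/", ".jpg"), where every argument is a stand-in for your own file names and the real staging URL.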
Log into the crossroads.net/upload/ directory. Load the presentation-quality photos into the upload directory. (This seems silly because Omeka is going to copy them into its own content directory with names that it creates on the fly thus taking up twice the space needed. We can actually delete these files from the upload directory later but it is sometimes handy to have their location and original name close at hand.) By no means should you delete these files until the CSV Import function is complete because sometimes it fails and you need to back off the import to fix something before trying again.
Now use FTP to copy the .csv file into crossroadsarchive.net/plugins/CsvImport/csv_files. This is the directory where the CSV Import utility will look for it in the next steps.
Now you need to log in to the crossroadsarchive.net site as an administrator. (Since this is going up on the Internet, that process is not described here.)
In the administrative dashboard, click on the CSV Import tab. Choose the Import Items tab. Select the .csv file you are going to use from the drop-down field button. Choose the collection to put it in. Check “Public”. Also leave the check on the “Stop on Error” check box. You might need it. Go to the next page and match up the Omeka field names with the .csv field names.
Be sure to check the File check box after the Identifier field at the top, otherwise it won’t import the images. But don’t check any other File check boxes.
Start the CSV Import. You will need to refresh the status screen periodically to see how it is going. If there is a file missing or misnamed, the import will stop with an informative error message. Use the undo option to reverse the import. Fix the problem and do steps 23-25 again. You can use “Clear History” to remove aborted imports from the list.
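Because a missing or misnamed file stops the import, it may save an undo cycle to check the .csv file against your local folder of presentation images before you start. The sketch below is one way to do that, not something the CSV Import plugin provides: the function name is made up, and it assumes the first column of the .csv holds the file URL and that the local folder contains the same files you uploaded.

```python
import csv
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def check_import_csv(csv_path, image_dir):
    """List likely import problems as (row_number, url, message) tuples.

    Row numbers count the header as row 1. The file-name comparison is
    case sensitive, to match the behaviour of the Linux server.
    """
    available = {p.name for p in Path(image_dir).iterdir() if p.is_file()}
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the row of column names
        for row_no, row in enumerate(reader, start=2):
            if not row:
                continue
            url = row[0].strip()
            # The last part of the URL path is the file name on the server.
            name = PurePosixPath(urlparse(url).path).name
            if url != url.lower():
                problems.append((row_no, url, "contains capital letters"))
            if name not in available:
                problems.append((row_no, url, "no matching file in image folder"))
    return problems
```

An empty list means every URL points at a file name that exists locally and contains no capitals; it does not prove the files actually reached the upload directory.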
Go to the CrossroadsArchive.net items page to confirm the import. The last imported item should be at the top. If it is not, it’s going to be a long night. Compare your .csv file to other successful ones and try to figure out what happened. If all your items uploaded correctly…
Take a break. You deserve it.