Finding the Information You Want When You Want It: Thoughts for the SPMG on Findability

February 28, 2008

Many types of information are needed to support project development from a diverse mix of topical areas, geographic locations, and timeframes. This information is located in many places and takes many forms. Different functional units such as Real Estate Services or Environmental Services need to access information from unique and reliable sources external to the department as well as modified resources maintained by WSDOT.

Information created within project development is interdependent, supporting other aspects of project development. Other agency activities, such as strategic assessment, budget management, e-discovery, and communications, rely on information generated by project development.

Some of these information resources will be contained within the enterprise content management system being developed by the SPMG. Others currently reside outside of these systems and may continue to do so. These different systems use different control mechanisms to organize information.

The capabilities and limitations of these mechanisms affect the ability to find information to meet the varied user needs described above. With attention, however, steps can be taken to make information readily “findable” to the users who need it.

Search Limitations

Finding information isn’t easy. Despite advances in technology, search tools have significant limitations. Users of these systems may not understand the limitations and may therefore be unaware of how they affect the results they get.

Common limitations to finding information are:

  1. Full text searches based on key words identify only exact matches of the search term. You can miss information if a spelling is different or you use a different term. Search results are also not ranked for relevance to the search term, so you can be deluged with information that is of limited value.
  2. Key word searches do not identify “like” information resources that use different terms. If one person spells out a term in a document and another uses the acronym for that term, the searcher will only find the documents that contain the term they searched for. Keyword searches don’t commonly find synonyms.
  3. Key word searches don’t identify related terms that may lead to important information a user wasn’t specifically requesting.
  4. Document properties or attributes can aid searches but they are not consistently completed on many documents, web pages and databases. When they are used, the terminology varies substantially between authors leading to the problems identified in items 1 through 3.
  5. Some document management systems are adding “filtering” functions that learn from user searches. While these filters add value they, too, are limited to exact word matches or routine relationships. Common search functions may be improved but these may not provide the search capabilities needed for higher “value” or more complex search needs.
  6. On the web, current search tools only search the surface web. The surface web includes what is on the published web pages and in the property fields about those pages. Most attachments and downloadable files are considered part of the invisible web and will not be searched.
  7. Even surface web items may not be found. The search tools we use need to target the web sites for them to be included in their search and then incorporate them into their index. You can see this by comparing the search returns for different search tools (Google versus Yahoo versus Dogpile, for example). This can cause unpredictable delays in finding the most current information.
  8. Finding emails and electronic files held by an individual employee is especially challenging, and it is a particularly important issue in e-discovery. There are tools that will search the text of emails and files. But again, the searcher needs to be aware of variations in terminology in order to get the most comprehensive results.
  9. Current search tools do not search images, other than the file name and property fields. Terminology used varies between authors leading to the problems in items 1 through 3.
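Items 1 and 2 above can be illustrated with a minimal sketch. The documents, their text, and the function names below are invented for illustration and do not describe any WSDOT system; the point is only that an exact-match search finds the wording the searcher typed and nothing else.

```python
import re

# Hypothetical documents; ids and text are invented for this example.
DOCUMENTS = {
    "doc1": "Environmental Impact Statement for the corridor project",
    "doc2": "Draft EIS comments received from the public",
    "doc3": "Right of way acquisition schedule",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(term, documents):
    """Return ids of documents containing the term as an exact word sequence."""
    wanted = tokenize(term)
    if not wanted:
        return []
    hits = []
    for doc_id, text in documents.items():
        words = tokenize(text)
        # Slide a window over the document and compare word for word.
        for i in range(len(words) - len(wanted) + 1):
            if words[i:i + len(wanted)] == wanted:
                hits.append(doc_id)
                break
    return sorted(hits)


spelled_out_hits = keyword_search("environmental impact statement", DOCUMENTS)
acronym_hits = keyword_search("EIS", DOCUMENTS)
print(spelled_out_hits, acronym_hits)
```

In this sketch the spelled-out search and the acronym search each return a different document, even though both documents concern the same subject. Neither searcher would know the other document exists.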

Are there better ways to find information?

Yes. A significant amount of effort has been put into making information findable, particularly in the fields of library science, computer science, and information science. Tools such as controlled vocabularies, taxonomies, and ontologies are either currently available or in development. Search tools are evolving and now include faceted searches, which allow the user to progressively narrow the search criteria, and semantic searches, which use not only the word as typed but also its meaning.
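As a rough illustration of faceted searching, the sketch below filters a small set of records one facet at a time. The records, the facet names (doc_type, region, year), and the function names are all assumptions made up for this example; a real system would draw facets from its own document properties.

```python
from collections import Counter

# Hypothetical records; titles and facet values are invented for this example.
RECORDS = [
    {"title": "Corridor EIS", "doc_type": "report", "region": "Northwest", "year": 2006},
    {"title": "Bridge inspection photos", "doc_type": "image", "region": "Northwest", "year": 2007},
    {"title": "Parcel acquisition memo", "doc_type": "memo", "region": "Eastern", "year": 2007},
    {"title": "Wetland mitigation report", "doc_type": "report", "region": "Eastern", "year": 2007},
]


def apply_facets(records, **selected):
    """Keep only the records whose facet values match every selected value."""
    return [
        record for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(record[facet] for record in records)


# Narrow step by step: first by document type, then by year.
reports = apply_facets(RECORDS, doc_type="report")
reports_2007 = apply_facets(reports, year=2007)
print([record["title"] for record in reports_2007])
print(facet_counts(RECORDS, "region"))
```

The counts per facet value are what a faceted interface typically shows beside each choice, so the user can see how much each selection would narrow the results before making it.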

This document doesn’t attempt to describe the possible improvements but we hope that you will be interested in reviewing the options available and ways to integrate them.

What can we do?

The WSDOT Library, Office of Information Technology and Communications Office have been working to develop strategies and tools to improve findability of information resources. We support the development of a common DOT strategy and language that allow us to find all types of information resources. Actions that need to be taken to get to that point are:

  1. Assess the current capabilities and limitations of the search tools in LiveLink and Stellent. Develop specific recommendations learned from other activities to augment the document management capabilities. Use knowledge gained through LiveLink and Stellent projects to shape improvements in search tools for other systems so the information needed for project development can be easily found. (For example, can we embed synonyms to help connect topical information?)
  2. Identify common properties needed to describe all types of files. This describes the physical properties of the information source and limited information about the content (such as the title). We can learn a lot from the current formats we use as well as from the common search terms that are used. This would help us break down silos of information types.
  3. Develop common properties and guidance to describe the content of files. This would allow us to find more related and relevant information. Again, we can learn from the current templates and search terms used.
  4. Identify related terms such as synonyms and cross references. Library systems are very good at finding information because they include a relational database that links synonyms and related terms. The biggest payoff for WSDOT in improving findability will come when we can link a behind-the-scenes database of related terms, so that searchers can continue to use the natural language they are comfortable with and still find the information they need.
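The idea in item 4 can be sketched as a search that consults a table of related terms before matching. Everything below is hypothetical: the term table, the documents, and the function names are invented for illustration, and a production system would more likely draw its terms from a maintained controlled vocabulary than from a hand-written dictionary.

```python
import re

# Hypothetical term table: each preferred term maps to its known variants.
RELATED_TERMS = {
    "environmental impact statement": ["eis"],
    "right of way": ["row", "real estate acquisition"],
}

# Hypothetical documents; ids and text are invented for this example.
DOCUMENTS = {
    "doc1": "Environmental Impact Statement for the corridor project",
    "doc2": "Draft EIS comments received from the public",
    "doc3": "ROW acquisition schedule",
}


def expand_query(term, related_terms):
    """Return the searcher's term together with every variant in its group."""
    key = term.lower().strip()
    for preferred, variants in related_terms.items():
        group = [preferred] + variants
        if key in group:
            return group
    return [key]  # unknown term: search for it as typed


def search_with_synonyms(term, documents, related_terms):
    """Return ids of documents that contain the term or any related variant."""
    hits = set()
    for variant in expand_query(term, related_terms):
        # Word boundaries keep "row" from matching inside a longer word.
        pattern = r"\b" + re.escape(variant) + r"\b"
        for doc_id, text in documents.items():
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.add(doc_id)
    return sorted(hits)


eis_hits = search_with_synonyms("EIS", DOCUMENTS, RELATED_TERMS)
row_hits = search_with_synonyms("right of way", DOCUMENTS, RELATED_TERMS)
print(eis_hits, row_hits)
```

The searcher types whichever form they are used to, and the expansion happens behind the scenes. The quality of the results then depends on how complete and well maintained the term table is, which is why a shared vocabulary matters.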

Related activities underway

  1. WSDOT Library document and the WSDOT Data Catalog use controlled vocabularies to describe their information resources. A pilot project was conducted to show the value of indexing with a controlled vocabulary to describe web pages. The results were positive.
  2. Templates have been developed to describe the content of web pages. Some offices have begun to use these templates.
  3. An inventory of image resources and indexing strategies is just beginning through a project funded by the Information & Finance Research Advisory Committee.
  4. WSDOT is a member of the TRB subcommittee on the Transportation Research Thesaurus (TRT), the controlled vocabulary for the transportation community. This helps us continue to develop this resource in a manner that is useful to us. The TRT also helps us connect to transportation information resources throughout the worldwide transportation community.
  5. Staff from the Office of Research and Library Services, the Office of Information Technology and the Communications Office are available to assist in the development of strategies for organizing and indexing information resources in order to improve the success of finding information.