IIST 433/533: Information Storage and Retrieval

Fall 2013

Dr. Ozlem Uzuner

Email:

Phone: 518-442-4687

Course meets: Mondays 1:15-4:05pm in HS004

Office hours: Mondays 12-1pm, Wednesdays 1-4pm, and by appointment

Office location: 114A Draper Hall

1.  Course Information

1.1.  Course Description

This course provides an introduction to current practices in information retrieval (IR). It is intended to prepare you to understand the underlying theories and algorithms of modern IR systems and to introduce the methodology for the design and evaluation of IR systems. Topics covered include fundamental concepts in information storage and retrieval, document representation, query languages and operations, matching mechanisms and formal retrieval models, output presentation, indexing and searching, user interfaces, and the evaluation of IR system effectiveness. In addition, we will investigate the inner workings of retrieval systems and search engines.

1.2.  Expected Outcomes

Students who successfully complete IIST 433/533 will have gained:

·  Knowledge of the variety and functionality of IR systems, and of the structures and techniques implemented in such systems;

·  Understanding of theories and models of IR, and of the principles of IR system design;

·  Skills in the critical analysis and evaluation of the performance of IR systems, and in the selection and use of systems that contribute effectively and efficiently to the satisfaction of information needs in specific contexts.

1.3.  Prerequisites and Technology Background

There are no formal course prerequisites. Familiarity with computers and some programming experience are highly desirable, but not necessary.

1.4.  Textbooks

Required: Croft WB, Metzler D, Strohman T. Search Engines: Information Retrieval in Practice. 1st Edition. Boston: Pearson Education, Addison-Wesley; 2010. ISBN-10: 0136072240, ISBN-13: 978-0136072249.

Optional: Manning CD, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge University Press. 2008. ISBN: 0521865719. Available online at: http://nlp.stanford.edu/IR-book/

1.5.  Course Website and Blackboard

Blackboard will be used to provide essential course materials, the most current syllabus, and assignment documents.

2.  Course Requirements

2.1.  Readings

Students are expected to read the assigned materials before each class. For specific readings, please see the semester schedule at the end of this syllabus.

2.2.  Exam

One midterm exam will be given. There is no final exam.

2.3.  Assignments

Homework assignments are given in the form of problem sets. Each problem set will include essay-type questions, questions designed to show understanding of specific concepts that may involve calculations, and hands-on exercises involving existing IR engines. Students should complete each assignment independently and hand in the work on time.

2.4.  Final Project

The final course project will be an extensive research paper on an area of IR. Students will give an oral presentation of their research findings to the class.

2.5.  Style Manuals and Guidelines

Reports should be word-processed and double-spaced. Students are required to cite any sources used in their written reports according to either the American Psychological Association (APA) or Turabian style manual. Choose only one style manual and use it throughout the report.

American Psychological Association. 2001. Publication manual of the American Psychological Association, 5th Edition. Washington, DC: American Psychological Association.

Turabian, Kate L. 2007. A manual for writers of term papers, theses, and dissertations. 7th Edition. Chicago: University of Chicago Press.

Both style manuals are available in the reference sections of many mainstream bookstores and reserve sections of University Libraries, including the Dewey Library.

3.  Student Performance Evaluation

The assignments, exam, and the project contribute to the final grade as follows:

Assignment / Percentage of Total Grade
Class Participation / 10%
Assignments / 25%
Exam / 30%
Final Project and Presentation / 35%
Total / 100%

Grades are determined on a 100-point scale. An A signifies superior understanding of the course material; a B signifies adequate work that meets most requirements of the course; a C or lower signifies inadequate work that does not meet the requirements.

Letter Grade / A / A- / B+ / B / B- / C+ / C / C- / D / E
Numeric Range / 95-100 / 90-94 / 85-89 / 80-84 / 75-79 / 70-74 / 65-69 / 60-64 / 50-59 / 0-49
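
As an illustration of how the weights and numeric ranges above combine, the following Python sketch computes a weighted total and looks up the corresponding letter grade. It is only a sketch under stated assumptions, not the instructor's official grading procedure: the component scores are hypothetical, each component is assumed to be scored on a 0-100 scale, and a fractional total is assumed to fall into the band whose lower bound it reaches, without rounding (the syllabus does not say how fractional totals are handled).

```python
# Weights as a percentage of the total grade, copied from the table above.
WEIGHTS = {
    "Class Participation": 10,
    "Assignments": 25,
    "Exam": 30,
    "Final Project and Presentation": 35,
}

# Lower bound of each numeric range in the letter-grade table, highest first.
# ASSUMPTION: a fractional total (e.g. 84.5) belongs to the band whose lower
# bound it reaches; the syllabus does not specify any rounding.
CUTOFFS = [
    (95, "A"), (90, "A-"), (85, "B+"), (80, "B"), (75, "B-"),
    (70, "C+"), (65, "C"), (60, "C-"), (50, "D"), (0, "E"),
]


def weighted_total(scores):
    """Combine per-component scores (each 0-100) into one 100-point total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Return the letter whose numeric range contains the given total."""
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    for lower_bound, letter in CUTOFFS:
        if total >= lower_bound:
            return letter


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    example_scores = {
        "Class Participation": 90,
        "Assignments": 85,
        "Exam": 80,
        "Final Project and Presentation": 88,
    }
    total = weighted_total(example_scores)
    print(total, letter_grade(total))
```

With these hypothetical scores, the total is (10×90 + 25×85 + 30×80 + 35×88) / 100 = 85.05, which falls in the 85-89 range.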

4.  Course Policies

4.1.  Class Attendance and Participation

Students are expected to be prompt and prepared for class and to participate in classroom and online discussions. Students are asked to notify the instructor in advance if they cannot attend class, must arrive late or leave early, expect to submit work late, or intend to withdraw from the course.

4.2.  Late Submissions

Late submissions will not be accepted without the express permission of the instructor. Students who submit their work late will lose half a letter grade per late day, at the discretion of the instructor. A late final project or exam will be penalized a full letter grade for each day late.

4.3.  Incompletes

A tentative grade of “I” for Incomplete is given only when the student has nearly completed the course but, due to circumstances beyond the student’s control, cannot complete the course on schedule. The student is responsible for contacting the instructor and requesting an “I” grade in advance of the semester end. The conditions of the “I” grade, including the timeline for the completion of the work, will be specified by the instructor. The “I” grade is automatically changed to “E” unless the work is completed as agreed between the student and the instructor.

4.4.  Academic Dishonesty

The instructor has a zero-tolerance policy for academic dishonesty, plagiarism, and cheating. Any such activity will be reported to the Office of Judicial Affairs according to the policies set forth in the current University at Albany Undergraduate Bulletin or University at Albany Graduate Bulletin, whichever is applicable to the student.

4.5.  Students with Disabilities

Reasonable accommodations will be provided for students with documented physical, sensory, systemic, cognitive, learning and psychiatric disabilities. If you believe you have a disability requiring accommodation in this class, please notify the Director of Disabled Student Services (Campus Center 137, 442-5490).

Lecture Topics and Reading Assignments

All reading assignments must be completed prior to the following week’s lectures. The lecture and reading lists are subject to change. "CMS" refers to the required textbook by Croft, Metzler, and Strohman.

1. Week of August 26 – Topics: Introductions, housekeeping.

2. Week of September 9 – Topics: Architecture of a Search Engine. Reading: CMS Chapters 1 and 2

3. Week of September 16 – Topics: Crawls and Feeds. Reading: CMS Chapter 3

4. Week of September 23 – Topics: Processing Text. Reading: CMS Chapter 4

5. Week of September 30 – Topics: Processing Text. Reading: CMS Chapter 4

6. Week of October 7 – Topics: Ranking with Indexes. Reading: CMS Chapter 5

7. Week of October 21 – Topics: Queries and Interfaces. Reading: CMS Chapter 6

8. Week of October 28 – Midterm. Reading: CMS Chapters 1-6

9. Week of November 4 – Topics: Retrieval Models. Reading: CMS Chapter 7

10. Week of November 11 – Topics: Evaluating Search Engines. Reading: CMS Chapter 8

11. Week of November 18 – TBD

12. Week of November 25 – Topics: Social Search. Reading: CMS Chapter 10

13. Week of December 2 – Presentation of final research projects

14. Week of December 9 – Presentation of final research projects
