Intelligent Web-search for Job Vacancies. Problem Definition and Bases of Decision

Mikhail Faybisovich

Workshop on Computer Science and Information Technologies CSIT’99, Moscow, Russia, 1999

Abstract

The article describes the significance of information for a job search and the technology that supports this process. The main precondition for the problem definition is that the search for vacancies relevant to user requirements is individual and iterative. To reduce the cyclicity of the process, a model for comparing user requests with advertisements has been developed. The model eliminates the differences between the terms used by job seekers and those used by advertisers. It also proposes a way to search for advertisements whose characteristics are close to, but not exactly the same as, those contained in the request. The article also describes the functions of the research version of the software being developed.

Key words: intellectual search, vacancies information, cyclicity, search image, communicability, terms comparison, expert knowledge.

1. Introduction

Almost everyone faces the problem of searching for a job. The search results depend on how well informed people are, regardless of whether they are looking for a job for the first time or changing jobs for any reason. The results and costs of a job search, as well as the job eventually found, are mainly determined by the applicant's ability to obtain, in time, information on the vacancies that correspond most closely to his or her requirements. Duties, salary, career prospects, social and psychological environment, work schedule and others can be considered basic characteristics that depend on individual preferences.

The importance of up-to-date information on job opportunities is constantly increasing. Firstly, the number of job characteristics grows as activities in the majority of professions become more complex and as business organisation, technologies, equipment and other components of the job environment develop in many forms. Secondly, the globalisation of economic processes and changes in living conditions promote labour mobility. The range of job alternatives changes constantly, and the number and variety of characteristics being analysed (economic, social, cultural etc.) increases too. Thirdly, information about job opportunities interests a widening range of people who take an active attitude to their living conditions. Such people try to assess their professional position and prospects using information on job vacancies, regardless of whether or not they need a new job. According to some personnel management experts [1,2], periodic analysis of information on job opportunities and on supply and demand in various labour market sectors is a requirement for adequate professional behaviour. That is, exact (full) information in this field is considered an important condition for realising a person's potential. So we may state that the demand for individual service in the job seeking process is constantly increasing. Satisfying this demand with the available software is often hampered by considerable time and material costs. The author could not find any software that provides an automated solution to this problem.

2. Search problem statement


The aspects of the demand for information discussed above are components of individual job search technologies. Considering them together and comparing them systematically makes it possible to form a set of requirements for developing PC software that obtains information on job opportunities. Expressed in terms of information technology, these requirements define specific functions of software for searching text documents by content, namely:

1. Wide range of information sources, i.e. the software should work with the accepted forms of advertisement presentation. Traditionally, most advertisements are found on the pages of specialised periodicals or in special columns of professional, industrial or other media. The development of the Internet induces many publishers to create and maintain electronic copies of their periodicals. Besides, many personnel agencies, employment centres and others have their own servers and sites on the Web. Usually each publisher establishes its own rules for placing and laying out advertisements. Publishers that handle a great quantity of advertisements often classify them by duties (for example, “work with PC”), by industry (banking) or by personnel status (exclusive vacancies). There are many rules for arranging advertisements, structuring their content and so on.

It is important for information search that the majority of advertisement sources are visually oriented, that is, intended to be read by eye. Reading through them is a fairly long procedure that requires concentrated attention, especially when a great number of advertisements has to be handled.

2. Timeliness of the information provided to users. New job proposals appear every day. Depending on the circumstances, they become obsolete within days, weeks or, at the very most, months. Therefore, the time needed to collect and present such information must not exceed a few hours.

Fulfilling this requirement together with requirement 1 predetermines a network search, i.e. the search should be made through Internet servers.

3. General-purpose orientation. The software being developed should let anyone formulate individual requirements for job content and conditions, whatever their profession or professional language, so that the search can be executed successfully.

In real life, the user's notion of which criteria values are acceptable, as well as the choice of the criteria used to assess each job opportunity, depends substantially on the job proposals actually available. Therefore, the search for job vacancy information is generally carried out iteratively. This raises the question of ensuring that the process converges, i.e. that it successively approaches job offers containing proposals acceptable to the user.

Having analysed the requirements discussed above, we can evaluate the characteristics of job vacancy information software from the standpoint of the user's interaction with the process of presenting job proposal information. As presented in [3], the efficiency of the interaction can be rated by how consistently two fundamentally connected properties of the interaction are provided: cyclicity and communicability.

Cyclicity represents the resources demanded by interaction processes. As stated above, the basic component of the cyclicity of our processes is determined by the properties of the search iterations. It is essential to distinguish between proper cyclicity and excessive cyclicity. Proper cyclicity eliminates the discrepancy between the user's notions and the characteristics of actual job proposals. The proper cyclicity component grows if the user sets the characteristics of the search image too broadly or too narrowly compared with the proposals that are actually acceptable. An example of a broad search is a request that contains only a minimum salary. In response to the search results, the user will presumably add further conditions to the request (if enough acceptable proposals are found) or reduce the parameter value to bring it closer to the real one. If the search image is set narrowly (for example, “database programming in Borland Delphi 3”), the user obtains a specific result and may then wish to learn about programming proposals involving other systems of the Inprise (Borland) company.

We associate excessive cyclicity with a lack of software capabilities: difficulty in forming the search image, presentation of inadequate information etc. Excessive cyclicity can be reduced by developing the communicability of the interaction, i.e. the variety of the communication process. In our opinion, communicability generally grows as the interaction software is developed. Within the accepted problem statement, computer technology for job information search gives the interaction a structure of two sequential procedures: formation of the search image, and automatic selection of the advertisements that are adequate to the search image.

The procedure of search image formation is based on the user's interaction with the PC, in which the user presents an internal notion of professional requirements as a verbal request. The PC component has an auxiliary communicative function that provides the user with various interface possibilities. By stimulating the user's intellectual activity, special means of the procedure can, for example, support selecting a subset of significant characteristics from the multitude of criteria, prescribing limiting and variable values of parameters, accessing the vocabulary of terms used in advertisements, entering criterion parameters into the request and setting their default values, forming on-line a description of the knowledge and skills the user wishes to practise, etc. The second procedure is executed by the computer, which is why it is the first-priority object of our development for reducing excessive cyclicity. Based on this approach, the author proposes to use a model of intellectual correspondence between request and advertisement terms for developing and executing the process. So, the solution of the problem formulated above is connected with making the information search more intelligent.

3. Requests and advertisements comparison model

Let us consider some issues concerning the correctness of comparing the search image (the request content) formulated by the user with the vacancy descriptions contained in advertisements. It was noted above that one of the difficulties in selecting the advertisements that correspond to the request is the difference between the terms used to describe requirements by the user and by the employer. By “term” we mean a word or phrase that defines the meaning of a job characteristic. So, the difficulty mentioned above is that of synonym recognition. A more general problem is the search for advertisements whose job characteristics are close to the requested ones but not identical to them. The concept of “closeness” should be considered from the standpoint of the application.

Having considered the above problems, the author has concluded that the software should contain means for forming and interpreting expert knowledge about the correspondence, or degree of correspondence, of the terms used to describe job conditions. These means, based on the model for collating (recognising) search images and advertisements presented below, should make it possible to establish that the terms “shovelman” and “digger operator” are identical, or to recognise that the duties of a “secretary” and an “office manager” are close. We propose a formal description of the model because we think such a presentation helps to consider and control the engineering, structural and information decisions adequately; it also makes it possible to show the accepted assumptions and heuristic rules established for modelling the comparison.

Let the request language K be given as a set of structures R on a finite set of terms  (the vocabulary of request terms), i.e. , and the advertisement language P as a set of structures B on a set of terms  (the vocabulary of advertisement terms), i.e. . By a term we mean a word or a phrase in any grammatical form. The task of searching for acceptable advertisements may be presented as a function that maps a request and an advertisement to a value $l \in \mathbb{R}$, where $l$ is an estimate of how well the advertisement corresponds to the request and $\mathbb{R}$ is the set of real numbers. We assume that an advertisement corresponds to the request if $l \ge l_{\text{critical}}$, where $l_{\text{critical}}$ is the threshold estimate.

Let us define a request as a set-theoretic expression over the set of request-language terms: . Let us try to present the estimation function as the application of a function to the results of estimating the advertisement against each of the request terms:

One solution to the problem is to construct the function Q by finding, for each operation, an equivalent on the set of real numbers $\mathbb{R}$. For example, as follows:

Let us represent an advertisement by the set of terms it contains. Now, if we can select a function T such that

<1>,

then the problem of comparing a request with an advertisement reduces to comparing each term of the request with each term of the advertisement. Having introduced a common vocabulary V for the two languages, we can see that the task of comparing request and advertisement terms is the task of defining a mapping L that assigns a real number to a pair of terms from V. As the size of the vocabulary V is limited, the mapping L may be preset as a directed graph with weighted arcs. The arc weights are assigned on the basis of expert estimations.
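As an illustration only, the reduction described above can be sketched in Python. The sketch relies on assumptions that the text does not state: arc weights lie in [0, 1], the intersection, union and complement of a request expression are replaced by min, max and 1 - x on real numbers, and a request term is scored against an advertisement by its best direct arc to any advertisement term. All names (`WEIGHTS`, `term_score`, `evaluate`, `L_CRITICAL`) and all weight values are hypothetical.

```python
# Hypothetical expert-assigned arc weights:
# (request term, advertisement term) -> weight in [0, 1].
WEIGHTS = {
    ("secretary", "office manager"): 0.7,
    ("shovelman", "digger operator"): 1.0,
}

L_CRITICAL = 0.5  # assumed threshold estimate


def term_weight(request_term, ad_term):
    """Correspondence of two terms; identical terms correspond fully."""
    if request_term == ad_term:
        return 1.0
    return WEIGHTS.get((request_term, ad_term), 0.0)


def term_score(request_term, ad_terms):
    """Score one request term against an advertisement given as a set of terms."""
    return max((term_weight(request_term, t) for t in ad_terms), default=0.0)


def evaluate(request, ad_terms):
    """Evaluate a request against an advertisement.

    A request is a plain term or a nested tuple:
    ("and", a, b, ...), ("or", a, b, ...) or ("not", a).
    Each set operation is replaced by an assumed real-valued equivalent.
    """
    if isinstance(request, str):
        return term_score(request, ad_terms)
    op, *args = request
    values = [evaluate(arg, ad_terms) for arg in args]
    if op == "and":
        return min(values)
    if op == "or":
        return max(values)
    if op == "not":
        return 1.0 - values[0]
    raise ValueError(f"unknown operation: {op}")


ad = {"office manager", "english", "moscow"}
request = ("and", "secretary", ("or", "english", "german"))
score = evaluate(request, ad)
matches = score >= L_CRITICAL
```

Other real-valued equivalents (for example products and probabilistic sums) would fit the same structure; min and max are used here only because they are the simplest choice.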

Now we can select the function T by inspection. We propose the following heuristic procedure:

  1. Let’s define , where .

If we substitute , we have the following results for formula <1>:

<2>.

  1. Let’s define
  2. Let’s consider .

Gmax denotes the path in the graph that is maximal among all paths .

Figure 1. Graph sample.

For example on the figure 1 Gmax corresponds path.
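A possible way to find such a maximal path is sketched below. The text does not say how the weight of a path is obtained from its arcs; the sketch assumes that arc weights lie in (0, 1] and that the weight of a path is the product of its arc weights, so that longer chains of "close" terms score lower. Under that assumption a Dijkstra-style search with a priority queue is sufficient. The graph, the weights and the function name `best_path` are hypothetical.

```python
import heapq

# Hypothetical graph: term -> {neighbouring term: expert weight in (0, 1]}.
GRAPH = {
    "secretary": {"office manager": 0.7, "assistant": 0.9},
    "assistant": {"office manager": 0.9},
    "office manager": {},
}


def best_path(graph, source, target):
    """Return (weight, path) of the maximum-product path, or (0.0, []) if none exists."""
    # heapq is a min-heap, so weights are stored negated;
    # each entry also carries the path taken so far.
    heap = [(-1.0, source, [source])]
    done = set()
    while heap:
        neg_weight, node, path = heapq.heappop(heap)
        if node == target:
            return -neg_weight, path
        if node in done:
            continue
        done.add(node)
        for neighbour, w in graph.get(node, {}).items():
            if neighbour not in done:
                heapq.heappush(heap, (neg_weight * w, neighbour, path + [neighbour]))
    return 0.0, []


weight, path = best_path(GRAPH, "secretary", "office manager")
```

In this example the indirect path through "assistant" is preferred to the direct arc, because its product of weights is larger than the weight of the direct arc.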

  1. Let’s pass from T to T’:

It is proposed to preset , we have:

<3>

Note that neither the vocabulary of advertisements nor that of search images can be considered fixed or known a priori. The author regards revealing the content of the terms included in these vocabularies, and determining the rules for expanding them, as problems for future work. Some principles for solving these problems have already been stated:

  • Completeness of advertisement representation. The terms included in the vocabulary make it possible to select all advertisements adequate to the request domain.
  • Distinguishability of advertisements. Indexing by the set of terms included in the vocabulary makes it possible to distinguish all advertisements that are adequate to the considered request domain and differ in content from the user's point of view.
  • Term activity. The vocabularies contain only terms that are actually used to describe advertisements.

The author considers the above model a basis for further research and experiments. The following problems are to be analysed:

  • Influence of the frequency with which a term is used in an advertisement on the correspondence of the advertisement to the request.
  • Problems of synonymy and homonymy.
  • Relative importance of the appearance of various terms in an advertisement.
  • Various types of relations between terms in an advertisement.

4. Content of the research version of the search software

The following components of the vacancy information search software are proposed for development as a top-priority task (for the research version):

  • Subsystem of advertisements loading from Internet.
  • Subsystem of access to advertisements.
  • Subsystem of request creation.
  • Subsystem of request execution.

These subsystems were selected because they form the base needed for the search to operate, and so they make it possible to collect all the required experimental data (the content of advertisements and user requests, characteristics of search iterations etc.) and, on the basis of the information obtained, to revise and develop the requirements for the subsystems' components and functions.

The subsystem for loading advertisements from the Internet forms and updates the advertisements database by loading advertisements from Internet servers and saving them in the internal format of the database. Since loading advertisements from the Internet is generally time-consuming (up to a few hours over a dial-up connection), this subsystem should provide the following:

  • resilience to connection breaks and interruption of the loading process, i.e. a break of the Internet connection must not cause loss of or damage to loaded information, or make it necessary to load it again.
  • controllability, i.e. the user can control the loading process.
  • adaptation to the characteristics of communication with servers, i.e. loading can continue effectively from other servers if one server has problems or communicates slowly.
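The first requirement can be illustrated by a minimal Python sketch, which is not the author's implementation: each advertisement is saved as soon as it has been loaded, so that an interrupted session can later be resumed without repeating completed work. The function `load_with_resume`, the fetcher `flaky_fetch` and the use of a dictionary as storage are hypothetical stand-ins.

```python
def load_with_resume(ad_ids, fetch, stored):
    """Fetch only advertisements not stored yet.

    Returns True if everything was loaded and False if the connection broke;
    in both cases `stored` keeps everything loaded so far.
    """
    for ad_id in ad_ids:
        if ad_id in stored:
            continue  # already loaded in an earlier session
        try:
            stored[ad_id] = fetch(ad_id)
        except ConnectionError:
            return False
    return True


calls = []


def flaky_fetch(ad_id):
    """Hypothetical fetcher that drops the connection on the first attempt at ad 3."""
    calls.append(ad_id)
    if ad_id == 3 and calls.count(3) == 1:
        raise ConnectionError("connection lost")
    return f"text of advertisement {ad_id}"


stored = {}
first = load_with_resume([1, 2, 3, 4], flaky_fetch, stored)   # interrupted at ad 3
second = load_with_resume([1, 2, 3, 4], flaky_fetch, stored)  # resumes from ad 3
```

After the second call all four advertisements are stored, and advertisements 1 and 2 have been fetched only once.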

Information is loaded when the system starts (if scheduled updating is selected) or in response to a direct user command. By default, loading is performed from those servers for which the time since the last update exceeds the updating period set for the source. Before loading, the subsystem asks the user to select the servers to be used as advertisement sources.
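The default selection rule can be expressed in a few lines of Python. This is a sketch under assumed data structures; the `Source` record, its field names and the sample servers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Source:
    name: str                 # hypothetical server label
    last_updated: datetime    # time of the last successful load
    update_period: timedelta  # updating period set for this source


def due_sources(sources, now):
    """Sources whose time since the last update exceeds their updating period."""
    return [s for s in sources if now - s.last_updated > s.update_period]


now = datetime(1999, 1, 10, 12, 0)
sources = [
    Source("server-a", datetime(1999, 1, 10, 9, 0), timedelta(hours=2)),
    Source("server-b", datetime(1999, 1, 10, 11, 0), timedelta(hours=6)),
]
default_selection = [s.name for s in due_sources(sources, now)]
```

Here only "server-a" is offered by default, since three hours have passed against a two-hour period; the user could still add or remove servers before loading starts.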

To meet the requirement of adaptation to the speed of communication with the servers, loading is performed from several servers simultaneously (otherwise a low loading speed from one server can hold up loading from the others). However, parallel loading from all servers can reduce the total loading speed, so the following solution is proposed: loading is performed simultaneously from no more than N servers (according to expert estimation, the optimal value of N lies in the range from 2 to 6), while the others wait. Once loading from one server has been completed, loading from a new server starts. The user can interrupt the loading at any moment without losing the information already loaded.
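The scheme of at most N simultaneous loads can be sketched with a thread pool from the Python standard library. This is only one possible realisation; the download itself is simulated by a short pause, and the server names, the value N = 3 and the `stats` counter are assumptions made for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

N = 3  # assumed limit; the text suggests a value between 2 and 6

stats = {"active": 0, "peak": 0}  # counts overlapping loads, for illustration
lock = threading.Lock()


def load_from_server(name):
    """Stand-in for a real download; it only records how many loads overlap."""
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)  # simulated transfer time
    with lock:
        stats["active"] -= 1
    return name


servers = [f"server-{i}" for i in range(8)]  # hypothetical server names
loaded = []
# The pool runs at most N loads at once; waiting servers start as slots free up.
with ThreadPoolExecutor(max_workers=N) as pool:
    futures = [pool.submit(load_from_server, s) for s in servers]
    for future in as_completed(futures):
        loaded.append(future.result())
```

A slow server then occupies only one of the N slots, so the remaining slots keep serving the other servers.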

Advertisements are loaded from each server as follows: first all advertisements immediately available on the server are loaded, then new advertisements are searched for and loaded. To load advertisements from a given server, the system uses algorithms and data structures that take into account the specific features of the internal structure of that server's advertisements. The advertisement texts are indexed for conversion into the storage format of the advertisements database. To compare terms on the basis of the above model, the subsystem searches the documents for the selected characteristics of the advertisement and for the vocabulary terms used in it. The internal image created can be conventionally divided into four structural components:

  • Contextual information (dates of issue and receipt of the advertisement, source etc.).
  • Text.
  • Found characteristics.
  • Terms found in the advertisement.
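The four components can be pictured as a single record. The Python sketch below is an assumed layout, not the internal format actually used; the class `AdvertisementImage`, its field names and the naive substring indexing in `index_terms` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AdvertisementImage:
    # Contextual information
    source: str
    issued: date
    received: date
    # Full advertisement text
    text: str
    # Found characteristics, e.g. a salary value; the keys are hypothetical
    characteristics: dict = field(default_factory=dict)
    # Vocabulary terms found in the advertisement
    terms: set = field(default_factory=set)


def index_terms(text, vocabulary):
    """Naive indexing: keep vocabulary terms that occur as substrings of the text."""
    lowered = text.lower()
    return {term for term in vocabulary if term in lowered}


vocabulary = {"office manager", "secretary", "delphi"}
text = "Office manager wanted, full time."
image = AdvertisementImage(
    source="example-server",
    issued=date(1999, 1, 8),
    received=date(1999, 1, 10),
    text=text,
    terms=index_terms(text, vocabulary),
)
```

Real indexing would also have to deal with the different grammatical forms of a term, which substring matching does not handle.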

The subsystem of access to advertisements provides input, storage and reading of the advertisements loaded from the Internet. This subsystem is needed because, as stated above, loading advertisements from the Internet takes considerable time, which would slow down on-line processing and comparison of requests. So, to provide on-line interaction, advertisements are stored locally on the user's PC in what the author calls the advertisements database. To execute a request, the software works with the local copies of advertisements from the database; this provides high speed and flexibility of the comparison procedure and lets the user update the stock of advertisements at a convenient time.
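A local store of this kind could be sketched as follows. The text does not name a storage engine; SQLite, accessed through Python's standard `sqlite3` module, is used here purely as an assumption, and the table layout and function names are hypothetical.

```python
import sqlite3

# An in-memory database keeps the example self-contained;
# a file path would be used for a persistent local store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE advertisements ("
    " id INTEGER PRIMARY KEY, source TEXT, received TEXT, text TEXT)"
)
conn.execute("CREATE TABLE ad_terms (ad_id INTEGER, term TEXT)")


def store(source, received, text, terms):
    """Save one advertisement together with the vocabulary terms found in it."""
    cur = conn.execute(
        "INSERT INTO advertisements (source, received, text) VALUES (?, ?, ?)",
        (source, received, text),
    )
    ad_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO ad_terms (ad_id, term) VALUES (?, ?)",
        [(ad_id, term) for term in terms],
    )
    conn.commit()
    return ad_id


def ads_with_term(term):
    """Read the texts of locally stored advertisements that contain a term."""
    rows = conn.execute(
        "SELECT a.text FROM advertisements a"
        " JOIN ad_terms t ON t.ad_id = a.id WHERE t.term = ?",
        (term,),
    )
    return [row[0] for row in rows]


ad_id = store("example-server", "1999-01-10", "Office manager wanted.", ["office manager"])
found = ads_with_term("office manager")
```

Because requests are executed against the local copies only, their speed does not depend on the state of the Internet connection.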

The subsystem of request creation interactively supports the process of creating a request that describes the vacancies of interest to the user. The user can create a new request from scratch (“on a blank sheet”), edit a previously prepared one, or create a request by applying logical operations to existing requests.

Analysis of a representative set of advertisements, conducted to find typical features for recognising the content of job proposals, makes it possible to identify two types of characteristics.