The Process of Atomization

of Business Tasks

for Crowdsourcing

Tim Olsen & Erran Carmel

“Nothing is particularly hard if you divide it into small jobs.” – Henry Ford

Atomization of work portends not only the disruption of business process outsourcing but also the disruption of global labor markets. Small, atomized tasks that can be completed and paid for in seconds are unprecedented in the history of work. In this paper we build on current industry practices to theorize about the process of task atomization.

While terms similar to atomization have been in circulation in recent years, atomization of work is a term that was introduced in industry; our first encounter with it was at the Crowdopolis 2012 trade show. Atomization does not have a formal definition, so at the outset we posit our initial definition:

The separation of labor into its smallest basic elements so that each element can be performed by a human being, in front of a screen, in a short amount of time and be paid for by the employer.

Atomization of work occurs in the crowdsourcing subset that is termed microtasking. The common microtasks are: content creation, content moderation, categorization, product matching, search relevancy, transcription, and translation. Atomization is particularly applicable on microtasking crowdsourcing platforms such as Microtask.com, Amazon Mechanical Turk, Mobenzi, Crowdflower, and Samasource.

We introduce a few “use cases” to illustrate. Use Case #1 is the prototypical atomic task on a microtasking site: the worker is requested to make a quick categorization of a photo. The worker performs the task in a few seconds while sitting at home. The worker may be paid 1 to 5 cents (0.01 to 0.05 USD).

Use Case #2 is a more complex atomization (as described in Malone 2011[1]) in which workers were tasked to write encyclopedia articles using Crowdforge, a system that draws on the crowd of workers available via Mechanical Turk. First, workers were asked to write outlines for the articles. Then others were instructed to collect facts for different sections of the outlines, while still others used those facts to write paragraphs for each section. Finally, the system automatically concatenated the paragraphs according to the original outlines. Note that the skills required here are substantially greater than in Use Case #1.
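The final, automated step of this workflow is simple to sketch. The Python below is an illustrative reconstruction of our own (not Crowdforge's actual code): paragraphs written by different workers are concatenated in the order given by the original outline.

```python
def assemble_article(outline, paragraphs):
    """Concatenate crowd-written paragraphs in outline order.

    outline: ordered list of section names.
    paragraphs: dict mapping each section name to its worker-written text.
    """
    return "\n\n".join(paragraphs[section] for section in outline)

# Hypothetical example data standing in for crowd output.
outline = ["History", "Geography", "Economy"]
paragraphs = {
    "History": "Founded in the nineteenth century ...",
    "Geography": "Located on a river delta ...",
    "Economy": "The main industries are ...",
}
article = assemble_article(outline, paragraphs)
```

The division of labor happens upstream; the merge step itself can be fully mechanical, which is what makes this workflow practical at scale.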

Use Case #3 is an enterprise content management solution developed by Top Image Systems for Deutsche Post. Each form was automatically atomized for data capture. The worker evaluated the atomic segment of unstructured handwritten content and then typed it into the system. In this particular case, 36 million forms containing personal data were captured in only 6 weeks.

According to the “Definitive Guide to Microtasking” (2012) written by a consultancy called Daily Crowdsource, in 2011 the number of active crowdworkers was 917,269, while about 6 times that many had experimented with microtasking.

The Microtask Industry

In many ways Amazon founded this industry when it set up Mechanical Turk (mTurk) in 2005 to find duplicates among its web pages describing products. Soon, other small tasks that computers could not perform were listed on the site, such as merging massive product libraries from different vendors. Because of vendor-specific nuances, computer algorithms could not always discern which products were identical, and thus “human computation” was necessary.

Since the founding of mTurk, microtasking platforms have sprinted ahead without developing theoretical foundations for work atomization.

Today many organizations are trying to identify business processes or individual tasks amenable to sourcing through internal or external crowds. Once a specific project has been identified (and assuming the crowdsourcing platform has been chosen), the process of atomizing tasks into several smaller sub-tasks begins. The microsourcing platforms are rather vague about how this takes place, but we know that the process is a human intellectual one and not computer-generated. Once the atomization is designed, APIs are built into the platform and other systems. The workers perform the tasks. Once completed, the sub-tasks are assembled together to form a finished product.

We now describe several platforms in the micro-sourcing industry. It is important to visit these platforms and get a sense of how they describe their process of atomization. The content below is derived from the respective websites of these key microtasking platforms in the 2012-2013 period.

Crowdflower. Tasks can be generated, posted, and aggregated through the Crowdflower API. Clients upload a spreadsheet of data, and each row is turned into a unit for a particular task. Crowdflower’s data experts then set the parameters of the tasks (i.e., units and gold-standard quality-control questions) and post the tasks to workers. Tasks are aggregated using CML (Crowdflower Markup Language). CML, like HTML, is made up of a set of helper tags, which makes it quick to define forms that collect information from the labor pool. Payment for a task depends on the skill and time required, varying from 0.01 USD to 1 USD.
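The row-to-unit step is easy to illustrate. The Python sketch below is our own (not Crowdflower code) and shows the general idea: each row of an uploaded spreadsheet becomes one atomic task unit that can be shown to a worker.

```python
import csv
import io

def rows_to_units(csv_text):
    """Turn each spreadsheet row into one task unit,
    mirroring the row-per-unit convention described above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

# Hypothetical client data: two product rows become two atomic tasks.
data = "product_id,description\n101,Red running shoe\n102,Blue sandal\n"
units = rows_to_units(data)
```

Each resulting dict is the payload for one microtask; quality-control (gold) rows would be mixed into the same stream of units.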

Microtask uses its proprietary software platform to split projects (usually handwriting recognition and translation) into tiny pieces, such as single words or dates; for example, Microtask atomizes a form so that each question on the form becomes a single task, then distributes the atomic tasks to workers. These tiny tasks take only 1 to 2 seconds to complete. Microtask keeps workers focused on a single screen, with no need to surf the internet for additional information, so the skill requirement is minimal. For quality control, crowd answers are verified by comparison: multiple workers solve the same puzzle and the system compares their answers. If there is no consensus, the segment is sent out again to a larger group of workers until a more definitive answer emerges.
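The consensus check described above can be sketched in a few lines. The following is an illustrative majority-vote implementation under our own assumptions (the agreement threshold is hypothetical), not Microtask's actual system:

```python
from collections import Counter

def verify_by_consensus(answers, threshold=0.6):
    """Return the consensus answer if agreement meets the threshold;
    return None to signal the segment should be redistributed
    to a larger group of workers."""
    top, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= threshold:
        return top
    return None

verify_by_consensus(["1914", "1914", "1941"])  # consensus reached: "1914"
verify_by_consensus(["cat", "dog", "cow"])     # no consensus: None
```

In practice a platform would also weight votes by each worker's historical accuracy rather than counting them equally.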

CloudFactory. Inspired by Henry Ford’s assembly line, which broke automobile production into 84 discrete steps performed by unskilled workers, CloudFactory created a system that allows organizations to create their own virtual assembly lines for digital production. With a strong social mission, CloudFactory has developed its own workforce of workers in Nepal and Kenya.

Samasource is also a platform with a social mission. The SamaHub platform provides a project setup interface that allows Samasource to break down large projects into efficient tasks, which are then distributed to workers around the world. Usually an account manager logs into SamaHub, designs the projects, and loads clients’ data (usually a spreadsheet) into SamaHub. Then the atomic tasks are automatically distributed to workers. The process of atomization is not described.

Serv.io. According to case studies on the Servio website, a Serv.io manager divides a big project into several stages (e.g., research, research review, writing and citing, editing, editorial review). Tasks finished in earlier stages are checked and processed in later stages, and qualifying output is combined into the final solution.

Mobileworks prices each task automatically according to the time and skill required. The lowest payment might be a few cents, while some tasks carry rewards of more than a dollar. Mobileworks has developed a specialized service which uses “crowd managers” to break down tasks. The (atomized) tasks are then fed into an automated routing system. The system directs those tasks to the workers best qualified to perform them, drawing on information about individuals’ specialties and past experience.

A simple task, such as pricing flight options for an upcoming trip, could be tackled by one person. More complex tasks, such as compiling a report on the state of the tobacco industry, are broken up and handed to several crowdworkers. Producing such a report might mean recruiting several people to scour the Web for the most relevant information sources, several others to summarize the results succinctly, more workers to compile those into a single report, and a final group to proofread the document.

The Process of Task Atomization

In this section we present a framework for understanding how tasks traditionally performed by one individual can be atomized into several tasks to be performed by the crowd. We divide the atomization life cycle into three high level phases (see Figure 1).

Figure 1: The Atomization Life Cycle

Evaluation Phase

Processes, projects, and individual tasks within larger processes are evaluated to assess fit for crowdsourcing. Organizations considering crowdsourcing are doing so as an alternative to hiring employees or to using (more expensive) traditional sourcing and part-time contracting options. During this stage it is essential for managers to determine whether the job-to-be-done is a one-time project or a repeatable process.

Popular candidates for microtasking include large, data-intensive projects that are highly granular, such as labeling photos, de-duplicating data, and transcribing handwriting. Dozens of companies have crowdsourced and atomized large projects, but atomizing continuously repeating business processes is still unusual. Also still rare are real-time applications, such as on-going mail processing systems. But we contend that just as large custom projects gave way to high-volume processes in the outsourcing industry 20 years ago, cross-currents in today’s crowdsourcing industry portend the same transition.

Although a task can be atomized, it may not be valuable to do so. This depends largely on two factors: project size and worker expertise. If the project is small, atomizing it may take more time than having a single worker complete it. If worker expertise is high, as with a translator of Portuguese medical documents, atomization may discourage and frustrate the expert, who may need the context of the entire document to produce a correct translation.

Task Design

Iteration is often used in task design. Atomization (decomposition) is difficult to design because the nature of each task differs in subtle ways and because design methods for task atomization are not well understood. The common approach in industry is therefore to atomize tasks, have workers complete a small batch of them, see where problems arise, and iteratively improve the task design until the problems disappear.

In this section we summarize the main task types on microtasking platforms. Our analysis of the work listed on microtask platforms points to two main task types: parallel tasks and sequential workflow tasks.

Parallel Tasks

Parallel tasks comprise the bulk of tasks on microtask platforms. Large projects often take a large corpus of data from a database and divide it by row to create thousands of tasks. Examples include: finding duplicate product descriptions, labeling photos or videos, writing short sentences, and performing sentiment analysis. Mechanical Turk was initially created to perform these task types.

Parallel tasks enable work to be done by thousands of workers simultaneously, so huge projects can be completed in a very short time. Zappos, the U.S.-based shoe website, was able to proofread five million product reviews for spelling and grammar errors in a few weeks using parallel tasks.

The most important design criterion for a parallel task is the ease of verifying work accuracy. There are multiple ways of verifying work, such as using gold standards: known answers against which accuracy can be assessed. Machine learning can also be used to estimate the probability of worker accuracy based on prior performance on similar tasks.
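A gold-standard check is simple to sketch. The function below is an illustrative example of our own (the task identifiers and field names are hypothetical): it scores a worker's responses against tasks whose answers are already known.

```python
def gold_accuracy(responses, gold):
    """Score a worker against gold units (tasks with known answers).

    responses, gold: dicts mapping task_id -> answer.
    Only task ids present in both are scored; returns None if no
    gold task was answered.
    """
    scored = [tid for tid in gold if tid in responses]
    if not scored:
        return None
    correct = sum(responses[tid] == gold[tid] for tid in scored)
    return correct / len(scored)

gold = {"t1": "shoe", "t2": "sandal"}          # planted known answers
worker = {"t1": "shoe", "t2": "boot", "t3": "hat"}
gold_accuracy(worker, gold)  # 1 of 2 gold tasks correct -> 0.5
```

Platforms typically interleave such gold units invisibly among regular tasks, so workers cannot tell which answers are being graded.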

Sequential Workflow Tasks

Sequential tasks are performed in a workflow, much like an assembly line. Complex work is accomplished through a series of dividing and merging steps: each task’s output is verified or voted on by other workers, and only the best work is merged into subsequent tasks. Castingwords.com uses this type of workflow, dividing large audio files into small segments and then paying workers to verify or improve segments transcribed by other workers. Zappos has employed the “find, fix, verify” method of having separate workers find an error, fix the error, and verify the modification.
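The “find, fix, verify” pattern can be sketched as a small pipeline in which each stage stands in for a separate worker pool. The stage functions below are simplistic stand-ins of our own devising (here, collapsing a doubled word), not Zappos's actual process:

```python
def find_fix_verify(document, find, fix, verify):
    """Sketch of 'find, fix, verify': one stage flags errors, a second
    proposes a fix, and a third independently accepts or rejects the
    fix before it is applied to the document."""
    for error in find(document):
        proposed = fix(document, error)
        if verify(document, proposed):
            document = proposed
    return document

# Toy stand-ins for the three worker pools: fix a doubled word.
find = lambda doc: [w for w in set(doc.split()) if doc.count(f"{w} {w}")]
fix = lambda doc, w: doc.replace(f"{w} {w}", w)
verify = lambda old, new: len(new) < len(old)

find_fix_verify("the the shoe fits", find, fix, verify)  # "the shoe fits"
```

The value of the pattern is that no single worker both proposes and approves a change, which keeps any one worker's error from propagating unchecked.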

Integration and combination of prior tasks is necessary to allow complex work to be performed. If one task in a workflow has poor design, quality in the entire workflow will suffer as a result. Figure 2 summarizes the design principles and nature of microtask types discussed in this section.

Figure 2: Nature and Design Principles of Parallel and Sequential Microtasks

Integration

Crowdsourced work can be directly integrated into existing work systems such as Enterprise Content Management (ECM) systems and Customer Relationship Management (CRM) systems. Vendors provide solutions that allow organizations to link and intertwine their internal processes and workflows directly with work performed by the crowd. This is possible through Application Programming Interfaces (API) developed by the crowd platforms. Vendors have developed middleware systems which allow work to be routed back and forth between internal enterprise systems and the crowd.
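A minimal sketch of such middleware, with purely illustrative names and an in-memory store standing in for the platform's actual HTTP API, might look like this:

```python
class CrowdMiddleware:
    """Illustrative middleware that routes records from an internal
    system out to a crowd platform and merges results back. All names
    are hypothetical; real vendors expose this via platform APIs."""

    def __init__(self):
        self.pending = {}   # task_id -> original internal record
        self.results = {}   # task_id -> answer returned by the crowd

    def send(self, task_id, record):
        """Hand an internal record off to the crowd platform."""
        self.pending[task_id] = record

    def receive(self, task_id, answer):
        """Accept a completed answer back from the crowd platform."""
        self.results[task_id] = answer

    def merge(self):
        """Join completed crowd work back onto the internal records,
        ready to flow into an ECM or CRM system."""
        return {
            tid: {**self.pending[tid], "crowd_answer": self.results[tid]}
            for tid in self.results if tid in self.pending
        }

mw = CrowdMiddleware()
mw.send("t1", {"image": "shoe.jpg"})
mw.receive("t1", "red sneaker")
merged = mw.merge()
```

The essential point is the round trip: internal records go out as atomic tasks and return as structured fields that existing enterprise workflows can consume.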

Concluding Remarks

Sourcing has not traditionally been framed as a factory process, but crowdsourcing now allows this perspective. This is a profound shift for sourcing.

[1] Malone, Thomas W., Robert J. Laubacher, and Tammy Johns. “The Age of Hyperspecialization.” Harvard Business Review 89, no. 7/8 (2011): 56–65.