The Challenges of Forecasting Demand for E-Commerce

Walton College of Business, University of Arkansas

Undergraduate Student Name: Arley Bejerano Phone: (479) 268 0361 e-mail: BSBA Supply Chain Management BSBA Economics


ABSTRACT

E-commerce behaves differently from traditional retailing. The online purchase experience requires less seller-buyer interaction, and customer satisfaction depends on factors outside the control of the online seller. A huge customer base, changing seasonal patterns, a lack of historical data for new products, disruptions in social and behavioral norms, shifting customer buying criteria, and disruptions in demand patterns caused by competitors make forecasting a very difficult task. Determining which forecast model to apply is difficult, and measures of forecast accuracy are not always reliable.

E-COMMERCE IN PERSPECTIVE

The e-commerce industry is growing considerably. According to Statista.com, a reliable online statistics company, business-to-consumer e-commerce sales are expected to grow 15.6% worldwide this year and approximately 13% in 2016. The number of online stores is also growing as improvements in technology change mobility, data interchange, and accessibility in today’s agile and dynamic business environment. Existing e-tailers[i] like Amazon.com and Alibaba.com[ii] are constantly looking for ways to stay ahead in the race to achieve and maintain competitive advantage. As a result, the level of competitiveness in this industry is changing rapidly. One aspect of great concern throughout the entire e-commerce industry is the challenge posed by uncertainty in demand planning and inventory management. Uncertainty is not new; it has challenged replenishment and sourcing managers for a long time. The e-commerce industry, on the other hand, is fairly new and growing rapidly, and the old rules of traditional retailing are not as applicable to e-tailers as one might think. Demand planning is therefore a priority on the agendas of managers at all levels for online sellers. An important question to answer is, “What forecasting model could more accurately tackle the uncertainties in this industry?” The major challenges in forecasting demand for e-commerce include the size of the customer base, changing seasonal patterns, a lack of historical data for new products, disruptions in social and behavioral norms, shifting customer buying criteria, and disruptions in demand patterns caused by competitors (Forrester Consulting).

The era of digitalization has brought many benefits for buyers and opportunities for sellers. On the seller’s side, these opportunities do not come without hardships and challenges. The e-commerce industry has the advantage of great accessibility, an almost omnipresent reach that makes the predictive analytics side of today’s business environment nearly unmanageable. Nonetheless, some companies have come up with very clever strategies to gain market share in this endless pool of potential customers. CRM systems, search engine optimization, click and collect, and even pay-per-click marketing strategies are all part of the predictive analytics activities that translate into competitive advantage and core competencies. Wherever there is a computer or a smartphone with Internet access, there is the possibility of one or several customers. That is where CRMs can play an important role. E-tailers store personalized data according to purchasing patterns. It does not matter what computer an online buyer uses; IP addresses are not always taken into consideration, at least not as a key factor in locating customers. E-tailers rely instead on other cues such as the customer’s first and last name, address (including country), and credit card information. There is a current trend of telecommunication and marketing companies integrating with CRMs in an effort to improve customer service and increase data collection capabilities. Avaya IP Office is one example of such an integration with CRMs (avaya.com). Along with personal information, CRMs keep records of the items bought by every new and repeat customer. This is not enough, though; customers change their buying patterns too unpredictably for a company to collect all the necessary information on its customer base, especially on a global scale. Forecasts can be made from historical data on the patterns of regular customers, but what about new customers?
A time series forecast model lacks the ability to account for exogenous variables that influence buying decisions in the short run. The answer, then, is social networking and media marketing (DeMers). How many times do we run into a pop-up window while surfing the net that reads, “sign in with Google [or] sign in with Facebook”? What do you think that means? E-tailers, blogs, and even news websites, from Forbes.com and Quora.com to the giants Amazon.com and Alibaba.com, all use this marketing device to access customers’ information. From a demand forecasting standpoint, they are merely filling in the independent variables that would otherwise be missing for new customers. Information is all around us, and social networking is a paradise for online sellers (Carroll).

ACCURACY IN FORECASTING

Before going further in this analysis of forecasting demand in e-commerce, we need to acknowledge a simple fact: “forecasts are almost always wrong, if not always” (Production and Inventory Management). This phrase, popular among econometricians and forecasting experts, points out the sad but real paradox that surrounds this necessary business practice. Two questions then arise: How wrong is a forecast, and what model minimizes the errors associated with the forecast? The differences in supply chain structure between traditional retailing and its younger brother, e-commerce, make it much more important for managers to predict demand as accurately as ‘humanly’ possible. Customer satisfaction in online sales does not depend on the human interaction between a customer and a sales representative, or even between a cashier and the consumer. The online buying experience depends more on other factors, such as product availability and short delivery times. This is truly what customers want in an online purchase experience, and all of these factors depend on an accurate forecast of demand.

Forecasting demand and inventory levels accurately is a challenge. Measures of forecast accuracy are as important and as useful as the forecast itself. As mentioned before, to determine future e-commerce demand, we should not rely solely on historical data.

One way to forecast demand more accurately would be to integrate different factors from marketing and qualitative forecasting into a multiple regression analysis. By combining research methods such as data mining with the forecaster’s experience, and by studying economic parameters at the micro level (short-term forecasts) and the macro level (long-term forecasts), independent variables can be drawn to build a good model. However, there is only so much a forecaster or an expert can learn from these marketing strategies about the behavior of an irregular customer. Once again, the customer base is too big and the buying criteria are too spread out across many regions. It is almost impossible to create a model that encapsulates so many independent variables and outcomes. One remedy would be to create subgroups according to specific criteria, but by doing so we run into the problem of multicollinearity[iii] (Hanke and Wichern, 297). In addition, a multiple regression forecast under the conditions that characterize e-commerce demand could use ‘dummy variables’ to set the boundaries between qualitative forecast bias and the dependent variable (Hanke and Wichern, 297-300). When a multiple regression analysis is combined with qualitative forecasting techniques, the indicator or dummy variables can nullify coefficients that are not significant to the model (Hanke and Wichern, 293). Nevertheless, as mentioned before, this is very difficult because of the dependency on qualitative methods for determining the regressors, or independent variables, and because of the very large customer base.

Some of the problems of forecasting demand for e-commerce mentioned above might be solved by applying another forecasting model. We have established that a pure time series model is not the best forecasting technique because of the lack of historical data on new customers, the seasonal differences across regions, and the extremely large customer base. We have also recognized that multiple regression analysis is not a very effective method of forecasting future e-commerce demand because of its unrealistic dependency on the qualitative selection of independent variables; we will continue to support this argument. The forecast model that I offer next is a regression with time series data (Hanke and Wichern, 339-367). It combines time series and regression analysis, taking the best of both models to predict future demand and leaving us only with the problem of autocorrelation. Why is this model better than the others? Time series models, including Holt-Winters and the exponentially weighted moving average, do not include the effects of external factors as causal models do. The opposite applies to causal models: they do not take historical patterns such as seasonality, cyclicality, trend, and level very seriously. The challenges of forecasting demand for e-commerce therefore apply, in turn, to both time series and causal models. However, if we combine both, we can reduce or pool the risk in a way that minimizes the forecasting error and improves measures of accuracy such as the mean absolute percentage error [MAPE], the absolute percentage error [APE], the mean square error [MSE], and the correlation coefficient.[iv] Nevertheless, a model capable of integrating time series data and regression analysis will, sadly, keep a few weaknesses from each model. One such defect, autocorrelation, applies specifically to its regression element (Hanke and Wichern, 347).
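The accuracy measures named above can be made concrete with a short sketch. The function name `forecast_errors` and the demand figures are hypothetical and serve only as illustration; the formulas are the standard textbook definitions of APE, MAPE, MSE, and the Pearson correlation coefficient, not a procedure taken from the sources cited in this paper.

```python
from math import sqrt


def forecast_errors(actual, forecast):
    """Return APE per period, MAPE (percent), MSE, and Pearson correlation."""
    n = len(actual)
    if n == 0 or n != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("APE is undefined when actual demand is zero")

    # Absolute percentage error for each period, in percent
    ape = [abs(a - f) / abs(a) * 100 for a, f in zip(actual, forecast)]
    mape = sum(ape) / n

    # Mean square error penalizes large misses more heavily than MAPE
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n

    # Pearson correlation between actual and forecast values
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    corr = cov / sqrt(var_a * var_f)

    return {"ape": ape, "mape": mape, "mse": mse, "corr": corr}


# Hypothetical demand (units) and forecasts, for illustration only
actual = [100, 200, 400]
forecast = [110, 180, 400]
result = forecast_errors(actual, forecast)
```

Because MAPE divides by actual demand, it breaks down for items with zero sales in a period, which is common for new products; this is one reason a single accuracy measure is not reliable on its own.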

PROBLEMS AND SOLUTIONS TO AUTOCORRELATION

Autocorrelation brings a series of problems, the first being the omitted variable, or model specification error (Hanke and Wichern, 348). The solution to this challenge would be to improve the model specification, or simply to find the missing variable. This is not that simple, because the variable may not be available or may not be quantifiable. We say that it is not quantifiable when assumptions about relevant regressors are drawn from a qualitative standpoint. We have already stated that quantitative forecasts and independent variable selection are very hard tasks when forecasting demand for e-commerce, because of the lack of historical data on new and prospective customers and all the other factors previously mentioned. The same problem applies to the model specification solution for autocorrelation in customer demand for e-commerce products. The second problem arises when the data are very highly autocorrelated, which calls for a regression with differences (Hanke and Wichern, 350). Instead of running a regression on the levels of the dependent and independent variables, we use the difference between each variable at time t and itself lagged one period. This solution also requires differencing the predictors, which in this case are lags of the series itself (Y_{t-1}, ..., Y_{t-k}); we will see later that this is not an entirely bad circumstance (Hanke and Wichern, 350). The third problem with autocorrelation, or serial correlation, is the possibility of having autocorrelated errors, which are handled with what is known as generalized differences (Hanke and Wichern, 354). This condition is present in a regression analysis with time series data when Y_t = β_0 + β_1 X_t + ε_t and ε_t = r ε_{t-1} + v_t (Hanke and Wichern, 340).

Y_t: actual demand for period t

β_0: intercept coefficient

β_1: slope coefficient

X_t: regressor; in this case a second series or the variable Y_t lagged k periods

ε_t: error at time t for large samples or a population

v_t: independent error following a normal distribution with mean zero and constant variance, v_t ~ N(0, σ_y^2) (Hanke and Wichern, 340).
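The first-order error structure defined above is what the Durbin-Watson statistic is designed to detect. The sketch below computes that statistic from a list of regression residuals. The function name `durbin_watson` and the residual values are hypothetical, chosen only to show how the statistic reacts; the formula is the standard one, DW = Σ(e_t − e_{t−1})² / Σ e_t².

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals.

    The statistic lies between 0 and 4. Values near 2 suggest no lag-1
    autocorrelation, values well below 2 suggest positive autocorrelation,
    and values well above 2 suggest negative autocorrelation.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    # Sum of squared changes between consecutive residuals
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n))
    return numerator / denominator


# Hypothetical residuals: a run of same-signed errors (positive autocorrelation)
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Hypothetical residuals that flip sign every period (negative autocorrelation)
alternating = [1.0, -1.0, 1.0, -1.0]

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
```

A formal test compares the statistic with tabulated lower and upper bounds that depend on the sample size and the number of regressors; those tables are not reproduced in this sketch.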

When the variance of the error term u_i, conditional on X_i, is not constant, the error term is said to be heteroskedastic. That is, the variance of the conditional distribution increases or decreases across the observations X_i. In such circumstances, it becomes more difficult to conduct a statistical test without mathematically transforming the error term. According to FIGURE 1, the error term is indeed heteroskedastic and will interfere with the Durbin-Watson test statistic. We will see why this is a problem later on when testing for autocorrelation.

FIGURE 1. Conditional distribution of the error term and Heteroskedasticity

The solution for this problem, generalized differences, brings the correlation between two consecutive errors into the equation for the available e-commerce demand data: Y'_t = β_0 (1 - r) + β_1 X'_t + v_t, where Y'_t = Y_t - r Y_{t-1}, X'_t = X_t - r X_{t-1}, and r is the correlation coefficient between consecutive errors in the e-commerce demand forecast, with -1 < r < 1 (Hanke and Wichern, 354).
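The generalized-differences transformation can be sketched as follows, assuming the value of r is already known. In practice r has to be estimated from the residuals, a step this sketch leaves out. The helper names `generalized_differences` and `simple_ols` and the data are hypothetical; the data are constructed without noise so the arithmetic is easy to follow.

```python
def generalized_differences(y, x, r):
    """Return Y'_t = Y_t - r*Y_{t-1} and X'_t = X_t - r*X_{t-1}.

    The first observation is lost because it has no predecessor.
    """
    if len(y) != len(x) or len(y) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    y_prime = [y[t] - r * y[t - 1] for t in range(1, len(y))]
    x_prime = [x[t] - r * x[t - 1] for t in range(1, len(x))]
    return y_prime, x_prime


def simple_ols(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical noise-free data generated from Y_t = 10 + 2*X_t
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 14.0, 16.0, 18.0, 20.0]
r = 0.5  # assumed error correlation, for illustration only

y_prime, x_prime = generalized_differences(y, x, r)
intercept_prime, slope = simple_ols(x_prime, y_prime)

# The transformed intercept estimates beta_0 * (1 - r), so divide to recover beta_0
beta_0 = intercept_prime / (1 - r)
```

Setting r = 1 turns the transformation into ordinary first differences, which is the regression with differences described earlier; in that case the intercept term β_0 (1 - r) vanishes and β_0 can no longer be recovered this way.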

These are the three possible problems, with their corresponding solutions, for a regression analysis of time series data. All of them are relatively big challenges for the forecasting manager who uses this model, and the solutions, although available, are complicated in nature and sometimes unrealistic. However, when modeling data such as that available for e-commerce, this model might not be our choice but rather a last resort when all else has failed. In the first part of this paper we mentioned the difficulties, or rather the impracticality, of using a standard multiple regression analysis on exogenous regressors or independent variables. We have also established that standard time series models such as moving averages, exponentially weighted moving averages, and even the (standard or additive) Holt-Winters model[v] are not feasible for forecasting demand for e-commerce because of factors like the very large customer base, seasonal patterns that change simultaneously across regions, and rapidly changing customer buying criteria (Hanke and Wichern, 126-136). Therefore, given all these challenges, it is left up to me to prove that the most feasible model is a regression analysis on time series data. For this, an important step is to review the test statistics, check the degree of autocorrelation in the available e-commerce demand data, and hope that it passes the Durbin-Watson test[vi] (Hanke and Wichern, 344-347). There is one hiccup in this respect, though. I apologize for the suspense; hopefully, dear reader, you have realized by now that we cannot run a regression analysis on time series data if no independent variables or exogenous regressors are associated with, or predict, the demand for e-commerce. We have already concluded that using qualitative methods to find relevant independent variables or regressors is very difficult or unrealistic.
Therefore, at this point, we are going to rule out the third model, regression with time series data using regressors. Not all is gloomy news, though. The description of the problems and solutions to autocorrelation in this last model has shed light on a model that might be our last hope of finding a solution to the challenges of forecasting demand for e-commerce. This final model is called an autoregressive model. It is built on the idea that autocorrelation is not so bad after all and could be used as a predictive factor for this type of data. Therefore, the autocorrelation shown in Figures 2 and 4 will allow us to run an efficient autoregressive model that predicts future demand for e-commerce at the macro level, at least more efficiently than the other models explained here.
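A minimal sketch of the autoregressive idea follows, assuming a first-order model Y_t = b0 + b1 Y_{t-1} + e_t fitted by ordinary least squares. The order of the model, the function names `fit_ar1` and `forecast_next`, and the demand series are all assumptions made for illustration; they are not taken from the data behind Figures 2 and 4.

```python
def fit_ar1(series):
    """Fit Y_t = b0 + b1 * Y_{t-1} by least squares on the lagged series."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    lagged = series[:-1]   # Y_{t-1}, used as the regressor
    current = series[1:]   # Y_t, used as the dependent variable
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    if sxx == 0:
        raise ValueError("lagged series is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def forecast_next(series, b0, b1):
    """One-step-ahead forecast from the last observed value."""
    return b0 + b1 * series[-1]


# Hypothetical demand series built from Y_t = 70 + 0.5 * Y_{t-1} with no noise
demand = [100.0, 120.0, 130.0, 135.0, 137.5]
b0, b1 = fit_ar1(demand)
next_period = forecast_next(demand, b0, b1)
```

The only input is the demand history itself, which is why this approach sidesteps the search for exogenous regressors. Real demand data would include noise and possibly seasonality, so higher-order lags or seasonal terms may be needed.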