SURVIVAL ANALYSIS WITH MULTIVARIATE ADAPTIVE REGRESSION SPLINES ON PRODUCT SALES IN E-COMMERCE

Dewa Ayu Nyoman O.S., Rokhana D.B., Edy Irwansyah.

Binus University

Jl. Kebon Jeruk No. 27, Kebon Jeruk, West Jakarta 11530

Tel. (62-21) 535 0660, Fax. (62-21) 535 0644

ABSTRACT

The number of online buyers in Indonesia has grown through e-commerce, rising from 5.2 million people in 2010 to 10.6 million people in 2013. This study uses survival analysis with the Multivariate Adaptive Regression Splines (MARS) approach to examine the characteristics of product sales in e-commerce, to study the selling period of a product, and to examine the factors that affect that selling period. The research was conducted at PT. Online Pertama using 279 records of February product sales. By product type, most products sold were home and appliances; by price, most were priced below IDR 50,000; and by time status, most were products with unlimited time status. The fastest selling period was 1 day, and the longest time to sell out a product was 39 days. Cox PH modelling shows that the longer the publication period, the lower the chance that a product is sold. Survival modelling with the MARS approach shows that, at the 5% significance level, the factor affecting the selling period of a product is the number-of-products-sold information, with effects at more than 135, 170, and 196 pieces sold.

Keywords: E-Commerce, Product Sales Period, Survival Analysis with MARS.

Preface

E-commerce is a marketing system that uses electronic media. It covers the distribution, sales, purchasing, marketing, and servicing of a product carried out in an electronic system via the Internet. The research firm Nielsen states that 70% of e-commerce users in Indonesia use the internet for the purpose of purchasing products online. E-Marketeer, a research company, recorded that the number of online shoppers in Indonesia increased from 5.2 million people in 2011 to 10.6 million in 2013, spending USD 1.8 billion on products purchased online. By 2013, the number of online stores in Indonesia had grown from more than 3,000 to more than 550,000, with 8.5 million online sellers.

This research aims to determine an appropriate publication-period strategy for selling a product; it also aims to analyze the factors that affect the selling period of a product in e-commerce. The statistical method used is survival analysis with the Multivariate Adaptive Regression Splines (MARS) approach. Survival analysis with the MARS approach is a statistical technique used to determine the variables that affect the outcome from an initial event to a final event. In this study, the initial event is the time a product begins to be published, and the final event is when the product is sold or, otherwise, remains unsold. In survival modelling with the MARS approach, the response variable is survival data, that is, data relating to time (Kleinbaum and Klein, 2011). A previous study that used survival analysis with the MARS approach is Monika (2007), which analyzed the survival time of patients after a heart attack and the factors that influence it. An advantage of the present study is that it allows the researchers to compare Cox PH modelling with survival modelling using the MARS approach.

Based on the problems above, this research was carried out. Its purposes are to determine the characteristics of product sales in e-commerce, to analyze the selling period of a product, to identify the factors that affect the selling period of a product, and to build a desktop-based application.

Theoretical Basis

Survival Modelling

According to Collett (2003), survival analysis describes an analysis process related to time, from the time of origin (the start time) until the occurrence of a specified event, an end point, or a failure event. According to Ernawatiningsih and Purhadi (2012), survival analysis is used when the response variable is survival data (time).

Cox PH Modelling

The Cox proportional hazards (Cox PH) model is a very popular mathematical model for analyzing survival data (Kleinbaum and Klein, 2011). Cox PH is a semi-parametric modelling method used to estimate the effect of predictor variables on survival data. The Cox PH model can be written as follows (Kleinbaum and Klein, 2011):

$$h(t, \mathbf{X}) = h_0(t)\exp\left(\sum_{i=1}^{p} \beta_i X_i\right) \qquad (2.1)$$

Where:

$\mathbf{X} = (X_1, X_2, \ldots, X_p)$ is the vector containing the $p$ predictor variables.

$h_0(t)$ = the baseline hazard, i.e. the hazard of the model when all of its predictor variables are zero.

$\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_p)$ is the vector of regression parameters.

The Cox PH model in equation (2.1) can produce various types of residuals, including martingale residuals and deviance residuals.
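The paper does not show how the coefficients in equation (2.1) are estimated. Cox PH coefficients are commonly obtained by maximizing the partial likelihood; the sketch below does this with Newton-Raphson for a single predictor, using the Breslow treatment of tied event times. It is a minimal illustration under those assumptions, not the software the authors used, and the data and function names are hypothetical.

```python
import math
from typing import List, Tuple


def score_and_information(beta: float, times: List[float], events: List[int],
                          x: List[float]) -> Tuple[float, float]:
    """Score U(beta) and observed information I(beta) of the Cox partial
    log-likelihood for one predictor (Breslow handling of ties)."""
    score, info = 0.0, 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i != 1:
            continue  # censored products contribute only through risk sets
        # Risk set: every product still unsold at time t_i (time >= t_i)
        risk = [(x_j, math.exp(beta * x_j)) for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for _, w in risk)
        s1 = sum(w * x_j for x_j, w in risk)
        s2 = sum(w * x_j * x_j for x_j, w in risk)
        x_bar = s1 / s0
        score += x_i - x_bar
        info += s2 / s0 - x_bar * x_bar
    return score, info


def fit_cox_1d(times, events, x, iters: int = 25, tol: float = 1e-10) -> float:
    """Newton-Raphson maximization of the partial likelihood, starting at 0."""
    beta = 0.0
    for _ in range(iters):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: selling days, 1 = sold / 0 = censored, binary predictor
times = [2, 3, 5, 7, 9, 12, 15, 20]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]
beta_hat = fit_cox_1d(times, events, x)
hazard_ratio = math.exp(beta_hat)  # multiplicative change in h(t) per unit of x
```

Because the baseline hazard $h_0(t)$ cancels out of the partial likelihood, it does not need to be estimated to obtain $\hat{\beta}$; the hazard ratio $\exp(\hat{\beta})$ is what Table 3.3 reports for each predictor.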

Martingale Residual

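Martingale residual values lie between −∞ and 1, and the value is negative for censored data. The martingale residual can be viewed as the difference between the observed number of events ($N_i(t)$) and the number of events predicted by the model. The martingale residual equation can be written as follows (Muthmainnah, 2007):

$$\hat{M}_i(t_i) = \delta_i - \hat{H}_0(t_i)\exp\left(\hat{\boldsymbol{\beta}}^{\top}\mathbf{X}_i\right) \qquad (2.2)$$

Where:

$\hat{M}_i(t_i)$ = the martingale residual of observation $i$ at time $t_i$

$\delta_i$ = 1 for not censored data, 0 for censored data

$\hat{H}_0(t_i)$ = the estimate of the cumulative baseline hazard function at time $t_i$, obtained from the model in equation (2.1)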
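Equation (2.2) needs an estimate of the cumulative baseline hazard $\hat{H}_0(t_i)$. A minimal sketch, assuming the Breslow estimator (a common choice; the paper does not say which estimator its software used) and a linear predictor $\hat{\boldsymbol{\beta}}^{\top}\mathbf{X}_i$ already obtained from a fitted Cox PH model; the function names and the `eta` values below are hypothetical.

```python
import math
from typing import List


def breslow_cumulative_hazard(times: List[float], events: List[int],
                              eta: List[float]) -> List[float]:
    """Breslow estimate of H0(t), evaluated at each product's own time t_i.

    times  : observed selling time of each product (days)
    events : delta_i, 1 if the product was sold (event), 0 if censored
    eta    : linear predictor beta^T X_i from a fitted Cox PH model
    """
    risk_score = [math.exp(e) for e in eta]
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    jumps = []
    for t_j in event_times:
        d_j = sum(1 for t, d in zip(times, events) if t == t_j and d == 1)
        at_risk = sum(r for t, r in zip(times, risk_score) if t >= t_j)
        jumps.append((t_j, d_j / at_risk))
    # H0(t_i) is the sum of the jumps at event times up to and including t_i
    return [sum(h for t_j, h in jumps if t_j <= t_i) for t_i in times]


def martingale_residuals(times, events, eta) -> List[float]:
    """M_i = delta_i - H0(t_i) * exp(eta_i), as in equation (2.2)."""
    h0 = breslow_cumulative_hazard(times, events, eta)
    return [d - h * math.exp(e) for d, h, e in zip(events, h0, eta)]


times = [2, 5, 5, 9, 12, 39]
events = [1, 1, 0, 1, 0, 1]
eta = [0.4, 0.0, 0.5, 0.6, 0.0, 0.0]  # hypothetical fitted linear predictors
residuals = martingale_residuals(times, events, eta)
```

With the Breslow estimator the residuals sum to zero, every residual is at most 1, and censored products ($\delta_i = 0$) receive non-positive residuals, matching the range stated above.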

Deviance Residual

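According to Monika (2007), the distribution of martingale residuals is highly skewed. Therefore, to keep this skewness from affecting the results, the deviance residual is formed: a transformation of the martingale residual into values that are more nearly symmetric on (−∞, ∞). The deviance residual equation can be written as follows:

$$d_i = \operatorname{sign}\left(\hat{M}_i\right)\sqrt{-2\left[\hat{M}_i + \delta_i \log\left(\delta_i - \hat{M}_i\right)\right]} \qquad (2.3)$$

where

$\operatorname{sign}(\hat{M}_i)$ = the sign of the martingale residual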
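A direct transcription of equation (2.3). For censored products the term $\delta_i \log(\delta_i - \hat{M}_i)$ is zero, so $d_i = \operatorname{sign}(\hat{M}_i)\sqrt{-2\hat{M}_i}$. The function name and the example residual values are illustrative.

```python
import math


def deviance_residual(m: float, delta: int) -> float:
    """Deviance residual d_i from martingale residual m and event indicator
    delta (1 = sold, 0 = censored), following equation (2.3)."""
    if delta == 1:
        inner = m + math.log(1.0 - m)  # requires m < 1
    else:
        inner = m                      # the delta * log(...) term vanishes
    # Guard against tiny positive values caused by floating-point rounding
    return math.copysign(math.sqrt(max(0.0, -2.0 * inner)), m)


examples = [(-0.5, 0), (0.5, 1), (-1.2, 1)]
deviances = [deviance_residual(m, d) for m, d in examples]
```

The transformation keeps the sign of the martingale residual but stretches values near 1 and compresses the long negative tail, which is why the deviance residuals are more symmetric.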

Multivariate Adaptive Regression Splines

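Multivariate Adaptive Regression Splines (MARS) is a relatively new, flexible method for modelling regression with high-dimensional data. MARS is a multivariate nonparametric regression approach. The technique became popular because it does not require a specific form for the relationship between the predictor variables and the response, such as linear, quadratic, or cubic (Budiantara, Otok, and Suryadi, 2006). The MARS equation can be written as follows:

$$\hat{f}(\mathbf{x}) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m}\left[s_{km}\left(x_{v(k,m)} - t_{km}\right)\right]_+ \qquad (2.4)$$

Where:

$a_0$ = the main (constant) basis function

$a_m$ = the coefficient of the $m$-th basis function

$M$ = the maximum number of basis functions

$K_m$ = the degree of interaction

$s_{km}$ = +1 or −1

$x_{v(k,m)}$ = the predictor variables

$t_{km}$ = the knot value of the predictor variable $x_{v(k,m)}$

$[u]_+$ = max(0, u)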
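Equation (2.4) is a sum of products of truncated linear ("hinge") functions. A minimal sketch of evaluating such a model; the data layout (each factor as a `(sign, predictor index, knot)` tuple) and the function names are assumptions for illustration. The knots 135 and 170 echo the basis functions discussed for Figure 3.1, but the intercept and coefficients are hypothetical, not the fitted model.

```python
from typing import List, Sequence, Tuple

# One factor of a basis function: (sign s_km, predictor index v, knot t_km)
Factor = Tuple[int, int, float]


def hinge(u: float) -> float:
    """Truncated linear function [u]_+ = max(0, u)."""
    return max(0.0, u)


def basis_value(factors: Sequence[Factor], x: Sequence[float]) -> float:
    """Product over k = 1..K_m of [s_km * (x_v(k,m) - t_km)]_+."""
    value = 1.0
    for s, v, t in factors:
        value *= hinge(s * (x[v] - t))
    return value


def mars_predict(a0: float, coefs: Sequence[float],
                 basis: Sequence[Sequence[Factor]], x: Sequence[float]) -> float:
    """Evaluate equation (2.4): a0 + sum over m of a_m * B_m(x)."""
    return a0 + sum(a * basis_value(bf, x) for a, bf in zip(coefs, basis))


# Additive model (K_m = 1 for every term, i.e. MI = 1) in one predictor x[0];
# a0 and the coefficients are hypothetical.
basis: List[List[Factor]] = [[(+1, 0, 135.0)], [(+1, 0, 170.0)]]
coefs = [0.5, -1.0]
predictions = {n: mars_predict(0.0, coefs, basis, [n]) for n in (100.0, 150.0, 180.0)}
```

Each knot changes the slope of the fitted curve by the coefficient of the hinge that switches on there, which is how the piecewise-linear shapes in Figures 3.1 and 3.2 arise.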

Survival Analysis with MARS Approach

Multivariate Adaptive Regression Splines (MARS) can generally be used with two types of response variables, namely binary and continuous. According to Monika (2007), the response variable in MARS modelling can be a residual of the Cox PH model, namely the martingale residual or the deviance residual. Survival modelling with the MARS approach can therefore be interpreted as MARS modelling (equation 2.4) in which the response variable is a residual of the Cox PH model.

Results and Discussion

Characteristics of Selling Time and the Independent and Dependent Variables

This analysis describes the independent and dependent variables through their minimum, maximum, and average values.

Table 3.1 Selling Time Based on Sales Status

Status / N / Average (days) / Minimum (days) / Maximum (days)
Not Censored / 167 / 21.661 / 2 / 39
Censored / 112 / 9.198 / 1 / 39

The characteristics of product sales are presented in Table 3.1. For products whose sale status is not censored, the average selling time is 21.661 days; the fastest selling period is 2 days and the longest is 39 days. For censored products, the average selling time is 9.198 days; the fastest selling period is 1 day and the longest is 39 days.
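Table 3.1 reports the count, mean, minimum, and maximum selling time per sales status. A minimal sketch of that grouping with hypothetical records (the paper's 279 records are not reproduced here); the function name is illustrative, and the same function would give the rows of Table 3.2 if the group key were product type or price class.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def summarize(records: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, float, int, int]]:
    """Group (status, selling_days) pairs; return N, average, min, max per group."""
    groups = defaultdict(list)
    for group, days in records:
        groups[group].append(days)
    return {g: (len(d), mean(d), min(d), max(d)) for g, d in groups.items()}


# Hypothetical records: (sales status, selling time in days)
records = [("Not censored", 2), ("Not censored", 39), ("Not censored", 16),
           ("Censored", 1), ("Censored", 3)]
table = summarize(records)
```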

Table 3.2 Selling Time Based on Product Type and Product Price

Type of Product / N / Average (days) / Minimum (days) / Maximum (days)
Total / 167
Gadget and Electronic / 31 / 9.944 / 2 / 29
Fashion and Beauty / 42 / 10.704 / 1 / 39
Baby and Kids / 38 / 7.967 / 3 / 12
Home Appliances / 27 / 7.066 / 1 / 19
Food and Beverages / 29 / 11.731 / 1 / 28
Price / N / Average (days) / Minimum (days) / Maximum (days)
Total / 167
< Rp 50,000 / 75 / 9.16 / 1 / 39
Rp 50,000 - Rp 100,000 / 53 / 9.415 / 1 / 28
Rp 100,000 - Rp 150,000 / 10 / 10.2 / 5 / 14
> Rp 150,000 / 29 / 10.517 / 5 / 23

Table 3.2 shows that, by product type, the products with the fastest selling time are home appliances, with an average of 7.066 days (minimum 1 day, maximum 19 days). By product price, the fastest-selling products are those priced below Rp 50,000, with an average of 9.16 days (minimum 1 day, maximum 39 days). By sales status, the fastest-selling products are those with limited sales status, with an average of 8.955 days (minimum 1 day, maximum 16 days). By the number-of-products-sold information, the fastest-selling products are those with 50 pieces sold, with an average of 10.672 days (minimum 1 day, maximum 39 days).

Cox PH Modelling

In this research, a Cox PH model is used to determine the factors that affect the selling time of a product. To obtain the best Cox PH model, stepwise selection is used, so that only significant variables are retained in the model. The output of the stepwise Cox PH modelling is presented in Table 3.3.

Table 3.3 Stepwise Cox PH Modelling

Variable / Coefficient / Hazard Ratio / z value / p-value
Fashion and Beauty Product Type (X2) / 0.414 / 1.51 / 2.05 / 0.04
Baby and Kids Product Type (X3) / 0.512 / 1.67 / 2.43 / 0.015
Food and Beverages Product Type (X4) / 0.571 / 1.77 / 2.37 / 0.018
Number of Products Sold Information (X6) / 0.005 / 1 / 3.72 / 0.000
Time Status of Sales (X7) / 2.624 / 13.79 / 10.67 / 0.000

Table 3.3 shows that, in the stepwise Cox PH model at α = 5%, the factors that influence the selling time of a product are the fashion and beauty product type (X2), the baby and kids product type (X3), the food and beverages product type (X4), the number of products sold (X6), and the time status of sales (X7). This can be seen from the |z| values: 2.05 for fashion and beauty products (X2), 2.43 for baby and kids products (X3), 2.37 for food and beverages products (X4), 3.72 for the number of products sold (X6), and 10.67 for the time status of sales (X7), all greater than z(0.05/2) = 1.96. In addition, the p-value of each of these variables is less than α = 0.05.
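The hazard ratio and p-value columns of Table 3.3 follow from the coefficients and z values: the hazard ratio is exp(β), and a two-sided Wald p-value is P(|Z| ≥ |z|) for a standard normal Z. A small consistency check using only the reported numbers (the function and variable names are illustrative); small differences from the table come from rounding.

```python
import math

# (coefficient, z value) as reported in Table 3.3
table_3_3 = {
    "X2": (0.414, 2.05),
    "X3": (0.512, 2.43),
    "X4": (0.571, 2.37),
    "X6": (0.005, 3.72),
    "X7": (2.624, 10.67),
}


def two_sided_p(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z (two-sided Wald test)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


results = {}
for name, (beta, z) in table_3_3.items():
    results[name] = {
        "hazard_ratio": math.exp(beta),        # multiplier on h(t) per unit increase
        "p_value": two_sided_p(z),
        "significant_5pct": abs(z) > 1.96,     # |z| > z(0.05/2)
    }
```

For example, the time status of sales (X7) multiplies the hazard of a product being sold by exp(2.624) ≈ 13.79, holding the other variables fixed.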

Survival Modelling with the MARS Approach

Survival modelling with the MARS approach is carried out by trial and error over combinations of the number of basis functions (BF), the minimum observation (MO), and the maximum interaction (MI). The best model is the one with the lowest GCV value.
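The selection rule above can be sketched as follows. GCV divides the mean squared residual by a complexity penalty, GCV = (RSS/n) / (1 − C(M)/n)², where C(M) is the effective number of parameters. MARS adds a charge per knot to C(M), and the exact penalty differs between implementations, so C(M) is taken as an input here. The RSS and C(M) values for the (BF, MI, MO) combinations are hypothetical, not the paper's results.

```python
from typing import Dict, Tuple


def gcv(rss: float, n: int, c_m: float) -> float:
    """Generalized cross-validation score (RSS/n) / (1 - C(M)/n)^2."""
    if c_m >= n:
        raise ValueError("C(M) must be smaller than n")
    return (rss / n) / (1.0 - c_m / n) ** 2


n = 279  # number of products in the data
# Hypothetical fits: (BF, MI, MO) -> (residual sum of squares, C(M))
candidates: Dict[Tuple[int, int, int], Tuple[float, float]] = {
    (10, 1, 2): (140.0, 9.0),
    (15, 2, 1): (130.0, 20.0),
    (20, 3, 3): (125.0, 30.0),
}
scores = {combo: gcv(rss, n, c) for combo, (rss, c) in candidates.items()}
best = min(scores, key=scores.get)
```

In this example the combination with the smallest RSS is not selected, because its larger C(M) inflates the penalty; GCV trades goodness of fit against model complexity.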

Martingale residuals as the response variable

From the trial-and-error results, the combination BF = 10, MI = 1, and MO = 2 gives the best model, with GCV = 0.502. The resulting survival model with the MARS approach is as follows.

where

BF1 = (…)+

BF2 = (X6 - 135)+

BF3 = (X6 - 170)+

BF4 = …

BF5 = (X6 - 196)+

The plot of the basis functions for the variable number-of-products-sold information is presented in Figure 3.1.

Figure 3.1 MARS basis function plot of martingale residuals against the independent variable number-of-products-sold information (X6)

The risk of a product increases when the number of products sold (X6) is between 0 and 89 pieces. When the number of products sold is between 89 and 135 pieces, the risk decreases. When 135 to 170 pieces have been sold, the risk increases again. When more than 170 pieces have been sold, the risk decreases sharply up to 196 pieces, and it continues to decrease up to 300 pieces.

Deviance residuals as the response variable

From the trial-and-error results, the combination BF = 10, MI = 1, and MO = 2 gives the best model, with GCV = 0.502. The resulting survival model with the MARS approach is as follows.

Where

BF1=

BF2=+

BF3

BF4=+

The plot of the basis functions for the variable number-of-products-sold information is presented in Figure 3.2.

Figure 3.2 MARS basis function plot of deviance residuals against the independent variable number-of-products-sold information (X6)

Figure 3.2 shows that the risk of a product increases when the number of products sold (X6) is between 0 and 89 pieces. When 89 to 122 pieces have been sold, the risk decreases. When 122 to 170 pieces have been sold, the risk increases again. When more than 170 pieces have been sold, the risk decreases, and it continues to decrease until 300 pieces have been sold.

MARS Modelling Significance Testing

MARS modelling with martingale residuals as the response variable

The MARS model obtained is tested for significance in two ways: a simultaneous test of all basis functions and an individual test of each basis function.

Simultaneous significance testing.
A simultaneous significance test of the basis functions in the MARS model is performed using the following hypotheses:

$$H_0: a_1 = a_2 = \cdots = a_5 = 0 \qquad H_1: \text{at least one } a_j \neq 0,\; j = 1, 2, \ldots, 5$$

The tests use α = 0.05 and α = 0.10. MARS processing gives F = 2.205. With v1 = 5 and v2 = 273, the critical values are F(5%; 5, 273) = 2.247 and F(10%; 5, 273) = 1.868. Because F = 2.205 is below 2.247 but above 1.868, H0 is not rejected at α = 0.05 but is rejected at α = 0.10; that is, at the 10% level at least one basis function containing predictor variables affects the response variable.
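The simultaneous test compares an overall regression F statistic, F = (SSR/v1)/(SSE/v2) with v1 = k basis functions and v2 = n − k − 1, against the critical value. A sketch of the degrees of freedom and the decision rule, assuming this standard form of the F test, using the F value and critical values reported in the text; the sums of squares passed to the helper in the test are hypothetical.

```python
def f_statistic(ssr: float, sse: float, n: int, k: int) -> float:
    """Overall F statistic for k basis functions fitted to n observations."""
    return (ssr / k) / (sse / (n - k - 1))


n, k = 279, 5
v1, v2 = k, n - k - 1                 # 5 and 273, as in the text
f_value = 2.205                       # reported F, martingale-residual model
f_crit = {0.05: 2.247, 0.10: 1.868}   # reported F(alpha; 5, 273)
reject_h0 = {alpha: f_value > crit for alpha, crit in f_crit.items()}
```

The decision depends on the chosen α: here H0 survives at the 5% level and is rejected only at the 10% level.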

Individual significance testing.

For the individual significance test, the following hypotheses are used:

$$H_0: a_j = 0 \qquad H_1: a_j \neq 0$$

with j = 1, 2, 3, 4, 5.

Table 3.4 Significance Tests of the MARS Model with Martingale Residuals

Parameter / Coefficient / Std. Error / t value / p-value
BF1 / -0.012 / 0.007 / -1.69 / 0.092
BF2 / 0.044 / 0.015 / 2.759 / 0.006
BF3 / -0.072 / 0.026 / -2.739 / 0.006
BF4 / -0.001 / 0.001 / -1.096 / 0.273
BF5 / 0.04 / 0.02 / 1.987 / 0.047

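The tests use α = 0.05 and α = 0.10, with critical values t(0.05/2; 273) = 1.968 and t(0.10/2; 273) = 1.650. H0 is rejected for a basis function whose |t| exceeds the critical value (equivalently, whose p-value is below α). At α = 0.05, the basis functions BF2, BF3, and BF5 of the MARS model have a significant influence on the response variable; at α = 0.10, BF1 (|t| = 1.69) is also significant, while BF4 is not significant at either level.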
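The individual decision rule rejects H0: a_j = 0 when |t| exceeds t(α/2; 273). Applying it to the t values reported in Table 3.4 takes a few lines (variable names are illustrative). The t value is the coefficient divided by its standard error; recomputing it from the rounded coefficients in the table would give slightly different numbers, so the reported t values are used.

```python
# t values of the basis functions as reported in Table 3.4
t_values = {"BF1": -1.69, "BF2": 2.759, "BF3": -2.739, "BF4": -1.096, "BF5": 1.987}
# Two-sided critical values t(alpha/2; 273) reported in the text
t_crit = {0.05: 1.968, 0.10: 1.650}

significant = {
    alpha: sorted(name for name, t in t_values.items() if abs(t) > crit)
    for alpha, crit in t_crit.items()
}
```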

MARS modelling with deviance residuals as the response variable

Simultaneous significance testing.

A simultaneous significance test of the basis functions in the MARS model is performed using the following hypotheses:

$$H_0: a_1 = a_2 = \cdots = a_5 = 0 \qquad H_1: \text{at least one } a_j \neq 0,\; j = 1, 2, \ldots, 5$$

The tests use α = 0.05 and α = 0.10. MARS processing gives F = 3.35. With v1 = 5 and v2 = 273, the critical values are F(5%; 5, 273) = 2.24 and F(10%; 5, 273) = 1.86. Because F = 3.35 exceeds both 2.24 and 1.86, H0 is rejected at both levels, which means that at least one basis function containing predictor variables affects the response variable.

Individual significance testing.

For the individual significance test, the following hypotheses are used:

$$H_0: a_j = 0 \qquad H_1: a_j \neq 0$$

with j = 1, 2, 3, 4, 5. The resulting t values are shown in Table 3.5.

Table 3.5 Significance Tests of the MARS Model with Deviance Residuals

Parameter / Coefficient / Std. Error / t value / p-value
X7 / 0.247 / 0.132 / 1.873 / 0.062
BF1 / -0.027 / 0.014 / -1.908 / 0.057
BF2 / 0.05 / 0.021 / 2.398 / 0.017
BF3 / -0.038 / 0.013 / -2.881 / 0.004
BF4 / -0.004 / 0.002 / -1.81 / 0.007

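The tests use α = 0.05 and α = 0.10, with critical values t(0.05/2; 273) = 1.968 and t(0.10/2; 273) = 1.650. At α = 0.05, H0 is rejected for BF2 and BF3, whose |t| values exceed 1.968. At α = 0.10, X7, BF1, and BF4 (|t| = 1.873, 1.908, and 1.81) also exceed 1.650, so at that level all terms of the MARS model have an influence on the response variable. The reported p-value of 0.007 for BF4 does not match its t value of -1.81, which corresponds to p ≈ 0.07.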

Testing MSE on MARS Modelling

A MARS model, whether with martingale or deviance residuals, is considered better if it has a smaller Mean Square Error (MSE). The MSE values of the MARS models with martingale and deviance residuals are presented in Table 3.6.

Table 3.6 MSE of the MARS Models

MARS Modelling / MSE
Martingale residual / 0.0025
Deviance residual / 0.0039

In this test, the MARS model with martingale residuals has a lower MSE (0.0025) than the MARS model with deviance residuals (0.0039). Therefore, the MARS model with martingale residuals is better than the MARS model with deviance residuals.

User Interface

In this research, survival modelling with the MARS approach is supported by an application that makes the analysis easier for users. The interface displays the coefficients of the MARS model with the lowest GCV value in detail and plots the MARS model using martingale residuals. The application screens are shown in Figures 3.3 and 3.4.

Figure 3.3 MARS Modelling Interface

Figure 3.4 MARS Plot Interface

Conclusion

Based on the analysis and discussion, the following can be concluded:

  1. During February 2014, most products sold were, by type, home and appliances (31%); by price, products priced below Rp 50,000 (42%); by the number-of-products-sold information, products with 50 pieces sold (65%); and by sales status, products with unlimited status (54%). The fastest selling period was 1 day and the longest was 39 days.
  2. The stepwise Cox PH model at α = 0.05 shows that the variables that significantly affect the selling time are the fashion and beauty product type (X2), the baby and kids product type (X3), the food and beverages product type (X4), the number of products sold (X6), and the time status of sales (X7).
  3. The best MARS model for determining the factors that affect the selling time of a product is the MARS model with martingale residuals as the response, because it has a smaller MSE than the model with deviance residuals. The MARS model with martingale residuals has a minimum GCV of 0.502 with the combination BF = 10, MI = 1, and MO = 2, and one contributing variable (X6). The basis functions with a significant effect at α = 0.05 are BF2 = (X6 - 135)+, BF3 = (X6 - 170)+, and BF5 = (X6 - 196)+.

References

[1] Armesh, H., Salarzehi, H., Yaghoobi, N.M., Heydari, A., and Nikbin, D. (2010). Impact of Online/Internet Marketing on Computer Industry in Malaysia in Enhancing Consumer Experience. International Journal of Marketing, 2(2): 75-86.

[2] Budiantara, I.N., Suryadi, F., Otok, B.W., and Guritno, S. (2006). Pemodelan B-Spline dan MARS Pada Nilai Ujian Masuk terhadap IPK Mahasiswa Jurusan Disain Komunikasi Visual UK. Jurnal Teknik Industri, 8(1).

[3] Collett, D. (2003). Modelling Survival Data in Medical Research. London: Chapman & Hall/CRC.

[4] Ernawatiningsih, N.P.L. and Purhadi (2012). Analisis Survival dengan Model Regresi Cox Study Kasus: Pasien Demam Berdarah Dengue di Rumah Sakit Haji Surabaya. Jurnal Matematika, 2(2). ISSN: 1693-1394.

[5] Kleinbaum, D. G. and Klein, M. (2011). Survival Analysis (3rd edition). New York: Springer Science + Business Media.

[6] Monika, K. (2007). Survival Analysis With Multivariate Adaptive Regression Splines. Unpublished dissertation. Germany: Munchen University.

[7] Muthmainnah (2007). Perbandingan Model Cox Proportional Hazard dan Model Parametrik Berdasarkan Analisis Residual: Studi Kasus pada Data Kanker Paru-Paru yang Diperoleh dari Contoh Data pada Software S-Plus 2000 dan Simulasi untuk Distribusi Eksponensial dan Weibull. Jakarta: Fakultas Sains dan Teknologi Universitas Islam Negeri Syarief Hidayatullah.

[8] Nielsen (2013). Indonesian Online User. Retrieved on 27-10-2013 from

[9] Price Area (2013). The State of eCommerce Indonesia. Retrieved on 27-10-2013 from

Biography

Dewa Ayu N Octalia Stefani was born in Jakarta on 6 October 1990. The author completed an S1 (undergraduate) degree at Bina Nusantara University, majoring in Statistics and Computer Science, in 2014, and currently works at IBM Indonesia as a Human Resources Analyst.