Data Scientist

Prem Timsina

960 E 18th St, Brooklyn, NY 11230 ● (605) 201-8156


Exceptional Data Science professional with over five years of experience designing, building, and managing healthcare big data infrastructure projects. A published researcher with documented success across multiple projects. Expertise in employing an array of data analytics tools, including Python, SAS, R, Spark, Scala, and Hadoop. Fully versed in data analytics, large data sets, and data mining.

Areas of Expertise

Research ● Professional Communication & Presentation ● Big Data Analytics ● Author

Large Data Sets ● Leadership ● Certified Apache Spark Developer ● SAS ● Weka

RapidMiner ● Python for Machine Learning ● R ● Scala ● Text Analytics ● Data Mining ● SQL ● Hive

● Amazon EMR and AWS ● Hadoop

● Deep understanding of Healthcare data

● Data Product Development

Professional Experience

Mount Sinai, New York, NY
Data Scientist: Jan 2017 - Present

Current Role:

o Design, build, and manage big data infrastructure

o Design, develop, implement, and maintain healthcare data products, including batch and streaming data pipelines

o Research, design, implement, and evaluate advanced statistical approaches and machine-learning models

o Perform ad-hoc exploratory statistics and data mining tasks on very large datasets

o Develop data visualizations for analyzing small and large datasets

o Analyze large datasets (textual, SQL and NoSQL) to provide strategic direction to the company

Remedy Partners, New York, NY
Data Scientist: Oct 2015 - Jan 2017

Project: Predictive Analytics Data Pipeline

§ Developed a multi-project data pipeline

o Designed and developed big data infrastructure for the data pipeline

o Data pipeline: data extraction, cleaning, machine learning model building, prediction and visualization

o Streaming pipeline: real-time messaging layer feeding the transformation and prediction layer

§ Coded, tested, debugged, implemented and documented apps using Spark, Scala, MongoDB and Kafka

§ Proficient in MLlib, Spark SQL, Spark Streaming, Apache Kafka, Apache Zeppelin, MongoDB, Amazon EMR, and Hadoop

Key Accomplishments:

§ Built a predictive data pipeline model that performed 40% better than the existing workflow

§ Replaced the existing system with a fault-tolerant, scalable, real-time predictive system

Project: REST Service as a High-End Data Computation Layer

§ Built a Spark Job Server for data visualization

o The job server acted as the big data computation layer for the D3 visualizations

o Coded, tested, debugged, and implemented the data computation layer for computation-intensive data visualization

o The interactive reports provided strategic direction to senior management

§ Spark-JobServer

Key Accomplishments:

§ Replaced Pentaho-based data visualization with D3 and Spark Job Server

§ The report was widely adopted by management

Project: Apache Zeppelin and IPython-Based Notebook

§ Built an Apache Zeppelin-based notebook for internal data visualization

§ Apache Zeppelin, Spark, Hive

Key Accomplishments:

§ Delivered responsive data visualizations for senior management

Dakota State University, Madison, SD
Graduate Research Assistant: 2011 - 2015

Big Data Teaching Assistant: Summer 2015 (master's-level class)

Project: Data Analytics and Machine Learning for Automated Knowledge Generation

§ Text Analytics Project

§ Conducted research aimed at reducing the costs of medical knowledge creation.

§ Explored techniques for handling highly imbalanced data sets, such as SMOTE, undersampling, oversampling, and ensemble methods.

§ Implemented multiple feature extraction techniques.

§ Researched semi-supervised learning methods for mining data without adequate training samples.

Key Accomplishments:

§ Published the study in one journal and gave four conference presentations (Information Systems Frontiers, AMCIS, HICSS).

§ The research study was featured by the U.S. Department of Health and Human Services (HHS).

§ Employed Python, Scala and Apache Spark for machine learning

Project: Developing Intelligent Diabetes Management System

§ Led a team developing an integrated patient and clinician diabetes management system.

§ Designed and built big data infrastructure to collect and disseminate information from multiple sources.

§ Utilized tools including Python, Groovy, Grails, and Java.

Key Accomplishments:

§ The research resulted in two journal publications (International Journal of Medical Informatics, Journal of Diabetes Science and Technology).

§ One project paper was the most downloaded paper in the journal (2013-2014), and the editor requested an expanded and enhanced version.

Project: Advanced Data Analytics for Modeling Time Series Data

§ Time-series Data Mining

§ Conducted a study aimed at automating the analysis of patient data and providing intelligent recommendations.

§ Studied techniques for mining dirty datasets.

§ Employed Gaussian regression, neural network, SVM, Naïve Bayes, and k-NN algorithms.

§ Utilized ensemble techniques such as bagging and boosting to improve algorithm performance.

§ Used tools such as SAS Enterprise Miner, RapidMiner, and Weka.

Key Accomplishments:

§ Two journal articles and one conference presentation (International Journal of Medical Informatics, Journal of Diabetes Science and Technology, AMCIS).

Education and Honors

Dakota State University, Madison, SD, Doctor of Science, GPA 4.0/4.0

Dakota State University, Madison, SD, Master of Science, Information Systems, GPA 4.0/4.0

Tribhuvan University, Nepal, Bachelor’s in Computer Engineering, 78.7%

Graduate Research Assistant, Dakota State University

Erasmus Mundus Scholar (fully funded), Lumière University, France

Undergraduate Scholarship (fully funded), Kantipur Engineering College, Nepal

Certifications and Distinctions

Certified Apache Spark Developer, O’Reilly and Databricks, 2015 (Big Data)

Business Analytics Certification, School of Arts and Sciences, Dakota State University

Two Published Articles (International Journal of Medical Informatics, Journal of Diabetes Science and Technology)

One Accepted Article (Information Systems Frontiers)

Nine Conference Papers (AMCIS: Chicago 2013 and Savannah 2014; HICSS: Big Island, Hawaii 2014 and Kauai, Hawaii 2015)

Publications

Related to Job Position

Timsina, P., El-Gayar, O., & Liu, J. (2015). Active Learning for Automation of Knowledge Generation: The Case of Rare and Expensive Training Dataset. Americas Conference on Information Systems, Puerto Rico, Aug 2015 (Accepted).

Timsina, P., El-Gayar, O., & Liu, J. (2015). Text Analytics for Automation of Medical Systematic Review Creation and Update. Information Systems Frontiers (Accepted).

Timsina, P., El-Gayar, O., & Liu, J. (2015). Leveraging Advanced Analytics Techniques for Medical Systematic Review Update. IEEE Hawaii International Conference on System Sciences (HICSS-48), 2015.

Timsina, P., El-Gayar, O., & Nawar, N. (2014). Leveraging Advanced Analytics to Generate Dynamic Medical Systematic Reviews. Twentieth Americas Conference on Information Systems, Savannah, 2014.

El-Gayar, O., & Timsina, P. (2014). Opportunities for Business Intelligence and Big Data Analytics in Evidence Based Medicine. Paper presented at the IEEE Hawaii International Conference on System Sciences (HICSS-47), January 6-9, 2014.

El-Gayar, O., Timsina, P., Nawar, N., & Eid, W. (2013). Mobile Applications for Diabetes Self-Management: Status and Potential. Journal of Diabetes Science and Technology, 7(1), 247–262. Retrieved from http://journalofdst.org/January2013/Abstracts/VOL-7-1-REV2-EL-GAYAR-ABSTRACT.pdf

Timsina, P., El-Gayar, O., & Nawar, N. (2014). Information Technology for Evidence Based Medicine: Status and Future Direction. Twentieth Americas Conference on Information Systems, Savannah, 2014.

El-Gayar, O., Timsina, P., Nawar, N., & Eid, W. (2013). A systematic review of IT for diabetes self-management: Are we there yet? International Journal of Medical Informatics.

El-Gayar, O., Timsina, P., & Nawar, N. (2013). AmHealth Architecture for Diabetes Self-management. Americas Conference on Information Systems, Chicago, 2013.

Timsina, P., Teng, F., Moalla, N., & Bouras, A. (2010). Classification Framework for Digital Preservation Platforms. Proceedings of the Fourth International Conference on Software, Knowledge, Information Management and Applications (pp. 9-17). Chiang Mai: SKIMA Organizing Committee. Retrieved from http://www.camt.cmu.ac.th/skima2010/docs/2010_SKIMA Proceeeding_.pdf